Re: strange problem of PForDelta decoder

2011-01-03 Thread Li Li
I agree with you that we should not tie concurrency w/in a single search to index segments. That solution is just a hack. Will Lucene 4 support multithreaded search for a single query? I haven't found any patch for this. 2011/1/4 Michael McCandless : > Here's the paper: > >    http://citeseerx.is

Re: Geospatial search in Lucene/Solr

2011-01-03 Thread Lance Norskog
Great! I would suggest a new /modules for gis. It is worthwhile to have a /modules/gis/geonames for large-scale tests/demos/benchmarks, with ant scripts to download datasets and run the tests. About demos: there is a lot of GEO code out there: libraries (http://www.openmap.org/), data (geonames,

[jira] Commented: (SOLR-2116) TikaEntityProcessor does not find parser by default

2011-01-03 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977072#action_12977072 ] Chris A. Mattmann commented on SOLR-2116: - Hey Lance, bq. Speaking of Tika, have yo

[jira] Commented: (SOLR-2116) TikaEntityProcessor does not find parser by default

2011-01-03 Thread Lance Norskog (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977067#action_12977067 ] Lance Norskog commented on SOLR-2116: - Great! I'll try it out on 3.x and trunk. Speakin

[jira] Commented: (SOLR-2305) DataImportScheduler - Marko Bonaci

2011-01-03 Thread Bill Bell (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977065#action_12977065 ] Bill Bell commented on SOLR-2305: - The best link: http://wiki.apache.org/solr/HowToContribut

[jira] Issue Comment Edited: (SOLR-2116) TikaEntityProcessor does not find parser by default

2011-01-03 Thread Martijn van Groningen (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976986#action_12976986 ] Martijn van Groningen edited comment on SOLR-2116 at 1/3/11 5:23 PM: -

[jira] Updated: (SOLR-2116) TikaEntityProcessor does not find parser by default

2011-01-03 Thread Martijn van Groningen (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn van Groningen updated SOLR-2116: Attachment: SOLR-2116.patch I've encountered the same issue on my Solr setup. After

Re: Lucene-Solr-tests-only-trunk - Build # 3350 - Still Failing

2011-01-03 Thread Simon Willnauer
On Mon, Jan 3, 2011 at 11:01 PM, Apache Hudson Server wrote: > Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/3350/ > > All tests passed > > Build Log (for compile errors): > [...truncated 6586 lines...] > > clover.setup: > > clover.info: >     [echo] >     [echo]       C

Lucene-Solr-tests-only-trunk - Build # 3350 - Still Failing

2011-01-03 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/3350/ All tests passed Build Log (for compile errors): [...truncated 6586 lines...] clover.setup: clover.info: [echo] [echo] Clover not found. Code coverage reports disabled. [echo] clover: com

Lucene-Solr-tests-only-trunk - Build # 3349 - Failure

2011-01-03 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/3349/ 1 tests failed. REGRESSION: org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration Error Message: null Stack Trace: org.apache.solr.common.cloud.ZooKeeperException: at org.apache.solr.core.CoreConta

[jira] Updated: (LUCENE-2846) omitTF is viral, but omitNorms is anti-viral.

2011-01-03 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2846: Fix Version/s: 4.0 > omitTF is viral, but omitNorms is anti-viral. > -

[jira] Commented: (LUCENE-2846) omitTF is viral, but omitNorms is anti-viral.

2011-01-03 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976947#action_12976947 ] Michael McCandless commented on LUCENE-2846: +1 for omitNorms to be viral. >

Re: FYI: Javadoc update needed re: omitTf

2011-01-03 Thread Simon Willnauer
On Mon, Jan 3, 2011 at 8:26 PM, Yonik Seeley wrote: > On Mon, Jan 3, 2011 at 2:03 PM, Simon Willnauer > wrote: >> While we are on it, would it make sense to move omitTfAP into the >> Index enum. It always felt odd that you can omit norms using the enum >> but use a setter to omit TF & Pos. > > I

[jira] Commented: (LUCENE-2840) Multi-Threading in IndexSearcher (after removal of MultiSearcher and ParallelMultiSearcher)

2011-01-03 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976928#action_12976928 ] Michael McCandless commented on LUCENE-2840: bq. Using fewer threads per-searc

Re: strange problem of PForDelta decoder

2011-01-03 Thread Michael McCandless
Here's the paper: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.156.8091 I haven't read it yet... In general I don't like tying concurrency w/in a single search to index segments; I'd rather they be (relatively?) independent. EG an optimized index would then force single thread qu

[jira] Commented: (SOLR-1782) stats.facet assumes FieldCache.StringIndex - fails horribly on multivalued fields

2011-01-03 Thread Johannes Goll (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976902#action_12976902 ] Johannes Goll commented on SOLR-1782: - Wojtek and Hoss Man - are you planning to release

[jira] Created: (LUCENE-2846) omitTF is viral, but omitNorms is anti-viral.

2011-01-03 Thread Robert Muir (JIRA)
omitTF is viral, but omitNorms is anti-viral. - Key: LUCENE-2846 URL: https://issues.apache.org/jira/browse/LUCENE-2846 Project: Lucene - Java Issue Type: Improvement Reporter: Robert M

Re: FYI: Javadoc update needed re: omitTf

2011-01-03 Thread Yonik Seeley
On Mon, Jan 3, 2011 at 2:03 PM, Simon Willnauer wrote: > While we are on it, would it make sense to move omitTfAP into the > Index enum. It always felt odd that you can omit norms using the enum > but use a setter to omit TF & Pos. I think the attempted move to type safety / enums is what added t

Re: Geospatial search in Lucene/Solr

2011-01-03 Thread Grant Ingersoll
On Dec 28, 2010, at 1:02 PM, Robert Muir wrote: > On Tue, Dec 28, 2010 at 11:59 AM, Smiley, David W. wrote: >> Thanks for letting me know about this Rob. I think geonames is much simpler >> (and much less data) to work with than wikipedia. It's plain tab-delimited >> and I like that it inclu

[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.

2011-01-03 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976893#action_12976893 ] Michael McCandless commented on LUCENE-2843: bq. Just curious, how would the '

Fwd: [Solr Wiki] Update of "FrontPage" by DavidSmiley

2011-01-03 Thread Grant Ingersoll
Kind of a nit-pick, but I don't think this needs to be limited to just geographical search. We actually have clients who use the spatial filtering in non-lat/lon uses (and it was designed with such in mind, hence the support for n-dimensional distance calculations). Perhaps we should leave it

[jira] Commented: (LUCENE-2845) move contrib/benchmark to modules/benchmark

2011-01-03 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976886#action_12976886 ] Michael McCandless commented on LUCENE-2845: +1 > move contrib/benchmark to m

Re: FYI: Javadoc update needed re: omitTf

2011-01-03 Thread Robert Muir
On Mon, Jan 3, 2011 at 1:49 PM, Mark Miller wrote: > > Perhaps should say, *may* silently fail? SpanTermQuery will explicitly throw > an exception. Does PhraseQuery still silently fail these days? Not in trunk; it's loud too.

Re: FYI: Javadoc update needed re: omitTf

2011-01-03 Thread Simon Willnauer
On Mon, Jan 3, 2011 at 7:49 PM, Mark Miller wrote: >  /** Expert: >  * >  * If set, omit term freq, positions and payloads from >  * postings for this field. >  * >  * NOTE: While this option reduces storage space >  * required in the index, it also means any query >  * requiring positional inform

FYI: Javadoc update needed re: omitTf

2011-01-03 Thread Mark Miller
/** Expert: * * If set, omit term freq, positions and payloads from * postings for this field. * * NOTE: While this option reduces storage space * required in the index, it also means any query * requiring positional information, such as {@link * PhraseQuery} or {@link SpanQ

[jira] Commented: (SOLR-2129) Provide a Solr module for dynamic metadata extraction/indexing with Apache UIMA

2011-01-03 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976862#action_12976862 ] Mark Miller commented on SOLR-2129: --- bq. I have no problem committing this to contrib so

Re: Testing our code for SolrCloud

2011-01-03 Thread Soheb Mahmood
Hello Mark! Apologies for the late reply! > Do you mind creating a JIRA issue and attaching a patch? That is usually the > best way to go about these discussions. We have done so here: https://issues.apache.org/jira/browse/SOLR-2287. Unfortunately, our test cases are incomplete at the moment, bu

[jira] Updated: (LUCENE-2845) move contrib/benchmark to modules/benchmark

2011-01-03 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2845: Attachment: LUCENE-2845.patch patch, apply after doing 'svn move lucene/contrib/benchmark modules'

[jira] Created: (LUCENE-2845) move contrib/benchmark to modules/benchmark

2011-01-03 Thread Robert Muir (JIRA)
move contrib/benchmark to modules/benchmark --- Key: LUCENE-2845 URL: https://issues.apache.org/jira/browse/LUCENE-2845 Project: Lucene - Java Issue Type: Task Components: Build R

[jira] Commented: (LUCENE-2844) benchmark geospatial performance based on geonames.org

2011-01-03 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976823#action_12976823 ] Robert Muir commented on LUCENE-2844: - David, I'll first create an issue to propose mo

[jira] Commented: (SOLR-2155) Geospatial search using geohash prefixes

2011-01-03 Thread David Smiley (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976820#action_12976820 ] David Smiley commented on SOLR-2155: For evaluating the performance of geospatial search

Re: Geospatial search in Lucene/Solr

2011-01-03 Thread David Smiley (@MITRE.org)
As a follow-up to this thread, I've contributed my geospatial benchmark performance code here: https://issues.apache.org/jira/browse/LUCENE-2844 "benchmark geospatial performance based on geonames.org" - Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book -- View this me

[jira] Updated: (LUCENE-2844) benchmark geospatial performance based on geonames.org

2011-01-03 Thread David Smiley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-2844: - Attachment: benchmark-geo.patch > benchmark geospatial performance based on geonames.org > -

[jira] Created: (LUCENE-2844) benchmark geospatial performance based on geonames.org

2011-01-03 Thread David Smiley (JIRA)
benchmark geospatial performance based on geonames.org -- Key: LUCENE-2844 URL: https://issues.apache.org/jira/browse/LUCENE-2844 Project: Lucene - Java Issue Type: New Feature Co

[jira] Updated: (SOLR-2129) Provide a Solr module for dynamic metadata extraction/indexing with Apache UIMA

2011-01-03 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated SOLR-2129: -- Attachment: SOLR-2129.patch patch synced to trunk. i also adjusted some minor things: doesn't rely on C

[jira] Commented: (SOLR-2027) SolrJ FacetField should never return null from getValues()

2011-01-03 Thread Sascha Szott (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976792#action_12976792 ] Sascha Szott commented on SOLR-2027: At least a notice should be added to the Javadoc me

[jira] Issue Comment Edited: (SOLR-2155) Geospatial search using geohash prefixes

2011-01-03 Thread David Smiley (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976608#action_12976608 ] David Smiley edited comment on SOLR-2155 at 1/3/11 10:21 AM: - Hi

Re: [jira] Commented: (SOLR-2218) Performance of start= and rows= parameters are exponentially slow with large data sets

2011-01-03 Thread Yonik Seeley
On Thu, Nov 11, 2010 at 3:22 PM, Jan Høydahl / Cominvent wrote: > The problem with large "start" is probably worse when sharding is involved. > Anyone know how the shard component goes about fetching start=100&rows=10 > from say 10 shards? Does it have to merge sorted lists of 1mill+10 docsi

[jira] Commented: (LUCENE-1812) Static index pruning by in-document term frequency (Carmel pruning)

2011-01-03 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976752#action_12976752 ] Andrzej Bialecki commented on LUCENE-1812: --- Doron, feel free to work on this -

[jira] Commented: (LUCENE-2836) FieldCache rewrite method for MultiTermQueries

2011-01-03 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976729#action_12976729 ] Robert Muir commented on LUCENE-2836: - OK, I'll work on getting it into contrib. I t

[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.

2011-01-03 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976724#action_12976724 ] Robert Muir commented on LUCENE-2843: - I like this idea, it would be interesting to se

[jira] Commented: (LUCENE-2843) Add variable-gap terms index impl.

2011-01-03 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976709#action_12976709 ] Michael McCandless commented on LUCENE-2843: As a first test, I just made a po

[jira] Commented: (LUCENE-2836) FieldCache rewrite method for MultiTermQueries

2011-01-03 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976707#action_12976707 ] Michael McCandless commented on LUCENE-2836: This is a great speedup for the h

[jira] Updated: (LUCENE-2843) Add variable-gap terms index impl.

2011-01-03 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2843: --- Attachment: LUCENE-2843.patch Attached patch. Still some nocommits but I think it's

[jira] Commented: (LUCENE-2101) Default Stopwords should use specific Version in CharArraySet construtor

2011-01-03 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976696#action_12976696 ] Simon Willnauer commented on LUCENE-2101: - I think we can simply move that to v4.

[jira] Created: (LUCENE-2843) Add variable-gap terms index impl.

2011-01-03 Thread Michael McCandless (JIRA)
Add variable-gap terms index impl. -- Key: LUCENE-2843 URL: https://issues.apache.org/jira/browse/LUCENE-2843 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Mi

[jira] Resolved: (LUCENE-1747) Contrib/Spatial needs code cleanup before release

2011-01-03 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-1747. - Resolution: Won't Fix I think this is outdated and spatial will rather go away than bein

[jira] Assigned: (SOLR-2026) Need infrastructure support in Solr for requests that perform multiple sequential queries

2011-01-03 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer reassigned SOLR-2026: - Assignee: (was: Simon Willnauer) moving out > Need infrastructure support in Solr for

[jira] Resolved: (SOLR-2031) QueryComponent's default query parser should be configurable from solrconfig.xml

2011-01-03 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved SOLR-2031. --- Resolution: Not A Problem after all this doesn't seem to be really needed > QueryComponent's

[jira] Commented: (SOLR-1942) Ability to select codec per field

2011-01-03 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976691#action_12976691 ] Simon Willnauer commented on SOLR-1942: --- bq. updated to trunk - if somebody has time a

[jira] Assigned: (SOLR-1942) Ability to select codec per field

2011-01-03 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer reassigned SOLR-1942: - Assignee: (was: Simon Willnauer) > Ability to select codec per field > ---

[jira] Resolved: (LUCENE-2612) Add fetch-javacc task to common-build.xml

2011-01-03 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-2612. - Resolution: Not A Problem doesn't seem to be worth it... > Add fetch-javacc task to com

[jira] Resolved: (LUCENE-2214) Remove deprecated StemExclusionSet setters in contrib/analyzers

2011-01-03 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-2214. - Resolution: Invalid see LUCENE-2781 > Remove deprecated StemExclusionSet setters in con

[jira] Commented: (LUCENE-2214) Remove deprecated StemExclusionSet setters in contrib/analyzers

2011-01-03 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976688#action_12976688 ] Simon Willnauer commented on LUCENE-2214: - This seems to be invalid since LUCENE-2

[jira] Resolved: (LUCENE-2808) Intermitted failure on DocValues branch

2011-01-03 Thread Simon Willnauer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-2808. - Resolution: Fixed this has never happened again since I merged with trunk after LUCENE-2

[jira] Commented: (SOLR-2305) DataImportScheduler - Marko Bonaci

2011-01-03 Thread Marko Bonaci (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976665#action_12976665 ] Marko Bonaci commented on SOLR-2305: I'd like to help, but you'll have to explain to me how

[jira] Updated: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly

2011-01-03 Thread Steven Rowe (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe updated LUCENE-2657: Attachment: LUCENE-2657.patch All tests pass again with this patch. Solr test resource structural