Question on Lucene search

2009-01-18 Thread fell
Hi all, I am new to Lucene and need to know the following. Suppose I have indexed some data using Lucene with the fields Location, City, and Country, and the index holds the following values in those fields: 1) R G Heights 2) London 3) United Kingdom. If I try to

Re: [jira] Commented: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

2009-01-18 Thread Mark Miller
One more, just as a check, with far fewer unique terms (20k). I didn't catch that I hadn't clamped down enough on the uniques last time. Back up to 21 segments this time, same wildcard search, 7718 hits, and the new method is still approx 20% faster than the old. The last run was 16 segments though w

[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

2009-01-18 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664984#action_12664984 ] markrmil...@gmail.com edited comment on LUCENE-1483 at 1/18/09 8:20 PM:

Re: [jira] Commented: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

2009-01-18 Thread Mark Miller
Man, I'm not paying attention. I switched the analyzer but didn't take it off UNANALYZED. Here are the correct results. It's actually only about 20-30% faster for the index I used. So a lot of that could be the gains that we were seeing in general anyway. Perhaps a bit more too though. Still a tot

Re: [jira] Commented: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

2009-01-18 Thread Mark Miller
Oh yeah, that's a constant-score wildcard query. Mark Miller wrote: Just checked it out, and it's not a bad win on multi-term queries. It's not the same exponential gain as field cache loading, but I bet lots of 2-3x type stuff. You appear to save a decent amount by not applying every term to each

Re: [jira] Commented: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

2009-01-18 Thread Mark Miller
Just checked it out, and it's not a bad win on multi-term queries. It's not the same exponential gain as field cache loading, but I bet lots of 2-3x type stuff. You appear to save a decent amount by not applying every term to each segment because of the logarithmic sizing. My query of: new Wildc

Re: Filesystem based bitset

2009-01-18 Thread Paul Elschot
On Friday 09 January 2009 22:30:14 Marvin Humphrey wrote: > On Fri, Jan 09, 2009 at 08:11:31PM +0100, Karl Wettin wrote: > > > SSD is pretty close to RAM when it comes to seeking. Wouldn't that > > mean that a bitset stored on an SSD would be more or less as fast as a > > bitset in RAM? > >

[jira] Commented: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

2009-01-18 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665008#action_12665008 ] Mark Miller commented on LUCENE-1483: Okay, I think I have it. I tried to count the t

[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

2009-01-18 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664984#action_12664984 ] markrmil...@gmail.com edited comment on LUCENE-1483 at 1/18/09 2:19 PM:

[jira] Resolved: (LUCENE-1124) short circuit FuzzyQuery.rewrite when input token length is small compared to minSimilarity

2009-01-18 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved LUCENE-1124. Resolution: Fixed. Thanks! Committed. > short circuit FuzzyQuery.rewrite when input token length

[jira] Commented: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

2009-01-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664988#action_12664988 ] Michael McCandless commented on LUCENE-1483: {quote} In fact this probably cau

[jira] Commented: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

2009-01-18 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664984#action_12664984 ] Mark Miller commented on LUCENE-1483: My previous results had a few oddities going wi

[jira] Commented: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

2009-01-18 Thread Yonik Seeley (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664983#action_12664983 ] Yonik Seeley commented on LUCENE-1483: bq. think the massive slowness of iterating t

[jira] Commented: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

2009-01-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664979#action_12664979 ] Michael McCandless commented on LUCENE-1483: bq. Even still, you are seeing l

[jira] Updated: (LUCENE-1314) IndexReader.clone

2009-01-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1314: Attachment: LUCENE-1314.patch. New patch attached. All tests pass. Changes: * S

[jira] Commented: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

2009-01-18 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664965#action_12664965 ] Mark Miller commented on LUCENE-1483: I think it's pretty costly even for non id type

[jira] Commented: (LUCENE-1483) Change IndexSearcher multisegment searches to search each individual segment using a single HitCollector

2009-01-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664948#action_12664948 ] Michael McCandless commented on LUCENE-1483: bq. As we call next on MultiTerm