[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654751#action_12654751 ] Michael McCandless commented on LUCENE-1483: Mark did you intend to attach the patch here? > Change IndexSearcher to use MultiSearcher semantics for sorted searches > --- > > Key: LUCENE-1483 > URL: https://issues.apache.org/jira/browse/LUCENE-1483 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 2.9 >Reporter: Mark Miller >Priority: Minor > > Here is a quick test patch. FieldCache for sorting is done at the individual > IndexReader level and reloading the fieldcache on reopen can be much faster > as only changed segments need to be reloaded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation
Michael McCandless wrote: I think it does make sense (it's well defined). This is what the SubsearcherTopDocs.convertTopDoc method is doing (in the multisearcher.take2.patch on LUCENE-1471). In fact, returning by document order is a particularly trivial sort, since you'd just have to concatenate the results coming out of the pqueues (ie you wouldn't need a 2nd pqueue). In fact, any SortField[] that contains a SortField.FIELD_DOC could be truncated since that sort order is "total". But these are minor optimizations which we shouldn't worry about for now... Mike Yeah, right again. Just trying to get out of what wasn't working and seemed like it should without work from me. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1483: Attachment: LUCENE-1483.patch I had meant to attach a patch, but then a bunch of stuff wasn't working... This is still a poor man's patch. I need to switch to using the expose subreaders patch. This also doesn't include the multisearcher sort patch yet, because when I tried the first one (2nd rev) everything broke. I'll work on integrating that later. I think all tests pass except for the very last sort test. Some cleanup needed, including the possible drop of using MultiSearcher itself. Basically, it's still in a proof of concept stage. > Change IndexSearcher to use MultiSearcher semantics for sorted searches > --- > > Key: LUCENE-1483 > URL: https://issues.apache.org/jira/browse/LUCENE-1483 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 2.9 >Reporter: Mark Miller >Priority: Minor > Attachments: LUCENE-1483.patch > > > Here is a quick test patch. FieldCache for sorting is done at the individual > IndexReader level and reloading the fieldcache on reopen can be much faster > as only changed segments need to be reloaded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation
Mark Miller wrote: Michael McCandless wrote: Mark Miller wrote: Mark Miller wrote: Which new sort stuff are you referring to? Is it LUCENE-1471? Yes. First thing I did was try and patch this in, but the sort tests failed. It would be the right order, but like the two center docs would be reversed or something. No time to dig in, so I just switched to the trunk MultiSearcher and all tests passed except for the two with the above issues. Spoke too soon. Wasn't LUCENE-1471's fault, it was just hitting different aspects of an issue that's messed up with the old MultiSearcher as well. OK. If you're building on LUCENE-1471, make sure you start from the first patch. It'd be good to factor that logic (2nd pqueue for merging) out so it can be reused b/w IndexSearcher & MultiSearcher. I actually worked with the second. I'll take a look at the first instead. I'm sticking with using the MultiSearcher for the first patch - it can be worked out later if it speeds things up. OK. And, the first now has a 2nd iteration (factors ParallelMultiSearcher to do the merge sort too). Does returning by document id order even make sense with this though? Did it make sense with MultiSearcher? They are pseudo ids (mapped), so it almost seems I can't support that right...it would depend on the order of the readers. I think it does make sense (it's well defined). This is what the SubsearcherTopDocs.convertTopDoc method is doing (in the multisearcher.take2.patch on LUCENE-1471). In fact, returning by document order is a particularly trivial sort, since you'd just have to concatenate the results coming out of the pqueues (ie you wouldn't need a 2nd pqueue). In fact, any SortField[] that contains a SortField.FIELD_DOC could be truncated since that sort order is "total". But these are minor optimizations which we shouldn't worry about for now... Mike - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
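To make the "just concatenate" point above concrete, here is a minimal sketch of why document order stays well defined across sub-readers. It assumes each sub-reader's ids are mapped to global ids by adding the sum of the maxDoc values of the readers before it, and that each sub-reader already returns its hits in ascending local doc order. The class and method names (DocOrderConcat, docBases, concatInDocOrder) are hypothetical and are not taken from any of the patches.

```java
import java.util.Arrays;

final class DocOrderConcat {
    /** Starting offset of each sub-reader in the global doc id space. */
    static int[] docBases(int[] maxDocs) {
        int[] bases = new int[maxDocs.length];
        int base = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            bases[i] = base;
            base += maxDocs[i];
        }
        return bases;
    }

    /**
     * perReaderDocs[r] holds the local doc ids returned by sub-reader r,
     * already in ascending order. Remapping and concatenating in reader
     * order yields global doc order, so no second queue is needed.
     */
    static int[] concatInDocOrder(int[][] perReaderDocs, int[] maxDocs, int n) {
        int[] bases = docBases(maxDocs);
        int[] out = new int[n];
        int count = 0;
        for (int r = 0; r < perReaderDocs.length && count < n; r++) {
            for (int local : perReaderDocs[r]) {
                if (count == n) {
                    break;
                }
                out[count++] = bases[r] + local;
            }
        }
        return Arrays.copyOf(out, count);
    }

    public static void main(String[] args) {
        int[][] hits = { {1, 4}, {0, 2}, {3} };
        int[] maxDocs = {5, 3, 4};
        System.out.println(Arrays.toString(concatInDocOrder(hits, maxDocs, 4)));
    }
}
```

The result does depend on the order of the readers, as Mark notes, but for a fixed reader order the mapping is deterministic, which is what makes the sort "total".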
Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation
Mark Miller wrote: Mark Miller wrote: Mark Miller wrote: Which new sort stuff are you referring to? Is it LUCENE-1471? Yes. First thing I did was try and patch this in, but the sort tests failed. It would be the right order, but like the two center docs would be reversed or something. No time to dig in, so I just switched to the trunk MultiSearcher and all tests passed except for the two with the above issues. Got the auto detection working though. Bah, I didn't. Brought up an old bug I've seen before - if you use multisearcher and an index doesn't have the field, AUTO won't work. Advice I always got was don't use AUTO, but even Lucene uses it internally. Thought I had a workaround, but it didn't quite work. Not sure what to do about this one - I'll have to mull it and the ids issue over a bit I suppose. Hmm... I think we have to keep the AUTO -> true type resolution that MultiReader would do? Ie, ask MultiReader for the TermEnum, not the first sub-reader, for resolving. In fact we should factor out an explicit method to do this; it's currently in ExtendedFieldCache.autoCache.createValue. As long as you do that resolving up front w/ the MultiReader, and pass only resolved SortField[] to each sub-reader, that should fix it? Mike - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
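A rough sketch of the "resolve AUTO once, up front" idea Mike describes. This is not Lucene code: the SortType enum, the class name and both methods are hypothetical, and the detection order (try int, then float, else string) is an assumption about how auto-detection behaves. Sorted per-sub-reader term lists stand in for the TermEnum; taking the smallest first term across all of them models asking the MultiReader rather than only the first sub-reader, so a sub-reader that lacks the field no longer breaks detection.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class AutoSortTypeResolver {
    /** Hypothetical resolved types; stand-ins for concrete sort field types. */
    enum SortType { INT, FLOAT, STRING }

    /** Assumed detection order: int first, then float, otherwise string. */
    static SortType resolveFromTerm(String term) {
        try {
            Integer.parseInt(term);
            return SortType.INT;
        } catch (NumberFormatException notAnInt) {
            // fall through to the next candidate type
        }
        try {
            Float.parseFloat(term);
            return SortType.FLOAT;
        } catch (NumberFormatException notAFloat) {
            return SortType.STRING;
        }
    }

    /**
     * Each inner list holds one sub-reader's terms for the field in sorted
     * order (empty if that sub-reader has no such field). The smallest first
     * term overall is what a merged term enumeration would return first.
     */
    static SortType resolveAcross(List<List<String>> termsPerSubReader) {
        String first = null;
        for (List<String> terms : termsPerSubReader) {
            if (!terms.isEmpty()
                    && (first == null || terms.get(0).compareTo(first) < 0)) {
                first = terms.get(0);
            }
        }
        if (first == null) {
            throw new IllegalStateException("field has no terms in any sub-reader");
        }
        return resolveFromTerm(first);
    }

    public static void main(String[] args) {
        List<List<String>> terms = Arrays.asList(
                Collections.<String>emptyList(),   // sub-reader without the field
                Arrays.asList("10", "20"),
                Arrays.asList("07"));
        System.out.println(resolveAcross(terms));
    }
}
```

The resolved type would then be handed to every sub-reader, so none of them has to guess on its own.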
[jira] Updated: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)
[ https://issues.apache.org/jira/browse/LUCENE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-1482: --- Attachment: (was: LUCENE-1482.patch) > Replace infoSteram by a logging framework (SLF4J) > - > > Key: LUCENE-1482 > URL: https://issues.apache.org/jira/browse/LUCENE-1482 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Reporter: Shai Erera >Priority: Minor > Fix For: 2.4.1, 2.9 > > Attachments: LUCENE-1482.patch, slf4j-api-1.5.6.jar, > slf4j-nop-1.5.6.jar > > > Lucene makes use of infoStream to output messages in its indexing code only. > For debugging purposes, when the search application is run on the customer > side, getting messages from other code flows, like search, query parsing, > analysis etc can be extremely useful. > There are two main problems with infoStream today: > 1. It is owned by IndexWriter, so if I want to add logging capabilities to > other classes I need to either expose an API or propagate infoStream to all > classes (see for example DocumentsWriter, which receives its infoStream > instance from IndexWriter). > 2. I can either turn debugging on or off, for the entire code. > Introducing a logging framework can allow each class to control its logging > independently, and more importantly, allows the application to turn on > logging for only specific areas in the code (i.e., org.apache.lucene.index.*). > I've investigated SLF4J (stands for Simple Logging Facade for Java) which is, > as it names states, a facade over different logging frameworks. As such, you > can include the slf4j.jar in your application, and it recognizes at deploy > time what is the actual logging framework you'd like to use. SLF4J comes with > several adapters for Java logging, Log4j and others. If you know your > application uses Java logging, simply drop slf4j.jar and slf4j-jdk14.jar in > your classpath, and your logging statements will use Java logging underneath > the covers. 
> This makes the logging code very simple. For a class A the logger will be > instantiated like this: > public class A { > private static final Logger logger = LoggerFactory.getLogger(A.class); > } > And will later be used like this: > public class A { > private static final Logger logger = LoggerFactory.getLogger(A.class); > public void foo() { > if (logger.isDebugEnabled()) { > logger.debug("message"); > } > } > } > That's all! > Checking for isDebugEnabled is very quick, at least using the JDK14 adapter > (but I assume it's fast also over other logging frameworks). > The important thing is, every class controls its own logger. Not all classes > have to output logging messages, and we can improve Lucene's logging > gradually, w/o changing the API, by adding more logging messages to > interesting classes. > I will submit a patch shortly -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
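To illustrate the "turn on logging for only specific areas" point, here is a small sketch that uses java.util.logging directly, since that is what the slf4j-jdk14 binding delegates to. It does not use SLF4J itself, so it runs with the JDK alone. The class name PerPackageLoggingDemo is made up, the logger names are only strings chosen to mirror Lucene packages, and it assumes the default JDK logging configuration (root level INFO).

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class PerPackageLoggingDemo {
    // Strong references keep the loggers (and their levels) from being
    // garbage collected.
    static final Logger INDEX_PKG =
            Logger.getLogger("org.apache.lucene.index");
    static final Logger INDEX_CLASS =
            Logger.getLogger("org.apache.lucene.index.IndexWriter");
    static final Logger SEARCH_CLASS =
            Logger.getLogger("org.apache.lucene.search.IndexSearcher");

    /** Enables debug-level (FINE) logging for the index package only. */
    static void enableIndexDebugOnly() {
        INDEX_PKG.setLevel(Level.FINE);
    }

    public static void main(String[] args) {
        enableIndexDebugOnly();
        // Loggers inherit the effective level of their nearest configured
        // ancestor, so only the index logger reports FINE as enabled.
        System.out.println("index debug:  " + INDEX_CLASS.isLoggable(Level.FINE));
        System.out.println("search debug: " + SEARCH_CLASS.isLoggable(Level.FINE));
    }
}
```

With SLF4J in front, the isDebugEnabled() check in the quoted example would be answered by this same per-logger level lookup when the JDK14 binding is on the classpath.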
[jira] Commented: (LUCENE-1471) Faster MultiSearcher.search merge docs
[ https://issues.apache.org/jira/browse/LUCENE-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654755#action_12654755 ] Michael McCandless commented on LUCENE-1471: Luke, it looks like the 2nd patch lost the necessary mods to FieldDocSortedHitQueue -- can you post a new patch that includes it? Thanks. > Faster MultiSearcher.search merge docs > --- > > Key: LUCENE-1471 > URL: https://issues.apache.org/jira/browse/LUCENE-1471 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Affects Versions: 2.4 >Reporter: Jason Rutherglen >Assignee: Michael McCandless >Priority: Minor > Attachments: LUCENE-1471.patch, multisearcher.patch, > multisearcher.take2.patch > > Original Estimate: 8h > Remaining Estimate: 8h > > MultiSearcher.search places sorted search results from individual searchers > into a PriorityQueue. This can be made to be more optimal by taking > advantage of the fact that the results returned are already sorted. > The proposed solution places the sub-searcher results iterator into a custom > PriorityQueue that produces the sorted ScoreDocs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
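The issue description above (put each sub-searcher's already sorted results behind a priority queue) amounts to a k-way merge. A minimal sketch of that technique follows, sorting by descending score with ties broken by ascending doc id. Hit is a hypothetical stand-in for Lucene's ScoreDoc, and none of the names below come from the attached patches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class SortedHitMerger {
    /** Hypothetical stand-in for a scored hit; not Lucene's ScoreDoc. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Cursor over one sub-searcher's sorted results. */
    private static final class Cursor {
        final Iterator<Hit> it;
        Hit head;
        Cursor(Iterator<Hit> it) {
            this.it = it;
            this.head = it.next();
        }
        boolean advance() {
            if (!it.hasNext()) {
                return false;
            }
            head = it.next();
            return true;
        }
    }

    /** Merges per-searcher lists (each sorted best-first) into the top n. */
    static List<Hit> mergeTopN(List<List<Hit>> perSearcher, int n) {
        // The queue holds one cursor per sub-searcher, not every hit.
        PriorityQueue<Cursor> pq = new PriorityQueue<>((a, b) -> {
            int c = Float.compare(b.head.score, a.head.score); // higher score first
            return c != 0 ? c : Integer.compare(a.head.doc, b.head.doc);
        });
        for (List<Hit> hits : perSearcher) {
            if (!hits.isEmpty()) {
                pq.add(new Cursor(hits.iterator()));
            }
        }
        List<Hit> merged = new ArrayList<>();
        while (merged.size() < n && !pq.isEmpty()) {
            Cursor top = pq.poll();
            merged.add(top.head);
            if (top.advance()) {
                pq.add(top); // re-insert with its next hit as the new head
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Hit> a = Arrays.asList(new Hit(0, 0.9f), new Hit(3, 0.5f), new Hit(1, 0.1f));
        List<Hit> b = Arrays.asList(new Hit(10, 0.8f), new Hit(12, 0.5f));
        for (Hit h : mergeTopN(Arrays.asList(a, b), 4)) {
            System.out.println(h.doc + " " + h.score);
        }
    }
}
```

Because the queue size is the number of sub-searchers rather than the number of hits, collecting the top n costs roughly n queue operations on a small queue instead of inserting every sub-result into one large queue.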
[jira] Commented: (LUCENE-1471) Faster MultiSearcher.search merge docs
[ https://issues.apache.org/jira/browse/LUCENE-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654791#action_12654791 ] Mark Miller commented on LUCENE-1471: - Re: thread, Something makes me think a method more like the IndexWriter merge stuff would be better - a max of 3 or n threads used type of thing. One thread per sub searcher worries me. > Faster MultiSearcher.search merge docs > --- > > Key: LUCENE-1471 > URL: https://issues.apache.org/jira/browse/LUCENE-1471 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Affects Versions: 2.4 >Reporter: Jason Rutherglen >Assignee: Michael McCandless >Priority: Minor > Attachments: LUCENE-1471.patch, multisearcher.patch, > multisearcher.take2.patch > > Original Estimate: 8h > Remaining Estimate: 8h > > MultiSearcher.search places sorted search results from individual searchers > into a PriorityQueue. This can be made to be more optimal by taking > advantage of the fact that the results returned are already sorted. > The proposed solution places the sub-searcher results iterator into a custom > PriorityQueue that produces the sorted ScoreDocs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
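One way to get the "max of 3 or n threads" behaviour Mark suggests is a fixed-size thread pool rather than one thread per sub-searcher. The sketch below is only an illustration using java.util.concurrent; BoundedSubSearch and runAll are hypothetical names, and each Callable stands in for a search against one sub-searcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class BoundedSubSearch {
    /**
     * Runs every sub-search, using at most maxThreads worker threads no
     * matter how many sub-searches there are. Results come back in the
     * same order as the input tasks.
     */
    static <T> List<T> runAll(List<Callable<T>> subSearches, int maxThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<T> results = new ArrayList<>();
            for (Future<T> f : pool.invokeAll(subSearches)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while searching", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("sub-search failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            final int k = i;
            tasks.add(() -> k * k); // placeholder for searching sub-searcher k
        }
        System.out.println(runAll(tasks, 3));
    }
}
```

A real searcher would more likely create the pool once and reuse it across searches; it is created per call here only to keep the sketch self-contained.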
[jira] Updated: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)
[ https://issues.apache.org/jira/browse/LUCENE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-1482: --- Attachment: (was: LUCENE-1482.patch) > Replace infoSteram by a logging framework (SLF4J) > - > > Key: LUCENE-1482 > URL: https://issues.apache.org/jira/browse/LUCENE-1482 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Reporter: Shai Erera >Priority: Minor > Fix For: 2.4.1, 2.9 > > Attachments: slf4j-api-1.5.6.jar, slf4j-nop-1.5.6.jar > > > Lucene makes use of infoStream to output messages in its indexing code only. > For debugging purposes, when the search application is run on the customer > side, getting messages from other code flows, like search, query parsing, > analysis etc can be extremely useful. > There are two main problems with infoStream today: > 1. It is owned by IndexWriter, so if I want to add logging capabilities to > other classes I need to either expose an API or propagate infoStream to all > classes (see for example DocumentsWriter, which receives its infoStream > instance from IndexWriter). > 2. I can either turn debugging on or off, for the entire code. > Introducing a logging framework can allow each class to control its logging > independently, and more importantly, allows the application to turn on > logging for only specific areas in the code (i.e., org.apache.lucene.index.*). > I've investigated SLF4J (stands for Simple Logging Facade for Java) which is, > as it names states, a facade over different logging frameworks. As such, you > can include the slf4j.jar in your application, and it recognizes at deploy > time what is the actual logging framework you'd like to use. SLF4J comes with > several adapters for Java logging, Log4j and others. If you know your > application uses Java logging, simply drop slf4j.jar and slf4j-jdk14.jar in > your classpath, and your logging statements will use Java logging underneath > the covers. 
> This makes the logging code very simple. For a class A the logger will be > instantiated like this: > public class A { > private static final Logger logger = LoggerFactory.getLogger(A.class); > } > And will later be used like this: > public class A { > private static final Logger logger = LoggerFactory.getLogger(A.class); > public void foo() { > if (logger.isDebugEnabled()) { > logger.debug("message"); > } > } > } > That's all! > Checking for isDebugEnabled is very quick, at least using the JDK14 adapter > (but I assume it's fast also over other logging frameworks). > The important thing is, every class controls its own logger. Not all classes > have to output logging messages, and we can improve Lucene's logging > gradually, w/o changing the API, by adding more logging messages to > interesting classes. > I will submit a patch shortly -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)
[ https://issues.apache.org/jira/browse/LUCENE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-1482: --- Attachment: LUCENE-1482.patch Forgot to clean up some code in tests which made use of JDK logging. > Replace infoSteram by a logging framework (SLF4J) > - > > Key: LUCENE-1482 > URL: https://issues.apache.org/jira/browse/LUCENE-1482 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Reporter: Shai Erera >Priority: Minor > Fix For: 2.4.1, 2.9 > > Attachments: LUCENE-1482.patch, slf4j-api-1.5.6.jar, > slf4j-nop-1.5.6.jar > > > Lucene makes use of infoStream to output messages in its indexing code only. > For debugging purposes, when the search application is run on the customer > side, getting messages from other code flows, like search, query parsing, > analysis etc can be extremely useful. > There are two main problems with infoStream today: > 1. It is owned by IndexWriter, so if I want to add logging capabilities to > other classes I need to either expose an API or propagate infoStream to all > classes (see for example DocumentsWriter, which receives its infoStream > instance from IndexWriter). > 2. I can either turn debugging on or off, for the entire code. > Introducing a logging framework can allow each class to control its logging > independently, and more importantly, allows the application to turn on > logging for only specific areas in the code (i.e., org.apache.lucene.index.*). > I've investigated SLF4J (stands for Simple Logging Facade for Java) which is, > as it names states, a facade over different logging frameworks. As such, you > can include the slf4j.jar in your application, and it recognizes at deploy > time what is the actual logging framework you'd like to use. SLF4J comes with > several adapters for Java logging, Log4j and others. 
If you know your > application uses Java logging, simply drop slf4j.jar and slf4j-jdk14.jar in > your classpath, and your logging statements will use Java logging underneath > the covers. > This makes the logging code very simple. For a class A the logger will be > instantiated like this: > public class A { > private static final Logger logger = LoggerFactory.getLogger(A.class); > } > And will later be used like this: > public class A { > private static final Logger logger = LoggerFactory.getLogger(A.class); > public void foo() { > if (logger.isDebugEnabled()) { > logger.debug("message"); > } > } > } > That's all! > Checking for isDebugEnabled is very quick, at least using the JDK14 adapter > (but I assume it's fast also over other logging frameworks). > The important thing is, every class controls its own logger. Not all classes > have to output logging messages, and we can improve Lucene's logging > gradually, w/o changing the API, by adding more logging messages to > interesting classes. > I will submit a patch shortly -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654798#action_12654798 ] Mark Miller commented on LUCENE-1483: - Quick micro bench - did it twice and both times came out 17% slower. Hopefully get a lot of that back with the new MultiSearcher sort stuff and maybe some optimizations. {code} OLD [java] > Report Sum By (any) Name (12 about 42 out of 43) [java] Operation round 2 mrg runCnt recsPerRun rec/s elapsedSecavgUsedMemavgTotalMem [java] Rounds 0 1 501 2020012 11,532.6 175.1649,688,960200,736,768 [java] Run_4 - - - - - - - 0 1 50 - - 1 - - 2020012 - 11,532.6 - - 175.16 - 49,688,960 - 200,736,768 [java] Populate- - -4 50 21,446.3 93.2681,397,296156,942,336 [java] CreateIndex - - - - - - - - - - - - 4 - - - - 0 - - - 0.0 - - 0.23 - 16,492,194 - 112,984,064 [java] MAddDocs_50 - - -4 50 28,656.2 69.7983,686,552153,223,168 [java] Optimize - - - - - - - - - - - - - 4 - - - - 0 - - - 0.0 - - 23.22 - 101,362,928 - 156,942,336 [java] CloseIndex - - -40 0.00.0081,397,296156,942,336 [java] TestSortSpeed - - - - - - - - - - - - 4 - - - 5003 - - 246.0 - - 81.35 - 98,312,320 - 157,941,760 [java] OpenReader - - -41 266.70.0181,397,296156,942,336 [java] LoadFieldCacheAndSearch - - - - - - - - 4 - - - - 1 - - - 6.2 - - 0.64 - 90,550,496 - 156,942,336 [java] SearchWithSort_5000 - - -4 5000 247.9 80.69 101,017,720157,941,760 [java] CloseReader - - - - - - - - - - - - 4 - - - - 1 - - 4,000.0 - - 0.00 - 95,036,504 - 157,941,760 [java] [java] ### D O N E !!! 
### [java] NEW [java] > Report Sum By (any) Name (12 about 42 out of 43) [java] Operation round 2 mrg runCnt recsPerRun rec/s elapsedSecavgUsedMemavgTotalMem [java] Rounds 0 1 501 2020012 10,445.5 193.38 125,468,912208,535,552 [java] Run_4 - - - - - - - 0 1 50 - - 1 - - 2020012 - 10,445.5 - - 193.38 - 125,468,912 - 208,535,552 [java] Populate- - -4 50 20,650.1 96.8584,097,072162,316,288 [java] CreateIndex - - - - - - - - - - - - 4 - - - - 0 - - - 0.0 - - 0.12 - 16,564,602 - 116,604,928 [java] MAddDocs_50 - - -4 50 28,772.4 69.5187,705,952159,956,992 [java] Optimize - - - - - - - - - - - - - 4 - - - - 0 - - - 0.0 - - 27.20 - 99,096,816 - 162,316,288 [java] CloseIndex - - -40 0.00.0084,097,072162,316,288 [java] TestSortSpeed - - - - - - - - - - - - 4 - - - 5003 - - 208.5 - - 95.99 - 98,749,480 - 164,020,224 [java] OpenReader - - -41 222.20.0284,097,072162,316,288 [java] LoadFieldCacheAndSearch - - - - - - - - 4 - - - - 1 - - - 5.0 - - 0.81 - 90,882,496 - 163,725,312 [java] SearchWithSort_5000 - - -4 5000 210.2 95.1795,207,336164,020,224 [java] CloseReader - - - - - - - - - - - - 4 - - - - 1 - - 4,000.0 - - 0.00 - 93,868,880 - 163,905,536 [java] [java] ### D O N E !!! ### [java] {/code} > Change IndexSearcher to use MultiSearcher semantics for sorted searches > --- > > Key: LUCENE-1483 > URL: https://issues.apache.org/jira/browse/LUCENE-1483 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 2.9 >Reporter: Mark Miller >Priority: Minor > Attachments: LUCENE-1483.patch > > > Here is a quick test patch. FieldCache for sorting is done at the individual > IndexReader level and reloading the fieldcache on reopen can be much faster > as only changed segments need to be reloaded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation
Michael McCandless wrote: Mark Miller wrote: Mark Miller wrote: Mark Miller wrote: Which new sort stuff are you referring to? Is it LUCENE-1471? Yes. First thing I did was try and patch this in, but the sort tests failed. It would be the right order, but like the two center docs would be reversed or something. No time to dig in, so I just switched to the trunk MultiSearcher and all tests passed except for the two with the above issues. Got the auto detection working though. Bah, I didn't. Brought up an old bug I've seen before - if you use multisearcher and an index doesn't have the field, AUTO won't work. Advice I always got was don't use AUTO, but even Lucene uses it internally. Thought I had a workaround, but it didn't quite work. Not sure what to do about this one - I'll have to mull it and the ids issue over a bit I suppose. Hmm... I think we have to keep the AUTO -> true type resolution that MultiReader would do? Ie, ask MultiReader for the TermEnum, not the first sub-reader, for resolving. In fact we should factor out an explicit method to do this; it's currently in ExtendedFieldCache.autoCache.createValue. As long as you do that resolving up front w/ the MultiReader, and pass only resolved SortField[] to each sub-reader, that should fix it? Mike You're right. I get caught up in the mode of trying to hack it to work quickly before I do it right. - Mark - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654798#action_12654798 ] [EMAIL PROTECTED] edited comment on LUCENE-1483 at 12/9/08 5:41 AM: -- Quick micro bench - did it twice and both times came out 17% slower. Hopefully get a lot of that back with the new MultiSearcher sort stuff and maybe some optimizations. {panel:title=OLD} {noformat} [java] > Report Sum By (any) Name (12 about 42 out of 43) [java] Operation round 2 mrg runCnt recsPerRun rec/s elapsedSecavgUsedMemavgTotalMem [java] Rounds 0 1 501 2020012 11,532.6 175.1649,688,960200,736,768 [java] Run_4 - - - - - - - 0 1 50 - - 1 - - 2020012 - 11,532.6 - - 175.16 - 49,688,960 - 200,736,768 [java] Populate- - -4 50 21,446.3 93.2681,397,296156,942,336 [java] CreateIndex - - - - - - - - - - - - 4 - - - - 0 - - - 0.0 - - 0.23 - 16,492,194 - 112,984,064 [java] MAddDocs_50 - - -4 50 28,656.2 69.7983,686,552153,223,168 [java] Optimize - - - - - - - - - - - - - 4 - - - - 0 - - - 0.0 - - 23.22 - 101,362,928 - 156,942,336 [java] CloseIndex - - -40 0.00.0081,397,296156,942,336 [java] TestSortSpeed - - - - - - - - - - - - 4 - - - 5003 - - 246.0 - - 81.35 - 98,312,320 - 157,941,760 [java] OpenReader - - -41 266.70.0181,397,296156,942,336 [java] LoadFieldCacheAndSearch - - - - - - - - 4 - - - - 1 - - - 6.2 - - 0.64 - 90,550,496 - 156,942,336 [java] SearchWithSort_5000 - - -4 5000 247.9 80.69 101,017,720157,941,760 [java] CloseReader - - - - - - - - - - - - 4 - - - - 1 - - 4,000.0 - - 0.00 - 95,036,504 - 157,941,760 [java] [java] ### D O N E !!! 
### [java] {noformat} {panel} {panel:title=NEW} {noformat} [java] > Report Sum By (any) Name (12 about 42 out of 43) [java] Operation round 2 mrg runCnt recsPerRun rec/s elapsedSecavgUsedMemavgTotalMem [java] Rounds 0 1 501 2020012 10,445.5 193.38 125,468,912208,535,552 [java] Run_4 - - - - - - - 0 1 50 - - 1 - - 2020012 - 10,445.5 - - 193.38 - 125,468,912 - 208,535,552 [java] Populate- - -4 50 20,650.1 96.8584,097,072162,316,288 [java] CreateIndex - - - - - - - - - - - - 4 - - - - 0 - - - 0.0 - - 0.12 - 16,564,602 - 116,604,928 [java] MAddDocs_50 - - -4 50 28,772.4 69.5187,705,952159,956,992 [java] Optimize - - - - - - - - - - - - - 4 - - - - 0 - - - 0.0 - - 27.20 - 99,096,816 - 162,316,288 [java] CloseIndex - - -40 0.00.0084,097,072162,316,288 [java] TestSortSpeed - - - - - - - - - - - - 4 - - - 5003 - - 208.5 - - 95.99 - 98,749,480 - 164,020,224 [java] OpenReader - - -41 222.20.0284,097,072162,316,288 [java] LoadFieldCacheAndSearch - - - - - - - - 4 - - - - 1 - - - 5.0 - - 0.81 - 90,882,496 - 163,725,312 [java] SearchWithSort_5000 - - -4 5000 210.2 95.1795,207,336164,020,224 [java] CloseReader - - - - - - - - - - - - 4 - - - - 1 - - 4,000.0 - - 0.00 - 93,868,880 - 163,905,536 [java] [java] ### D O N E !!! ### [java] {noformat} {panel} was (Author: [EMAIL PROTECTED]): Quick micro bench - did it twice and both times came out 17% slower. Hopefully get a lot of that back with the new MultiSearcher sort stuff and maybe some optimizations. {code} OLD [java] > Report Sum By (any) Name (12 about 42 out of 43) [java] Operation round 2 mrg runCnt recsPerRun rec/s elapsedSecavgUsedMemavgTotalMem [java] Rounds 0 1 501 2020012 11,532.6 175.1649,688,960200,736,768 [java] Run_4 - - - - - - - 0 1 50 - - 1 - - 2020012 - 11,532.6 - - 175.16 - 49,688,960 - 200,736,768 [java] Populate
[jira] Updated: (LUCENE-1471) Faster MultiSearcher.search merge docs
[ https://issues.apache.org/jira/browse/LUCENE-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Nezda updated LUCENE-1471: --- Attachment: multisearcher.take3.patch Doh. Sorry Michael, I reverted my local changes and tested this patch :). I agree Mark, an unbounded number of threads is a little worrisome. > Faster MultiSearcher.search merge docs > --- > > Key: LUCENE-1471 > URL: https://issues.apache.org/jira/browse/LUCENE-1471 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Affects Versions: 2.4 >Reporter: Jason Rutherglen >Assignee: Michael McCandless >Priority: Minor > Attachments: LUCENE-1471.patch, multisearcher.patch, > multisearcher.take2.patch, multisearcher.take3.patch > > Original Estimate: 8h > Remaining Estimate: 8h > > MultiSearcher.search places sorted search results from individual searchers > into a PriorityQueue. This can be made to be more optimal by taking > advantage of the fact that the results returned are already sorted. > The proposed solution places the sub-searcher results iterator into a custom > PriorityQueue that produces the sorted ScoreDocs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)
[ https://issues.apache.org/jira/browse/LUCENE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-1482: --- Attachment: slf4j-nop-1.5.6.jar slf4j-api-1.5.6.jar LUCENE-1482.patch Thanks Doug, I've replaced the JDK14 jar with the NOP jar and deleted the logging test I added (since NOP does not log anything). > Replace infoSteram by a logging framework (SLF4J) > - > > Key: LUCENE-1482 > URL: https://issues.apache.org/jira/browse/LUCENE-1482 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Reporter: Shai Erera >Priority: Minor > Fix For: 2.4.1, 2.9 > > Attachments: LUCENE-1482.patch, slf4j-api-1.5.6.jar, > slf4j-nop-1.5.6.jar > > > Lucene makes use of infoStream to output messages in its indexing code only. > For debugging purposes, when the search application is run on the customer > side, getting messages from other code flows, like search, query parsing, > analysis etc can be extremely useful. > There are two main problems with infoStream today: > 1. It is owned by IndexWriter, so if I want to add logging capabilities to > other classes I need to either expose an API or propagate infoStream to all > classes (see for example DocumentsWriter, which receives its infoStream > instance from IndexWriter). > 2. I can either turn debugging on or off, for the entire code. > Introducing a logging framework can allow each class to control its logging > independently, and more importantly, allows the application to turn on > logging for only specific areas in the code (i.e., org.apache.lucene.index.*). > I've investigated SLF4J (stands for Simple Logging Facade for Java) which is, > as it names states, a facade over different logging frameworks. As such, you > can include the slf4j.jar in your application, and it recognizes at deploy > time what is the actual logging framework you'd like to use. SLF4J comes with > several adapters for Java logging, Log4j and others. 
If you know your > application uses Java logging, simply drop slf4j.jar and slf4j-jdk14.jar in > your classpath, and your logging statements will use Java logging underneath > the covers. > This makes the logging code very simple. For a class A the logger will be > instantiated like this: > public class A { > private static final Logger logger = LoggerFactory.getLogger(A.class); > } > And will later be used like this: > public class A { > private static final Logger logger = LoggerFactory.getLogger(A.class); > public void foo() { > if (logger.isDebugEnabled()) { > logger.debug("message"); > } > } > } > That's all! > Checking for isDebugEnabled is very quick, at least using the JDK14 adapter > (but I assume it's fast also over other logging frameworks). > The important thing is, every class controls its own logger. Not all classes > have to output logging messages, and we can improve Lucene's logging > gradually, w/o changing the API, by adding more logging messages to > interesting classes. > I will submit a patch shortly -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654805#action_12654805 ] Marvin Humphrey commented on LUCENE-1483: - > Quick micro bench - did it twice and both times came out 17% slower. I'd guess that all the OO construction/destruction costs in this part of your patch are slowing things down. {code} +Searchable[] searchers = new Searchable[readers.length]; +for(int i = 0; i < readers.length; i++) { + searchers[i] = new IndexSearcher(readers[i]); +} + +MultiSearcher multiSearcher = new MultiSearcher(searchers); +return multiSearcher.search(weight, filter, nDocs, sort); {code} > Change IndexSearcher to use MultiSearcher semantics for sorted searches > --- > > Key: LUCENE-1483 > URL: https://issues.apache.org/jira/browse/LUCENE-1483 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 2.9 >Reporter: Mark Miller >Priority: Minor > Attachments: LUCENE-1483.patch > > > Here is a quick test patch. FieldCache for sorting is done at the individual > IndexReader level and reloading the fieldcache on reopen can be much faster > as only changed segments need to be reloaded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches
Great, because that's probably the main optimization spot we have. I also made things a bit difficult with the merge factor of 50. I'll try 10 later. - Mark On Dec 9, 2008, at 9:20 AM, "Marvin Humphrey (JIRA)" <[EMAIL PROTECTED]> wrote: [ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654805#action_12654805 ] Marvin Humphrey commented on LUCENE-1483: - Quick micro bench - did it twice and both times came out 17% slower. I'd guess that all the OO construction/destruction costs in this part of your patch are slowing things down. {code} +Searchable[] searchers = new Searchable[readers.length]; +for(int i = 0; i < readers.length; i++) { + searchers[i] = new IndexSearcher(readers[i]); +} + +MultiSearcher multiSearcher = new MultiSearcher(searchers); +return multiSearcher.search(weight, filter, nDocs, sort); {code} Change IndexSearcher to use MultiSearcher semantics for sorted searches --- Key: LUCENE-1483 URL: https://issues.apache.org/jira/browse/LUCENE-1483 Project: Lucene - Java Issue Type: Improvement Affects Versions: 2.9 Reporter: Mark Miller Priority: Minor Attachments: LUCENE-1483.patch Here is a quick test patch. FieldCache for sorting is done at the individual IndexReader level and reloading the fieldcache on reopen can be much faster as only changed segments need to be reloaded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches
On Tue, Dec 9, 2008 at 9:23 AM, Mark Miller <[EMAIL PROTECTED]> wrote: > Great, because that's probably the main optimization spot we have. I also made > things a bit difficult with the merge factor of 50. I'll try 10 later. It's useful to report the number of segments in the index too. Even with high merge factors, you can get lucky and have very few segments. -Yonik - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654812#action_12654812 ] Mark Miller commented on LUCENE-1483: - I'll be sure to include that info with the next set of results. I don't think those results represent getting lucky though: it's 4 rounds and 2 runs with the same results (17% both runs). Nothing scientific, I just did it real quick to get a base feel for the slowdown before the patch is finished up. Here is the alg I used: {noformat} merge.factor=mrg:50 compound=false sort.rng=2:1:2:1 analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer directory=FSDirectory #directory=RamDirectory doc.stored=true doc.tokenized=true doc.term.vector=false doc.add.log.step=10 docs.dir=reuters-out doc.maker=org.apache.lucene.benchmark.byTask.feeds.SortableSimpleDocMaker query.maker=org.apache.lucene.benchmark.byTask.feeds.SimpleQueryMaker # task at this depth or less would print when they start task.max.depth.log=2 log.queries=true # - { "Rounds" { "Run" ResetSystemErase { "Populate" -CreateIndex { "MAddDocs" AddDoc(100) > : 50 -Optimize -CloseIndex } { "TestSortSpeed" OpenReader { "LoadFieldCacheAndSearch" SearchWithSort(sort_field:int) > : 1 { "SearchWithSort" SearchWithSort(sort_field) > : 5000 CloseReader } NewRound } : 4 } RepSumByName {noformat} > Change IndexSearcher to use MultiSearcher semantics for sorted searches > --- > > Key: LUCENE-1483 > URL: https://issues.apache.org/jira/browse/LUCENE-1483 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 2.9 >Reporter: Mark Miller >Priority: Minor > Attachments: LUCENE-1483.patch > > > Here is a quick test patch. FieldCache for sorting is done at the individual > IndexReader level and reloading the fieldcache on reopen can be much faster > as only changed segments need to be reloaded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. 
[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654812#action_12654812 ] [EMAIL PROTECTED] edited comment on LUCENE-1483 at 12/9/08 6:55 AM: -- I'll be sure to include that info with the next set of results. I don't think those results represent getting lucky though: its 4 rounds and 2 runs with the same results (17% both runs). Nothing scientific, just did it real quick to get a base feel of the slowdown before the patch is finished up. Here is the alg I used: {noformat} merge.factor=mrg:50 compound=false sort.rng=2:1:2:1 analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer directory=FSDirectory #directory=RamDirectory doc.stored=true doc.tokenized=true doc.term.vector=false doc.add.log.step=10 docs.dir=reuters-out doc.maker=org.apache.lucene.benchmark.byTask.feeds.SortableSimpleDocMaker query.maker=org.apache.lucene.benchmark.byTask.feeds.SimpleQueryMaker # task at this depth or less would print when they start task.max.depth.log=2 log.queries=true # - { "Rounds" { "Run" ResetSystemErase { "Populate" -CreateIndex { "MAddDocs" AddDoc(100) > : 50 -CloseIndex } { "TestSortSpeed" OpenReader { "LoadFieldCacheAndSearch" SearchWithSort(sort_field:int) > : 1 { "SearchWithSort" SearchWithSort(sort_field) > : 5000 CloseReader } NewRound } : 4 } RepSumByName {noformat} was (Author: [EMAIL PROTECTED]): I'll be sure to include that info with the next set of results. I don't think those results represent getting lucky though: its 4 rounds and 2 runs with the same results (17% both runs). Nothing scientific, just did it real quick to get a base feel of the slowdown before the patch is finished up. 
Here is the alg I used: {noformat} merge.factor=mrg:50 compound=false sort.rng=2:1:2:1 analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer directory=FSDirectory #directory=RamDirectory doc.stored=true doc.tokenized=true doc.term.vector=false doc.add.log.step=10 docs.dir=reuters-out doc.maker=org.apache.lucene.benchmark.byTask.feeds.SortableSimpleDocMaker query.maker=org.apache.lucene.benchmark.byTask.feeds.SimpleQueryMaker # task at this depth or less would print when they start task.max.depth.log=2 log.queries=true # - { "Rounds" { "Run" ResetSystemErase { "Populate" -CreateIndex { "MAddDocs" AddDoc(100) > : 50 -Optimize -CloseIndex } { "TestSortSpeed" OpenReader { "LoadFieldCacheAndSearch" SearchWithSort(sort_field:int) > : 1 { "SearchWithSort" SearchWithSort(sort_field) > : 5000 CloseReader } NewRound } : 4 } RepSumByName {noformat} > Change IndexSearcher to use MultiSearcher semantics for sorted searches > --- > > Key: LUCENE-1483 > URL: https://issues.apache.org/jira/browse/LUCENE-1483 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 2.9 >Reporter: Mark Miller >Priority: Minor > Attachments: LUCENE-1483.patch > > > Here is a quick test patch. FieldCache for sorting is done at the individual > IndexReader level and reloading the fieldcache on reopen can be much faster > as only changed segments need to be reloaded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654812#action_12654812 ] [EMAIL PROTECTED] edited comment on LUCENE-1483 at 12/9/08 7:00 AM: -- I'll be sure to include that info with the next set of results. I don't think those results represent getting lucky though: its 4 rounds and 2 runs with the same results (17% both runs). Nothing scientific, just did it real quick to get a base feel of the slowdown before the patch is finished up. *EDIT* Just like I forgot to take the optimize out of the sort alg when I pasted it here, looks like I missed it for the benches as well. Disregard those numbers. Here is the alg I used: {noformat} merge.factor=mrg:50 compound=false sort.rng=2:1:2:1 analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer directory=FSDirectory #directory=RamDirectory doc.stored=true doc.tokenized=true doc.term.vector=false doc.add.log.step=10 docs.dir=reuters-out doc.maker=org.apache.lucene.benchmark.byTask.feeds.SortableSimpleDocMaker query.maker=org.apache.lucene.benchmark.byTask.feeds.SimpleQueryMaker # task at this depth or less would print when they start task.max.depth.log=2 log.queries=true # - { "Rounds" { "Run" ResetSystemErase { "Populate" -CreateIndex { "MAddDocs" AddDoc(100) > : 50 -CloseIndex } { "TestSortSpeed" OpenReader { "LoadFieldCacheAndSearch" SearchWithSort(sort_field:int) > : 1 { "SearchWithSort" SearchWithSort(sort_field) > : 5000 CloseReader } NewRound } : 4 } RepSumByName {noformat} was (Author: [EMAIL PROTECTED]): I'll be sure to include that info with the next set of results. I don't think those results represent getting lucky though: its 4 rounds and 2 runs with the same results (17% both runs). Nothing scientific, just did it real quick to get a base feel of the slowdown before the patch is finished up. 
Here is the alg I used: {noformat} merge.factor=mrg:50 compound=false sort.rng=2:1:2:1 analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer directory=FSDirectory #directory=RamDirectory doc.stored=true doc.tokenized=true doc.term.vector=false doc.add.log.step=10 docs.dir=reuters-out doc.maker=org.apache.lucene.benchmark.byTask.feeds.SortableSimpleDocMaker query.maker=org.apache.lucene.benchmark.byTask.feeds.SimpleQueryMaker # task at this depth or less would print when they start task.max.depth.log=2 log.queries=true # - { "Rounds" { "Run" ResetSystemErase { "Populate" -CreateIndex { "MAddDocs" AddDoc(100) > : 50 -CloseIndex } { "TestSortSpeed" OpenReader { "LoadFieldCacheAndSearch" SearchWithSort(sort_field:int) > : 1 { "SearchWithSort" SearchWithSort(sort_field) > : 5000 CloseReader } NewRound } : 4 } RepSumByName {noformat} > Change IndexSearcher to use MultiSearcher semantics for sorted searches > --- > > Key: LUCENE-1483 > URL: https://issues.apache.org/jira/browse/LUCENE-1483 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 2.9 >Reporter: Mark Miller >Priority: Minor > Attachments: LUCENE-1483.patch > > > Here is a quick test patch. FieldCache for sorting is done at the individual > IndexReader level and reloading the fieldcache on reopen can be much faster > as only changed segments need to be reloaded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1483: Comment: was deleted > Change IndexSearcher to use MultiSearcher semantics for sorted searches > --- > > Key: LUCENE-1483 > URL: https://issues.apache.org/jira/browse/LUCENE-1483 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 2.9 >Reporter: Mark Miller >Priority: Minor > Attachments: LUCENE-1483.patch > > > Here is a quick test patch. FieldCache for sorting is done at the individual > IndexReader level and reloading the fieldcache on reopen can be much faster > as only changed segments need to be reloaded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation
[ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654820#action_12654820 ] Michael McCandless commented on LUCENE-831: --- Marvin, does KS/Lucy have something like FieldCache? If so, what API do you use? Is it iterator-only? > Complete overhaul of FieldCache API/Implementation > -- > > Key: LUCENE-831 > URL: https://issues.apache.org/jira/browse/LUCENE-831 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Reporter: Hoss Man > Fix For: 3.0 > > Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, > fieldcache-overhaul.diff, fieldcache-overhaul.diff, > LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff, > LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, > LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch > > > Motivation: > 1) Complete overhaul the API/implementation of "FieldCache" type things... > a) eliminate global static map keyed on IndexReader (thus > eliminating synch block between completley independent IndexReaders) > b) allow more customization of cache management (ie: use > expiration/replacement strategies, disk backed caches, etc) > c) allow people to define custom cache data logic (ie: custom > parsers, complex datatypes, etc... anything tied to a reader) > d) allow people to inspect what's in a cache (list of CacheKeys) for > an IndexReader so a new IndexReader can be likewise warmed. > e) Lend support for smarter cache management if/when > IndexReader.reopen is added (merging of cached data from subReaders). > 2) Provide backwards compatibility to support existing FieldCache API with > the new implementation, so there is no redundent caching as client code > migrades to new API. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. 
[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654822#action_12654822 ] Mark Miller commented on LUCENE-1483: - Okay, I straightened things out, and now it looks like possibly no loss (for few segments anyway). Last I looked at the index, only 6 segments. I've got to put real time into all this later though. Only been able to give it some very backgroundish time this morning. > Change IndexSearcher to use MultiSearcher semantics for sorted searches > --- > > Key: LUCENE-1483 > URL: https://issues.apache.org/jira/browse/LUCENE-1483 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 2.9 >Reporter: Mark Miller >Priority: Minor > Attachments: LUCENE-1483.patch > > > Here is a quick test patch. FieldCache for sorting is done at the individual > IndexReader level and reloading the fieldcache on reopen can be much faster > as only changed segments need to be reloaded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654828#action_12654828 ] Yonik Seeley commented on LUCENE-1483: -- bq. Okay, I straightened things out, and now it looks like possibly no loss So if there was a 17% loss on the optimized index, and very little loss on a segmented index, I assume that means that matching/scoring is enough slower on the segmented index that the loss in sorting performance doesn't matter as much? > Change IndexSearcher to use MultiSearcher semantics for sorted searches > --- > > Key: LUCENE-1483 > URL: https://issues.apache.org/jira/browse/LUCENE-1483 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 2.9 >Reporter: Mark Miller >Priority: Minor > Attachments: LUCENE-1483.patch > > > Here is a quick test patch. FieldCache for sorting is done at the individual > IndexReader level and reloading the fieldcache on reopen can be much faster > as only changed segments need to be reloaded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654834#action_12654834 ] Mark Miller commented on LUCENE-1483: - Ignore those first results entirely. It turns out I had the latest 1471 patched in. That shouldn't slow down a single segment though. I thought neither this nor 1471 should have slowed things down, because they only affect multisegment and multiindex searches. Odd, but I just junked all of that and started fresh, did the tests a little closer to right, and see the numbers looking the same. Didn't want to get too into benching before it's sorted out a bit more. I'll try to get enough time to be more rigorous later though. My free moments are under heavy attack by the female that appears to have made herself at home in my house. As a side note, 1471 doesn't work in a couple of ways with this patch - it throws both a NullPointerException and a ClassCastException in different circumstances. > Change IndexSearcher to use MultiSearcher semantics for sorted searches > --- > > Key: LUCENE-1483 > URL: https://issues.apache.org/jira/browse/LUCENE-1483 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 2.9 >Reporter: Mark Miller >Priority: Minor > Attachments: LUCENE-1483.patch > > > Here is a quick test patch. FieldCache for sorting is done at the individual > IndexReader level and reloading the fieldcache on reopen can be much faster > as only changed segments need to be reloaded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654839#action_12654839 ] Michael McCandless commented on LUCENE-1483: I think there should be very little impact to performance, for single or multi segment indices, for the search itself against a warmed reader. (And actually LUCENE-1471 should make things a wee bit faster, especially if n,m are largeish, though this will typically be in the noise). But warming after reopen should be much faster with this patch (we should try to measure that too). > Change IndexSearcher to use MultiSearcher semantics for sorted searches > --- > > Key: LUCENE-1483 > URL: https://issues.apache.org/jira/browse/LUCENE-1483 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 2.9 >Reporter: Mark Miller >Priority: Minor > Attachments: LUCENE-1483.patch > > > Here is a quick test patch. FieldCache for sorting is done at the individual > IndexReader level and reloading the fieldcache on reopen can be much faster > as only changed segments need to be reloaded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654842#action_12654842 ] Mark Miller commented on LUCENE-1483: - bq. I think there should be very little impact to performance, for single or multi segment indices, for the search itself against a warmed reader. (And actually LUCENE-1471 should make things a wee bit faster, especially if n,m are largeish, though this will typically be in the noise). That seems to be in line with what I got with 6 segments. I'm running some tests in the 30-50 segment range on my other laptop now. bq. But warming after reopen should be much faster with this patch (we should try to measure that too). I've got a base alg for that type of thing around somewhere too, from 831. It should be about the same, which means pretty dramatic reopen improvements if you have multiple segments, especially if the new segment is small. It's likely to be small in comparison to all of the segments anyway, which means pretty great improvements. > Change IndexSearcher to use MultiSearcher semantics for sorted searches > --- > > Key: LUCENE-1483 > URL: https://issues.apache.org/jira/browse/LUCENE-1483 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 2.9 >Reporter: Mark Miller >Priority: Minor > Attachments: LUCENE-1483.patch > > > Here is a quick test patch. FieldCache for sorting is done at the individual > IndexReader level and reloading the fieldcache on reopen can be much faster > as only changed segments need to be reloaded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions
See http://lucene.markmail.org/message/fu34tuomnqejchfj?q=RemoteSearchable for just such a proposal On Dec 8, 2008, at 1:52 PM, Doug Cutting (JIRA) wrote: [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654513 #action_12654513 ] Doug Cutting commented on LUCENE-1473: -- Would it take any more lines of code to remove Serializeable from the core classes and re-implement RemoteSearchable in a separate layer on top of the core APIs? That layer could be a contrib module and could get all the externalizeable love it needs. It could support a specific popular subset of query and filter classes, rather than arbitrary Query implementations. It would be extensible, so that if folks wanted to support new kinds of queries, they easily could. This other approach seems like a slippery slope, complicating already complex code with new concerns. It would be better to encapsulate these concerns in a layer atop APIs whose back- compatibility we already make promises about, no? Implement standard Serialization across Lucene versions --- Key: LUCENE-1473 URL: https://issues.apache.org/jira/browse/LUCENE-1473 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 2.4 Reporter: Jason Rutherglen Priority: Minor Attachments: custom-externalizable-reader.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch Original Estimate: 8h Remaining Estimate: 8h To maintain serialization compatibility between Lucene versions, serialVersionUID needs to be added to classes that implement java.io.Serializable. java.io.Externalizable may be implemented in classes for faster performance. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. 
- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] -- Grant Ingersoll Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
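For readers following the LUCENE-1473 description, here is a minimal sketch of what pinning serialVersionUID on a Serializable class looks like. It uses only java.io; the class TermQuerySketch and its fields are hypothetical stand-ins, not Lucene classes, and this is not taken from any of the attached patches.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerialVersionSketch {

    /** Hypothetical query-like class; not part of Lucene. */
    static final class TermQuerySketch implements Serializable {
        // Declared explicitly so the stream version does not change when
        // the class is recompiled or gains a compatible field or method.
        private static final long serialVersionUID = 1L;

        final String field;
        final String text;

        TermQuerySketch(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    /** Serializes an object to a byte array with standard Java serialization. */
    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    /** Reads one object back from a byte array. */
    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    /** Round-trips a sketch query and returns it rendered as "field:text". */
    static String roundTrip(String field, String text) {
        try {
            TermQuerySketch copy =
                (TermQuerySketch) fromBytes(toBytes(new TermQuerySketch(field, text)));
            return copy.field + ":" + copy.text;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Without the explicit field, the JVM derives a default serialVersionUID from the class shape, so an otherwise compatible change between releases can make deserialization fail with an InvalidClassException.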
[jira] Commented: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)
[ https://issues.apache.org/jira/browse/LUCENE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654875#action_12654875 ] Doug Cutting commented on LUCENE-1482: -- safeDebugMsg is protected in a public class, which means it will appear in the javadoc, which it should not. Also, logging the thread ID should be done by the logging system, not by Lucene. So that method should just be removed, no? Also, you've added braces to all of the log statements. This is in conformance with our style guidelines, but I prefer that logging add a minimum of vertical space, so that more real logic is visible at once. I suggest you not make this style change in this patch, but propose it separately, if at all. > Replace infoSteram by a logging framework (SLF4J) > - > > Key: LUCENE-1482 > URL: https://issues.apache.org/jira/browse/LUCENE-1482 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Reporter: Shai Erera >Priority: Minor > Fix For: 2.4.1, 2.9 > > Attachments: LUCENE-1482.patch, slf4j-api-1.5.6.jar, > slf4j-nop-1.5.6.jar > > > Lucene makes use of infoStream to output messages in its indexing code only. > For debugging purposes, when the search application is run on the customer > side, getting messages from other code flows, like search, query parsing, > analysis etc can be extremely useful. > There are two main problems with infoStream today: > 1. It is owned by IndexWriter, so if I want to add logging capabilities to > other classes I need to either expose an API or propagate infoStream to all > classes (see for example DocumentsWriter, which receives its infoStream > instance from IndexWriter). > 2. I can either turn debugging on or off, for the entire code. > Introducing a logging framework can allow each class to control its logging > independently, and more importantly, allows the application to turn on > logging for only specific areas in the code (i.e., org.apache.lucene.index.*). 
> I've investigated SLF4J (stands for Simple Logging Facade for Java) which is, > as its name states, a facade over different logging frameworks. As such, you > can include the slf4j.jar in your application, and it recognizes at deploy > time what is the actual logging framework you'd like to use. SLF4J comes with > several adapters for Java logging, Log4j and others. If you know your > application uses Java logging, simply drop slf4j.jar and slf4j-jdk14.jar in > your classpath, and your logging statements will use Java logging underneath > the covers. > This makes the logging code very simple. For a class A the logger will be > instantiated like this: > public class A { > private static final Logger logger = LoggerFactory.getLogger(A.class); > } > And will later be used like this: > public class A { > private static final Logger logger = LoggerFactory.getLogger(A.class); > public void foo() { > if (logger.isDebugEnabled()) { > logger.debug("message"); > } > } > } > That's all! > Checking for isDebugEnabled is very quick, at least using the JDK14 adapter > (but I assume it's fast also over other logging frameworks). > The important thing is, every class controls its own logger. Not all classes > have to output logging messages, and we can improve Lucene's logging > gradually, w/o changing the API, by adding more logging messages to > interesting classes. > I will submit a patch shortly -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
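To illustrate the "turn on logging for only specific areas" point from the issue description, here is a small sketch using java.util.logging directly, which is the backend the slf4j-jdk14 adapter would delegate to. It deliberately avoids the SLF4J API so it runs with the JDK alone. The package names under org.example are hypothetical, and the sketch assumes the default JDK logging configuration, where the root logger is at INFO.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class PerPackageLoggingSketch {

    // Strong references keep the configured loggers from being garbage collected.
    // All three names are hypothetical; they only mimic a package layout.
    static final Logger INDEX_PACKAGE = Logger.getLogger("org.example.index");
    static final Logger WRITER = Logger.getLogger("org.example.index.Writer");
    static final Logger PARSER = Logger.getLogger("org.example.queryparser.Parser");

    /**
     * Enables debug-level (FINE) logging for the index package only and reports
     * whether each class-level logger would now emit debug messages.
     *
     * @return { writerDebugEnabled, parserDebugEnabled }
     */
    static boolean[] debugEnabledAfterConfig() {
        // The application, not the library, decides which area is verbose.
        INDEX_PACKAGE.setLevel(Level.FINE);

        // Same guard pattern as logger.isDebugEnabled() in the description.
        if (WRITER.isLoggable(Level.FINE)) {
            WRITER.fine("flushing segment");
        }
        return new boolean[] {
            WRITER.isLoggable(Level.FINE),  // inherits FINE from org.example.index
            PARSER.isLoggable(Level.FINE)   // still at the root logger's level
        };
    }
}
```

The class-level logger never sets its own level; it picks up whatever the application configured for its package, which is the property the description is after.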
[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654895#action_12654895 ] Mark Miller commented on LUCENE-1483: - I'll bench again after this issue is polished up, but it looks like at 100 segments I am seeing the 20% drop. I didn't see any drop at 6 segments in a retest. I'll do some longer, more thought-out benchmarks when the patch is in better shape. > Change IndexSearcher to use MultiSearcher semantics for sorted searches > --- > > Key: LUCENE-1483 > URL: https://issues.apache.org/jira/browse/LUCENE-1483 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 2.9 >Reporter: Mark Miller >Priority: Minor > Attachments: LUCENE-1483.patch > > > Here is a quick test patch. FieldCache for sorting is done at the individual > IndexReader level and reloading the fieldcache on reopen can be much faster > as only changed segments need to be reloaded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-1314) IndexReader.clone
[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-1314: - Summary: IndexReader.clone (was: IndexReader.reopen(boolean force)) > IndexReader.clone > - > > Key: LUCENE-1314 > URL: https://issues.apache.org/jira/browse/LUCENE-1314 > Project: Lucene - Java > Issue Type: New Feature > Components: Index >Affects Versions: 2.3.1 >Reporter: Jason Rutherglen >Assignee: Michael McCandless >Priority: Minor > Attachments: lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, > lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, > lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, > lucene-1314.patch > > > Based on discussion > http://www.nabble.com/IndexReader.reopen-issue-td18070256.html. The problem > is reopen returns the same reader if there are no changes, so if docs are > deleted from the new reader, they are also reflected in the previous reader > which is not always desired behavior. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654922#action_12654922 ] Michael McCandless commented on LUCENE-1483: Hmmm. OK I think I see what could explain this: insertion into the pqueue is fairly costly. So, because we now make 100 pqueues, each gathering top N results, we are paying much more insertion cost overall than the single queue that IndexSearcher(MultiReader) uses. So how about still doing the searches per-sub-reader(searcher), but, make a HitCollector that gathers the results into a single pqueue, passing that HitCollector to each sub-searcher? If that turns out OK, then I think it would make LUCENE-1471 moot because we should similarly change MultiSearcher to use a single shared pqueue. Actually I think this approach should be a bit faster, because there is some very small method call overhead to how MultiReader implements TermDocs/Positions by "concatenating" its sub-readers. So by pushing Searcher down onto each SegmentReader we should gain a bit, but it could very well be in the noise. For this reason we may in fact want to do this same thing for the "normal" (sort by relevance) IndexSearcher.search. I wish I thought of this sooner. Sorry for the runaround Mark! > Change IndexSearcher to use MultiSearcher semantics for sorted searches > --- > > Key: LUCENE-1483 > URL: https://issues.apache.org/jira/browse/LUCENE-1483 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 2.9 >Reporter: Mark Miller >Priority: Minor > Attachments: LUCENE-1483.patch > > > Here is a quick test patch. FieldCache for sorting is done at the individual > IndexReader level and reloading the fieldcache on reopen can be much faster > as only changed segments need to be reloaded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. 
- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
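The single shared pqueue idea from the comment above can be sketched roughly as follows. This is a minimal illustration, not the code in the attached patch: `Hit`, `SharedTopNCollector`, `setNextSegment` and `docBase` are hypothetical names, sorting is by score only, and it assumes segments are visited in index order so that doc ids arrive in increasing order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a hit; not a Lucene class. */
final class Hit {
    final int doc;      // index-wide document id
    final float score;
    Hit(int doc, float score) { this.doc = doc; this.score = score; }
}

/** Gathers the top N hits from any number of segments into ONE queue. */
final class SharedTopNCollector {
    // Weakest hit first: lower score is weaker; on equal scores the higher doc id is weaker.
    private static final Comparator<Hit> WEAKEST_FIRST = (a, b) -> {
        int c = Float.compare(a.score, b.score);
        return c != 0 ? c : Integer.compare(b.doc, a.doc);
    };

    private final int n;
    private final PriorityQueue<Hit> queue = new PriorityQueue<>(WEAKEST_FIRST);
    private int docBase; // doc id offset of the segment currently being searched

    SharedTopNCollector(int n) { this.n = n; }

    /** Called once before each segment is searched (assumed hook). */
    void setNextSegment(int docBase) { this.docBase = docBase; }

    /** Called for every matching document of the current segment. */
    void collect(int segmentDoc, float score) {
        int doc = docBase + segmentDoc;
        if (queue.size() < n) {
            queue.add(new Hit(doc, score));
        } else if (score > queue.peek().score) {
            // Only pay the insertion cost when the hit beats the current weakest entry.
            queue.poll();
            queue.add(new Hit(doc, score));
        }
    }

    /** Best hit first. */
    List<Hit> topHits() {
        List<Hit> hits = new ArrayList<>(queue);
        hits.sort(WEAKEST_FIRST.reversed());
        return hits;
    }
}
```

With one queue, a hit from the 100th segment is compared against the global weakest entry and is usually rejected without an insertion, whereas 100 separate queues each have to fill up with their own top N first.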
[jira] Created: (LUCENE-1484) Remove SegmentReader.document synchronization
Remove SegmentReader.document synchronization - Key: LUCENE-1484 URL: https://issues.apache.org/jira/browse/LUCENE-1484 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 2.4 Reporter: Jason Rutherglen This is probably the last synchronization issue in Lucene. It is the document method in SegmentReader. It is avoidable by using a threadlocal for FieldsReader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
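As a rough sketch of the thread-local idea described in this issue: each thread gets its own clone of a stateful reader, so the read path needs no lock. `StoredFieldsReader` and `SegmentReaderSketch` are hypothetical stand-ins, not the real FieldsReader or SegmentReader, and the in-memory array stands in for the stored-fields file.

```java
/** Hypothetical stand-in for a stateful, non-thread-safe stored-fields reader. */
final class StoredFieldsReader implements Cloneable {
    private final String[] docs; // shared, read-only data
    private int position;        // per-instance mutable state, like a file pointer

    StoredFieldsReader(String[] docs) { this.docs = docs; }

    String document(int n) {
        position = n;            // "seek"
        return docs[position];   // "read"; unsafe if two threads share this instance
    }

    @Override
    public StoredFieldsReader clone() {
        try {
            return (StoredFieldsReader) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: the class is Cloneable
        }
    }
}

/** Hands every thread its own clone instead of synchronizing document(). */
final class SegmentReaderSketch {
    private final StoredFieldsReader original;
    private final ThreadLocal<StoredFieldsReader> local;

    SegmentReaderSketch(String[] docs) {
        this.original = new StoredFieldsReader(docs);
        // The original is never read from directly; it only serves as the clone source.
        this.local = ThreadLocal.withInitial(original::clone);
    }

    String document(int n) {
        return local.get().document(n); // no lock: the clone is private to this thread
    }
}
```

The trade-off in this sketch is one clone per thread that touches the reader, and those clones live as long as the thread does unless they are cleared explicitly.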
[jira] Commented: (LUCENE-1475) Expose sub-IndexReaders from MultiReader or MultiSegmentReader
[ https://issues.apache.org/jira/browse/LUCENE-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654923#action_12654923 ] Jason Rutherglen commented on LUCENE-1475: -- RE: "It should return an empty array, not null when there are no sub readers." It should return null because there are no results. An empty array almost implies the SegmentReader can contain other readers or that they may show up in the future. IMO the API is garbage anyways because it should be using an interface like the JDK classes do. MM: "What should be returned if a Multi*Reader has embedded Multi*Readers as sub-readers?" I don't like this approach, and the comments sound like over-engineering a simple solution. If the user wants all the sub of sub readers, they need to write that code externally to Lucene. Otherwise it is not easy to know what the sub readers are for the given reader. > Expose sub-IndexReaders from MultiReader or MultiSegmentReader > -- > > Key: LUCENE-1475 > URL: https://issues.apache.org/jira/browse/LUCENE-1475 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Affects Versions: 2.4 >Reporter: Jason Rutherglen >Priority: Minor > Attachments: LUCENE-1475.patch > > Original Estimate: 4h > Remaining Estimate: 4h > > MultiReader and MultiSegmentReader are package protected and do not expose > the underlying sub-IndexReaders. A way to expose the sub-readers is to have > an interface that an IndexReader may be cast to that exposes the underlying > readers. > This is for realtime indexing. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1475) Expose sub-IndexReaders from MultiReader or MultiSegmentReader
[ https://issues.apache.org/jira/browse/LUCENE-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654924#action_12654924 ] robert engels commented on LUCENE-1475: --- That is not correct. By returning a non-null array, it is trivial to get an ordered list of all subreaders using simple recursion. It does not need to be an interface... no reason, if adding a new method to IndexReader (and changing the implementations). > Expose sub-IndexReaders from MultiReader or MultiSegmentReader > -- > > Key: LUCENE-1475 > URL: https://issues.apache.org/jira/browse/LUCENE-1475 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Affects Versions: 2.4 >Reporter: Jason Rutherglen >Priority: Minor > Attachments: LUCENE-1475.patch > > Original Estimate: 4h > Remaining Estimate: 4h > > MultiReader and MultiSegmentReader are package protected and do not expose > the underlying sub-IndexReaders. A way to expose the sub-readers is to have > an interface that an IndexReader may be cast to that exposes the underlying > readers. > This is for realtime indexing. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
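The recursion argument above can be illustrated with a small sketch. `ReaderNode` and `getSubReaders()` are hypothetical names used only for illustration (the method name proposed in the patch is not given in this thread); the point is that an empty array for a leaf lets the traversal run without any null checks.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader type; getSubReaders() is an assumed method name. */
final class ReaderNode {
    private static final ReaderNode[] NONE = new ReaderNode[0];

    final String name;
    private final ReaderNode[] subs;

    ReaderNode(String name, ReaderNode... subs) {
        this.name = name;
        this.subs = subs.length == 0 ? NONE : subs;
    }

    /** Never null: a leaf (segment-level) reader returns an empty array. */
    ReaderNode[] getSubReaders() { return subs; }
}

final class SubReaders {
    /** Appends all leaf readers under the given reader to out, in index order. */
    static void gatherLeaves(ReaderNode reader, List<ReaderNode> out) {
        ReaderNode[] subs = reader.getSubReaders();
        if (subs.length == 0) {
            out.add(reader); // leaf: nothing below it
            return;
        }
        for (ReaderNode sub : subs) {
            gatherLeaves(sub, out); // nested multi-readers are flattened the same way
        }
    }

    static List<ReaderNode> leaves(ReaderNode root) {
        List<ReaderNode> out = new ArrayList<>();
        gatherLeaves(root, out);
        return out;
    }
}
```

If leaves returned null instead, the same traversal would need an explicit null test at every level, which is the practical difference the two comments disagree about.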
[jira] Created: (LUCENE-1485) Use OpenBitSet instead of BitVector in SegmentReader
Use OpenBitSet instead of BitVector in SegmentReader Key: LUCENE-1485 URL: https://issues.apache.org/jira/browse/LUCENE-1485 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 2.4 Reporter: Jason Rutherglen Priority: Minor Tried out BitVector.get vs OpenBitSet.get here's the results which are about the same after running 25 times in milliseconds. It is assumed that implementing DocIdSetIterator for in SegmentTermDocs will speed things up more. bit set size: 10,485,760 set bits count: 524,032 openbitset: 68 bitvector: 89 24% speed increase. I will implement a patch that adds the WriteableBitSet interface and make a subclass of OpenBitSet that is writeable to disk. We're working on an isSparse method for OpenBitSet. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-1485) Use OpenBitSet instead of BitVector in SegmentReader
[ https://issues.apache.org/jira/browse/LUCENE-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-1485: - Description: Tried out BitVector.get vs OpenBitSet.get here's the results which are about the same after running 25 times in milliseconds. It is assumed that implementing DocIdSetIterator in SegmentTermDocs will speed things up more. bit set size: 10,485,760 set bits count: 524,032 openbitset: 68 bitvector: 89 24% speed increase. I will implement a patch that adds the WriteableBitSet interface and make a subclass of OpenBitSet that is writeable to disk. We're working on an isSparse method for OpenBitSet. was: Tried out BitVector.get vs OpenBitSet.get here's the results which are about the same after running 25 times in milliseconds. It is assumed that implementing DocIdSetIterator for in SegmentTermDocs will speed things up more. bit set size: 10,485,760 set bits count: 524,032 openbitset: 68 bitvector: 89 24% speed increase. I will implement a patch that adds the WriteableBitSet interface and make a subclass of OpenBitSet that is writeable to disk. We're working on an isSparse method for OpenBitSet. > Use OpenBitSet instead of BitVector in SegmentReader > > > Key: LUCENE-1485 > URL: https://issues.apache.org/jira/browse/LUCENE-1485 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Affects Versions: 2.4 >Reporter: Jason Rutherglen >Priority: Minor > Original Estimate: 96h > Remaining Estimate: 96h > > Tried out BitVector.get vs OpenBitSet.get here's the results which are about > the same after running 25 times in milliseconds. It is assumed that > implementing DocIdSetIterator in SegmentTermDocs will speed things up more. > bit set size: 10,485,760 > set bits count: 524,032 > openbitset: 68 > bitvector: 89 > 24% speed increase. > I will implement a patch that adds the WriteableBitSet interface and make a > subclass of OpenBitSet that is writeable to disk. 
We're working on an > isSparse method for OpenBitSet. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-1485) Use OpenBitSet instead of BitVector in SegmentReader
[ https://issues.apache.org/jira/browse/LUCENE-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-1485: - Attachment: TestDeletedDocsSpeed.java TestDeletedDocsSpeed.java Executes get on BitVector and OpenBitSet. FastGet is called on OpenBitSet. > Use OpenBitSet instead of BitVector in SegmentReader > > > Key: LUCENE-1485 > URL: https://issues.apache.org/jira/browse/LUCENE-1485 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Affects Versions: 2.4 >Reporter: Jason Rutherglen >Priority: Minor > Attachments: TestDeletedDocsSpeed.java > > Original Estimate: 96h > Remaining Estimate: 96h > > Tried out BitVector.get vs OpenBitSet.get here's the results which are about > the same after running 25 times in milliseconds. It is assumed that > implementing DocIdSetIterator in SegmentTermDocs will speed things up more. > bit set size: 10,485,760 > set bits count: 524,032 > openbitset: 68 > bitvector: 89 > 24% speed increase. > I will implement a patch that adds the WriteableBitSet interface and make a > subclass of OpenBitSet that is writeable to disk. We're working on an > isSparse method for OpenBitSet. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654940#action_12654940 ] Doug Cutting commented on LUCENE-1483: -- > make a HitCollector that gathers the results into a single pqueue That's good when everything's local, but bad when things are distributed. If we move RemoteSearchable to contrib (as discussed in LUCENE-1314) then this may not be a problem, but we might still leave hooks so that someone can write a search that uses a separate top-queue per remote segment. > Change IndexSearcher to use MultiSearcher semantics for sorted searches > --- > > Key: LUCENE-1483 > URL: https://issues.apache.org/jira/browse/LUCENE-1483 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 2.9 >Reporter: Mark Miller >Priority: Minor > Attachments: LUCENE-1483.patch > > > Here is a quick test patch. FieldCache for sorting is done at the individual > IndexReader level and reloading the fieldcache on reopen can be much faster > as only changed segments need to be reloaded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654962#action_12654962 ] Michael McCandless commented on LUCENE-1483: {quote} >> make a HitCollector that gathers the results into a single pqueue > > That's good when everything's local, but bad when things are distributed. If > we move RemoteSearchable to contrib (as discussed in LUCENE-1314) then this > may not be a problem, but we might still leave hooks so that someone can > write a search that uses a separate top-queue per remote segment. {quote} Good point; so this means we can't blindly do this optimization to MultiSearcher (w/o having option to do separate queues, merged in the end). But for IndexSearcher(Multi*Reader).search it should be safe? > Change IndexSearcher to use MultiSearcher semantics for sorted searches > --- > > Key: LUCENE-1483 > URL: https://issues.apache.org/jira/browse/LUCENE-1483 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 2.9 >Reporter: Mark Miller >Priority: Minor > Attachments: LUCENE-1483.patch > > > Here is a quick test patch. FieldCache for sorting is done at the individual > IndexReader level and reloading the fieldcache on reopen can be much faster > as only changed segments need to be reloaded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654970#action_12654970 ] Doug Cutting commented on LUCENE-1483: -- > But for IndexSearcher(Multi*Reader).search it should be safe? Right. Perhaps this is a reason to encourage folks to use MultiReader instead of MultiSearcher. Are there cases, other than distributed, where MultiSearcher is required? If not, perhaps it could be moved to the contrib/distributed layer too. > Change IndexSearcher to use MultiSearcher semantics for sorted searches > --- > > Key: LUCENE-1483 > URL: https://issues.apache.org/jira/browse/LUCENE-1483 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 2.9 >Reporter: Mark Miller >Priority: Minor > Attachments: LUCENE-1483.patch > > > Here is a quick test patch. FieldCache for sorting is done at the individual > IndexReader level and reloading the fieldcache on reopen can be much faster > as only changed segments need to be reloaded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Bounds checking in BitVector
I ran another test on the speed of BitVector vs. OpenBitSet. Unless the DocIdSetIterator is faster for OpenBitSet than an equivalent for BitVector, BV is faster when its bounds checking is removed. I'm trying to figure out a good way to allow a modified version of BitVector that does not do bounds checking. BitVector is final, but one way to go is perhaps making it non-final and creating a FastBitVector that SegmentReader instantiates based on a system property (is there an alternative?).
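To make the bounds-checking trade-off concrete, here is a minimal sketch of a byte-backed bit set with a checked `get` and an unchecked `fastGet`. `SimpleBits` is a hypothetical class written for illustration; it is neither BitVector nor OpenBitSet, and it says nothing about how either performs.

```java
/** Hypothetical byte-backed bit set; not Lucene's BitVector or OpenBitSet. */
final class SimpleBits {
    private final byte[] bits;
    private final int size; // number of addressable bits

    SimpleBits(int size) {
        this.size = size;
        this.bits = new byte[(size + 7) >>> 3]; // round up to whole bytes
    }

    void set(int bit) {
        checkRange(bit);
        bits[bit >> 3] |= (byte) (1 << (bit & 7));
    }

    /** Checked read: rejects any index outside [0, size). */
    boolean get(int bit) {
        checkRange(bit);
        return fastGet(bit);
    }

    /**
     * Unchecked read: skips the explicit range test. The JVM still checks the
     * array access itself, so only indexes beyond the backing array throw here;
     * indexes inside the padding of the last byte silently return false.
     */
    boolean fastGet(int bit) {
        return (bits[bit >> 3] & (1 << (bit & 7))) != 0;
    }

    private void checkRange(int bit) {
        if (bit < 0 || bit >= size) {
            throw new IndexOutOfBoundsException("bit=" + bit + " size=" + size);
        }
    }
}
```

Exposing both methods on one class, as sketched here, is one alternative to a separate subclass selected by a system property: callers that already know their doc ids are in range can opt into the unchecked path.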
[jira] Commented: (LUCENE-1485) Use OpenBitSet instead of BitVector in SegmentReader
[ https://issues.apache.org/jira/browse/LUCENE-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654994#action_12654994 ] Jason Rutherglen commented on LUCENE-1485: -- The above test was using the -client option in the JVM on Mac OS X. Using -server the numbers look almost the same for OpenBitSet and BitVector with BitVector being slightly faster. > Use OpenBitSet instead of BitVector in SegmentReader > > > Key: LUCENE-1485 > URL: https://issues.apache.org/jira/browse/LUCENE-1485 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Affects Versions: 2.4 >Reporter: Jason Rutherglen >Priority: Minor > Attachments: TestDeletedDocsSpeed.java > > Original Estimate: 96h > Remaining Estimate: 96h > > Tried out BitVector.get vs OpenBitSet.get here's the results which are about > the same after running 25 times in milliseconds. It is assumed that > implementing DocIdSetIterator in SegmentTermDocs will speed things up more. > bit set size: 10,485,760 > set bits count: 524,032 > openbitset: 68 > bitvector: 89 > 24% speed increase. > I will implement a patch that adds the WriteableBitSet interface and make a > subclass of OpenBitSet that is writeable to disk. We're working on an > isSparse method for OpenBitSet. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1478) Missing possibility to supply custom FieldParser when sorting search results
[ https://issues.apache.org/jira/browse/LUCENE-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655002#action_12655002 ] Uwe Schindler commented on LUCENE-1478: --- Just a note: For the FieldCache it is also important, that the parser is a singleton or implements hashCode() and equals(). If not, each call to sort with another SortField using a different parser instance (but from same class) would create a new FieldCache. This is why I said, that SortComparators and Parsers should generally be made static final class variables (and so singletons) like I have done it in TrieUtils. With that you can be sure, that all SortFields hit the same cache entry when looking up using FieldCacheImpl.Entry. The use of hashCode and equals for parsers is the other variant, but it does not really make sense (as long as parsers and comparators do not have an instance-specific state). > Missing possibility to supply custom FieldParser when sorting search results > > > Key: LUCENE-1478 > URL: https://issues.apache.org/jira/browse/LUCENE-1478 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Affects Versions: 2.4 >Reporter: Uwe Schindler >Assignee: Michael McCandless > Fix For: 2.9 > > Attachments: LUCENE-1478-cleanup.patch, > LUCENE-1478-no-superinterface.patch, LUCENE-1478.patch, LUCENE-1478.patch, > LUCENE-1478.patch, LUCENE-1478.patch, LUCENE-1478.patch > > > When implementing the new TrieRangeQuery for contrib (LUCENE-1470), I was > confronted by the problem that the special trie-encoded values (which are > longs in a special encoding) cannot be sorted by Searcher.search() and > SortField. The problem is: If you use SortField.LONG, you get > NumberFormatExceptions. The trie encoded values may be sorted using > SortField.String (as the encoding is in such a way, that they are sortable as > Strings), but this is very memory ineffective. 
> ExtendedFieldCache gives the possibility to specify a custom LongParser when > retrieving the cached values. But you cannot use this during searching, > because there is no possibility to supply this custom LongParser to the > SortField. > I propose a change in the sort classes: > Include a pointer to the parser instance to be used in SortField (if not > given use the default). My idea is to create a SortField using a new > constructor > {code}SortField(String field, int type, Object parser, boolean reverse){code} > The parser is "object" because all current parsers have no super-interface. > The ideal solution would be to have: > {code}SortField(String field, int type, FieldCache.Parser parser, boolean > reverse){code} > and FieldCache.Parser is a super-interface (just empty, more like a > marker-interface) of all other parsers (like LongParser...). The sort > implementation then must be changed to respect the given parser (if not > NULL), else use the default FieldCache.get without parser. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
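The point about parser identity in the comment above can be illustrated with a small cache sketch. All names here (`LongParserSketch`, `CacheKey`, `FieldCacheSketch`, `HexParser`) are hypothetical; this is not FieldCacheImpl, only a model of a cache whose key includes the parser object. Because `HexParser` does not override `equals()` or `hashCode()`, two instances of it are different keys.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical parser interface, standing in for something like a LongParser. */
interface LongParserSketch {
    long parseLong(String value);
}

/** Cache key made of the field name plus the parser object. */
final class CacheKey {
    final String field;
    final Object parser;

    CacheKey(String field, Object parser) {
        this.field = field;
        this.parser = parser;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CacheKey)) return false;
        CacheKey other = (CacheKey) o;
        // Falls back to identity when the parser does not override equals().
        return field.equals(other.field) && Objects.equals(parser, other.parser);
    }

    @Override
    public int hashCode() {
        return Objects.hash(field, parser);
    }
}

final class FieldCacheSketch {
    private final Map<CacheKey, long[]> cache = new HashMap<>();
    int loads; // how many times values were actually parsed

    long[] getLongs(String field, LongParserSketch parser, String[] terms) {
        return cache.computeIfAbsent(new CacheKey(field, parser), key -> {
            loads++;
            long[] values = new long[terms.length];
            for (int i = 0; i < terms.length; i++) {
                values[i] = parser.parseLong(terms[i]);
            }
            return values;
        });
    }
}

/** Stateless parser exposed as a singleton so every caller shares one cache entry. */
final class HexParser implements LongParserSketch {
    static final HexParser INSTANCE = new HexParser();

    @Override
    public long parseLong(String value) {
        return Long.parseLong(value, 16);
    }
}
```

In this sketch, calling `getLongs` repeatedly with `HexParser.INSTANCE` parses the field once, while passing `new HexParser()` on each call parses it again every time, which is the duplicate-cache problem the comment warns about.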
[jira] Issue Comment Edited: (LUCENE-1478) Missing possibility to supply custom FieldParser when sorting search results
[ https://issues.apache.org/jira/browse/LUCENE-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655002#action_12655002 ] thetaphi edited comment on LUCENE-1478 at 12/9/08 2:55 PM: Just a note: For the FieldCache it is also important, that the parser is a singleton or implements hashCode() and equals(). If not, each call to sort with another SortField using a different parser instance (but from the same class) would create a new FieldCache. This is why I said, that SortComparators and Parsers should generally be made static final members (and so singletons) like I have done it in TrieUtils. With that you can be sure, that all SortFields hit the same cache entry when looking up using FieldCacheImpl.Entry. The implementation of hashCode and equals for parsers is the other variant, but it does not make really sense (as long as parsers and comparators do not have an instance-specific state). was (Author: thetaphi): Just a note: For the FieldCache it is also important, that the parser is a singleton or implements hashCode() and equals(). If not, each call to sort with another SortField using a different parser instance (but from same class) would create a new FieldCache. This is why I said, that SortComparators and Parsers should generally be made static final class variables (and so singletons) like I have done it in TrieUtils. With that you can be sure, that all SortFields hit the same cache entry when looking up using FieldCacheImpl.Entry. The use of hashCode and equals for parsers is the other variant, but it does not make really sense (as long as parsers and comparators do not have an instance-specific state). 
> Missing possibility to supply custom FieldParser when sorting search results > > > Key: LUCENE-1478 > URL: https://issues.apache.org/jira/browse/LUCENE-1478 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Affects Versions: 2.4 >Reporter: Uwe Schindler >Assignee: Michael McCandless > Fix For: 2.9 > > Attachments: LUCENE-1478-cleanup.patch, > LUCENE-1478-no-superinterface.patch, LUCENE-1478.patch, LUCENE-1478.patch, > LUCENE-1478.patch, LUCENE-1478.patch, LUCENE-1478.patch > > > When implementing the new TrieRangeQuery for contrib (LUCENE-1470), I was > confronted by the problem that the special trie-encoded values (which are > longs in a special encoding) cannot be sorted by Searcher.search() and > SortField. The problem is: If you use SortField.LONG, you get > NumberFormatExceptions. The trie encoded values may be sorted using > SortField.String (as the encoding is in such a way, that they are sortable as > Strings), but this is very memory ineffective. > ExtendedFieldCache gives the possibility to specify a custom LongParser when > retrieving the cached values. But you cannot use this during searching, > because there is no possibility to supply this custom LongParser to the > SortField. > I propose a change in the sort classes: > Include a pointer to the parser instance to be used in SortField (if not > given use the default). My idea is to create a SortField using a new > constructor > {code}SortField(String field, int type, Object parser, boolean reverse){code} > The parser is "object" because all current parsers have no super-interface. > The ideal solution would be to have: > {code}SortField(String field, int type, FieldCache.Parser parser, boolean > reverse){code} > and FieldCache.Parser is a super-interface (just empty, more like a > marker-interface) of all other parsers (like LongParser...). 
The sort > implementation then must be changed to respect the given parser (if not > NULL), else use the default FieldCache.get without parser. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655005#action_12655005 ] Mark Miller commented on LUCENE-1483: - bq. make a HitCollector that gathers the results into a single pqueue Well it certainly made the code cleaner and the patch a bit nicer, but on first quick test, I still see the 20% slowdown with 100 more/less segments. I'm looking through to see where I may have done something funny. > Change IndexSearcher to use MultiSearcher semantics for sorted searches > --- > > Key: LUCENE-1483 > URL: https://issues.apache.org/jira/browse/LUCENE-1483 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 2.9 >Reporter: Mark Miller >Priority: Minor > Attachments: LUCENE-1483.patch > > > Here is a quick test patch. FieldCache for sorting is done at the individual > IndexReader level and reloading the fieldcache on reopen can be much faster > as only changed segments need to be reloaded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655025#action_12655025 ] Michael McCandless commented on LUCENE-1483: bq. on first quick test, I still see the 20% slowdown with 100 more/less segments. Argh! Can you post your current patch? > Change IndexSearcher to use MultiSearcher semantics for sorted searches > --- > > Key: LUCENE-1483 > URL: https://issues.apache.org/jira/browse/LUCENE-1483 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 2.9 >Reporter: Mark Miller >Priority: Minor > Attachments: LUCENE-1483.patch > > > Here is a quick test patch. FieldCache for sorting is done at the individual > IndexReader level and reloading the fieldcache on reopen can be much faster > as only changed segments need to be reloaded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1483: Attachment: LUCENE-1483.patch Here is what I've got. The final sort test still fails, but the rest should pass. - Mark > Change IndexSearcher to use MultiSearcher semantics for sorted searches > --- > > Key: LUCENE-1483 > URL: https://issues.apache.org/jira/browse/LUCENE-1483 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 2.9 >Reporter: Mark Miller >Priority: Minor > Attachments: LUCENE-1483.patch, LUCENE-1483.patch > > > Here is a quick test patch. FieldCache for sorting is done at the individual > IndexReader level and reloading the fieldcache on reopen can be much faster > as only changed segments need to be reloaded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655036#action_12655036 ] Mark Miller commented on LUCENE-1483: - I also did a quick reopen alg. The speed gain on this can really vary depending on index access patterns. I tried adding 50 (very small) docs with a random sort field. Then added 5 docs and then reopen 10 times. Repeat all 4 times. Comparison is of the time it takes to load the fieldcache and do one search. With this patch it came out about 40-50% faster. Obviously going to depend on many factors in real world though. In certain applications I'm sure it could be many times faster or slower. > Change IndexSearcher to use MultiSearcher semantics for sorted searches > --- > > Key: LUCENE-1483 > URL: https://issues.apache.org/jira/browse/LUCENE-1483 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 2.9 >Reporter: Mark Miller >Priority: Minor > Attachments: LUCENE-1483.patch, LUCENE-1483.patch > > > Here is a quick test patch. FieldCache for sorting is done at the individual > IndexReader level and reloading the fieldcache on reopen can be much faster > as only changed segments need to be reloaded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
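Why reopen can get cheaper when the FieldCache is kept per segment can be sketched as follows. `PerSegmentCache` is a hypothetical model, not the patch: entries are keyed by an assumed segment name, the empty `int[]` is a placeholder for the real sort values, and `loads` only counts how often a segment's values had to be loaded.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-segment cache; segment names and values are placeholders. */
final class PerSegmentCache {
    private final Map<String, int[]> bySegment = new HashMap<>();
    int loads; // number of segments whose values were actually loaded

    /** Keeps entries of surviving segments, loads new ones, drops the rest. */
    void reopen(List<String> currentSegments) {
        bySegment.keySet().retainAll(currentSegments); // evict merged-away segments
        for (String segment : currentSegments) {
            bySegment.computeIfAbsent(segment, name -> {
                loads++;            // only segments not seen before cost a load
                return new int[0];  // placeholder for the segment's sort values
            });
        }
    }

    int cachedSegments() {
        return bySegment.size();
    }
}
```

A cache keyed on the top-level reader would reload everything on each reopen, whereas here a reopen that only adds one small segment loads just that segment. How much that saves depends on how often merges replace large segments, which fits the caveat above that the gain varies with index access patterns.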
[jira] Commented: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655048#action_12655048 ] Mark Miller commented on LUCENE-1483: - Doing a little profiling on the new code and off the top results of interest are: FieldSortedHitQueue.lessThan(object,object) approx 12% FieldSortedHitQueue.insertWithOverflow(object) approx 12% MultiReaderTopFieldDocCollector.collect(int,float) 6.3% FieldSortedHitQueue$4.compare() 5.3 % and on... > Change IndexSearcher to use MultiSearcher semantics for sorted searches > --- > > Key: LUCENE-1483 > URL: https://issues.apache.org/jira/browse/LUCENE-1483 > Project: Lucene - Java > Issue Type: Improvement >Affects Versions: 2.9 >Reporter: Mark Miller >Priority: Minor > Attachments: LUCENE-1483.patch, LUCENE-1483.patch > > > Here is a quick test patch. FieldCache for sorting is done at the individual > IndexReader level and reloading the fieldcache on reopen can be much faster > as only changed segments need to be reloaded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-1484) Remove SegmentReader.document synchronization
[ https://issues.apache.org/jira/browse/LUCENE-1484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-1484: - Attachment: LUCENE-1484.patch LUCENE-1484.patch - FieldsReader implements Cloneable - fieldsReaderLocal added to SegmentReader - TestIndexReader, TestFieldsReader, TestSegmentReader, TestParallelMultiSearcher pass > Remove SegmentReader.document synchronization > - > > Key: LUCENE-1484 > URL: https://issues.apache.org/jira/browse/LUCENE-1484 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Affects Versions: 2.4 >Reporter: Jason Rutherglen > Attachments: LUCENE-1484.patch > > Original Estimate: 96h > Remaining Estimate: 96h > > This is probably the last synchronization issue in Lucene. It is the > document method in SegmentReader. It is avoidable by using a threadlocal for > FieldsReader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Issue Comment Edited: (LUCENE-1483) Change IndexSearcher to use MultiSearcher semantics for sorted searches
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655048#action_12655048 ]

[EMAIL PROTECTED] edited comment on LUCENE-1483 at 12/9/08 6:15 PM:
--------------------------------------------------------------------

I did a little profiling on the new code. The top results of interest are:

FieldSortedHitQueue.lessThan(object,object) approx 12%
FieldSortedHitQueue.insertWithOverflow(object) approx 12%
MultiReaderTopFieldDocCollector.collect(int,float) 6.3%
FieldSortedHitQueue$4.compare() 5.3%

and so on...

For Lucene trunk, a day or two ago:

FieldSortedHitQueue.insertWithOverflow(object) approx 11%
TopFieldDocCollector.collect(int,float) 7.1%
FieldSortedHitQueue.lessThan(object,object) approx 6.7%
FieldSortedHitQueue.updateMaxScore 3.2%
FieldSortedHitQueue$4.compare() 3.2%

was (Author: [EMAIL PROTECTED]):

I did a little profiling on the new code. The top results of interest are:

FieldSortedHitQueue.lessThan(object,object) approx 12%
FieldSortedHitQueue.insertWithOverflow(object) approx 12%
MultiReaderTopFieldDocCollector.collect(int,float) 6.3%
FieldSortedHitQueue$4.compare() 5.3%

and so on...

> Change IndexSearcher to use MultiSearcher semantics for sorted searches
> -----------------------------------------------------------------------
>
> Key: LUCENE-1483
> URL: https://issues.apache.org/jira/browse/LUCENE-1483
> Project: Lucene - Java
> Issue Type: Improvement
> Affects Versions: 2.9
> Reporter: Mark Miller
> Priority: Minor
> Attachments: LUCENE-1483.patch, LUCENE-1483.patch
>
>
> Here is a quick test patch. FieldCache for sorting is done at the individual
> IndexReader level and reloading the FieldCache on reopen can be much faster
> as only changed segments need to be reloaded.
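The motivation quoted in the issue description (cache sort values per segment reader so that a reopen only loads the segments that changed) can be illustrated with a small sketch. Everything here is an assumption made for illustration: `SegmentSketch`, `PerSegmentCache` and `ReopenDemo` are hypothetical names, segments are assumed to be write-once and identified by name, and the "load" is a stand-in for the expensive step of un-inverting a field; none of it is taken from LUCENE-1483.patch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical segment: a name plus the sort-field value of each document.
final class SegmentSketch {
    final String name;
    final int[] values;

    SegmentSketch(String name, int... values) {
        this.name = name;
        this.values = values;
    }
}

// Cache keyed per segment instead of per top-level reader.
class PerSegmentCache {
    private final Map<String, int[]> cache = new HashMap<>();
    int loads = 0; // counts how many segments were actually (re)loaded

    int[] get(SegmentSketch seg) {
        return cache.computeIfAbsent(seg.name, k -> {
            loads++;                   // stands in for the expensive field load
            return seg.values.clone();
        });
    }

    // Warm the cache for every segment of a (re)opened reader.
    void warm(List<SegmentSketch> reader) {
        for (SegmentSketch s : reader) {
            get(s);
        }
    }
}

class ReopenDemo {
    public static void main(String[] args) {
        PerSegmentCache cache = new PerSegmentCache();
        SegmentSketch s0 = new SegmentSketch("_0", 5, 3);
        SegmentSketch s1 = new SegmentSketch("_1", 9);

        cache.warm(Arrays.asList(s0, s1));
        System.out.println("loads after first open: " + cache.loads);

        // "Reopen": the old segments are unchanged, one new segment appears.
        cache.warm(Arrays.asList(s0, s1, new SegmentSketch("_2", 1)));
        System.out.println("loads after reopen: " + cache.loads);
    }
}
```

With a cache keyed on the top-level reader instead, the second warm-up would have to rebuild the values for all three segments.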
RE: jira attachments ?
Hi Robert,

I'm part of the JIRA support team, and one of our devs brought this up, so I've taken a quick look to see if I can replicate your problem.

I just upgraded to Safari 3.2 on OS X 10.5 and tested on a couple of different versions of JIRA (3.12.2 and 3.13.2) and didn't have any trouble. I also tested on https://issues.apache.org/jira/browse/TST-76 and that worked without issues.

Unfortunately I don't have a copy of 10.4 handy to test with at the moment, so maybe it's something to do with the combination of 10.4 and Safari 3.2.

Apologies if this comes through twice; I hadn't subscribed to the list before, so I'm not sure it worked :)

Andrew
Atlassian Support

Begin forwarded message:

Original Message
Subject: Re: jira attachments ?
Date: Thu, 4 Dec 2008 18:07:41 -0600
From: robert engels <[EMAIL PROTECTED]>
Reply-To: java-dev@lucene.apache.org
To: java-dev@lucene.apache.org

Could be... I will try next time...

Seems a strange (and serious) bug in Jira (I have no problems with other "add attachment" sites) ...

On Dec 4, 2008, at 5:59 PM, Michael McCandless wrote:

Hmmm the only time I've seen this was also with Safari (though on an older version). It caused me to switch [back] to Firefox. Try Firefox?

Mike

robert engels wrote:

I am using Safari 3.2 (on OSX Tiger).

On Dec 4, 2008, at 5:38 PM, Michael McCandless wrote:

Robert which browser are you using?

Mike

robert engels wrote:

Dear God, I've been blocked ! What will the Lucene community do ! :)

On Dec 4, 2008, at 3:27 PM, Uwe Schindler wrote:

Hi Robert, two minutes ago I uploaded a patch...

Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [EMAIL PROTECTED]

From: robert engels [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 04, 2008 9:37 PM
To: java-dev@lucene.apache.org
Subject: jira attachments ?
I am having a problem posting an attachment to Jira. Just spins, and spins... Everything else seems to work fine (comments, etc.). Anyone else experiencing this? Thanks.
[jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation
[ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655081#action_12655081 ]

Marvin Humphrey commented on LUCENE-831:
----------------------------------------

> Marvin, does KS/Lucy have something like FieldCache? If so, what API do you
> use? Is it iterator-only?

At present, KS only caches the docID -> ord map as an array. It builds that array by iterating over the terms in the sort field's Lexicon and mapping the docIDs from each term's posting list.

Building the docID -> ord array is straightforward for a single-segment SegLexicon. The multi-segment case requires that several SegLexicons be collated using a priority queue. In KS, there's a MultiLexicon class which handles this; I don't believe that Lucene has an analogous class.

Relying on the docID -> ord array alone works quite well until you get to the MultiSearcher case. As you know, at that point you need to be able to retrieve the actual field values from the ordinal numbers, so that you can compare across multiple searchers (since the ordinal values are meaningless outside the searcher that produced them).

{code}
Lex_Seek_By_Num(lexicon, term_num);
field_val = Lex_Get_Term(lexicon);
{code}

The problem is that seeking by ordinal value on a MultiLexicon iterator requires a gnarly implementation and is very expensive. I got it working, but I consider it a dead-end design and a failed experiment.

The planned replacement for these iterator-based quasi-FieldCaches involves several topics of recent discussion:

1) A "keyword" field type, implemented using a format similar to what Nate and I came up with for the lexicon index.
2) Write per-segment docID -> ord maps at index time for sort fields.
3) Memory mapping.
4) Segment-centric searching.

We'd mmap the pre-composed docID -> ord map and use it for intra-segment sorting. The keyword field type would be implemented in such a way that we'd be able to mmap a few files and get a per-segment field cache, which we'd then use to sort hits from multiple segments.

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
> Key: LUCENE-831
> URL: https://issues.apache.org/jira/browse/LUCENE-831
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Reporter: Hoss Man
> Fix For: 3.0
>
> Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff,
> fieldcache-overhaul.diff, fieldcache-overhaul.diff,
> LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff,
> LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch,
> LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Completely overhaul the API/implementation of "FieldCache" type things...
>    a) eliminate global static map keyed on IndexReader (thus
>       eliminating synch block between completely independent IndexReaders)
>    b) allow more customization of cache management (ie: use
>       expiration/replacement strategies, disk backed caches, etc)
>    c) allow people to define custom cache data logic (ie: custom
>       parsers, complex datatypes, etc... anything tied to a reader)
>    d) allow people to inspect what's in a cache (list of CacheKeys) for
>       an IndexReader so a new IndexReader can be likewise warmed.
>    e) Lend support for smarter cache management if/when
>       IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>    the new implementation, so there is no redundant caching as client code
>    migrates to new API.
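To make the docID -> ord discussion above concrete, here is a minimal sketch of building that array for one segment by walking the terms in sorted order and visiting each term's posting list, and of why ordinals from different segments cannot be compared directly. It is written in Java rather than KinoSearch's C, and `OrdSegment` and `OrdDemo` are hypothetical names; the in-memory `TreeMap` merely stands in for a Lexicon plus posting lists and says nothing about the actual KS or Lucene file formats.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-segment structure: sorted terms plus a docID -> ord array.
class OrdSegment {
    final String[] sortedTerms; // index in this array is the term's ordinal
    final int[] docToOrd;       // docID -> ordinal of that doc's sort value

    // termsByDoc[d] is the sort-field value of document d in this segment.
    OrdSegment(String[] termsByDoc) {
        // Stand-in for the lexicon and posting lists: term -> docIDs, sorted by term.
        TreeMap<String, List<Integer>> postings = new TreeMap<>();
        for (int doc = 0; doc < termsByDoc.length; doc++) {
            postings.computeIfAbsent(termsByDoc[doc], k -> new ArrayList<>()).add(doc);
        }
        sortedTerms = new String[postings.size()];
        docToOrd = new int[termsByDoc.length];
        int ord = 0;
        // Iterate terms in sorted order; each doc in a term's posting list gets its ord.
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            sortedTerms[ord] = e.getKey();
            for (int doc : e.getValue()) {
                docToOrd[doc] = ord;
            }
            ord++;
        }
    }

    // Within one segment, comparing ordinals is enough.
    int compareWithin(int docA, int docB) {
        return Integer.compare(docToOrd[docA], docToOrd[docB]);
    }

    // Look up the actual field value from the ordinal.
    String value(int doc) {
        return sortedTerms[docToOrd[doc]];
    }

    // Across segments, ordinals are meaningless; compare the real values.
    static int compareAcross(OrdSegment a, int docA, OrdSegment b, int docB) {
        return a.value(docA).compareTo(b.value(docB));
    }
}

class OrdDemo {
    public static void main(String[] args) {
        OrdSegment seg1 = new OrdSegment(new String[] {"pear", "apple", "pear"});
        OrdSegment seg2 = new OrdSegment(new String[] {"zebra", "banana"});
        System.out.println("same ord: " + (seg1.docToOrd[0] == seg2.docToOrd[0]));
        System.out.println("by value: " + OrdSegment.compareAcross(seg1, 0, seg2, 0));
    }
}
```

In the demo, doc 0 of each segment has the same ordinal even though the values differ, which is the reason hits from multiple segments have to be merged on the field values themselves.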
[jira] Updated: (LUCENE-1482) Replace infoStream by a logging framework (SLF4J)
[ https://issues.apache.org/jira/browse/LUCENE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-1482:
-------------------------------

Attachment: LUCENE-1482-2.patch

I kept safeDebugMsg because it was used by a class which extended IndexWriter and relied on that method being called. However, I fixed the class by overriding testPoint instead, so I can now remove safeDebugMsg.

As for the output format, I agree that it should be handled by the logging system, but I wanted to confirm that with other members before changing it. I'm glad that you agree with that too.

Attached is a new patch which removes the method.

> Replace infoStream by a logging framework (SLF4J)
> -------------------------------------------------
>
> Key: LUCENE-1482
> URL: https://issues.apache.org/jira/browse/LUCENE-1482
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Shai Erera
> Priority: Minor
> Fix For: 2.4.1, 2.9
>
> Attachments: LUCENE-1482-2.patch, LUCENE-1482.patch,
> slf4j-api-1.5.6.jar, slf4j-nop-1.5.6.jar
>
>
> Lucene makes use of infoStream to output messages in its indexing code only.
> For debugging purposes, when the search application is run on the customer
> side, getting messages from other code flows, like search, query parsing,
> analysis etc can be extremely useful.
> There are two main problems with infoStream today:
> 1. It is owned by IndexWriter, so if I want to add logging capabilities to
> other classes I need to either expose an API or propagate infoStream to all
> classes (see for example DocumentsWriter, which receives its infoStream
> instance from IndexWriter).
> 2. I can only turn debugging on or off for the entire code.
> Introducing a logging framework can allow each class to control its logging
> independently, and more importantly, allows the application to turn on
> logging for only specific areas in the code (i.e., org.apache.lucene.index.*).
> I've investigated SLF4J (stands for Simple Logging Facade for Java) which is,
> as its name states, a facade over different logging frameworks. As such, you
> can include the slf4j.jar in your application, and it determines at deploy
> time which logging framework to use. SLF4J comes with
> several adapters for Java logging, Log4j and others. If you know your
> application uses Java logging, simply drop slf4j.jar and slf4j-jdk14.jar in
> your classpath, and your logging statements will use Java logging underneath
> the covers.
> This makes the logging code very simple. For a class A the logger will be
> instantiated like this:
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
>
> public class A {
>   // one logger per class, named after the class
>   private static final Logger logger = LoggerFactory.getLogger(A.class);
> }
> And will later be used like this:
> public class A {
>   private static final Logger logger = LoggerFactory.getLogger(A.class);
>   public void foo() {
>     // guard so the message is only built when debug is enabled
>     if (logger.isDebugEnabled()) {
>       logger.debug("message");
>     }
>   }
> }
> That's all !
> Checking for isDebugEnabled is very quick, at least using the JDK14 adapter
> (but I assume it's fast also over other logging frameworks).
> The important thing is, every class controls its own logger. Not all classes
> have to output logging messages, and we can improve Lucene's logging
> gradually, w/o changing the API, by adding more logging messages to
> interesting classes.
> I will submit a patch shortly
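The point that a logging framework lets an application turn on debug output for one area of the code only can be tried out with a small sketch. To keep it runnable without extra jars it uses java.util.logging directly rather than SLF4J, so it shows the hierarchy idea and not the SLF4J API. The `demo.index` and `demo.search` logger names and the classes `IndexComponent`, `SearchComponent` and `LoggingDemo` are hypothetical, and the sketch assumes the JDK's default logging configuration (root level INFO).

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical indexing-side class with its own logger.
class IndexComponent {
    private static final Logger logger = Logger.getLogger("demo.index.IndexComponent");

    static boolean debugOn() {
        return logger.isLoggable(Level.FINE);
    }

    void flush() {
        // Guard so the message is only built when debug output is enabled.
        if (logger.isLoggable(Level.FINE)) {
            logger.fine("flushing");
        }
    }
}

// Hypothetical search-side class with its own logger.
class SearchComponent {
    private static final Logger logger = Logger.getLogger("demo.search.SearchComponent");

    static boolean debugOn() {
        return logger.isLoggable(Level.FINE);
    }
}

class LoggingDemo {
    // Keep a strong reference so the configured parent logger is not collected.
    static final Logger INDEX_PACKAGE = Logger.getLogger("demo.index");

    // Turn on debug-level logging for everything under demo.index only.
    static void enableIndexDebug() {
        INDEX_PACKAGE.setLevel(Level.FINE);
    }

    public static void main(String[] args) {
        System.out.println("index debug before: " + IndexComponent.debugOn());
        enableIndexDebug();
        System.out.println("index debug after: " + IndexComponent.debugOn());
        System.out.println("search debug after: " + SearchComponent.debugOn());
        // Note: the default ConsoleHandler still filters at INFO, so actually
        // printing FINE messages would also need handler configuration.
        new IndexComponent().flush();
    }
}
```

With SLF4J the class-side code would use `LoggerFactory.getLogger(...)` and `isDebugEnabled()` as in the issue description, while the per-package level would be set in whichever backend is bound at deploy time.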