[jira] Resolved: (LUCENE-1826) All Tokenizer implementations should have constructors that take AttributeSource and AttributeFactory
[ https://issues.apache.org/jira/browse/LUCENE-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch resolved LUCENE-1826. --- Resolution: Fixed Committed revision 806942. > All Tokenizer implementations should have constructors that take > AttributeSource and AttributeFactory > - > > Key: LUCENE-1826 > URL: https://issues.apache.org/jira/browse/LUCENE-1826 > Project: Lucene - Java > Issue Type: Improvement > Components: Analysis >Affects Versions: 2.9 >Reporter: Tim Smith >Assignee: Michael Busch > Fix For: 2.9 > > Attachments: lucene-1826.patch > > > I have a TokenStream implementation that joins together multiple sub-TokenStreams > (I then do additional filtering on top of this, so I can't just > have the indexer do the merging). > In 2.4, this worked fine: > once one sub-stream was exhausted, I just started using the next stream. > However, in 2.9, this is very difficult, and requires copying term buffers > for every token being aggregated. > However, if all the sub-TokenStreams share the same AttributeSource, and my > "concat" TokenStream shares the same AttributeSource, this goes back to being > very simple (and very efficient). > So, for example, I would like to see the following constructor added to > StandardTokenizer: > {code} > public StandardTokenizer(AttributeSource source, Reader input, boolean > replaceInvalidAcronym) { > super(source); > ... > } > {code} > I would likewise want similar constructors added to all Tokenizer subclasses > provided by Lucene -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
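A minimal sketch of the "concat" stream described in this issue, assuming the resolved constructors are available (ConcatTokenStream is a hypothetical name for illustration, not a Lucene class): every sub-stream is constructed over one shared AttributeSource, so advancing any sub-stream populates the same attribute instances and no term buffers need to be copied.
{code}
import java.io.IOException;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.util.AttributeSource;

// Hypothetical illustration: concatenates sub-streams that all share one AttributeSource.
public final class ConcatTokenStream extends TokenStream {
  private final TokenStream[] subs; // each constructed with the same shared source
  private int current = 0;

  public ConcatTokenStream(AttributeSource shared, TokenStream[] subs) {
    super(shared); // this stream reads the very attributes the sub-streams write
    this.subs = subs;
  }

  public boolean incrementToken() throws IOException {
    while (current < subs.length) {
      if (subs[current].incrementToken()) {
        return true; // the token's attributes are already in the shared source
      }
      current++; // this sub-stream is exhausted; move on to the next one
    }
    return false;
  }
}
{code}
The sub-streams would be created with the constructors this issue added, e.g. new StandardTokenizer(shared, reader, true), so that all of them and the concatenating stream share the single AttributeSource.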
Re: Build failed in Hudson: Lucene-trunk #926
Looks like this build failed because downloads.osafoundation.org is down (we download BDB JARs from there, for contrib/db). This has happened a good number of times now... it'd be great to fix the contrib/db/build.xml to just skip the tests when this download fails. I'll open an issue but I'm not sure how to do this w/ ant. Mike On Sat, Aug 22, 2009 at 10:16 PM, Apache Hudson Server wrote: > See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/926/changes > > Changes: > > [gsingers] LUCENE-1841: file format summary info > > [markrmiller] regex has been moved from core - package should have been > removed from test src > > [markrmiller] LUCENE-1827: Make the payload span queries consistent > > [markrmiller] more work on Scorer javadoc in package.html > > [markrmiller] LUCENE-1839: change explain from abstract to throw > UnsupportedOperationException > > [rmuir] LUCENE-1834: Remove unused code in SmartChineseAnalyzer hmm pkg > > [rmuir] LUCENE-1793: Deprecate custom encoding support in Greek and Russian > analyzers > > [markrmiller] LUCENE-1838: BoostingNearQuery must implement clone/toString > > [uschindler] LUCENE-1843: Convert some tests to new TokenStream API, better > support of cross-impl AttributeImpl.copyTo() > > [uschindler] LUCENE-1825: Incorrect usage of > AttributeSource.addAttribute/getAttribute leads to failures when > onlyUseNewAPI=true > > -- > [...truncated 3983 lines...]
[jira] Created: (LUCENE-1845) if the build fails to download JARs for contrib/db, just skip its tests
if the build fails to download JARs for contrib/db, just skip its tests --- Key: LUCENE-1845 URL: https://issues.apache.org/jira/browse/LUCENE-1845 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Priority: Minor Every so often our nightly build fails because contrib/db is unable to download the necessary BDB JARs from http://downloads.osafoundation.org. I think in such cases we should simply skip contrib/db's tests, if it's the nightly build that's running, since it's a false positive failure. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Issue Comment Edited: (LUCENE-1846) More Locale problems in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-1846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746579#action_12746579 ] Uwe Schindler edited comment on LUCENE-1846 at 8/23/09 2:51 AM: Patch. The changes in DateTools may affect users with very strange default locales who indexed with prior Lucene versions, but this is unlikely to be a problem, as the whole sorting may be broken already. Should I add a note to CHANGES.txt? was (Author: thetaphi): Patch. The changes in DateField may affect users with very strange default locales who indexed with prior Lucene versions, but this is unlikely to be a problem, as the whole sorting may be broken already. Should I add a note to CHANGES.txt? > More Locale problems in Lucene > -- > > Key: LUCENE-1846 > URL: https://issues.apache.org/jira/browse/LUCENE-1846 > Project: Lucene - Java > Issue Type: Bug > Components: Other >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Trivial > Fix For: 2.9 > > Attachments: LUCENE-1846.patch > > > This is a followup to LUCENE-1836: I found some more Locale problems in > Lucene with date formats. Even for simple date formats consisting only of > numbers (like ISO dates), you should always give the US locale. Because the > dates in DateTools should sort according to String.compare(), it is > important that the decimal digits are Western ones. In some strange locales, > this may be different. Whenever you format dates for internal formats > you expect to behave predictably, you should at least set the locale to US, > which uses ASCII digits. Dates entered by users and displayed to users should be > formatted according to the default or a custom-specified locale. > I also looked for DecimalFormat (especially used for padding numbers), but > found no problems. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Created: (LUCENE-1846) More Locale problems in Lucene
More Locale problems in Lucene -- Key: LUCENE-1846 URL: https://issues.apache.org/jira/browse/LUCENE-1846 Project: Lucene - Java Issue Type: Bug Components: Other Reporter: Uwe Schindler Assignee: Uwe Schindler Priority: Trivial Fix For: 2.9 This is a followup to LUCENE-1836: I found some more Locale problems in Lucene with date formats. Even for simple date formats consisting only of numbers (like ISO dates), you should always give the US locale. Because the dates in DateTools should sort according to String.compare(), it is important that the decimal digits are Western ones. In some strange locales, this may be different. Whenever you format dates for internal formats you expect to behave predictably, you should at least set the locale to US, which uses ASCII digits. Dates entered by users and displayed to users should be formatted according to the default or a custom-specified locale. I also looked for DecimalFormat (especially used for padding numbers), but found no problems. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
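To make the fix concrete, here is a small self-contained sketch of the pattern the issue describes (plain Java, not the actual DateTools patch): internal, index-sortable formats get pinned to Locale.US so the digits are always ASCII, while user-facing formatting uses the default or a user-chosen locale.
{code}
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class LocaleSafeDates {
  public static void main(String[] args) {
    // Internal, sortable format: pinned to Locale.US so digits are ASCII and
    // String.compareTo() ordering matches chronological ordering.
    SimpleDateFormat internal = new SimpleDateFormat("yyyyMMddHHmmss", Locale.US);
    // User-facing format: the default (or an explicitly chosen) locale is fine here.
    DateFormat display = DateFormat.getDateTimeInstance();
    Date now = new Date();
    System.out.println(internal.format(now)); // e.g. 20090823025100, in any default locale
    System.out.println(display.format(now));  // locale-dependent, for display only
  }
}
{code}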
[jira] Updated: (LUCENE-1846) More Locale problems in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-1846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1846: -- Attachment: LUCENE-1846.patch Patch. The changes in DateField may affect users with very strange default locales who indexed with prior Lucene versions, but this is unlikely to be a problem, as the whole sorting may be broken already. Should I add a note to CHANGES.txt? > More Locale problems in Lucene > -- > > Key: LUCENE-1846 > URL: https://issues.apache.org/jira/browse/LUCENE-1846 > Project: Lucene - Java > Issue Type: Bug > Components: Other >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Trivial > Fix For: 2.9 > > Attachments: LUCENE-1846.patch > > > This is a followup to LUCENE-1836: I found some more Locale problems in > Lucene with date formats. Even for simple date formats consisting only of > numbers (like ISO dates), you should always give the US locale. Because the > dates in DateTools should sort according to String.compare(), it is > important that the decimal digits are Western ones. In some strange locales, > this may be different. Whenever you format dates for internal formats > you expect to behave predictably, you should at least set the locale to US, > which uses ASCII digits. Dates entered by users and displayed to users should be > formatted according to the default or a custom-specified locale. > I also looked for DecimalFormat (especially used for padding numbers), but > found no problems. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1845) if the build fails to download JARs for contrib/db, just skip its tests
[ https://issues.apache.org/jira/browse/LUCENE-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-1845: Attachment: LUCENE-1845.txt I set the property "ignoreerrors" to true on the get task. This should print a message if there is a problem with the download and then continue. The sanity check will fail if the JAR is not present, and the unit tests will be skipped. I guess that should do the job, though. > if the build fails to download JARs for contrib/db, just skip its tests > --- > > Key: LUCENE-1845 > URL: https://issues.apache.org/jira/browse/LUCENE-1845 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Priority: Minor > Attachments: LUCENE-1845.txt > > > Every so often our nightly build fails because contrib/db is unable to > download the necessary BDB JARs from http://downloads.osafoundation.org. I > think in such cases we should simply skip contrib/db's tests, if it's the > nightly build that's running, since it's a false positive failure. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"
[ https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746600#action_12746600 ] Tim Smith commented on LUCENE-1821: --- Well, you could go a route similar to the 2.4 TokenStream API (next() vs. next(Token)): have Filter.getDocIdSet(IndexSearcher, IndexReader) call Filter.getDocIdSet(IndexReader), and vice versa, by default; one method or the other would be required to be overridden. getDocIdSet(IndexReader) would be deprecated (and removed in 3.0). Since the deprecated method would be removed in 3.0, and since no one would probably be depending on these new semantics right away, this should work. Also, in general, QueryWrapperFilter performs a bit worse now in 2.9; this is because it creates an IndexSearcher for every query it wraps (which results in doing "gatherSubReaders" and creating the offsets anew each time getDocIdSet(IndexReader) is called). So, the new method with the IndexSearcher also passed in is much better for evaluating these Filters. > Weight.scorer() not passed doc offset for "sub reader" > -- > > Key: LUCENE-1821 > URL: https://issues.apache.org/jira/browse/LUCENE-1821 > Project: Lucene - Java > Issue Type: Bug > Components: Search >Affects Versions: 2.9 >Reporter: Tim Smith > Fix For: 2.9 > > Attachments: LUCENE-1821.patch > > > Now that searching is done on a per-segment basis, there is no way for a > Scorer to know the "actual" doc id for the documents it matches (only the > relative doc offset into the segment). > If using caches in your scorer that are based on the "entire" index (all > segments), there is now no way to index into them properly from inside a > Scorer, because the scorer is not passed the needed offset to calculate the > "real" docid. > I suggest having the Weight.scorer() method also take an integer for the doc offset. > The abstract Weight class should have a constructor that takes this offset, as > well as a method to get the offset. > All Weights that have "sub" weights must pass this offset down to created > "sub" weights. > Details on the workaround: > In order to work around this, you must do the following: > * Subclass IndexSearcher > * Add an "int getIndexReaderBase(IndexReader)" method to your subclass > * During Weight creation, the Weight must hold onto a reference to the passed-in > Searcher (cast to your subclass) > * During Scorer creation, the Scorer must be passed the result of > YourSearcher.getIndexReaderBase(reader) > * The Scorer can now rebase any collected docids using this offset > Example implementation of getIndexReaderBase(): > {code} > // NOTE: a more efficient implementation can be done if you cache the result of > gatherSubReaders in your constructor > public int getIndexReaderBase(IndexReader reader) { > if (reader == getReader()) { > return 0; > } else { > List readers = new ArrayList(); > gatherSubReaders(readers); > Iterator iter = readers.iterator(); > int maxDoc = 0; > while (iter.hasNext()) { > IndexReader r = (IndexReader)iter.next(); > if (r == reader) { > return maxDoc; > } > maxDoc += r.maxDoc(); > } > } > return -1; // reader not in searcher > } > {code} > Notes: > * This workaround makes it so you cannot serialize your custom Weight > implementation -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
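A sketch of the cross-delegation pattern Tim describes (the two-argument getDocIdSet overload is the proposal, not an existing 2.9 API). As with the 2.4 TokenStream next()/next(Token) pair, something must prevent the two default implementations from recursing into each other forever when neither is overridden; a simple flag is used here for illustration.
{code}
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.IndexSearcher;

// Hypothetical sketch of the proposed Filter API, not actual Lucene code.
// Subclasses override exactly one of the two methods; each default delegates
// to the other, and the flag catches the case where neither is overridden
// (instance state, not thread-safe -- illustration only).
public abstract class Filter implements java.io.Serializable {
  private boolean delegating = false;

  /** Old method; deprecated under this proposal and removed in 3.0. */
  public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
    if (delegating) throw new UnsupportedOperationException("override a getDocIdSet method");
    delegating = true;
    try {
      return getDocIdSet(null, reader); // no searcher is available via the old path
    } finally {
      delegating = false;
    }
  }

  /** Proposed overload: also receives the (top-level) searcher. */
  public DocIdSet getDocIdSet(IndexSearcher searcher, IndexReader reader) throws IOException {
    if (delegating) throw new UnsupportedOperationException("override a getDocIdSet method");
    delegating = true;
    try {
      return getDocIdSet(reader); // fall back to the old single-argument method
    } finally {
      delegating = false;
    }
  }
}
{code}
With this in place, QueryWrapperFilter could hold onto the passed-in IndexSearcher instead of constructing one per getDocIdSet call, which is the performance problem noted above.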
[jira] Commented: (LUCENE-1844) Speed up junit tests
[ https://issues.apache.org/jira/browse/LUCENE-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746605#action_12746605 ] Mark Miller commented on LUCENE-1844: - Should also be able to speed up TestBooleanMinShouldMatch somehow. It's nearly a minute as well. In a loop of 1000 random queries, this is called each time: QueryUtils.check(q1,s); QueryUtils.check(q2,s); Take it out and the test is like 2-5 seconds. There must be some way to optimize this down without losing coverage. > Speed up junit tests > > > Key: LUCENE-1844 > URL: https://issues.apache.org/jira/browse/LUCENE-1844 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Mark Miller > Attachments: FastCnstScoreQTest.patch, hi_junit_test_runtimes.png > > > As Lucene grows, so does the number of JUnit tests. This is obviously a good > thing, but it comes with longer and longer test times. Now that we also run > back-compat tests in a standard test run, this problem is essentially doubled. > There are some ways this may get better, including running parallel tests. > You will need the hardware to fully take advantage, but it should be a nice > gain. There is already an issue for this, and JUnit 4.6 and 4.7 have the > beginnings of something we might be able to count on soon. 4.6 was buggy, and > 4.7 still doesn't come with nice ant integration. Parallel tests will come, > though. > Beyond parallel testing, I think we also need to concentrate on keeping our > tests lean. We don't want to sacrifice coverage or quality, but I'm sure > there is plenty of fat to skim. > I've started making a list of some of the longer tests - I think with some > work we can make our tests much faster - and then with parallelization, I > think we could see some really great gains. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Issue Comment Edited: (LUCENE-1844) Speed up junit tests
[ https://issues.apache.org/jira/browse/LUCENE-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746605#action_12746605 ] Mark Miller edited comment on LUCENE-1844 at 8/23/09 7:23 AM: -- Should also be able to speed up TestBooleanMinShouldMatch somehow. It's nearly a minute as well (30s in the attached list, but nearly a min on other hardware I have). In a loop of 1000 random queries, this is called each time: QueryUtils.check(q1,s); QueryUtils.check(q2,s); Take it out and the test is like 2-5 seconds. There must be some way to optimize this down without losing coverage. was (Author: markrmil...@gmail.com): Should also be able to speed up TestBooleanMinShouldMatch somehow. It's nearly a minute as well. In a loop of 1000 random queries, this is called each time: QueryUtils.check(q1,s); QueryUtils.check(q2,s); Take it out and the test is like 2-5 seconds. There must be some way to optimize this down without losing coverage. > Speed up junit tests > > > Key: LUCENE-1844 > URL: https://issues.apache.org/jira/browse/LUCENE-1844 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Mark Miller > Attachments: FastCnstScoreQTest.patch, hi_junit_test_runtimes.png > > > As Lucene grows, so does the number of JUnit tests. This is obviously a good > thing, but it comes with longer and longer test times. Now that we also run > back-compat tests in a standard test run, this problem is essentially doubled. > There are some ways this may get better, including running parallel tests. > You will need the hardware to fully take advantage, but it should be a nice > gain. There is already an issue for this, and JUnit 4.6 and 4.7 have the > beginnings of something we might be able to count on soon. 4.6 was buggy, and > 4.7 still doesn't come with nice ant integration. Parallel tests will come, > though. > Beyond parallel testing, I think we also need to concentrate on keeping our > tests lean. We don't want to sacrifice coverage or quality, but I'm sure > there is plenty of fat to skim. > I've started making a list of some of the longer tests - I think with some > work we can make our tests much faster - and then with parallelization, I > think we could see some really great gains. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"
[ https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746607#action_12746607 ] Mark Miller commented on LUCENE-1821: - You want to weigh in again, Mike? Do you still have the same stance as in your last comment? > Weight.scorer() not passed doc offset for "sub reader" > -- > > Key: LUCENE-1821 > URL: https://issues.apache.org/jira/browse/LUCENE-1821 > Project: Lucene - Java > Issue Type: Bug > Components: Search >Affects Versions: 2.9 >Reporter: Tim Smith > Fix For: 2.9 > > Attachments: LUCENE-1821.patch > > > Now that searching is done on a per-segment basis, there is no way for a > Scorer to know the "actual" doc id for the documents it matches (only the > relative doc offset into the segment). > If using caches in your scorer that are based on the "entire" index (all > segments), there is now no way to index into them properly from inside a > Scorer, because the scorer is not passed the needed offset to calculate the > "real" docid. > I suggest having the Weight.scorer() method also take an integer for the doc offset. > The abstract Weight class should have a constructor that takes this offset, as > well as a method to get the offset. > All Weights that have "sub" weights must pass this offset down to created > "sub" weights. > Details on the workaround: > In order to work around this, you must do the following: > * Subclass IndexSearcher > * Add an "int getIndexReaderBase(IndexReader)" method to your subclass > * During Weight creation, the Weight must hold onto a reference to the passed-in > Searcher (cast to your subclass) > * During Scorer creation, the Scorer must be passed the result of > YourSearcher.getIndexReaderBase(reader) > * The Scorer can now rebase any collected docids using this offset > Example implementation of getIndexReaderBase(): > {code} > // NOTE: a more efficient implementation can be done if you cache the result of > gatherSubReaders in your constructor > public int getIndexReaderBase(IndexReader reader) { > if (reader == getReader()) { > return 0; > } else { > List readers = new ArrayList(); > gatherSubReaders(readers); > Iterator iter = readers.iterator(); > int maxDoc = 0; > while (iter.hasNext()) { > IndexReader r = (IndexReader)iter.next(); > if (r == reader) { > return maxDoc; > } > maxDoc += r.maxDoc(); > } > } > return -1; // reader not in searcher > } > {code} > Notes: > * This workaround makes it so you cannot serialize your custom Weight > implementation -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"
[ https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746608#action_12746608 ] Mark Miller commented on LUCENE-1821: - bq. Well, you could go a route similar to the 2.4 TokenStream API (next() vs. next(Token)) That's a tough bunch of code to decide to spread ... > Weight.scorer() not passed doc offset for "sub reader" > -- > > Key: LUCENE-1821 > URL: https://issues.apache.org/jira/browse/LUCENE-1821 > Project: Lucene - Java > Issue Type: Bug > Components: Search >Affects Versions: 2.9 >Reporter: Tim Smith > Fix For: 2.9 > > Attachments: LUCENE-1821.patch > > > Now that searching is done on a per-segment basis, there is no way for a > Scorer to know the "actual" doc id for the documents it matches (only the > relative doc offset into the segment). > If using caches in your scorer that are based on the "entire" index (all > segments), there is now no way to index into them properly from inside a > Scorer, because the scorer is not passed the needed offset to calculate the > "real" docid. > I suggest having the Weight.scorer() method also take an integer for the doc offset. > The abstract Weight class should have a constructor that takes this offset, as > well as a method to get the offset. > All Weights that have "sub" weights must pass this offset down to created > "sub" weights. > Details on the workaround: > In order to work around this, you must do the following: > * Subclass IndexSearcher > * Add an "int getIndexReaderBase(IndexReader)" method to your subclass > * During Weight creation, the Weight must hold onto a reference to the passed-in > Searcher (cast to your subclass) > * During Scorer creation, the Scorer must be passed the result of > YourSearcher.getIndexReaderBase(reader) > * The Scorer can now rebase any collected docids using this offset > Example implementation of getIndexReaderBase(): > {code} > // NOTE: a more efficient implementation can be done if you cache the result of > gatherSubReaders in your constructor > public int getIndexReaderBase(IndexReader reader) { > if (reader == getReader()) { > return 0; > } else { > List readers = new ArrayList(); > gatherSubReaders(readers); > Iterator iter = readers.iterator(); > int maxDoc = 0; > while (iter.hasNext()) { > IndexReader r = (IndexReader)iter.next(); > if (r == reader) { > return maxDoc; > } > maxDoc += r.maxDoc(); > } > } > return -1; // reader not in searcher > } > {code} > Notes: > * This workaround makes it so you cannot serialize your custom Weight > implementation -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1846) More Locale problems in Lucene
[ https://issues.apache.org/jira/browse/LUCENE-1846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746611#action_12746611 ] Robert Muir commented on LUCENE-1846: - Uwe, thanks for bringing this issue up! We still have more work to do. Out of curiosity, I looked to see if the old queryparser in core passes under the Korean locale. It does not... {noformat} setenv ANT_ARGS "-Dargs=-Duser.language=ko -Duser.country=KR" ant -Dtestcase=TestQueryParser test {noformat} > More Locale problems in Lucene > -- > > Key: LUCENE-1846 > URL: https://issues.apache.org/jira/browse/LUCENE-1846 > Project: Lucene - Java > Issue Type: Bug > Components: Other >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Trivial > Fix For: 2.9 > > Attachments: LUCENE-1846.patch > > > This is a followup to LUCENE-1836: I found some more Locale problems in > Lucene with date formats. Even for simple date formats consisting only of > numbers (like ISO dates), you should always give the US locale. Because the > dates in DateTools should sort according to String.compare(), it is > important that the decimal digits are Western ones. In some strange locales, > this may be different. Whenever you format dates for internal formats > you expect to behave predictably, you should at least set the locale to US, > which uses ASCII digits. Dates entered by users and displayed to users should be > formatted according to the default or a custom-specified locale. > I also looked for DecimalFormat (especially used for padding numbers), but > found no problems. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1798) FieldCacheSanityChecker called directly by FieldCache.get*
[ https://issues.apache.org/jira/browse/LUCENE-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1798: --- Attachment: LUCENE-1798.patch Attached patch. I added get/setInfoStream to FieldCache, then, in FieldCacheImpl.Cache.get, if we hit a cache miss and infoStream is enabled, I gather the Insanity[] before the cache entry is added and after, then print out any change involving the entry just added. It produces this output to the infoStream: {noformat} [junit] WARNING: new FieldCache insanity created [junit] Details: VALUEMISMATCH: Multiple distinct value objects for org.apache.lucene.index.directoryrea...@da3a1e+thedouble [junit] 'org.apache.lucene.index.directoryrea...@da3a1e'=>'theDouble',float,org.apache.lucene.search.FieldCache.DEFAULT_FLOAT_PARSER=>[F#7896426 (size =~ 3.9 KB) [junit] 'org.apache.lucene.index.directoryrea...@da3a1e'=>'theDouble',double,org.apache.lucene.search.FieldCache.DEFAULT_DOUBLE_PARSER=>[D#5503831 (size =~ 7.8 KB) [junit] 'org.apache.lucene.index.directoryrea...@da3a1e'=>'theDouble',double,null=>[D#5503831 (size =~ 7.8 KB) [junit] [junit] [junit] Stack: [junit] [junit] java.lang.Throwable [junit] at org.apache.lucene.search.FieldCacheImpl$Cache.printNewInsanity(FieldCacheImpl.java:263) [junit] at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:228) [junit] at org.apache.lucene.search.FieldCacheImpl.getFloats(FieldCacheImpl.java:494) [junit] at org.apache.lucene.search.FieldCacheImpl$FloatCache.createValue(FieldCacheImpl.java:509) [junit] at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:223) [junit] at org.apache.lucene.search.FieldCacheImpl.getFloats(FieldCacheImpl.java:494) [junit] at org.apache.lucene.search.FieldCacheImpl.getFloats(FieldCacheImpl.java:487) [junit] at org.apache.lucene.search.TestFieldCache.testInfoStream(TestFieldCache.java:70) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) [junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [junit] at java.lang.reflect.Method.invoke(Method.java:597) [junit] at junit.framework.TestCase.runTest(TestCase.java:164) [junit] at junit.framework.TestCase.runBare(TestCase.java:130) [junit] at org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:206) [junit] at junit.framework.TestResult$1.protect(TestResult.java:106) [junit] at junit.framework.TestResult.runProtected(TestResult.java:124) [junit] at junit.framework.TestResult.run(TestResult.java:109) [junit] at junit.framework.TestCase.run(TestCase.java:120) [junit] at junit.framework.TestSuite.runTest(TestSuite.java:230) [junit] at junit.framework.TestSuite.run(TestSuite.java:225) [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420) [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911) [junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768) {noformat} > FieldCacheSanityChecker called directly by FieldCache.get* > -- > > Key: LUCENE-1798 > URL: https://issues.apache.org/jira/browse/LUCENE-1798 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Reporter: Hoss Man >Assignee: Michael McCandless > Fix For: 2.9 > > Attachments: LUCENE-1798.patch > > > As suggested by McCandless in LUCENE-1749, we can make FieldCacheImpl a > client of 
the FieldCacheSanityChecker and have it sanity-check itself each > time it creates a new cache entry, and log a warning if it thinks there is a > problem. (Although we'd probably only want to do this if the caller has set > some sort of infoStream/warningStream-type property on the FieldCache object.) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
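The shape of that hook, as a rough self-contained sketch (the class and method names here are illustrative assumptions, not the actual patch): snapshot the sanity report before a new cache entry is created and again after, and print only the insanity that the new entry introduced.
{code}
import java.io.PrintStream;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.util.FieldCacheSanityChecker;
import org.apache.lucene.util.FieldCacheSanityChecker.Insanity;

// Illustrative sketch (hypothetical class): wraps the creation of a cache
// entry and logs any sanity problems that were not present beforehand.
public class InsanityLogger {
  private final PrintStream infoStream; // null means the check is disabled

  public InsanityLogger(PrintStream infoStream) {
    this.infoStream = infoStream;
  }

  /** 'create' performs the real createValue/store work on a cache miss. */
  public void createAndCheck(FieldCache cache, Runnable create) {
    Insanity[] before = (infoStream == null) ? null : FieldCacheSanityChecker.checkSanity(cache);
    create.run(); // adds the new entry to the cache
    if (infoStream != null) {
      Insanity[] after = FieldCacheSanityChecker.checkSanity(cache);
      for (int i = 0; i < after.length; i++) {
        if (!contains(before, after[i])) { // report only newly introduced insanity
          infoStream.println("WARNING: new FieldCache insanity created\nDetails: " + after[i]);
        }
      }
    }
  }

  private static boolean contains(Insanity[] arr, Insanity x) {
    for (int i = 0; arr != null && i < arr.length; i++) {
      if (arr[i].toString().equals(x.toString())) return true; // string-level compare; sketch only
    }
    return false;
  }
}
{code}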
[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"
[ https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746613#action_12746613 ] Tim Smith commented on LUCENE-1821: --- bq. That's a tough bunch of code to decide to spread ... At least it'll be able to go away real soon, with 3.0. > Weight.scorer() not passed doc offset for "sub reader" > -- > > Key: LUCENE-1821 > URL: https://issues.apache.org/jira/browse/LUCENE-1821 > Project: Lucene - Java > Issue Type: Bug > Components: Search >Affects Versions: 2.9 >Reporter: Tim Smith > Fix For: 2.9 > > Attachments: LUCENE-1821.patch > > > Now that searching is done on a per-segment basis, there is no way for a > Scorer to know the "actual" doc id for the documents it matches (only the > relative doc offset into the segment). > If using caches in your scorer that are based on the "entire" index (all > segments), there is now no way to index into them properly from inside a > Scorer, because the scorer is not passed the needed offset to calculate the > "real" docid. > I suggest having the Weight.scorer() method also take an integer for the doc offset. > The abstract Weight class should have a constructor that takes this offset, as > well as a method to get the offset. > All Weights that have "sub" weights must pass this offset down to created > "sub" weights. > Details on the workaround: > In order to work around this, you must do the following: > * Subclass IndexSearcher > * Add an "int getIndexReaderBase(IndexReader)" method to your subclass > * During Weight creation, the Weight must hold onto a reference to the passed-in > Searcher (cast to your subclass) > * During Scorer creation, the Scorer must be passed the result of > YourSearcher.getIndexReaderBase(reader) > * The Scorer can now rebase any collected docids using this offset > Example implementation of getIndexReaderBase(): > {code} > // NOTE: a more efficient implementation can be done if you cache the result of > gatherSubReaders in your constructor > public int getIndexReaderBase(IndexReader reader) { > if (reader == getReader()) { > return 0; > } else { > List readers = new ArrayList(); > gatherSubReaders(readers); > Iterator iter = readers.iterator(); > int maxDoc = 0; > while (iter.hasNext()) { > IndexReader r = (IndexReader)iter.next(); > if (r == reader) { > return maxDoc; > } > maxDoc += r.maxDoc(); > } > } > return -1; // reader not in searcher > } > {code} > Notes: > * This workaround makes it so you cannot serialize your custom Weight > implementation -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1836) Flexible QueryParser fails with locale different from en_US
[ https://issues.apache.org/jira/browse/LUCENE-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746616#action_12746616 ] Robert Muir commented on LUCENE-1836: - Adriano, also as I noted in LUCENE-1846, the old queryparser in core has this same issue. So if you are able to figure out an improvement to the javacc grammar to fix this, I think we should consider applying it there as well. > Flexible QueryParser fails with locale different from en_US > -- > > Key: LUCENE-1836 > URL: https://issues.apache.org/jira/browse/LUCENE-1836 > Project: Lucene - Java > Issue Type: Bug > Components: contrib/* >Affects Versions: 2.9 >Reporter: Uwe Schindler >Assignee: Luis Alves > Fix For: 2.9 > > Attachments: LUCENE-1836.patch, LUCENE-1836.patch, LUCENE-1836.patch > > > I get the following error during the mentioned test cases on my computer, if I > use the Locale de_DE (Windows 32): > {code} > [junit] Testsuite: org.apache.lucene.queryParser.standard.TestQPHelper > [junit] Tests run: 29, Failures: 1, Errors: 0, Time elapsed: 1,156 sec > [junit] > [junit] - Standard Output --- > [junit] Result: (fieldX:x fieldy:)^2.0 > [junit] - --- > [junit] Testcase: > testLocalDateFormat(org.apache.lucene.queryParser.standard.TestQPHelper): > FAILED > [junit] expected:<1> but was:<0> > [junit] junit.framework.AssertionFailedError: expected:<1> but was:<0> > [junit] at > org.apache.lucene.queryParser.standard.TestQPHelper.assertHits(TestQPHelper.java:1148) > [junit] at > org.apache.lucene.queryParser.standard.TestQPHelper.testLocalDateFormat(TestQPHelper.java:1005) > [junit] at > org.apache.lucene.util.LuceneTestCase.runTest(LuceneTestCase.java:201) > [junit] > [junit] > [junit] Test org.apache.lucene.queryParser.standard.TestQPHelper FAILED > [junit] Testsuite: > org.apache.lucene.queryParser.standard.TestQueryParserWrapper > [junit] Tests run: 27, Failures: 1, Errors: 0, Time elapsed: 1,219 sec > [junit] > [junit] - Standard Output --- > [junit] Result: (fieldX:x fieldy:)^2.0 > [junit] - --- > [junit] Testcase: > testLocalDateFormat(org.apache.lucene.queryParser.standard.TestQueryParserWrapper): >FAILED > [junit] expected:<1> but was:<0> > [junit] junit.framework.AssertionFailedError: expected:<1> but was:<0> > [junit] at > org.apache.lucene.queryParser.standard.TestQueryParserWrapper.assertHits(TestQueryParserWrapper.java:1120) > [junit] at > org.apache.lucene.queryParser.standard.TestQueryParserWrapper.testLocalDateFormat(TestQueryParserWrapper.java:985) > [junit] at > org.apache.lucene.util.LuceneTestCase.runTest(LuceneTestCase.java:201) > [junit] > [junit] > [junit] Test > org.apache.lucene.queryParser.standard.TestQueryParserWrapper FAILED > {code} > With en_US as locale it works. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"
[ https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746623#action_12746623 ] Michael McCandless commented on LUCENE-1821: Tim, one option might be to subclass DirectoryReader (though it's package-protected now, and you'd need to make your own "open" to return your subclass) and override getSequentialSubReaders to return null? Then Lucene would treat it as an atomic reader. Could that work? > Weight.scorer() not passed doc offset for "sub reader" > -- > > Key: LUCENE-1821 > URL: https://issues.apache.org/jira/browse/LUCENE-1821 > Project: Lucene - Java > Issue Type: Bug > Components: Search >Affects Versions: 2.9 >Reporter: Tim Smith > Fix For: 2.9 > > Attachments: LUCENE-1821.patch > > > Now that searching is done on a per-segment basis, there is no way for a > Scorer to know the "actual" doc id for the documents it matches (only the > relative doc offset into the segment). > If using caches in your scorer that are based on the "entire" index (all > segments), there is now no way to index into them properly from inside a > Scorer, because the scorer is not passed the needed offset to calculate the > "real" docid. > I suggest having the Weight.scorer() method also take an integer for the doc offset. > The abstract Weight class should have a constructor that takes this offset, as > well as a method to get the offset. > All Weights that have "sub" weights must pass this offset down to created > "sub" weights. > Details on the workaround: > In order to work around this, you must do the following: > * Subclass IndexSearcher > * Add an "int getIndexReaderBase(IndexReader)" method to your subclass > * During Weight creation, the Weight must hold onto a reference to the passed-in > Searcher (cast to your subclass) > * During Scorer creation, the Scorer must be passed the result of > YourSearcher.getIndexReaderBase(reader) > * The Scorer can now rebase any collected docids using this offset > Example implementation of getIndexReaderBase(): > {code} > // NOTE: a more efficient implementation can be done if you cache the result of > gatherSubReaders in your constructor > public int getIndexReaderBase(IndexReader reader) { > if (reader == getReader()) { > return 0; > } else { > List readers = new ArrayList(); > gatherSubReaders(readers); > Iterator iter = readers.iterator(); > int maxDoc = 0; > while (iter.hasNext()) { > IndexReader r = (IndexReader)iter.next(); > if (r == reader) { > return maxDoc; > } > maxDoc += r.maxDoc(); > } > } > return -1; // reader not in searcher > } > {code} > Notes: > * This workaround makes it so you cannot serialize your custom Weight > implementation -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
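A sketch of that suggestion, using FilterIndexReader as a stand-in since DirectoryReader itself is not public (AtomicViewReader is a hypothetical name): returning null from getSequentialSubReaders makes the searcher treat the reader as atomic, so collection is no longer split per segment and docids stay top-level.
{code}
import org.apache.lucene.index.FilterIndexReader;
import org.apache.lucene.index.IndexReader;

// Sketch only: hides the wrapped reader's sub-readers so that IndexSearcher
// treats it as one atomic reader and scores/collects with top-level docids.
public class AtomicViewReader extends FilterIndexReader {
  public AtomicViewReader(IndexReader in) {
    super(in);
  }

  public IndexReader[] getSequentialSubReaders() {
    return null; // null = "no sub-readers": the searcher uses this reader as a whole
  }
}
{code}
Usage would be something like new IndexSearcher(new AtomicViewReader(IndexReader.open(dir))); the trade-off is giving up the per-segment caching and reopen benefits that motivated the 2.9 change.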
[jira] Commented: (LUCENE-1845) if the build fails to download JARs for contrib/db, just skip its tests
[ https://issues.apache.org/jira/browse/LUCENE-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746618#action_12746618 ] Michael McCandless commented on LUCENE-1845: Hmm -- I tried applying the patch, then changing the download URL to something bogus that fails, and then "ant test" hits errors during the "compile-core" target. It seems like we have to somehow skip compile-core if the sanity check fails? > if the build fails to download JARs for contrib/db, just skip its tests > --- > > Key: LUCENE-1845 > URL: https://issues.apache.org/jira/browse/LUCENE-1845 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Priority: Minor > Attachments: LUCENE-1845.txt > > > Every so often our nightly build fails because contrib/db is unable to > download the necessary BDB JARs from http://downloads.osafoundation.org. I > think in such cases we should simply skip contrib/db's tests, if it's the > nightly build that's running, since it's a false positive failure. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1837) Remove Searcher from explain and idf/maxDoc info from explain
[ https://issues.apache.org/jira/browse/LUCENE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746619#action_12746619 ] Michael McCandless commented on LUCENE-1837: So Mark, this will revert LUCENE-1771? I.e., no longer pass in the top searcher to weight.explain? > Remove Searcher from explain and idf/maxDoc info from explain > - > > Key: LUCENE-1837 > URL: https://issues.apache.org/jira/browse/LUCENE-1837 > Project: Lucene - Java > Issue Type: Bug >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 2.9 > > > these changes (starting with the TermWeight idf/maxDoc info) were illegal IMO > - I think they need to be rolled back/out. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1837) Remove Searcher from explain and idf/maxDoc info from explain
[ https://issues.apache.org/jira/browse/LUCENE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746621#action_12746621 ] Mark Miller commented on LUCENE-1837: - It won't revert the whole issue. Weight is still an abstract class, and the sub-reader with the doc is still the reader passed, rather than the top-level reader. The only revert: because TermWeight tried to take index-level stats from the reader, we passed that searcher (to make the TermWeight explain behavior like it was when we passed the top-level reader) - it's the only place it's used currently. But that's illegal now and it was illegal before. You cannot count on having access to the entire index through a Searcher - else we break MultiSearcher and remote use. So passing that Searcher is a recipe for illegal abuse. Same with the other issue Tim brought up - though if we end up passing an IndexSearcher there with all kinds of warnings to abuse at your own peril - I guess we could here. I'm not sure I like it, because we encourage code that doesn't work correctly with MultiSearcher. I think if we want to go down that road, we should probably try to move away from supporting remote search and MultiSearcher. > Remove Searcher from explain and idf/maxDoc info from explain > - > > Key: LUCENE-1837 > URL: https://issues.apache.org/jira/browse/LUCENE-1837 > Project: Lucene - Java > Issue Type: Bug >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 2.9 > > > these changes (starting with the TermWeight idf/maxDoc info) were illegal IMO > - I think they need to be rolled back/out. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1837) Remove Searcher from explain and idf/maxDoc info from explain
[ https://issues.apache.org/jira/browse/LUCENE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746622#action_12746622 ] Michael McCandless commented on LUCENE-1837: bq. It won't revert the whole issue. OK, got it. bq. Because TermWeight tried to take index-level stats from the reader, we passed that searcher (to make the TermWeight explain behavior like it was when we passed the top-level reader) - it's the only place it's used currently. PhraseQuery also prints the [top-level] docFreq for each term in the phrase. bq. You cannot count on having access to the entire index through a Searcher - else we break MultiSearcher and remote use. I agree, so our fix in LUCENE-1771 doesn't work w/ MultiSearcher. So we definitely need to do something here... The thing is, it's useful for TermQuery's explain to print out the docFreq/maxDoc, right? (This was the original motivation of LUCENE-1066.) But it has to be the top-level numbers, not the single-segment numbers. Really the Weight should gather & hold all top-level stats it needs on construction? (The MultiSearcher is passed on Weight construction). > Remove Searcher from explain and idf/maxDoc info from explain > - > > Key: LUCENE-1837 > URL: https://issues.apache.org/jira/browse/LUCENE-1837 > Project: Lucene - Java > Issue Type: Bug >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 2.9 > > > these changes (starting with the TermWeight idf/maxDoc info) were illegal IMO > - I think they need to be rolled back/out. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"
[ https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746617#action_12746617 ] Michael McCandless commented on LUCENE-1821: bq. You want to weigh in again, Mike? I do! I'm trying desperately to catch up over here :) > Weight.scorer() not passed doc offset for "sub reader" > -- > > Key: LUCENE-1821 > URL: https://issues.apache.org/jira/browse/LUCENE-1821 > Project: Lucene - Java > Issue Type: Bug > Components: Search >Affects Versions: 2.9 >Reporter: Tim Smith > Fix For: 2.9 > > Attachments: LUCENE-1821.patch > > > Now that searching is done on a per-segment basis, there is no way for a > Scorer to know the "actual" doc id for the documents it matches (only the > relative doc offset into the segment). > If using caches in your scorer that are based on the "entire" index (all > segments), there is now no way to index into them properly from inside a > Scorer, because the scorer is not passed the needed offset to calculate the > "real" docid. > I suggest having the Weight.scorer() method also take an integer for the doc offset. > The abstract Weight class should have a constructor that takes this offset, as > well as a method to get the offset. > All Weights that have "sub" weights must pass this offset down to created > "sub" weights. > Details on the workaround: > In order to work around this, you must do the following: > * Subclass IndexSearcher > * Add an "int getIndexReaderBase(IndexReader)" method to your subclass > * During Weight creation, the Weight must hold onto a reference to the passed-in > Searcher (cast to your subclass) > * During Scorer creation, the Scorer must be passed the result of > YourSearcher.getIndexReaderBase(reader) > * The Scorer can now rebase any collected docids using this offset > Example implementation of getIndexReaderBase(): > {code} > // NOTE: a more efficient implementation can be done if you cache the result of > gatherSubReaders in your constructor > public int getIndexReaderBase(IndexReader reader) { > if (reader == getReader()) { > return 0; > } else { > List readers = new ArrayList(); > gatherSubReaders(readers); > Iterator iter = readers.iterator(); > int maxDoc = 0; > while (iter.hasNext()) { > IndexReader r = (IndexReader)iter.next(); > if (r == reader) { > return maxDoc; > } > maxDoc += r.maxDoc(); > } > } > return -1; // reader not in searcher > } > {code} > Notes: > * This workaround makes it so you cannot serialize your custom Weight > implementation -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1837) Remove Searcher from explain and idf/maxDoc info from explain
[ https://issues.apache.org/jira/browse/LUCENE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746624#action_12746624 ] Mark Miller commented on LUCENE-1837: - bq. Really the Weight should gather & hold all top-level stats it needs on construction? (The MultiSearcher is passed on Weight construction). Ah - good point. I've said it before myself - index-level stats should be taken from the createWeight Searcher - I just don't integrate thoughts well :) So that seems like the right thing to do - the only thing I don't like is that this info has to be calculated by calling each Searchable in the MultiSearcher, and then you likely won't ever use it - explain is generally debug stuff. I don't like that. But I guess, if you want the info, you gotta do what you gotta do ... > Remove Searcher from explain and idf/maxDoc info from explain > - > > Key: LUCENE-1837 > URL: https://issues.apache.org/jira/browse/LUCENE-1837 > Project: Lucene - Java > Issue Type: Bug >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 2.9 > > > these changes (starting with the TermWeight idf/maxDoc info) were illegal IMO > - I think they need to be rolled back/out. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"
[ https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746625#action_12746625 ] Michael McCandless commented on LUCENE-1821: bq. for string sorting, it makes a big difference - you now have to do a bunch of String.equals() calls, where you didn't have to in 2.4 (just used the ord index) We actually went through a number of iterations on this, on the first cutover to per-segment collection, and eventually arrived at a decent comparator (StringOrdValComparator) that operates per segment. Have you tested the performance of this comparator? > Weight.scorer() not passed doc offset for "sub reader" > -- > > Key: LUCENE-1821 > URL: https://issues.apache.org/jira/browse/LUCENE-1821 > Project: Lucene - Java > Issue Type: Bug > Components: Search >Affects Versions: 2.9 >Reporter: Tim Smith > Fix For: 2.9 > > Attachments: LUCENE-1821.patch > > > Now that searching is done on a per-segment basis, there is no way for a > Scorer to know the "actual" doc id for the documents it matches (only the > relative doc offset into the segment). > If using caches in your scorer that are based on the "entire" index (all > segments), there is now no way to index into them properly from inside a > Scorer, because the scorer is not passed the needed offset to calculate the > "real" docid. > I suggest having the Weight.scorer() method also take an integer for the doc offset. > The abstract Weight class should have a constructor that takes this offset, as > well as a method to get the offset. > All Weights that have "sub" weights must pass this offset down to created > "sub" weights. > Details on the workaround: > In order to work around this, you must do the following: > * Subclass IndexSearcher > * Add an "int getIndexReaderBase(IndexReader)" method to your subclass > * During Weight creation, the Weight must hold onto a reference to the passed-in > Searcher (cast to your subclass) > * During Scorer creation, the Scorer must be passed the result of > YourSearcher.getIndexReaderBase(reader) > * The Scorer can now rebase any collected docids using this offset > Example implementation of getIndexReaderBase(): > {code} > // NOTE: a more efficient implementation can be done if you cache the result of > gatherSubReaders in your constructor > public int getIndexReaderBase(IndexReader reader) { > if (reader == getReader()) { > return 0; > } else { > List readers = new ArrayList(); > gatherSubReaders(readers); > Iterator iter = readers.iterator(); > int maxDoc = 0; > while (iter.hasNext()) { > IndexReader r = (IndexReader)iter.next(); > if (r == reader) { > return maxDoc; > } > maxDoc += r.maxDoc(); > } > } > return -1; // reader not in searcher > } > {code} > Notes: > * This workaround makes it so you cannot serialize your custom Weight > implementation -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1837) Remove Searcher from explain and idf/maxDoc info from explain
[ https://issues.apache.org/jira/browse/LUCENE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746628#action_12746628 ] Michael McCandless commented on LUCENE-1837: bq. I don't like is that this info has to be calculated by calling each Searchable in the MultiSearcher, and then you likely won't ever use it - explain is generally debug stuff. I don't like that. But those stats are already being computed (in the default Similarity impl's idf). If we "improved" Similarity.idf so that it returned idf, docFreq and maxDoc in one go, then there's no added cost right? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
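A rough cut of what that combined result could look like (IdfStats and idfStats() are hypothetical names sketched on Similarity to illustrate the suggestion, not a committed API):

{code}
public static class IdfStats {
  public final float idf;
  public final int docFreq;
  public final int maxDoc;
  public IdfStats(float idf, int docFreq, int maxDoc) {
    this.idf = idf;
    this.docFreq = docFreq;
    this.maxDoc = maxDoc;
  }
}

// on Similarity: compute docFreq/maxDoc once and hand everything back
public IdfStats idfStats(Term term, Searcher searcher) throws IOException {
  int docFreq = searcher.docFreq(term);
  int maxDoc = searcher.maxDoc();
  return new IdfStats(idf(docFreq, maxDoc), docFreq, maxDoc);
}
{code}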
Re: Lucene SandBox in documentation
+1 Mike On Thu, Aug 13, 2009 at 4:39 PM, Steven A Rowe wrote: > Tangent: Now that contrib/CHANGES.txt is getting regular updates, I think it > would make sense to generate a Changes.html corresponding to its contents, in > the same way that the core CHANGES.txt is transformed. > > Looks like this Sandbox/Contrib page would be a good place to host it. > > Steve > >> -Original Message- >> From: Mark Miller [mailto:markrmil...@gmail.com] >> Sent: Thursday, August 13, 2009 4:29 PM >> To: java-dev@lucene.apache.org >> Subject: Lucene SandBox in documentation >> >> Looks like this page is a bit out of date: >> >> http://lucene.apache.org/java/2_4_1/lucene-sandbox/index.html >> >> been a while since its been the sandbox too ... >> >> -- >> - Mark >> >> http://www.lucidimagination.com >> >> >> >> >> - >> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-dev-h...@lucene.apache.org > > > - > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-dev-h...@lucene.apache.org > > - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1837) Remove Searcher from explain and idf/maxDoc info from explain
[ https://issues.apache.org/jira/browse/LUCENE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746629#action_12746629 ] Mark Miller commented on LUCENE-1837: - Right - I was just writing to that effect. If we can get that info back too, I think we are golden. What I was writing: It looks like we have to calc twice (not sure about PhraseQuery - with that it looks like we would have to calc a bunch of info the scorer doesn't even use?) with TermWeight. We want to grab the info in the TermWeight constructor and store it. That info is already calculated, but we don't have access to it: {code} public PhraseWeight(Searcher searcher) throws IOException { this.similarity = getSimilarity(searcher); idf = similarity.idf(terms, searcher); // Similarity#idf // public float idf(Term term, Searcher searcher) throws IOException { // return idf(searcher.docFreq(term), searcher.maxDoc()); // } } {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"
[ https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746630#action_12746630 ] Yonik Seeley commented on LUCENE-1821: -- bq. Filter.getDocIdSet(IndexSearcher, IndexReader). This suggests that one needs an IndexSearcher to get the ids matching a filter. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
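Spelled out, that signature would look something like the following (a hypothetical variant of Filter for discussion, not the committed API): the searcher carries the top-level context while the reader identifies the segment being filtered.

{code}
public abstract class Filter implements Serializable {
  public abstract DocIdSet getDocIdSet(IndexSearcher searcher,
                                       IndexReader reader) throws IOException;
}
{code}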
[jira] Issue Comment Edited: (LUCENE-1837) Remove Searcher from explain and idf/maxDoc info from explain
[ https://issues.apache.org/jira/browse/LUCENE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746629#action_12746629 ] Mark Miller edited comment on LUCENE-1837 at 8/23/09 9:29 AM: -- Right - I was just writing to that effect. If we can get that info back too, I think we are golden. What I was writing: It looks like we have to calc twice (not sure about PhraseQuery - with that it looks like we would have to calc a bunch of info the scorer doesn't even use?) with TermWeight. We want to grab the info in the TermWeight constructor and store it. That info is already calculated, but we don't have access to it: {code} public PhraseWeight(Searcher searcher) throws IOException { this.similarity = getSimilarity(searcher); idf = similarity.idf(terms, searcher); // Similarity#idf // public float idf(Term term, Searcher searcher) throws IOException { // return idf(searcher.docFreq(term), searcher.maxDoc()); // } } {code} *edit* bq. not sure about PhraseQuery - with that it looks like we would have to calc a bunch of info the scorer doesn't even use? Okay, we do use all of that - again the info is just all hidden behind the Similarity. So we would also want all the docFreq info from every term in: public float idf(Collection terms, Searcher searcher) throws IOException -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1837) Remove Searcher from explain and idf/maxDoc info from explain
[ https://issues.apache.org/jira/browse/LUCENE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746631#action_12746631 ] Mark Miller commented on LUCENE-1837: - And also ;) If a Sim didn't do those calculations (and it's an impl detail now), how could we ask for them back? If we tie them to the API, impls will be required to do those calcs for explain - when they didn't need to before. Prob not a huge deal, but ... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1837) Remove Searcher from explain and idf/maxDoc info from explain
[ https://issues.apache.org/jira/browse/LUCENE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746632#action_12746632 ] Michael McCandless commented on LUCENE-1837: bq. If a Sim didn't do those calculations (and its an impl detail now), how could we ask for them back? We could require only that the thing that's returned can explain itself? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
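That could look like the following sketch (IdfExplanation is a hypothetical name, and the explain string format is an assumption): the Similarity hands back an object that carries the value plus the ability to explain itself from whatever stats the implementation actually used, so nothing extra is forced on impls that compute idf differently.

{code}
public abstract static class IdfExplanation {
  public abstract float getIdf();
  // e.g. "idf(docFreq=42, maxDocs=100000)" - built only when an
  // explanation is actually requested, so the common path pays nothing
  public abstract String explain();
}
{code}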
[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"
[ https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746633#action_12746633 ] Michael McCandless commented on LUCENE-1821: bq. one used an int[] ord index (the underlaying cache cannot be made per segment) Could you compute the top-level ords, but then break it up per-segment? Ie, create your own map of IndexReader -> offset into that large ord array? This would make it "virtually" per-segment, but allow you to continue computing at the top level. BTW another option is to simply accumulate your own docBase, by adding up the maxDoc() every time an IndexReader is passed to your Weight.scorer(). EG this is what contrib/spatial is now doing. This isn't a long-term solution, since the order in which Lucene visits the readers isn't in general guaranteed, but it will work for 2.9 and buy time to figure out how to switch scoring to per-segment. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
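A sketch of that docBase-accumulation trick (patterned on what contrib/spatial does; MyScorer and the field name are illustrative, and the scorer() signature shown is the trunk one at the time and may shift):

{code}
private int docBase = 0; // accumulated across scorer() calls; not thread-safe

public Scorer scorer(IndexReader reader, boolean scoreDocsInOrder,
    boolean topScorer) throws IOException {
  int base = docBase;         // this segment's offset in the top-level index
  docBase += reader.maxDoc(); // relies on readers being visited in order!
  return new MyScorer(reader, base); // MyScorer rebases collected docids
}
{code}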
[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"
[ https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746634#action_12746634 ] Michael McCandless commented on LUCENE-1821: bq. Using a per-segment cache will cause some significant performance loss when performing faceting, as it requires creating the facets for each segment, and then merging them (this results in a good deal of extra object overhead/memory overhead/more work where faceting on the multi-reader does not see this) This is a good point... Yonik, how [in general!] is Solr handling the cutover to per-segment, for faceting? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"
[ https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746636#action_12746636 ] Michael McCandless commented on LUCENE-1821: Net/net, I'm still nervous about pushing down "full context" plus "context free" searcher/reader deep into Lucene's general searching (scorer/filter) APIs. I think these APIs should remain fully context-free (even IndexSearcher still makes me nervous). In some sense, Multi/RemoteSearcher keep us honest, in that they force us to clearly separate out "stuff that has the luxury of full context" (to be done on construction of Weight) from "the heavy lifting that must be context free since it may not have access to the top searcher" (scorer(), getDocIdSet()). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"
[ https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746635#action_12746635 ] Michael McCandless commented on LUCENE-1821: bq. one used a cached DocIdSet created over the top level MultiReader (should be able to have a DocIdSet per Segment reader here, but this will take some more thinking (source of the matching docids is from a separate index), will also need to know which sub docidset to use based on which IndexReader is passed to scorer() - shouldn't be any big deal) I think, similarly, you could continue to create the top-level DocIdSet, but then make a new DocIdSet that presents one segment's "slice" out of this top-level DocIdSet. Then, pre-build the mapping of IndexReader -> docBase like above, then when scorer() is called in your custom query, just return the "virtual" per-segment DocIdSet. Would this work? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
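Roughly, that "virtual" slice could be built like this (hypothetical wrapper, assuming the top-level DocIdSet is already cached and the new nextDoc()/advance() iterator API): it presents the window [docBase, docBase + maxDoc) of the big set, rebased so the Scorer sees segment-local doc ids.

{code}
public class SlicedDocIdSetIterator extends DocIdSetIterator {
  private final DocIdSetIterator top; // iterator over the cached top-level set
  private final int docBase;
  private final int endDoc;           // docBase + maxDoc of this segment
  private int doc = -1;

  public SlicedDocIdSetIterator(DocIdSetIterator top, int docBase, int maxDoc) {
    this.top = top;
    this.docBase = docBase;
    this.endDoc = docBase + maxDoc;
  }

  public int docID() { return doc; }

  public int nextDoc() throws IOException {
    // the first call positions at the start of this segment's window;
    // as Tim notes, advance(docBase) may be O(N) for some DocIdSet impls
    int d = (doc == -1) ? top.advance(docBase) : top.nextDoc();
    doc = (d < endDoc) ? d - docBase : NO_MORE_DOCS;
    return doc;
  }

  public int advance(int target) throws IOException {
    int d = top.advance(target + docBase);
    doc = (d < endDoc) ? d - docBase : NO_MORE_DOCS;
    return doc;
  }
}
{code}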
[jira] Reopened: (LUCENE-1843) Convert some tests to new TokenStream API, better support of cross-impl AttributeImpl.copyTo()
[ https://issues.apache.org/jira/browse/LUCENE-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reopened LUCENE-1843: --- There are some more tests that fail with onlyUseNewAPI in contrib/analyzers. > Convert some tests to new TokenStream API, better support of cross-impl > AttributeImpl.copyTo() > -- > > Key: LUCENE-1843 > URL: https://issues.apache.org/jira/browse/LUCENE-1843 > Project: Lucene - Java > Issue Type: Improvement > Components: Analysis >Affects Versions: 2.9 >Reporter: Uwe Schindler >Assignee: Uwe Schindler > Fix For: 2.9 > > Attachments: LUCENE-1843.patch > > > This patch converts some remaining tests to the new TokenStream API and > non-deprecated classes. > This patch also enhances AttributeImpl.copyTo() of Token and TokenWrapper to > also support copying e.g. TermAttributeImpl into Token. The target impl need > only support all the interfaces; it does not have to be of the same type. Token and > TokenWrapper use optimized copying without casting to 6 interfaces where > possible. > Maybe the special tokenizers in contrib (shingle matrix and so on, which use > tokens to cache) may be enhanced by that. Also Yonik's request for optimized > copying of states between incompatible AttributeSources may be enhanced by > that (possibly a new issue). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1843) Convert some tests to new TokenStream API, better support of cross-impl AttributeImpl.copyTo()
[ https://issues.apache.org/jira/browse/LUCENE-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746638#action_12746638 ] Uwe Schindler commented on LUCENE-1843: --- From a private mail with Robert Muir: yes, all of what you mentioned are problems, and testing for attributes that should be there is good in my opinion too. I noticed the shingle problem as well, it was strange to test termAtt.toString() and expect position increments or types to appear :/ one reason I asked about this is that at some point it would be nice to refactor the test cases in lucene contrib. currently, they all have the same helper methods such as assertAnalyzesTo and this is silly in my opinion. On Sun, Aug 23, 2009 at 12:57 PM, Uwe Schindler wrote: > There are more problems. The test with getAttribute is good if you are > really sure that the attribute is available and you want to assert this. In > all other cases addAttribute should be used to consume a TokenStream. The > changed tests were problematic because they consumed foreign TokenStreams, > which do not necessarily have all these attributes. > > I thought all tests in contrib used LuceneTestCase as their superclass, but they > use the standard junit class. Because of that I did not notice, when I put > setOnlyUseNewAPI(true) into LuceneTestCase.setUp(), that they are run with > the default false setting. > > Another problem in these tests is that some (the shingle tests) use > TermAttribute.toString(), which looks different if the attribute is > implemented by TermAttributeImpl (newAPI=true) or Token (newAPI=false). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
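The consumption pattern being described, sketched against the new API (analyzer and text are stand-ins): addAttribute() creates the attribute if the foreign stream lacks it, while getAttribute() is only safe when the test really means to assert the attribute is already present.

{code}
TokenStream stream = analyzer.tokenStream("field", new StringReader(text));
// addAttribute never throws - it registers the attribute if it is absent
TermAttribute termAtt =
    (TermAttribute) stream.addAttribute(TermAttribute.class);
PositionIncrementAttribute posIncrAtt = (PositionIncrementAttribute)
    stream.addAttribute(PositionIncrementAttribute.class);
while (stream.incrementToken()) {
  System.out.println(termAtt.term() + " +" + posIncrAtt.getPositionIncrement());
}
{code}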
[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"
[ https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746639#action_12746639 ] Mark Miller commented on LUCENE-1821: - Cool - I don't like it much either. I say we push this issue from 2.9 for now. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Assigned: (LUCENE-1845) if the build fails to download JARs for contrib/db, just skip its tests
[ https://issues.apache.org/jira/browse/LUCENE-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer reassigned LUCENE-1845: --- Assignee: Simon Willnauer > if the build fails to download JARs for contrib/db, just skip its tests > --- > > Key: LUCENE-1845 > URL: https://issues.apache.org/jira/browse/LUCENE-1845 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Simon Willnauer >Priority: Minor > Attachments: LUCENE-1845.txt > > > Every so often our nightly build fails because contrib/db is unable to > download the necessary BDB JARs from http://downloads.osafoundation.org. I > think in such cases we should simply skip contrib/db's tests, if it's the > nightly build that's running, since it's a false positive failure. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Finishing Lucene 2.9
Right, this (you can jump to 2.9, fix all deprecations, then easily move to 3.0 and see no deprecations) is my understanding too, but I don't see what's particularly useful about that. It does produce a Lucene release that has zero deprecated APIs (assuming we remove all of them), but I don't think that's very important. Also, it's extra work having to do a "no-op, except for deprecations removal and generics addition" release :) Vs say taking our time creating 3.0, letting it have real features, etc. Or, another option would be to simply release 3.0 next. After all, there are some seriously major changes in this release, compilation breakage, etc. ... things you'd more expect (of "traditional" software) in a .0 release. And, then state clearly that all deprecated APIs in 3.0 will be removed in 3.1. While this is technically a change to our back-compat policy, it's also just a number-shifting game since it would just be a rename (2.9 becomes 3.0; 3.0 becomes 3.1). Mike On Thu, Aug 20, 2009 at 8:58 AM, Mark Miller wrote: > Michael McCandless wrote: >> On Wed, Aug 19, 2009 at 6:21 PM, Mark Miller wrote: >> >> >>> I forgot about this oddity. Its so weird. Its like we are doing two >>> releases on top of each other - it just seems confusing. >>> >> >> I'm also not wed to the "fast turnaround" (remove deprecations, switch >> to generics) 3.0 release. >> >> We could, instead, take out time doing the 3.0 release, ie let it >> include new features too. >> >> I thought I had read a motivation for the 1.9 -> 2.0 fast turnaround, >> but I can't remember it nor find it now... >> >> Mike >> >> - >> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-dev-h...@lucene.apache.org >> >> > I thought the motivation was to provide a clean upgrade path with the > deprecations - you move to 2.9 and move from all the deprecated methods > - then you move to 3.0 and your good with no deprecations. I'd guess the > worry is that new features in 3.0 would add new deprecations and its not > quite so clean? > > Personally, I think thats fine though. New deprecations will come in 3.1 > anyway. You can still move everything in 2.9, and then move to 3.0 - so > what if something else is now deprecated? You can move again or wait for > 3.9 to move ... > > -- > - Mark > > http://www.lucidimagination.com > > > > > - > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-dev-h...@lucene.apache.org > > - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1845) if the build fails to download JARs for contrib/db, just skip its tests
[ https://issues.apache.org/jira/browse/LUCENE-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746640#action_12746640 ] Simon Willnauer commented on LUCENE-1845: - Weird! - I changed the URL to http://foo.bar and ant test succeeds with the expected message. I guess you changed the get url in bdb-je/build.xml, but that file (the je.jar) is not the cause of this issue unless I got something wrong. I thought this issue is caused by the fact that http://downloads.osafoundation.org/db/db-${db.version}.jar is not available on a regular basis. That's why I did not patch this sub-module. simon -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1845) if the build fails to download JARs for contrib/db, just skip its tests
[ https://issues.apache.org/jira/browse/LUCENE-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746642#action_12746642 ] Michael McCandless commented on LUCENE-1845: Aha, you're right! Sorry about the confusion. OK so this is good to go. Can you commit? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Finishing Lucene 2.9
I'm still +1 on calling this 3.0 as I was before when you mentioned it. Its a wakeup call that the upgrade is a bit major in certain areas. In either case - 3.0 is more representative of what this release is IMO. I also think we should allow new features in 3.0 if we release this as 2.9. - Mark Michael McCandless wrote: > Right, this (you can jump to 2.9, fix all deprecations, then easily > move to 3.0 and see no deprecations) is my understanding too, but I > don't see what's particularly useful about that. It does produce a > Lucene release that has zero deprecated APIs (assuming we remove all > of them), but I don't think that's very important. Also, it's extra work > having to do a "no-op, except for deprecations removal and generics > addition" release :) > > Vs say taking our time creating 3.0, letting it have real features, > etc. > > Or, another option would be to simply release 3.0 next. After all, > there are some seriously major changes in this release, compilation > breakage, etc. ... things you'd more expect (of "traditional" > software) in a .0 release. And, then state clearly that all > deprecated APIs in 3.0 will be removed in 3.1. While this is > technically a change to our back-compat policy, it's also just a > number-shifting game since it would just be a rename > (2.9 becomes 3.0; 3.0 becomes 3.1). > > Mike > > On Thu, Aug 20, 2009 at 8:58 AM, Mark Miller wrote: > >> Michael McCandless wrote: >> >>> On Wed, Aug 19, 2009 at 6:21 PM, Mark Miller wrote: >>> >>> >>> I forgot about this oddity. Its so weird. Its like we are doing two releases on top of each other - it just seems confusing. >>> I'm also not wed to the "fast turnaround" (remove deprecations, switch >>> to generics) 3.0 release. >>> >>> We could, instead, take out time doing the 3.0 release, ie let it >>> include new features too. >>> >>> I thought I had read a motivation for the 1.9 -> 2.0 fast turnaround, >>> but I can't remember it nor find it now... >>> >>> Mike >>> >>> - >>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: java-dev-h...@lucene.apache.org >>> >>> >>> >> I thought the motivation was to provide a clean upgrade path with the >> deprecations - you move to 2.9 and move from all the deprecated methods >> - then you move to 3.0 and your good with no deprecations. I'd guess the >> worry is that new features in 3.0 would add new deprecations and its not >> quite so clean? >> >> Personally, I think thats fine though. New deprecations will come in 3.1 >> anyway. You can still move everything in 2.9, and then move to 3.0 - so >> what if something else is now deprecated? You can move again or wait for >> 3.9 to move ... >> >> -- >> - Mark >> >> http://www.lucidimagination.com >> >> >> >> >> - >> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-dev-h...@lucene.apache.org >> >> >> > > - > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-dev-h...@lucene.apache.org > > -- - Mark http://www.lucidimagination.com - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"
[ https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746643#action_12746643 ] Tim Smith commented on LUCENE-1821: --- Lot of new comments to respond to :) will try to cover them all bq. decent comparator (StringOrdValComparator) that operates per segment. Still, the StringOrdValComparator will have to break down and call String.equals() whenever it compares docs in different IndexReaders. It also has to do more maintenance in general than would be needed for just a StringOrd comparator that would have a cache across all IndexReaders. While the StringOrdValComparator may be faster in 2.9 than string sorting in 2.4, it's not as fast as it could be if the cache was created on the IndexSearcher level. I looked at the new string sorting stuff last week, and it looks pretty smart about reducing the number of String.equals() calls needed, but this adds extra complexity and will still be reduced to String.equals() calls, which will translate to slower sorting than could be possible. bq. one option might be to subclass DirectoryReader The idea of this is to disable per segment searching? I don't actually want to do that. I want to use per segment searching functionality to take advantage of caches on a per segment basis where possible, and map docs to the IndexSearcher context when i can't do per segment caching. bq. Could you compute the top-level ords, but then break it up per-segment? I think i see what you're getting at here, and i've already thought of this as a potential solution. The cache will always need to be created at the top most level, but it will be pre-broken out into a per-segment cache whose context is the top level IndexSearcher/MultiReader. The biggest problem here is the complexity of actually creating such a cache, which i'm sure will translate to this cache loading slower (hard to say how much slower without implementing). I do plan to try this approach, but i expect this will be at least a week or two out from now. I've currently updated my code for this to work per-segment by adding the docBase when performing the lookup into this cache (which is per-IndexSearcher). I did this using the getIndexReaderBase() function i added to my subclass of IndexSearcher during Scorer construction time (I can live with this, however i would like to see getIndexReaderBase() added to IndexSearcher, and the IndexSearcher passed to Weight.scorer() so i don't need to hold onto my IndexSearcher subclass in my Weight implementation). bq. just return the "virtual" per-segment DocIdSet. That's what i'm doing now. I use the docid base for the IndexReader, along with its maxDoc, to have the Scorer represent a virtual slice for just the segment in question. The only real problem here is that during Scorer initialization for this i have to call fullDocIdSetIter.advance(docBase) in the Scorer constructor. If advance(int) for the DocIdSet in question is O(N), this adds an extra penalty per segment that did not exist before. bq. This isn't a long-term solution, since the order in which Lucene visits the readers isn't in general guaranteed, that's where IndexSearcher.getIndexReaderBase(IndexReader) comes into play.
If you call this in your scorer to get the docBase, it doesn't matter what order the segments are searched in (as it'll always return the proper base, in the context of the IndexSearcher that is). Here's another potential thought (very rough, haven't consulted code to see how feasible this is): what if Similarity had a method called getDocIdBase(IndexReader)? Then the searcher implementation could wrap the provided Similarity to provide the proper calculation. Similarity is already passed through this chain of Weight creation and is passed into the Scorer. Obviously, a Query implementation can completely drop the passing of the Searcher's similarity and drop in its own (but this would mean it doesn't care about getting these docid bases). I think this approach would potentially resolve all MultiSearcher difficulties.
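For illustration, that idea might look like this (getDocIdBase() is hypothetical - nothing like it exists in Lucene - and SimilarityDelegator is used here only to keep the delegation short; the searcher would be assumed to build the reader-to-base map up front):

{code}
public class BaseAwareSimilarity extends SimilarityDelegator {
  private final Map readerBase; // IndexReader -> Integer, built by the searcher

  public BaseAwareSimilarity(Similarity delegate, Map readerBase) {
    super(delegate); // all scoring methods forward to the wrapped Similarity
    this.readerBase = readerBase;
  }

  // hypothetical addition: segment's doc base in the top-level index
  public int getDocIdBase(IndexReader reader) {
    Integer base = (Integer) readerBase.get(reader);
    return base == null ? -1 : base.intValue();
  }
}
{code}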
Re: Finishing Lucene 2.9
just wanted to mention this (i honestly don't have any opinion either way): > Right, this (you can jump to 2.9, fix all deprecations, then easily > move to 3.0 and see no deprecations) is my understanding too, but I > don't see what's particularly useful about that. It does produce a > Lucene release that has zero deprecated APIs (assuming we remove all > of them), but I don't think that's very important. Also, it's extra work > having to do a "no-op, except for deprecations removal and generics > addition" release :) But isn't it also true it could be a bit more than no-op: 1) changing to "better" defaults in cases where back compat prevents this. I think I remember a few of these? 2) bugfixes found after release of 2.9 3) performance improvements, not just from #1 but also from removal of back-compat shims (i.e. tokenstream reflection) I am not saying this stuff is really important to users to merit a release, but I don't think it is a no-op either. -- Robert Muir rcm...@gmail.com - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"
[ https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746644#action_12746644 ] Yonik Seeley commented on LUCENE-1821: -- bq. This is a good point... Yonik, how [in general!] is Solr handling the cutover to per-segment, for faceting? It doesn't. Faceting is not connected to searching in Solr, and is only done at the top level IndexReader. We obviously want to enable per-segment faceting for more NRT in the future - with the expected disadvantage that it will be somewhat slower for some types of facets. I imagine we will keep the top-level faceting as an option because there will be tradeoffs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"
[ https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746645#action_12746645 ] Yonik Seeley commented on LUCENE-1821: -- bq. I say we push this issue from 2.9 for now. +1 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1845) if the build fails to download JARs for contrib/db, just skip its tests
[ https://issues.apache.org/jira/browse/LUCENE-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746646#action_12746646 ] Simon Willnauer commented on LUCENE-1845: - bq. OK so this is good to go. Can you commit? will do! -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1837) Remove Searcher from explain and idf/maxDoc info from explain
[ https://issues.apache.org/jira/browse/LUCENE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1837: Attachment: LUCENE-1837.patch Okay, very rough patch. No concern for back compat or anything. Added: placeholder class {code} public static abstract class SimExplain { abstract float getIdf(); abstract String explain(); } {code} {code} public SimExplain idfExplain(Term term, Searcher searcher) throws IOException {code} {code} public SimExplain idfExplain(Collection terms, Searcher searcher) throws IOException {code} Removed Searcher from the explain method. So I think this is the right path - still a few issues to work through though, and still some ugliness I've left in. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
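To see how that placeholder might be filled in, here is a hedged sketch of the single-term idfExplain wiring (a guess at the shape, not the contents of the attached patch): the stats are gathered once from the top-level Searcher, and the returned object can explain itself later without one.

{code}
public SimExplain idfExplain(Term term, Searcher searcher) throws IOException {
  final int docFreq = searcher.docFreq(term); // aggregated on a MultiSearcher
  final int maxDoc = searcher.maxDoc();
  final float idf = idf(docFreq, maxDoc);     // the existing idf(int, int)
  return new SimExplain() {
    float getIdf() { return idf; }
    String explain() {
      return "idf(docFreq=" + docFreq + ", maxDocs=" + maxDoc + ")=" + idf;
    }
  };
}
{code}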
Re: Finishing Lucene 2.9
On Sun, Aug 23, 2009 at 7:38 PM, Robert Muir wrote: > just wanted to mention this (I honestly don't have any opinion either way): > >> Right, this (you can jump to 2.9, fix all deprecations, then easily >> move to 3.0 and see no deprecations) is my understanding too, but I >> don't see what's particularly useful about that. It does produce a >> Lucene release that has zero deprecated APIs (assuming we remove all >> of them), but I don't think that's very important. Also, it's extra work >> having to do a "no-op, except for deprecations removal and generics >> addition" release :) > > But isn't it also true it could be a bit more than no-op: > 1) changing to "better" defaults in cases where back compat prevents > this. I think I remember a few of these? > 2) bugfixes found after release of 2.9 > 3) performance improvements, not just from #1 but also from removal of > back-compat shims (i.e. tokenstream reflection) > > I am not saying this stuff is really important to users to merit a > release, but I don't think it is a no-op either. I agree with Robert that this is very likely not to be a no-op release. Changing to 1.5 brings in generics and lots of other stuff which could bring improvements. All the concurrency improvements, varargs, and utility methods in classes like Integer (valueOf), etc. I believe that we would find many places in the code where existing stuff could be improved with the ability to commit 1.5 code. Moving to 1.5 with 3.0 would be a clean step in my eyes. Having 3.0 with 1.4 back-compat and then 3.1 which gets rid of this would confuse users. simon > > -- > Robert Muir > rcm...@gmail.com > > - > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-dev-h...@lucene.apache.org > > - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
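Editor's note: to make the kind of cleanup being discussed concrete, an illustrative before/after sketch (not code from the Lucene code base):
{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class Java5Cleanup {
  // Pre-1.5 style as currently required: raw types and explicit casts.
  static int sumRaw(List numbers) {
    int sum = 0;
    for (Iterator it = numbers.iterator(); it.hasNext();) {
      sum += ((Integer) it.next()).intValue(); // cast and unbox by hand
    }
    return sum;
  }

  // The same method once a 1.5 baseline is allowed: generics, for-each,
  // autoboxing, and Integer.valueOf's small-value cache.
  static int sumTyped(List<Integer> numbers) {
    int sum = 0;
    for (int n : numbers) { // no cast, no explicit Iterator
      sum += n;
    }
    return sum;
  }

  public static void main(String[] args) {
    List<Integer> numbers = new ArrayList<Integer>();
    numbers.add(Integer.valueOf(1)); // reuses cached instances, unlike new Integer(1)
    numbers.add(Integer.valueOf(2));
    System.out.println(sumRaw(numbers) + " == " + sumTyped(numbers));
  }
}
{code}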
[jira] Updated: (LUCENE-1837) Remove Searcher from explain
[ https://issues.apache.org/jira/browse/LUCENE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1837: Lucene Fields: [New, Patch Available] (was: [New]) Summary: Remove Searcher from explain (was: Remove Searcher from explain and idf/maxDoc info from explain) > Remove Searcher from explain > > > Key: LUCENE-1837 > URL: https://issues.apache.org/jira/browse/LUCENE-1837 > Project: Lucene - Java > Issue Type: Bug >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 2.9 > > Attachments: LUCENE-1837.patch > > > these changes (starting with the TermWeight idf/maxDoc info) were illegal IMO > - I think they need to be rolled back/out. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1837) Remove Searcher from Weight#explain
[ https://issues.apache.org/jira/browse/LUCENE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1837: Description: Explain needs to calculate corpus wide stats in a way that is consistent with MultiSearcher. (was: these changes (starting with the TermWeight idf/maxDoc info) were illegal IMO - I think they need to be rolled back/out.) Summary: Remove Searcher from Weight#explain (was: Remove Searcher from explain) > Remove Searcher from Weight#explain > --- > > Key: LUCENE-1837 > URL: https://issues.apache.org/jira/browse/LUCENE-1837 > Project: Lucene - Java > Issue Type: Bug >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 2.9 > > Attachments: LUCENE-1837.patch > > > Explain needs to calculate corpus wide stats in a way that is consistent with > MultiSearcher. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Finishing Lucene 2.9
Simon Willnauer wrote: > > Having 3.0 > with 1.4 back-compat and then 3.1 which gets rid of this would confuse > users. > > simon > > If that were really a concern (and we decided to jump to 3.0), we could just say this 3.0 release requires Java 1.5 - 3.0 and beyond can still be considered Java 1.5, even though 3.0 itself still happens to run on Java 1.4. We are not going to convert *everything* to Java 1.5 when we move to it on the first release. We also don't have to convert anything to say we now require it. Personally, I wouldn't be too worried about it either way. Following Changes correctly and with a solid understanding is 100x more difficult and confusing. -- - Mark http://www.lucidimagination.com - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"
[ https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1821: Fix Version/s: (was: 2.9) I'm going to push it out for now. Of course, feel free to argue for its re-inclusion. > Weight.scorer() not passed doc offset for "sub reader" > -- > > Key: LUCENE-1821 > URL: https://issues.apache.org/jira/browse/LUCENE-1821 > Project: Lucene - Java > Issue Type: Bug > Components: Search >Affects Versions: 2.9 >Reporter: Tim Smith > Attachments: LUCENE-1821.patch > > > Now that searching is done on a per segment basis, there is no way for a > Scorer to know the "actual" doc id for the documents it matches (only the > relative doc offset into the segment) > If using caches in your scorer that are based on the "entire" index (all > segments), there is now no way to index into them properly from inside a > Scorer because the scorer is not passed the needed offset to calculate the > "real" docid > suggest having Weight.scorer() method also take an integer for the doc offset > Abstract Weight class should have a constructor that takes this offset as > well as a method to get the offset > All Weights that have "sub" weights must pass this offset down to created > "sub" weights > Details on workaround: > In order to work around this, you must do the following: > * Subclass IndexSearcher > * Add "int getIndexReaderBase(IndexReader)" method to your subclass > * during Weight creation, the Weight must hold onto a reference to the passed > in Searcher (cast to your subclass) > * during Scorer creation, the Scorer must be passed the result of > YourSearcher.getIndexReaderBase(reader) > * Scorer can now rebase any collected docids using this offset > Example implementation of getIndexReaderBase(): > {code} > // NOTE: more efficient implementation can be done if you cache the result of > gatherSubReaders in your constructor > public int getIndexReaderBase(IndexReader reader) { > if (reader == getReader()) { > return 0; > } else { > List readers = new ArrayList(); > gatherSubReaders(readers); > Iterator iter = readers.iterator(); > int maxDoc = 0; > while (iter.hasNext()) { > IndexReader r = (IndexReader)iter.next(); > if (r == reader) { > return maxDoc; > } > maxDoc += r.maxDoc(); > } > } > return -1; // reader not in searcher > } > {code} > Notes: > * This workaround makes it so you cannot serialize your custom Weight > implementation -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"
[ https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746662#action_12746662 ] Tim Smith commented on LUCENE-1821: --- Can I at least argue for it being tagged for 3.0 or 3.1 (just so it gets looked at again prior to the next releases)? I have workarounds for 2.9, so I'm OK with it not getting in then (just want to make sure my use cases won't be made impossible in future releases) > Weight.scorer() not passed doc offset for "sub reader" > -- > > Key: LUCENE-1821 > URL: https://issues.apache.org/jira/browse/LUCENE-1821 > Project: Lucene - Java > Issue Type: Bug > Components: Search >Affects Versions: 2.9 >Reporter: Tim Smith > Attachments: LUCENE-1821.patch > > > Now that searching is done on a per segment basis, there is no way for a > Scorer to know the "actual" doc id for the documents it matches (only the > relative doc offset into the segment) > If using caches in your scorer that are based on the "entire" index (all > segments), there is now no way to index into them properly from inside a > Scorer because the scorer is not passed the needed offset to calculate the > "real" docid > suggest having Weight.scorer() method also take an integer for the doc offset > Abstract Weight class should have a constructor that takes this offset as > well as a method to get the offset > All Weights that have "sub" weights must pass this offset down to created > "sub" weights > Details on workaround: > In order to work around this, you must do the following: > * Subclass IndexSearcher > * Add "int getIndexReaderBase(IndexReader)" method to your subclass > * during Weight creation, the Weight must hold onto a reference to the passed > in Searcher (cast to your subclass) > * during Scorer creation, the Scorer must be passed the result of > YourSearcher.getIndexReaderBase(reader) > * Scorer can now rebase any collected docids using this offset > Example implementation of getIndexReaderBase(): > {code} > // NOTE: more efficient implementation can be done if you cache the result of > gatherSubReaders in your constructor > public int getIndexReaderBase(IndexReader reader) { > if (reader == getReader()) { > return 0; > } else { > List readers = new ArrayList(); > gatherSubReaders(readers); > Iterator iter = readers.iterator(); > int maxDoc = 0; > while (iter.hasNext()) { > IndexReader r = (IndexReader)iter.next(); > if (r == reader) { > return maxDoc; > } > maxDoc += r.maxDoc(); > } > } > return -1; // reader not in searcher > } > {code} > Notes: > * This workaround makes it so you cannot serialize your custom Weight > implementation -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1845) if the build fails to download JARs for contrib/db, just skip its tests
[ https://issues.apache.org/jira/browse/LUCENE-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-1845: Attachment: LUCENE-1845.txt Mike, I attached a new patch. The old one had some problems with the sanity check, as the check needs the jar. This one will work for unit tests, but it will fail if ant tries to run compile-core during a build, jar, release, etc. How should we handle it if the jar cannot be obtained? I would rather say the build must fail, as if we do a release build the jar will not be included. Would it be an option to put the jar in some other location, maybe on a committer page on people.apache.org? simon > if the build fails to download JARs for contrib/db, just skip its tests > --- > > Key: LUCENE-1845 > URL: https://issues.apache.org/jira/browse/LUCENE-1845 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Simon Willnauer >Priority: Minor > Attachments: LUCENE-1845.txt, LUCENE-1845.txt > > > Every so often our nightly build fails because contrib/db is unable to > download the necessary BDB JARs from http://downloads.osafoundation.org. I > think in such cases we should simply skip contrib/db's tests, if it's the > nightly build that's running, since it's a false positive failure. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Lucene 3.0 and Java 5 (was Re: Finishing Lucene 2.9)
On Aug 23, 2009, at 2:06 PM, Simon Willnauer wrote: On Sun, Aug 23, 2009 at 7:38 PM, Robert Muir wrote: just wanted to mention this (I honestly don't have any opinion either way): Right, this (you can jump to 2.9, fix all deprecations, then easily move to 3.0 and see no deprecations) is my understanding too, but I don't see what's particularly useful about that. It does produce a Lucene release that has zero deprecated APIs (assuming we remove all of them), but I don't think that's very important. Also, it's extra work having to do a "no-op, except for deprecations removal and generics addition" release :) But isn't it also true it could be a bit more than no-op: 1) changing to "better" defaults in cases where back compat prevents this. I think I remember a few of these? 2) bugfixes found after release of 2.9 3) performance improvements, not just from #1 but also from removal of back-compat shims (i.e. tokenstream reflection) I am not saying this stuff is really important to users to merit a release, but I don't think it is a no-op either. I agree with Robert that this is very likely not to be a no-op release. Changing to 1.5 brings in generics and lots of other stuff which could bring improvements. All the concurrency improvements, varargs, and utility methods in classes like Integer (valueOf), etc. I believe that we would find many places in the code where existing stuff could be improved with the ability to commit 1.5 code. Moving to 1.5 with 3.0 would be a clean step in my eyes. Having 3.0 with 1.4 back-compat and then 3.1 which gets rid of this would confuse users. My two cents. I think the contract of the 3.0 release is that it is a drop-in replacement for the 2.9 release but requires Java 1.5. I expect to compile against Lucene 2.9 using Java 1.4, removing deprecations. And then go to Lucene 3.0, changing the compiler to Java 1.5 but making no code changes. To that end, any introduction of Java 1.5 into the end-user/non-expert/non-experimental/non-contrib API needs to work with existing code as is. It may require the user to compile with lax permissions using Java 1.5 and run with Java 1.5. Requiring Java 1.5 can be as easy as using a Java 1.5 feature internally, in the expert or experimental APIs, and classes that are not part of the backward compatibility contract (e.g. utility classes). I don't think there should be any effort to maintain Java 1.4 compatibility, but I also think changes should be made only where it makes sense, giving a clear advantage (performance, maintainability, ...). If that results in 1.4 compatibility it is a temporary benefit not guaranteed during the 3.x series. I agree with previous threads that there is both a blessing and a curse with Lucene's backward compatibility release policy. My biggest gripe is the evolution toward bad class names. I would like to see a 4.0 release dedicated to fixing the name/api problems and making the API of Lucene be what it should have been for a 3.0 release. I'd also suggest that repackaging, suggested in a prior thread, be tackled as well. This could follow a 3.0 release quickly. -- DM Smith - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1845) if the build fails to download JARs for contrib/db, just skip its tests
[ https://issues.apache.org/jira/browse/LUCENE-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-1845: Attachment: LUCENE-1845.txt This time with the ASF license grant. > if the build fails to download JARs for contrib/db, just skip its tests > --- > > Key: LUCENE-1845 > URL: https://issues.apache.org/jira/browse/LUCENE-1845 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Simon Willnauer >Priority: Minor > Attachments: LUCENE-1845.txt, LUCENE-1845.txt, LUCENE-1845.txt > > > Every so often our nightly build fails because contrib/db is unable to > download the necessary BDB JARs from http://downloads.osafoundation.org. I > think in such cases we should simply skip contrib/db's tests, if it's the > nightly build that's running, since it's a false positive failure. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"
[ https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1821: Affects Version/s: (was: 2.9) 3.1 Yeah, no problem - tag whatever you'd like - I only went to nothing because it was the easiest default move. With the current plan (subject to change), the earliest it could be considered again is 3.1, so I'll move there. > Weight.scorer() not passed doc offset for "sub reader" > -- > > Key: LUCENE-1821 > URL: https://issues.apache.org/jira/browse/LUCENE-1821 > Project: Lucene - Java > Issue Type: Bug > Components: Search >Affects Versions: 3.1 >Reporter: Tim Smith > Attachments: LUCENE-1821.patch > > > Now that searching is done on a per segment basis, there is no way for a > Scorer to know the "actual" doc id for the documents it matches (only the > relative doc offset into the segment) > If using caches in your scorer that are based on the "entire" index (all > segments), there is now no way to index into them properly from inside a > Scorer because the scorer is not passed the needed offset to calculate the > "real" docid > suggest having Weight.scorer() method also take an integer for the doc offset > Abstract Weight class should have a constructor that takes this offset as > well as a method to get the offset > All Weights that have "sub" weights must pass this offset down to created > "sub" weights > Details on workaround: > In order to work around this, you must do the following: > * Subclass IndexSearcher > * Add "int getIndexReaderBase(IndexReader)" method to your subclass > * during Weight creation, the Weight must hold onto a reference to the passed > in Searcher (cast to your subclass) > * during Scorer creation, the Scorer must be passed the result of > YourSearcher.getIndexReaderBase(reader) > * Scorer can now rebase any collected docids using this offset > Example implementation of getIndexReaderBase(): > {code} > // NOTE: more efficient implementation can be done if you cache the result of > gatherSubReaders in your constructor > public int getIndexReaderBase(IndexReader reader) { > if (reader == getReader()) { > return 0; > } else { > List readers = new ArrayList(); > gatherSubReaders(readers); > Iterator iter = readers.iterator(); > int maxDoc = 0; > while (iter.hasNext()) { > IndexReader r = (IndexReader)iter.next(); > if (r == reader) { > return maxDoc; > } > maxDoc += r.maxDoc(); > } > } > return -1; // reader not in searcher > } > {code} > Notes: > * This workaround makes it so you cannot serialize your custom Weight > implementation -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"
[ https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1821: Affects Version/s: (was: 3.1) 2.9 Fix Version/s: 3.1 Whoops - try the right thing this time > Weight.scorer() not passed doc offset for "sub reader" > -- > > Key: LUCENE-1821 > URL: https://issues.apache.org/jira/browse/LUCENE-1821 > Project: Lucene - Java > Issue Type: Bug > Components: Search >Affects Versions: 2.9 >Reporter: Tim Smith > Fix For: 3.1 > > Attachments: LUCENE-1821.patch > > > Now that searching is done on a per segment basis, there is no way for a > Scorer to know the "actual" doc id for the documents it matches (only the > relative doc offset into the segment) > If using caches in your scorer that are based on the "entire" index (all > segments), there is now no way to index into them properly from inside a > Scorer because the scorer is not passed the needed offset to calculate the > "real" docid > suggest having Weight.scorer() method also take an integer for the doc offset > Abstract Weight class should have a constructor that takes this offset as > well as a method to get the offset > All Weights that have "sub" weights must pass this offset down to created > "sub" weights > Details on workaround: > In order to work around this, you must do the following: > * Subclass IndexSearcher > * Add "int getIndexReaderBase(IndexReader)" method to your subclass > * during Weight creation, the Weight must hold onto a reference to the passed > in Searcher (cast to your subclass) > * during Scorer creation, the Scorer must be passed the result of > YourSearcher.getIndexReaderBase(reader) > * Scorer can now rebase any collected docids using this offset > Example implementation of getIndexReaderBase(): > {code} > // NOTE: more efficient implementation can be done if you cache the result of > gatherSubReaders in your constructor > public int getIndexReaderBase(IndexReader reader) { > if (reader == getReader()) { > return 0; > } else { > List readers = new ArrayList(); > gatherSubReaders(readers); > Iterator iter = readers.iterator(); > int maxDoc = 0; > while (iter.hasNext()) { > IndexReader r = (IndexReader)iter.next(); > if (r == reader) { > return maxDoc; > } > maxDoc += r.maxDoc(); > } > } > return -1; // reader not in searcher > } > {code} > Notes: > * This workaround makes it so you cannot serialize your custom Weight > implementation -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Created: (LUCENE-1847) PhraseQuery uses IndexReader specific docFreqs in its explain
PhraseQuery uses IndexReader specific docFreqs in its explain - Key: LUCENE-1847 URL: https://issues.apache.org/jira/browse/LUCENE-1847 Project: Lucene - Java Issue Type: Bug Reporter: Mark Miller Assignee: Mark Miller Priority: Minor Fix For: 2.9 As mentioned by Mike McCandless in LUCENE-1837. Always been a bug with MultiSearcher, but per segment search makes it worse. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1847) PhraseQuery/TermQuery use IndexReader specific stats in their explains
[ https://issues.apache.org/jira/browse/LUCENE-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1847: Description: PhraseQuery uses IndexReader in explain for top level stats - as mentioned by Mike McCandless in LUCENE-1837. TermQuery uses IndexReader in explain for top level stats. Always been a bug with MultiSearcher, but per segment search makes it worse. was: As mentioned by Mike McCandless in LUCENE-1837. Always been a bug with MultiSearcher, but per segment search makes it worse. Summary: PhraseQuery/TermQuery use IndexReader specific stats in their explains (was: PhraseQuery uses IndexReader specific docFreqs in its explain) Okay - I'm going to use the other issue just to revert the Searcher - more of a task. This issue can then be used to track the new work for this bug here. > PhraseQuery/TermQuery use IndexReader specific stats in their explains > -- > > Key: LUCENE-1847 > URL: https://issues.apache.org/jira/browse/LUCENE-1847 > Project: Lucene - Java > Issue Type: Bug >Reporter: Mark Miller >Assignee: Mark Miller >Priority: Minor > Fix For: 2.9 > > > PhraseQuery uses IndexReader in explain for top level stats - as mentioned by > Mike McCandless in LUCENE-1837. > TermQuery uses IndexReader in explain for top level stats. > Always been a bug with MultiSearcher, but per segment search makes it worse. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
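Editor's note: a hedged illustration of the inconsistency (not a test from the issue; all variable names are assumed): the score is computed from corpus-wide stats while the per-segment explain used the sub-reader's, so the two disagree whenever a term spans segments.
{code}
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Similarity;

// Sketch of the bug, not code from the issue: "topReader" is the top-level
// (multi-segment) reader, "subReader" one of its segments.
class ExplainMismatchSketch {
  static void show(IndexReader topReader, IndexReader subReader,
                   Term term, Similarity sim) throws IOException {
    int topDf = topReader.docFreq(term); // corpus-wide, used for scoring
    int segDf = subReader.docFreq(term); // per-segment, what explain() used
    float scoreIdf = sim.idf(topDf, topReader.maxDoc());
    float explainIdf = sim.idf(segDf, subReader.maxDoc());
    // The two values disagree as soon as the term occurs in more than one
    // segment, so the explanation no longer matches the real score.
    System.out.println(scoreIdf + " vs " + explainIdf);
  }
}
{code}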
[jira] Commented: (LUCENE-1837) Remove Searcher from Weight#explain
[ https://issues.apache.org/jira/browse/LUCENE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746671#action_12746671 ] Mark Miller commented on LUCENE-1837: - I'm just going to revert the Searcher here - a fix for the bugs can be tracked in LUCENE-1847 > Remove Searcher from Weight#explain > --- > > Key: LUCENE-1837 > URL: https://issues.apache.org/jira/browse/LUCENE-1837 > Project: Lucene - Java > Issue Type: Bug >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 2.9 > > Attachments: LUCENE-1837.patch > > > Explain needs to calculate corpus wide stats in a way that is consistent with > MultiSearcher. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1837) Remove Searcher from Weight#explain
[ https://issues.apache.org/jira/browse/LUCENE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-1837: Attachment: LUCENE-1837.patch > Remove Searcher from Weight#explain > --- > > Key: LUCENE-1837 > URL: https://issues.apache.org/jira/browse/LUCENE-1837 > Project: Lucene - Java > Issue Type: Bug >Reporter: Mark Miller >Assignee: Mark Miller > Fix For: 2.9 > > Attachments: LUCENE-1837.patch, LUCENE-1837.patch > > > Explain needs to calculate corpus wide stats in a way that is consistent with > MultiSearcher. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1843) Convert some tests to new TokenStream API, better support of cross-impl AttributeImpl.copyTo()
[ https://issues.apache.org/jira/browse/LUCENE-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1843: -- Attachment: LUCENE-1846.patch Patch that makes all contrib/analyzer tests that work with TokenStreams into subclasses of BaseTokenStreamTestCase. This superclass now has a lot of utility methods to check TokenStreams using arrays of strings/ints. This patch may still include some unused imports; I had no time to check this manually (I am the person that codes with Notepad...) > Convert some tests to new TokenStream API, better support of cross-impl > AttributeImpl.copyTo() > -- > > Key: LUCENE-1843 > URL: https://issues.apache.org/jira/browse/LUCENE-1843 > Project: Lucene - Java > Issue Type: Improvement > Components: Analysis >Affects Versions: 2.9 >Reporter: Uwe Schindler >Assignee: Uwe Schindler > Fix For: 2.9 > > Attachments: LUCENE-1843.patch, LUCENE-1843.patch > > > This patch converts some remaining tests to the new TokenStream API and > non-deprecated classes. > This patch also enhances AttributeImpl.copyTo() of Token and TokenWrapper to > also support copying e.g. TermAttributeImpl into Token. The target impl only > needs to support all the interfaces; it need not be of the same type. Token and > TokenWrapper use optimized copying without casting to 6 interfaces where > possible. > Maybe the special tokenizers in contrib (shingle matrix and so on, using > tokens to cache) may be enhanced by that. Also Yonik's request for optimized > copying of states between incompatible AttributeSources may be enhanced by > that (possibly a new issue). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
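Editor's note: a minimal sketch of the cross-impl copy this issue enables, assuming the patched behavior described above (before the patch, copyTo() effectively required source and target to be the same implementation class):
{code}
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.tokenattributes.TermAttributeImpl;

// Sketch of the cross-impl copyTo() described above, assuming the patched
// behavior: source and target are different AttributeImpl classes, but the
// target implements all the interfaces of the source.
class CopyToSketch {
  static Token copyTermIntoToken() {
    TermAttributeImpl source = new TermAttributeImpl();
    source.setTermBuffer("example");
    Token target = new Token(); // Token implements TermAttribute and friends
    source.copyTo(target);      // copies the term text across impl types
    return target;
  }
}
{code}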
[jira] Updated: (LUCENE-1843) Convert some tests to new TokenStream API, better support of cross-impl AttributeImpl.copyTo()
[ https://issues.apache.org/jira/browse/LUCENE-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1843: -- Attachment: (was: LUCENE-1846.patch) > Convert some tests to new TokenStream API, better support of cross-impl > AttributeImpl.copyTo() > -- > > Key: LUCENE-1843 > URL: https://issues.apache.org/jira/browse/LUCENE-1843 > Project: Lucene - Java > Issue Type: Improvement > Components: Analysis >Affects Versions: 2.9 >Reporter: Uwe Schindler >Assignee: Uwe Schindler > Fix For: 2.9 > > Attachments: LUCENE-1843.patch, LUCENE-1843.patch > > > This patch converts some remaining tests to the new TokenStream API and > non-deprecated classes. > This patch also enhances AttributeImpl.copyTo() of Token and TokenWrapper to > also support copying e.g. TermAttributeImpl into Token. The target impl only > needs to support all the interfaces; it need not be of the same type. Token and > TokenWrapper use optimized copying without casting to 6 interfaces where > possible. > Maybe the special tokenizers in contrib (shingle matrix and so on, using > tokens to cache) may be enhanced by that. Also Yonik's request for optimized > copying of states between incompatible AttributeSources may be enhanced by > that (possibly a new issue). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1843) Convert some tests to new TokenStream API, better support of cross-impl AttributeImpl.copyTo()
[ https://issues.apache.org/jira/browse/LUCENE-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1843: -- Attachment: LUCENE-1843.patch Now the right file. Will commit tomorrow. > Convert some tests to new TokenStream API, better support of cross-impl > AttributeImpl.copyTo() > -- > > Key: LUCENE-1843 > URL: https://issues.apache.org/jira/browse/LUCENE-1843 > Project: Lucene - Java > Issue Type: Improvement > Components: Analysis >Affects Versions: 2.9 >Reporter: Uwe Schindler >Assignee: Uwe Schindler > Fix For: 2.9 > > Attachments: LUCENE-1843.patch, LUCENE-1843.patch > > > This patch converts some remaining tests to the new TokenStream API and > non-deprecated classes. > This patch also enhances AttributeImpl.copyTo() of Token and TokenWrapper to > also support copying e.g. TermAttributeImpl into Token. The target impl only > needs to support all the interfaces; it need not be of the same type. Token and > TokenWrapper use optimized copying without casting to 6 interfaces where > possible. > Maybe the special tokenizers in contrib (shingle matrix and so on, using > tokens to cache) may be enhanced by that. Also Yonik's request for optimized > copying of states between incompatible AttributeSources may be enhanced by > that (possibly a new issue). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Issue Comment Edited: (LUCENE-1843) Convert some tests to new TokenStream API, better support of cross-impl AttributeImpl.copyTo()
[ https://issues.apache.org/jira/browse/LUCENE-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746694#action_12746694 ] Uwe Schindler edited comment on LUCENE-1843 at 8/23/09 4:35 PM: Patch that makes all contrib/analyzer tests that work with TokenStreams into subclasses of BaseTokenStreamTestCase. This superclass now has a lot of utility methods to check TokenStreams using arrays of strings/ints. The patch also contains a better version of SingleTokenTokenStream, using the Token.copyTo() function and a Token/TokenWrapper instance as attribute implementation. This patch may still include some unused imports; I had no time to check this manually (I am the person that codes with Notepad...) was (Author: thetaphi): Patch that makes all contrib/analyzer tests that work with TokenStreams into subclasses of BaseTokenStreamTestCase. This superclass now has a lot of utility methods to check TokenStreams using arrays of strings/ints. This patch may still include some unused imports; I had no time to check this manually (I am the person that codes with Notepad...) > Convert some tests to new TokenStream API, better support of cross-impl > AttributeImpl.copyTo() > -- > > Key: LUCENE-1843 > URL: https://issues.apache.org/jira/browse/LUCENE-1843 > Project: Lucene - Java > Issue Type: Improvement > Components: Analysis >Affects Versions: 2.9 >Reporter: Uwe Schindler >Assignee: Uwe Schindler > Fix For: 2.9 > > Attachments: LUCENE-1843.patch, LUCENE-1843.patch > > > This patch converts some remaining tests to the new TokenStream API and > non-deprecated classes. > This patch also enhances AttributeImpl.copyTo() of Token and TokenWrapper to > also support copying e.g. TermAttributeImpl into Token. The target impl only > needs to support all the interfaces; it need not be of the same type. Token and > TokenWrapper use optimized copying without casting to 6 interfaces where > possible. > Maybe the special tokenizers in contrib (shingle matrix and so on, using > tokens to cache) may be enhanced by that. Also Yonik's request for optimized > copying of states between incompatible AttributeSources may be enhanced by > that (possibly a new issue). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1843) Convert some tests to new TokenStream API, better support of cross-impl AttributeImpl.copyTo()
[ https://issues.apache.org/jira/browse/LUCENE-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1843: -- Attachment: LUCENE-1843.patch - Small updates - forgot conversion of two filters in contrib/memory. Hope this is the last patch. > Convert some tests to new TokenStream API, better support of cross-impl > AttributeImpl.copyTo() > -- > > Key: LUCENE-1843 > URL: https://issues.apache.org/jira/browse/LUCENE-1843 > Project: Lucene - Java > Issue Type: Improvement > Components: Analysis >Affects Versions: 2.9 >Reporter: Uwe Schindler >Assignee: Uwe Schindler > Fix For: 2.9 > > Attachments: LUCENE-1843.patch, LUCENE-1843.patch, LUCENE-1843.patch > > > This patch converts some remaining tests to the new TokenStream API and > non-deprecated classes. > This patch also enhances AttributeImpl.copyTo() of Token and TokenWrapper to > also support copying e.g. TermAttributeImpl into Token. The target impl only > needs to support all the interfaces; it need not be of the same type. Token and > TokenWrapper use optimized copying without casting to 6 interfaces where > possible. > Maybe the special tokenizers in contrib (shingle matrix and so on, using > tokens to cache) may be enhanced by that. Also Yonik's request for optimized > copying of states between incompatible AttributeSources may be enhanced by > that (possibly a new issue). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Build failed in Hudson: Lucene-trunk #927
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/927/changes Changes: [uschindler] Fix small initialization bug in TermAttributeImpl.copyTo() [uschindler] Fix small initialization bug in Token.copyTo() [uschindler] LUCENE-1825: Another one :( [uschindler] LUCENE-1825: Additional incorrect getAttribute usage [rmuir] LUCENE-1826: the new tokenizer constructors should not allow deprecated charsets [uschindler] Cleanup on tearDown to really reset the TokenStream API to the default [uschindler] Change also the default LuceneTestCase to override runBare() instead of runTest(). This enables tests, to also monitor failures in random during setUp() and tearDown(). [buschmi] LUCENE-1826: Add constructors that take AttributeSource and AttributeFactory to all Tokenizer implementations. [markrmiller] using entry set is faster than looping on key set when you use map.get(key) in loop -- [...truncated 3983 lines...] clover: compile-core: jar-core: [jar] Building jar: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/analyzers/common/lucene-analyzers-2.9-SNAPSHOT.jar default: smartcn: [echo] Building smartcn... javacc-uptodate-check: javacc-notice: jflex-uptodate-check: jflex-notice: common.init: build-lucene: build-lucene-tests: init: clover.setup: clover.info: clover: compile-core: jar-core: [jar] Building jar: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/analyzers/smartcn/lucene-smartcn-2.9-SNAPSHOT.jar default: default: javacc-uptodate-check: javacc-notice: jflex-uptodate-check: jflex-notice: common.init: build-lucene: build-lucene-tests: init: clover.setup: clover.info: clover: common.compile-core: compile-core: compile: check-files: init: clover.setup: clover.info: clover: compile-core: common.compile-test: [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/benchmark/classes/test [javac] Compiling 12 source files to http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/benchmark/classes/test [javac] Note: Some input files use or override a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. [copy] Copying 2 files to http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/benchmark/classes/test build-artifacts-and-tests: [echo] Building collation... javacc-uptodate-check: javacc-notice: jflex-uptodate-check: jflex-notice: common.init: compile-misc: [echo] Building misc... javacc-uptodate-check: javacc-notice: jflex-uptodate-check: jflex-notice: common.init: build-lucene: build-lucene-tests: init: clover.setup: clover.info: clover: compile-core: [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/misc/classes/java [javac] Compiling 17 source files to http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/misc/classes/java [javac] Note: Some input files use or override a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. 
compile: init: clover.setup: clover.info: clover: compile-core: [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/collation/classes/java [javac] Compiling 4 source files to http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/collation/classes/java jar-core: [jar] Building jar: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/collation/lucene-collation-2.9-SNAPSHOT.jar jar: compile-test: [echo] Building collation... javacc-uptodate-check: javacc-notice: jflex-uptodate-check: jflex-notice: common.init: compile-misc: [echo] Building misc... javacc-uptodate-check: javacc-notice: jflex-uptodate-check: jflex-notice: common.init: build-lucene: build-lucene-tests: init: clover.setup: clover.info: clover: compile-core: compile: init: clover.setup: clover.info: clover: compile-core: common.compile-test: [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/collation/classes/test [javac] Compiling 5 source files to http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/collation/classes/test [javac] Note: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/contrib/collation/src/test/org/apache/lucene/collation/CollationTestBase.java uses or overrides a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. build-artifacts-and-tests: bdb: [echo] Building bdb... javacc-uptodate-check: javacc-notice: jflex-uptodate-check: jflex-notice: common.init: build-lucene: build-lucene-tests: contrib-build.init: get-db-jar: [mkdir] Crea
[jira] Commented: (LUCENE-1798) FieldCacheSanityChecker called directly by FieldCache.get*
[ https://issues.apache.org/jira/browse/LUCENE-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746719#action_12746719 ] Hoss Man commented on LUCENE-1798: -- I haven't looked at the patch, but I don't think you need two calls to the sanity checker. Why not just a single call after the val has been created, and log if any of the Insanity objects contain the new val? > FieldCacheSanityChecker called directly by FieldCache.get* > -- > > Key: LUCENE-1798 > URL: https://issues.apache.org/jira/browse/LUCENE-1798 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Reporter: Hoss Man >Assignee: Michael McCandless > Fix For: 2.9 > > Attachments: LUCENE-1798.patch > > > As suggested by McCandless in LUCENE-1749, we can make FieldCacheImpl a > client of the FieldCacheSanityChecker and have it sanity check itself each > time it creates a new cache entry, and log a warning if it thinks there is a > problem. (although we'd probably only want to do this if the caller has set > some sort of infoStream/warningStream type property on the FieldCache object.) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
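Editor's note: a minimal sketch of the single-call variant suggested above, assuming it lives in FieldCacheImpl and runs right after a new value has been created; the infoStream field is the hypothetical logging property from the description, while FieldCacheSanityChecker.checkSanity(), Insanity.getCacheEntries() and CacheEntry.getValue() are the existing utility APIs:
{code}
// Sketch of the suggestion, not the committed patch. Assumes this method
// is added to FieldCacheImpl and "infoStream" is the proposed logging
// property; it warns only when the freshly created value shows up in an
// insane entry.
private void warnIfInsane(Object newValue) {
  if (infoStream == null) return; // checking is opt-in
  FieldCacheSanityChecker.Insanity[] insanity =
      FieldCacheSanityChecker.checkSanity(this); // single call, after creation
  for (int i = 0; i < insanity.length; i++) {
    CacheEntry[] entries = insanity[i].getCacheEntries();
    for (int j = 0; j < entries.length; j++) {
      if (entries[j].getValue() == newValue) {
        infoStream.println("WARNING: new FieldCache entry looks insane: "
            + insanity[i]);
        return;
      }
    }
  }
}
{code}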