[jira] Resolved: (LUCENE-1826) All Tokenizer implementations should have constructors that take AttributeSource and AttributeFactory

2009-08-23 Thread Michael Busch (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Busch resolved LUCENE-1826.
---

Resolution: Fixed

Committed revision 806942.

> All Tokenizer implementations should have constructors that take 
> AttributeSource and AttributeFactory
> -
>
> Key: LUCENE-1826
> URL: https://issues.apache.org/jira/browse/LUCENE-1826
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Analysis
>Affects Versions: 2.9
>Reporter: Tim Smith
>Assignee: Michael Busch
> Fix For: 2.9
>
> Attachments: lucene-1826.patch
>
>
> I have a TokenStream implementation that joins together multiple sub
> TokenStreams (I then do additional filtering on top of this, so I can't just
> have the indexer do the merging).
> In 2.4 this worked fine: once one sub stream was exhausted, I just started
> using the next stream.
> In 2.9, however, this is very difficult, and it requires copying Term buffers
> for every token being aggregated.
> But if all the sub TokenStreams share the same AttributeSource, and my
> "concat" TokenStream shares that same AttributeSource, this goes back to being
> very simple (and very efficient).
> So, for example, I would like to see the following constructor added to
> StandardTokenizer:
> {code}
>   public StandardTokenizer(AttributeSource source, Reader input,
>                            boolean replaceInvalidAcronym) {
>     super(source);
>     ...
>   }
> {code}
> I would likewise want similar constructors added to all Tokenizer subclasses
> provided by Lucene.
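
A minimal sketch of the pattern the issue asks for, assuming the requested
constructors exist; the ConcatTokenStream name and shape below are
illustrative, not part of the issue:

{code}
import java.io.IOException;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.util.AttributeSource;

// Hypothetical "concat" stream: every sub stream must have been built
// over the same AttributeSource passed here (e.g. via the constructors
// this issue requests), so advancing a sub stream populates this
// stream's attributes directly and no Term buffer copying is needed.
final class ConcatTokenStream extends TokenStream {
  private final TokenStream[] subs;
  private int current = 0;

  ConcatTokenStream(AttributeSource shared, TokenStream[] subs) {
    super(shared);
    this.subs = subs;
  }

  public boolean incrementToken() throws IOException {
    while (current < subs.length) {
      if (subs[current].incrementToken()) {
        return true; // the shared attributes already hold the new token
      }
      current++; // this sub stream is exhausted, move to the next one
    }
    return false;
  }
}
{code}

Usage would be: create one AttributeSource, hand it to each sub Tokenizer via
the proposed constructors, and hand the same instance to ConcatTokenStream, so
all of them read and write the very same attribute instances.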

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: Build failed in Hudson: Lucene-trunk #926

2009-08-23 Thread Michael McCandless
Looks like this build failed because downloads.osafoundation.org is
down (we download BDB JARs from there, for contrib/db).

This has happened a good number of times now... it'd be great to fix
the contrib/db/build.xml to just skip the tests when this download
fails.  I'll open an issue but I'm not sure how to do this w/ ant.

Mike

On Sat, Aug 22, 2009 at 10:16 PM, Apache Hudson
Server wrote:
> See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/926/changes
>
> Changes:
>
> [gsingers] LUCENE-1841: file format summary info
>
> [markrmiller] regex has been moved from core - package should have been 
> removed from test src
>
> [markrmiller] LUCENE-1827: Make the payload span queries consistent
>
> [markrmiller] more work on Scorer javadoc in package.html
>
> [markrmiller] LUCENE-1839: change explain from abstract to throw 
> UnsupportedOperationException
>
> [rmuir] LUCENE-1834: Remove unused code in SmartChineseAnalyzer hmm pkg
>
> [rmuir] LUCENE-1793: Deprecate custom encoding support in Greek and Russian 
> analyzers
>
> [markrmiller] LUCENE-1838: BoostingNearQuery must implement clone/toString
>
> [uschindler] LUCENE-1843: Convert some tests to new TokenStream API, better 
> support of cross-impl AttributeImpl.copyTo()
>
> [uschindler] LUCENE-1825: Incorrect usage of 
> AttributeSource.addAttribute/getAttribute leads to failures when 
> onlyUseNewAPI=true
>
> --
> [...truncated 3983 lines...]
>
> [remaining 200+ lines: routine compile/jar output for the analyzers,
> smartcn, benchmark, misc and collation contribs; the quoted log ends
> mid-line]

[jira] Created: (LUCENE-1845) if the build fails to download JARs for contrib/db, just skip its tests

2009-08-23 Thread Michael McCandless (JIRA)
if the build fails to download JARs for contrib/db, just skip its tests
---

 Key: LUCENE-1845
 URL: https://issues.apache.org/jira/browse/LUCENE-1845
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
Priority: Minor


Every so often our nightly build fails because contrib/db is unable to download 
the necessary BDB JARs from http://downloads.osafoundation.org.  I think in 
such cases we should simply skip contrib/db's tests, if it's the nightly build 
that's running, since it's a false positive failure.




[jira] Issue Comment Edited: (LUCENE-1846) More Locale problems in Lucene

2009-08-23 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746579#action_12746579
 ] 

Uwe Schindler edited comment on LUCENE-1846 at 8/23/09 2:51 AM:


Patch.

The changes in DateTools may affect users with very unusual default locales
who indexed with prior Lucene versions, but this is unlikely to be a problem,
as the whole sort order may already be broken for them.

Should I add a note to CHANGES.txt?

  was (Author: thetaphi):
Patch.

The changes in DateField may affect users with very unusual default locales
who indexed with prior Lucene versions, but this is unlikely to be a problem,
as the whole sort order may already be broken for them.

Should I add a note to CHANGES.txt?
  
> More Locale problems in Lucene
> --
>
> Key: LUCENE-1846
> URL: https://issues.apache.org/jira/browse/LUCENE-1846
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Other
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Trivial
> Fix For: 2.9
>
> Attachments: LUCENE-1846.patch




[jira] Created: (LUCENE-1846) More Locale problems in Lucene

2009-08-23 Thread Uwe Schindler (JIRA)
More Locale problems in Lucene
--

 Key: LUCENE-1846
 URL: https://issues.apache.org/jira/browse/LUCENE-1846
 Project: Lucene - Java
  Issue Type: Bug
  Components: Other
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Trivial
 Fix For: 2.9


This is a follow-up to LUCENE-1836: I found some more Locale problems in Lucene
with date formats. Even for simple date formats consisting only of numbers
(like ISO dates), you should always pass the US locale. Because the dates in
DateTools should sort according to String.compareTo(), it is important that the
decimal digits are the Western ones; in some locales they are not. Whenever you
format dates for internal formats that you expect to behave in a certain way,
you should at least set the locale to US, which uses ASCII digits. Dates
entered by users and displayed to users should be formatted according to the
default or a custom locale.
I also looked at DecimalFormat (especially as used for padding numbers), but
found no problems.
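
A small illustration of the point; the class and pattern string here are just
an example, but the key detail is pinning Locale.US for index-internal
formatting:

{code}
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class LocaleSafeDates {
  public static void main(String[] args) {
    Date now = new Date();
    // Index-internal format: pin the locale so the output always uses
    // ASCII digits and therefore sorts correctly via String.compareTo().
    SimpleDateFormat internal =
        new SimpleDateFormat("yyyyMMddHHmmss", Locale.US);
    // User-facing format: the default (or a user-chosen) locale is fine.
    DateFormat display = DateFormat.getDateTimeInstance();
    System.out.println(internal.format(now)); // always ASCII digits
    System.out.println(display.format(now));  // locale-dependent
  }
}
{code}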




[jira] Updated: (LUCENE-1846) More Locale problems in Lucene

2009-08-23 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1846:
--

Attachment: LUCENE-1846.patch

Patch.

The changes in DateField may affect users with very unusual default locales
who indexed with prior Lucene versions, but this is unlikely to be a problem,
as the whole sort order may already be broken for them.

Should I add a note to CHANGES.txt?

> More Locale problems in Lucene
> --
>
> Key: LUCENE-1846
> URL: https://issues.apache.org/jira/browse/LUCENE-1846
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Other
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Trivial
> Fix For: 2.9
>
> Attachments: LUCENE-1846.patch




[jira] Updated: (LUCENE-1845) if the build fails to download JARs for contrib/db, just skip its tests

2009-08-23 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-1845:


Attachment: LUCENE-1845.txt

I set the property "ignoreerrors" to true on the get task. This should print a
message if there is a problem with the download and then continue. The sanity
check will fail if the jar is not present, and the unit tests will be skipped.
I guess that should do the job, though.

> if the build fails to download JARs for contrib/db, just skip its tests
> ---
>
> Key: LUCENE-1845
> URL: https://issues.apache.org/jira/browse/LUCENE-1845
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Priority: Minor
> Attachments: LUCENE-1845.txt




[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"

2009-08-23 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746600#action_12746600
 ] 

Tim Smith commented on LUCENE-1821:
---

Well, you could go a route similar to the 2.4 TokenStream API (next() vs.
next(Token)):

have Filter.getDocIdSet(IndexSearcher, IndexReader) call
Filter.getDocIdSet(IndexReader), and vice versa, by default.
One method or the other would be required to be overridden.

getDocIdSet(IndexReader) would be deprecated (and removed in 3.0).

Since the deprecated method would be removed in 3.0, and since no one would
likely be depending on these new semantics right away, this should work.

Also, in general, QueryWrapperFilter performs a bit worse in 2.9: it creates an
IndexSearcher for every query it wraps, which results in calling
gatherSubReaders and creating the offsets anew each time
getDocIdSet(IndexReader) is called. So the new method, with the IndexSearcher
also passed in, is much better for evaluating these Filters.
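
A sketch of the delegation pattern described here, under the assumption that
such an overload were added; the class name is hypothetical and this is not
the actual 2.9 API:

{code}
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.IndexSearcher;

// Hypothetical shape of the proposal: each overload delegates to the
// other by default, so a subclass overrides exactly one of them, just
// like the 2.4 next()/next(Token) pattern. Overriding neither would
// recurse forever, which is why one override is required.
public abstract class SearcherAwareFilter extends Filter {

  // New entry point; the default falls back to the reader-only method.
  public DocIdSet getDocIdSet(IndexSearcher searcher, IndexReader reader)
      throws IOException {
    return getDocIdSet(reader);
  }

  // Old entry point (would be deprecated and removed in 3.0); the
  // default forwards to the new method with no searcher available.
  public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
    return getDocIdSet(null, reader);
  }
}
{code}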


> Weight.scorer() not passed doc offset for "sub reader"
> --
>
> Key: LUCENE-1821
> URL: https://issues.apache.org/jira/browse/LUCENE-1821
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.9
>Reporter: Tim Smith
> Fix For: 2.9
>
> Attachments: LUCENE-1821.patch
>
>
> Now that searching is done on a per segment basis, there is no way for a 
> Scorer to know the "actual" doc id for the document's it matches (only the 
> relative doc offset into the segment)
> If using caches in your scorer that are based on the "entire" index (all 
> segments), there is now no way to index into them properly from inside a 
> Scorer because the scorer is not passed the needed offset to calculate the 
> "real" docid
> suggest having Weight.scorer() method also take a integer for the doc offset
> Abstract Weight class should have a constructor that takes this offset as 
> well as a method to get the offset
> All Weights that have "sub" weights must pass this offset down to created 
> "sub" weights
> Details on workaround:
> In order to work around this, you must do the following:
> * Subclass IndexSearcher
> * Add "int getIndexReaderBase(IndexReader)" method to your subclass
> * during Weight creation, the Weight must hold onto a reference to the passed 
> in Searcher (casted to your sub class)
> * during Scorer creation, the Scorer must be passed the result of 
> YourSearcher.getIndexReaderBase(reader)
> * Scorer can now rebase any collected docids using this offset
> Example implementation of getIndexReaderBase():
> {code}
> // NOTE: a more efficient implementation can be done if you cache the
> // result of gatherSubReaders in your constructor
> public int getIndexReaderBase(IndexReader reader) {
>   if (reader == getReader()) {
>     return 0;
>   } else {
>     List readers = new ArrayList();
>     gatherSubReaders(readers);
>     Iterator iter = readers.iterator();
>     int maxDoc = 0;
>     while (iter.hasNext()) {
>       IndexReader r = (IndexReader) iter.next();
>       if (r == reader) {
>         return maxDoc;
>       }
>       maxDoc += r.maxDoc();
>     }
>   }
>   return -1; // reader not in searcher
> }
> {code}
> Notes:
> * This workaround makes it so you cannot serialize your custom Weight 
> implementation




[jira] Commented: (LUCENE-1844) Speed up junit tests

2009-08-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746605#action_12746605
 ] 

Mark Miller commented on LUCENE-1844:
-

We should also be able to speed up TestBooleanMinShouldMatch somehow; it's
nearly a minute as well.

In a loop of 1000 random queries, this is called each time:

QueryUtils.check(q1,s);
QueryUtils.check(q2,s);

Take it out and the test runs in roughly 2-5 seconds. There must be some way
to optimize this down without losing coverage.

> Speed up junit tests
> 
>
> Key: LUCENE-1844
> URL: https://issues.apache.org/jira/browse/LUCENE-1844
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Mark Miller
> Attachments: FastCnstScoreQTest.patch, hi_junit_test_runtimes.png
>
>
> As Lucene grows, so does the number of JUnit tests. This is obviously a good 
> thing, but it comes with longer and longer test times. Now that we also run 
> back compat tests in a standard test run, this problem is essentially doubled.
> There are some ways this may get better, including running parallel tests. 
> You will need the hardware to fully take advantage, but it should be a nice 
> gain. There is already an issue for this, and JUnit 4.6 and 4.7 have the 
> beginnings of something we might be able to count on soon. 4.6 was buggy, and 
> 4.7 still doesn't come with nice ant integration. Parallel tests will come 
> though.
> Beyond parallel testing, I think we also need to concentrate on keeping our 
> tests lean. We don't want to sacrifice coverage or quality, but I'm sure 
> there is plenty of fat to trim.
> I've started making a list of some of the longer tests - I think with some 
> work we can make our tests much faster - and then with parallelization, I 
> think we could see some really great gains.




[jira] Issue Comment Edited: (LUCENE-1844) Speed up junit tests

2009-08-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746605#action_12746605
 ] 

Mark Miller edited comment on LUCENE-1844 at 8/23/09 7:23 AM:
--

We should also be able to speed up TestBooleanMinShouldMatch somehow; it's
nearly a minute as well (30s in the attached list, but nearly a minute on
other hardware I have).

In a loop of 1000 random queries, this is called each time:

QueryUtils.check(q1,s);
QueryUtils.check(q2,s);

Take it out and the test runs in roughly 2-5 seconds. There must be some way
to optimize this down without losing coverage.

  was (Author: markrmil...@gmail.com):
We should also be able to speed up TestBooleanMinShouldMatch somehow; it's
nearly a minute as well.

In a loop of 1000 random queries, this is called each time:

QueryUtils.check(q1,s);
QueryUtils.check(q2,s);

Take it out and the test runs in roughly 2-5 seconds. There must be some way
to optimize this down without losing coverage.
  
> Speed up junit tests
> 
>
> Key: LUCENE-1844
> URL: https://issues.apache.org/jira/browse/LUCENE-1844
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Mark Miller
> Attachments: FastCnstScoreQTest.patch, hi_junit_test_runtimes.png




[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"

2009-08-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746607#action_12746607
 ] 

Mark Miller commented on LUCENE-1821:
-

You want to weigh in again, Mike? Do you still have the same stance as your
last comment?

> Weight.scorer() not passed doc offset for "sub reader"
> --
>
> Key: LUCENE-1821
> URL: https://issues.apache.org/jira/browse/LUCENE-1821
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.9
>Reporter: Tim Smith
> Fix For: 2.9
>
> Attachments: LUCENE-1821.patch




[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"

2009-08-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746608#action_12746608
 ] 

Mark Miller commented on LUCENE-1821:
-

bq. Well, you could go a route similar to the 2.4 TokenStream API (next() vs.
next(Token))

That's a tough bunch of code to decide to spread...

> Weight.scorer() not passed doc offset for "sub reader"
> --
>
> Key: LUCENE-1821
> URL: https://issues.apache.org/jira/browse/LUCENE-1821
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.9
>Reporter: Tim Smith
> Fix For: 2.9
>
> Attachments: LUCENE-1821.patch




[jira] Commented: (LUCENE-1846) More Locale problems in Lucene

2009-08-23 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746611#action_12746611
 ] 

Robert Muir commented on LUCENE-1846:
-

Uwe, thanks for bringing this issue up! 

We still have more work to do. Out of curiosity, I looked to see whether the
old QueryParser in core passes under the Korean locale. It does not:

{noformat}
setenv ANT_ARGS "-Dargs=-Duser.language=ko -Duser.country=KR"
ant -Dtestcase=TestQueryParser test
{noformat}


> More Locale problems in Lucene
> --
>
> Key: LUCENE-1846
> URL: https://issues.apache.org/jira/browse/LUCENE-1846
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Other
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Trivial
> Fix For: 2.9
>
> Attachments: LUCENE-1846.patch




[jira] Updated: (LUCENE-1798) FieldCacheSanityChecker called directly by FieldCache.get*

2009-08-23 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1798:
---

Attachment: LUCENE-1798.patch

Attached patch.  I added get/setInfoStream to FieldCache; then, in
FieldCacheImpl.Cache.get, if we hit a cache miss and infoStream is enabled, I
gather the Insanity[] before and after the cache entry is added, and print any
change involving the entry just added.  It produces output like this on the
infoStream:

{noformat}
[junit] WARNING: new FieldCache insanity created
[junit] Details: VALUEMISMATCH: Multiple distinct value objects for 
org.apache.lucene.index.directoryrea...@da3a1e+thedouble
[junit] 
'org.apache.lucene.index.directoryrea...@da3a1e'=>'theDouble',float,org.apache.lucene.search.FieldCache.DEFAULT_FLOAT_PARSER=>[F#7896426
 (size =~ 3.9 KB)
[junit] 
'org.apache.lucene.index.directoryrea...@da3a1e'=>'theDouble',double,org.apache.lucene.search.FieldCache.DEFAULT_DOUBLE_PARSER=>[D#5503831
 (size =~ 7.8 KB)
[junit] 
'org.apache.lucene.index.directoryrea...@da3a1e'=>'theDouble',double,null=>[D#5503831
 (size =~ 7.8 KB)
[junit] 
[junit] 
[junit] Stack:
[junit] 
[junit] java.lang.Throwable
[junit] at 
org.apache.lucene.search.FieldCacheImpl$Cache.printNewInsanity(FieldCacheImpl.java:263)
[junit] at 
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:228)
[junit] at 
org.apache.lucene.search.FieldCacheImpl.getFloats(FieldCacheImpl.java:494)
[junit] at 
org.apache.lucene.search.FieldCacheImpl$FloatCache.createValue(FieldCacheImpl.java:509)
[junit] at 
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:223)
[junit] at 
org.apache.lucene.search.FieldCacheImpl.getFloats(FieldCacheImpl.java:494)
[junit] at 
org.apache.lucene.search.FieldCacheImpl.getFloats(FieldCacheImpl.java:487)
[junit] at 
org.apache.lucene.search.TestFieldCache.testInfoStream(TestFieldCache.java:70)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at junit.framework.TestCase.runTest(TestCase.java:164)
[junit] at junit.framework.TestCase.runBare(TestCase.java:130)
[junit] at 
org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:206)
[junit] at junit.framework.TestResult$1.protect(TestResult.java:106)
[junit] at junit.framework.TestResult.runProtected(TestResult.java:124)
[junit] at junit.framework.TestResult.run(TestResult.java:109)
[junit] at junit.framework.TestCase.run(TestCase.java:120)
[junit] at junit.framework.TestSuite.runTest(TestSuite.java:230)
[junit] at junit.framework.TestSuite.run(TestSuite.java:225)
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768)
{noformat}
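
For reference, this is how a client would opt in, going by the
get/setInfoStream methods the patch adds (the wrapper class below is just
scaffolding for the example):

{code}
import org.apache.lucene.search.FieldCache;

public class EnableFieldCacheSanityLog {
  public static void main(String[] args) {
    // Turn on the insanity logging added by this patch; leaving the
    // stream unset (null, the default) keeps the check disabled.
    FieldCache.DEFAULT.setInfoStream(System.err);
  }
}
{code}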

> FieldCacheSanityChecker called directly by FieldCache.get*
> --
>
> Key: LUCENE-1798
> URL: https://issues.apache.org/jira/browse/LUCENE-1798
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Michael McCandless
> Fix For: 2.9
>
> Attachments: LUCENE-1798.patch
>
>
> As suggested by McCandless in LUCENE-1749, we can make FieldCacheImpl a
> client of the FieldCacheSanityChecker and have it sanity-check itself each
> time it creates a new cache entry, and log a warning if it thinks there is a
> problem (although we'd probably only want to do this if the caller has set
> some sort of infoStream/warningStream type property on the FieldCache
> object).




[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"

2009-08-23 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746613#action_12746613
 ] 

Tim Smith commented on LUCENE-1821:
---

bq. That's a tough bunch of code to decide to spread...

At least it'll be able to go away real soon, with 3.0.

> Weight.scorer() not passed doc offset for "sub reader"
> --
>
> Key: LUCENE-1821
> URL: https://issues.apache.org/jira/browse/LUCENE-1821
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.9
>Reporter: Tim Smith
> Fix For: 2.9
>
> Attachments: LUCENE-1821.patch




[jira] Commented: (LUCENE-1836) Flexible QueryParser fails with local different from en_US

2009-08-23 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746616#action_12746616
 ] 

Robert Muir commented on LUCENE-1836:
-

Adriano, as I also noted in LUCENE-1846, the old QueryParser in core has this
same issue.

So if you are able to figure out an improvement to the JavaCC grammar that
fixes this, I think we should consider applying it there as well.


> Flexible QueryParser fails with local different from en_US
> --
>
> Key: LUCENE-1836
> URL: https://issues.apache.org/jira/browse/LUCENE-1836
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/*
>Affects Versions: 2.9
>Reporter: Uwe Schindler
>Assignee: Luis Alves
> Fix For: 2.9
>
> Attachments: LUCENE-1836.patch, LUCENE-1836.patch, LUCENE-1836.patch
>
>
> I get the following error during the mentioned testcases on my computer, if I 
> use the Locale de_DE (windows 32):
> {code}
> [junit] Testsuite: org.apache.lucene.queryParser.standard.TestQPHelper
> [junit] Tests run: 29, Failures: 1, Errors: 0, Time elapsed: 1,156 sec
> [junit]
> [junit] - Standard Output ---
> [junit] Result: (fieldX:x fieldy:)^2.0
> [junit] -  ---
> [junit] Testcase: 
> testLocalDateFormat(org.apache.lucene.queryParser.standard.TestQPHelper): 
> FAILED
> [junit] expected:<1> but was:<0>
> [junit] junit.framework.AssertionFailedError: expected:<1> but was:<0>
> [junit] at 
> org.apache.lucene.queryParser.standard.TestQPHelper.assertHits(TestQPHelper.java:1148)
> [junit] at 
> org.apache.lucene.queryParser.standard.TestQPHelper.testLocalDateFormat(TestQPHelper.java:1005)
> [junit] at 
> org.apache.lucene.util.LuceneTestCase.runTest(LuceneTestCase.java:201)
> [junit]
> [junit]
> [junit] Test org.apache.lucene.queryParser.standard.TestQPHelper FAILED
> [junit] Testsuite: 
> org.apache.lucene.queryParser.standard.TestQueryParserWrapper
> [junit] Tests run: 27, Failures: 1, Errors: 0, Time elapsed: 1,219 sec
> [junit]
> [junit] - Standard Output ---
> [junit] Result: (fieldX:x fieldy:)^2.0
> [junit] -  ---
> [junit] Testcase: 
> testLocalDateFormat(org.apache.lucene.queryParser.standard.TestQueryParserWrapper):
>FAILED
> [junit] expected:<1> but was:<0>
> [junit] junit.framework.AssertionFailedError: expected:<1> but was:<0>
> [junit] at 
> org.apache.lucene.queryParser.standard.TestQueryParserWrapper.assertHits(TestQueryParserWrapper.java:1120)
> [junit] at 
> org.apache.lucene.queryParser.standard.TestQueryParserWrapper.testLocalDateFormat(TestQueryParserWrapper.java:985)
> [junit] at 
> org.apache.lucene.util.LuceneTestCase.runTest(LuceneTestCase.java:201)
> [junit]
> [junit]
> [junit] Test 
> org.apache.lucene.queryParser.standard.TestQueryParserWrapper FAILED
> {code}
> With en_US as locale it works.




[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"

2009-08-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746617#action_12746617
 ] 

Michael McCandless commented on LUCENE-1821:


bq. You want to weigh in again Mike ? 

I do!  I'm trying desperately to catch up over here :)

> Weight.scorer() not passed doc offset for "sub reader"
> --
>
> Key: LUCENE-1821
> URL: https://issues.apache.org/jira/browse/LUCENE-1821
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.9
>Reporter: Tim Smith
> Fix For: 2.9
>
> Attachments: LUCENE-1821.patch
>
>
> Now that searching is done on a per segment basis, there is no way for a 
> Scorer to know the "actual" doc id for the document's it matches (only the 
> relative doc offset into the segment)
> If using caches in your scorer that are based on the "entire" index (all 
> segments), there is now no way to index into them properly from inside a 
> Scorer because the scorer is not passed the needed offset to calculate the 
> "real" docid
> suggest having Weight.scorer() method also take a integer for the doc offset
> Abstract Weight class should have a constructor that takes this offset as 
> well as a method to get the offset
> All Weights that have "sub" weights must pass this offset down to created 
> "sub" weights
> Details on workaround:
> In order to work around this, you must do the following:
> * Subclass IndexSearcher
> * Add "int getIndexReaderBase(IndexReader)" method to your subclass
> * during Weight creation, the Weight must hold onto a reference to the passed 
> in Searcher (casted to your sub class)
> * during Scorer creation, the Scorer must be passed the result of 
> YourSearcher.getIndexReaderBase(reader)
> * Scorer can now rebase any collected docids using this offset
> Example implementation of getIndexReaderBase():
> {code}
> // NOTE: more efficient implementation can be done if you cache the result if 
> gatherSubReaders in your constructor
> public int getIndexReaderBase(IndexReader reader) {
>   if (reader == getReader()) {
> return 0;
>   } else {
> List readers = new ArrayList();
> gatherSubReaders(readers);
> Iterator iter = readers.iterator();
> int maxDoc = 0;
> while (iter.hasNext()) {
>   IndexReader r = (IndexReader)iter.next();
>   if (r == reader) {
> return maxDoc;
>   } 
>   maxDoc += r.maxDoc();
> } 
>   }
>   return -1; // reader not in searcher
> }
> {code}
> Notes:
> * This workaround makes it so you cannot serialize your custom Weight 
> implementation




[jira] Commented: (LUCENE-1845) if the build fails to download JARs for contrib/db, just skip its tests

2009-08-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746618#action_12746618
 ] 

Michael McCandless commented on LUCENE-1845:


Hmm -- I tried applying the patch, then changing the download URL to something 
bogus that fails, and then "ant test" hits errors during the "compile-core" 
target.  It seems like we have to somehow skip compile-core if the sanity check 
fails?

> if the build fails to download JARs for contrib/db, just skip its tests
> ---
>
> Key: LUCENE-1845
> URL: https://issues.apache.org/jira/browse/LUCENE-1845
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Priority: Minor
> Attachments: LUCENE-1845.txt




[jira] Commented: (LUCENE-1837) Remove Searcher from explain and idf/maxDoc info from explain

2009-08-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746619#action_12746619
 ] 

Michael McCandless commented on LUCENE-1837:


So, Mark, this will revert LUCENE-1771? I.e., no longer pass the top-level
searcher to Weight.explain?

> Remove Searcher from explain and idf/maxDoc info from explain
> -
>
> Key: LUCENE-1837
> URL: https://issues.apache.org/jira/browse/LUCENE-1837
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 2.9
>
>
> these changes (starting with the TermWeight idf/maxDoc info) were illegal IMO 
> - I think they need to be rolled back/out.




[jira] Commented: (LUCENE-1837) Remove Searcher from explain and idf/maxDoc info from explain

2009-08-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746621#action_12746621
 ] 

Mark Miller commented on LUCENE-1837:
-

It won't revert the whole issue. Weight is still an abstract class, and the
reader passed is still the sub reader containing the doc rather than the
top-level reader.

The only revert:

Because TermWeight tried to take index-level stats from the reader, we passed
that Searcher (to make TermWeight's explain behave like it did when we passed
the top-level reader) - it's the only place it's used currently. But that's
illegal now, and it was illegal before.

You cannot count on having access to the entire index through a Searcher -
else we break MultiSearcher and remote use.

So passing that Searcher is a recipe for illegal abuse. Same with the other
issue Tim brought up - though if we end up passing an IndexSearcher there,
with all kinds of warnings to abuse at your own peril, I guess we could here
too.

I'm not sure I like it, because we'd encourage code that doesn't work
correctly with MultiSearcher. I think if we want to go down that road, we
should probably try to move away from supporting remote search and
MultiSearcher.
> Remove Searcher from explain and idf/maxDoc info from explain
> -
>
> Key: LUCENE-1837
> URL: https://issues.apache.org/jira/browse/LUCENE-1837
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 2.9




[jira] Commented: (LUCENE-1837) Remove Searcher from explain and idf/maxDoc info from explain

2009-08-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746622#action_12746622
 ] 

Michael McCandless commented on LUCENE-1837:


bq. It won't revert the whole issue.

OK got it.

bq. Because TermWeight tried to take index-level stats from the reader, we
passed that Searcher (to make TermWeight's explain behave like it did when we
passed the top-level reader) - it's the only place it's used currently.

PhraseQuery also prints the [top-level] docFreq for each term in the phrase.

bq. You cannot count on having access to the entire index through a Searcher - 
else we break MultiSearcher and remote use.

I agree, so our fix in LUCENE-1771 doesn't work w/ MultiSearcher.  So we 
definitely need to do something here...

The thing is, it's useful for TermQuery's explain to print out the
docFreq/maxDoc, right? (This was the original motivation of LUCENE-1066.)
But it has to be the top-level numbers, not the single-segment numbers.

Really, the Weight should gather and hold all the top-level stats it needs at
construction time? (The MultiSearcher is passed on Weight construction.)
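
A sketch of that idea; the helper class is hypothetical, but
Searcher.docFreq(Term), Searcher.maxDoc() and Similarity.idf(int, int) are
existing API:

{code}
import java.io.IOException;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.Similarity;

// Hypothetical helper a Weight could build in its constructor, while
// the original (possibly Multi-) Searcher is still in hand: top-level
// stats are gathered once and kept around for explain() later.
class TopLevelTermStats {
  final int docFreq; // top-level docFreq; MultiSearcher sums its Searchables
  final int maxDoc;  // top-level maxDoc
  final float idf;

  TopLevelTermStats(Searcher searcher, Term term, Similarity sim)
      throws IOException {
    docFreq = searcher.docFreq(term);
    maxDoc = searcher.maxDoc();
    idf = sim.idf(docFreq, maxDoc);
  }
}
{code}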

> Remove Searcher from explain and idf/maxDoc info from explain
> -
>
> Key: LUCENE-1837
> URL: https://issues.apache.org/jira/browse/LUCENE-1837
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 2.9




[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"

2009-08-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746623#action_12746623
 ] 

Michael McCandless commented on LUCENE-1821:


Tim, one option might be to subclass DirectoryReader (though it's
package-protected now, and you'd need your own "open" to return your subclass)
and override getSequentialSubReaders to return null? Then Lucene would treat
it as an atomic reader. Could that work?
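
A sketch of a FilterIndexReader-based variant of the same idea, which avoids
the package-protected problem; the wrapper class name is made up:

{code}
import org.apache.lucene.index.FilterIndexReader;
import org.apache.lucene.index.IndexReader;

// Hypothetical wrapper: returning null from getSequentialSubReaders
// makes the searcher treat the whole index as one atomic reader, so
// doc ids are never rebased per segment.
class AtomicViewReader extends FilterIndexReader {
  AtomicViewReader(IndexReader in) {
    super(in);
  }

  public IndexReader[] getSequentialSubReaders() {
    return null; // null means "no sub-readers": treat this reader as atomic
  }
}
{code}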

> Weight.scorer() not passed doc offset for "sub reader"
> --
>
> Key: LUCENE-1821
> URL: https://issues.apache.org/jira/browse/LUCENE-1821
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.9
>Reporter: Tim Smith
> Fix For: 2.9
>
> Attachments: LUCENE-1821.patch




[jira] Commented: (LUCENE-1837) Remove Searcher from explain and idf/maxDoc info from explain

2009-08-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746624#action_12746624
 ] 

Mark Miller commented on LUCENE-1837:
-

bq. Really, the Weight should gather and hold all the top-level stats it needs
at construction time? (The MultiSearcher is passed on Weight construction.)

Ah - good point. I've said it before myself - index-level stats should be
taken from the Searcher passed to createWeight - I just don't integrate
thoughts well :)

So that seems like the right thing to do. The only thing I don't like is that
this info has to be calculated by calling each Searchable in the
MultiSearcher, and then you likely won't ever use it - explain is generally
debug stuff. I don't like that.

But I guess, if you want the info, you gotta do what you gotta do ...

> Remove Searcher from explain and idf/maxDoc info from explain
> -
>
> Key: LUCENE-1837
> URL: https://issues.apache.org/jira/browse/LUCENE-1837
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 2.9




[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"

2009-08-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746625#action_12746625
 ] 

Michael McCandless commented on LUCENE-1821:


bq. for string sorting, it makes a big difference - you now have to do a bunch 
of String.equals() calls, where you didn't have to in 2.4 (just used the ord 
index)

We actually went through a number of iterations on this, on the first cutover 
to per-segment collection, and eventually arrived at a decent comparator 
(StringOrdValComparator) that operates per segment.  Have you tested 
performance of this comparator?


> Weight.scorer() not passed doc offset for "sub reader"
> --
>
> Key: LUCENE-1821
> URL: https://issues.apache.org/jira/browse/LUCENE-1821
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.9
>Reporter: Tim Smith
> Fix For: 2.9
>
> Attachments: LUCENE-1821.patch
>
>
> Now that searching is done on a per segment basis, there is no way for a 
> Scorer to know the "actual" doc id for the document's it matches (only the 
> relative doc offset into the segment)
> If using caches in your scorer that are based on the "entire" index (all 
> segments), there is now no way to index into them properly from inside a 
> Scorer because the scorer is not passed the needed offset to calculate the 
> "real" docid
> suggest having the Weight.scorer() method also take an integer for the doc offset
> Abstract Weight class should have a constructor that takes this offset as 
> well as a method to get the offset
> All Weights that have "sub" weights must pass this offset down to created 
> "sub" weights
> Details on workaround:
> In order to work around this, you must do the following:
> * Subclass IndexSearcher
> * Add "int getIndexReaderBase(IndexReader)" method to your subclass
> * during Weight creation, the Weight must hold onto a reference to the passed 
> in Searcher (cast to your subclass)
> * during Scorer creation, the Scorer must be passed the result of 
> YourSearcher.getIndexReaderBase(reader)
> * Scorer can now rebase any collected docids using this offset
> Example implementation of getIndexReaderBase():
> {code}
> // NOTE: a more efficient implementation can be done if you cache the result of 
> gatherSubReaders in your constructor
> public int getIndexReaderBase(IndexReader reader) {
>   if (reader == getReader()) {
> return 0;
>   } else {
> List readers = new ArrayList();
> gatherSubReaders(readers);
> Iterator iter = readers.iterator();
> int maxDoc = 0;
> while (iter.hasNext()) {
>   IndexReader r = (IndexReader)iter.next();
>   if (r == reader) {
> return maxDoc;
>   } 
>   maxDoc += r.maxDoc();
> } 
>   }
>   return -1; // reader not in searcher
> }
> {code}
> Notes:
> * This workaround makes it so you cannot serialize your custom Weight 
> implementation

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1837) Remove Searcher from explain and idf/maxDoc info from explain

2009-08-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746628#action_12746628
 ] 

Michael McCandless commented on LUCENE-1837:


bq. I don't like is that this info has to be calculated by calling each 
Searchable in the MultiSearcher, and then you likely won't ever use it - 
explain is generally debug stuff. I don't like that.

But those stats are already being computed (in the default Similarity impl's 
idf).  If we "improved" Similarity.idf so that it returned idf, docFreq and 
maxDoc in one go, then there's no added cost right?
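
eg, something like this little value object (a sketch only, names hypothetical):

{code}
// Hypothetical: everything the default idf computation touches,
// returned in one go so explain() can reuse it for free.
public class IdfStats {
  public final float idf;
  public final int docFreq;
  public final int maxDoc;

  public IdfStats(float idf, int docFreq, int maxDoc) {
    this.idf = idf;
    this.docFreq = docFreq;
    this.maxDoc = maxDoc;
  }
}

// and in Similarity (sketch):
// public IdfStats idfStats(Term term, Searcher searcher) throws IOException {
//   int df = searcher.docFreq(term);
//   int max = searcher.maxDoc();
//   return new IdfStats(idf(df, max), df, max);
// }
{code}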

> Remove Searcher from explain and idf/maxDoc info from explain
> -
>
> Key: LUCENE-1837
> URL: https://issues.apache.org/jira/browse/LUCENE-1837
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 2.9
>
>
> these changes (starting with the TermWeight idf/maxDoc info) were illegal IMO 
> - I think they need to be rolled back/out.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: Lucene SandBox in documentation

2009-08-23 Thread Michael McCandless
+1

Mike

On Thu, Aug 13, 2009 at 4:39 PM, Steven A Rowe wrote:
> Tangent: Now that contrib/CHANGES.txt is getting regular updates, I think it 
> would make sense to generate a Changes.html corresponding to its contents, in 
> the same way that the core CHANGES.txt is transformed.
>
> Looks like this Sandbox/Contrib page would be a good place to host it.
>
> Steve
>
>> -Original Message-
>> From: Mark Miller [mailto:markrmil...@gmail.com]
>> Sent: Thursday, August 13, 2009 4:29 PM
>> To: java-dev@lucene.apache.org
>> Subject: Lucene SandBox in documentation
>>
>> Looks like this page is a bit out of date:
>>
>> http://lucene.apache.org/java/2_4_1/lucene-sandbox/index.html
>>
>> been a while since it's been the sandbox too ...
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>>
>> -
>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>
>
> -
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1837) Remove Searcher from explain and idf/maxDoc info from explain

2009-08-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746629#action_12746629
 ] 

Mark Miller commented on LUCENE-1837:
-

Right - I was just writing to that effect. If we can get that info back too, I 
think we are golden. What I was writing:

It looks like we have to calc twice with TermWeight (not sure about PhraseQuery - 
with that it looks like we would have to calc a bunch of info the scorer doesn't 
even use?).

  We want to grab the info in the TermWeight constructor and store it. That 
info is already calculated, but we don't have access to it:
{code}

public PhraseWeight(Searcher searcher)
  throws IOException {
  this.similarity = getSimilarity(searcher);

  idf = similarity.idf(terms, searcher);
  // Similarity#idf
  // public float idf(Term term, Searcher searcher) throws IOException {
  //   return idf(searcher.docFreq(term), searcher.maxDoc());
  // }
}

 
{code}

  

> Remove Searcher from explain and idf/maxDoc info from explain
> -
>
> Key: LUCENE-1837
> URL: https://issues.apache.org/jira/browse/LUCENE-1837
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 2.9
>
>
> these changes (starting with the TermWeight idf/maxDoc info) were illegal IMO 
> - I think they need to be rolled back/out.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"

2009-08-23 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746630#action_12746630
 ] 

Yonik Seeley commented on LUCENE-1821:
--

bq. Filter.getDocIdSet(IndexSearcher, IndexReader).

This suggests that one needs an IndexSearcher to get the ids matching a filter.

> Weight.scorer() not passed doc offset for "sub reader"
> --
>
> Key: LUCENE-1821
> URL: https://issues.apache.org/jira/browse/LUCENE-1821
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.9
>Reporter: Tim Smith
> Fix For: 2.9
>
> Attachments: LUCENE-1821.patch
>
>
> Now that searching is done on a per segment basis, there is no way for a 
> Scorer to know the "actual" doc id for the documents it matches (only the 
> relative doc offset into the segment)
> If using caches in your scorer that are based on the "entire" index (all 
> segments), there is now no way to index into them properly from inside a 
> Scorer because the scorer is not passed the needed offset to calculate the 
> "real" docid
> suggest having the Weight.scorer() method also take an integer for the doc offset
> Abstract Weight class should have a constructor that takes this offset as 
> well as a method to get the offset
> All Weights that have "sub" weights must pass this offset down to created 
> "sub" weights
> Details on workaround:
> In order to work around this, you must do the following:
> * Subclass IndexSearcher
> * Add "int getIndexReaderBase(IndexReader)" method to your subclass
> * during Weight creation, the Weight must hold onto a reference to the passed 
> in Searcher (cast to your subclass)
> * during Scorer creation, the Scorer must be passed the result of 
> YourSearcher.getIndexReaderBase(reader)
> * Scorer can now rebase any collected docids using this offset
> Example implementation of getIndexReaderBase():
> {code}
> // NOTE: a more efficient implementation can be done if you cache the result of 
> gatherSubReaders in your constructor
> public int getIndexReaderBase(IndexReader reader) {
>   if (reader == getReader()) {
> return 0;
>   } else {
> List readers = new ArrayList();
> gatherSubReaders(readers);
> Iterator iter = readers.iterator();
> int maxDoc = 0;
> while (iter.hasNext()) {
>   IndexReader r = (IndexReader)iter.next();
>   if (r == reader) {
> return maxDoc;
>   } 
>   maxDoc += r.maxDoc();
> } 
>   }
>   return -1; // reader not in searcher
> }
> {code}
> Notes:
> * This workaround makes it so you cannot serialize your custom Weight 
> implementation

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Issue Comment Edited: (LUCENE-1837) Remove Searcher from explain and idf/maxDoc info from explain

2009-08-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746629#action_12746629
 ] 

Mark Miller edited comment on LUCENE-1837 at 8/23/09 9:29 AM:
--

Right - I was just writing to that effect. If we can get that info back too, I 
think we are golden. What I was writing:

It looks like we have to calc twice with TermWeight (not sure about PhraseQuery - 
with that it looks like we would have to calc a bunch of info the scorer doesn't 
even use?).

  We want to grab the info in the TermWeight constructor and store it. That 
info is already calculated, but we don't have access to it:
{code}

public PhraseWeight(Searcher searcher)
  throws IOException {
  this.similarity = getSimilarity(searcher);

  idf = similarity.idf(terms, searcher);
  // Similarity#idf
  // public float idf(Term term, Searcher searcher) throws IOException {
  //   return idf(searcher.docFreq(term), searcher.maxDoc());
  // }
}

 
{code}

  *edit*

bq. not sure about PhraseQuery - with that it looks like we would have to calc 
a bunch of info the scorer doesn't even use?

Okay, we do use all of that - again the info is just all hidden behind the 
Similarity. 

So we would also want all the docFreq info from every term in:

public float idf(Collection terms, Searcher searcher) throws IOException {

  was (Author: markrmil...@gmail.com):
Right - I was just writing to that effect. If we can get that info back 
too, I think we are golden. What I was writing:

It looks like we have to calc twice with TermWeight (not sure about PhraseQuery - 
with that it looks like we would have to calc a bunch of info the scorer doesn't 
even use?).

  We want to grab the info in the TermWeight constructor and store it. That 
info is already calculated, but we don't have access to it:
{code}

public PhraseWeight(Searcher searcher)
  throws IOException {
  this.similarity = getSimilarity(searcher);

  idf = similarity.idf(terms, searcher);
  // Similarity#idf
  // public float idf(Term term, Searcher searcher) throws IOException {
  //   return idf(searcher.docFreq(term), searcher.maxDoc());
  // }
}

 
{code}

  
  
> Remove Searcher from explain and idf/maxDoc info from explain
> -
>
> Key: LUCENE-1837
> URL: https://issues.apache.org/jira/browse/LUCENE-1837
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 2.9
>
>
> these changes (starting with the TermWeight idf/maxDoc info) were illegal IMO 
> - I think they need to be rolled back/out.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1837) Remove Searcher from explain and idf/maxDoc info from explain

2009-08-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746631#action_12746631
 ] 

Mark Miller commented on LUCENE-1837:
-

And also ;)

If a Sim didn't do those calculations (and it's an impl detail now), how could 
we ask for them back?

If we tie them to the API, impls will be required to do those calcs for explain 
- when they didn't need to before. Prob not a huge deal, but ...

> Remove Searcher from explain and idf/maxDoc info from explain
> -
>
> Key: LUCENE-1837
> URL: https://issues.apache.org/jira/browse/LUCENE-1837
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 2.9
>
>
> these changes (starting with the TermWeight idf/maxDoc info) were illegal IMO 
> - I think they need to be rolled back/out.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1837) Remove Searcher from explain and idf/maxDoc info from explain

2009-08-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746632#action_12746632
 ] 

Michael McCandless commented on LUCENE-1837:


bq. If a Sim didn't do those calculations (and its an impl detail now), how 
could we ask for them back?

We could require only that the thing that's returned can explain itself?
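
Roughly (sketch only, not a committed API - the impl decides what its
explanation contains):

{code}
// The Similarity returns this instead of a bare float; how the
// explanation is built is entirely up to the impl, so a Similarity
// that never computed docFreq/maxDoc just explains what it did do.
public abstract class IdfExplanation {
  /** the value scoring actually uses */
  public abstract float getIdf();
  /** human-readable derivation, built however the impl likes */
  public abstract String explain();
}
{code}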


> Remove Searcher from explain and idf/maxDoc info from explain
> -
>
> Key: LUCENE-1837
> URL: https://issues.apache.org/jira/browse/LUCENE-1837
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 2.9
>
>
> these changes (starting with the TermWeight idf/maxDoc info) were illegal IMO 
> - I think they need to be rolled back/out.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"

2009-08-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746633#action_12746633
 ] 

Michael McCandless commented on LUCENE-1821:


bq. one used an int[] ord index (the underlying cache cannot be made per 
segment)

Could you compute the top-level ords, but then break it up
per-segment?  Ie, create your own map of IndexReader -> offset into
that large ord array?  This would make it "virtually" per-segment, but
allow you to continue computing at the top level.
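
Something like this (a sketch; assumes the big ord array was built against a
composite top-level reader):

{code}
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.index.IndexReader;

public class DocBaseMap {
  /** Map each sub-reader to its doc base within the top-level reader.
   *  Assumes topReader is composite (getSequentialSubReaders() != null). */
  public static Map build(IndexReader topReader) {
    Map docBases = new HashMap(); // IndexReader -> Integer
    IndexReader[] subs = topReader.getSequentialSubReaders();
    int base = 0;
    for (int i = 0; i < subs.length; i++) {
      docBases.put(subs[i], new Integer(base));
      base += subs[i].maxDoc();
    }
    return docBases;
  }
}
// later, inside scorer(): topLevelOrd = ords[docBase + docInSegment]
{code}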

BTW another option is to simply accumulate your own docBase, by adding
up the maxDoc() every time an IndexReader is passed to your
Weight.scorer().  EG this is what contrib/spatial is now doing.

This isn't a long-term solution, since the order in which Lucene
visits the readers isn't in general guaranteed, but it will work for
2.9 and buy time to figure out how to switch scoring to per-segment.
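
The accumulation itself is tiny (a sketch; contrib/spatial does the equivalent
inline, and the class name here is made up):

{code}
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Weight;

// Hypothetical base class: relies on the (current) in-order visit of
// sub-readers, so only safe given 2.9-era IndexSearcher behavior.
public abstract class DocBaseAccumulatingWeight extends Weight {
  private int nextDocBase = 0;

  /** Call once per scorer() invocation, in reader order. */
  protected final int advanceDocBase(IndexReader reader) {
    int base = nextDocBase;
    nextDocBase += reader.maxDoc();
    return base;
  }
}
{code}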


> Weight.scorer() not passed doc offset for "sub reader"
> --
>
> Key: LUCENE-1821
> URL: https://issues.apache.org/jira/browse/LUCENE-1821
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.9
>Reporter: Tim Smith
> Fix For: 2.9
>
> Attachments: LUCENE-1821.patch
>
>
> Now that searching is done on a per segment basis, there is no way for a 
> Scorer to know the "actual" doc id for the documents it matches (only the 
> relative doc offset into the segment)
> If using caches in your scorer that are based on the "entire" index (all 
> segments), there is now no way to index into them properly from inside a 
> Scorer because the scorer is not passed the needed offset to calculate the 
> "real" docid
> suggest having the Weight.scorer() method also take an integer for the doc offset
> Abstract Weight class should have a constructor that takes this offset as 
> well as a method to get the offset
> All Weights that have "sub" weights must pass this offset down to created 
> "sub" weights
> Details on workaround:
> In order to work around this, you must do the following:
> * Subclass IndexSearcher
> * Add "int getIndexReaderBase(IndexReader)" method to your subclass
> * during Weight creation, the Weight must hold onto a reference to the passed 
> in Searcher (cast to your subclass)
> * during Scorer creation, the Scorer must be passed the result of 
> YourSearcher.getIndexReaderBase(reader)
> * Scorer can now rebase any collected docids using this offset
> Example implementation of getIndexReaderBase():
> {code}
> // NOTE: a more efficient implementation can be done if you cache the result of 
> gatherSubReaders in your constructor
> public int getIndexReaderBase(IndexReader reader) {
>   if (reader == getReader()) {
> return 0;
>   } else {
> List readers = new ArrayList();
> gatherSubReaders(readers);
> Iterator iter = readers.iterator();
> int maxDoc = 0;
> while (iter.hasNext()) {
>   IndexReader r = (IndexReader)iter.next();
>   if (r == reader) {
> return maxDoc;
>   } 
>   maxDoc += r.maxDoc();
> } 
>   }
>   return -1; // reader not in searcher
> }
> {code}
> Notes:
> * This workaround makes it so you cannot serialize your custom Weight 
> implementation

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"

2009-08-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746634#action_12746634
 ] 

Michael McCandless commented on LUCENE-1821:


bq. Using a per-segment cache will cause some significant performance loss when 
performing faceting, as it requires creating the facets for each segment, and 
then merging them (this results in a good deal of extra object overhead/memory 
overhead/more work where faceting on the multi-reader does not see this)

This is a good point... Yonik, how [in general!] is Solr handling the cutover 
to per-segment, for faceting?

> Weight.scorer() not passed doc offset for "sub reader"
> --
>
> Key: LUCENE-1821
> URL: https://issues.apache.org/jira/browse/LUCENE-1821
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.9
>Reporter: Tim Smith
> Fix For: 2.9
>
> Attachments: LUCENE-1821.patch
>
>
> Now that searching is done on a per segment basis, there is no way for a 
> Scorer to know the "actual" doc id for the documents it matches (only the 
> relative doc offset into the segment)
> If using caches in your scorer that are based on the "entire" index (all 
> segments), there is now no way to index into them properly from inside a 
> Scorer because the scorer is not passed the needed offset to calculate the 
> "real" docid
> suggest having the Weight.scorer() method also take an integer for the doc offset
> Abstract Weight class should have a constructor that takes this offset as 
> well as a method to get the offset
> All Weights that have "sub" weights must pass this offset down to created 
> "sub" weights
> Details on workaround:
> In order to work around this, you must do the following:
> * Subclass IndexSearcher
> * Add "int getIndexReaderBase(IndexReader)" method to your subclass
> * during Weight creation, the Weight must hold onto a reference to the passed 
> in Searcher (cast to your subclass)
> * during Scorer creation, the Scorer must be passed the result of 
> YourSearcher.getIndexReaderBase(reader)
> * Scorer can now rebase any collected docids using this offset
> Example implementation of getIndexReaderBase():
> {code}
> // NOTE: a more efficient implementation can be done if you cache the result of 
> gatherSubReaders in your constructor
> public int getIndexReaderBase(IndexReader reader) {
>   if (reader == getReader()) {
> return 0;
>   } else {
> List readers = new ArrayList();
> gatherSubReaders(readers);
> Iterator iter = readers.iterator();
> int maxDoc = 0;
> while (iter.hasNext()) {
>   IndexReader r = (IndexReader)iter.next();
>   if (r == reader) {
> return maxDoc;
>   } 
>   maxDoc += r.maxDoc();
> } 
>   }
>   return -1; // reader not in searcher
> }
> {code}
> Notes:
> * This workaround makes it so you cannot serialize your custom Weight 
> implementation

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"

2009-08-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746636#action_12746636
 ] 

Michael McCandless commented on LUCENE-1821:


Net/net, I'm still nervous about pushing down "full context" plus
"context free" searcher/reader deep into Lucene's general searching
(scorer/filter) APIs.  I think these APIs should remain fully
context-free (even IndexSearcher still makes me nervous).

In some sense, Multi/RemoteSearcher keep us honest, in that they force
us to clearly separate out "stuff that has the luxury of full context"
(to be done on construction of Weight) from "the heavy lifting that
must be context free since it may not have access to the top searcher"
(scorer(), getDocIdSet()).


> Weight.scorer() not passed doc offset for "sub reader"
> --
>
> Key: LUCENE-1821
> URL: https://issues.apache.org/jira/browse/LUCENE-1821
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.9
>Reporter: Tim Smith
> Fix For: 2.9
>
> Attachments: LUCENE-1821.patch
>
>
> Now that searching is done on a per segment basis, there is no way for a 
> Scorer to know the "actual" doc id for the documents it matches (only the 
> relative doc offset into the segment)
> If using caches in your scorer that are based on the "entire" index (all 
> segments), there is now no way to index into them properly from inside a 
> Scorer because the scorer is not passed the needed offset to calculate the 
> "real" docid
> suggest having the Weight.scorer() method also take an integer for the doc offset
> Abstract Weight class should have a constructor that takes this offset as 
> well as a method to get the offset
> All Weights that have "sub" weights must pass this offset down to created 
> "sub" weights
> Details on workaround:
> In order to work around this, you must do the following:
> * Subclass IndexSearcher
> * Add "int getIndexReaderBase(IndexReader)" method to your subclass
> * during Weight creation, the Weight must hold onto a reference to the passed 
> in Searcher (cast to your subclass)
> * during Scorer creation, the Scorer must be passed the result of 
> YourSearcher.getIndexReaderBase(reader)
> * Scorer can now rebase any collected docids using this offset
> Example implementation of getIndexReaderBase():
> {code}
> // NOTE: a more efficient implementation can be done if you cache the result of 
> gatherSubReaders in your constructor
> public int getIndexReaderBase(IndexReader reader) {
>   if (reader == getReader()) {
> return 0;
>   } else {
> List readers = new ArrayList();
> gatherSubReaders(readers);
> Iterator iter = readers.iterator();
> int maxDoc = 0;
> while (iter.hasNext()) {
>   IndexReader r = (IndexReader)iter.next();
>   if (r == reader) {
> return maxDoc;
>   } 
>   maxDoc += r.maxDoc();
> } 
>   }
>   return -1; // reader not in searcher
> }
> {code}
> Notes:
> * This workaround makes it so you cannot serialize your custom Weight 
> implementation

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"

2009-08-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746635#action_12746635
 ] 

Michael McCandless commented on LUCENE-1821:


bq. one used a cached DocIdSet created over the top level MultiReader (should 
be able to have a DocIdSet per Segment reader here, but this will take some 
more thinking (source of the matching docids is from a separate index), will 
also need to know which sub docidset to use based on which IndexReader is 
passed to scorer() - shouldn't be any big deal)

I think, similarly, you could continue to create the top-level
DocIdSet, but then make a new DocIdSet that presents one segment's
"slice" out of this top-level DocIdSet.  Then, pre-build the mapping
of IndexReader -> docBase like above, then when scorer() is called in
your custom query, just return the "virtual" per-segment DocIdSet.
Would this work?
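
Sketch of such a slice, against the 2.9 DocIdSetIterator API (the initial
advance(docBase) is the only extra cost per segment):

{code}
import java.io.IOException;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.DocIdSetIterator;

/** Presents [docBase, docBase+maxDoc) of a top-level DocIdSet as a
 *  per-segment DocIdSet with rebased (segment-relative) doc ids. */
public class SliceDocIdSet extends DocIdSet {
  private final DocIdSet top;
  private final int docBase;
  private final int maxDoc;

  public SliceDocIdSet(DocIdSet top, int docBase, int maxDoc) {
    this.top = top;
    this.docBase = docBase;
    this.maxDoc = maxDoc;
  }

  public DocIdSetIterator iterator() throws IOException {
    final DocIdSetIterator it = top.iterator();
    return new DocIdSetIterator() {
      private int doc = -1;

      public int docID() {
        return doc;
      }

      public int nextDoc() throws IOException {
        // first call: skip to the start of this segment's slice
        int d = (doc == -1) ? it.advance(docBase) : it.nextDoc();
        return doc = rebase(d);
      }

      public int advance(int target) throws IOException {
        return doc = rebase(it.advance(docBase + target));
      }

      /** map top-level id to segment-relative id, or exhaust */
      private int rebase(int d) {
        return (d == NO_MORE_DOCS || d >= docBase + maxDoc)
            ? NO_MORE_DOCS : d - docBase;
      }
    };
  }
}
{code}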

> Weight.scorer() not passed doc offset for "sub reader"
> --
>
> Key: LUCENE-1821
> URL: https://issues.apache.org/jira/browse/LUCENE-1821
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.9
>Reporter: Tim Smith
> Fix For: 2.9
>
> Attachments: LUCENE-1821.patch
>
>
> Now that searching is done on a per segment basis, there is no way for a 
> Scorer to know the "actual" doc id for the documents it matches (only the 
> relative doc offset into the segment)
> If using caches in your scorer that are based on the "entire" index (all 
> segments), there is now no way to index into them properly from inside a 
> Scorer because the scorer is not passed the needed offset to calculate the 
> "real" docid
> suggest having the Weight.scorer() method also take an integer for the doc offset
> Abstract Weight class should have a constructor that takes this offset as 
> well as a method to get the offset
> All Weights that have "sub" weights must pass this offset down to created 
> "sub" weights
> Details on workaround:
> In order to work around this, you must do the following:
> * Subclass IndexSearcher
> * Add "int getIndexReaderBase(IndexReader)" method to your subclass
> * during Weight creation, the Weight must hold onto a reference to the passed 
> in Searcher (cast to your subclass)
> * during Scorer creation, the Scorer must be passed the result of 
> YourSearcher.getIndexReaderBase(reader)
> * Scorer can now rebase any collected docids using this offset
> Example implementation of getIndexReaderBase():
> {code}
> // NOTE: a more efficient implementation can be done if you cache the result of 
> gatherSubReaders in your constructor
> public int getIndexReaderBase(IndexReader reader) {
>   if (reader == getReader()) {
> return 0;
>   } else {
> List readers = new ArrayList();
> gatherSubReaders(readers);
> Iterator iter = readers.iterator();
> int maxDoc = 0;
> while (iter.hasNext()) {
>   IndexReader r = (IndexReader)iter.next();
>   if (r == reader) {
> return maxDoc;
>   } 
>   maxDoc += r.maxDoc();
> } 
>   }
>   return -1; // reader not in searcher
> }
> {code}
> Notes:
> * This workaround makes it so you cannot serialize your custom Weight 
> implementation

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Reopened: (LUCENE-1843) Convert some tests to new TokenStream API, better support of cross-impl AttributeImpl.copyTo()

2009-08-23 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reopened LUCENE-1843:
---


There are some more tests that fail with onlyUseNewAPI in contrib/analyzers.

> Convert some tests to new TokenStream API, better support of cross-impl 
> AttributeImpl.copyTo()
> --
>
> Key: LUCENE-1843
> URL: https://issues.apache.org/jira/browse/LUCENE-1843
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Analysis
>Affects Versions: 2.9
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 2.9
>
> Attachments: LUCENE-1843.patch
>
>
> This patch converts some remaining tests to the new TokenStream API and 
> non-deprecated classes.
> This patch also enhances AttributeImpl.copyTo() of Token and TokenWrapper to 
> also support copying e.g. TermAttributeImpl into Token. The target impl need 
> only support all the interfaces; it does not have to be of the same type. Token 
> and TokenWrapper use optimized copying without casting to 6 interfaces where 
> possible.
> Maybe the special tokenizers in contrib (shingle matrix and so on, using 
> tokens to cache) may be enhanced by that. Also Yonik's request for optimized 
> copying of states between incompatible AttributeSources may be enhanced by 
> that (possibly a new issue).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1843) Convert some tests to new TokenStream API, better support of cross-impl AttributeImpl.copyTo()

2009-08-23 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746638#action_12746638
 ] 

Uwe Schindler commented on LUCENE-1843:
---

From a private mail with Robert Muir:

yes, all of what you mentioned are problems, and testing for
attributes that should be there is good in my opinion too.

I noticed the shingle problem as well - it was strange to test
termAtt.toString() and expect position increments or types to appear
:/

one reason I asked about this is that at some point it would be nice to
refactor the test cases in lucene contrib. currently, they all have the
same helper methods such as assertAnalyzesTo and this is silly in my
opinion.

On Sun, Aug 23, 2009 at 12:57 PM, Uwe Schindler wrote:
> There are more problems. The test with getAttribute is good if you are
> really sure the attribute is really available and want to assert this. In
> all other cases addAttribute should be used to consume a TokenStream. The
> changed ones were problematic because they used foreign TokenStreams, which
> are not guaranteed to have all these attributes.
>
> I thought all tests in contrib used LuceneTestCase as their superclass, but
> they use the standard JUnit class. Because of that I did not notice, when I put
> setOnlyUseNewAPI(true) into LuceneTestCase.setUp(), that they run with
> the default false setting.
>
> Another problem in these tests is that some (the shingle tests) use
> TermAttribute.toString(), which looks different if the attribute is
> implemented by TermAttributeImpl (newAPI=true) or Token (newAPI=false).
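
A minimal sketch of the consuming pattern (2.9 API, hence the casts; the
tokenizer and input here are arbitrary):

{code}
import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public class ConsumeExample {
  public static void main(String[] args) throws Exception {
    TokenStream ts = new WhitespaceTokenizer(new StringReader("foo bar"));
    // addAttribute, not getAttribute: a foreign stream is not
    // guaranteed to have declared these attributes already
    TermAttribute termAtt =
        (TermAttribute) ts.addAttribute(TermAttribute.class);
    PositionIncrementAttribute posAtt = (PositionIncrementAttribute)
        ts.addAttribute(PositionIncrementAttribute.class);
    while (ts.incrementToken()) {
      System.out.println(termAtt.term() + " +" + posAtt.getPositionIncrement());
    }
  }
}
{code}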

> Convert some tests to new TokenStream API, better support of cross-impl 
> AttributeImpl.copyTo()
> --
>
> Key: LUCENE-1843
> URL: https://issues.apache.org/jira/browse/LUCENE-1843
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Analysis
>Affects Versions: 2.9
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 2.9
>
> Attachments: LUCENE-1843.patch
>
>
> This patch converts some remaining tests to the new TokenStream API and 
> non-deprecated classes.
> This patch also enhances AttributeImpl.copyTo() of Token and TokenWrapper to 
> also support copying e.g. TermAttributeImpl into Token. The target impl need 
> only support all the interfaces; it does not have to be of the same type. Token 
> and TokenWrapper use optimized copying without casting to 6 interfaces where 
> possible.
> Maybe the special tokenizers in contrib (shingle matrix and so on, using 
> tokens to cache) may be enhanced by that. Also Yonik's request for optimized 
> copying of states between incompatible AttributeSources may be enhanced by 
> that (possibly a new issue).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"

2009-08-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746639#action_12746639
 ] 

Mark Miller commented on LUCENE-1821:
-

Cool - I don't like it much either.

I say we push this issue from 2.9 for now.

> Weight.scorer() not passed doc offset for "sub reader"
> --
>
> Key: LUCENE-1821
> URL: https://issues.apache.org/jira/browse/LUCENE-1821
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.9
>Reporter: Tim Smith
> Fix For: 2.9
>
> Attachments: LUCENE-1821.patch
>
>
> Now that searching is done on a per segment basis, there is no way for a 
> Scorer to know the "actual" doc id for the documents it matches (only the 
> relative doc offset into the segment)
> If using caches in your scorer that are based on the "entire" index (all 
> segments), there is now no way to index into them properly from inside a 
> Scorer because the scorer is not passed the needed offset to calculate the 
> "real" docid
> suggest having the Weight.scorer() method also take an integer for the doc offset
> Abstract Weight class should have a constructor that takes this offset as 
> well as a method to get the offset
> All Weights that have "sub" weights must pass this offset down to created 
> "sub" weights
> Details on workaround:
> In order to work around this, you must do the following:
> * Subclass IndexSearcher
> * Add "int getIndexReaderBase(IndexReader)" method to your subclass
> * during Weight creation, the Weight must hold onto a reference to the passed 
> in Searcher (cast to your subclass)
> * during Scorer creation, the Scorer must be passed the result of 
> YourSearcher.getIndexReaderBase(reader)
> * Scorer can now rebase any collected docids using this offset
> Example implementation of getIndexReaderBase():
> {code}
> // NOTE: a more efficient implementation can be done if you cache the result of 
> gatherSubReaders in your constructor
> public int getIndexReaderBase(IndexReader reader) {
>   if (reader == getReader()) {
> return 0;
>   } else {
> List readers = new ArrayList();
> gatherSubReaders(readers);
> Iterator iter = readers.iterator();
> int maxDoc = 0;
> while (iter.hasNext()) {
>   IndexReader r = (IndexReader)iter.next();
>   if (r == reader) {
> return maxDoc;
>   } 
>   maxDoc += r.maxDoc();
> } 
>   }
>   return -1; // reader not in searcher
> }
> {code}
> Notes:
> * This workaround makes it so you cannot serialize your custom Weight 
> implementation

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Assigned: (LUCENE-1845) if the build fails to download JARs for contrib/db, just skip its tests

2009-08-23 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer reassigned LUCENE-1845:
---

Assignee: Simon Willnauer

> if the build fails to download JARs for contrib/db, just skip its tests
> ---
>
> Key: LUCENE-1845
> URL: https://issues.apache.org/jira/browse/LUCENE-1845
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Simon Willnauer
>Priority: Minor
> Attachments: LUCENE-1845.txt
>
>
> Every so often our nightly build fails because contrib/db is unable to 
> download the necessary BDB JARs from http://downloads.osafoundation.org.  I 
> think in such cases we should simply skip contrib/db's tests, if it's the 
> nightly build that's running, since it's a false positive failure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: Finishing Lucene 2.9

2009-08-23 Thread Michael McCandless
Right, this (you can jump to 2.9, fix all deprecations, then easily
move to 3.0 and see no deprecations) is my understanding too, but I
don't see what's particularly useful about that.  It does produce a
Lucene release that has zero deprecated APIs (assuming we remove all
of them), but I don't think that's very important.  Also, it's extra work
having to do a "no-op, except for deprecations removal and generics
addition" release :)

Vs say taking our time creating 3.0, letting it have real features,
etc.

Or, another option would be to simply release 3.0 next.  After all,
there are some seriously major changes in this release, compilation
breakage, etc. ... things you'd more expect (of "traditional"
software) in a .0 release.  And, then state clearly that all
deprecated APIs in 3.0 will be removed in 3.1.  While this is
technically a change to our back-compat policy, it's also just a
number-shifting game since it would just be a rename
(2.9 becomes 3.0; 3.0 becomes 3.1).

Mike

On Thu, Aug 20, 2009 at 8:58 AM, Mark Miller wrote:
> Michael McCandless wrote:
>> On Wed, Aug 19, 2009 at 6:21 PM, Mark Miller wrote:
>>
>>
>>> I forgot about this oddity. It's so weird. It's like we are doing two
>>> releases on top of each other - it just seems confusing.
>>>
>>
>> I'm also not wed to the "fast turnaround" (remove deprecations, switch
>> to generics) 3.0 release.
>>
>> We could, instead, take our time doing the 3.0 release, ie let it
>> include new features too.
>>
>> I thought I had read a motivation for the 1.9 -> 2.0 fast turnaround,
>> but I can't remember it nor find it now...
>>
>> Mike
>>
>> -
>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>>
>>
> I thought the motivation was to provide a clean upgrade path with the
> deprecations - you move to 2.9 and move off all the deprecated methods
> - then you move to 3.0 and you're good with no deprecations. I'd guess the
> worry is that new features in 3.0 would add new deprecations and it's not
> quite so clean?
>
> Personally, I think that's fine though. New deprecations will come in 3.1
> anyway. You can still move everything in 2.9, and then move to 3.0 - so
> what if something else is now deprecated? You can move again or wait for
> 3.9 to move ...
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
> -
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1845) if the build fails to download JARs for contrib/db, just skip its tests

2009-08-23 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746640#action_12746640
 ] 

Simon Willnauer commented on LUCENE-1845:
-

Weird! - I changed the URL to http://foo.bar and ant test succeeds with the 
expected message. I guess you changed the get url in bdb-je/build.xml, but this 
file (the je.jar) is not the cause of this issue, unless I got something wrong. 
I thought this issue is caused by the fact that 
http://downloads.osafoundation.org/db/db-${db.version}.jar is not available on 
a regular basis. That's why I did not patch this sub-module.

simon

> if the build fails to download JARs for contrib/db, just skip its tests
> ---
>
> Key: LUCENE-1845
> URL: https://issues.apache.org/jira/browse/LUCENE-1845
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Simon Willnauer
>Priority: Minor
> Attachments: LUCENE-1845.txt
>
>
> Every so often our nightly build fails because contrib/db is unable to 
> download the necessary BDB JARs from http://downloads.osafoundation.org.  I 
> think in such cases we should simply skip contrib/db's tests, if it's the 
> nightly build that's running, since it's a false positive failure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1845) if the build fails to download JARs for contrib/db, just skip its tests

2009-08-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746642#action_12746642
 ] 

Michael McCandless commented on LUCENE-1845:


Aha, you're right!  Sorry about the confusion.

OK so this is good to go.  Can you commit?

> if the build fails to download JARs for contrib/db, just skip its tests
> ---
>
> Key: LUCENE-1845
> URL: https://issues.apache.org/jira/browse/LUCENE-1845
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Simon Willnauer
>Priority: Minor
> Attachments: LUCENE-1845.txt
>
>
> Every so often our nightly build fails because contrib/db is unable to 
> download the necessary BDB JARs from http://downloads.osafoundation.org.  I 
> think in such cases we should simply skip contrib/db's tests, if it's the 
> nightly build that's running, since it's a false positive failure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: Finishing Lucene 2.9

2009-08-23 Thread Mark Miller
I'm still +1 on calling this 3.0 as I was before when you mentioned it.
It's a wake-up call that the upgrade is a bit major in certain areas.

In either case - 3.0 is more representative of what this release is IMO.

I also think we should allow new features in 3.0 if we release this as 2.9.

- Mark


Michael McCandless wrote:
> Right, this (you can jump to 2.9, fix all deprecations, then easily
> move to 3.0 and see no deprecations) is my understanding too, but I
> don't see what's particularly useful about that.  It does produce a
> Lucene release that has zero deprecated APIs (assuming we remove all
> of them), but I don't think that's very important.  Also, it's extra work
> having to do a "no-op, except for deprecations removal and generics
> addition" release :)
>
> Vs say taking our time creating 3.0, letting it have real features,
> etc.
>
> Or, another option would be to simply release 3.0 next.  After all,
> there are some seriously major changes in this release, compilation
> breakage, etc. ... things you'd more expect (of "traditional"
> software) in a .0 release.  And, then state clearly that all
> deprecated APIs in 3.0 will be removed in 3.1.  While this is
> technically a change to our back-compat policy, it's also just a
> number-shifting game since it would just be a rename
> (2.9 becomes 3.0; 3.0 becomes 3.1).
>
> Mike
>
> On Thu, Aug 20, 2009 at 8:58 AM, Mark Miller wrote:
>   
>> Michael McCandless wrote:
>> 
>>> On Wed, Aug 19, 2009 at 6:21 PM, Mark Miller wrote:
>>>
>>>
>>>
>>>> I forgot about this oddity. It's so weird. It's like we are doing two
>>>> releases on top of each other - it just seems confusing.
>>>>
>>>
>>> I'm also not wed to the "fast turnaround" (remove deprecations, switch
>>> to generics) 3.0 release.
>>>
>>> We could, instead, take our time doing the 3.0 release, ie let it
>>> include new features too.
>>>
>>> I thought I had read a motivation for the 1.9 -> 2.0 fast turnaround,
>>> but I can't remember it nor find it now...
>>>
>>> Mike
>>>
>>> -
>>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>>>
>>>
>>>   
>> I thought the motivation was to provide a clean upgrade path with the
>> deprecations - you move to 2.9 and move off all the deprecated methods
>> - then you move to 3.0 and you're good with no deprecations. I'd guess the
>> worry is that new features in 3.0 would add new deprecations and it's not
>> quite so clean?
>>
>> Personally, I think that's fine though. New deprecations will come in 3.1
>> anyway. You can still move everything in 2.9, and then move to 3.0 - so
>> what if something else is now deprecated? You can move again or wait for
>> 3.9 to move ...
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>>
>> -
>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>>
>>
>> 
>
> -
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>
>   


-- 
- Mark

http://www.lucidimagination.com




-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"

2009-08-23 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746643#action_12746643
 ] 

Tim Smith commented on LUCENE-1821:
---

Lots of new comments to respond to :)
Will try to cover them all.

bq. decent comparator (StringOrdValComparator) that operates per segment.

Still, the StringOrdValComparator will have to break down and call 
String.equals() whenever it compares docs in different IndexReaders.
It also has to do more maintenance in general than would be needed for just a 
StringOrd comparator with a cache across all IndexReaders.
While the StringOrdValComparator may be faster in 2.9 than string sorting in 
2.4, it's not as fast as it could be if the cache were created at the 
IndexSearcher level.
I looked at the new string sorting stuff last week, and it looks pretty smart 
about reducing the number of String.equals() calls needed, but this adds extra 
complexity and will still fall back to String.equals() calls, which will 
translate to slower sorting than could be possible.

bq. one option might be to subclass DirectoryReader 

The idea of this is to disable per-segment searching?
I don't actually want to do that. I want to use per-segment searching to take 
advantage of caches on a per-segment basis where possible, and map docs to the 
IndexSearcher context when I can't do per-segment caching.

bq. Could you compute the top-level ords, but then break it up per-segment?

I think I see what you're getting at here, and I've already thought of this as 
a potential solution. The cache will always need to be created at the topmost 
level, but it will be pre-broken out into a per-segment cache whose context is 
the top-level IndexSearcher/MultiReader. The biggest problem here is the 
complexity of actually creating such a cache, which I'm sure will translate to 
the cache loading slower (hard to say how much slower without implementing).
I do plan to try this approach, but I expect this will be at least a week or 
two out from now.

I've currently updated my code for this to work per-segment by adding the 
docBase when performing the lookup into this cache (which is per-IndexSearcher).
I did this using the getIndexReaderBase() function I added to my subclass of 
IndexSearcher, at Scorer construction time. (I can live with this, however I 
would like to see getIndexReaderBase() added to IndexSearcher, and the 
IndexSearcher passed to Weight.scorer(), so I don't need to hold onto my 
IndexSearcher subclass in my Weight implementation.)

bq. just return the "virtual" per-segment DocIdSet.

That's what I'm doing now. I use the docid base for the IndexReader, along with 
its maxDoc, to have the Scorer represent a virtual slice for just the segment 
in question.
The only real problem here is that during Scorer initialization I have 
to call fullDocIdSetIter.advance(docBase) in the Scorer constructor. If 
advance(int) for the DocIdSet in question is O(N), this adds an extra penalty 
per segment that did not exist before.

bq. this isn't a long-term solution, since the order in which Lucene visits the 
readers isn't in general guaranteed

That's where IndexSearcher.getIndexReaderBase(IndexReader) comes into play. If 
you call this in your scorer to get the docBase, it doesn't matter what order 
the segments are searched in (it'll always return the proper base, in the 
context of that IndexSearcher).


Here's another potential thought (very rough, haven't consulted code to see how 
feasible this is): what if Similarity had a method called 
getDocIdBase(IndexReader)?
Then the searcher implementation could wrap the provided Similarity to provide 
the proper calculation. Similarity is already passed through the whole chain of 
Weight creation and is passed into the Scorer.
Obviously, a Query implementation can completely drop the passing of the 
Searcher's similarity and drop in its own (but this would mean it doesn't care 
about getting these docid bases).
I think this approach would potentially resolve all the MultiSearcher 
difficulties.
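
Very rough sketch of what I mean (getDocIdBase() is of course not real API; 
SimilarityDelegator just makes the wrapping cheap, and the docBases map would 
be built by the searcher):

{code}
import java.util.Map;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Similarity;
import org.apache.lucene.search.SimilarityDelegator;

public class DocBaseSimilarity extends SimilarityDelegator {
  private final Map docBases; // IndexReader -> Integer, built by the searcher

  public DocBaseSimilarity(Similarity delegate, Map docBases) {
    super(delegate);
    this.docBases = docBases;
  }

  /** Hypothetical new method: 0 if the reader is unknown to this searcher. */
  public int getDocIdBase(IndexReader reader) {
    Integer base = (Integer) docBases.get(reader);
    return base == null ? 0 : base.intValue();
  }
}
{code}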







> Weight.scorer() not passed doc offset for "sub reader"
> --
>
> Key: LUCENE-1821
> URL: https://issues.apache.org/jira/browse/LUCENE-1821
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.9
>Reporter: Tim Smith
> Fix For: 2.9
>
> Attachments: LUCENE-1821.patch
>
>
> Now that searching is done on a per segment basis, there is no way for a 
> Scorer to know the "actual" doc id for the documents it matches (only the 
> relative doc offset into the segment)
> If using caches in your scorer that are based on the "entire" index (all 
> segments), there is now no way to index into them properly from inside a 
> Scorer because the scorer is not passed the needed

Re: Finishing Lucene 2.9

2009-08-23 Thread Robert Muir
just wanted to mention this (I honestly don't have any opinion either way):

> Right, this (you can jump to 2.9, fix all deprecations, then easily
> move to 3.0 and see no deprecations) is my understanding too, but I
> don't see what's particularly useful about that.  It does produce a
> Lucene release that has zero deprecated APIs (assuming we remove all
> of them), but I don't think that's very important.  Also, it's extra work
> having to do a "no-op, except for deprecations removal and generics
> addition" release :)

But isn't it also true it could be a bit more than a no-op:
1) changing to "better" defaults in cases where back compat prevents
this. I think I remember a few of these?
2) bugfixes found after release of 2.9
3) performance improvements, not just from #1 but also from removal of
back-compat shims (i.e. tokenstream reflection)

I am not saying this stuff is important enough to users to merit a
release, but I don't think it is a no-op either.

-- 
Robert Muir
rcm...@gmail.com

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"

2009-08-23 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746644#action_12746644
 ] 

Yonik Seeley commented on LUCENE-1821:
--

bq. This is a good point... Yonik, how [in general!] is Solr handling the 
cutover to per-segment, for faceting?

It doesn't.  Faceting is not connected to searching in Solr, and is only done 
at the top level IndexReader.
We obviously want to enable per-segment faceting in the future, for better NRT support - 
with the expected disadvantage that it will be somewhat slower for some types 
of facets.  I imagine we will keep the top-level faceting as an option because 
there will be tradeoffs.

> Weight.scorer() not passed doc offset for "sub reader"
> --
>
> Key: LUCENE-1821
> URL: https://issues.apache.org/jira/browse/LUCENE-1821
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.9
>Reporter: Tim Smith
> Fix For: 2.9
>
> Attachments: LUCENE-1821.patch
>
>
> Now that searching is done on a per segment basis, there is no way for a 
> Scorer to know the "actual" doc id for the documents it matches (only the 
> relative doc offset into the segment)
> If using caches in your scorer that are based on the "entire" index (all 
> segments), there is now no way to index into them properly from inside a 
> Scorer because the scorer is not passed the needed offset to calculate the 
> "real" docid
> suggest having the Weight.scorer() method also take an integer for the doc offset
> Abstract Weight class should have a constructor that takes this offset as 
> well as a method to get the offset
> All Weights that have "sub" weights must pass this offset down to created 
> "sub" weights
> Details on workaround:
> In order to work around this, you must do the following:
> * Subclass IndexSearcher
> * Add "int getIndexReaderBase(IndexReader)" method to your subclass
> * during Weight creation, the Weight must hold onto a reference to the passed 
> in Searcher (cast to your subclass)
> * during Scorer creation, the Scorer must be passed the result of 
> YourSearcher.getIndexReaderBase(reader)
> * Scorer can now rebase any collected docids using this offset
> Example implementation of getIndexReaderBase():
> {code}
> // NOTE: a more efficient implementation can be done if you cache the result of 
> gatherSubReaders in your constructor
> public int getIndexReaderBase(IndexReader reader) {
>   if (reader == getReader()) {
> return 0;
>   } else {
> List readers = new ArrayList();
> gatherSubReaders(readers);
> Iterator iter = readers.iterator();
> int maxDoc = 0;
> while (iter.hasNext()) {
>   IndexReader r = (IndexReader)iter.next();
>   if (r == reader) {
> return maxDoc;
>   } 
>   maxDoc += r.maxDoc();
> } 
>   }
>   return -1; // reader not in searcher
> }
> {code}
> Notes:
> * This workaround makes it so you cannot serialize your custom Weight 
> implementation

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"

2009-08-23 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746645#action_12746645
 ] 

Yonik Seeley commented on LUCENE-1821:
--

bq. I say we push this issue from 2.9 for now.

+1 


> Weight.scorer() not passed doc offset for "sub reader"
> --
>
> Key: LUCENE-1821
> URL: https://issues.apache.org/jira/browse/LUCENE-1821
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.9
>Reporter: Tim Smith
> Fix For: 2.9
>
> Attachments: LUCENE-1821.patch
>
>
> Now that searching is done on a per segment basis, there is no way for a 
> Scorer to know the "actual" doc id for the documents it matches (only the 
> relative doc offset into the segment)
> If using caches in your scorer that are based on the "entire" index (all 
> segments), there is now no way to index into them properly from inside a 
> Scorer because the scorer is not passed the needed offset to calculate the 
> "real" docid
> suggest having Weight.scorer() method also take a integer for the doc offset
> Abstract Weight class should have a constructor that takes this offset as 
> well as a method to get the offset
> All Weights that have "sub" weights must pass this offset down to created 
> "sub" weights
> Details on workaround:
> In order to work around this, you must do the following:
> * Subclass IndexSearcher
> * Add "int getIndexReaderBase(IndexReader)" method to your subclass
> * during Weight creation, the Weight must hold onto a reference to the passed 
> in Searcher (casted to your sub class)
> * during Scorer creation, the Scorer must be passed the result of 
> YourSearcher.getIndexReaderBase(reader)
> * Scorer can now rebase any collected docids using this offset
> Example implementation of getIndexReaderBase():
> {code}
> // NOTE: a more efficient implementation can be done if you cache the result
> // of gatherSubReaders in your constructor
> public int getIndexReaderBase(IndexReader reader) {
>   if (reader == getReader()) {
>     return 0;
>   } else {
>     List readers = new ArrayList();
>     gatherSubReaders(readers);
>     Iterator iter = readers.iterator();
>     int maxDoc = 0;
>     while (iter.hasNext()) {
>       IndexReader r = (IndexReader) iter.next();
>       if (r == reader) {
>         return maxDoc;
>       }
>       maxDoc += r.maxDoc();
>     }
>   }
>   return -1; // reader not in searcher
> }
> {code}
> Notes:
> * This workaround makes it so you cannot serialize your custom Weight 
> implementation

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1845) if the build fails to download JARs for contrib/db, just skip its tests

2009-08-23 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746646#action_12746646
 ] 

Simon Willnauer commented on LUCENE-1845:
-

bq. OK so this is good to go. Can you commit?

will do!

> if the build fails to download JARs for contrib/db, just skip its tests
> ---
>
> Key: LUCENE-1845
> URL: https://issues.apache.org/jira/browse/LUCENE-1845
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Simon Willnauer
>Priority: Minor
> Attachments: LUCENE-1845.txt
>
>
> Every so often our nightly build fails because contrib/db is unable to 
> download the necessary BDB JARs from http://downloads.osafoundation.org.  I 
> think in such cases we should simply skip contrib/db's tests, if it's the 
> nightly build that's running, since it's a false positive failure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1837) Remove Searcher from explain and idf/maxDoc info from explain

2009-08-23 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-1837:


Attachment: LUCENE-1837.patch

Okay, very rough patch. No concern for back compat or anything.

Added:

a placeholder class:

{code}
  public static abstract class SimExplain {
    abstract float getIdf();
    abstract String explain();
  }
{code}

{code}
  public SimExplain idfExplain(Term term, Searcher searcher) throws IOException
{code}

{code}
  public SimExplain idfExplain(Collection terms, Searcher searcher) throws IOException
{code}

Removed Searcher from explain method.

So I think this is the right path - still a few hoops to jump through though, 
and still some ugliness I've left in.
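
For illustration, the Term variant might then look something like this (a 
sketch against the placeholder above, not code from the patch; idf(int, int) 
is the existing Similarity method):

{code}
  // inside Similarity: capture the idf value plus its explanation text,
  // so Weight.explain() no longer needs the Searcher later on
  public SimExplain idfExplain(Term term, Searcher searcher) throws IOException {
    final int df = searcher.docFreq(term);
    final int max = searcher.maxDoc();
    final float idfValue = idf(df, max);
    return new SimExplain() {
      float getIdf() { return idfValue; }
      String explain() { return "idf(docFreq=" + df + ", maxDocs=" + max + ")"; }
    };
  }
{code}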

> Remove Searcher from explain and idf/maxDoc info from explain
> -
>
> Key: LUCENE-1837
> URL: https://issues.apache.org/jira/browse/LUCENE-1837
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 2.9
>
> Attachments: LUCENE-1837.patch
>
>
> these changes (starting with the TermWeight idf/maxDoc info) were illegal IMO 
> - I think they need to be rolled back/out.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: Finishing Lucene 2.9

2009-08-23 Thread Simon Willnauer
On Sun, Aug 23, 2009 at 7:38 PM, Robert Muir wrote:
> just wanted to mention this (i honestly don't have any opinion either way):
>
>> Right, this (you can jump to 2.9, fix all deprecations, then easily
>> move to 3.0 and see no deprecations) is my understanding too, but I
>> don't see what's particularly useful about that.  It does produce a
>> Lucene release that has zero deprecated APIs (assuming we remove all
>> of them), but I don't think that's very important.  Also, it's extra work
>> having to do a "no-op, except for deprecations removal and generics
>> addition" release :)
>
> But isn't it also true it could be a bit more than no-op:
> 1) changing to "better" defaults in cases where back compat prevents
> this. I think I remember a few of these?
> 2) bugfixes found after release of 2.9
> 3) performance improvements, not just from #1 but also from removal of
> back-compat shims (i.e. tokenstream reflection)
>
> I am not saying this stuff is really important to users to merit a
> release, but I don't think it is a no-op either.

I agree with Robert that this is very likely not to be a no-op
release. Changing to 1.5 brings in generics and lots of other stuff
which could bring improvements: all the concurrency improvements,
varargs, and utilities in classes like Integer (valueOf), etc. I believe
we will find many places in the code where existing stuff could be
improved with the ability to commit 1.5 code.
Moving to 1.5 with 3.0 would be a clean step in my eyes. Having 3.0
with 1.4 back-compat and then a 3.1 which gets rid of it would confuse
users.

simon
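
For example, the kind of cleanup this enables (just an illustration; reader
stands for any IndexReader):

{code}
// 1.4 style, as the code looks today (raw types, manual boxing):
List readers = new ArrayList();
readers.add(reader);
IndexReader r = (IndexReader) readers.get(0);   // cast required
Integer boxed = new Integer(42);                // always allocates

// the 1.5 equivalent is type-safe and avoids the cast and the allocation:
List<IndexReader> readers5 = new ArrayList<IndexReader>();
readers5.add(reader);
IndexReader r5 = readers5.get(0);               // no cast
Integer boxed5 = Integer.valueOf(42);           // may reuse a cached instance
{code}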
>
> --
> Robert Muir
> rcm...@gmail.com
>
> -
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1837) Remove Searcher from explain

2009-08-23 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-1837:


Lucene Fields: [New, Patch Available]  (was: [New])
  Summary: Remove Searcher from explain  (was: Remove Searcher from 
explain and idf/maxDoc info from explain)

> Remove Searcher from explain
> 
>
> Key: LUCENE-1837
> URL: https://issues.apache.org/jira/browse/LUCENE-1837
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 2.9
>
> Attachments: LUCENE-1837.patch
>
>
> these changes (starting with the TermWeight idf/maxDoc info) were illegal IMO 
> - I think they need to be rolled back/out.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1837) Remove Searcher from Weight#explain

2009-08-23 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-1837:


Description: Explain needs to calculate corpus wide stats in a way that is 
consistent with MultiSearcher.  (was: these changes (starting with the 
TermWeight idf/maxDoc info) were illegal IMO - I think they need to be rolled 
back/out.)
Summary: Remove Searcher from Weight#explain  (was: Remove Searcher 
from explain)

> Remove Searcher from Weight#explain
> ---
>
> Key: LUCENE-1837
> URL: https://issues.apache.org/jira/browse/LUCENE-1837
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 2.9
>
> Attachments: LUCENE-1837.patch
>
>
> Explain needs to calculate corpus wide stats in a way that is consistent with 
> MultiSearcher.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: Finishing Lucene 2.9

2009-08-23 Thread Mark Miller
Simon Willnauer wrote:
>
>  Having 3.0
> with 1.4 back-compat and then 3.1 which get rid of this would confuse
> users.
>
> simon
>   
>
If that was really a concern (and we decided to jump to 3.0), we could
just say this 3.0 release requires Java 1.5 - 3.0 and beyond can still
be considered Java 1.5, even though 3.0 itself still happens to run on
Java 1.4. We are not going to convert *everything* to Java 1.5 when we
move to it in the first release. We also don't have to convert anything
to say we now require it.

Personally, I wouldn't be too worried about it either way. Following
Changes correctly and with a solid understanding is 100 times more
difficult and confusing.

-- 
- Mark

http://www.lucidimagination.com




-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"

2009-08-23 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-1821:


Fix Version/s: (was: 2.9)

I'm going to push it out for now. Of course, feel free to argue for its 
re-inclusion.

> Weight.scorer() not passed doc offset for "sub reader"
> --
>
> Key: LUCENE-1821
> URL: https://issues.apache.org/jira/browse/LUCENE-1821
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.9
>Reporter: Tim Smith
> Attachments: LUCENE-1821.patch
>
>
> Now that searching is done on a per-segment basis, there is no way for a 
> Scorer to know the "actual" doc id for the documents it matches (only the 
> relative doc offset into the segment).
> If using caches in your scorer that are based on the "entire" index (all 
> segments), there is now no way to index into them properly from inside a 
> Scorer because the scorer is not passed the offset needed to calculate the 
> "real" docid.
> Suggest having the Weight.scorer() method also take an integer for the doc 
> offset.
> The abstract Weight class should have a constructor that takes this offset as 
> well as a method to get the offset.
> All Weights that have "sub" weights must pass this offset down to created 
> "sub" weights.
> Details on the workaround:
> In order to work around this, you must do the following:
> * Subclass IndexSearcher
> * Add an "int getIndexReaderBase(IndexReader)" method to your subclass
> * During Weight creation, the Weight must hold onto a reference to the 
> passed-in Searcher (cast to your subclass)
> * During Scorer creation, the Scorer must be passed the result of 
> YourSearcher.getIndexReaderBase(reader)
> * The Scorer can now rebase any collected docids using this offset
> Example implementation of getIndexReaderBase():
> {code}
> // NOTE: a more efficient implementation can be done if you cache the result
> // of gatherSubReaders in your constructor
> public int getIndexReaderBase(IndexReader reader) {
>   if (reader == getReader()) {
>     return 0;
>   } else {
>     List readers = new ArrayList();
>     gatherSubReaders(readers);
>     Iterator iter = readers.iterator();
>     int maxDoc = 0;
>     while (iter.hasNext()) {
>       IndexReader r = (IndexReader) iter.next();
>       if (r == reader) {
>         return maxDoc;
>       }
>       maxDoc += r.maxDoc();
>     }
>   }
>   return -1; // reader not in searcher
> }
> {code}
> Notes:
> * This workaround makes it so you cannot serialize your custom Weight 
> implementation

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"

2009-08-23 Thread Tim Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746662#action_12746662
 ] 

Tim Smith commented on LUCENE-1821:
---

Can I at least argue for it being tagged for 3.0 or 3.1 (just so it gets looked 
at again prior to the next releases)?

I have workarounds for 2.9, so I'm OK with it not getting in then (I just want 
to make sure my use cases won't be made impossible in future releases).

> Weight.scorer() not passed doc offset for "sub reader"
> --
>
> Key: LUCENE-1821
> URL: https://issues.apache.org/jira/browse/LUCENE-1821
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.9
>Reporter: Tim Smith
> Attachments: LUCENE-1821.patch
>
>
> Now that searching is done on a per-segment basis, there is no way for a 
> Scorer to know the "actual" doc id for the documents it matches (only the 
> relative doc offset into the segment).
> If using caches in your scorer that are based on the "entire" index (all 
> segments), there is now no way to index into them properly from inside a 
> Scorer because the scorer is not passed the offset needed to calculate the 
> "real" docid.
> Suggest having the Weight.scorer() method also take an integer for the doc 
> offset.
> The abstract Weight class should have a constructor that takes this offset as 
> well as a method to get the offset.
> All Weights that have "sub" weights must pass this offset down to created 
> "sub" weights.
> Details on the workaround:
> In order to work around this, you must do the following:
> * Subclass IndexSearcher
> * Add an "int getIndexReaderBase(IndexReader)" method to your subclass
> * During Weight creation, the Weight must hold onto a reference to the 
> passed-in Searcher (cast to your subclass)
> * During Scorer creation, the Scorer must be passed the result of 
> YourSearcher.getIndexReaderBase(reader)
> * The Scorer can now rebase any collected docids using this offset
> Example implementation of getIndexReaderBase():
> {code}
> // NOTE: a more efficient implementation can be done if you cache the result
> // of gatherSubReaders in your constructor
> public int getIndexReaderBase(IndexReader reader) {
>   if (reader == getReader()) {
>     return 0;
>   } else {
>     List readers = new ArrayList();
>     gatherSubReaders(readers);
>     Iterator iter = readers.iterator();
>     int maxDoc = 0;
>     while (iter.hasNext()) {
>       IndexReader r = (IndexReader) iter.next();
>       if (r == reader) {
>         return maxDoc;
>       }
>       maxDoc += r.maxDoc();
>     }
>   }
>   return -1; // reader not in searcher
> }
> {code}
> Notes:
> * This workaround makes it so you cannot serialize your custom Weight 
> implementation

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1845) if the build fails to download JARs for contrib/db, just skip its tests

2009-08-23 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-1845:


Attachment: LUCENE-1845.txt

Mike, I attached a new patch. The old one had some problems with the sanity 
check, as the check needs the jar. 
This one will work for unit tests, but it will fail if ant tries to run 
compile-core during a build, jar, release, etc.

How should we handle it if the jar cannot be obtained? I would rather say the 
build must fail, as if we do a release build the jar will not be included. 
Would it be an option to put the jar at some other location, maybe on a 
committer page on people.apache.org?

simon

> if the build fails to download JARs for contrib/db, just skip its tests
> ---
>
> Key: LUCENE-1845
> URL: https://issues.apache.org/jira/browse/LUCENE-1845
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Simon Willnauer
>Priority: Minor
> Attachments: LUCENE-1845.txt, LUCENE-1845.txt
>
>
> Every so often our nightly build fails because contrib/db is unable to 
> download the necessary BDB JARs from http://downloads.osafoundation.org.  I 
> think in such cases we should simply skip contrib/db's tests, if it's the 
> nightly build that's running, since it's a false positive failure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Lucene 3.0 and Java 5 (was Re: Finishing Lucene 2.9)

2009-08-23 Thread DM Smith


On Aug 23, 2009, at 2:06 PM, Simon Willnauer wrote:


> On Sun, Aug 23, 2009 at 7:38 PM, Robert Muir wrote:
>> just wanted to mention this (i honestly don't have any opinion either way):
>>
>>> Right, this (you can jump to 2.9, fix all deprecations, then easily
>>> move to 3.0 and see no deprecations) is my understanding too, but I
>>> don't see what's particularly useful about that.  It does produce a
>>> Lucene release that has zero deprecated APIs (assuming we remove all
>>> of them), but I don't think that's very important.  Also, it's extra work
>>> having to do a "no-op, except for deprecations removal and generics
>>> addition" release :)
>>
>> But isn't it also true it could be a bit more than no-op:
>> 1) changing to "better" defaults in cases where back compat prevents
>> this. I think I remember a few of these?
>> 2) bugfixes found after release of 2.9
>> 3) performance improvements, not just from #1 but also from removal of
>> back-compat shims (i.e. tokenstream reflection)
>>
>> I am not saying this stuff is really important to users to merit a
>> release, but I don't think it is a no-op either.
>
> I agree with Robert that this is very likely not to be a no-op
> release. Changing to 1.5 brings in generics and lots of other stuff
> which could bring improvements: all the concurrency improvements,
> varargs, and utilities in classes like Integer (valueOf), etc. I believe
> we will find many places in the code where existing stuff could be
> improved with the ability to commit 1.5 code.
> Moving to 1.5 with 3.0 would be a clean step in my eyes. Having 3.0
> with 1.4 back-compat and then a 3.1 which gets rid of it would confuse
> users.


My two cents. I think the contract of the 3.0 release is that it is a 
drop-in replacement for the 2.9 release but requires Java 1.5. I 
expect to compile against Lucene 2.9 using Java 1.4, removing 
deprecations, and then go to Lucene 3.0, changing the compiler to Java 
1.5 but making no code changes.


To that end, any introduction of Java 1.5 into the end-user/non-expert/ 
non-experimental/non-contrib API needs to work with existing code as  
is. It may require the user to compile with lax permissions using Java  
1.5 and run with Java 1.5.


Requiring Java 1.5 can be as easy as using a Java 1.5 feature  
internally, in the expert or experimental APIs, and classes that are  
not part of the backward compatibility contract (e.g. utility classes).


I don't think there should be any effort to maintain Java 1.4 
compatibility, but I also think changes should be made only where it 
makes sense, giving a clear advantage (performance, 
maintainability, ...). If that results in 1.4 compatibility, it is a 
temporary benefit not guaranteed during the 3.x series.


I agree with previous threads that there is both a blessing and a  
curse with Lucene's backward compatibility release policy. My biggest  
gripe is the evolution toward bad class names. I would like to see a  
4.0 release dedicated to fixing the name/api problems and making the  
API of Lucene be what it should have been for a 3.0 release. I'd also 
suggest that the repackaging proposed in a prior thread be tackled as 
well. This could follow a 3.0 release quickly.


-- DM Smith


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1845) if the build fails to download JARs for contrib/db, just skip its tests

2009-08-23 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-1845:


Attachment: LUCENE-1845.txt

this time with ASF licence grant

> if the build fails to download JARs for contrib/db, just skip its tests
> ---
>
> Key: LUCENE-1845
> URL: https://issues.apache.org/jira/browse/LUCENE-1845
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Simon Willnauer
>Priority: Minor
> Attachments: LUCENE-1845.txt, LUCENE-1845.txt, LUCENE-1845.txt
>
>
> Every so often our nightly build fails because contrib/db is unable to 
> download the necessary BDB JARs from http://downloads.osafoundation.org.  I 
> think in such cases we should simply skip contrib/db's tests, if it's the 
> nightly build that's running, since it's a false positive failure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"

2009-08-23 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-1821:


Affects Version/s: (was: 2.9)
   3.1

Yeah, no problem - tag whatever you'd like - I only went with nothing because it 
was the easiest default move.

With the current plan (subject to change), the earliest it could be considered 
again is 3.1, so I'll move it there.

> Weight.scorer() not passed doc offset for "sub reader"
> --
>
> Key: LUCENE-1821
> URL: https://issues.apache.org/jira/browse/LUCENE-1821
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 3.1
>Reporter: Tim Smith
> Attachments: LUCENE-1821.patch
>
>
> Now that searching is done on a per-segment basis, there is no way for a 
> Scorer to know the "actual" doc id for the documents it matches (only the 
> relative doc offset into the segment).
> If using caches in your scorer that are based on the "entire" index (all 
> segments), there is now no way to index into them properly from inside a 
> Scorer because the scorer is not passed the offset needed to calculate the 
> "real" docid.
> Suggest having the Weight.scorer() method also take an integer for the doc 
> offset.
> The abstract Weight class should have a constructor that takes this offset as 
> well as a method to get the offset.
> All Weights that have "sub" weights must pass this offset down to created 
> "sub" weights.
> Details on the workaround:
> In order to work around this, you must do the following:
> * Subclass IndexSearcher
> * Add an "int getIndexReaderBase(IndexReader)" method to your subclass
> * During Weight creation, the Weight must hold onto a reference to the 
> passed-in Searcher (cast to your subclass)
> * During Scorer creation, the Scorer must be passed the result of 
> YourSearcher.getIndexReaderBase(reader)
> * The Scorer can now rebase any collected docids using this offset
> Example implementation of getIndexReaderBase():
> {code}
> // NOTE: a more efficient implementation can be done if you cache the result
> // of gatherSubReaders in your constructor
> public int getIndexReaderBase(IndexReader reader) {
>   if (reader == getReader()) {
>     return 0;
>   } else {
>     List readers = new ArrayList();
>     gatherSubReaders(readers);
>     Iterator iter = readers.iterator();
>     int maxDoc = 0;
>     while (iter.hasNext()) {
>       IndexReader r = (IndexReader) iter.next();
>       if (r == reader) {
>         return maxDoc;
>       }
>       maxDoc += r.maxDoc();
>     }
>   }
>   return -1; // reader not in searcher
> }
> {code}
> Notes:
> * This workaround makes it so you cannot serialize your custom Weight 
> implementation

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"

2009-08-23 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-1821:


Affects Version/s: (was: 3.1)
   2.9
Fix Version/s: 3.1

whoops - try the right thing this time

> Weight.scorer() not passed doc offset for "sub reader"
> --
>
> Key: LUCENE-1821
> URL: https://issues.apache.org/jira/browse/LUCENE-1821
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.9
>Reporter: Tim Smith
> Fix For: 3.1
>
> Attachments: LUCENE-1821.patch
>
>
> Now that searching is done on a per-segment basis, there is no way for a 
> Scorer to know the "actual" doc id for the documents it matches (only the 
> relative doc offset into the segment).
> If using caches in your scorer that are based on the "entire" index (all 
> segments), there is now no way to index into them properly from inside a 
> Scorer because the scorer is not passed the offset needed to calculate the 
> "real" docid.
> Suggest having the Weight.scorer() method also take an integer for the doc 
> offset.
> The abstract Weight class should have a constructor that takes this offset as 
> well as a method to get the offset.
> All Weights that have "sub" weights must pass this offset down to created 
> "sub" weights.
> Details on the workaround:
> In order to work around this, you must do the following:
> * Subclass IndexSearcher
> * Add an "int getIndexReaderBase(IndexReader)" method to your subclass
> * During Weight creation, the Weight must hold onto a reference to the 
> passed-in Searcher (cast to your subclass)
> * During Scorer creation, the Scorer must be passed the result of 
> YourSearcher.getIndexReaderBase(reader)
> * The Scorer can now rebase any collected docids using this offset
> Example implementation of getIndexReaderBase():
> {code}
> // NOTE: a more efficient implementation can be done if you cache the result
> // of gatherSubReaders in your constructor
> public int getIndexReaderBase(IndexReader reader) {
>   if (reader == getReader()) {
>     return 0;
>   } else {
>     List readers = new ArrayList();
>     gatherSubReaders(readers);
>     Iterator iter = readers.iterator();
>     int maxDoc = 0;
>     while (iter.hasNext()) {
>       IndexReader r = (IndexReader) iter.next();
>       if (r == reader) {
>         return maxDoc;
>       }
>       maxDoc += r.maxDoc();
>     }
>   }
>   return -1; // reader not in searcher
> }
> {code}
> Notes:
> * This workaround makes it so you cannot serialize your custom Weight 
> implementation

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Created: (LUCENE-1847) PhraseQuery uses IndexReader specific docFreqs in its explain

2009-08-23 Thread Mark Miller (JIRA)
PhraseQuery uses IndexReader specific docFreqs in its explain
-

 Key: LUCENE-1847
 URL: https://issues.apache.org/jira/browse/LUCENE-1847
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 2.9


As mentioned by Mike McCandless in LUCENE-1837.

Always been a bug with MultiSearcher, but per segment search makes it worse.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1847) PhraseQuery/TermQuery use IndexReader specific stats in their explains

2009-08-23 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-1847:


Description: 
PhraseQuery uses IndexReader in explain for top-level stats - as mentioned by 
Mike McCandless in LUCENE-1837.
TermQuery uses IndexReader in explain for top-level stats.

Always been a bug with MultiSearcher, but per segment search makes it worse.



  was:
As mentioned by Mike McCandless in LUCENE-1837.

Always been a bug with MultiSearcher, but per segment search makes it worse.



Summary: PhraseQuery/TermQuery use IndexReader specific stats in their 
explains  (was: PhraseQuery uses IndexReader specific docFreqs in its explain)

Okay - I'm going to use the other issue just to revert the Searcher - it's more 
of a task.

This issue can then be used to track the new work for this bug.

> PhraseQuery/TermQuery use IndexReader specific stats in their explains
> --
>
> Key: LUCENE-1847
> URL: https://issues.apache.org/jira/browse/LUCENE-1847
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Mark Miller
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 2.9
>
>
> PhraseQuery uses IndexReader in explain for top-level stats - as mentioned by 
> Mike McCandless in LUCENE-1837.
> TermQuery uses IndexReader in explain for top-level stats.
> Always been a bug with MultiSearcher, but per segment search makes it worse.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
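
The inconsistency is easy to see in a sketch (reader is one segment of the 
index the searcher is searching):

{code}
// explain() computed idf from the per-segment reader, while the score
// itself was computed from the top-level docFreq:
int segmentDf = reader.docFreq(term);      // what explain reported
int topLevelDf = searcher.docFreq(term);   // what the score actually used
// segmentDf <= topLevelDf, so Explanation.getValue() can disagree with score()
{code}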



[jira] Commented: (LUCENE-1837) Remove Searcher from Weight#explain

2009-08-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746671#action_12746671
 ] 

Mark Miller commented on LUCENE-1837:
-

I'm just going to revert the Searcher here - a fix for the bugs can be tracked 
in LUCENE-1847

> Remove Searcher from Weight#explain
> ---
>
> Key: LUCENE-1837
> URL: https://issues.apache.org/jira/browse/LUCENE-1837
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 2.9
>
> Attachments: LUCENE-1837.patch
>
>
> Explain needs to calculate corpus wide stats in a way that is consistent with 
> MultiSearcher.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1837) Remove Searcher from Weight#explain

2009-08-23 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-1837:


Attachment: LUCENE-1837.patch

> Remove Searcher from Weight#explain
> ---
>
> Key: LUCENE-1837
> URL: https://issues.apache.org/jira/browse/LUCENE-1837
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 2.9
>
> Attachments: LUCENE-1837.patch, LUCENE-1837.patch
>
>
> Explain needs to calculate corpus wide stats in a way that is consistent with 
> MultiSearcher.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1843) Convert some tests to new TokenStream API, better support of cross-impl AttributeImpl.copyTo()

2009-08-23 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1843:
--

Attachment: LUCENE-1846.patch

Patch that makes all contrib/analyzer tests that work with TokenStreams 
subclasses of BaseTokenStreamTestCase. This superclass now has a lot of utility 
methods to check TokenStreams using arrays of strings/ints.

This patch may still include some unused imports; I had no time to check this 
manually (I am the person that codes with Notepad...)

> Convert some tests to new TokenStream API, better support of cross-impl 
> AttributeImpl.copyTo()
> --
>
> Key: LUCENE-1843
> URL: https://issues.apache.org/jira/browse/LUCENE-1843
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Analysis
>Affects Versions: 2.9
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 2.9
>
> Attachments: LUCENE-1843.patch, LUCENE-1843.patch
>
>
> This patch converts some remaining tests to the new TokenStream API and 
> non-deprecated classes.
> This patch also enhances AttributeImpl.copyTo() of Token and TokenWrapper to 
> also support copying e.g. TermAttributeImpl into Token. The target impl need 
> only support all of the interfaces; it need not be of the same type. Token and 
> TokenWrapper use optimized copying without casting to 6 interfaces where 
> possible.
> Maybe the special tokenizers in contrib (shingle matrix and so on) that use 
> tokens to cache may be enhanced by that. Also Yonik's request for optimized 
> copying of states between incompatible AttributeSources may be enhanced by 
> that (possibly a new issue).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
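
The utility methods look roughly like this (a sketch with assumed names; the 
real signatures are in the patch):

{code}
// check a TokenStream against expected terms and offsets via the new API
public static void assertTokenStreamContents(TokenStream ts, String[] terms,
    int[] startOffsets, int[] endOffsets) throws IOException {
  TermAttribute termAtt = (TermAttribute) ts.addAttribute(TermAttribute.class);
  OffsetAttribute offsetAtt = (OffsetAttribute) ts.addAttribute(OffsetAttribute.class);
  for (int i = 0; i < terms.length; i++) {
    assertTrue("token " + i + " exists", ts.incrementToken());
    assertEquals(terms[i], termAtt.term());
    assertEquals(startOffsets[i], offsetAtt.startOffset());
    assertEquals(endOffsets[i], offsetAtt.endOffset());
  }
  assertFalse("end of stream", ts.incrementToken());
}
{code}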



[jira] Updated: (LUCENE-1843) Convert some tests to new TokenStream API, better support of cross-impl AttributeImpl.copyTo()

2009-08-23 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1843:
--

Attachment: (was: LUCENE-1846.patch)

> Convert some tests to new TokenStream API, better support of cross-impl 
> AttributeImpl.copyTo()
> --
>
> Key: LUCENE-1843
> URL: https://issues.apache.org/jira/browse/LUCENE-1843
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Analysis
>Affects Versions: 2.9
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 2.9
>
> Attachments: LUCENE-1843.patch, LUCENE-1843.patch
>
>
> This patch converts some remaining tests to the new TokenStream API and 
> non-deprecated classes.
> This patch also enhances AttributeImpl.copyTo() of Token and TokenWrapper to 
> also support copying e.g. TermAttributeImpl into Token. The target impl need 
> only support all of the interfaces; it need not be of the same type. Token and 
> TokenWrapper use optimized copying without casting to 6 interfaces where 
> possible.
> Maybe the special tokenizers in contrib (shingle matrix and so on) that use 
> tokens to cache may be enhanced by that. Also Yonik's request for optimized 
> copying of states between incompatible AttributeSources may be enhanced by 
> that (possibly a new issue).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1843) Convert some tests to new TokenStream API, better support of cross-impl AttributeImpl.copyTo()

2009-08-23 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1843:
--

Attachment: LUCENE-1843.patch

Now the right file.

Will commit tomorrow.

> Convert some tests to new TokenStream API, better support of cross-impl 
> AttributeImpl.copyTo()
> --
>
> Key: LUCENE-1843
> URL: https://issues.apache.org/jira/browse/LUCENE-1843
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Analysis
>Affects Versions: 2.9
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 2.9
>
> Attachments: LUCENE-1843.patch, LUCENE-1843.patch
>
>
> This patch converts some remaining tests to the new TokenStream API and 
> non-deprecated classes.
> This patch also enhances AttributeImpl.copyTo() of Token and TokenWrapper to 
> also support copying e.g. TermAttributeImpl into Token. The target impl need 
> only support all of the interfaces; it need not be of the same type. Token and 
> TokenWrapper use optimized copying without casting to 6 interfaces where 
> possible.
> Maybe the special tokenizers in contrib (shingle matrix and so on) that use 
> tokens to cache may be enhanced by that. Also Yonik's request for optimized 
> copying of states between incompatible AttributeSources may be enhanced by 
> that (possibly a new issue).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Issue Comment Edited: (LUCENE-1843) Convert some tests to new TokenStream API, better support of cross-impl AttributeImpl.copyTo()

2009-08-23 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746694#action_12746694
 ] 

Uwe Schindler edited comment on LUCENE-1843 at 8/23/09 4:35 PM:


Patch that makes all contrib/analyzer tests that work with TokenStreams 
subclasses of BaseTokenStreamTestCase. This superclass now has a lot of utility 
methods to check TokenStreams using arrays of strings/ints.

The patch also contains a better version of SingleTokenTokenStream, using the 
Token.copyTo() function and a Token/TokenWrapper instance as attribute 
implementation.

This patch may still include some unused imports; I had no time to check this 
manually (I am the person that codes with Notepad...)

  was (Author: thetaphi):
Patch that makes all contrib/analyzer tests that work with TokenStreams 
subclasses of BaseTokenStreamTestCase. This superclass now has a lot of utility 
methods to check TokenStreams using arrays of strings/ints.

This patch may still include some unused imports; I had no time to check this 
manually (I am the person that codes with Notepad...)
  
> Convert some tests to new TokenStream API, better support of cross-impl 
> AttributeImpl.copyTo()
> --
>
> Key: LUCENE-1843
> URL: https://issues.apache.org/jira/browse/LUCENE-1843
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Analysis
>Affects Versions: 2.9
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 2.9
>
> Attachments: LUCENE-1843.patch, LUCENE-1843.patch
>
>
> This patch converts some remaining tests to the new TokenStream API and 
> non-deprecated classes.
> This patch also enhances AttributeImpl.copyTo() of Token and TokenWrapper to 
> also support copying e.g. TermAttributeImpl into Token. The target impl need 
> only support all of the interfaces; it need not be of the same type. Token and 
> TokenWrapper use optimized copying without casting to 6 interfaces where 
> possible.
> Maybe the special tokenizers in contrib (shingle matrix and so on) that use 
> tokens to cache may be enhanced by that. Also Yonik's request for optimized 
> copying of states between incompatible AttributeSources may be enhanced by 
> that (possibly a new issue).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
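
A small illustration of the cross-impl copy this enables (a sketch of the 
described behavior):

{code}
// after the patch, copyTo() only requires the target to implement all of the
// source's attribute interfaces - it no longer has to be the same class:
TermAttributeImpl source = new TermAttributeImpl();
source.setTermBuffer("example");
Token target = new Token();   // Token implements TermAttribute (and the rest)
source.copyTo(target);        // previously this failed on the type mismatch
{code}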



[jira] Updated: (LUCENE-1843) Convert some tests to new TokenStream API, better support of cross-impl AttributeImpl.copyTo()

2009-08-23 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1843:
--

Attachment: LUCENE-1843.patch

- Small updates
- added the previously forgotten conversion of two filters in contrib/memory

Hope this is the last patch.

> Convert some tests to new TokenStream API, better support of cross-impl 
> AttributeImpl.copyTo()
> --
>
> Key: LUCENE-1843
> URL: https://issues.apache.org/jira/browse/LUCENE-1843
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Analysis
>Affects Versions: 2.9
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 2.9
>
> Attachments: LUCENE-1843.patch, LUCENE-1843.patch, LUCENE-1843.patch
>
>
> This patch converts some remaining tests to the new TokenStream API and 
> non-deprecated classes.
> This patch also enhances AttributeImpl.copyTo() of Token and TokenWrapper to 
> also support copying e.g. TermAttributeImpl into Token. The target impl need 
> only support all of the interfaces; it need not be of the same type. Token and 
> TokenWrapper use optimized copying without casting to 6 interfaces where 
> possible.
> Maybe the special tokenizers in contrib (shingle matrix and so on) that use 
> tokens to cache may be enhanced by that. Also Yonik's request for optimized 
> copying of states between incompatible AttributeSources may be enhanced by 
> that (possibly a new issue).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Build failed in Hudson: Lucene-trunk #927

2009-08-23 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/927/changes

Changes:

[uschindler] Fix small initialization bug in TermAttributeImpl.copyTo()

[uschindler] Fix small initialization bug in Token.copyTo()

[uschindler] LUCENE-1825: Another one :(

[uschindler] LUCENE-1825: Additional incorrect getAttribute usage

[rmuir] LUCENE-1826: the new tokenizer constructors should not allow deprecated 
charsets

[uschindler] Cleanup on tearDown to really reset the TokenStream API to the 
default

[uschindler] Change also the default LuceneTestCase to override runBare() 
instead of runTest(). This enables tests, to also monitor failures in random 
during setUp() and tearDown().

[buschmi] LUCENE-1826: Add constructors that take AttributeSource and 
AttributeFactory to all Tokenizer implementations.

[markrmiller] using entry set is faster than looping on key set when you use 
map.get(key) in loop
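
(As an aside, the entrySet change above in a nutshell - map.get(key) inside 
the loop repeats the hash lookup that entry iteration gets for free; use() is 
just a stand-in:)

{code}
// before: one extra hash lookup per iteration
Iterator it = map.keySet().iterator();
while (it.hasNext()) {
  Object key = it.next();
  use(key, map.get(key));
}

// after: key and value come from the same entry
it = map.entrySet().iterator();
while (it.hasNext()) {
  Map.Entry e = (Map.Entry) it.next();
  use(e.getKey(), e.getValue());
}
{code}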

--
[...truncated 3983 lines...]

clover:

compile-core:

jar-core:
  [jar] Building jar: 
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/analyzers/common/lucene-analyzers-2.9-SNAPSHOT.jar
 

default:

smartcn:
 [echo] Building smartcn...

javacc-uptodate-check:

javacc-notice:

jflex-uptodate-check:

jflex-notice:

common.init:

build-lucene:

build-lucene-tests:

init:

clover.setup:

clover.info:

clover:

compile-core:

jar-core:
  [jar] Building jar: 
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/analyzers/smartcn/lucene-smartcn-2.9-SNAPSHOT.jar
 

default:

default:

javacc-uptodate-check:

javacc-notice:

jflex-uptodate-check:

jflex-notice:

common.init:

build-lucene:

build-lucene-tests:

init:

clover.setup:

clover.info:

clover:

common.compile-core:

compile-core:

compile:

check-files:

init:

clover.setup:

clover.info:

clover:

compile-core:

common.compile-test:
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/benchmark/classes/test
 
[javac] Compiling 12 source files to 
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/benchmark/classes/test
 
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
 [copy] Copying 2 files to 
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/benchmark/classes/test
 

build-artifacts-and-tests:
 [echo] Building collation...

javacc-uptodate-check:

javacc-notice:

jflex-uptodate-check:

jflex-notice:

common.init:

compile-misc:
 [echo] Building misc...

javacc-uptodate-check:

javacc-notice:

jflex-uptodate-check:

jflex-notice:

common.init:

build-lucene:

build-lucene-tests:

init:

clover.setup:

clover.info:

clover:

compile-core:
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/misc/classes/java
 
[javac] Compiling 17 source files to 
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/misc/classes/java
 
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.

compile:

init:

clover.setup:

clover.info:

clover:

compile-core:
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/collation/classes/java
 
[javac] Compiling 4 source files to 
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/collation/classes/java
 

jar-core:
  [jar] Building jar: 
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/collation/lucene-collation-2.9-SNAPSHOT.jar
 

jar:

compile-test:
 [echo] Building collation...

javacc-uptodate-check:

javacc-notice:

jflex-uptodate-check:

jflex-notice:

common.init:

compile-misc:
 [echo] Building misc...

javacc-uptodate-check:

javacc-notice:

jflex-uptodate-check:

jflex-notice:

common.init:

build-lucene:

build-lucene-tests:

init:

clover.setup:

clover.info:

clover:

compile-core:

compile:

init:

clover.setup:

clover.info:

clover:

compile-core:

common.compile-test:
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/collation/classes/test
 
[javac] Compiling 5 source files to 
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/contrib/collation/classes/test
 
[javac] Note: 
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/contrib/collation/src/test/org/apache/lucene/collation/CollationTestBase.java
  uses or overrides a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.

build-artifacts-and-tests:

bdb:
 [echo] Building bdb...

javacc-uptodate-check:

javacc-notice:

jflex-uptodate-check:

jflex-notice:

common.init:

build-lucene:

build-lucene-tests:

contrib-build.init:

get-db-jar:
[mkdir] Crea

[jira] Commented: (LUCENE-1798) FieldCacheSanityChecker called directly by FieldCache.get*

2009-08-23 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746719#action_12746719
 ] 

Hoss Man commented on LUCENE-1798:
--

I haven't looked at the patch, but I don't think you need two calls to the 
sanity checker.

Why not just a single call after the val has been created, and log if any of the 
Insanity objects contain the new val?
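
Something along these lines, maybe (a sketch of that suggestion; "value" is 
the newly created cache value and "infoStream" is the assumed warning-stream 
property on the cache):

{code}
// inside FieldCacheImpl, right after a new entry has been created and cached
if (infoStream != null) {
  FieldCacheSanityChecker.Insanity[] insanity =
      FieldCacheSanityChecker.checkSanity(this);
  for (int i = 0; i < insanity.length; i++) {
    FieldCache.CacheEntry[] entries = insanity[i].getCacheEntries();
    for (int j = 0; j < entries.length; j++) {
      if (entries[j].getValue() == value) {   // only warn about the new entry
        infoStream.println("WARNING: suspicious FieldCache usage: " + insanity[i]);
        break;
      }
    }
  }
}
{code}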

> FieldCacheSanityChecker called directly by FieldCache.get*
> --
>
> Key: LUCENE-1798
> URL: https://issues.apache.org/jira/browse/LUCENE-1798
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Hoss Man
>Assignee: Michael McCandless
> Fix For: 2.9
>
> Attachments: LUCENE-1798.patch
>
>
> As suggested by McCandless in LUCENE-1749, we can make FieldCacheImpl a 
> client of the FieldCacheSanityChecker and have it sanity check itself each 
> time it creates a new cache entry, and log a warning if it thinks there is a 
> problem.  (although we'd probably only want to do this if the caller has set 
> some sort of infoStream/warningStream type property on the FieldCache object.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org