[jira] Updated: (LUCENE-1460) Change all contrib TokenStreams/Filters to use the new TokenStream API
[ https://issues.apache.org/jira/browse/LUCENE-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch updated LUCENE-1460:
----------------------------------

    Attachment: lucene-1460.patch

I converted some more contribs...

> Change all contrib TokenStreams/Filters to use the new TokenStream API
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-1460
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1460
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: lucene-1460.patch, LUCENE-1460_contrib_partial.txt,
> LUCENE-1460_contrib_partial.txt, LUCENE-1460_contrib_partial.txt,
> LUCENE-1460_core.txt, LUCENE-1460_partial.txt
>
>
> Now that we have the new TokenStream API (LUCENE-1422) we should change all
> contrib modules to use it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1693) AttributeSource/TokenStream API improvements
[ https://issues.apache.org/jira/browse/LUCENE-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-1693:
----------------------------------

    Attachment: LUCENE-1693-TokenizerAttrFactory.patch

This is a small improvement, related to Grant's comments: the TokenStream ctor can take an AttributeFactory, so you can create a subclass of TokenStream that uses a specific AttributeFactory (e.g. one using Token instances). Filters do not need this (they use the factory of the input stream), so the factory must be set on the root stream, which is normally a subclass of Tokenizer. The problem: Tokenizer has no ctors taking an AttributeFactory, so you cannot create any Tokenizer using a custom factory, e.g. one using Token as the impl. I will commit this patch shortly.

> AttributeSource/TokenStream API improvements
> --------------------------------------------
>
>                 Key: LUCENE-1693
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1693
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1693-TokenizerAttrFactory.patch,
> lucene-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch,
> lucene-1693.patch, LUCENE-1693.patch, lucene-1693.patch, LUCENE-1693.patch,
> LUCENE-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch,
> LUCENE-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch,
> LUCENE-1693.patch, LUCENE-1693.patch, lucene-1693.patch, PerfTest3.java,
> TestAPIBackwardsCompatibility.java, TestCompatibility.java,
> TestCompatibility.java, TestCompatibility.java, TestCompatibility.java
>
>
> This patch makes the following improvements to AttributeSource and
> TokenStream/Filter:
> - Introduces interfaces for all Attributes. The corresponding
>   implementations have the postfix 'Impl', e.g. TermAttribute and
>   TermAttributeImpl. AttributeSource now has a factory for creating
>   the Attribute instances; the default implementation looks for
>   implementing classes with the postfix 'Impl'. Token now implements
>   all 6 TokenAttribute interfaces.
> - Adds a new method to AttributeSource:
>   addAttributeImpl(AttributeImpl). Using reflection it walks up the
>   class hierarchy of the passed-in object, finds all interfaces
>   that the class or its superclasses implement and that extend the
>   Attribute interface, and adds the interface->instance mappings
>   to the attribute map for each of the found interfaces.
> - Removes the set/getUseNewAPI() methods (including the standard
>   ones). Instead it is now enough to implement only the new API;
>   if an old TokenStream still implements the old API (next()/next(Token)),
>   it is wrapped automatically. The delegation path is determined via
>   reflection (the patch determines which of the three methods was
>   overridden).
> - Token is no longer deprecated; instead it implements all 6 standard
>   token interfaces (see above). The wrapper for next() and next(Token)
>   uses this to automatically map all attribute interfaces to one
>   TokenWrapper instance (implementing all 6 interfaces) that contains
>   a Token instance. next() and next(Token) exchange the inner Token
>   instance as needed. For the new incrementToken(), only one
>   TokenWrapper instance is visible, delegating to the current reusable
>   Token. This API also preserves custom Token subclasses that may be
>   created by very special token streams (see the example in the
>   backwards-compatibility test).
> - AttributeImpl now has a default implementation of toString() that uses
>   reflection to print out the values of the attributes in a default
>   formatting. This makes it a bit easier to implement AttributeImpl,
>   because toString() was declared abstract before.
> - Cloning is now done much more efficiently in
>   captureState(). The method figures out which unique AttributeImpl
>   instances are contained as values in the attributes map, because
>   those are the ones that need to be cloned. It creates a singly
>   linked list that supports deep cloning (in the inner class
>   AttributeSource.State). AttributeSource keeps track of when this
>   state changes, i.e. whenever new attributes are added to the
>   AttributeSource. Only in that case will captureState() recompute the
>   state; otherwise it simply clones the precomputed state and
>   returns the clone. restoreState(AttributeSource.State) walks the
>   linked list and uses the copyTo() method of AttributeImpl to copy
>   all values over into the attributes that the source stream
>   (e.g. SinkTokenizer) uses.
> - Tee- and SinkTokenizer were deprecated, because they use
>   Token instances for caching. This is not compatible with the new API
>   using AttributeSource.State objects. You can still use the old
>   deprecated ones, but new features provided by new Attribute types
>   may get lost in the chain. The replacement is a new TeeSinkTokenFilter,
>   which has a factory to create new Sink instances that have compatible
>   attributes. Sink instances created by one Tee can also be added to
>   another Tee, as long as the attribute implementations are compatible
>   (it is not possible to add a sink from a tee using one Token instance
>   to a tee using the six separate attribute impls); in that case UOE
>   is thrown.
RE: Build failed in Hudson: Lucene-trunk #899
The problem is not the TokenStream API, it is 1644. The tests pass with our patch from yesterday. Maybe the problem is somehow the auto rewrite method. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Michael Busch [mailto:busch...@gmail.com] > Sent: Saturday, July 25, 2009 7:11 AM > To: java-dev@lucene.apache.org > Subject: Re: Build failed in Hudson: Lucene-trunk #899 > > I think 1644 caused this: > SpanScorer.init() -> WeightedSpanTermExtractor.extract(Query, Map) calls > extract (line 151) > Then MultiTermQuery.rewrite() returns a ConstantScoreQuery. > Then extract(Query, Map) is called again with the ConstantScoreQuery, > but it does not contain an if clause that will do anything with that > type of a query. Hence, no terms are extracted. > > I haven't followed 1644, so I'm not sure what exactly the problem is. > > Michael > > > On 7/24/09 7:46 PM, Apache Hudson Server wrote: > > See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/899/changes > > > > Changes: > > > > [mikemccand] LUCENE-1644: enable different rewrite methods for > MultiTermQuery > > > > [buschmi] LUCENE-1693: Various improvements to the new TokenStream API. > > > > [otis] - Typo > > > > -- > > [...truncated 13417 lines...] > > init: > > > > test: > > [echo] Building swing... > > > > javacc-uptodate-check: > > > > javacc-notice: > > > > jflex-uptodate-check: > > > > jflex-notice: > > > > common.init: > > > > build-lucene: > > > > build-lucene-tests: > > > > init: > > > > compile-test: > > [echo] Building swing... 
> > > > javacc-uptodate-check: > > > > javacc-notice: > > > > jflex-uptodate-check: > > > > jflex-notice: > > > > common.init: > > > > build-lucene: > > > > build-lucene-tests: > > > > init: > > > > clover.setup: > > > > clover.info: > > > > clover: > > > > compile-core: > > > > common.compile-test: > > > > common.test: > > [mkdir] Created dir: > http://hudson.zones.apache.org/hudson/job/Lucene- > trunk/ws/trunk/build/contrib/swing/test > > [junit] Testsuite: org.apache.lucene.swing.models.TestBasicList > > [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.584 > sec > > [junit] > > [junit] Testsuite: org.apache.lucene.swing.models.TestBasicTable > > [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.567 > sec > > [junit] > > [junit] Testsuite: org.apache.lucene.swing.models.TestSearchingList > > [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.63 > sec > > [junit] > > [junit] Testsuite: > org.apache.lucene.swing.models.TestSearchingTable > > [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.64 > sec > > [junit] > > [junit] Testsuite: org.apache.lucene.swing.models.TestUpdatingList > > [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.739 > sec > > [junit] > > [junit] Testsuite: org.apache.lucene.swing.models.TestUpdatingTable > > [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 1.934 > sec > > [junit] > > [delete] Deleting: http://hudson.zones.apache.org/hudson/job/Lucene- > trunk/ws/trunk/build/contrib/swing/test/junitfailed.flag > > [echo] Building wikipedia... > > > > javacc-uptodate-check: > > > > javacc-notice: > > > > jflex-uptodate-check: > > > > jflex-notice: > > > > common.init: > > > > build-lucene: > > > > build-lucene-tests: > > > > init: > > > > test: > > [echo] Building wikipedia... 
> > > > javacc-uptodate-check: > > > > javacc-notice: > > > > jflex-uptodate-check: > > > > jflex-notice: > > > > common.init: > > > > build-lucene: > > > > build-lucene-tests: > > > > init: > > > > compile-test: > > [echo] Building wikipedia... > > > > javacc-uptodate-check: > > > > javacc-notice: > > > > jflex-uptodate-check: > > > > jflex-notice: > > > > common.init: > > > > build-lucene: > > > > build-lucene-tests: > > > > init: > > > > clover.setup: > > > > clover.info: > > > > clover: > > > > compile-core: > > > > common.compile-test: > > > > common.test: > > [mkdir] Created dir: > http://hudson.zones.apache.org/hudson/job/Lucene- > trunk/ws/trunk/build/contrib/wikipedia/test > > [junit] Testsuite: > org.apache.lucene.wikipedia.analysis.WikipediaTokenizerTest > > [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.373 > sec > > [junit] > > [delete] Deleting: http://hudson.zones.apache.org/hudson/job/Lucene- > trunk/ws/trunk/build/contrib/wikipedia/test/junitfailed.flag > > [echo] Building wordnet... > > > > javacc-uptodate-check: > > > > javacc-notice: > > > > jflex-uptodate-check: > > > > jflex-notice: > > > > common.init: > > > > build-lucene: > > > > build-lucene-tests: > > > > init: > > > > test: > > [echo] Building xml-query-parser... > > > > javacc-uptodate-check: > > > > javacc-notice: > > > > jflex-uptodate-check: > > > > jflex-notic
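Michael's diagnosis above can be reproduced in miniature. The classes below are toy stand-ins, not Lucene's real query classes: the extractor dispatches on query type with instanceof checks, and a type it has no branch for (here, the ConstantScoreQuery produced by MultiTermQuery.rewrite()) silently contributes no terms, which is exactly the highlighter failure mode being discussed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model (hypothetical names, NOT Lucene's classes) of the
// extract(Query, Map) dispatch in WeightedSpanTermExtractor.
abstract class Query {}

class TermQuery extends Query {
    final String term;
    TermQuery(String term) { this.term = term; }
}

class BooleanQuery extends Query {
    final List<Query> clauses = new ArrayList<>();
}

class ConstantScoreQuery extends Query {
    final Query wrapped;
    ConstantScoreQuery(Query wrapped) { this.wrapped = wrapped; }
}

class TermExtractor {
    static Set<String> extract(Query q) {
        Set<String> terms = new HashSet<>();
        collect(q, terms);
        return terms;
    }

    private static void collect(Query q, Set<String> terms) {
        if (q instanceof TermQuery) {
            terms.add(((TermQuery) q).term);
        } else if (q instanceof BooleanQuery) {
            for (Query c : ((BooleanQuery) q).clauses) collect(c, terms);
        }
        // No branch for ConstantScoreQuery: its wrapped query is never visited,
        // so a MultiTermQuery rewritten to a ConstantScoreQuery yields no terms.
    }
}
```

The fix is simply an extra branch that recurses into the wrapped query of the unhandled type.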
Re: Build failed in Hudson: Lucene-trunk #899
Urgh... I'll dig.

Mike

On Sat, Jul 25, 2009 at 5:12 AM, Uwe Schindler wrote:
> The problem is not the TokenStream API, it is 1644. The tests pass with our
> patch from yesterday. Maybe the problem is somehow the auto rewrite method.
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>> -Original Message-
>> From: Michael Busch [mailto:busch...@gmail.com]
>> Sent: Saturday, July 25, 2009 7:11 AM
>> To: java-dev@lucene.apache.org
>> Subject: Re: Build failed in Hudson: Lucene-trunk #899
>>
>> I think 1644 caused this:
>> SpanScorer.init() -> WeightedSpanTermExtractor.extract(Query, Map) calls
>> extract (line 151)
>> Then MultiTermQuery.rewrite() returns a ConstantScoreQuery.
>> Then extract(Query, Map) is called again with the ConstantScoreQuery,
>> but it does not contain an if clause that will do anything with that
>> type of a query. Hence, no terms are extracted.
>>
>> I haven't followed 1644, so I'm not sure what exactly the problem is.
>>
>> Michael
[jira] Commented: (LUCENE-1693) AttributeSource/TokenStream API improvements
[ https://issues.apache.org/jira/browse/LUCENE-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735253#action_12735253 ]

Uwe Schindler commented on LUCENE-1693:
---------------------------------------

Committed revision: 797727

> AttributeSource/TokenStream API improvements
> --------------------------------------------
>
>                 Key: LUCENE-1693
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1693
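The captureState/restoreState design from LUCENE-1693 can be modeled in miniature. The classes below are a toy sketch, not the real Lucene ones: unique attribute instances are chained into a singly linked list that supports deep cloning, the chain is rebuilt only when the attribute set changes, and restore copies captured values back via copyTo().

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (NOT the real Lucene classes) of the capture/restore scheme:
// a singly linked State list over the unique attribute instances.
class Attr {
    String value = "";
    Attr cloneAttr() { Attr a = new Attr(); a.value = value; return a; }
    void copyTo(Attr target) { target.value = value; }
}

class State {
    Attr attribute;
    State next;
    State deepClone() {
        State s = new State();
        s.attribute = attribute.cloneAttr();         // deep copy of this node
        if (next != null) s.next = next.deepClone(); // and of the rest of the chain
        return s;
    }
}

class Source {
    final List<Attr> attrs = new ArrayList<>();
    private State precomputed; // rebuilt only when the attribute set changes

    Attr addAttr() {
        Attr a = new Attr();
        attrs.add(a);
        precomputed = null; // attribute set changed: invalidate the cached chain
        return a;
    }

    State captureState() {
        if (precomputed == null) { // slow path: rebuild the linked list once
            State head = null;
            for (int i = attrs.size() - 1; i >= 0; i--) {
                State s = new State();
                s.attribute = attrs.get(i);
                s.next = head;
                head = s;
            }
            precomputed = head;
        }
        return precomputed.deepClone(); // fast path: just clone the chain
    }

    void restoreState(State state) {
        int i = 0;
        for (State s = state; s != null; s = s.next) {
            s.attribute.copyTo(attrs.get(i++)); // copyTo(), as described in the patch
        }
    }
}
```

This is why Tee/Sink caching moved from Token instances to State objects: a captured State is a cheap, self-contained snapshot of whatever attributes the stream happens to use.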
Re: Build failed in Hudson: Lucene-trunk #899
OK fixed. Sorry about that.

Mike

On Sat, Jul 25, 2009 at 5:23 AM, Michael McCandless wrote:
> Urgh... I'll dig.
>
> Mike
>
> On Sat, Jul 25, 2009 at 5:12 AM, Uwe Schindler wrote:
>> The problem is not the TokenStream API, it is 1644. The tests pass with our
>> patch from yesterday. Maybe the problem is somehow the auto rewrite method.
>>
>> -
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: u...@thetaphi.de
[jira] Updated: (LUCENE-1460) Change all contrib TokenStreams/Filters to use the new TokenStream API
[ https://issues.apache.org/jira/browse/LUCENE-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch updated LUCENE-1460:
----------------------------------

    Attachment: lucene-1460.patch

More progress... ngram was a bit tricky, but I think it is implemented much more efficiently now. It used to clone every token it returned; now it only clones the term that it receives from the input stream. Would be good if someone could take a look at the ngram changes... well, the testcases pass.

> Change all contrib TokenStreams/Filters to use the new TokenStream API
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-1460
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1460
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: lucene-1460.patch, lucene-1460.patch,
> LUCENE-1460_contrib_partial.txt, LUCENE-1460_contrib_partial.txt,
> LUCENE-1460_contrib_partial.txt, LUCENE-1460_core.txt, LUCENE-1460_partial.txt
>
>
> Now that we have the new TokenStream API (LUCENE-1422) we should change all
> contrib modules to use it.
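The cloning change described above can be illustrated in miniature. This is a toy sketch, not the actual contrib ngram code: the incoming term buffer is copied once per input token, and every emitted n-gram is derived from that single copy, instead of cloning a whole Token object per n-gram.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy illustration (NOT the real NGram filter) of "clone the term once,
// not a Token per n-gram".
class NGrams {
    static List<String> ngrams(char[] termBuffer, int termLength, int n) {
        // One defensive copy of the incoming term, shared by all n-grams.
        char[] term = Arrays.copyOf(termBuffer, termLength);
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= term.length; i++) {
            out.add(new String(term, i, n)); // each n-gram reads from the same copy
        }
        return out;
    }
}
```

With the new API the filter would also reuse a single TermAttribute across calls to incrementToken() rather than allocating per token; the sketch only shows the copy-once idea.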
[jira] Updated: (LUCENE-1754) Get rid of NonMatchingScorer from BooleanScorer2
[ https://issues.apache.org/jira/browse/LUCENE-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1754:
---------------------------------------

    Attachment: LUCENE-1754.patch

New patch attached -- sync'd to trunk and fixed places to also catch when disi.iterator() returns null.

> Get rid of NonMatchingScorer from BooleanScorer2
> ------------------------------------------------
>
>                 Key: LUCENE-1754
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1754
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1754.patch, LUCENE-1754.patch, LUCENE-1754.patch,
> LUCENE-1754.patch
>
>
> Over in LUCENE-1614 Mike has made a comment about removing NonMatchingScorer
> from BS2 and returning null in BooleanWeight.scorer(). I've checked and this
> can be easily done, so I'm going to post a patch shortly. For reference:
> https://issues.apache.org/jira/browse/LUCENE-1614?focusedCommentId=12715064&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12715064.
> I've marked the issue as 2.9 just because it's small and kind of related to
> all the search enhancements done for 2.9.
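The pattern this issue moves toward can be sketched as follows. Names and signatures here are illustrative, not Lucene's exact API: instead of a NonMatchingScorer placeholder, scorer() may return null to mean "no document can match", and every caller null-checks before scoring.

```java
// Hedged sketch (hypothetical interfaces, NOT Lucene's real Weight/Scorer API)
// of "return null instead of a NonMatchingScorer".
interface Scorer {
    int nextDoc(); // returns NO_MORE_DOCS when exhausted
}

interface Weight {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    Scorer scorer(); // null means: nothing can match this clause
}

class HitCounter {
    static int countHits(Weight weight) {
        Scorer scorer = weight.scorer();
        if (scorer == null) return 0; // skip scoring entirely for a non-matching clause
        int count = 0;
        while (scorer.nextDoc() != Weight.NO_MORE_DOCS) count++;
        return count;
    }
}
```

The null-check also explains the "catch when disi.iterator() returns null" part of the patch: once null is a legal "matches nothing" signal, every place that obtains an iterator has to handle it.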
[jira] Commented: (LUCENE-1595) Split DocMaker into ContentSource and DocMaker
[ https://issues.apache.org/jira/browse/LUCENE-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735261#action_12735261 ]

Michael McCandless commented on LUCENE-1595:
--------------------------------------------

My driver here was... updating Lucene in Action to explain all the recent changes to contrib/benchmark, and explaining the tiny differences between these 3 DocMakers was awkward :) They are "nearly" the same. I'll work up a patch.

> Split DocMaker into ContentSource and DocMaker
> ----------------------------------------------
>
>                 Key: LUCENE-1595
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1595
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/benchmark
>            Reporter: Shai Erera
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1595.patch, LUCENE-1595.patch, LUCENE-1595.patch,
> LUCENE-1595.patch, LUCENE-1595.patch, LUCENE-1595.patch
>
>
> This issue proposes some refactoring to the benchmark package. Today,
> DocMaker has two roles: collecting documents from a collection and preparing
> a Document object. These should actually be split into ContentSource and
> DocMaker, where DocMaker uses a ContentSource instance.
> ContentSource will implement all the methods of DocMaker, like
> getNextDocData, raw-size-in-bytes tracking etc. This can actually fit well
> with LUCENE-1591, by having a basic ContentSource that offers input stream
> services and wraps a file (for example) with bzip or gzip streams etc.
> DocMaker will implement the makeDocument methods, reusing DocState etc.
> The idea is that collecting the Enwiki documents, for example, should be the
> same whether I create documents using DocState, add payloads or index
> additional metadata. The same goes for the Trec and Reuters collections, as
> well as LineDocMaker.
> In fact, if one inspects EnwikiDocMaker and LineDocMaker closely, they are
> 99% the same and 1% different. Most of their differences lie in the way they
> read the data, while most of the similarity lies in the way they create
> documents (using DocState).
> That led to a somewhat bizarre extension of LineDocMaker by EnwikiDocMaker
> (just for the reuse of DocState). Also, other DocMakers do not use that
> DocState today, something they could have gotten for free with this proposed
> refactoring.
> So by having an EnwikiContentSource, ReutersContentSource and others (TREC,
> Line, Simple), I can write several DocMakers, such as DocStateMaker,
> ConfigurableDocMaker (one which accepts all kinds of config options) and
> custom DocMakers (payload, facets, sorting), passing them a ContentSource
> instance and reusing the same doc-making algorithm with many content
> sources, as well as the same ContentSource algorithm with many DocMaker
> implementations.
> This will also give us the opportunity to perf-test content sources alone
> (i.e., compare bzip, gzip and regular input streams), w/o the overhead of
> creating a Document object.
> I've already done so in my code environment (I extend the benchmark package
> for my application's purposes) and I like the flexibility I have. I think
> this can be a nice contribution to the benchmark package, which can result
> in some code cleanup as well.
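The proposed split can be sketched as follows. The names and the String-based "document" are hypothetical simplifications, not the real benchmark classes: a ContentSource owns the reading strategy, a DocMaker owns the document-building policy, and the two compose freely.

```java
// Hedged sketch (hypothetical names, NOT the actual contrib/benchmark classes)
// of the ContentSource / DocMaker split.
class DocData {
    final String title, body;
    DocData(String title, String body) { this.title = title; this.body = body; }
}

interface ContentSource {
    DocData getNextDocData(); // reading strategy: Enwiki, Trec, Reuters, Line, ...
}

class DocMaker {
    private final ContentSource source;

    DocMaker(ContentSource source) { this.source = source; }

    // Document-building policy lives here, shared across all content sources.
    String makeDocument() {
        DocData d = source.getNextDocData();
        return d.title + "\n" + d.body;
    }
}
```

With this shape, one DocMaker (DocState reuse, payloads, facets) pairs with any ContentSource, and a ContentSource can be benchmarked on its own without building documents at all.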
RE: Build failed in Hudson: Lucene-trunk #899
I have some additional small improvements for 1644 that I'll commit shortly. They mainly concern the numeric range query tests and the default rewrite method for NRQ. A deprecation for the MTQ(Term) ctor is also missing.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Saturday, July 25, 2009 11:38 AM
> To: java-dev@lucene.apache.org
> Subject: Re: Build failed in Hudson: Lucene-trunk #899
>
> OK fixed. Sorry about that.
>
> Mike
>
> On Sat, Jul 25, 2009 at 5:23 AM, Michael McCandless wrote:
> > Urgh... I'll dig.
> >
> > Mike
[jira] Created: (LUCENE-1762) Slightly more readable code in TermAttributeImpl
Slightly more readable code in TermAttributeImpl - Key: LUCENE-1762 URL: https://issues.apache.org/jira/browse/LUCENE-1762 Project: Lucene - Java Issue Type: Improvement Components: Analysis Reporter: Eks Dev Priority: Trivial No big deal. growTermBuffer(int newSize) was using correct, but slightly hard to follow, code. The method returned null to hint to the upstream code that the current termBuffer already has enough space, or else it returned the reallocated buffer. This patch simplifies the logic by making the method only reallocate the buffer, nothing more. It reduces the number of if(null) checks in a few methods and reduces the amount of code. All tests pass. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
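The simplification the issue describes can be illustrated with a small self-contained sketch. The names and the growth formula below are modeled on Lucene's ArrayUtil.getNextSize as I understand it; this is an illustrative model, not the actual patch.

```java
class TermBufferSketch {
    static final int MIN_BUFFER_SIZE = 10;
    char[] termBuffer;

    // Over-allocating growth policy in the style of ArrayUtil.getNextSize
    // (grow by roughly 1/8 over the requested size).
    static int getNextSize(int targetSize) {
        return (targetSize >> 3) + (targetSize < 9 ? 3 : 0) + targetSize;
    }

    // OLD style: returned null as a hint that the existing buffer was already
    // big enough, so every caller had to null-check the result.
    char[] growTermBufferOld(int newSize) {
        if (termBuffer != null && termBuffer.length >= newSize) {
            return null; // "no reallocation needed" signal
        }
        return new char[getNextSize(newSize < MIN_BUFFER_SIZE ? MIN_BUFFER_SIZE : newSize)];
    }

    // NEW style per the patch description: the method only (re)allocates,
    // nothing more, so callers need no null checks.
    void growTermBuffer(int newSize) {
        if (termBuffer == null || termBuffer.length < newSize) {
            termBuffer = new char[getNextSize(newSize < MIN_BUFFER_SIZE ? MIN_BUFFER_SIZE : newSize)];
        }
    }
}
```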
[jira] Updated: (LUCENE-1762) Slightly more readable code in TermAttributeImpl
[ https://issues.apache.org/jira/browse/LUCENE-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1762: Attachment: LUCENE-1762.patch > Slightly more readable code in TermAttributeImpl > - > > Key: LUCENE-1762 > URL: https://issues.apache.org/jira/browse/LUCENE-1762 > Project: Lucene - Java > Issue Type: Improvement > Components: Analysis >Reporter: Eks Dev >Priority: Trivial > Attachments: LUCENE-1762.patch > > > No big deal. > growTermBuffer(int newSize) was using correct, but slightly hard to follow > code. > the method was returning null as a hint that the current termBuffer has > enough space to the upstream code or reallocated buffer. > this patch simplifies logic making this method to only reallocate buffer, > nothing more. > It reduces number of if(null) checks in a few methods and reduces amount of > code. > all tests pass. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1762) Slightly more readable code in TermAttributeImpl
[ https://issues.apache.org/jira/browse/LUCENE-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735275#action_12735275 ] Uwe Schindler commented on LUCENE-1762: --- As Token is not yet deprecated, I think, this patch should also apply to Token.java? Can you prepare that, too? > Slightly more readable code in TermAttributeImpl > - > > Key: LUCENE-1762 > URL: https://issues.apache.org/jira/browse/LUCENE-1762 > Project: Lucene - Java > Issue Type: Improvement > Components: Analysis >Reporter: Eks Dev >Priority: Trivial > Attachments: LUCENE-1762.patch > > > No big deal. > growTermBuffer(int newSize) was using correct, but slightly hard to follow > code. > the method was returning null as a hint that the current termBuffer has > enough space to the upstream code or reallocated buffer. > this patch simplifies logic making this method to only reallocate buffer, > nothing more. > It reduces number of if(null) checks in a few methods and reduces amount of > code. > all tests pass. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Assigned: (LUCENE-1762) Slightly more readable code in TermAttributeImpl
[ https://issues.apache.org/jira/browse/LUCENE-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reassigned LUCENE-1762: - Assignee: Uwe Schindler > Slightly more readable code in TermAttributeImpl > - > > Key: LUCENE-1762 > URL: https://issues.apache.org/jira/browse/LUCENE-1762 > Project: Lucene - Java > Issue Type: Improvement > Components: Analysis >Reporter: Eks Dev >Assignee: Uwe Schindler >Priority: Trivial > Attachments: LUCENE-1762.patch > > > No big deal. > growTermBuffer(int newSize) was using correct, but slightly hard to follow > code. > the method was returning null as a hint that the current termBuffer has > enough space to the upstream code or reallocated buffer. > this patch simplifies logic making this method to only reallocate buffer, > nothing more. > It reduces number of if(null) checks in a few methods and reduces amount of > code. > all tests pass. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: Build failed in Hudson: Lucene-trunk #899
Thanks Uwe! Mike On Sat, Jul 25, 2009 at 6:45 AM, Uwe Schindler wrote: > I have some additional small improvements for 1644, I commit shortly. It is > mainly the numeric range query tests and the default rewrite method for NRQ. > There is also missing a deprecation for MTQ(Term) ctor. > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > >> -Original Message- >> From: Michael McCandless [mailto:luc...@mikemccandless.com] >> Sent: Saturday, July 25, 2009 11:38 AM >> To: java-dev@lucene.apache.org >> Subject: Re: Build failed in Hudson: Lucene-trunk #899 >> >> OK fixed. Sorry about that. >> >> Mike >> >> On Sat, Jul 25, 2009 at 5:23 AM, Michael >> McCandless wrote: >> > Urgh... I'll dig. >> > >> > Mike >> > >> > On Sat, Jul 25, 2009 at 5:12 AM, Uwe Schindler wrote: >> >> The problem is not the TokenStream API, it is 1644. The tests pass with >> our >> >> patch from yesterday. Maybe the problem is somehow the auto rewrite >> method. >> >> >> >> - >> >> Uwe Schindler >> >> H.-H.-Meier-Allee 63, D-28213 Bremen >> >> http://www.thetaphi.de >> >> eMail: u...@thetaphi.de >> >> >> >> >> >>> -Original Message- >> >>> From: Michael Busch [mailto:busch...@gmail.com] >> >>> Sent: Saturday, July 25, 2009 7:11 AM >> >>> To: java-dev@lucene.apache.org >> >>> Subject: Re: Build failed in Hudson: Lucene-trunk #899 >> >>> >> >>> I think 1644 caused this: >> >>> SpanScorer.init() -> WeightedSpanTermExtractor.extract(Query, Map) >> calls >> >>> extract (line 151) >> >>> Then MultiTermQuery.rewrite() returns a ConstantScoreQuery. >> >>> Then extract(Query, Map) is called again with the ConstantScoreQuery, >> >>> but it does not contain an if clause that will do anything with that >> >>> type of a query. Hence, no terms are extracted. >> >>> >> >>> I haven't followed 1644, so I'm not sure what exactly the problem is. 
>> >>> >> >>> Michael >> >>> >> >>> >> >>> On 7/24/09 7:46 PM, Apache Hudson Server wrote: >> >>> > See http://hudson.zones.apache.org/hudson/job/Lucene- >> trunk/899/changes >> >>> > >> >>> > Changes: >> >>> > >> >>> > [mikemccand] LUCENE-1644: enable different rewrite methods for >> >>> MultiTermQuery >> >>> > >> >>> > [buschmi] LUCENE-1693: Various improvements to the new TokenStream >> API. >> >>> > >> >>> > [otis] - Typo >> >>> > >> >>> > -- >> >>> > [...truncated 13417 lines...] >> >>> > init: >> >>> > >> >>> > test: >> >>> > [echo] Building swing... >> >>> > >> >>> > javacc-uptodate-check: >> >>> > >> >>> > javacc-notice: >> >>> > >> >>> > jflex-uptodate-check: >> >>> > >> >>> > jflex-notice: >> >>> > >> >>> > common.init: >> >>> > >> >>> > build-lucene: >> >>> > >> >>> > build-lucene-tests: >> >>> > >> >>> > init: >> >>> > >> >>> > compile-test: >> >>> > [echo] Building swing... >> >>> > >> >>> > javacc-uptodate-check: >> >>> > >> >>> > javacc-notice: >> >>> > >> >>> > jflex-uptodate-check: >> >>> > >> >>> > jflex-notice: >> >>> > >> >>> > common.init: >> >>> > >> >>> > build-lucene: >> >>> > >> >>> > build-lucene-tests: >> >>> > >> >>> > init: >> >>> > >> >>> > clover.setup: >> >>> > >> >>> > clover.info: >> >>> > >> >>> > clover: >> >>> > >> >>> > compile-core: >> >>> > >> >>> > common.compile-test: >> >>> > >> >>> > common.test: >> >>> > [mkdir] Created dir: >> >>> http://hudson.zones.apache.org/hudson/job/Lucene- >> >>> trunk/ws/trunk/build/contrib/swing/test >> >>> > [junit] Testsuite: org.apache.lucene.swing.models.TestBasicList >> >>> > [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: >> 0.584 >> >>> sec >> >>> > [junit] >> >>> > [junit] Testsuite: >> org.apache.lucene.swing.models.TestBasicTable >> >>> > [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: >> 0.567 >> >>> sec >> >>> > [junit] >> >>> > [junit] Testsuite: >> org.apache.lucene.swing.models.TestSearchingList >> >>> > [junit] Tests run: 1, Failures: 0, Errors: 0, Time 
elapsed: >> 0.63 >> >>> sec >> >>> > [junit] >> >>> > [junit] Testsuite: >> >>> org.apache.lucene.swing.models.TestSearchingTable >> >>> > [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: >> 0.64 >> >>> sec >> >>> > [junit] >> >>> > [junit] Testsuite: >> org.apache.lucene.swing.models.TestUpdatingList >> >>> > [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: >> 0.739 >> >>> sec >> >>> > [junit] >> >>> > [junit] Testsuite: >> org.apache.lucene.swing.models.TestUpdatingTable >> >>> > [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: >> 1.934 >> >>> sec >> >>> > [junit] >> >>> > [delete] Deleting: >> http://hudson.zones.apache.org/hudson/job/Lucene- >> >>> trunk/ws/trunk/build/contrib/swing/test/junitfailed.flag >> >>> > [echo] Building wikipedia... >> >>> > >> >>> > javacc-uptodate-check: >> >>> > >> >>> > javacc-notice: >> >>> > >> >>> > jflex-uptodate-check: >> >>> > >> >>> > jflex-no
[jira] Issue Comment Edited: (LUCENE-1762) Slightly more readable code in TermAttributeImpl
[ https://issues.apache.org/jira/browse/LUCENE-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735275#action_12735275 ] Uwe Schindler edited comment on LUCENE-1762 at 7/25/09 7:01 AM: As Token is not yet deprecated, I think, this patch should also apply to Token.java? Can you prepare that, too? (This is important, because if the backwards-compatibility layer is enabled with setOnlyUseNewAPI(false)), the TermAttributeImpl is never used and a Token instance is used instead - if no tests fail, this may also be the case :-] ) was (Author: thetaphi): As Token is not yet deprecated, I think, this patch should also apply to Token.java? Can you prepare that, too? > Slightly more readable code in TermAttributeImpl > - > > Key: LUCENE-1762 > URL: https://issues.apache.org/jira/browse/LUCENE-1762 > Project: Lucene - Java > Issue Type: Improvement > Components: Analysis >Reporter: Eks Dev >Assignee: Uwe Schindler >Priority: Trivial > Attachments: LUCENE-1762.patch > > > No big deal. > growTermBuffer(int newSize) was using correct, but slightly hard to follow > code. > the method was returning null as a hint that the current termBuffer has > enough space to the upstream code or reallocated buffer. > this patch simplifies logic making this method to only reallocate buffer, > nothing more. > It reduces number of if(null) checks in a few methods and reduces amount of > code. > all tests pass. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1595) Split DocMaker into ContentSource and DocMaker
[ https://issues.apache.org/jira/browse/LUCENE-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1595: --- Attachment: LUCENE-1595.patch Attached patch that addresses the SortField.AUTO issue, and deprecates Line/EnwikiDocMaker. I do think we could do some speeding up of the fields handling in DocMaker for the reuse case, eg make dedicated members in DocData to hold the known fields (id, body, title, etc.) but I think we can wait on that for now. > Split DocMaker into ContentSource and DocMaker > -- > > Key: LUCENE-1595 > URL: https://issues.apache.org/jira/browse/LUCENE-1595 > Project: Lucene - Java > Issue Type: Improvement > Components: contrib/benchmark >Reporter: Shai Erera >Assignee: Michael McCandless > Fix For: 2.9 > > Attachments: LUCENE-1595.patch, LUCENE-1595.patch, LUCENE-1595.patch, > LUCENE-1595.patch, LUCENE-1595.patch, LUCENE-1595.patch, LUCENE-1595.patch > > > This issue proposes some refactoring to the benchmark package. Today, > DocMaker has two roles: collecting documents from a collection and preparing > a Document object. These two should actually be split up to ContentSource and > DocMaker, which will use a ContentSource instance. > ContentSource will implement all the methods of DocMaker, like > getNextDocData, raw size in bytes tracking etc. This can actually fit well w/ > 1591, by having a basic ContentSource that offers input stream services, and > wraps a file (for example) with a bzip or gzip streams etc. > DocMaker will implement the makeDocument methods, reusing DocState etc. > The idea is that collecting the Enwiki documents, for example, should be the > same whether I create documents using DocState, add payloads or index > additional metadata. Same goes for Trec and Reuters collections, as well as > LineDocMaker. > In fact, if one inspects EnwikiDocMaker and LineDocMaker closely, they are > 99% the same and 99% different. 
Most of their differences lie in the way they > read the data, while most of the similarity lies in the way they create > documents (using DocState). > That led to a somewhat bizarre extension of LineDocMaker by EnwikiDocMaker > (just the reuse of DocState). Also, other DocMakers do not use that DocState > today, something they could have gotten for free with this refactoring > proposed. > So by having an EnwikiContentSource, ReutersContentSource and others (TREC, > Line, Simple), I can write several DocMakers, such as DocStateMaker, > ConfigurableDocMaker (one which accepts all kinds of config options) and > custom DocMakers (payload, facets, sorting), passing to them a ContentSource > instance and reuse the same DocMaking algorithm with many content sources, as > well as the same ContentSource algorithm with many DocMaker implementations. > This will also give us the opportunity to perf test content sources alone > (i.e., compare bzip, gzip and regular input streams), w/o the overhead of > creating a Document object. > I've already done so in my code environment (I extend the benchmark package > for my application's purposes) and I like the flexibility I have. I think > this can be a nice contribution to the benchmark package, which can result in > some code cleanup as well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1595) Split DocMaker into ContentSource and DocMaker
[ https://issues.apache.org/jira/browse/LUCENE-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735283#action_12735283 ] Uwe Schindler commented on LUCENE-1595: --- Just an idea, when redoing the field stuff: maybe we should add the DATE field as a long timestamp (Date.getTime()) using NumericField with default precStep? We could then also do benchmarks on NumericRangeQuery and easily sort by date with type="long". > Split DocMaker into ContentSource and DocMaker > -- > > Key: LUCENE-1595 > URL: https://issues.apache.org/jira/browse/LUCENE-1595 > Project: Lucene - Java > Issue Type: Improvement > Components: contrib/benchmark >Reporter: Shai Erera >Assignee: Michael McCandless > Fix For: 2.9 > > Attachments: LUCENE-1595.patch, LUCENE-1595.patch, LUCENE-1595.patch, > LUCENE-1595.patch, LUCENE-1595.patch, LUCENE-1595.patch, LUCENE-1595.patch > > > This issue proposes some refactoring to the benchmark package. Today, > DocMaker has two roles: collecting documents from a collection and preparing > a Document object. These two should actually be split up to ContentSource and > DocMaker, which will use a ContentSource instance. > ContentSource will implement all the methods of DocMaker, like > getNextDocData, raw size in bytes tracking etc. This can actually fit well w/ > 1591, by having a basic ContentSource that offers input stream services, and > wraps a file (for example) with a bzip or gzip streams etc. > DocMaker will implement the makeDocument methods, reusing DocState etc. > The idea is that collecting the Enwiki documents, for example, should be the > same whether I create documents using DocState, add payloads or index > additional metadata. Same goes for Trec and Reuters collections, as well as > LineDocMaker. > In fact, if one inspects EnwikiDocMaker and LineDocMaker closely, they are > 99% the same and 99% different. 
Most of their differences lie in the way they > read the data, while most of the similarity lies in the way they create > documents (using DocState). > That led to a somewhat bizarre extension of LineDocMaker by EnwikiDocMaker > (just the reuse of DocState). Also, other DocMakers do not use that DocState > today, something they could have gotten for free with this refactoring > proposed. > So by having an EnwikiContentSource, ReutersContentSource and others (TREC, > Line, Simple), I can write several DocMakers, such as DocStateMaker, > ConfigurableDocMaker (one which accepts all kinds of config options) and > custom DocMakers (payload, facets, sorting), passing to them a ContentSource > instance and reuse the same DocMaking algorithm with many content sources, as > well as the same ContentSource algorithm with many DocMaker implementations. > This will also give us the opportunity to perf test content sources alone > (i.e., compare bzip, gzip and regular input streams), w/o the overhead of > creating a Document object. > I've already done so in my code environment (I extend the benchmark package > for my application's purposes) and I like the flexibility I have. I think > this can be a nice contribution to the benchmark package, which can result in > some code cleanup as well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
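For context on why indexing DATE as a NumericField helps: NumericField indexes the long value at several precisions (each extra term drops precisionStep low bits), which is what lets NumericRangeQuery cover a range with a handful of coarse terms. The sketch below is a simplified model of that trie idea, not Lucene's actual term encoding, and the default precisionStep of 4 is my recollection for this era of the API, so treat both as assumptions.

```java
import java.util.*;

class NumericTrieSketch {
    // Shifted values that would be indexed for one long field value:
    // full precision first, then progressively coarser prefixes.
    static List<Long> shiftedValues(long value, int precisionStep) {
        List<Long> terms = new ArrayList<Long>();
        for (int shift = 0; shift < 64; shift += precisionStep) {
            terms.add(value >>> shift);
        }
        return terms;
    }
}
```

A date range query can then be answered mostly from the coarse prefixes instead of enumerating one term per distinct timestamp, and sorting by the same field works as a plain long sort.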
[jira] Commented: (LUCENE-1595) Split DocMaker into ContentSource and DocMaker
[ https://issues.apache.org/jira/browse/LUCENE-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735298#action_12735298 ] Michael McCandless commented on LUCENE-1595: bq. maybe we should add the DATE field as a long timestamp (Date.getTime()) using NumericField with default precStep? +1 > Split DocMaker into ContentSource and DocMaker > -- > > Key: LUCENE-1595 > URL: https://issues.apache.org/jira/browse/LUCENE-1595 > Project: Lucene - Java > Issue Type: Improvement > Components: contrib/benchmark >Reporter: Shai Erera >Assignee: Michael McCandless > Fix For: 2.9 > > Attachments: LUCENE-1595.patch, LUCENE-1595.patch, LUCENE-1595.patch, > LUCENE-1595.patch, LUCENE-1595.patch, LUCENE-1595.patch, LUCENE-1595.patch > > > This issue proposes some refactoring to the benchmark package. Today, > DocMaker has two roles: collecting documents from a collection and preparing > a Document object. These two should actually be split up to ContentSource and > DocMaker, which will use a ContentSource instance. > ContentSource will implement all the methods of DocMaker, like > getNextDocData, raw size in bytes tracking etc. This can actually fit well w/ > 1591, by having a basic ContentSource that offers input stream services, and > wraps a file (for example) with a bzip or gzip streams etc. > DocMaker will implement the makeDocument methods, reusing DocState etc. > The idea is that collecting the Enwiki documents, for example, should be the > same whether I create documents using DocState, add payloads or index > additional metadata. Same goes for Trec and Reuters collections, as well as > LineDocMaker. > In fact, if one inspects EnwikiDocMaker and LineDocMaker closely, they are > 99% the same and 99% different. Most of their differences lie in the way they > read the data, while most of the similarity lies in the way they create > documents (using DocState). 
> That led to a somewhat bizarre extension of LineDocMaker by EnwikiDocMaker > (just the reuse of DocState). Also, other DocMakers do not use that DocState > today, something they could have gotten for free with this refactoring > proposed. > So by having an EnwikiContentSource, ReutersContentSource and others (TREC, > Line, Simple), I can write several DocMakers, such as DocStateMaker, > ConfigurableDocMaker (one which accepts all kinds of config options) and > custom DocMakers (payload, facets, sorting), passing to them a ContentSource > instance and reuse the same DocMaking algorithm with many content sources, as > well as the same ContentSource algorithm with many DocMaker implementations. > This will also give us the opportunity to perf test content sources alone > (i.e., compare bzip, gzip and regular input streams), w/o the overhead of > creating a Document object. > I've already done so in my code environment (I extend the benchmark package > for my application's purposes) and I like the flexibility I have. I think > this can be a nice contribution to the benchmark package, which can result in > some code cleanup as well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1460) Change all contrib TokenStreams/Filters to use the new TokenStream API
[ https://issues.apache.org/jira/browse/LUCENE-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735313#action_12735313 ] Michael Busch commented on LUCENE-1460: --- Btw: I'm taking a tokenstream break today... so if anyone feels the sudden urge to convert some of the remaining streams: don't hesitate - it won't conflict with my work, the patch I posted late last night is still current. I'll try to continue tomorrow. > Change all contrib TokenStreams/Filters to use the new TokenStream API > -- > > Key: LUCENE-1460 > URL: https://issues.apache.org/jira/browse/LUCENE-1460 > Project: Lucene - Java > Issue Type: Task >Reporter: Michael Busch >Assignee: Michael Busch >Priority: Minor > Fix For: 2.9 > > Attachments: lucene-1460.patch, lucene-1460.patch, > LUCENE-1460_contrib_partial.txt, LUCENE-1460_contrib_partial.txt, > LUCENE-1460_contrib_partial.txt, LUCENE-1460_core.txt, LUCENE-1460_partial.txt > > > Now that we have the new TokenStream API (LUCENE-1422) we should change all > contrib modules to use it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1460) Change all contrib TokenStreams/Filters to use the new TokenStream API
[ https://issues.apache.org/jira/browse/LUCENE-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735327#action_12735327 ] Robert Muir commented on LUCENE-1460: - Michael, looks like you got a ton done. I'll take a look late Sunday/Monday at what you did with ngram, for curiosity at least. If you get a moment, maybe someone could review what I did with Thai; I didn't look too hard to see if save/restore state was worse than the previous cloning... thanks for tackling the tougher ones here :) > Change all contrib TokenStreams/Filters to use the new TokenStream API > -- > > Key: LUCENE-1460 > URL: https://issues.apache.org/jira/browse/LUCENE-1460 > Project: Lucene - Java > Issue Type: Task >Reporter: Michael Busch >Assignee: Michael Busch >Priority: Minor > Fix For: 2.9 > > Attachments: lucene-1460.patch, lucene-1460.patch, > LUCENE-1460_contrib_partial.txt, LUCENE-1460_contrib_partial.txt, > LUCENE-1460_contrib_partial.txt, LUCENE-1460_core.txt, LUCENE-1460_partial.txt > > > Now that we have the new TokenStream API (LUCENE-1422) we should change all > contrib modules to use it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
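For anyone converting streams: the shape of the new API (as I understand it from LUCENE-1422/1693) is that a filter no longer builds a new Token per call; all stages share attribute instances and incrementToken() mutates them in place. The sketch below mocks the Lucene types minimally so it is self-contained; the real TermAttribute/TokenStream live in org.apache.lucene.analysis and differ in detail.

```java
import java.util.*;

// Minimal stand-in for org.apache.lucene.analysis.tokenattributes.TermAttribute.
class TermAttributeSketch {
    private char[] buf = new char[0];
    private int len;
    void setTermBuffer(String s) { buf = s.toCharArray(); len = s.length(); }
    char[] termBuffer() { return buf; }
    int termLength() { return len; }
    String term() { return new String(buf, 0, len); }
}

abstract class TokenStreamSketch {
    TermAttributeSketch termAtt; // shared down the filter chain, like AttributeSource
    abstract boolean incrementToken(); // advance one token; false when exhausted
}

// A trivial tokenizer: owns the attribute instance.
class WhitespaceTokenizerSketch extends TokenStreamSketch {
    private final Iterator<String> words;
    WhitespaceTokenizerSketch(String text) {
        termAtt = new TermAttributeSketch();
        words = Arrays.asList(text.split("\\s+")).iterator();
    }
    boolean incrementToken() {
        if (!words.hasNext()) return false;
        termAtt.setTermBuffer(words.next());
        return true;
    }
}

// A filter: reuses the input's attribute and mutates it in place,
// instead of allocating a Token per call as the old API encouraged.
class LowerCaseFilterSketch extends TokenStreamSketch {
    private final TokenStreamSketch input;
    LowerCaseFilterSketch(TokenStreamSketch input) {
        this.input = input;
        this.termAtt = input.termAtt;
    }
    boolean incrementToken() {
        if (!input.incrementToken()) return false;
        char[] buf = termAtt.termBuffer();
        for (int i = 0; i < termAtt.termLength(); i++) {
            buf[i] = Character.toLowerCase(buf[i]);
        }
        return true;
    }
}
```

The save/restore-state question above is about exactly this sharing: because attributes are mutated in place, a filter that must remember a token across calls captures the attribute state rather than cloning a Token.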
[jira] Updated: (LUCENE-1762) Slightly more readable code in TermAttributeImpl
[ https://issues.apache.org/jira/browse/LUCENE-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1762: Attachment: LUCENE-1762.patch Made the changes in Token along the same lines. I had to change one constant in TokenTest, as I changed the initial allocation policy of termBuffer to be consistent with ArrayUtil.getNextSize(): NEW: if(termBuffer==null) termBuffer = new char[ArrayUtil.getNextSize(newSize < MIN_BUFFER_SIZE ? MIN_BUFFER_SIZE : newSize)]; OLD: termBuffer = new char[newSize < MIN_BUFFER_SIZE ? MIN_BUFFER_SIZE : newSize]; Not sure if this is better, but it looks more consistent to me (buffer size is always determined via getNextSize()). Uwe, setOnlyUseNewAPI(false) does not exist; it was removed with some of the patches lately. It gets automatically detected via reflection? > Slightly more readable code in TermAttributeImpl > - > > Key: LUCENE-1762 > URL: https://issues.apache.org/jira/browse/LUCENE-1762 > Project: Lucene - Java > Issue Type: Improvement > Components: Analysis >Reporter: Eks Dev >Assignee: Uwe Schindler >Priority: Trivial > Attachments: LUCENE-1762.patch, LUCENE-1762.patch > > > No big deal. > growTermBuffer(int newSize) was using correct, but slightly hard to follow > code. > the method was returning null as a hint that the current termBuffer has > enough space to the upstream code or reallocated buffer. > this patch simplifies logic making this method to only reallocate buffer, > nothing more. > It reduces number of if(null) checks in a few methods and reduces amount of > code. > all tests pass. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1762) Slightly more readable code in TermAttributeImpl
[ https://issues.apache.org/jira/browse/LUCENE-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1762: Attachment: LUCENE-1762.patch - made allocation in initTermBuffer() consistent with ArrayUtil.getNextSize(int) - this is ok not to start with MIN_BUFFER_SIZE, but rather with ArrayUtil.getNextSize(MIN_BUFFER_SIZE)... e.g. if getNextSize gets very sensitive to initial conditions one day... - null-ed termText on switch to termBuffer in resizeTermBuffer (as it was before!) . This was a bug in previous patch > Slightly more readable code in TermAttributeImpl > - > > Key: LUCENE-1762 > URL: https://issues.apache.org/jira/browse/LUCENE-1762 > Project: Lucene - Java > Issue Type: Improvement > Components: Analysis >Reporter: Eks Dev >Assignee: Uwe Schindler >Priority: Trivial > Attachments: LUCENE-1762.patch, LUCENE-1762.patch, LUCENE-1762.patch > > > No big deal. > growTermBuffer(int newSize) was using correct, but slightly hard to follow > code. > the method was returning null as a hint that the current termBuffer has > enough space to the upstream code or reallocated buffer. > this patch simplifies logic making this method to only reallocate buffer, > nothing more. > It reduces number of if(null) checks in a few methods and reduces amount of > code. > all tests pass. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Hudson build is back to normal: Lucene-trunk #900
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/900/changes - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org