[jira] Updated: (LUCENE-2167) Implement StandardTokenizer with the UAX#29 Standard
[ https://issues.apache.org/jira/browse/LUCENE-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe updated LUCENE-2167:

Attachment: LUCENE-2167.patch

Updated to trunk. All tests pass. Documentation improved at package and class level. modules/analysis/CHANGES.txt entry included. I think this is ready to commit.

Implement StandardTokenizer with the UAX#29 Standard
Key: LUCENE-2167
URL: https://issues.apache.org/jira/browse/LUCENE-2167
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/analyzers
Affects Versions: 3.1
Reporter: Shyamal Prasad
Assignee: Robert Muir
Priority: Minor
Attachments: LUCENE-2167-jflex-tld-macro-gen.patch, LUCENE-2167-jflex-tld-macro-gen.patch, LUCENE-2167-jflex-tld-macro-gen.patch, LUCENE-2167-lucene-buildhelper-maven-plugin.patch, LUCENE-2167.benchmark.patch, LUCENE-2167.benchmark.patch, LUCENE-2167.benchmark.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, standard.zip, StandardTokenizerImpl.jflex
Original Estimate: 0.5h
Remaining Estimate: 0.5h

It would be really nice for StandardTokenizer to adhere to the standard as closely as we can with JFlex. Then its name would actually make sense. Such a transition would involve renaming the old StandardTokenizer to EuropeanTokenizer, as its javadoc claims: bq. This should be a good tokenizer for most European-language documents The new StandardTokenizer could then say bq. This should be a good tokenizer for most languages. All the english/euro-centric stuff like the acronym/company/apostrophe stuff can stay with that EuropeanTokenizer, and it could be used by the european analyzers.

-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
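The word-boundary behavior that a UAX#29 tokenizer gives you can be illustrated without Lucene: the JDK's java.text.BreakIterator word instance applies Unicode boundary analysis very close to UAX#29 (it is not Lucene's JFlex grammar, so exact token boundaries may differ). A minimal, self-contained sketch of language-neutral word segmentation:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class Uax29Demo {
    // Split text on word boundaries; boundary analysis also yields the runs of
    // spaces and punctuation, so keep only tokens containing a letter or digit.
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        BreakIterator bi = BreakIterator.getWordInstance(Locale.ROOT);
        bi.setText(text);
        int start = bi.first();
        for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
            String token = text.substring(start, end);
            if (token.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Boundary rules are data-driven, not language-specific regexes.
        System.out.println(words("Lucene's StandardTokenizer, v3.1"));
    }
}
```

The same rules apply to any script, which is the point of the rename: the UAX#29-based tokenizer does not hard-code English/European token shapes.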
[jira] Commented: (SOLR-1395) Integrate Katta
[ https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909654#action_12909654 ] jianfeng zheng commented on SOLR-1395: -- You are so nice, Mathias. I am using the multi-shard distributed search of Solr, and also let Katta choose a node for each shard. I found there is only one proxy object in KattaClient for each Katta node; locking it will solve the problem you posted on 18 Aug, but it will lead to each node working single-threaded.

Integrate Katta
Key: SOLR-1395
URL: https://issues.apache.org/jira/browse/SOLR-1395
Project: Solr
Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
Fix For: Next
Attachments: back-end.log, front-end.log, hadoop-core-0.19.0.jar, katta-core-0.6-dev.jar, katta.node.properties, katta.zk.properties, log4j-1.2.13.jar, solr-1395-1431-3.patch, solr-1395-1431-4.patch, solr-1395-1431-katta0.6.patch, solr-1395-1431-katta0.6.patch, solr-1395-1431.patch, SOLR-1395.patch, SOLR-1395.patch, SOLR-1395.patch, test-katta-core-0.6-dev.jar, zkclient-0.1-dev.jar, zookeeper-3.2.1.jar
Original Estimate: 336h
Remaining Estimate: 336h

We'll integrate Katta into Solr so that:
* Distributed search uses Hadoop RPC
* Shard/SolrCore distribution and management
* Zookeeper based failover
* Indexes may be built using Hadoop

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Created: (SOLR-2122) How to escape the special character in the Apache solr example between strings
How to escape the special character in the Apache solr example between strings
--
Key: SOLR-2122
URL: https://issues.apache.org/jira/browse/SOLR-2122
Project: Solr
Issue Type: Bug
Components: web gui
Affects Versions: 1.4
Environment: Linux Environment
Reporter: JAYABAALAN V
Priority: Critical

The field value contains a special character between strings, like Arts\Culture. If the user selects this value in the web GUI, we need to display the corresponding records from Solr.

http://localhost:8983/solr/select/?q=rsprimarysub:Arts\Culture&fl=rsprimarysub&debugQuery=true

Error Message: HTTP ERROR: 400 org.apache.lucene.queryParser.ParseException: Cannot parse 'rsprimarysub:Arts\': Lexical error at line 1, column 19. Encountered: <EOF> after : RequestURI=/solr/select/ Powered by Jetty://

Do provide inputs for the same.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
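The parse error above occurs because the backslash is itself the query parser's escape character, so a trailing `\` consumes the next character and leaves the parser at end-of-input. Lucene's QueryParser provides an escape() helper for this; the sketch below is a simplified stand-in (the SPECIAL set is a representative subset from memory, not the exact list in any given Lucene release) showing that a literal backslash must be doubled, and the result URL-encoded (`\` is %5C) before it goes into the q= parameter:

```java
public class QueryEscape {
    // Representative subset of Lucene query-syntax characters; consult
    // QueryParser.escape in your Lucene version for the authoritative list.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) sb.append('\\');  // prefix each special char
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A literal backslash in the field value becomes an escaped pair.
        System.out.println(escape("Arts\\Culture")); // prints Arts\\Culture
    }
}
```

In a URL the escaped value would then appear as `q=rsprimarysub:Arts%5C%5CCulture`, so the parser sees `\\` and treats the second backslash as a literal character.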
[jira] Commented: (LUCENE-2622) Random Test Failure org.apache.lucene.TestExternalCodecs.testPerFieldCodec (from TestExternalCodecs)
[ https://issues.apache.org/jira/browse/LUCENE-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909686#action_12909686 ] Simon Willnauer commented on LUCENE-2622: - It seems that we figured out what's going on here. The problem seems to be the optimization done in LUCENE-2588, where we strip off the non-distinguishing suffix to save RAM in the loaded terms index. The problem with this optimization is that it is not safe for all comparators. The testcase runs with a reverse unicode comparator, which causes terms to appear in reverse order during indexing. Yet, this is not a problem until we run into a situation where the stripped suffix is required due to the nature of the comparator. In this case we index the numbers from 0-173, and with the termIndexInterval randomly set to 54 we run into a situation where the indexing code was wrong about the prefix. It sees the term 49 with prior term 5, thinks it can strip off the 9, and uses 4 as the indexed term. Once we seek on the terms dictionary, the binary search in CoreFieldIndex#getIndexOffset tries to find the indexed term prior to term 44; we compare to 4, which returns -1, while comparing to 49 would have yielded 1. That lets us end up with the wrong offset, and the assert blows up.
To fix this, we somehow need access to the comparator actually in use while building the indexed terms - I will reopen LUCENE-2588.

Random Test Failure org.apache.lucene.TestExternalCodecs.testPerFieldCodec (from TestExternalCodecs)
Key: LUCENE-2622
URL: https://issues.apache.org/jira/browse/LUCENE-2622
Project: Lucene - Java
Issue Type: Bug
Components: Tests
Reporter: Mark Miller
Priority: Minor

Error Message: state.ord=54 startOrd=0 ir.isIndexTerm=true state.docFreq=1

Stacktrace:
junit.framework.AssertionFailedError: state.ord=54 startOrd=0 ir.isIndexTerm=true state.docFreq=1
at org.apache.lucene.index.codecs.standard.StandardTermsDictReader$FieldReader$SegmentTermsEnum.seek(StandardTermsDictReader.java:395)
at org.apache.lucene.index.DocumentsWriter.applyDeletes(DocumentsWriter.java:1099)
at org.apache.lucene.index.DocumentsWriter.applyDeletes(DocumentsWriter.java:1028)
at org.apache.lucene.index.IndexWriter.applyDeletes(IndexWriter.java:4213)
at org.apache.lucene.index.IndexWriter.doFlushInternal(IndexWriter.java:3381)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3221)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3211)
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2345)
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2323)
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2293)
at org.apache.lucene.TestExternalCodecs.testPerFieldCodec(TestExternalCodecs.java:645)
at org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:381)
at org.apache.lucene.util.LuceneTestCase.run(LuceneTestCase.java:373)

Standard Output:
NOTE: random codec of testcase 'testPerFieldCodec' was: MockFixedIntBlock(blockSize=1327)
NOTE: random locale of testcase 'testPerFieldCodec' was: lt_LT
NOTE: random timezone of testcase 'testPerFieldCodec' was: Africa/Lusaka
NOTE: random seed of testcase 'testPerFieldCodec' was: 812019387131615618

-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
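The failure mode described in the comment above can be reproduced with plain string comparisons, no Lucene code required. The index term "49" (previous term "5") was truncated to "4" by logic that assumed binary term order; under the reverse comparator, the truncated term no longer sits on the same side of the sought term "44" as the full term did, so the binary search lands on the wrong offset:

```java
import java.util.Comparator;

public class ReverseOrderSeek {
    // The test's reverse unicode order, modeled as a reversed natural-order comparator.
    static final Comparator<String> REVERSE = Comparator.<String>naturalOrder().reversed();

    public static void main(String[] args) {
        // Binary search for the indexed term preceding "44":
        int vsTruncated = REVERSE.compare("44", "4");   // negative: search steps left of "4"  -> wrong offset
        int vsFull      = REVERSE.compare("44", "49");  // positive: search should step right of "49"
        System.out.println(vsTruncated < 0); // true
        System.out.println(vsFull > 0);      // true
    }
}
```

Under plain binary order the two comparisons place "44" consistently between "4" and "49", which is why the truncation is safe there and only the non-default comparator exposes the bug.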
[jira] Reopened: (LUCENE-2588) terms index should not store useless suffixes
[ https://issues.apache.org/jira/browse/LUCENE-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer reopened LUCENE-2588: - Reopening this because this optimization is not safe for all BytesRef comparators. Reverse unicode order already breaks it. See LUCENE-2622 for details. The non-distinguishing suffix must be determined by the comparator actually in use; otherwise the assumption can be wrong for a non-standard sort order.

terms index should not store useless suffixes
-
Key: LUCENE-2588
URL: https://issues.apache.org/jira/browse/LUCENE-2588
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 4.0
Attachments: LUCENE-2588.patch, LUCENE-2588.patch

This idea came up when discussing w/ Robert how to improve our terms index... The terms dict index today simply grabs whatever term was at a 0 mod 128 index (by default). But this is wasteful because you often don't need the suffix of the term at that point. EG if the 127th term is aa and the 128th (indexed) term is abcd123456789, instead of storing that full term you only need to store ab. The suffix is useless, and uses up RAM since we load the terms index into RAM. The patch is very simple. The optimization is particularly easy because terms are now byte[] and we sort in binary order. I tested on the first 10M 1KB Wikipedia docs, and this reduces the terms index (tii) file from 3.9 MB to 3.3 MB (16% smaller), using StandardAnalyzer, indexing the body field tokenized but title/date fields untokenized. I expect on noisier terms dicts, especially ones w/ bad terms accidentally indexed, that the savings will be even more. In the future we could do crazier things. EG there's no real reason why the indexed terms must be regular (every N terms), so we could instead pick terms more carefully, say approximately every N, but favor terms that have a smaller net prefix.
We can also index more sparsely in regions where the net docFreq is lowish, since we can afford somewhat higher seek+scan time to these terms since enuming their docs will be much faster. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
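The core of the optimization is computing the shortest prefix of the indexed term that still separates it from the previous term in binary order. A minimal sketch of that idea (shortestSeparator is a made-up name for illustration, not Lucene's implementation):

```java
public class IndexTermPrefix {
    // Shortest prefix of `indexed` that still sorts strictly after `prev`
    // in plain char/byte order - the LUCENE-2588 space saving.
    static String shortestSeparator(String prev, String indexed) {
        int limit = Math.min(prev.length(), indexed.length());
        for (int i = 0; i < limit; i++) {
            if (prev.charAt(i) != indexed.charAt(i)) {
                return indexed.substring(0, i + 1);  // first differing char decides order
            }
        }
        // `prev` is a prefix of `indexed`: one extra char is needed to differ.
        return indexed.substring(0, Math.min(limit + 1, indexed.length()));
    }

    public static void main(String[] args) {
        // The issue's example: between "aa" and "abcd123456789", storing "ab"
        // in the terms index is enough.
        System.out.println(shortestSeparator("aa", "abcd123456789")); // prints ab
    }
}
```

Note that shortestSeparator("5", "49") returns "4", which is safe in binary order but not under a reverse comparator - exactly the case that surfaced as LUCENE-2622.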
[jira] Commented: (LUCENE-2588) terms index should not store useless suffixes
[ https://issues.apache.org/jira/browse/LUCENE-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909699#action_12909699 ] Robert Muir commented on LUCENE-2588: - Should we really change StandardCodec to support this [non-binary order]? Really, if you have anything but regular unicode order, other things in Lucene will break too, such as queries. The test just doesn't exercise these. Try changing the order of PreFlexCodec's comparator... Can't we just fix the test not to use StandardCodec? I mean, we aren't taking any feature away here. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Current trunk example woes...
Well, maybe. I'm sure I got Solr, but maybe not Lucene. What I'm sure I *hadn't* done was clean the Lucene tree before building the Solr example. Which, if I'd been thinking, would have been logical... Doing both fixes my self-generated problem, and all's well now. I was having a hard time imagining that I was the first one to run into such an egregious error, but it had been a long day by last night... Never mind, thanks Erick On Tue, Sep 14, 2010 at 9:46 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Tue, Sep 14, 2010 at 8:16 PM, Erick Erickson erickerick...@gmail.com wrote: If I check out the current trunk, and from solr do an ant clean example all is well, even up to starting Solr. But trying to hit anything on the site gives a response in the browser starting with: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType:Error loading class 'solr.SpatialTileField' Commenting the relevant fieldType out of schema.xml fixes this. Should I open a Jira or does someone want to jump on it? Hmmm, I can't reproduce this. Something like http://localhost:8983/solr/select?q=solr seems to work fine. Did you do an svn up at the trunk level (i.e. get lucene too)? -Yonik http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8 - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Current trunk example woes...
This can happen fairly easily/often these days. We probably still want to consider having a Solr clean call Lucene clean. - Mark On 9/15/10 7:54 AM, Erick Erickson wrote: Well, maybe. I'm sure I got SOLR, but maybe not Lucene. What I'm sure I *hadn't* done was clean the lucene tree before building the solr example. Which, if I'd been thinking would have been logical... Doing both fixes my self-generated problem, and all's well now, I was having a hard time imagining that I was the first one to run into such an egregious error, but it had been a long day by last night... Never mind thanks Erick On Tue, Sep 14, 2010 at 9:46 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Tue, Sep 14, 2010 at 8:16 PM, Erick Erickson erickerick...@gmail.com wrote: If I check out the current trunk, and from solr do an ant clean example all is well, even up to starting Solr. But trying to hit anything on the site gives a response in the browser starting with: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType:Error loading class 'solr.SpatialTileField' Commenting the relevant fieldType out of schema.xml fixes this. Should I open a Jira or does someone want to jump on it? Hmmm, I can't reproduce this. Something like http://localhost:8983/solr/select?q=solr seems to work fine. Did you do an svn up at the trunk level (i.e. get lucene too)? -Yonik http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8 - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: exceptions from solr/contrib/dataimporthandler and solr/contrib/extraction
What I want you to do is, I want you to find the guys who are putting all the bugs in the code, and I want you to FIRE THEM! He who is without bugs in his code may be the first to fire. Did no one fire you? Neither does the ASF. Go away, and skip unit tests no more. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2588) terms index should not store useless suffixes
[ https://issues.apache.org/jira/browse/LUCENE-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909717#action_12909717 ] Simon Willnauer commented on LUCENE-2588: - bq. Should we really change StandardCodec to support this [non-binary order]? I'm not sure if we should, but we should at least document the limitation. People who work at that level do read doc strings - if they don't, let them be doomed - but if you run into the bug we had in LUCENE-2622 you will have a super hard time figuring out what is going on without knowing Lucene very, very well. bq. Can't we just fix the test not to use StandardCodec? I mean we aren't taking any feature away here. +1 I think we should fix this test ASAP, either by using byte sort order or by adding some MockCodec (as Robert has suggested). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Issue Comment Edited: (LUCENE-2504) sorting performance regression
[ https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909407#action_12909407 ] Yonik Seeley edited comment on LUCENE-2504 at 9/15/10 9:00 AM: --- bq. The open question is whether this hotspot fickleness is particular to Oracle's java impl, or, is somehow endemic to bytecode VMs (.NET included). I tried IBM's latest Java6 (SR8 FP1, 20100624). It seems to have some of the same pitfalls as Oracle's JVM, just different. The first run does not differ from the second run in the same JVM as it does with Oracle, but the first run itself has much more variation. The worst case is worse, and just like the Oracle JVM, it gets stuck in its worst case. Each run (of the complete set of fields) is in a separate JVM, since two runs in the same JVM didn't really differ as they did in the Oracle JVM.

branch_3x:
|unique terms in field|median sort time of 100 sorts in ms|another run|another run|another run|another run|another run|another run|
|10|129|128|130|109|98|128|135|
|1|128|123|127|127|98|128|135|
|1000|129|130|130|128|98|130|136|
|100|128|133|133|130|100|132|139|
|10|150|153|153|154|122|153|159|

trunk:
|unique terms in field|median sort time of 100 sorts in ms|another run|another run|another run|another run|another run|another run|
|10|217|81|383|99|79|78|215|
|1|254|73|346|101|106|108|267|
|1000|253|74|347|99|107|108|258|
|100|253|107|394|98|107|102|255|
|10|251|107|388|99|106|98|257|

The second way of testing is to completely mix fields (no serial correlation between which field is sorted on). This is the test that is very predictable with the Oracle JVM, but I still see wide variability with the IBM JVM. Here is the list of different runs for the IBM JVM (ms):

branch_3x: |128|129|123|120|128|100|95|74|130|91|120|
trunk: |106|89|168|116|155|119|108|118|112|169|165|

To my eye, it looks like we have more variability in trunk, due to increased use of abstractions?
edit: corrected the table description - all times in this message are for the IBM JVM.
sorting performance regression
--
Key: LUCENE-2504
URL: https://issues.apache.org/jira/browse/LUCENE-2504
Project: Lucene - Java
Issue Type: Bug
Affects Versions: 4.0
Reporter: Yonik Seeley
Fix For: 4.0
Attachments: LUCENE-2504.patch, LUCENE-2504.patch, LUCENE-2504.patch, LUCENE-2504.zip, LUCENE-2504_SortMissingLast.patch

sorting can be much slower on trunk than branch_3x

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2167) Implement StandardTokenizer with the UAX#29 Standard
[ https://issues.apache.org/jira/browse/LUCENE-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909759#action_12909759 ] Robert Muir commented on LUCENE-2167: - bq. I think this is ready to commit. I think so too; I applied the svn moves and the patch, and all tests pass. One last question: might it be reasonable to move ClassicTokenizer and friends to a .classic package? There is nothing standards-based about them at all, and it makes the .standard directory a little confusing. To do this I would have to make StandardTokenizerInterface public, but it could be marked @lucene.internal. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2167) Implement StandardTokenizer with the UAX#29 Standard
[ https://issues.apache.org/jira/browse/LUCENE-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909760#action_12909760 ] Robert Muir commented on LUCENE-2167: - bq. One last question, it might be reasonable to move ClassicTokenizer and friends to .classic package? By the way, if we decide this is best, I would like to open a new issue for it. We don't have to do everything in one step, and currently this patch cleanly applies with the svn move instructions. So I would like to commit this patch in a few days as-is if there are no objections. If we want to improve packaging, let's open a follow-up issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2167) Implement StandardTokenizer with the UAX#29 Standard
[ https://issues.apache.org/jira/browse/LUCENE-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909770#action_12909770 ] Steven Rowe commented on LUCENE-2167: - bq. One last question, it might be reasonable to move ClassicTokenizer and friends to .classic package? I agree with your arguments about moving to a .classic package. I think new users won't care about what StandardTokenizer/Analyzer used to be. My only concern here is existing users' upgrade experience - users should be able to continue using ClassicTokenizer if they want to keep the current behavior. Right now, they can do that by setting Version to 3.0 in the constructor of StandardTokenizer/Analyzer. I think this should remain the case until Lucene version 5.0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations
[ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909771#action_12909771 ]

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

Because of the way byte slices work - e.g., a reader needs to know the size of a slice before iterating over it - we can't simply point into the middle of a slice and read without probably running into the forwarding address. It seems the skip list will need to point to the beginning of a slice. This will make the interval iteration in the RAM buffer skip list writer a little more complicated than today, in that it will need to store positions that are the starts of byte slices. In other words, the intervals will be slightly uneven at times.

Concurrent byte and int block implementations
---------------------------------------------

Key: LUCENE-2575
URL: https://issues.apache.org/jira/browse/LUCENE-2575
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
Fix For: Realtime Branch
Attachments: LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch

The current *BlockPool implementations aren't quite concurrent. We really need something that has a locking flush method, where flush is called at the end of adding a document. Once flushed, the newly written data would be available to all other reading threads (i.e., postings etc.). I'm not sure I understand the slices concept; it seems like it'd be easier to implement a seekable, random-access-file-like API. One would seek to a given position, then read or write from there. The underlying management of byte arrays could then be hidden?
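The constraint described above - that a reader can only make sense of a slice from its start, because the slice's size (and hence the location of the forwarding address) is implied by its level - can be shown with a toy layout. The sizes and the 4-byte forwarding address below are assumptions for illustration, not Lucene's actual ByteBlockPool format:

```java
// Toy byte-slice layout: a slice's total length is determined solely by its
// level, and the last 4 bytes of a continued slice hold the forwarding
// address of the next slice. Knowing the slice start and level tells us
// where the payload ends; an arbitrary interior offset tells us nothing.
public class SliceDemo {
    static final int[] LEVEL_SIZE = {8, 16, 32};  // assumed per-level sizes

    // End of the payload region for a slice beginning at sliceStart.
    static int payloadEnd(int sliceStart, int level) {
        return sliceStart + LEVEL_SIZE[level] - 4;  // last 4 bytes = forwarding address
    }

    public static void main(String[] args) {
        System.out.println(payloadEnd(100, 0)); // 104
        System.out.println(payloadEnd(100, 2)); // 128
    }
}
```

This is why a skip-list entry pointing mid-slice is useless: without the slice start there is no way to recover the level, so the reader cannot tell payload bytes from the forwarding address.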
[jira] Commented: (LUCENE-2167) Implement StandardTokenizer with the UAX#29 Standard
[ https://issues.apache.org/jira/browse/LUCENE-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909772#action_12909772 ]

Robert Muir commented on LUCENE-2167:
-------------------------------------

{quote}
My only concern here is existing users' upgrade experience - users should be able to continue using the ClassicTokenizer if they want to keep current behavior. Right now, they can do that by setting Version to 3.0 in the constructor to StandardTokenizer/Analyzer. I think this should remain the case until Lucene version 5.0.
{quote}

I agree completely. I think we can still do this with the Classic stuff in a separate package, though - we can have both.
[jira] Commented: (LUCENE-2167) Implement StandardTokenizer with the UAX#29 Standard
[ https://issues.apache.org/jira/browse/LUCENE-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909776#action_12909776 ]

Steven Rowe commented on LUCENE-2167:
-------------------------------------

bq. I agree completely, i think we can do this though with the Classic stuff in a separate package? (like we can have both)

Right, I didn't mean that moving the Classic stuff out of .standard was antithetical to preserving Classic functionality in StandardTokenizer - I just wanted to make sure we agree that the move doesn't (yet) mean complete separation. Sounds like we do.
[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations
[ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909839#action_12909839 ]

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

Is there a way to know the level of a slice given only the forwarding address/position? It doesn't look like it. Hmm... This could mean encoding the level (or the size) of the slice into the slice itself, which would lengthen slices in general; I suppose, though, that a level index would only add one byte, and that would be okay.
[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations
[ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909852#action_12909852 ]

Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

In the following line of ByteBlockPool.allocSlice we record the slice level, but at the end of the slice rather than the beginning, which is where we'd need to write the level in order to implement slice seek. I'm not immediately sure what reads the level at this end position of the byte[]:

{code}
buffer[byteUpto-1] = (byte) (16|newLevel);
{code}
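For reference, in that trailing byte the low four bits carry the level and the 16 bit marks it as a slice-boundary byte; allocSlice later reads the level back by masking. A minimal sketch of the encode/decode (the method names are mine; the bit layout follows the line quoted above):

```java
// Encode/decode of the trailing level byte used by ByteBlockPool-style
// slices: buffer[byteUpto-1] = (byte) (16 | newLevel).
public class LevelByteDemo {
    // Bit 16 flags "this is a boundary byte"; the low 4 bits hold the level.
    static byte encodeLevel(int level) {
        return (byte) (16 | level);
    }

    // allocSlice-style read-back: mask off everything but the low 4 bits.
    static int decodeLevel(byte b) {
        return b & 15;
    }

    public static void main(String[] args) {
        byte b = encodeLevel(3);
        System.out.println(decodeLevel(b));  // 3
        System.out.println((b & 16) != 0);   // true: boundary marker is set
    }
}
```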
[jira] Updated: (SOLR-2098) Search Grouping: Facet support
[ https://issues.apache.org/jira/browse/SOLR-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-2098:
-------------------------------

Attachment: SOLR-2098.patch

Attaching a patch that makes faceting work with field collapsing.

Search Grouping: Facet support
------------------------------

Key: SOLR-2098
URL: https://issues.apache.org/jira/browse/SOLR-2098
Project: Solr
Issue Type: Sub-task
Reporter: Yonik Seeley
Attachments: SOLR-2098.patch
[jira] Resolved: (SOLR-2098) Search Grouping: Facet support
[ https://issues.apache.org/jira/browse/SOLR-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley resolved SOLR-2098.
--------------------------------

Fix Version/s: 4.0
Resolution: Fixed

committed.
[jira] Created: (LUCENE-2646) Implement the Military Grid Reference System for tiling
Implement the Military Grid Reference System for tiling
-------------------------------------------------------

Key: LUCENE-2646
URL: https://issues.apache.org/jira/browse/LUCENE-2646
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/spatial
Reporter: Grant Ingersoll

The current tile-based system in Lucene is broken. We should standardize on a common way of labeling grids and provide that as an option. Based on previous conversations with Ryan McKinley and Chris Male, it seems the Military Grid Reference System (http://en.wikipedia.org/wiki/Military_grid_reference_system) is a good candidate for the replacement, due to its standard use of metric tiles of increasing orders of magnitude (1, 10, 100, 1000, etc.)
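The "increasing orders of magnitude" property comes directly from the MGRS string format: dropping one digit from each of the easting and northing halves coarsens the tile by a factor of ten (1 m -> 10 m -> 100 m, and so on). A sketch of that truncation, assuming a reference whose grid-zone-plus-square prefix length is known and whose numeric tail splits evenly into easting and northing:

```java
// Coarsen an MGRS-style reference by truncating easting/northing digits.
// Assumes prefixLen characters of grid zone + 100 km square id, followed by
// an even-length numeric tail (easting digits, then northing digits).
public class MgrsDemo {
    static String truncate(String mgrs, int prefixLen, int digits) {
        String prefix = mgrs.substring(0, prefixLen);
        String numeric = mgrs.substring(prefixLen);
        int half = numeric.length() / 2;
        String easting = numeric.substring(0, half);
        String northing = numeric.substring(half);
        // Each digit dropped widens the tile by a power of ten.
        return prefix + easting.substring(0, digits) + northing.substring(0, digits);
    }

    public static void main(String[] args) {
        String ref = "4QFJ1234567890"; // zone 4Q, square FJ, 5+5 digits = 1 m tile
        System.out.println(truncate(ref, 4, 4)); // 4QFJ12346789 -> 10 m tile
        System.out.println(truncate(ref, 4, 1)); // 4QFJ16 -> 10 km tile
    }
}
```

A label scheme like this would give the tiling component a single canonical string per tile at every precision, which is exactly what a term-based index wants.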
[jira] Commented: (LUCENE-2646) Implement the Military Grid Reference System for tiling
[ https://issues.apache.org/jira/browse/LUCENE-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909962#action_12909962 ]

Chris Male commented on LUCENE-2646:
------------------------------------

+1. Do you have an implementation in mind, or already started?
[jira] Commented: (LUCENE-2562) Make Luke a Lucene/Solr Module
[ https://issues.apache.org/jira/browse/LUCENE-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909980#action_12909980 ]

Mark Miller commented on LUCENE-2562:
-------------------------------------

I haven't had any time to really work on this in a while, but I did bite the bullet, joined the Pivot mailing list, and figured out my issues with making a fluid resizing layout - which is sweet, and will hopefully motivate me to make some progress here soon.

Make Luke a Lucene/Solr Module
------------------------------

Key: LUCENE-2562
URL: https://issues.apache.org/jira/browse/LUCENE-2562
Project: Lucene - Java
Issue Type: Task
Reporter: Mark Miller
Attachments: luke1.jpg, luke2.jpg, luke3.jpg

see http://search.lucidimagination.com/search/document/ee0e048c6b56ee2/luke_in_need_of_maintainer and http://search.lucidimagination.com/search/document/5e53136b7dcb609b/web_based_luke

I think it would be great if there were a version of Luke that always worked with trunk - and it would also be great if it were easier to match Luke jars with Lucene versions. While I'd like to get GWT Luke into the mix as well, I think the easiest starting point is to port Luke straight to another UI toolkit before abstracting out DTO objects that both GWT Luke and Pivot Luke could share. I've started slowly converting Luke's use of Thinlet to Apache Pivot. I haven't had / don't have a lot of time for this at the moment, but I've plugged away here and there over the past week or two. There is still a *lot* to do.
Re: Build failed in Hudson: Lucene-3.x #116
this is unrelated to the clover problem. the problem is @RunWith(LuceneTestCase.LocalizedTestCaseRunner.class)

as you can see, clover thinks we added 6210 core tests (see https://hudson.apache.org/hudson/view/Lucene/job/Lucene-3.x/115/testReport/). We do not do many iterations - we run every test the same single time - so it's not a parameter thing. try ant test -Dtestcase=TestQueryParser to see what i mean, then comment out that @RunWith.

On Wed, Sep 15, 2010 at 9:35 PM, Uwe Schindler u...@thetaphi.de wrote:

Maybe we should reduce the iterations in the clover case. Clover should only test coverage, and that does not need to try all random variants. For coverage, a single run of each test should be fine. How about removing the -Dtests. from the clover part of the build file?

- Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Wednesday, September 15, 2010 5:44 PM
To: dev@lucene.apache.org
Subject: Re: Build failed in Hudson: Lucene-3.x #116

Hi, I think the switch of all tests to JUnit4 may be causing a clover issue. For example, TestQueryParser now thinks it has over 5000 tests. The reason is that it runs each test under every locale, and JUnit4 counts them this way. It does the same with MultiCodecRunner. I wonder, now that we vary these in the tests anyway, if we should consider commenting out the Localized/MultiCodec runners? We could keep them available (but not used) in case you want to quickly run a test under every single Locale/Codec.

On Wed, Sep 15, 2010 at 8:34 PM, Apache Hudson Server hud...@hudson.apache.org wrote:

See https://hudson.apache.org/hudson/job/Lucene-3.x/116/changes

Changes:
[mikemccand] don't close reader prematurely
[rmuir] LUCENE-2630: fix intl test bugs that rely on cldr version

--
[...truncated 18329 lines...]
[junit] Testsuite: org.apache.lucene.search.TestTimeLimitingCollector [junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 2.721 sec [junit] [junit] Testsuite: org.apache.lucene.search.TestTopDocsCollector [junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.016 sec [junit] [junit] Testsuite: org.apache.lucene.search.TestTopScoreDocCollector [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.005 sec [junit] [junit] Testsuite: org.apache.lucene.search.TestWildcard [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.037 sec [junit] [junit] Testsuite: org.apache.lucene.search.function.TestCustomScoreQuery [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 8.241 sec [junit] [junit] Testsuite: org.apache.lucene.search.function.TestDocValues [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.005 sec [junit] [junit] Testsuite: org.apache.lucene.search.function.TestFieldScoreQuery [junit] Tests run: 12, Failures: 0, Errors: 0, Time elapsed: 0.267 sec [junit] [junit] Testsuite: org.apache.lucene.search.function.TestOrdValues [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.108 sec [junit] [junit] Testsuite: org.apache.lucene.search.payloads.TestPayloadNearQuery [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.72 sec [junit] [junit] Testsuite: org.apache.lucene.search.payloads.TestPayloadTermQuery [junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 0.973 sec [junit] [junit] Testsuite: org.apache.lucene.search.spans.TestBasics [junit] Tests run: 20, Failures: 0, Errors: 0, Time elapsed: 12.966 sec [junit] [junit] Testsuite: org.apache.lucene.search.spans.TestFieldMaskingSpanQuery [junit] Tests run: 11, Failures: 0, Errors: 0, Time elapsed: 0.667 sec [junit] [junit] Testsuite: org.apache.lucene.search.spans.TestNearSpansOrdered [junit] Tests run: 10, Failures: 0, Errors: 0, Time elapsed: 0.073 sec [junit] [junit] Testsuite: org.apache.lucene.search.spans.TestPayloadSpans [junit] Tests 
run: 10, Failures: 0, Errors: 0, Time elapsed: 1.4 sec [junit] [junit] - Standard Output --- [junit] [junit] Spans Dump -- [junit] payloads for span:2 [junit] doc:0 s:3 e:6 three:Noise:5 [junit] doc:0 s:3 e:6 one:Entity:3 [junit] [junit] Spans Dump -- [junit] payloads for span:3 [junit] doc:0 s:0 e:3 xx:Entity:0 [junit] doc:0 s:0 e:3 yy:Noise:2 [junit] doc:0 s:0 e:3 rr:Noise:1 [junit] [junit] Spans Dump -- [junit] payloads for span:3 [junit] doc:1 s:0 e:4 rr:Noise:3 [junit] doc:1 s:0 e:4 xx:Entity:0 [junit] doc:1 s:0 e:4 yy:Noise:1 [junit] [junit] Spans Dump -- [junit] payloads for span:3 [junit] doc:0 s:0 e:3 rr:Noise:1 [junit] doc:0 s:0 e:3 yy:Noise:2 [junit] doc:0 s:0 e:3 xx:Entity:0 [junit] [junit] Spans Dump --
[jira] Commented: (SOLR-792) Tree Faceting Component
[ https://issues.apache.org/jira/browse/SOLR-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910008#action_12910008 ]

Lance Norskog commented on SOLR-792:
------------------------------------

Can this be back-ported (easily) to Solr 1.4.1? Is it dependent on new features?

Tree Faceting Component
-----------------------

Key: SOLR-792
URL: https://issues.apache.org/jira/browse/SOLR-792
Project: Solr
Issue Type: New Feature
Reporter: Erik Hatcher
Assignee: Ryan McKinley
Priority: Minor
Attachments: SOLR-792-PivotFaceting.patch, SOLR-792-PivotFaceting.patch, SOLR-792-PivotFaceting.patch, SOLR-792-PivotFaceting.patch, SOLR-792.patch, SOLR-792.patch, SOLR-792.patch, SOLR-792.patch, SOLR-792.patch, SOLR-792.patch

A component to do multi-level faceting.
Build failed in Hudson: Lucene-trunk #1289
See https://hudson.apache.org/hudson/job/Lucene-trunk/1289/changes Changes: [mikemccand] don't close reader prematurely [rmuir] LUCENE-2630: fix intl test bugs that rely on cldr version [rmuir] LUCENE-2630: fix intl test bugs that rely on cldr version -- [...truncated 14620 lines...] common.init: build-lucene: init: compile-test: [echo] Building swing... compile-analyzers-common: common.init: build-lucene: init: clover.setup: [clover-setup] Clover Version 2.6.3, built on November 20 2009 (build-778) [clover-setup] Loaded from: /export/home/hudson/tools/clover/clover2latest/clover-2.6.3.jar [clover-setup] Clover: Open Source License registered to Apache. [clover-setup] Clover is enabled with initstring 'https://hudson.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/test/clover/db/lucene_coverage.db' clover.info: clover: common.compile-core: compile-core: common.compile-test: junit-mkdir: [mkdir] Created dir: https://hudson.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/contrib/swing/test junit-sequential: [junit] Testsuite: org.apache.lucene.swing.models.TestBasicList [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 3.846 sec [junit] [junit] Testsuite: org.apache.lucene.swing.models.TestBasicTable [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.143 sec [junit] [junit] Testsuite: org.apache.lucene.swing.models.TestSearchingList [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.303 sec [junit] [junit] Testsuite: org.apache.lucene.swing.models.TestSearchingTable [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.136 sec [junit] [junit] Testsuite: org.apache.lucene.swing.models.TestUpdatingList [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.48 sec [junit] [junit] Testsuite: org.apache.lucene.swing.models.TestUpdatingTable [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 1.616 sec [junit] junit-parallel: common.test: [echo] Building wordnet... 
common.init: build-lucene: init: test: [echo] Building wordnet... common.init: build-lucene: init: compile-test: [echo] Building wordnet... compile-analyzers-common: common.init: build-lucene: init: clover.setup: [clover-setup] Clover Version 2.6.3, built on November 20 2009 (build-778) [clover-setup] Loaded from: /export/home/hudson/tools/clover/clover2latest/clover-2.6.3.jar [clover-setup] Clover: Open Source License registered to Apache. [clover-setup] Clover is enabled with initstring 'https://hudson.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/test/clover/db/lucene_coverage.db' clover.info: clover: common.compile-core: compile-core: common.compile-test: junit-mkdir: [mkdir] Created dir: https://hudson.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/contrib/wordnet/test junit-sequential: [junit] Testsuite: org.apache.lucene.wordnet.TestSynonymTokenFilter [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 8.607 sec [junit] [junit] Testsuite: org.apache.lucene.wordnet.TestWordnet [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.946 sec [junit] [junit] - Standard Output --- [junit] Opening Prolog file https://hudson.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/contrib/wordnet/classes/test/org/apache/lucene/wordnet/testSynonyms.txt [junit] [1/2] Parsing https://hudson.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/contrib/wordnet/classes/test/org/apache/lucene/wordnet/testSynonyms.txt [junit] 2 s(10001,1,'woods',n,1,0). 0 0 ndecent=0 [junit] 4 s(10001,3,'forest',n,1,0). 2 1 ndecent=0 [junit] 8 s(10003,2,'baron',n,1,1). 6 3 ndecent=0 [junit] [2/2] Building index to store synonyms, map sizes are 8 and 4 [junit] row=1/8 doc= Documentstored,omitNormssyn:king stored,indexedword:baron [junit] row=2/8 doc= Documentstored,omitNormssyn:wood stored,omitNormssyn:woods stored,indexedword:forest [junit] row=4/8 doc= Documentstored,omitNormssyn:wolfish stored,indexedword:ravenous [junit] Optimizing.. 
[junit] Opening Prolog file https://hudson.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/contrib/wordnet/classes/test/org/apache/lucene/wordnet/testSynonyms.txt [junit] [1/2] Parsing https://hudson.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/contrib/wordnet/classes/test/org/apache/lucene/wordnet/testSynonyms.txt [junit] 2 s(10001,1,'woods',n,1,0). 0 0 ndecent=0 [junit] 4 s(10001,3,'forest',n,1,0). 2 1 ndecent=0 [junit] 8 s(10003,2,'baron',n,1,1). 6 3 ndecent=0 [junit] [2/2] Building index to store synonyms, map sizes are 8 and 4 [junit] row=1/8 doc= Documentstored,omitNormssyn:king stored,indexedword:baron [junit] row=2/8 doc= Documentstored,omitNormssyn:wood