[HUDSON] Lucene-Solr-tests-only-3.x - Build # 6549 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/6549/ 1 tests failed. REGRESSION: org.apache.lucene.collation.TestCollationKeyAnalyzer.testThreadSafe Error Message: Java heap space Stack Trace: java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:2894) at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:117) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:589) at java.lang.StringBuffer.append(StringBuffer.java:337) at java.text.RuleBasedCollator.getCollationKey(RuleBasedCollator.java:617) at org.apache.lucene.collation.CollationKeyFilter.incrementToken(CollationKeyFilter.java:93) at org.apache.lucene.collation.CollationTestBase.assertThreadSafe(CollationTestBase.java:304) at org.apache.lucene.collation.TestCollationKeyAnalyzer.testThreadSafe(TestCollationKeyAnalyzer.java:89) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1082) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1010) Build Log (for compile errors): [...truncated 5227 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3003) Move UnInvertedField into Lucene core
[ https://issues.apache.org/jira/browse/LUCENE-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013869#comment-13013869 ] Dawid Weiss commented on LUCENE-3003: - For what it's worth, the instrumentation interface allows one to get exact allocation sizes of objects. I put together a small spike at https://github.com/dweiss/poligon/tree/master/instrumenter that measures the actual allocation size of byte[]. On my hotspot, 64-bit, this yields:
{noformat}
byte[0] takes 24 bytes.
byte[1] takes 32 bytes.
byte[2] takes 32 bytes.
byte[3] takes 32 bytes.
byte[4] takes 32 bytes.
byte[5] takes 32 bytes.
byte[6] takes 32 bytes.
byte[7] takes 32 bytes.
byte[8] takes 32 bytes.
byte[9] takes 40 bytes.
byte[10] takes 40 bytes.
byte[11] takes 40 bytes.
...
{noformat}
IBM's VM yields the same (64-bit), but the version of jrockit that I have (which may be an old one, but is 64-bit!) yields:
{noformat}
byte[0] takes 16 bytes.
byte[1] takes 24 bytes.
byte[2] takes 24 bytes.
byte[3] takes 24 bytes.
byte[4] takes 24 bytes.
byte[5] takes 24 bytes.
byte[6] takes 24 bytes.
byte[7] takes 24 bytes.
byte[8] takes 24 bytes.
byte[9] takes 32 bytes.
byte[10] takes 32 bytes.
byte[11] takes 32 bytes.
byte[12] takes 32 bytes.
byte[13] takes 32 bytes.
byte[14] takes 32 bytes.
byte[15] takes 32 bytes.
byte[16] takes 32 bytes.
byte[17] takes 40 bytes.
{noformat}
I don't have access to a 32-bit system right now, but if you're keen on checking, check out that github repo and run:
{noformat}
cd instrumenter
mvn package
java -javaagent:target/instrumenter-0.1.0-SNAPSHOT.jar -version
{noformat}
Move UnInvertedField into Lucene core - Key: LUCENE-3003 URL: https://issues.apache.org/jira/browse/LUCENE-3003 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3003.patch, LUCENE-3003.patch Solr's UnInvertedField lets you quickly look up all term ords for a given doc/field. Like FieldCache, it inverts the index to produce this, and creates a RAM-resident data structure holding the bits; but, unlike FieldCache, it can handle multiple values per doc, and it does not hold the term bytes in RAM. Rather, it holds only term ords, and then uses TermsEnum to resolve ord -> term. This is great eg for faceting, where you want to use int ords for all of your counting, and then only at the end you need to resolve the top N ords to their text. I think this is a useful core functionality, and we should move most of it into Lucene's core. It's a good complement to FieldCache. For this first baby step, I just move it into core and refactor Solr's usage of it. After this, as separate issues, I think there are some things we could explore/improve:
* The first-pass that allocates lots of tiny byte[] looks like it could be inefficient. Maybe we could use the byte slices from the indexer for this...
* We can improve the RAM efficiency of the TermIndex: if the codec supports ords, and we are operating on one segment, we should just use it. If not, we can use a more RAM-efficient data structure, eg an FST mapping to the ord.
* We may be able to improve on the main byte[] representation by using packed ints instead of delta-vInt?
* Eventually we should fold this ability into docvalues, ie we'd write the byte[] image at indexing time, and then loading would be fast, instead of uninverting
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
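For readers who don't want to clone the repo, a minimal sketch of the kind of agent being described (class name, jar wiring and output format are my own illustration, not the repo's actual code; the agent jar's manifest must declare Premain-Class for the JVM to load it):
{code}
import java.lang.instrument.Instrumentation;

public class AllocationSizeAgent {
    // Called by the JVM before main() when started with -javaagent:agent.jar;
    // the Instrumentation handle exposes per-object allocation sizes.
    public static void premain(String agentArgs, Instrumentation inst) {
        for (int len = 0; len <= 17; len++) {
            byte[] array = new byte[len];
            System.out.println("byte[" + len + "] takes "
                + inst.getObjectSize(array) + " bytes.");
        }
    }
}
{code}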
HTTP ERROR: 400
Getting error message while indexing file: HTTP ERROR: 400 ERROR:unknown field 'trapped'
Re: HTTP ERROR: 400
ERROR 500: unknown context On Thu, Mar 31, 2011 at 10:21, Deepak Singh deep...@praumtech.com wrote: Getting error message while indexing file: HTTP ERROR: 400 ERROR:unknown field 'trapped' - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013878#comment-13013878 ] Yuriy Akopov commented on SOLR-236: --- Another question: The patched version of .war starts and works as expected if I place the following simple instruction in solrconfig.xml:
{code}
<searchComponent name="collapse" class="org.apache.solr.handler.component.CollapseComponent">
</searchComponent>
{code}
But if I add additional factories as advised by the sample config, it produces an error when searching with collapsing turned on:
{code}
<searchComponent name="collapse" class="org.apache.solr.handler.component.CollapseComponent">
  <collapseCollectorFactory class="solr.fieldcollapse.collector.DocumentGroupCountCollapseCollectorFactory" />
  <collapseCollectorFactory class="solr.fieldcollapse.collector.FieldValueCountCollapseCollectorFactory" />
  <collapseCollectorFactory class="solr.fieldcollapse.collector.DocumentFieldsCollapseCollectorFactory" />
  <collapseCollectorFactory name="groupAggregatedData" class="org.apache.solr.search.fieldcollapse.collector.AggregateCollapseCollectorFactory">
    <function name="sum" class="org.apache.solr.search.fieldcollapse.collector.aggregate.SumFunction"/>
    <function name="avg" class="org.apache.solr.search.fieldcollapse.collector.aggregate.AverageFunction"/>
    <function name="min" class="org.apache.solr.search.fieldcollapse.collector.aggregate.MinFunction"/>
    <function name="max" class="org.apache.solr.search.fieldcollapse.collector.aggregate.MaxFunction"/>
  </collapseCollectorFactory>
</searchComponent>
{code}
So far it does what I expect from it without the additional factories mentioned, but it still bothers me that it fails when they're listed. Maybe I placed the libraries in the wrong place? Field collapsing Key: SOLR-236 URL: https://issues.apache.org/jira/browse/SOLR-236 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Emmanuel Keller Assignee: Shalin Shekhar Mangar Fix For: Next Attachments: DocSetScoreCollector.java, NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, SOLR-236-1_4_1-NPEfix.patch, SOLR-236-1_4_1-paging-totals-working.patch, SOLR-236-1_4_1.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-branch_3x.patch, SOLR-236-distinctFacet.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch, collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, solr-236.patch This patch includes a new feature called Field collapsing, used in order to collapse a group of results with similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site are collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection. http://www.fastsearch.com/glossary.aspx?m=48amid=299 The implementation adds 3 new query parameters (SolrParams):
- collapse.field to choose the field used to group results
- collapse.type normal (default value) or adjacent
- collapse.max to select how many continuous results are allowed before collapsing
TODO (in progress):
- More documentation (on source code)
- Test cases
Two patches:
- field_collapsing.patch for current development version
- field_collapsing_1.1.0.patch for Solr-1.1.0
P.S.: Feedback and misspelling correction are welcome ;-) -- This message is automatically generated by JIRA.
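A hypothetical request exercising those parameters (host, core and field name are illustrative only, not taken from the patch):
{noformat}
http://localhost:8983/solr/select?q=lucene&collapse.field=site&collapse.type=adjacent&collapse.max=2
{noformat}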
[HUDSON] Lucene-Solr-tests-only-trunk - Build # 6565 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/6565/ 3 tests failed. REGRESSION: org.apache.lucene.index.TestNRTThreads.testNRTThreads Error Message: null Stack Trace: junit.framework.AssertionFailedError at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1221) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1149) at org.apache.lucene.index.FieldInfos.putInternal(FieldInfos.java:280) at org.apache.lucene.index.FieldInfos.clone(FieldInfos.java:302) at org.apache.lucene.index.SegmentInfo.clone(SegmentInfo.java:345) at org.apache.lucene.index.SegmentInfos.clone(SegmentInfos.java:374) at org.apache.lucene.index.DirectoryReader.init(DirectoryReader.java:165) at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:360) at org.apache.lucene.index.IndexReader.open(IndexReader.java:316) at org.apache.lucene.index.TestNRTThreads.testNRTThreads(TestNRTThreads.java:244) REGRESSION: org.apache.lucene.index.TestSegmentTermDocs.test Error Message: Some threads threw uncaught exceptions! Stack Trace: junit.framework.AssertionFailedError: Some threads threw uncaught exceptions! at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1221) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1149) at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:521) at org.apache.lucene.index.TestSegmentTermDocs.tearDown(TestSegmentTermDocs.java:45) REGRESSION: org.apache.lucene.index.codecs.preflex.TestSurrogates.testSurrogatesOrder Error Message: Some threads threw uncaught exceptions! Stack Trace: junit.framework.AssertionFailedError: Some threads threw uncaught exceptions! at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1221) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1149) at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:521) Build Log (for compile errors): [...truncated 3276 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2959) [GSoC] Implementing State of the Art Ranking for Lucene
[ https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013907#comment-13013907 ] David Mark Nemeskey commented on LUCENE-2959: - Robert: thanks for all the info! It's nice to see so much work has already been done. I plan to delve into it after the selection, and try to get other things out of the way until then, so that I can concentrate on GSoC during the summer. I think the main point would be to make the addition of a new ranking function as easy as possible. At least a prototype implementation should be very straightforward, even at the expense of performance. Then, if the new method provides good results, the developer can go on to the lower level to squeeze more juice out of it. It's hard for me to discuss this without knowing the code, of course, but do you think it is possible? Even though I added a Performance section to my proposal (http://www.google-melange.com/gsoc/proposal/review/google/gsoc2011/davidnemeskey/1), I see now that it's probably more important than I believed it to be at first. I think I will follow your advice and concentrate on how to make BM25F fast. It may be a tougher nut to crack than DFR, as the latter has logarithms scattered all over it. However, the first thing that comes to mind is that the tf-BM25 curve becomes almost flat very quickly (less so for a high k1 value, though). So it may be possible to pre-compute a tf map or array for a query. [GSoC] Implementing State of the Art Ranking for Lucene --- Key: LUCENE-2959 URL: https://issues.apache.org/jira/browse/LUCENE-2959 Project: Lucene - Java Issue Type: New Feature Components: Examples, Javadocs, Query/Scoring Reporter: David Mark Nemeskey Labels: gsoc2011, lucene-gsoc-11, mentor Attachments: LUCENE-2959_mockdfr.patch, implementation_plan.pdf, proposal.pdf Lucene employs the Vector Space Model (VSM) to rank documents, which compares unfavorably to state of the art algorithms, such as BM25. Moreover, the architecture is tailored specifically to VSM, which makes the addition of new ranking functions a non-trivial task. This project aims to bring state of the art ranking methods to Lucene and to implement a query architecture with pluggable ranking functions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
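To make the pre-computation idea concrete, here is a rough sketch (my own illustration, not part of the proposal): the saturating factor tf*(k1+1)/(tf+K) changes little once tf is large, so it can be tabulated per length-normalization bucket and the last entry reused for bigger tf values.
{code}
public class Bm25TfCache {
    private final float[] cache;

    // One table per (docLen/avgDocLen) bucket; K folds the length
    // normalization into the denominator: K = k1 * (1 - b + b * dl/avgdl).
    public Bm25TfCache(float k1, float b, float docLen, float avgDocLen, int maxTf) {
        float K = k1 * (1 - b + b * docLen / avgDocLen);
        cache = new float[maxTf + 1];
        for (int tf = 0; tf <= maxTf; tf++) {
            cache[tf] = tf * (k1 + 1) / (tf + K);
        }
    }

    public float tfFactor(int tf) {
        // The curve is nearly flat past the cached range, so the last
        // entry is a close approximation for larger tf values.
        return tf < cache.length ? cache[tf] : cache[cache.length - 1];
    }
}
{code}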
[jira] [Commented] (SOLR-2378) FST-based Lookup (suggestions) for prefix matches.
[ https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013932#comment-13013932 ] Dawid Weiss commented on SOLR-2378: --- I didn't have time to take care of this until now, apologies. So, looking at Lookup#lookup(), I just wanted to clarify:
{code}
/**
 * Look up a key and return possible completion for this key.
 * @param key lookup key. Depending on the implementation this may be
 * a prefix, misspelling, or even infix.
 * @param onlyMorePopular return only more popular results
 * @param num maximum number of results to return
 * @return a list of possible completions, with their relative weight (e.g. popularity)
 */
public abstract List<LookupResult> lookup(String key, boolean onlyMorePopular, int num);
{code}
the onlyMorePopular means more popular than... what? I see TSTLookup and JaspellLookup (Andrzej, will you confirm, please?) sort matches in a priority queue by their associated value (frequency I guess). This makes sense, but onlyMorePopular is misleading -- it should be called onlyMostPopular (those with the native knowledge of English subtleties, speak up if I'm right here). I also see and wanted to confirm -- the Dictionary can come from various sources, so we can't rely on the presence of the built-in Lucene automaton, can we? Even if I wanted to reuse it, there'd be no easy way to determine if it's a full automaton, or a partial one (because of the gaps/trimming)... I think I'll just implement the solution by building the automaton from whatever Dictionary comes in and serializing/deserializing it similar to TSTLookup. Sounds ok? FST-based Lookup (suggestions) for prefix matches. -- Key: SOLR-2378 URL: https://issues.apache.org/jira/browse/SOLR-2378 Project: Solr Issue Type: New Feature Components: spellchecker Reporter: Dawid Weiss Assignee: Dawid Weiss Labels: lookup, prefix Fix For: 4.0 Implement a subclass of Lookup based on finite state automata/ transducers (Lucene FST package). This issue is for implementing a relatively basic prefix matcher, we will handle infixes and other types of input matches gradually. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
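For illustration, a self-contained sketch of the contract above (my stand-in, not the committed code): a TreeMap plays the role of the automaton so the example runs on its own; the real FST implementation replaces the subMap() walk with an arc traversal.
{code}
import java.util.*;

public class PrefixLookupSketch {
    public static class LookupResult {
        public final String key;
        public final float value;
        public LookupResult(String key, float value) { this.key = key; this.value = value; }
    }

    private final TreeMap<String, Float> dict = new TreeMap<String, Float>();

    public void add(String key, float weight) { dict.put(key, weight); }

    public List<LookupResult> lookup(String key, boolean onlyMorePopular, int num) {
        // All entries sharing the prefix; Character.MAX_VALUE bounds the range.
        SortedMap<String, Float> prefixed = dict.subMap(key, key + Character.MAX_VALUE);
        List<LookupResult> results = new ArrayList<LookupResult>();
        for (Map.Entry<String, Float> e : prefixed.entrySet()) {
            results.add(new LookupResult(e.getKey(), e.getValue()));
        }
        if (onlyMorePopular) {
            // Surface the highest-weighted completions first.
            Collections.sort(results, new Comparator<LookupResult>() {
                public int compare(LookupResult a, LookupResult b) {
                    return Float.compare(b.value, a.value);
                }
            });
        }
        return results.size() > num ? results.subList(0, num) : results;
    }
}
{code}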
Automaton-based suggest lookup
https://issues.apache.org/jira/browse/SOLR-2378 Andrzej, Mike, would you peek at my latest comment and confirm whether I got the API requirements right? I'll implement the FSA-based suggester based on the trunk code layout for now and we can move it around later if needed. Dawid - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2378) FST-based Lookup (suggestions) for prefix matches.
[ https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013933#comment-13013933 ] Andrzej Bialecki commented on SOLR-2378: - bq. I see TSTLookup and JaspellLookup (Andrzej, will you confirm, please?) sorts matches in a priority queue by their associated value (frequency I guess) Correct. I agree that the name is so-so, I inherited it from the spellchecker API - feel free to change it. bq. the Dictionary can come from various sources, ... Yes. This is again a legacy of the Lucene SpellChecker API. I tried to extend this API in Solr without changing it in Lucene (see the *IteratorWrapper classes and TermFreqIterator) but ultimately it would be better to refactor this. FST-based Lookup (suggestions) for prefix matches. -- Key: SOLR-2378 URL: https://issues.apache.org/jira/browse/SOLR-2378 Project: Solr Issue Type: New Feature Components: spellchecker Reporter: Dawid Weiss Assignee: Dawid Weiss Labels: lookup, prefix Fix For: 4.0 Implement a subclass of Lookup based on finite state automata/ transducers (Lucene FST package). This issue is for implementing a relatively basic prefix matcher, we will handle infixes and other types of input matches gradually. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: HTTP ERROR: 400
Deepak: 1) Please put questions like this on the users list; this list is for development of Lucene and Solr. 2) Please provide context. Best, Erick On Thu, Mar 31, 2011 at 3:21 AM, Deepak Singh deep...@praumtech.com wrote: Getting error message while indexing file: HTTP ERROR: 400 ERROR:unknown field 'trapped' - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2959) [GSoC] Implementing State of the Art Ranking for Lucene
[ https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013944#comment-13013944 ] Robert Muir commented on LUCENE-2959: -
{quote}
I think the main point would be to make the addition of a new ranking function as easy as possible. At least a prototype implementation should be very straightforward, even at the expense of performance. Then, if the new method provides good results, the developer can go on to the lower level to squeeze more juice out of it. It's hard for me to discuss this without knowing the code, of course, but do you think it is possible?
{quote}
This sounds great! For example, you could extend the low-level API, gather every possible statistic that Lucene has, and present a high-level API that looks more like Terrier's scoring API (which I'm guessing is what researchers would prefer?), where they basically implement the scoring in one method with all the stats there. So someone would extend this API to do prototyping; it would make it easier to experiment.
{quote}
I think I will follow your advice and concentrate on how to make BM25F fast.
{quote}
Actually, as far as BM25f goes, this one presents a few challenges (some already discussed on LUCENE-2091). To summarize:
* For any field, Lucene has a per-field terms dictionary that contains that term's docFreq. Computing BM25f's IDF would be challenging, because it wants a docFreq across all the fields. (It's not clear to me at a glance from the original paper whether this should be across only the fields in the query or across all the fields in the document, and whether a static schema is implied in this scoring system; in Lucene, document 1 can have 3 fields and document 2 can have 40 different ones, even with different properties.)
* The same issue applies to length normalization: Lucene has a field length but really no concept of document length.
So I just wanted to mention that while it's possible here to apply a per-field TF boost before the non-linear TF saturation, it's not immediately clear how to adjust the BM25f formula to Lucene: how to combine these scores without using a (wasteful) catch-all field and some lying behind the scenes to force this catch-all field's length normalization and docFreq to be used. Too many questions arise for BM25f and how it would fit with Lucene, for example the fact that multiple fields can really mean anything, and having a field in Lucene doesn't mean at all that it was in your original document! For example, Solr users frequently use a copyField to take the content of one field and duplicate it to a different field (and perhaps apply some processing). In terms of things like length normalization, it seems that document length calculated as the sum across the fields would be wrong for many use cases. I only wanted to recommend against this one because of this rather serious challenge; it seems it's something we might want to table at the moment: Lucene is changing fast and as new capabilities arise, we might realize there is a more elegant way to address this... but at the moment I think I would recommend starting with BM25.
[GSoC] Implementing State of the Art Ranking for Lucene --- Key: LUCENE-2959 URL: https://issues.apache.org/jira/browse/LUCENE-2959 Project: Lucene - Java Issue Type: New Feature Components: Examples, Javadocs, Query/Scoring Reporter: David Mark Nemeskey Labels: gsoc2011, lucene-gsoc-11, mentor Attachments: LUCENE-2959_mockdfr.patch, implementation_plan.pdf, proposal.pdf Lucene employs the Vector Space Model (VSM) to rank documents, which compares unfavorably to state of the art algorithms, such as BM25. Moreover, the architecture is tailored specifically to VSM, which makes the addition of new ranking functions a non-trivial task. This project aims to bring state of the art ranking methods to Lucene and to implement a query architecture with pluggable ranking functions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
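For reference, the "simple BM25F" formulation under discussion (standard notation as in Robertson and Zaragoza's papers, not taken from any Lucene patch): per-field, length-normalized term frequencies are combined with field weights w_f before the shared saturation, which is exactly why it needs the cross-field docFreq and length statistics described above:
{noformat}
\tilde{tf}_{t,d} = \sum_f w_f \cdot \frac{tf_{t,f,d}}{1 - b_f + b_f \, l_{f,d} / \mathrm{avgl}_f}

score(d, q) = \sum_{t \in q} \frac{\tilde{tf}_{t,d}}{k_1 + \tilde{tf}_{t,d}} \cdot idf(t)
{noformat}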
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013971#comment-13013971 ] Upayavira commented on SOLR-2399: - https://github.com/upayavira/solr-admin/commit/a96d7b8bc63cb5ae6125c0a2c91302f553782ef2 * added current time (note, it is the time the overall page was loaded, not now). * fixed cwd. I've added 'ms' to ping and made it stand out more. * fixed threaddump to make it work multicore * moved java properties to global level * removed replication link - info is already on dashboard IMO this is now ready for testing - folks, please try it on your browsers (I've seen it work on Firefox/Chrome on Linux and Firefox on Mac). Anyone able to try it on IE? Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Priority: Minor Fix For: 4.0 *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin [This commit shows the differences|https://github.com/steffkes/solr-admin/commit/5f80bb0ea9deb4b94162632912fe63386f869e0d] between old/existing index.jsp and my new one (which is could copy-cut/paste'd from the existing one). Main Action takes place in [js/script.js|https://github.com/steffkes/solr-admin/blob/master/js/script.js] which is actually neither clean nor pretty .. just work-in-progress. Actually it's Work in Progress, so ... give it a try. It's developed with Firefox as Browser, so, for a first impression .. please don't use _things_ like Internet Explorer or so ;o Jan already suggested a bunch of good things, i'm sure there are more ideas over there :) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Brainstorming on Improving the Release Process
On Wed, 30 Mar 2011 12:00 -0400, Grant Ingersoll gsing...@apache.org wrote: On Mar 30, 2011, at 9:19 AM, Robert Muir wrote: On Wed, Mar 30, 2011 at 8:22 AM, Grant Ingersoll gsing...@apache.org wrote: (Long post, please bear with me and please read!) Now that we have the release done (I'm working through the publication process now), I want to start the process of thinking about how we can improve the release process. As I see it, building the artifacts and checking the legal items are now almost completely automated and testable at earlier stages in the game.
Thanks for writing this up. Here is my major beef with 2 concrete suggestions: It seems the current process is that we all develop and develop and at some point we agree we want to try to release. At this point it's the RM's job to polish a turd, and no serious community participation takes place until an RC is actually produced: so it's a chicken-and-egg thing, perhaps with the RM even declaring publicly 'I don't expect this to actually pass, I'm just building this to make you guys look at it'. I think it's probably hard/impossible to force people to review this stuff before an RC; for some reason a VOTE seems to be the only thing for people to take it seriously. But what we can do is ask ourselves, how did the codebase become a turd in the first place? Because at one point we released off the code and the packaging was correct, there weren't javadocs warnings, and there weren't licensing issues, etc. So I think an important step would be to try to make more of this continuous; in other words, we did all the work to fix up the codebase to make it releasable, let's implement things to enforce it stays this way. It seems we did this for some things (e.g. code correctness with the unit tests and licensing with the license checker) but there is more to do.
A. implement the hudson-patch capability to vote -1 on patches that break things as soon as they go on the JIRA issues. This is really early feedback and I think will go a long way.
+1. I asked on builds@a.o if there was any standard way of doing this, or if there is a place someone can point me at to get this going.
B. increase the scope of our 'ant test'/hudson runs to check more things. For example, it would be nice if they failed on javadocs warnings. It's insane if you think about it: we go to a ton of effort to implement really cruel and picky unit tests to verify the correctness of our code, but you can almost break the packaging and documentation completely and the build still passes.
+1 on failing on javadocs. Also, what about code coverage? We run all this Clover stuff, but how do we incorporate that into our dev. cycle?
Anyway, we spend a lot of time on trying to make our code correct, but our build is a bit messy. I know if we look at the time we spend on search performance and correctness, and applied even 1% of this effort to our build system to make it fast, picky, and cleaner, we would be in much better shape as a development team, with a faster compile/test/debug cycle to boot... I think there is a lot of low-hanging fruit here, and I think this thread has encouraged me to revisit the build and try to straighten some of this out.
Yeah, our build is a bit messy, lots of recursion. I'm still not totally happy w/ how license checking is hooked in.
Are you willing to say more? I have a little time, and have done a lot of work with Ant. Maybe I could help.
Upayavira --- Enterprise Search Consultant at Sourcesense UK, Making Sense of Open Source - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013974#comment-13013974 ] Simon Willnauer commented on LUCENE-2573: - I ran a couple of benchmarks with interesting results. The graph below shows documents per second for the RT branch with DWPT, yielding very good IO/CPU utilization; overall throughput is much better than trunk's. !http://people.apache.org/~simonw/DocumentsWriterPerThread_dps.png! Yet, when we look at trunk, the peak performance is much better on trunk than on DWPT. The reason for that, I think, is that we flush concurrently, which takes at most one thread out of the loop; those are the little drops in docs/sec. This does not yet explain the constantly lower max indexing rate; I suspect that this is at least influenced by the fact that flushing is very, very CPU intensive. At the same time CMS might kick in way more often since we are writing more segments which are also smaller compared to trunk. Eventually, I need to run a profiler and see what is going on. !http://people.apache.org/~simonw/Trunk_dps.png! Interesting is that besides the nice CPU utilization we also have nearly perfect IO utilization. The graph below shows that we are consistently using IO to flush segments; the width of the bars shows the time it took to flush a single DWPT, and there is almost no overlap. !http://people.apache.org/~simonw/DocumentsWriterPerThread_flush.png! Overall those are super results! Good job everybody! simon Tiered flushing of DWPTs by RAM with low/high water marks - Key: LUCENE-2573 URL: https://issues.apache.org/jira/browse/LUCENE-2573 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Simon Willnauer Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch Now that we have DocumentsWriterPerThreads we need to track total consumed RAM across all DWPTs. A flushing strategy idea that was discussed in LUCENE-2324 was to use a tiered approach:
- Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
- Flush all DWPTs at a high water mark (e.g. at 110%)
- Use linear steps in between high and low watermark: E.g. when 5 DWPTs are used, flush at 90%, 95%, 100%, 105% and 110%.
Should we allow the user to configure the low and high water mark values explicitly using total values (e.g. low water mark at 120MB, high water mark at 140MB)? Or shall we keep for simplicity the single setRAMBufferSizeMB() config method and use something like 90% and 110% for the water marks? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
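The linear-steps policy from the issue description is easy to make concrete (arithmetic sketch only, with hypothetical names; the actual policy lives in the flush control code on the branch):
{code}
public class FlushTiers {
    // Tier for DWPT i of n, given a RAM budget and low/high water marks
    // expressed as fractions of the budget (e.g. 0.90 and 1.10).
    static double flushTierMB(int i, int n, double ramBudgetMB,
                              double lowMark, double highMark) {
        double step = (n > 1) ? (highMark - lowMark) / (n - 1) : 0.0;
        return ramBudgetMB * (lowMark + i * step);
    }

    public static void main(String[] args) {
        // With 5 DWPTs and a 100 MB budget: 90, 95, 100, 105, 110 MB,
        // matching the example in the issue description.
        for (int i = 0; i < 5; i++) {
            System.out.println(flushTierMB(i, 5, 100.0, 0.90, 1.10));
        }
    }
}
{code}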
Re: Brainstorming on Improving the Release Process
On Thu, Mar 31, 2011 at 9:40 AM, Upayavira u...@odoko.co.uk wrote: Are you willing to say more? I have a little time, and have done a lot of work with Ant. Maybe I could help. Upayavira Thanks, there is some followup discussion on this JIRA issue: https://issues.apache.org/jira/browse/SOLR-2002 The prototype patch I refer to in the comments, where the Solr build system is changed to extend Lucene's, is the latest _merged.patch on the issue: https://issues.apache.org/jira/secure/attachment/12456811/SOLR-2002_merged.patch (Additionally, as sort of a followup, there are more comments/ideas about additional things we could do besides just refactoring the build system to be faster and simpler.) As a first step I think the patch needs to be brought up to trunk (it gets out of date fast). I mentioned on the issue we can simply create a branch to make coordination easier. A branch might seem silly for a thing like this, but it would at least allow us to work together, and people could contribute parts (e.g. PMD integration or something) without having to juggle huge out-of-sync patches. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Handling wildcard search containing special characters (unicode)
Hello, Facing a Solr issue, I have been told that queries with a term like: Kiinteistösih* will not match the Finnish word Kiinteistösihteeri and that it's a known limitation of Lucene. Instead, using the word directly, without wildcard, works. Can you confirm this is a known limitation/bug? If so, do you have any registered issue about it? Searching the ML archive and the issue tracker in both SOLR and LUCENE projects didn't provide me a pointer to this problem. One of the references I found on the web talking about this problem is: http://forum.compass-project.org/message.jspa?messageID=227709 But again, no pointer to a discussion or issue. Thanks in advance for your help, Patrick - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Handling wildcard search containing special characters (unicode)
On Thu, Mar 31, 2011 at 9:51 AM, Patrick ALLAERT patrick.alla...@gmail.com wrote: Hello, Facing a Solr issue, I have been told that queries with a term like: Kiinteistösih* will not match the Finnish word Kiinteistösihteeri and that it's a known limitation of Lucene. Instead, using the word directly, without wildcard, works. Can you confirm this is a known limitation/bug? If so, do you have any registered issue about it? This isn't the case; there's no unicode limitation here. More likely, your analyzer is configured to lowercase text, so in the index Kiinteistösihteeri is really kiinteistösihteeri. In other words, try kiinteistösih* and see how that works. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
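A tiny standalone illustration of the point (mine, not from the thread): wildcard terms are not run through the analyzer, so the query prefix has to match the indexed, lowercased form of the term.
{code}
public class WildcardCaseDemo {
    public static void main(String[] args) {
        // What a lowercasing analyzer actually stored in the index:
        String indexedTerm = "kiinteistösihteeri";
        // Wildcard prefixes are matched against the raw indexed term:
        System.out.println(indexedTerm.startsWith("Kiinteistösih")); // false -> no hits
        System.out.println(indexedTerm.startsWith("kiinteistösih")); // true  -> matches
    }
}
{code}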
[jira] [Commented] (LUCENE-3003) Move UnInvertedField into Lucene core
[ https://issues.apache.org/jira/browse/LUCENE-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013986#comment-13013986 ] Yonik Seeley commented on LUCENE-3003: -- Thanks Dawid, this suggests that we could round up to the 8 byte boundary for free. Move UnInvertedField into Lucene core - Key: LUCENE-3003 URL: https://issues.apache.org/jira/browse/LUCENE-3003 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3003.patch, LUCENE-3003.patch Solr's UnInvertedField lets you quickly look up all term ords for a given doc/field. Like FieldCache, it inverts the index to produce this, and creates a RAM-resident data structure holding the bits; but, unlike FieldCache, it can handle multiple values per doc, and it does not hold the term bytes in RAM. Rather, it holds only term ords, and then uses TermsEnum to resolve ord -> term. This is great eg for faceting, where you want to use int ords for all of your counting, and then only at the end you need to resolve the top N ords to their text. I think this is a useful core functionality, and we should move most of it into Lucene's core. It's a good complement to FieldCache. For this first baby step, I just move it into core and refactor Solr's usage of it. After this, as separate issues, I think there are some things we could explore/improve:
* The first-pass that allocates lots of tiny byte[] looks like it could be inefficient. Maybe we could use the byte slices from the indexer for this...
* We can improve the RAM efficiency of the TermIndex: if the codec supports ords, and we are operating on one segment, we should just use it. If not, we can use a more RAM-efficient data structure, eg an FST mapping to the ord.
* We may be able to improve on the main byte[] representation by using packed ints instead of delta-vInt?
* Eventually we should fold this ability into docvalues, ie we'd write the byte[] image at indexing time, and then loading would be fast, instead of uninverting
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
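Since Yonik's point is easy to miss, a small illustration (my own sketch, not from any patch): because the JVM pads object sizes to 8-byte boundaries, an allocator can round a requested byte[] length up to the boundary without increasing the actual footprint.
{code}
public class RoundedAlloc {
    // Round the requested length up to the next multiple of 8; per the
    // measurements above, byte[1] and byte[8] both occupy 32 bytes on a
    // 64-bit hotspot, so the extra capacity is free.
    static byte[] allocateRounded(int requestedLength) {
        int rounded = (requestedLength + 7) & ~7;
        return new byte[rounded];
    }

    public static void main(String[] args) {
        System.out.println(allocateRounded(3).length); // 8
        System.out.println(allocateRounded(9).length); // 16
    }
}
{code}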
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #75: POMs out of sync
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-Maven-trunk/75/ 1 tests failed. FAILED: org.apache.solr.schema.TestICUCollationField.org.apache.solr.schema.TestICUCollationField Error Message: Cannot find resource: solr-analysis-extras/conf/solrconfig-icucollate.xml Stack Trace: java.lang.RuntimeException: Cannot find resource: solr-analysis-extras/conf/solrconfig-icucollate.xml at org.apache.solr.SolrTestCaseJ4.getFile(SolrTestCaseJ4.java:1056) at org.apache.solr.schema.TestICUCollationField.setupSolrHome(TestICUCollationField.java:77) at org.apache.solr.schema.TestICUCollationField.beforeClass(TestICUCollationField.java:41) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) at org.junit.runners.ParentRunner.run(ParentRunner.java:236) at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:35) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:146) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:97) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(ProviderFactory.java:103) at $Proxy0.invoke(Unknown Source) at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:145) at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(SurefireStarter.java:87) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:69) Build Log (for compile errors): [...truncated 18422 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3006) Javadocs warnings should fail the build
[ https://issues.apache.org/jira/browse/LUCENE-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014011#comment-13014011 ] Steven Rowe commented on LUCENE-3006: - bq. This patch eliminates javadoc warnings on trunk under Sun JDK 1.5.0_22 and 1.6.0_21 for Lucene, and for just 1.6.0_21 on Solr. Committed:
- r1087319: trunk
- r1087329: branch_3x
On branch_3x, under both Sun JDK 1.5.0_22 and 1.6.0_21, there are no javadoc warnings for either Solr or Lucene. Javadocs warnings should fail the build --- Key: LUCENE-3006 URL: https://issues.apache.org/jira/browse/LUCENE-3006 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.2, 4.0 Reporter: Grant Ingersoll Attachments: LUCENE-3006-javadoc-warning-cleanup.patch, LUCENE-3006.patch, LUCENE-3006.patch We should fail the build when there are javadocs warnings, as this should not be the Release Manager's job to fix all at once right before the release. See http://www.lucidimagination.com/search/document/14bd01e519f39aff/brainstorming_on_improving_the_release_process -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: Using contrib Lucene Benchmark with Solr
Thanks Robert and Grant, Does this need a separate JIRA issue dealing specifically with the ability of benchmark to read Solr config settings, or is it subsumed in LUCENE-2845? Or should I just add a comment to LUCENE-2845? Tom -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Wednesday, March 30, 2011 7:56 PM To: dev@lucene.apache.org Subject: Re: Using contrib Lucene Benchmark with Solr On Wed, Mar 30, 2011 at 4:49 PM, Burton-West, Tom tburt...@umich.edu wrote: I would like to be able to use the Lucene Benchmark code with Solr to run some indexing tests. It would be nice if Lucene Benchmark could read Solr configuration rather than having to translate my filter chain and other parameters into Lucene. Would it be appropriate to open a JIRA issue for this or is this something that doesn't really make any sense? I think it makes great sense, we moved the benchmarking facility to a toplevel module so we can do this: https://issues.apache.org/jira/browse/LUCENE-2845, but we didn't actually add any integration yet. I've been in this exact same situation too when trying to use the benchmark package, and I'd sure like to see better Solr integration with the benchmarking package myself. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014016#comment-13014016 ] Jason Rutherglen commented on LUCENE-2573: -- bq. influenced by the fact that flushing is very, very CPU intensive Do you think this is due mostly to the vInt decoding? We're not interleaving postings on flush with this patch so the CPU consumption should be somewhat lower. bq. At the same time CMS might kick in way more often since we are writing more segments which are also smaller compared to trunk This is probably the more likely case. In general, we may be able to default to a higher overall RAM buffer size, and perhaps there won't be degradation in indexing performance like there is with trunk? In the future with RT we could get fancy and selectively merge segments as we're flushing, if writing larger segments is important. I'd personally prefer to write out 1-2 GB segments, and limit the number of DWPTs to 2-3, mainly for servers that are concurrently indexing and searching (eg, the RT use case). I think the current default number of thread states is a bit high. Tiered flushing of DWPTs by RAM with low/high water marks - Key: LUCENE-2573 URL: https://issues.apache.org/jira/browse/LUCENE-2573 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Simon Willnauer Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch Now that we have DocumentsWriterPerThreads we need to track total consumed RAM across all DWPTs. A flushing strategy idea that was discussed in LUCENE-2324 was to use a tiered approach:
- Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
- Flush all DWPTs at a high water mark (e.g. at 110%)
- Use linear steps in between high and low watermark: E.g. when 5 DWPTs are used, flush at 90%, 95%, 100%, 105% and 110%.
Should we allow the user to configure the low and high water mark values explicitly using total values (e.g. low water mark at 120MB, high water mark at 140MB)? Or shall we keep for simplicity the single setRAMBufferSizeMB() config method and use something like 90% and 110% for the water marks? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
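For context on the first question, this is the general shape of Lucene's vInt decoding (7 payload bits per byte, high bit as continuation flag; a standalone sketch, not the IndexInput source): each byte costs a load and a branch, which is part of why re-reading the buffered postings at flush time is CPU-heavy.
{code}
public class VIntDemo {
    // Decode one vInt starting at pos[0]; advances pos[0] past it.
    static int readVInt(byte[] buf, int[] pos) {
        byte b = buf[pos[0]++];
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos[0]++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    public static void main(String[] args) {
        // 300 encoded as two bytes: 0xAC (low 7 bits + continuation), 0x02.
        byte[] buf = { (byte) 0xAC, 0x02 };
        System.out.println(readVInt(buf, new int[] { 0 })); // 300
    }
}
{code}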
Re: [HUDSON] Lucene-Solr-tests-only-trunk - Build # 6565 - Failure
This one is weird; seems like there is a synchronized missing on FieldInfoBiMap#containsConsistent. I'll try to reproduce first. simon On Thu, Mar 31, 2011 at 11:37 AM, Apache Hudson Server hud...@hudson.apache.org wrote: Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/6565/ 3 tests failed. REGRESSION: org.apache.lucene.index.TestNRTThreads.testNRTThreads Error Message: null Stack Trace: junit.framework.AssertionFailedError at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1221) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1149) at org.apache.lucene.index.FieldInfos.putInternal(FieldInfos.java:280) at org.apache.lucene.index.FieldInfos.clone(FieldInfos.java:302) at org.apache.lucene.index.SegmentInfo.clone(SegmentInfo.java:345) at org.apache.lucene.index.SegmentInfos.clone(SegmentInfos.java:374) at org.apache.lucene.index.DirectoryReader.init(DirectoryReader.java:165) at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:360) at org.apache.lucene.index.IndexReader.open(IndexReader.java:316) at org.apache.lucene.index.TestNRTThreads.testNRTThreads(TestNRTThreads.java:244) REGRESSION: org.apache.lucene.index.TestSegmentTermDocs.test Error Message: Some threads threw uncaught exceptions! Stack Trace: junit.framework.AssertionFailedError: Some threads threw uncaught exceptions! at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1221) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1149) at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:521) at org.apache.lucene.index.TestSegmentTermDocs.tearDown(TestSegmentTermDocs.java:45) REGRESSION: org.apache.lucene.index.codecs.preflex.TestSurrogates.testSurrogatesOrder Error Message: Some threads threw uncaught exceptions! Stack Trace: junit.framework.AssertionFailedError: Some threads threw uncaught exceptions! at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1221) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1149) at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:521) Build Log (for compile errors): [...truncated 3276 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Brainstorming on Improving the Release Process
Other things to add:
1. Managing our website is a big pain in the butt. Why do we need to publish PDFs again? We really need to get on the new CMS.
2. Copying/moving the artifacts to the release area could be automated, too.
At the end of the day, #1 below is what strikes me as the biggest impediment to releases.
-Original Message- From: ext Grant Ingersoll [mailto:gsing...@apache.org] Sent: Wednesday, March 30, 2011 8:22 AM To: dev@lucene.apache.org Subject: Brainstorming on Improving the Release Process
(Long post, please bear with me and please read!) Now that we have the release done (I'm working through the publication process now), I want to start the process of thinking about how we can improve the release process. As I see it, building the artifacts and checking the legal items are now almost completely automated and testable at earlier stages in the game. We have kept saying we want to release more often, but we have never defined actionable steps with which we can get there. Goals without actionable steps are useless. So, with that in mind, I'd like to brainstorm on how we can improve things a bit more. Several of us acted as RM this time around, so I think we have some common, shared knowledge to take advantage of this time, as opposed to in the past where one person mostly just did the release in the background and then we all voted. So, let's start with what we have right:
1. The Ant process for building a release candidate for both Lucene and Solr is almost identical now and fairly straightforward.
2. I think the feature freeze is a good thing, although it is a bit too long perhaps.
3. Pretty good documentation on the steps involved to branch, etc.
4. The new license validation stuff is a start for enforcing licensing up front more effectively. What else can we validate up front in terms of packaging?
5. We have an awesome test infrastructure now. I think it is safe to say that this version of Lucene is easily the most tested version we have ever shipped.
Things I see that can be improved, and these are only suggestions:
1. We need to define the Minimum Effective Dose (MED - http://gizmodo.com/#!5709902/4+hour-body-the-principle-of-the-minimum-effective-dose) for producing a quality release. Nothing more, nothing less. I think one of our biggest problems is we don't know when we are done. It's this loosey-goosey "we all agree" notion, but that's silly. It's software; we should be able to test almost all of the artifacts for certain attributes and then release when they pass. If we get something wrong, put in a test for it in the next release. The old saying about perfect being the enemy of great applies here. In other words, we don't have well defined things that we all are looking for when vetting a release candidate, other than what the ASF requires. Look at the last few vote threads, or any of the previous threads. It's obvious that we have a large variety of people doing a large variety of things when it comes to testing the candidates. For instance, I do the following:
a. check sigs., md5 hashes, etc.
b. run the demos,
c. run the Solr example and index some content,
d. check over the LICENSE, NOTICE, CHANGES files
e. Check the overall packaging, etc. is reasonable
f. I run them through my training code
Others clearly do many other things. Many of you have your own benchmark tests you run, others read over every last bit of documentation, others still put the RC into their own application and test it.
All of this is good, but the problem is it is not _shared_ until the actual RC is up, and it is not repeatable (not that all of it can be). If you have benchmark code/tests that you run on an RC that doesn't involve proprietary code, why isn't it donated to the project so that we can all use it? That way we don't have to wait until your -1 at the 11th hour to realize the RC is not good. I personally don't care whether it's python or perl or whatever. Something that works is better than nothing. For instance, right now some of the committers have an Apache Extras project going for benchmarking. Can we get this running on ASF resources on a regular basis? If it's a computing resource issue, let's go to Infrastructure and ask for resources. Infrastructure has repeatedly said that if a project needs resources, it should put together a proposal of what it wants. I bet we could get budget to spin up an EC2 instance once a week, run those long running tests (Test2B and other benchmarks) and then report back. All of that can be automated. Also, please think hard about whether the things you test can be automated and built into our test suite, or at least run nightly or something on Jenkins, and then donate them. I know reading documentation can't, but what else? For instance, could we auto-generate the file
Re: Using contrib Lucene Benchmark with Solr
On Thu, Mar 31, 2011 at 11:24 AM, Burton-West, Tom tburt...@umich.edu wrote: Thanks Robert and Grant, Does this need a separate JIRA issue dealing specifically with the ability of benchmark to read Solr config settings, or is it subsumed in LUCENE-2845? Or should I just add a comment to LUCENE-2845? I think full integration with Solr might be a lot of work? So I would start with opening an issue to address your particular itch (e.g. benchmarking an Analyzer that's instantiated from a Solr schema).
Re: Brainstorming on Improving the Release Process
On Mar 31, 2011, at 11:51 AM, Marvin Humphrey wrote: On Thu, Mar 31, 2011 at 11:45:53AM -0400, Grant Ingersoll wrote: Why do we need to publish PDFs again? IIRC, publishing PDFs is the default in Forrest. It might have been a passive choice. Yeah, it is. I know. Just one more thing to worry about when it is broken. I think we need to simplify across a lot of our processes and get back to what I said earlier about the Minimum Effective Dose when it comes to builds, releases, etc.
lucene.apache.org download link lucene/solr?
On the front page, in the announcement: News 31 March 2011 - Lucene Core 3.1 and Solr 3.1 Available "The Lucene PMC is... after Solr 1.4.1. Lucene can be downloaded from:" The Lucene download link says /java but actually points to /solr. -cks
[jira] [Commented] (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014046#comment-13014046 ] Michael Busch commented on LUCENE-2573: --- Thanks, Simon, for running the benchmarks! Good results overall, even though it's puzzling why flushing would be CPU intensive. We should probably do some profiling to figure out where the time is spent. I can probably do that Sunday, but feel free to beat me to it :)

Tiered flushing of DWPTs by RAM with low/high water marks - Key: LUCENE-2573 URL: https://issues.apache.org/jira/browse/LUCENE-2573 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Simon Willnauer Priority: Minor Fix For: Realtime Branch Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch

Now that we have DocumentsWriterPerThreads we need to track total consumed RAM across all DWPTs. A flushing strategy idea that was discussed in LUCENE-2324 was to use a tiered approach:

- Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
- Flush all DWPTs at a high water mark (e.g. at 110%)
- Use linear steps in between the high and low water marks: e.g. when 5 DWPTs are used, flush at 90%, 95%, 100%, 105% and 110%.

Should we allow the user to configure the low and high water mark values explicitly using total values (e.g. low water mark at 120MB, high water mark at 140MB)? Or shall we keep for simplicity the single setRAMBufferSizeMB() config method and use something like 90% and 110% for the water marks?
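For readers following along, here is a minimal sketch of the tiered threshold scheme described in the issue, with linear steps between the low and high water marks. This is hypothetical illustration code, not the attached patch:

{code:java}
// Hypothetical sketch of the tiered scheme, not the LUCENE-2573 patch.
public class TieredFlushThresholds {

  /**
   * RAM fraction at which the i-th of n DWPTs is flushed, stepping
   * linearly from the low water mark to the high water mark. For
   * low=0.90, high=1.10 and n=5 this yields 0.90, 0.95, 1.00, 1.05, 1.10.
   */
  static double flushThreshold(int i, int n, double low, double high) {
    if (n == 1) {
      return low; // a single DWPT simply flushes at the low water mark
    }
    return low + i * (high - low) / (n - 1);
  }

  public static void main(String[] args) {
    for (int i = 0; i < 5; i++) {
      System.out.printf("%.2f%n", flushThreshold(i, 5, 0.90, 1.10));
    }
  }
}
{code}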
Re: lucene.apache.org download link lucene/solr?
On Thu, Mar 31, 2011 at 12:07 PM, Christopher St John ckstj...@gmail.com wrote: On the front page, in the announcement: News 31 March 2011 - Lucene Core 3.1 and Solr 3.1 Available The Lucene PMC is... after Solr 1.4.1. Lucene can be downloaded from: The Lucene download link says /java but actually points to /solr. thank you!
Re: Handling wildcard search containing special characters (unicode)
2011/3/31 Robert Muir rcm...@gmail.com: On Thu, Mar 31, 2011 at 9:51 AM, Patrick ALLAERT patrick.alla...@gmail.com wrote: Hello, Facing a Solr issue, I have been told that queries with a term like: Kiinteistösih* will not match the Finnish word Kiinteistösihteeri and that it's a known limitation of Lucene. Instead, using the word directly, without wildcard, works. Do you confirm this is a known limitation/bug? If so, do you have any registered issue about that? This isn't the case; there's no unicode limitation here. More likely, your analyzer is configured to lowercase text, so in the index Kiinteistösihteeri is really kiinteistösihteeri. In other words, try kiinteistösih* and see how that works. Following your suggestion, I tested with kiinteistösih* but it doesn't show me the intended result. I have found the reason why: this is because of the ISOLatin1AccentFilterFactory filter which is present for both the index and query analyzer. Searching with kiinteistosih* did the trick. One question remains now: why should I lowercase terms containing a wildcard and make the ISO Latin1 accent conversion myself while I do have: <analyzer type="query"> ... <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.ISOLatin1AccentFilterFactory"/> ... </analyzer> for the corresponding fieldType? I would have guessed it would do it for me. Your reply helped me a lot in understanding what's going on. Thank you very much for your participation! Patrick
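To make the behavior above concrete: multi-term queries such as wildcards are not run through the analyzer, so the raw prefix has to match what the index-time filters actually stored. A minimal sketch, assuming Lucene 3.1 APIs; the field name and the pre-folded term are illustrative:

{code:java}
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class WildcardFoldingDemo {
  public static void main(String[] args) throws Exception {
    RAMDirectory dir = new RAMDirectory();
    IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(
        Version.LUCENE_31, new WhitespaceAnalyzer(Version.LUCENE_31)));
    Document doc = new Document();
    // index the term as LowerCaseFilter + ISOLatin1AccentFilter would have stored it:
    doc.add(new Field("f", "kiinteistosihteeri", Field.Store.NO, Field.Index.ANALYZED));
    w.addDocument(doc);
    w.close();

    IndexSearcher s = new IndexSearcher(IndexReader.open(dir));
    // the raw prefix is not analyzed, so only the folded form matches:
    System.out.println(s.search(new PrefixQuery(new Term("f", "Kiinteistösih")), 1).totalHits); // 0
    System.out.println(s.search(new PrefixQuery(new Term("f", "kiinteistosih")), 1).totalHits); // 1
    s.close();
  }
}
{code}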
[jira] [Updated] (LUCENE-2981) Review and potentially remove unused/unsupported Contribs
[ https://issues.apache.org/jira/browse/LUCENE-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2981: Attachment: LUCENE-2981.patch patch file implementing grant's suggestions. Review and potentially remove unused/unsupported Contribs - Key: LUCENE-2981 URL: https://issues.apache.org/jira/browse/LUCENE-2981 Project: Lucene - Java Issue Type: Improvement Reporter: Grant Ingersoll Fix For: 3.2, 4.0 Attachments: LUCENE-2981.patch Some of our contribs appear to be lacking for development/support or are missing tests. We should review whether they are even pertinent these days and potentially deprecate and remove them. One of the things we did in Mahout when bringing in Colt code was to mark all code that didn't have tests as @deprecated and then we removed the deprecation once tests were added. Those that didn't get tests added over about a 6 mos. period of time were removed. I would suggest taking a hard look at: ant db lucli swing (spatial should be gutted to some extent and moved to modules) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2981) Review and potentially remove unused/unsupported Contribs
[ https://issues.apache.org/jira/browse/LUCENE-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014057#comment-13014057 ] Ryan McKinley commented on LUCENE-2981: --- +1 for 4.0 -0 for 3.2 Review and potentially remove unused/unsupported Contribs - Key: LUCENE-2981 URL: https://issues.apache.org/jira/browse/LUCENE-2981 Project: Lucene - Java Issue Type: Improvement Reporter: Grant Ingersoll Fix For: 3.2, 4.0 Attachments: LUCENE-2981.patch Some of our contribs appear to be lacking for development/support or are missing tests. We should review whether they are even pertinent these days and potentially deprecate and remove them. One of the things we did in Mahout when bringing in Colt code was to mark all code that didn't have tests as @deprecated and then we removed the deprecation once tests were added. Those that didn't get tests added over about a 6 mos. period of time were removed. I would suggest taking a hard look at: ant db lucli swing (spatial should be gutted to some extent and moved to modules) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2981) Review and potentially remove unused/unsupported Contribs
[ https://issues.apache.org/jira/browse/LUCENE-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014060#comment-13014060 ] Grant Ingersoll commented on LUCENE-2981: - +1 for 4.0 I'm fine w/ 3.2, too, FWIW. I can't remember the last time someone submitted a patch or even reported a bug on any of these or even asked about them on user@. Review and potentially remove unused/unsupported Contribs - Key: LUCENE-2981 URL: https://issues.apache.org/jira/browse/LUCENE-2981 Project: Lucene - Java Issue Type: Improvement Reporter: Grant Ingersoll Fix For: 3.2, 4.0 Attachments: LUCENE-2981.patch Some of our contribs appear to be lacking for development/support or are missing tests. We should review whether they are even pertinent these days and potentially deprecate and remove them. One of the things we did in Mahout when bringing in Colt code was to mark all code that didn't have tests as @deprecated and then we removed the deprecation once tests were added. Those that didn't get tests added over about a 6 mos. period of time were removed. I would suggest taking a hard look at: ant db lucli swing (spatial should be gutted to some extent and moved to modules) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3010) Add the ability for the Lucene Benchmarkcode to read Solr configuration information for testing Analyzer/Filter Chains
Add the ability for the Lucene Benchmarkcode to read Solr configuration information for testing Analyzer/Filter Chains --- Key: LUCENE-3010 URL: https://issues.apache.org/jira/browse/LUCENE-3010 Project: Lucene - Java Issue Type: Wish Components: contrib/benchmark Reporter: Tom Burton-West Priority: Trivial I would like to be able to use the Lucene Benchmark code in Lucene contrib with Solr to run some indexing tests. It would be nice if Lucene Benchmark could read my Solr configuration rather than having to translate my filter chain and other parameters into Lucene java code. This relates to Lucene 2845, -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3010) Add the ability for the Lucene Benchmark code to read Solr configuration information for testing Analyzer/Filter Chains
[ https://issues.apache.org/jira/browse/LUCENE-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom Burton-West updated LUCENE-3010: Summary: Add the ability for the Lucene Benchmark code to read Solr configuration information for testing Analyzer/Filter Chains (was: Add the ability for the Lucene Benchmarkcode to read Solr configuration information for testing Analyzer/Filter Chains) Add the ability for the Lucene Benchmark code to read Solr configuration information for testing Analyzer/Filter Chains Key: LUCENE-3010 URL: https://issues.apache.org/jira/browse/LUCENE-3010 Project: Lucene - Java Issue Type: Wish Components: contrib/benchmark Reporter: Tom Burton-West Priority: Trivial I would like to be able to use the Lucene Benchmark code in Lucene contrib with Solr to run some indexing tests. It would be nice if Lucene Benchmark could read my Solr configuration rather than having to translate my filter chain and other parameters into Lucene java code. This relates to Lucene 2845, -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2981) Review and potentially remove unused/unsupported Contribs
[ https://issues.apache.org/jira/browse/LUCENE-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014070#comment-13014070 ] Andi Vajda commented on LUCENE-2981: Unless there are users, I'm +1 for removing db anytime. The last time I fixed something there, it was for the Java version of db, a contribution by someone I haven't heard from in years. I haven't heard from any users with questions or bug reports in a long time either. Review and potentially remove unused/unsupported Contribs - Key: LUCENE-2981 URL: https://issues.apache.org/jira/browse/LUCENE-2981 Project: Lucene - Java Issue Type: Improvement Reporter: Grant Ingersoll Fix For: 3.2, 4.0 Attachments: LUCENE-2981.patch Some of our contribs appear to be lacking for development/support or are missing tests. We should review whether they are even pertinent these days and potentially deprecate and remove them. One of the things we did in Mahout when bringing in Colt code was to mark all code that didn't have tests as @deprecated and then we removed the deprecation once tests were added. Those that didn't get tests added over about a 6 mos. period of time were removed. I would suggest taking a hard look at: ant db lucli swing (spatial should be gutted to some extent and moved to modules)
[HUDSON] Lucene-Solr-tests-only-3.x - Build # 6567 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/6567/ All tests passed Build Log (for compile errors): [...truncated 47 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[HUDSON] Lucene-Solr-tests-only-3.x - Build # 6568 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/6568/ 2 tests failed. FAILED: init.org.apache.lucene.util.TestBitVector Error Message: org.apache.lucene.util.TestBitVector Stack Trace: java.lang.ClassNotFoundException: org.apache.lucene.util.TestBitVector at java.net.URLClassLoader$1.run(URLClassLoader.java:217) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:205) at java.lang.ClassLoader.loadClass(ClassLoader.java:321) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294) at java.lang.ClassLoader.loadClass(ClassLoader.java:266) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:186) FAILED: init.org.apache.lucene.util.TestFieldCacheSanityChecker Error Message: org.apache.lucene.util.TestFieldCacheSanityChecker Stack Trace: java.lang.ClassNotFoundException: org.apache.lucene.util.TestFieldCacheSanityChecker at java.net.URLClassLoader$1.run(URLClassLoader.java:217) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:205) at java.lang.ClassLoader.loadClass(ClassLoader.java:321) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294) at java.lang.ClassLoader.loadClass(ClassLoader.java:266) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:186) Build Log (for compile errors): [...truncated 47 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2981) Review and potentially remove unused/unsupported Contribs
[ https://issues.apache.org/jira/browse/LUCENE-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014108#comment-13014108 ] Earwin Burrfoot commented on LUCENE-2981: - Bye-bye, DB. Few things can compete with it in pointlessness. Review and potentially remove unused/unsupported Contribs - Key: LUCENE-2981 URL: https://issues.apache.org/jira/browse/LUCENE-2981 Project: Lucene - Java Issue Type: Improvement Reporter: Grant Ingersoll Fix For: 3.2, 4.0 Attachments: LUCENE-2981.patch Some of our contribs appear to be lacking for development/support or are missing tests. We should review whether they are even pertinent these days and potentially deprecate and remove them. One of the things we did in Mahout when bringing in Colt code was to mark all code that didn't have tests as @deprecated and then we removed the deprecation once tests were added. Those that didn't get tests added over about a 6 mos. period of time were removed. I would suggest taking a hard look at: ant db lucli swing (spatial should be gutted to some extent and moved to modules) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: [HUDSON] Lucene-Solr-tests-only-3.x - Build # 6568 - Still Failing
I killed hanging java processes! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Apache Hudson Server [mailto:hud...@hudson.apache.org] Sent: Thursday, March 31, 2011 8:10 PM To: dev@lucene.apache.org Subject: [HUDSON] Lucene-Solr-tests-only-3.x - Build # 6568 - Still Failing Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only- 3.x/6568/ 2 tests failed. FAILED: init.org.apache.lucene.util.TestBitVector Error Message: org.apache.lucene.util.TestBitVector Stack Trace: java.lang.ClassNotFoundException: org.apache.lucene.util.TestBitVector at java.net.URLClassLoader$1.run(URLClassLoader.java:217) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:205) at java.lang.ClassLoader.loadClass(ClassLoader.java:321) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294) at java.lang.ClassLoader.loadClass(ClassLoader.java:266) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:186) FAILED: init.org.apache.lucene.util.TestFieldCacheSanityChecker Error Message: org.apache.lucene.util.TestFieldCacheSanityChecker Stack Trace: java.lang.ClassNotFoundException: org.apache.lucene.util.TestFieldCacheSanityChecker at java.net.URLClassLoader$1.run(URLClassLoader.java:217) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:205) at java.lang.ClassLoader.loadClass(ClassLoader.java:321) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294) at java.lang.ClassLoader.loadClass(ClassLoader.java:266) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:186) Build Log (for compile errors): [...truncated 47 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[Lucene.Net] [jira] [Commented] (LUCENENET-391) Luke.Net for Lucene.Net
[ https://issues.apache.org/jira/browse/LUCENENET-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014116#comment-13014116 ] Sergey Mirvoda commented on LUCENENET-391: -- Notice, guys: we renamed the project. FYI, the latest version works very well on Mono. Luke.Net for Lucene.Net --- Key: LUCENENET-391 URL: https://issues.apache.org/jira/browse/LUCENENET-391 Project: Lucene.Net Issue Type: New Feature Components: Lucene.Net Contrib Reporter: Pasha Bizhan Assignee: Sergey Mirvoda Priority: Minor Labels: Luke.Net Fix For: Lucene.Net 2.9.4 Attachments: luke-net-bin.zip, luke-net-src.zip Create a port of Java Luke to .NET for use with Lucene.Net. See attachments for a 1.4-compatible version, or https://bitbucket.org/thoward/luke.net-incbuating for a partial implementation that is 2.9.2-compatible. The attached version was contributed by Pasha Bizhan, and the bitbucket version was contributed by Aaron Powell (the above version is a fork; the original is at https://bitbucket.org/slace/luke.net). If source code from either is used, a software grant must be provided from the original authors. The final version should be 2.9.4-compatible and implement most or all features of Java Luke 1.0.1 (see http://code.google.com/p/luke/ ).
[Lucene.Net] [jira] [Commented] (LUCENENET-397) Resolution of the legal issues
[ https://issues.apache.org/jira/browse/LUCENENET-397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014117#comment-13014117 ] Sergey Mirvoda commented on LUCENENET-397: -- We decided to rename the project and reimplement it from scratch as much as possible, but based on top of Pasha's work. Resolution of the legal issues -- Key: LUCENENET-397 URL: https://issues.apache.org/jira/browse/LUCENENET-397 Project: Lucene.Net Issue Type: Sub-task Components: Lucene.Net Contrib Reporter: Scott Lombard Assignee: Troy Howard Priority: Blocker Labels: Luke.Net Fix For: Lucene.Net 2.9.4 Resolution of the legal issues around ingesting the code into Lucene.Net. Coordinate with Aaron Powell to obtain software grant paperwork. Per Stefan Bodewig (Incubating Mentor): All it takes is: * attach the code to a JIRA ticket * have software grants signed by all contributors to the original code base * write a single page for the Incubator site * start a vote on Incubator general and wait for 72 hours.
Apache Lucene 3.1.0 is available
March 2011, Apache Lucene 3.1 available

The Lucene PMC is pleased to announce the release of Apache Lucene 3.1. This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at http://www.apache.org/dyn/closer.cgi/lucene/java (see note below). See the CHANGES.txt file included with the release for a full list of details.

Lucene 3.1 Release Highlights

* Numerous performance improvements: faster exact PhraseQuery; merging favors segments with deletions; primary key lookup is faster; IndexWriter.addIndexes(Directory[]) uses file copy instead of merging; various Directory performance improvements; compound file is dynamically turned off for large segments; fully deleted segments are dropped on commit; faster snowball analyzers (in contrib); ConcurrentMergeScheduler is more careful about setting priority of merge threads.
* ReusableAnalyzerBase makes it easier to reuse TokenStreams correctly.
* Improved analysis capabilities: improved Unicode support, including Unicode 4, more friendly term handling (CharTermAttribute), easier object reuse and better support for protected words in lossy token filters (e.g. stemmers).
* ConstantScoreQuery now allows directly wrapping a Query.
* IndexWriter is now configured with a new separate builder API, IndexWriterConfig. You can now control IndexWriter's previously fixed internal thread limit by calling setMaxThreadStates.
* IndexWriter.getReader is replaced by IndexReader.open(IndexWriter). In addition you can now specify whether deletes should be resolved when you open an NRT reader.
* MultiSearcher is deprecated; ParallelMultiSearcher has been absorbed directly into IndexSearcher.
* On 64-bit Windows and Solaris JVMs, MMapDirectory is now the default implementation (returned by FSDirectory.open). MMapDirectory also enables unmapping if the JVM supports it.
* New TotalHitCountCollector just counts total number of hits.
* ReaderFinishedListener API enables external caches to evict entries once a segment is finished.

Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access.
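A quick sketch of two of the API changes highlighted above, the IndexWriterConfig builder and opening an NRT reader from a writer. The directory path and analyzer choice are illustrative:

{code:java}
import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class NewApiSketch {
  public static void main(String[] args) throws Exception {
    Directory dir = FSDirectory.open(new File("/path/to/index")); // illustrative path
    IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_31,
        new StandardAnalyzer(Version.LUCENE_31));
    conf.setMaxThreadStates(4); // previously a fixed internal limit
    IndexWriter writer = new IndexWriter(dir, conf);
    // NRT reader replacing IndexWriter.getReader; 'true' resolves deletes on open
    IndexReader reader = IndexReader.open(writer, true);
    reader.close();
    writer.close();
  }
}
{code}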
Apache Solr 3.1.0 available
March 2011, Apache Solr 3.1 available

The Lucene PMC is pleased to announce the release of Apache Solr 3.1. This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at http://www.apache.org/dyn/closer.cgi/lucene/solr (see note below). See the CHANGES.txt file included with the release for a full list of details as well as instructions on upgrading.

What's in a Version? The version number for Solr 3.1 was chosen to reflect the merge of development with Lucene, which is currently also on 3.1. Going forward, we expect the Solr version to be the same as the Lucene version. Solr 3.1 contains Lucene 3.1 and is the release after Solr 1.4.1.

Solr 3.1 Release Highlights

* Numeric range facets (similar to date faceting).
* New spatial search, including spatial filtering, boosting and sorting capabilities.
* Example Velocity-driven search UI at http://localhost:8983/solr/browse
* A new termvector-based highlighter.
* Extended dismax (edismax) query parser, which addresses some missing features in the dismax query parser along with some extensions.
* Several more components now support distributed mode: TermsComponent, SpellCheckComponent.
* A new Auto Suggest component.
* Ability to sort by functions.
* JSON document indexing.
* CSV response format.
* Apache UIMA integration for metadata extraction.
* Leverages Lucene 3.1 and its inherent optimizations and bug fixes as well as new analysis capabilities.
* Numerous improvements, bug fixes, and optimizations.

Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access.
Re: Brainstorming on Improving the Release Process
On Thu, 31 Mar 2011 09:51 -0400, Robert Muir rcm...@gmail.com wrote: On Thu, Mar 31, 2011 at 9:40 AM, Upayavira u...@odoko.co.uk wrote: Are you willing to say more? I have a little time, and have done a lot of work with Ant. Maybe I could help. Upayavira Thanks, there is some followup discussion on this JIRA issue: https://issues.apache.org/jira/browse/SOLR-2002 The prototype patch I refer to in the comments, where the Solr build system is changed to extend Lucene's, is the latest _merged.patch on the issue: https://issues.apache.org/jira/secure/attachment/12456811/SOLR-2002_merged.patch (Additionally, as sort of a followup, there are more comments/ideas about additional things we could do besides just refactoring the build system to be faster and simpler.) As a first step I think the patch needs to be brought up to trunk (it gets out of date fast). I mentioned on the issue we can simply create a branch to make coordination easier. A branch might seem silly for a thing like this, but it would at least allow us to work together and people could contribute parts (e.g. PMD integration or something) without having to juggle huge out-of-sync patches. Thx. I'll take a look in the (UK) morning. Upayavira --- Enterprise Search Consultant at Sourcesense UK, Making Sense of Open Source
[jira] [Created] (SOLR-2451) Add assertQScore() to SolrTestCaseJ4 to account for small deltas
Add assertQScore() to SolrTestCaseJ4 to account for small deltas - Key: SOLR-2451 URL: https://issues.apache.org/jira/browse/SOLR-2451 Project: Solr Issue Type: Improvement Affects Versions: Next Reporter: David Smiley Priority: Minor Attachments: SOLR-2451_assertQScore.patch

Attached is a patch that adds the following method to SolrTestCaseJ4 (just the javadoc signature shown):

{code:java}
/**
 * Validates that the document at the specified index in the results has the specified score, within 0.0001.
 */
public static void assertQScore(SolrQueryRequest req, int docIdx, float targetScore)
{code}

This is especially useful for geospatial, in which slightly different precision deltas might occur when different geospatial indexing strategies are used, assuming the score is some geospatial distance. This patch makes a simple modification to DistanceFunctionTest to use it.
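A hypothetical usage sketch of the proposed helper inside a SolrTestCaseJ4 subclass; the query, field and expected score are made up for illustration:

{code:java}
// hypothetical usage inside a SolrTestCaseJ4 subclass; req() is the
// existing SolrTestCaseJ4 request helper
SolrQueryRequest req = req("q", "name:pepperoni", "fl", "*,score");
assertQScore(req, 0, 0.79f); // first result should score 0.79, within 0.0001
{code}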
[jira] [Updated] (SOLR-2451) Add assertQScore() to SolrTestCaseJ4 to account for small deltas
[ https://issues.apache.org/jira/browse/SOLR-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated SOLR-2451: --- Attachment: SOLR-2451_assertQScore.patch Add assertQScore() to SolrTestCaseJ4 to account for small deltas - Key: SOLR-2451 URL: https://issues.apache.org/jira/browse/SOLR-2451 Project: Solr Issue Type: Improvement Affects Versions: Next Reporter: David Smiley Priority: Minor Attachments: SOLR-2451_assertQScore.patch

Attached is a patch that adds the following method to SolrTestCaseJ4 (just the javadoc signature shown):

{code:java}
/**
 * Validates that the document at the specified index in the results has the specified score, within 0.0001.
 */
public static void assertQScore(SolrQueryRequest req, int docIdx, float targetScore)
{code}

This is especially useful for geospatial, in which slightly different precision deltas might occur when different geospatial indexing strategies are used, assuming the score is some geospatial distance. This patch makes a simple modification to DistanceFunctionTest to use it.
[jira] [Updated] (LUCENE-3006) Javadocs warnings should fail the build
[ https://issues.apache.org/jira/browse/LUCENE-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-3006: Attachment: LUCENE-3006.patch Here's the patch I just committed. Javadocs warnings should fail the build --- Key: LUCENE-3006 URL: https://issues.apache.org/jira/browse/LUCENE-3006 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.2, 4.0 Reporter: Grant Ingersoll Attachments: LUCENE-3006-javadoc-warning-cleanup.patch, LUCENE-3006.patch, LUCENE-3006.patch, LUCENE-3006.patch We should fail the build when there are javadocs warnings, as it should not be the Release Manager's job to fix them all at once right before the release. See http://www.lucidimagination.com/search/document/14bd01e519f39aff/brainstorming_on_improving_the_release_process
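One plausible shape for such a check, as a sketch only and not necessarily what the committed patch does: capture the javadoc output to a log file, then fail the build if any line mentions a warning. Target and property names are illustrative:

{code:xml}
<!-- Sketch, not the committed patch. Assumes the javadocs target records
     its output to javadoc.log, e.g. via Ant's <record> task. -->
<target name="javadocs-lint" depends="javadocs">
  <loadfile property="javadoc.warnings" srcFile="${build.dir}/javadoc.log">
    <filterchain>
      <linecontains>
        <contains value="warning"/>
      </linecontains>
    </filterchain>
  </loadfile>
  <!-- loadfile leaves the property unset when nothing matched -->
  <fail if="javadoc.warnings" message="Javadocs warnings were found!"/>
</target>
{code}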
Re: [HUDSON] Lucene-Solr-tests-only-trunk - Build # 6565 - Failure
I just committed a fix for this simon On Thu, Mar 31, 2011 at 5:28 PM, Simon Willnauer simon.willna...@googlemail.com wrote: This on is weird seems like there is a synchronized missing on FieldInfoBiMap#containsConsistent I try to reproduce first. simon On Thu, Mar 31, 2011 at 11:37 AM, Apache Hudson Server hud...@hudson.apache.org wrote: Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/6565/ 3 tests failed. REGRESSION: org.apache.lucene.index.TestNRTThreads.testNRTThreads Error Message: null Stack Trace: junit.framework.AssertionFailedError at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1221) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1149) at org.apache.lucene.index.FieldInfos.putInternal(FieldInfos.java:280) at org.apache.lucene.index.FieldInfos.clone(FieldInfos.java:302) at org.apache.lucene.index.SegmentInfo.clone(SegmentInfo.java:345) at org.apache.lucene.index.SegmentInfos.clone(SegmentInfos.java:374) at org.apache.lucene.index.DirectoryReader.init(DirectoryReader.java:165) at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:360) at org.apache.lucene.index.IndexReader.open(IndexReader.java:316) at org.apache.lucene.index.TestNRTThreads.testNRTThreads(TestNRTThreads.java:244) REGRESSION: org.apache.lucene.index.TestSegmentTermDocs.test Error Message: Some threads threw uncaught exceptions! Stack Trace: junit.framework.AssertionFailedError: Some threads threw uncaught exceptions! at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1221) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1149) at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:521) at org.apache.lucene.index.TestSegmentTermDocs.tearDown(TestSegmentTermDocs.java:45) REGRESSION: org.apache.lucene.index.codecs.preflex.TestSurrogates.testSurrogatesOrder Error Message: Some threads threw uncaught exceptions! Stack Trace: junit.framework.AssertionFailedError: Some threads threw uncaught exceptions! at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1221) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1149) at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:521) Build Log (for compile errors): [...truncated 3276 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-2338) improved per-field similarity integration into schema.xml
[ https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved SOLR-2338. --- Resolution: Fixed Fix Version/s: 4.0 Committed revision 1087430. Thanks hoss and yonik for feedback. improved per-field similarity integration into schema.xml - Key: SOLR-2338 URL: https://issues.apache.org/jira/browse/SOLR-2338 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.0 Reporter: Robert Muir Assignee: Robert Muir Fix For: 4.0 Attachments: SOLR-2338.patch, SOLR-2338.patch, SOLR-2338.patch Currently since LUCENE-2236, we can enable Similarity per-field, but in schema.xml there is only a 'global' factory for the SimilarityProvider. In my opinion this is too low-level, because to customize Similarity on a per-field basis you have to set your own CustomSimilarityProvider with <similarity class="..."/> and manage the per-field mapping yourself in Java code. Instead I think it would be better if you just specify the Similarity in the FieldType, like after <analyzer>. As far as the example, one idea from LUCENE-1360 was to make a short_text or metadata_text used by the various metadata fields in the example that has better norm quantization for its shortness...
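In other words, something along these lines in schema.xml. This is a sketch of the per-field form described above; the fieldType name, analyzer chain and similarity class are illustrative:

{code:xml}
<fieldType name="short_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- per-field Similarity, declared right after the analyzer -->
  <similarity class="com.example.ShortTextSimilarity"/>
</fieldType>
{code}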
[jira] [Commented] (SOLR-2061) Generate jar containing test classes.
[ https://issues.apache.org/jira/browse/SOLR-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014188#comment-13014188 ] Robert Muir commented on SOLR-2061: --- I think this issue just needs the maven parts to be resynced to the fact that lucene's tests-framework jar was renamed? Generate jar containing test classes. - Key: SOLR-2061 URL: https://issues.apache.org/jira/browse/SOLR-2061 Project: Solr Issue Type: Improvement Components: Build Affects Versions: 3.1 Reporter: Drew Farris Assignee: Robert Muir Priority: Minor Fix For: 3.2, 4.0 Attachments: SOLR-2061.patch, SOLR-2061.patch, SOLR-2061.patch, SOLR-2061.patch Follow-on to LUCENE-2609 for the solr build -- it would be useful to generate and deploy a jar containing the test classes so other projects could write unit tests using the framework in Solr. This may take care of SOLR-717 as well.
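For downstream projects, the end result would presumably look like an ordinary test-scoped dependency in pom.xml. The coordinates and version here are illustrative only, pending what actually gets published:

{code:xml}
<!-- illustrative coordinates; a downstream project's pom.xml -->
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-test-framework</artifactId>
  <version>3.2</version>
  <scope>test</scope>
</dependency>
{code}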
[jira] [Assigned] (SOLR-2061) Generate jar containing test classes.
[ https://issues.apache.org/jira/browse/SOLR-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe reassigned SOLR-2061: - Assignee: Steven Rowe (was: Robert Muir) Generate jar containing test classes. - Key: SOLR-2061 URL: https://issues.apache.org/jira/browse/SOLR-2061 Project: Solr Issue Type: Improvement Components: Build Affects Versions: 3.1 Reporter: Drew Farris Assignee: Steven Rowe Priority: Minor Fix For: 3.2, 4.0 Attachments: SOLR-2061.patch, SOLR-2061.patch, SOLR-2061.patch, SOLR-2061.patch Follow-on to LUCENE-2609 for the solr build -- it would be useful to generate and deploy a jar containing the test classes so other projects could write unit tests using the framework in Solr. This may take care of SOLR-717 as well.
[HUDSON] Lucene-Solr-tests-only-trunk - Build # 6584 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/6584/ 3 tests failed. REGRESSION: org.apache.solr.client.solrj.embedded.SolrExampleJettyTest.testCommitWithin Error Message: expected:0 but was:1 Stack Trace: junit.framework.AssertionFailedError: expected:0 but was:1 at org.apache.solr.client.solrj.SolrExampleTests.testCommitWithin(SolrExampleTests.java:365) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1221) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1149) REGRESSION: org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration Error Message: null Stack Trace: org.apache.solr.common.cloud.ZooKeeperException: at org.apache.solr.core.CoreContainer.initZooKeeper(CoreContainer.java:183) at org.apache.solr.core.CoreContainer.load(CoreContainer.java:333) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:242) at org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration(CloudStateUpdateTest.java:216) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1221) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1149) Caused by: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:16662/solr within 5000 ms at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:124) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:121) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:69) at org.apache.solr.cloud.ZkController.init(ZkController.java:104) at org.apache.solr.core.CoreContainer.initZooKeeper(CoreContainer.java:164) REGRESSION: org.apache.solr.cloud.ZkControllerTest.testUploadToCloud Error Message: KeeperErrorCode = ConnectionLoss for /configs/config1/synonyms.txt Stack Trace: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /configs/config1/synonyms.txt at org.apache.zookeeper.KeeperException.create(KeeperException.java:90) at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637) at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:347) at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:308) at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:290) at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:255) at org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:384) at org.apache.solr.common.cloud.SolrZkClient.setData(SolrZkClient.java:410) at org.apache.solr.cloud.ZkController.uploadToZK(ZkController.java:520) at org.apache.solr.cloud.ZkControllerTest.testUploadToCloud(ZkControllerTest.java:191) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1221) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1149) Build Log (for compile errors): [...truncated 8836 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated SOLR-2155: --- Attachment: SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch

Attached is a new patch. The highlights are:

* Requires the latest Solr trunk -- probably anything in the last few months. If this is ultimately going to get committed then this needed to happen. There are only some slight differences, so if you really need an earlier trunk then I'm sure you'll figure it out.
* Adds support for sorting, including multi-value: use the existing geodist() function query with a lat-lon constant and a reference to your geohash-based field. Note that this works by loading all points from the field into memory, resolving each underlying full-length geohash into the lat-lon, into a data structure which is a List<Point2D>[]. This is improved over Bill's patch, surely, but it could use some optimization. It's not optimized for the single-value case either; that's a definite TODO.
* Polygon/WKT features have been omitted due to LGPL licensing concerns of JTS. I've left hooks for their implementation to make adding on this capability that already existed easy. You'll easily figure it out if you are so inclined. I might add this as a patch shortly (not to be committed) when I get some time; but longer term it will re-surface under a separate project. Don't worry; it'll be painless to use if you need it.
* This might be controversial, but as part of this patch I removed the ghhsin() and geohash() function queries. Their presence was confusing; I simply don't see what point there is to them now that this patch fleshes out the geohash capability.
* I decided to pre-register my SpatialGeoHashFilterQParser as geohashfilt, instead of requiring you to do so in solrconfig.xml. You could use geofilt for point-radius queries, but I prefer this one since I can specify the bbox explicitly.

There are a few slight changes to GeoHashPrefixFilter that crept in from unfinished work (notably tying sorting to filtering in an efficient way) but they are harmless. Bill, thanks for kick-starting the multi-value sorting. I re-used most of your code.

Geospatial search using geohash prefixes Key: SOLR-2155 URL: https://issues.apache.org/jira/browse/SOLR-2155 Project: Solr Issue Type: Improvement Reporter: David Smiley Assignee: Grant Ingersoll Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch There currently isn't a solution in Solr for doing geospatial filtering on documents that have a variable number of points. This scenario occurs when there is location extraction (i.e. via a gazetteer) occurring on free text. None, one, or many geospatial locations might be extracted from any given document and users want to limit their search results to those occurring in a user-specified area. I've implemented this by furthering the GeoHash based work in Lucene/Solr with a geohash prefix based filter. A geohash refers to a lat-lon box on the earth. Each successive character added further subdivides the box into a 4x8 (or 8x4, depending on the even/odd length of the geohash) grid. The first step in this scheme is figuring out which geohash grid squares cover the user's search query. I've added various extra methods to GeoHashUtils (and added tests) to assist in this purpose.
The next step is an actual Lucene Filter, GeoHashPrefixFilter, that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid squares in the index. Once a matching geohash grid is found, the points therein are compared against the user's query to see if it matches. I created an abstraction GeoShape extended by subclasses named PointDistance... and CartesianBox to support different queried shapes so that the filter need not care about these details. This work was presented at LuceneRevolution in Boston on October 8th. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
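For background, here is a self-contained sketch of standard base-32 geohash encoding. This is the generic algorithm, not the patch's code: each character contributes 5 bits, alternating between longitude and latitude, which is why each added character subdivides the current cell into a 4x8 or 8x4 grid as described above:

{code:java}
// Generic geohash encoding sketch, not SOLR-2155 code.
public class GeoHashSketch {
  private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

  public static String encode(double lat, double lon, int precision) {
    double[] latRange = {-90, 90};
    double[] lonRange = {-180, 180};
    StringBuilder hash = new StringBuilder();
    boolean evenBit = true; // even bits refine longitude, odd bits latitude
    int bit = 0, ch = 0;
    while (hash.length() < precision) {
      double[] range = evenBit ? lonRange : latRange;
      double value = evenBit ? lon : lat;
      double mid = (range[0] + range[1]) / 2;
      ch <<= 1;
      if (value >= mid) { ch |= 1; range[0] = mid; } else { range[1] = mid; }
      evenBit = !evenBit;
      if (++bit == 5) { // every 5 bits emit one base-32 character
        hash.append(BASE32.charAt(ch));
        bit = 0;
        ch = 0;
      }
    }
    return hash.toString();
  }

  public static void main(String[] args) {
    // successive prefixes of the output cover successively smaller boxes
    System.out.println(encode(40.7128, -74.0060, 7));
  }
}
{code}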
[HUDSON] Lucene-Solr-tests-only-trunk - Build # 6590 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/6590/ 1 tests failed. REGRESSION: org.apache.solr.cloud.ZkSolrClientTest.testConnect Error Message: Could not connect to ZooKeeper 127.0.0.1:39750/solr within 3 ms Stack Trace: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:39750/solr within 3 ms at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:124) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:121) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:84) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:65) at org.apache.solr.cloud.ZkSolrClientTest.testConnect(ZkSolrClientTest.java:43) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1221) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1149) Build Log (for compile errors): [...truncated 8701 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[HUDSON] Lucene-Solr-tests-only-trunk - Build # 6591 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/6591/ 1 tests failed. REGRESSION: org.apache.lucene.benchmark.byTask.TestPerfTasksLogic.testCollator Error Message: expected:[,䀘䀌 䰁䨀@ 䀀 ကࠀЀ] but was:[foobar] Stack Trace: at org.apache.lucene.benchmark.byTask.TestPerfTasksLogic.assertEqualCollation(TestPerfTasksLogic.java:969) at org.apache.lucene.benchmark.byTask.TestPerfTasksLogic.testCollator(TestPerfTasksLogic.java:939) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1221) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1149) Build Log (for compile errors): [...truncated 6348 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[HUDSON] Lucene-Solr-tests-only-trunk - Build # 6592 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/6592/ 1 tests failed. FAILED: org.apache.lucene.benchmark.byTask.TestPerfTasksLogic.testCollator Error Message: expected:[,䀘䀌 䰁䨀@ 䀀 ကࠀЀ] but was:[foobar] Stack Trace: at org.apache.lucene.benchmark.byTask.TestPerfTasksLogic.assertEqualCollation(TestPerfTasksLogic.java:969) at org.apache.lucene.benchmark.byTask.TestPerfTasksLogic.testCollator(TestPerfTasksLogic.java:939) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1221) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1149) Build Log (for compile errors): [...truncated 6359 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3009) binary packaging: lucene modules/contribs that depend on jars are confusing
[ https://issues.apache.org/jira/browse/LUCENE-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014355#comment-13014355 ] Hoss Man commented on LUCENE-3009: -- I question if we really need to bother with binary lucene / module tar/zip artifacts -- if we only had source release packages, then the build.xml files make it clear exactly what the dependencies for each piece of code are. For Solr, a large percentage of the user base doesn't know anything about Java -- so it definitely makes sense to have artifacts with precompiled jars; but if you're using the Java libraries directly, you're a Java programmer, and you should be able to run "ant compile" on a src release (or use Maven to fetch the published jars with poms that link to the appropriate dependencies). binary packaging: lucene modules/contribs that depend on jars are confusing Key: LUCENE-3009 URL: https://issues.apache.org/jira/browse/LUCENE-3009 Project: Lucene - Java Issue Type: Bug Reporter: Robert Muir Fix For: 3.2, 4.0 In the binary release, I noticed lucene contribs (for example benchmark) that rely upon jar files don't include them, nor do they have a README telling you they depend upon them, nor is there any hint they actually have any dependencies at all! We should improve this either by including the jars you need or by including a README.txt telling you what a particular module/contrib depends upon.
[jira] [Commented] (LUCENE-3009) binary packaging: lucene modules/contribs that depend on jars are confusing
[ https://issues.apache.org/jira/browse/LUCENE-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014357#comment-13014357 ] Robert Muir commented on LUCENE-3009: - When I brought up the idea of source code only, it didn't seem too popular. That being said, if we go source code only, the maven stuff should be source-code only too. binary packaging: lucene modules/contribs that depend on jars are confusing Key: LUCENE-3009 URL: https://issues.apache.org/jira/browse/LUCENE-3009 Project: Lucene - Java Issue Type: Bug Reporter: Robert Muir Fix For: 3.2, 4.0 In the binary release, I noticed lucene contribs (for example benchmark) that rely upon jar files don't include them, nor do they have a README telling you they depend upon them, nor is there any hint they actually have any dependencies at all! We should improve this either by including the jars you need or by including a README.txt telling you what a particular module/contrib depends upon.
[jira] [Commented] (LUCENE-3009) binary packaging: lucene modules/contribs that depend on jars are confusing
[ https://issues.apache.org/jira/browse/LUCENE-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014362#comment-13014362 ] Steven Rowe commented on LUCENE-3009:

bq. the maven stuff should be source-code only too

-1. (Mutually exclusive concepts: Maven artifacts are by definition prebuilt binary jars, so a Maven release cannot be source-code only.)
[jira] [Updated] (LUCENE-3003) Move UnInvertedField into Lucene core
[ https://issues.apache.org/jira/browse/LUCENE-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-3003:

Attachment: byte_size_32-bit-openjdk6.txt

Attached: 32-bit results

Move UnInvertedField into Lucene core

Key: LUCENE-3003
URL: https://issues.apache.org/jira/browse/LUCENE-3003
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
Fix For: 3.2, 4.0
Attachments: LUCENE-3003.patch, LUCENE-3003.patch, byte_size_32-bit-openjdk6.txt

Solr's UnInvertedField lets you quickly look up all term ords for a given doc/field. Like FieldCache, it inverts the index to produce this, and creates a RAM-resident data structure holding the bits; but, unlike FieldCache, it can handle multiple values per doc, and it does not hold the term bytes in RAM. Rather, it holds only term ords, and then uses TermsEnum to resolve ord -> term. This is great, e.g., for faceting, where you want to use int ords for all of your counting, and then only at the end resolve the top N ords to their text.

I think this is useful core functionality, and we should move most of it into Lucene's core. It's a good complement to FieldCache. For this first baby step, I just move it into core and refactor Solr's usage of it. After this, as separate issues, I think there are some things we could explore/improve:

* The first pass that allocates lots of tiny byte[] looks like it could be inefficient. Maybe we could use the byte slices from the indexer for this...
* We can improve the RAM efficiency of the TermIndex: if the codec supports ords, and we are operating on one segment, we should just use it. If not, we can use a more RAM-efficient data structure, e.g. an FST mapping to the ord.
* We may be able to improve on the main byte[] representation by using packed ints instead of delta-vInt?
* Eventually we should fold this ability into docvalues, i.e. we'd write the byte[] image at indexing time, and then loading would be fast, instead of uninverting
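For reference, per-length shallow sizes like those in byte_size_32-bit-openjdk6.txt can be gathered with a tiny java.lang.instrument agent. A minimal sketch in the spirit of the spike referenced earlier in the thread (the class name and manifest wiring here are illustrative, not the spike's actual code):

{code:java}
import java.lang.instrument.Instrumentation;

public class ByteArraySizeAgent {
    // Package with a manifest entry "Premain-Class: ByteArraySizeAgent"
    // and launch the target VM with -javaagent:agent.jar
    public static void premain(String agentArgs, Instrumentation inst) {
        for (int len = 0; len <= 17; len++) {
            // getObjectSize reports the VM-specific shallow size,
            // including the object header and alignment padding.
            System.out.println("byte[" + len + "] takes "
                + inst.getObjectSize(new byte[len]) + " bytes.");
        }
    }
}
{code}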
[jira] [Updated] (SOLR-2061) Generate jar containing test classes.
[ https://issues.apache.org/jira/browse/SOLR-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe updated SOLR-2061:

Attachment: SOLR-2061.patch

This patch brings the Maven aspects up to snuff. All tests pass under Ant and Maven. {{generate-maven-artifacts}} generates the test-framework jars, and they are signed by {{sign-artifacts}}. Unless there are objections, I'll commit this tomorrow, then backport to branch_3x.

Generate jar containing test classes.

Key: SOLR-2061
URL: https://issues.apache.org/jira/browse/SOLR-2061
Project: Solr
Issue Type: Improvement
Components: Build
Affects Versions: 3.1
Reporter: Drew Farris
Assignee: Steven Rowe
Priority: Minor
Fix For: 3.2, 4.0
Attachments: SOLR-2061.patch, SOLR-2061.patch, SOLR-2061.patch, SOLR-2061.patch, SOLR-2061.patch

Follow-on to LUCENE-2609 for the Solr build -- it would be useful to generate and deploy a jar containing the test classes so other projects could write unit tests using the framework in Solr. This may take care of SOLR-717 as well.
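For a sense of what publishing the test jar enables, here is a hypothetical downstream project's unit test built on Solr's test framework (the config/schema file names and the query are assumptions; the SolrTestCaseJ4 helpers shown are the framework's standard ones):

{code:java}
import org.apache.solr.SolrTestCaseJ4;
import org.junit.BeforeClass;
import org.junit.Test;

public class ExternalSolrTest extends SolrTestCaseJ4 {

    @BeforeClass
    public static void beforeClass() throws Exception {
        // Spin up an embedded core from the project's own test
        // config/schema (file names here are illustrative).
        initCore("solrconfig.xml", "schema.xml");
    }

    @Test
    public void testIndexAndQuery() throws Exception {
        assertU(adoc("id", "1"));  // add a document
        assertU(commit());         // commit it
        // Assert one hit for id:1 via an XPath check on the XML response.
        assertQ(req("q", "id:1"), "//result[@numFound='1']");
    }
}
{code}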