RE: 3.0.3 Pre-Release Nuget Packages
I also want to point out we brought back .NET 3.5 compatibility - hopefully that gets some people excited.

From: geobmx...@hotmail.com
To: lucene-net-dev@lucene.apache.org; lucene-net-u...@lucene.apache.org
Subject: 3.0.3 Pre-Release Nuget Packages
Date: Mon, 6 Aug 2012 13:55:01 -0700

Hey All, I've hidden the two deprecated Nuget packages (Lucene and Lucene Contrib). I've also added pre-release (3.0.3-RC) packages for Lucene.Net and Lucene.Net.Contrib. If you have time, I would ask that you take them for a test drive and provide us any feedback you have. Thanks all, ~Prescott
Stemming Indonesian in Lucene
I am interested in implementing Indonesian stemming in Lucene. Looking at Lucene, I see no implementation of the Nazief and Adriani algorithm. I am still a beginner and am asking for directions on how to implement it. -- View this message in context: http://lucene.472066.n3.nabble.com/Stemming-Indonesian-in-Lucene-tp3999321.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
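Nazief-Adriani is a dictionary-based affix-stripping stemmer. As a starting point, here is a heavily simplified, self-contained toy in that spirit. The tiny root dictionary, the affix lists, and the stripping order are illustrative assumptions only: the real algorithm uses a full root-word dictionary, ordered prefix/suffix rules, recoding, and repeated removal.

```java
import java.util.Set;

public class IndonesianStemToy {
    // Toy root dictionary; a real stemmer needs a full Indonesian lexicon.
    static final Set<String> ROOTS = Set.of("ajar", "main", "makan");
    // "meng" must precede "me" so the longer prefix is tried first.
    static final String[] PREFIXES = {"meng", "mem", "me", "ber", "di", "pe"};
    static final String[] SUFFIXES = {"kan", "an", "i"};

    static String stem(String word) {
        // Dictionary check first protects roots like "makan" that end in "kan".
        if (ROOTS.contains(word)) return word;
        for (String s : SUFFIXES) {
            if (word.endsWith(s)) {
                String cut = word.substring(0, word.length() - s.length());
                if (ROOTS.contains(cut)) return cut;
                for (String p : PREFIXES) {
                    if (cut.startsWith(p) && ROOTS.contains(cut.substring(p.length())))
                        return cut.substring(p.length());
                }
            }
        }
        for (String p : PREFIXES) {
            if (word.startsWith(p) && ROOTS.contains(word.substring(p.length())))
                return word.substring(p.length());
        }
        return word; // not reducible to a known root: leave unstemmed
    }

    public static void main(String[] args) {
        System.out.println(stem("mengajarkan")); // ajar
        System.out.println(stem("bermain"));     // main
        System.out.println(stem("makan"));       // makan (already a root)
    }
}
```

To plug something like this into Lucene, the usual route is a TokenFilter that applies the stemmer to each term.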
[jira] [Commented] (LUCENE-4289) highlighter idf calculation problems
[ https://issues.apache.org/jira/browse/LUCENE-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429001#comment-13429001 ] Uwe Schindler commented on LUCENE-4289: --- Too funny, +1 to fix. Backport?

highlighter idf calculation problems Key: LUCENE-4289 URL: https://issues.apache.org/jira/browse/LUCENE-4289 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-4289.patch
* highlighter uses numDocs instead of maxDoc
* fastvectorhighlighter uses numDocs - numDeletedDocs instead of maxDoc (will go negative if more than half of docs are marked deleted)
* fastvectorhighlighter calls docFreq and computes IDF per-position when it won't change (inefficient)
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
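To see why the first two bullets matter, here is a small self-contained sketch using a classic-style IDF formula, 1 + ln(N / (docFreq + 1)); the exact formula in a given Similarity may differ. Since docFreq still counts deleted documents, maxDoc is the matching N; numDocs drifts away from it as documents are deleted, and the fastvectorhighlighter subtraction can go negative.

```java
public class IdfDemo {
    // Classic-style IDF; the exact Similarity formula may differ.
    static double idf(int n, int docFreq) {
        return 1.0 + Math.log((double) n / (docFreq + 1));
    }

    public static void main(String[] args) {
        int maxDoc = 100;                      // all docs ever written
        int numDeletedDocs = 60;               // more than half marked deleted
        int numDocs = maxDoc - numDeletedDocs; // 40 live docs

        double correct = idf(maxDoc, 5);       // N matches what docFreq counts
        double skewed = idf(numDocs, 5);       // highlighter bug: understated N
        int fvhN = numDocs - numDeletedDocs;   // fastvectorhighlighter bug: -20

        System.out.println(correct > skewed);  // true
        System.out.println(fvhN);              // -20
    }
}
```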
[jira] [Commented] (LUCENE-2501) ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice
[ https://issues.apache.org/jira/browse/LUCENE-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429006#comment-13429006 ] Gili Nachum commented on LUCENE-2501: - Issue resolved successfully. Even after increasing the degree of concurrency to 16 threads on a 4-core machine, I can no longer reproduce it. Thank you Michael! ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice -- Key: LUCENE-2501 URL: https://issues.apache.org/jira/browse/LUCENE-2501 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 3.0.1 Reporter: Tim Smith Attachments: LUCENE-2501.patch I'm seeing the following exception during indexing: {code} Caused by: java.lang.ArrayIndexOutOfBoundsException: 14 at org.apache.lucene.index.ByteBlockPool.allocSlice(ByteBlockPool.java:118) at org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:490) at org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:511) at org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:104) at org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:120) at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:468) at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174) at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246) at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:774) at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:757) at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2085) ... 
37 more {code} This seems to be caused by the following code: {code} final int level = slice[upto] & 15; final int newLevel = nextLevelArray[level]; final int newSize = levelSizeArray[newLevel]; {code} This can result in level being a value between 0 and 14, while the array nextLevelArray is only of size 10. I suspect the solution would be either to cap the level at 10, or to add more entries to nextLevelArray so it has 15 entries. However, I don't know if something more is going wrong here and this is just where the exception surfaces from a deeper issue.
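The masked lookup can be reproduced in isolation. The sketch below copies the level tables from the Lucene 3.x ByteBlockPool sources; a level byte whose low 4 bits hold 10..14 (e.g. garbage seen by a racing thread) indexes past the 10-entry nextLevelArray, which is exactly the ArrayIndexOutOfBoundsException: 14 in the stack trace.

```java
public class AllocSliceDemo {
    // Level tables as in Lucene 3.x ByteBlockPool.
    static final int[] NEXT_LEVEL_ARRAY = {1, 2, 3, 4, 5, 6, 7, 8, 9, 9};
    static final int[] LEVEL_SIZE_ARRAY = {5, 14, 20, 30, 40, 40, 80, 80, 120, 200};

    // Mirrors the failing lines: the low 4 bits of the slice's end byte
    // encode its level, then the next level's slice size is looked up.
    static int newSize(byte endByte) {
        int level = endByte & 15;               // can be 0..14
        int newLevel = NEXT_LEVEL_ARRAY[level]; // valid only for level 0..9
        return LEVEL_SIZE_ARRAY[newLevel];
    }

    public static void main(String[] args) {
        System.out.println(newSize((byte) 0x13)); // level 3 -> next level 4 -> 40
        try {
            newSize((byte) 0x1e);                 // level 14: index out of bounds
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("reproduced the AIOOBE for index 14");
        }
    }
}
```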
[jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index
[ https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429015#comment-13429015 ] Raintung Li commented on SOLR-3684: --- For 1: I wanted to test index throughput in Solr Cloud, so I started 1000 threads in JMeter; the Solr Cloud server's Jetty max threads is 1. Usually in a pressure test the throughput reaches its maximum and then holds steady or declines smoothly, so the final state is stable. In this case the JVM looked hung up, always doing full GC: the cache for StandardTokenizer costs too much memory, the threads stay alive so the cache cannot be released, new requests keep coming, and throughput becomes very bad. For 2: how do I create the per-field analyzer? Is it the same analyzer? analyzer.tokenStream has been declared final, so how do I create different tokenStreams for different fields? For one thread using the same tokenStream it is safe; TokenStreamComponents is a per-thread cache. Could you give more information? Frequently full gc while do pressure index -- Key: SOLR-3684 URL: https://issues.apache.org/jira/browse/SOLR-3684 Project: Solr Issue Type: Improvement Components: multicore Affects Versions: 4.0-ALPHA Environment: System: Linux Java process: 4G memory Jetty: 1000 threads Index: 20 field Core: 5 Reporter: Raintung Li Priority: Critical Labels: garbage, performance Fix For: 4.0 Attachments: patch.txt Original Estimate: 168h Remaining Estimate: 168h Recently we tested Solr index throughput and performance: 20 fields configured, field type the normal text_general, 1000 threads for Jetty, and 5 cores defined. After the test ran for some time, the Solr process throughput dropped very quickly. Root-cause analysis found the Java process constantly doing full GC. In the heap dump the main object is StandardTokenizer, saved in a CloseableThreadLocal by IndexSchema.SolrIndexAnalyzer. 
In Solr, PerFieldReuseStrategy is the default reuse-component strategy, which means every field gets its own StandardTokenizer if it uses the standard analyzer, and each StandardTokenizer occupies about 32KB of memory because of its zzBuffer char array. The worst case: Total memory = live threads * cores * fields * 32KB. In the test case that is 1000*5*20*32KB = 3.2G for StandardTokenizer, and those objects are only released when their thread dies. Suggestion: every request is handled by exactly one thread, so one document is analyzed by one thread. That thread parses the document's fields step by step, so fields of the same type can share one reused component; when the thread switches to another field of the same type, only the component's input stream needs to be reset. This can save a lot of memory for same-typed fields: Total memory = live threads * cores * (distinct field types) * 32KB. The source modification is simple; I can provide the patch for IndexSchema.java:
{code}
private class SolrIndexAnalyzer extends AnalyzerWrapper {

  private class SolrFieldReuseStrategy extends ReuseStrategy {
    /** {@inheritDoc} */
    @SuppressWarnings("unchecked")
    public TokenStreamComponents getReusableComponents(String fieldName) {
      Map<Analyzer, TokenStreamComponents> componentsPerField =
          (Map<Analyzer, TokenStreamComponents>) getStoredValue();
      return componentsPerField != null
          ? componentsPerField.get(analyzers.get(fieldName)) : null;
    }

    /** {@inheritDoc} */
    @SuppressWarnings("unchecked")
    public void setReusableComponents(String fieldName, TokenStreamComponents components) {
      Map<Analyzer, TokenStreamComponents> componentsPerField =
          (Map<Analyzer, TokenStreamComponents>) getStoredValue();
      if (componentsPerField == null) {
        componentsPerField = new HashMap<Analyzer, TokenStreamComponents>();
        setStoredValue(componentsPerField);
      }
      componentsPerField.put(analyzers.get(fieldName), components);
    }
  }

  protected final static HashMap<String, Analyzer> analyzers;

  /**
   * Implementation of {@link ReuseStrategy} that reuses components per-field by
   * maintaining a Map of TokenStreamComponents per field name.
   */
  SolrIndexAnalyzer() {
    super(new SolrFieldReuseStrategy());
    analyzers = analyzerCache();
  }

  protected HashMap<String, Analyzer> analyzerCache() {
    HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
    for (SchemaField f : getFields().values()) {
      Analyzer analyzer = f.getType().getAnalyzer();
      cache.put(f.getName(), analyzer);
    }
    return cache;
  }

  @Override
  protected Analyzer getWrappedAnalyzer(String fieldName) {
    Analyzer analyzer = analyzers.get(fieldName);
    return analyzer != null ? analyzer :
{code}
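The worst-case arithmetic in the report can be checked directly. In the sketch below, the figure of 3 distinct field types in the "after" case is an assumed example, not from the report:

```java
public class TokenizerMemoryDemo {
    // Worst case from the report: every live thread caches one tokenizer
    // per field per core, each holding a ~32KB zzBuffer.
    static long worstCaseKB(int threads, int cores, int buffersPerThreadPerCore, int bufferKB) {
        return (long) threads * cores * buffersPerThreadPerCore * bufferKB;
    }

    public static void main(String[] args) {
        long perField = worstCaseKB(1000, 5, 20, 32); // 3,200,000 KB, i.e. ~3.2G
        // Assumed: only 3 distinct field types under the proposed reuse strategy.
        long perType = worstCaseKB(1000, 5, 3, 32);   // 480,000 KB, i.e. ~0.5G
        System.out.println(perField + " KB vs " + perType + " KB");
    }
}
```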
[jira] [Created] (LUCENE-4290) basic highlighter that uses postings offsets
Robert Muir created LUCENE-4290: --- Summary: basic highlighter that uses postings offsets Key: LUCENE-4290 URL: https://issues.apache.org/jira/browse/LUCENE-4290 Project: Lucene - Core Issue Type: New Feature Components: modules/other Reporter: Robert Muir We added IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS so you can efficiently compress character offsets in the postings list, but nothing yet makes use of this. Here is a simple highlighter that uses them: it doesn't have many tests or fancy features, but I think its ok for the sandbox/ (maybe with a couple more tests) Additionally I didnt do any benchmarking.
[jira] [Updated] (LUCENE-4290) basic highlighter that uses postings offsets
[ https://issues.apache.org/jira/browse/LUCENE-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4290: Attachment: LUCENE-4290.patch
[jira] [Updated] (LUCENE-4216) Token X exceeds length of provided text sized X
[ https://issues.apache.org/jira/browse/LUCENE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ibrahim updated LUCENE-4216: Attachment: ArabicTokenizer.java I have decreased the offset by the difference in length before and after Tashkeel removal; as for the other part, I really do not know what it means. I have tested it in both cases with a multi-valued field (since the offset affects end()) and found it working. Token X exceeds length of provided text sized X --- Key: LUCENE-4216 URL: https://issues.apache.org/jira/browse/LUCENE-4216 Project: Lucene - Core Issue Type: Bug Components: modules/highlighter Affects Versions: 4.0-ALPHA Environment: Windows 7, jdk1.6.0_27 Reporter: Ibrahim Attachments: ArabicTokenizer.java, myApp.zip I'm facing this exception: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token رأيكم exceeds length of provided text sized 170 at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:233) at classes.myApp$16$1.run(myApp.java:1508) I tried to find anything wrong in my code when migrating from Lucene 3.6 to 4.0, without success. I found similar issues with HTMLStripCharFilter, e.g. LUCENE-3690 and LUCENE-2208, but not with SimpleHTMLFormatter, so I'm raising this here to see if there is really a bug or something wrong in my code with v4. The code that I'm using: final Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter("<font color=red>", "</font>"), new QueryScorer(query)); ... final TokenStream tokenStream = TokenSources.getAnyTokenStream(defaultSearcher.getIndexReader(), j, "Line", analyzer); final TextFragment[] frag = highlighter.getBestTextFragments(tokenStream, doc.get("Line"), false, 10); Please note that this is working fine with v3.6.
[jira] [Commented] (LUCENE-4216) Token X exceeds length of provided text sized X
[ https://issues.apache.org/jira/browse/LUCENE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429024#comment-13429024 ] Uwe Schindler commented on LUCENE-4216: --- Hi, {code:java} /** A tokenizer that will return tokens in the arabic alphabet. This tokenizer * is a bit rude since it also filters digits and punctuation, even in an arabic * part of stream. Well... I've planned to write a * universal, highly configurable, character tokenizer. * @author Pierrick Brihaye, 2003 */ {code} You don't need to implement your own ArabicTokenizer: just subclass the abstract Lucene class CharTokenizer, which has all the functionality this comment in your source code offers. The change is easy: subclass directly, remove all code except isArabicChar, and rename that method to isTokenChar (it takes int, not char, but that's just a cast). The Tashkeel stuff should be done with a PatternReplaceFilter wrapped on top of this Tokenizer; there is no need to have it in the Tokenizer itself, and it makes the code complex. Then you can be 100% sure that all offsets are correct. The code you use is a duplicate, and it is too risky to reinvent the wheel when a well-tested variant is available with the Lucene distribution. It is much easier, trust me: no need to implement any crazy reset,... methods!
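The isTokenChar(int) predicate Uwe describes might look like the self-contained sketch below. The Unicode-block test is an illustrative assumption, not taken from the attached ArabicTokenizer; note it receives a code point (int), matching the CharTokenizer contract, and that digits and punctuation come out false.

```java
public class ArabicTokenCharDemo {
    // The predicate a CharTokenizer subclass would supply as isTokenChar(int).
    static boolean isTokenChar(int cp) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(cp);
        return Character.isLetter(cp)
            && (block == Character.UnicodeBlock.ARABIC
                || block == Character.UnicodeBlock.ARABIC_PRESENTATION_FORMS_A
                || block == Character.UnicodeBlock.ARABIC_PRESENTATION_FORMS_B);
    }

    public static void main(String[] args) {
        System.out.println(isTokenChar('ر'));      // Arabic letter: true
        System.out.println(isTokenChar('1'));      // ASCII digit: false
        System.out.println(isTokenChar('\u0660')); // Arabic-Indic digit: false
    }
}
```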
[jira] [Commented] (LUCENE-4216) Token X exceeds length of provided text sized X
[ https://issues.apache.org/jira/browse/LUCENE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429025#comment-13429025 ] Uwe Schindler commented on LUCENE-4216: --- It is also much more performant: your code creates regex matchers all the time and copies the token chars to new Strings, instead of working directly on the CharTermAttribute (which extends CharSequence, so it can run regexes directly).
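Uwe's point that CharTermAttribute extends CharSequence can be sketched without Lucene: a Matcher accepts any CharSequence, so no intermediate String copy is needed to run the match. A StringBuilder stands in for the attribute here, and the Tashkeel range U+064B..U+0652 is an assumption for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharSequenceRegexDemo {
    // Compile once; do not rebuild the Pattern per token.
    static final Pattern TASHKEEL = Pattern.compile("[\\u064B-\\u0652]");

    public static String strip(CharSequence term) {
        // matcher(CharSequence) runs directly on the term buffer; only the
        // final replaceAll result allocates a new String.
        Matcher m = TASHKEEL.matcher(term);
        return m.replaceAll("");
    }

    public static void main(String[] args) {
        StringBuilder term = new StringBuilder("مَدِينَة"); // 8 chars, 3 diacritics
        System.out.println(strip(term));                   // مدينة
    }
}
```

In a real filter one would reuse a single Matcher via Matcher.reset(CharSequence) to avoid even the per-token Matcher allocation.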
[jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index
[ https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429027#comment-13429027 ] Mikhail Khludnev commented on SOLR-3684: --- Hello, Q1 gives one more usage for SOLR-3585. It uses a dedicated thread pool with limited capacity to process updates, so the core challenge would be solved. Raintung, updating with a storm of small messages is not common in the search-engine world. The usual way is to collect them into bulks and index with a modest number of threads. Sooner or later indexing hits the IO limit, so there is no profit in saturating CPUs with a huge number of indexing threads.
[jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index
[ https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429038#comment-13429038 ] Raintung Li commented on SOLR-3684: --- Hi Mikhail, it isn't really a storm: only 1000 clients send messages, and we have three Solr index servers that all show the same issue. My suggestion just aims to reduce wasted memory, even though memory is cheap now. To improve performance and avoid the IO limit we buffer in memory, but we also need to account for memory usage even if the JVM manages the memory for us. BTW, the default Jetty thread config is 1 in Solr; in this case every server has more than 1000 live threads.
[jira] [Resolved] (LUCENE-2501) ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice
[ https://issues.apache.org/jira/browse/LUCENE-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2501. --- Resolution: Fixed Fix Version/s: 3.6, 4.0, 5.0 Thanks for bringing closure, Gili.
[jira] [Commented] (LUCENE-4290) basic highlighter that uses postings offsets
[ https://issues.apache.org/jira/browse/LUCENE-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429049#comment-13429049 ] Michael McCandless commented on LUCENE-4290: --- Wow :) This looks very nice! Should we move EMPTY into DocsAndPositionsEnum? This isn't just a cutover from term vectors to postings, right? It actually scores each passage as if it were its own hit/document matching a search? I.e. the passage ranking/selection differs from the two existing highlighters. I like the EMPTY_INDEXREADER (so MTQs do no rewrite work).
[jira] [Comment Edited] (SOLR-3684) Frequently full gc while do pressure index
[ https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429038#comment-13429038 ] Raintung Li edited comment on SOLR-3684 at 8/6/12 9:51 AM: --- Hi Mikhail, It isn't really a storm: only 1000 clients send messages, we have three Solr index servers, and all servers show the same issue. My suggestion just aims to reduce wasteful memory, although memory is cheap now. To improve performance and avoid the IO limit we keep data in memory, but we also need to account for that memory usage, even if the JVM helps us manage the memory. BTW, the default Jetty thread config is 1 in the solr; in this case every server has more than 1000 live threads. Frequently full gc while do pressure index -- Key: SOLR-3684 URL: https://issues.apache.org/jira/browse/SOLR-3684 Project: Solr Issue Type: Improvement Components: multicore Affects Versions: 4.0-ALPHA Environment: System: Linux Java process: 4G memory Jetty: 1000 threads Index: 20 field Core: 5 Reporter: Raintung Li Priority: Critical Labels: garbage, performance Fix For: 4.0 Attachments: patch.txt Original Estimate: 168h Remaining Estimate: 168h Recently we tested the Solr index throughput and performance: we configured 20 fields (the field type is the normal text_general), started 1000 threads for Jetty, and defined 5 cores. After the test had run for some time, the Solr process throughput dropped very quickly.
After checking the root cause, we found the Java process is always doing full GC. Checking the heap dump, the main object is StandardTokenizer, which is saved in the CloseableThreadLocal by IndexSchema.SolrIndexAnalyzer. Solr uses PerFieldReuseStrategy as the default reuse-component strategy, which means every field has its own StandardTokenizer if it uses the standard analyzer, and each StandardTokenizer occupies 32KB of memory because of the zzBuffer char array. The worst case: Total memory = live threads*cores*fields*32KB. In the test case, the memory is 1000*5*20*32KB = 3.2G for StandardTokenizer, and those objects can only be released when their thread dies. Suggestion: every request is handled by one thread, which means one document is only analyzed by one thread. Since one thread parses the document’s fields step by step, fields of the same type can use the same reused component. When the thread switches to another field of the same type, analysis only resets the same component's input stream; this can save a lot of memory for fields of the same type. Total memory will be = live threads*cores*(different field types)*32KB. The source code modification is simple; I can provide the modification patch for IndexSchema.java: private class SolrIndexAnalyzer extends AnalyzerWrapper { private class SolrFieldReuseStrategy extends ReuseStrategy { /** * {@inheritDoc} */ @SuppressWarnings("unchecked") public TokenStreamComponents getReusableComponents(String fieldName) { Map<Analyzer, TokenStreamComponents> componentsPerField = (Map<Analyzer, TokenStreamComponents>) getStoredValue(); return componentsPerField != null ? componentsPerField.get(analyzers.get(fieldName)) : null; } /** * {@inheritDoc} */ @SuppressWarnings("unchecked") public void setReusableComponents(String fieldName, TokenStreamComponents components) { Map<Analyzer, TokenStreamComponents> componentsPerField = (Map<Analyzer, TokenStreamComponents>) getStoredValue(); if (componentsPerField == null) { componentsPerField = new HashMap<Analyzer, TokenStreamComponents>(); setStoredValue(componentsPerField); } componentsPerField.put(analyzers.get(fieldName), components); } } protected final static HashMap<String, Analyzer> analyzers; /** * Implementation of {@link ReuseStrategy} that reuses components per-field by * maintaining a Map of
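The worst-case arithmetic quoted in the report can be checked directly. A small sketch, assuming the 32KB figure is per StandardTokenizer instance as stated; the class and method names are illustrative, not part of the patch:

```java
// Checks the worst-case memory estimate from the report:
// total = live threads * cores * fields * 32KB (StandardTokenizer zzBuffer).
public class TokenizerMemoryEstimate {
    static long worstCaseBytes(int threads, int cores, int fields) {
        long perTokenizer = 32L * 1024;  // 32KB zzBuffer per tokenizer, per the report
        return (long) threads * cores * fields * perTokenizer;
    }

    public static void main(String[] args) {
        // Per-field reuse (the reported setup): 1000 threads * 5 cores * 20 fields.
        long before = worstCaseBytes(1000, 5, 20);  // ~3.2G, matching the report
        // Per-field-type reuse (the suggestion) with a single field type.
        long after = worstCaseBytes(1000, 5, 1);
        System.out.println("before=" + before + " bytes, after=" + after + " bytes");
    }
}
```

With 20 fields collapsing to one field type, the same formula drops the worst case by a factor of 20, which is the saving the suggestion claims.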
[jira] [Updated] (SOLR-3473) Distributed deduplication broken
[ https://issues.apache.org/jira/browse/SOLR-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated SOLR-3473: Attachment: SOLR-3473-trunk-2.patch Hello - Could the deleteByQuery issue you mention be fixed with SOLR-3473? I've attached an updated patch for today's trunk. The previous patch was missing the signature field, but I added it to one schema. Now other tests seem to fail because they don't see the sig field but do use the update chain. Anyway, BasicDistributedZkTest seems to pass, but I'm not very sure; there's too much log output, but it doesn't fail. Distributed deduplication broken Key: SOLR-3473 URL: https://issues.apache.org/jira/browse/SOLR-3473 Project: Solr Issue Type: Bug Components: SolrCloud, update Affects Versions: 4.0-ALPHA Reporter: Markus Jelsma Fix For: 4.0 Attachments: SOLR-3473-trunk-2.patch, SOLR-3473.patch, SOLR-3473.patch Solr's deduplication via the SignatureUpdateProcessor is broken for distributed updates on SolrCloud. Mark Miller: {quote} Looking again at the SignatureUpdateProcessor code, I think that indeed this won't currently work with distrib updates. Could you file a JIRA issue for that? The problem is that we convert update commands into solr documents - and that can cause a loss of info if an update proc modifies the update command. I think the reason that you see a multiple values error when you try the other order is because of the lack of a document clone (the other issue I mentioned a few emails back). Addressing that won't solve your issue though - we have to come up with a way to propagate the currently lost info on the update command. {quote} Please see the ML thread for the full discussion: http://lucene.472066.n3.nabble.com/SolrCloud-deduplication-td3984657.html
[jira] [Commented] (LUCENE-4290) basic highlighter that uses postings offsets
[ https://issues.apache.org/jira/browse/LUCENE-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429089#comment-13429089 ] Robert Muir commented on LUCENE-4290: - {quote} Should we move EMPTY into DocsAndPositionsEnum? {quote} Maybe it can be either moved or removed if the code is fixed :) In this first patch it's used both as a sentinel for a stopping condition and as a placeholder for "term doesn't exist in this segment". The former, I think, is no longer necessary, and the latter is probably overkill. {quote} This isn't just a cutover from term vectors to postings right? It actually scores each passage as if it were its own hit/document matching a search? Ie the passage ranking/selection differs from the two existing highlighters. {quote} Right: I think it's different in a number of ways. I hope it should be really fast, but again I didn't even bother benchmarking yet. It's also limited in some ways since it's just a prototype. basic highlighter that uses postings offsets Key: LUCENE-4290 URL: https://issues.apache.org/jira/browse/LUCENE-4290 Project: Lucene - Core Issue Type: New Feature Components: modules/other Reporter: Robert Muir Attachments: LUCENE-4290.patch
[JENKINS] Lucene-Solr-4.x-Windows (32bit/jdk1.7.0_05) - Build # 112 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows/112/ Java: 32bit/jdk1.7.0_05 -server -XX:+UseSerialGC All tests passed Build Log: [...truncated 20006 lines...] javadocs-lint: [...truncated 1674 lines...] BUILD FAILED C:\Jenkins\workspace\Lucene-Solr-4.x-Windows\build.xml:47: The following error occurred while executing this line: C:\Jenkins\workspace\Lucene-Solr-4.x-Windows\lucene\build.xml:524: The following error occurred while executing this line: C:\Jenkins\workspace\Lucene-Solr-4.x-Windows\lucene\build.xml:514: exec returned: 1 Total time: 41 minutes 52 seconds Build step 'Invoke Ant' marked build as failure Recording test results Email was triggered for: Failure Sending email for trigger: Failure
[jira] [Updated] (SOLR-3649) The javabin update request handler does not seem to be working properly when calling solrj method HttpSolrServer.deleteById(List<String> ids).
[ https://issues.apache.org/jira/browse/SOLR-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren updated SOLR-3649: - Attachment: SOLR-3649.patch Here's a patch that fixes the test for deleting by multiple ids, plus a proposed fix. The javabin update request handler does not seem to be working properly when calling solrj method HttpSolrServer.deleteById(List<String> ids). -- Key: SOLR-3649 URL: https://issues.apache.org/jira/browse/SOLR-3649 Project: Solr Issue Type: Bug Components: clients - java Reporter: Mark Miller Priority: Minor Fix For: 4.0, 5.0 Attachments: SOLR-3649.patch A single id gets deleted from the index as opposed to the full list. The delete appears properly in the logs - they show deletes for all ids sent - although all but one remain in the index. As reported on the mailing list http://lucene.472066.n3.nabble.com/Solr-4-Alpha-SolrJ-Indexing-Issue-tp3995781p3996074.html
[jira] [Commented] (SOLR-3685) solrcloud crashes on startup due to excessive memory consumption
[ https://issues.apache.org/jira/browse/SOLR-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429132#comment-13429132 ] Markus Jelsma commented on SOLR-3685: - Each node has two cores and allows only one warming searcher at any time. The problem is triggered on startup after a graceful shutdown as well as after a hard power-off. I've seen it happen not only when the whole cluster is restarted (I don't think I've ever done that) but also with just one node of the 6-shard, 2-replica test cluster. The attached log is of one node being restarted out of the whole cluster. Could the off-heap RAM be part of data being sent over the wire? We've worked around the problem for now by getting more RAM. solrcloud crashes on startup due to excessive memory consumption Key: SOLR-3685 URL: https://issues.apache.org/jira/browse/SOLR-3685 Project: Solr Issue Type: Bug Components: replication (java), SolrCloud Affects Versions: 4.0-ALPHA Environment: Debian GNU/Linux Squeeze 64bit Solr 5.0-SNAPSHOT 1365667M - markus - 2012-07-25 19:09:43 Reporter: Markus Jelsma Priority: Critical Fix For: 4.1 Attachments: info.log There's a serious problem with restarting nodes: old or unused index directories are not cleaned up, sudden replication kicks in, and Java is killed by the OS due to excessive memory allocation. Since SOLR-1781 was fixed, index directories get cleaned up when a node is restarted cleanly; however, old or unused index directories still pile up if Solr crashes or is killed by the OS, which is happening here. We have a six-node 64-bit Linux test cluster with each node having two shards. There's 512MB RAM available and no swap. Each index is roughly 27MB, so about 50MB per node; this fits easily and works fine. However, if a node is restarted, Solr will consistently crash because it immediately eats up all RAM. If swap is enabled, Solr will eat an additional few hundred MBs right after start up.
This cannot be solved by restarting Solr; it will just crash again and leave index directories in place until the disk is full. The only way I can restart a node safely is to delete the index directories and have it replicate from another node. If I then restart the node, it will crash almost consistently. I'll attach a log of one of the nodes.
[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429166#comment-13429166 ] Erik Hatcher commented on SOLR-1725: bq. How do these tests pass under Ant? Maybe this is due to some libraries Ant itself includes in the classpath of the tests it runs? I'll go ahead and re-open this issue so it is red-flagged as something we should resolve before the 4.0 final release. Perhaps we can include a scripting implementation in Solr, at least for testing purposes but maybe also to ship with, to ensure this works out of the box on all JVMs. jruby.jar would be nice to have handy always :) Script based UpdateRequestProcessorFactory -- Key: SOLR-1725 URL: https://issues.apache.org/jira/browse/SOLR-1725 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.4 Reporter: Uri Boness Assignee: Erik Hatcher Labels: UpdateProcessor Fix For: 4.0 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch A script based UpdateRequestProcessorFactory (uses the JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code. The update request processor factory enables writing update processors in scripts located in the {{solr.solr.home}} directory. The factory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file names (so {{scriptA.js}} will be executed before {{scriptB.js}}). The script language is resolved based on the script file extension (that is, a *.js file will be treated as a JavaScript script), therefore an extension is mandatory. Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those that are required by the processing logic. The following variables are defined as global variables for each script: * {{req}} - The SolrQueryRequest * {{rsp}} - The SolrQueryResponse * {{logger}} - A logger that can be used for logging purposes in the script
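The execution-order rule in the description (scripts run in lexicographic order of file name) can be illustrated with plain Java; the file names and the ScriptOrder class below are made up for the example and are not part of the factory's API:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrates the ordering rule from the description: when multiple scripts
// are configured, they execute in lexicographic order of their file names.
public class ScriptOrder {
    static List<String> executionOrder(List<String> configuredScripts) {
        List<String> ordered = new ArrayList<>(configuredScripts);
        Collections.sort(ordered);  // lexicographic (String natural) order
        return ordered;
    }

    public static void main(String[] args) {
        // Digits sort before letters, so a numeric prefix is a simple way to
        // force an explicit ordering.
        System.out.println(executionOrder(List.of("scriptB.js", "scriptA.js", "10-first.js")));
    }
}
```

Note this means the order in the {{scripts}} parameter itself does not decide execution order; only the names do.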
[jira] [Reopened] (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher reopened SOLR-1725: Assignee: (was: Erik Hatcher) Re-opening to have the tests (specifically the failing Maven run) looked at. Script based UpdateRequestProcessorFactory -- Key: SOLR-1725 URL: https://issues.apache.org/jira/browse/SOLR-1725 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.4 Reporter: Uri Boness Labels: UpdateProcessor Fix For: 4.0 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch
[jira] [Updated] (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher updated SOLR-1725: --- Priority: Critical (was: Major) Marking the re-opened issue as critical to fix, hopefully before 4.0 final. Script based UpdateRequestProcessorFactory -- Key: SOLR-1725 URL: https://issues.apache.org/jira/browse/SOLR-1725 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.4 Reporter: Uri Boness Priority: Critical Labels: UpdateProcessor Fix For: 4.0 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch
[jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index
[ https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429173#comment-13429173 ] Robert Muir commented on SOLR-3684: --- {quote} BTW, the default Jetty thread config is 1 in the solr, {quote} Can we address this default thread config with a patch? This doesn't seem good; I guess if someone doesn't fix this I can easily DoS Solrs into eating up all their RAM until rebooted. Something like 100 seems just fine for QueuedThreadPool, so it will block in such cases (and probably just end up being faster overall). {quote} For 2, how to create the per-field analyzer? Is it the same analyzer? analyzer.tokenStream has been declared final, so how to create the tokenStream for different fields? For one thread using the same tokenstream is safe; TokenStreamComponents is the thread's cache. Could you give more information? {quote} Well, basically your patch should be a nice improvement about 99.9% of the time. There is a (maybe only theoretical) case where someone has a Lucene Analyzer MyAnalyzer configured as: {quote} <fieldType name="text_custom" class="solr.TextField"> <analyzer class="com.mypackage.MyAnalyzer"/> </fieldType> ... <field name="foo" type="text_custom" .../> <field name="bar" type="text_custom" .../> ... {quote} If MyAnalyzer has different behavior for foo versus bar, then reuse-by-field-type will be incorrect. I'll think about a workaround; maybe nobody is even doing this or depends on this. But I just don't know if the same thing could happen for custom field types or whatever. It's just the kind of thing that could be a sneaky bug in the future. But I agree with the patch! I'll see if we can address it somehow. Separately I think we should also open an issue to reduce these jflex buffer sizes. char[16k] seems like serious overkill; the other tokenizers in Lucene use char[4k].
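The hazard Robert describes can be sketched in plain Java, with no Lucene dependency: if reusable components are cached per analyzer instance (i.e. per field type), an analyzer whose behavior depends on the field name silently receives the components that were built for a different field. The class and method names here are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the reuse-by-field-type hazard: one shared analyzer instance for
// fields "foo" and "bar" means the per-type cache hands the second field the
// components built for the first one.
public class ReuseByTypeHazard {
    // Returns the field name that the second lookup's cached components were
    // actually built for (stand-in for TokenStreamComponents).
    static String reusedFieldName(String firstField, String secondField) {
        Object sharedAnalyzer = new Object();  // same instance for both fields
        Map<Object, String> componentsByAnalyzer = new HashMap<>();
        // First field analyzed: its components land in the per-type cache.
        componentsByAnalyzer.computeIfAbsent(sharedAnalyzer, a -> firstField);
        // Second field hits the same key, so it gets firstField's components.
        return componentsByAnalyzer.computeIfAbsent(sharedAnalyzer, a -> secondField);
    }

    public static void main(String[] args) {
        // Prints the field the cached components were built for.
        System.out.println(reusedFieldName("foo", "bar"));
    }
}
```

Keying the cache by field name (as Lucene's per-field strategy does) avoids the collision, at the memory cost the issue describes.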
Frequently full gc while do pressure index -- Key: SOLR-3684 URL: https://issues.apache.org/jira/browse/SOLR-3684 Project: Solr Issue Type: Improvement Components: multicore Affects Versions: 4.0-ALPHA Environment: System: Linux Java process: 4G memory Jetty: 1000 threads Index: 20 field Core: 5 Reporter: Raintung Li Priority: Critical Labels: garbage, performance Fix For: 4.0 Attachments: patch.txt Original Estimate: 168h Remaining Estimate: 168h
[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429176#comment-13429176 ] Uwe Schindler commented on SOLR-1725: - Hi, this is not a problem at all. OpenJDK on FreeBSD contains no scripting engine, so it was added to Ant's lib path; this is why it works under Ant on the FreeBSD Jenkins. Rhino is the JavaScript engine, missing from OpenJDK for legal reasons. Rhino is shipped with official JDKs and is mandatory, so that's a FreeBSD-specific issue. Steven should add it to the Maven builds, too. You can resolve the issue. Script based UpdateRequestProcessorFactory -- Key: SOLR-1725 URL: https://issues.apache.org/jira/browse/SOLR-1725 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.4 Reporter: Uri Boness Priority: Critical Labels: UpdateProcessor Fix For: 4.0 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch
[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429178#comment-13429178 ] Steven Rowe commented on SOLR-1725: --- Thanks Uwe, I'll add Rhino to the Maven builds. Script based UpdateRequestProcessorFactory -- Key: SOLR-1725 URL: https://issues.apache.org/jira/browse/SOLR-1725 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.4 Reporter: Uri Boness Priority: Critical Labels: UpdateProcessor Fix For: 4.0 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch
[jira] [Commented] (SOLR-3703) Escape character which is in the query, is getting ignored in solr 3.6 with lucene parser
[ https://issues.apache.org/jira/browse/SOLR-3703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429183#comment-13429183 ] srinivas commented on SOLR-3703: Jack, After adding autoGeneratePhraseQueries=true to the fieldType, we are good. Thanks a lot! I will close this ticket. Thanks Srini Escape character which is in the query, is getting ignored in solr 3.6 with lucene parser - Key: SOLR-3703 URL: https://issues.apache.org/jira/browse/SOLR-3703 Project: Solr Issue Type: Bug Affects Versions: 3.6 Environment: Linux Reporter: srinivas I noticed the escape character which is in the query is getting ignored in solr 3.6 with the lucene parser. If I give edismax, then it is giving expected results for the following query. select?q=author:David\ Duke&defType=lucene Would render the same results as: select?q=author:(David OR Duke)&defType=lucene But select?q=author:David\ Duke&defType=edismax Would render the same results as: select?q=author:David Duke&defType=lucene Regards Srini
[jira] [Created] (LUCENE-4291) consider reducing jflex buffer sizes
Robert Muir created LUCENE-4291: --- Summary: consider reducing jflex buffer sizes Key: LUCENE-4291 URL: https://issues.apache.org/jira/browse/LUCENE-4291 Project: Lucene - Core Issue Type: Task Components: modules/analysis Reporter: Robert Muir Spinoff from SOLR-3684. Most lucene tokenizers have some buffer size, e.g. in CharTokenizer/ICUTokenizer it's char[4096]. But the jflex tokenizers use char[16384] by default, which seems overkill. I'm not sure we really see any performance bonus from having such a huge buffer size as a default. There is a jflex parameter to set this: I think we should consider reducing it. In a configuration like solr, tokenizers are reused per-thread-per-field, so these can easily stack up in RAM. Additionally, CharFilters are not reused, so the configuration in e.g. HtmlStripCharFilter might not be great since it's per-document garbage.
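The "stack up in RAM" concern is easy to quantify. A back-of-the-envelope sketch, assuming an invented deployment of 32 request threads and 20 tokenized fields (the counts are illustrative, not from the issue):

```java
public class BufferRamSketch {
    public static void main(String[] args) {
        // Assumed deployment: 32 request threads x 20 tokenized fields,
        // with tokenizers reused per-thread-per-field as the issue describes.
        long instances = 32L * 20;                 // 640 long-lived tokenizers
        long jflexBytes = instances * 16384 * 2;   // default jflex buffer, 2 bytes per char
        long smallBytes = instances * 4096 * 2;    // CharTokenizer/ICUTokenizer-sized buffer
        System.out.println("jflex default: " + jflexBytes / 1024 + " KB"); // 20480 KB
        System.out.println("4k buffers:    " + smallBytes / 1024 + " KB"); // 5120 KB
    }
}
```

Even at these modest counts, the default buffer size pins roughly four times the memory of a 4k-char buffer.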
[jira] [Commented] (LUCENE-4291) consider reducing jflex buffer sizes
[ https://issues.apache.org/jira/browse/LUCENE-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429198#comment-13429198 ] Steven Rowe commented on LUCENE-4291: - +1. For tokenizers, the buffer needs to be able to hold a token (and its trailing context, if lookahead is used), but nothing more. 16k tokens are likely extremely rare. 4k seems reasonable to me - it's still way bigger than most people are likely to hit over normal text input. {{HTMLStripCharFilter}} is a bit different, since it buffers HTML constructs rather than tokens. In the face of malformed input (e.g. an opening angle bracket '<' with no closing angle bracket '>'), the scanner might buffer the entire remaining input. In contrast, {{LegacyHTMLStripCharFilter}}, the pre-JFlex implementation, limits this kind of buffering to 8k max chars IIRC.
[jira] [Resolved] (LUCENE-4289) highlighter idf calculation problems
[ https://issues.apache.org/jira/browse/LUCENE-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-4289. - Resolution: Fixed Fix Version/s: 3.6.2, 5.0, 4.0 I backported too. Note that in 3.6 the fast-vector-highlighter is unaffected; it doesn't compute IDF.
[jira] [Commented] (LUCENE-4291) consider reducing jflex buffer sizes
[ https://issues.apache.org/jira/browse/LUCENE-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429203#comment-13429203 ] Robert Muir commented on LUCENE-4291: - {quote} For tokenizers, the buffer needs to be able to hold a token (and its trailing context, if lookahead is used), but nothing more. 16k tokens are likely extremely rare. 4k seems reasonable to me - it's still way bigger than most people are likely to hit over normal text input. {quote} Yes, I think it's reasonable too: especially since maxTokenLength is 255 by default. {quote} HTMLStripCharFilter is a bit different, since it buffers HTML constructs rather than tokens. In the face of malformed input (e.g. an opening angle bracket '<' with no closing angle bracket '>'), the scanner might buffer the entire remaining input. In contrast, LegacyHTMLStripCharFilter, the pre-JFlex implementation, limits this kind of buffering to 8k max chars IIRC. {quote} OK, I can leave this one alone. We can revisit if we can make CharFilters reusable (not simple to do cleanly today). It's not as much of an issue since nothing is hanging on to it. I'll work up a patch.
[jira] [Created] (SOLR-3715) improve tlog concurrency
Yonik Seeley created SOLR-3715: -- Summary: improve tlog concurrency Key: SOLR-3715 URL: https://issues.apache.org/jira/browse/SOLR-3715 Project: Solr Issue Type: Improvement Reporter: Yonik Seeley Right now log record serialization is synchronized. We can improve concurrency by serializing to a RAM buffer outside synchronization. The cost will be RAM usage for buffering, and more complex concurrency in the tlog itself (i.e. we must ensure that a close does not happen without flushing all in-RAM buffers).
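A minimal sketch of the pattern the issue proposes, assuming an in-memory stand-in for the log file (the names and types here are illustrative, not Solr's actual tlog API): each writer serializes its record into a private RAM buffer with no lock held, so only the cheap append to the shared log sits inside the critical section.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class TlogSketch {
    private final ByteArrayOutputStream log = new ByteArrayOutputStream(); // stands in for the tlog file
    private final Object writeLock = new Object();

    public void append(String record) {
        // Serialization runs concurrently across threads: no shared state touched.
        byte[] bytes;
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            new DataOutputStream(buf).writeUTF(record);
            bytes = buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Only the append is synchronized; a close/flush path would also need
        // this lock so no in-RAM buffer is lost, as the issue notes.
        synchronized (writeLock) {
            log.write(bytes, 0, bytes.length);
        }
    }

    public int size() {
        synchronized (writeLock) {
            return log.size();
        }
    }
}
```

The trade-off is exactly the one Yonik states: each in-flight write now holds its own serialized copy in RAM until the synchronized append completes.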
[jira] [Assigned] (SOLR-3715) improve tlog concurrency
[ https://issues.apache.org/jira/browse/SOLR-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley reassigned SOLR-3715: -- Assignee: Yonik Seeley
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0-ea-b49) - Build # 225 - Still Failing!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/225/ Java: 32bit/jdk1.8.0-ea-b49 -server -XX:+UseConcMarkSweepGC 1 tests failed. REGRESSION: org.apache.solr.servlet.SolrRequestParserTest.testStreamURL Error Message: connect timed out Stack Trace: java.net.SocketTimeoutException: connect timed out at __randomizedtesting.SeedInfo.seed([B59DCD42307FDE67:ECA8F351445D1352]:0) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:395) at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1668) at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1663) at java.security.AccessController.doPrivileged(Native Method) at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1662) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1245) at org.apache.solr.common.util.ContentStreamBase$URLStream.getStream(ContentStreamBase.java:85) at org.apache.solr.servlet.SolrRequestParserTest.testStreamURL(SolrRequestParserTest.java:137) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:474) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1995) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:818) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:877) at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:891) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:825) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:671) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:697) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:736) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:747) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at
[jira] [Updated] (LUCENE-4291) consider reducing jflex buffer sizes
[ https://issues.apache.org/jira/browse/LUCENE-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4291: Attachment: LUCENE-4291.patch Here's a patch, with regenerations. Note that by default 'ant jflex' gave me an error for all the includes (as of jflex r612), so that's why you see changes like: {noformat} -%include src/java/org/apache/lucene/analysis/charfilter/HTMLCharacterEntities.jflex +%include HTMLCharacterEntities.jflex {noformat} It seems jflex now expects these file paths to be relative to the input file?
[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429214#comment-13429214 ] Steven Rowe commented on SOLR-1725: --- bq. OpenJDK on FreeBSD contains no scripting engine. So it was added in ants lib path. How? I've found the necessary jars, at {{/usr/home/hudson/tools/java/openjdk-missing-libs/}}, but I can't see how Ant's lib path includes them. I looked at {{~hudson/.profile}}, and {{lib/}} and {{bin/ant}} under {{/usr/home/hudson/tools/ant/apache-ant-1.8.2}} - none of these refer to the directory containing {{js.jar}} and {{script-js.jar}}. I'm asking because I'd like to set Maven up similarly to Ant.
Build failed in Jenkins: Lucene-trunk-Linux-Java7-64 #105
See builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64/105/ -- [...truncated 37784 lines...] -clover.setup: [echo] Code coverage with Atlassian Clover enabled. [ivy:cachepath] :: resolving dependencies :: com.cenqua.clover#clover-caller;working [ivy:cachepath] confs: [master] [ivy:cachepath] found com.cenqua.clover#clover;2.6.3 in public [ivy:cachepath] :: resolution report :: resolve 14ms :: artifacts dl 0ms - | |modules|| artifacts | | conf | number| search|dwnlded|evicted|| number|dwnlded| - | master | 1 | 0 | 0 | 0 || 1 | 0 | - [clover-setup] Clover Version 2.6.3, built on November 20 2009 (build-778) [clover-setup] Loaded from: /var/lib/jenkins/.ant/lib/clover-2.6.3.jar [clover-setup] Clover: Open Source License registered to Apache. [clover-setup] Clover is enabled with initstring 'builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64/ws/checkout/lucene/build/clover/db/coverage.db' clover: compile-core: compile-test-framework: ivy-availability-check: ivy-fail: ivy-configure: [ivy:configure] :: loading settings :: file = builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64/ws/checkout/lucene/ivy-settings.xml resolve: init: compile-lucene-core: compile-core: common.compile-test: install-junit4-taskdef: -clover.disable: -clover.setup: [echo] Code coverage with Atlassian Clover enabled. [ivy:cachepath] :: resolving dependencies :: com.cenqua.clover#clover-caller;working [ivy:cachepath] confs: [master] [ivy:cachepath] found com.cenqua.clover#clover;2.6.3 in public [ivy:cachepath] :: resolution report :: resolve 13ms :: artifacts dl 1ms - | |modules|| artifacts | | conf | number| search|dwnlded|evicted|| number|dwnlded| - | master | 1 | 0 | 0 | 0 || 1 | 0 | - [clover-setup] Clover Version 2.6.3, built on November 20 2009 (build-778) [clover-setup] Loaded from: /var/lib/jenkins/.ant/lib/clover-2.6.3.jar [clover-setup] Clover: Open Source License registered to Apache. 
[clover-setup] Clover is enabled with initstring 'builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64/ws/checkout/lucene/build/clover/db/coverage.db' clover: validate: common.test: [mkdir] Created dir: builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64/ws/checkout/lucene/build/suggest/test [junit4:junit4] JUnit4 says olá! Master seed: C7DF8F67F5F8636C [junit4:junit4] Executing 17 suites with 1 JVM. [junit4:junit4] Suite: org.apache.lucene.search.suggest.fst.BytesRefSortersTest [junit4:junit4] Completed in 1.37s, 2 tests [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.search.suggest.fst.TestSort [junit4:junit4] IGNOR/A 0.23s | TestSort.testLargerRandom [junit4:junit4] Assumption #1: 'nightly' test group is disabled (@Nightly) [junit4:junit4] Completed in 14.45s, 6 tests, 1 skipped [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.search.suggest.fst.WFSTCompletionTest [junit4:junit4] Completed in 4.20s, 2 tests [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.search.spell.TestNGramDistance [junit4:junit4] Completed in 0.62s, 4 tests [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.search.suggest.fst.FSTCompletionTest [junit4:junit4] Completed in 22.41s, 12 tests [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.search.suggest.LookupBenchmarkTest [junit4:junit4] Completed in 0.12s, 0 tests [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.search.suggest.TestHighFrequencyDictionary [junit4:junit4] Completed in 0.74s, 1 test [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.search.spell.TestLuceneDictionary [junit4:junit4] Completed in 1.96s, 6 tests [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.search.spell.TestLevenshteinDistance [junit4:junit4] Completed in 0.32s, 2 tests [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.search.spell.TestSpellChecker [junit4:junit4] Completed in 12.05s, 6 tests [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.search.suggest.TestTermFreqIterator 
[junit4:junit4] Completed in 5.11s, 3 tests [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.search.spell.TestDirectSpellChecker [junit4:junit4] Completed in 3.80s, 6 tests [junit4:junit4] [junit4:junit4] Suite: org.apache.lucene.search.spell.TestWordBreakSpellChecker
[jira] [Commented] (LUCENE-4286) Add flag to CJKBigramFilter to allow indexing unigrams as well as bigrams
[ https://issues.apache.org/jira/browse/LUCENE-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429217#comment-13429217 ] Tom Burton-West commented on LUCENE-4286: - We haven't had a request for this specific feature from readers, we are just assuming that the 10% of Han queries in our logs that consist of a single character represent real use cases, and we don't want such queries to produce zero results or produce misleading results. Tom Add flag to CJKBigramFilter to allow indexing unigrams as well as bigrams - Key: LUCENE-4286 URL: https://issues.apache.org/jira/browse/LUCENE-4286 Project: Lucene - Core Issue Type: Improvement Affects Versions: 4.0-ALPHA, 3.6.1 Reporter: Tom Burton-West Priority: Minor Fix For: 4.0, 5.0 Attachments: LUCENE-4286.patch, LUCENE-4286.patch Add an optional flag to the CJKBigramFilter to tell it to also output unigrams. This would allow indexing of both bigrams and unigrams, and at query time the analyzer could analyze queries as bigrams unless the query contained a single Han unigram. As an example, here is a configuration of a Solr fieldType with the analyzer for indexing with the indexUnigrams flag set and the analyzer for querying without the flag:

<fieldType name="CJK" autoGeneratePhraseQueries="false">
  <analyzer type="index">
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.CJKBigramFilterFactory" indexUnigrams="true" han="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.CJKBigramFilterFactory" han="true"/>
  </analyzer>
</fieldType>

Use case: About 10% of our queries that contain Han characters are single character queries. The CJKBigram filter only outputs single characters when there are no adjacent bigrammable characters in the input. This means we have to create a separate field to index Han unigrams in order to address single character queries, and then write application code to search that separate field if we detect a single character Han query.
This is rather kludgey. With the optional flag, we could configure Solr as above. This is somewhat analogous to the flags in LUCENE-1370 for the ShingleFilter used to allow single word queries (although that uses word n-grams rather than character n-grams).
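Independent of Lucene's actual TokenFilter API, what "output unigrams as well as bigrams" means for a run of bigrammable Han characters can be sketched as a plain enumeration (the three-character input is an arbitrary example):

```java
import java.util.ArrayList;
import java.util.List;

public class CjkUnigramBigramSketch {
    // Emit every unigram plus every adjacent bigram, mimicking the proposed
    // indexUnigrams=true behavior for a run of Han characters.
    static List<String> tokens(String han) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < han.length(); i++) {
            out.add(han.substring(i, i + 1));                          // unigram
            if (i + 1 < han.length()) {
                out.add(han.substring(i, i + 2));                      // bigram
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("日本語")); // [日, 日本, 本, 本語, 語]
    }
}
```

With both token kinds in the index, a single-character Han query can match its unigram directly instead of requiring a separate unigram field.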
[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429219#comment-13429219 ] Robert Muir commented on SOLR-1725: --- {noformat} [rcmuir@lucene /home/hudson/.ant/lib]$ ls -la total 1843 drwxr-xr-x 2 hudson hudson 5 Mar 30 15:46 . drwxr-xr-x 3 hudson hudson 8 May 13 12:41 .. -rw-r--r-- 1 hudson hudson 947592 Mar 30 15:45 ivy-2.2.0.jar -rw-r--r-- 1 hudson hudson 701049 Jul 27 2006 js.jar -rw-r--r-- 1 hudson hudson 34607 Oct 16 2006 script-js.jar {noformat}
[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429218#comment-13429218 ] Robert Muir commented on SOLR-1725: --- I think they are added to ~hudson/.ant/lib ? Script based UpdateRequestProcessorFactory -- Key: SOLR-1725 URL: https://issues.apache.org/jira/browse/SOLR-1725 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.4 Reporter: Uri Boness Priority: Critical Labels: UpdateProcessor Fix For: 4.0 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code. The update request processor factory enables writing update processors in scripts located in {{solr.solr.home}} directory. The functory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file name (so {{scriptA.js}} will be executed before {{scriptB.js}}). The script language is resolved based on the script file extension (that is, a *.js files will be treated as a JavaScript script), therefore an extension is mandatory. Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those hat are required by the processing logic. 
The following variables are defined as global variables for each script: * {{req}} - The SolrQueryRequest * {{rsp}} - The SolrQueryResponse * {{logger}} - A logger that can be used for logging purposes in the script -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
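The factory's documented dispatch rules (scripts run in lexicographical order of their file names, and the language is resolved from the mandatory file extension) can be sketched in plain Java. `ScriptOrdering` and its method names are hypothetical helpers for illustration, not Solr's actual implementation:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class ScriptOrdering {

    // Scripts execute in lexicographical order of their file names,
    // so scriptA.js runs before scriptB.js.
    static List<String> executionOrder(Collection<String> scriptNames) {
        List<String> ordered = new ArrayList<>(scriptNames);
        Collections.sort(ordered);
        return ordered;
    }

    // The script language is resolved from the file extension
    // ("js" -> JavaScript engine), so an extension is mandatory.
    static String languageExtension(String scriptName) {
        int dot = scriptName.lastIndexOf('.');
        if (dot < 0 || dot == scriptName.length() - 1) {
            throw new IllegalArgumentException("extension is mandatory: " + scriptName);
        }
        return scriptName.substring(dot + 1);
    }

    public static void main(String[] args) {
        System.out.println(executionOrder(Arrays.asList("scriptB.js", "scriptA.js")));
        // [scriptA.js, scriptB.js]
        System.out.println(languageExtension("scriptA.js")); // js
    }
}
```

At runtime the real factory would hand the extension to the JDK6 `javax.script.ScriptEngineManager` to pick an engine; the sketch stops short of that because engine availability varies by JDK.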
Jenkins build is back to normal : Lucene-trunk-Linux-Java7-64 #106
See builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64/106/ - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4291) consider reducing jflex buffer sizes
[ https://issues.apache.org/jira/browse/LUCENE-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429221#comment-13429221 ] Steven Rowe commented on LUCENE-4291: --- bq. Gerwin Klein recently fixed [JFlex issue 3420809|http://sourceforge.net/tracker/?func=detail&aid=3420809&group_id=14929&atid=114929], with exactly this change. consider reducing jflex buffer sizes Key: LUCENE-4291 URL: https://issues.apache.org/jira/browse/LUCENE-4291 Project: Lucene - Core Issue Type: Task Components: modules/analysis Reporter: Robert Muir Attachments: LUCENE-4291.patch Spinoff from SOLR-3684. Most lucene tokenizers have some buffer size, e.g. in CharTokenizer/ICUTokenizer its char[4096]. But the jflex tokenizers use char[16384] by default, which seems overkill. I'm not sure we really see any performance bonus by having such a huge buffer size as a default. There is a jflex parameter to set this: I think we should consider reducing it. In a configuration like solr, tokenizers are reused per-thread-per-field, so these can easily stack up in RAM. Additionally CharFilters are not reused so the configuration in e.g. HtmlStripCharFilter might not be great since its per-document garbage. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-4291) consider reducing jflex buffer sizes
[ https://issues.apache.org/jira/browse/LUCENE-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429221#comment-13429221 ] Steven Rowe edited comment on LUCENE-4291 at 8/6/12 4:08 PM: --- bq. It seems jflex now expects these file paths to be relative to the input file? Gerwin Klein recently fixed [JFlex issue 3420809|http://sourceforge.net/tracker/?func=detail&aid=3420809&group_id=14929&atid=114929], with exactly this change. was (Author: steve_rowe): bq. Gerwin Klein recently fixed [JFlex issue 3420809|http://sourceforge.net/tracker/?func=detail&aid=3420809&group_id=14929&atid=114929], with exactly this change. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429222#comment-13429222 ] Steven Rowe commented on SOLR-1725: --- Thanks Robert, I see them now. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4280) TestReaderClosed leaks threads
[ https://issues.apache.org/jira/browse/LUCENE-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429227#comment-13429227 ] Dawid Weiss commented on LUCENE-4280: --- TestLazyProxSkipping again. {code} [junit4:junit4] Suite: org.apache.lucene.index.TestLazyProxSkipping [junit4:junit4] OK 0.01s J0 | TestLazyProxSkipping.testSeek [junit4:junit4] OK 1.05s J0 | TestLazyProxSkipping.testLazySkipping [junit4:junit4] (@AfterClass output) [junit4:junit4] 2> Aug 06, 2012 3:47:18 PM com.carrotsearch.randomizedtesting.ThreadLeakControl checkThreadLeaks [junit4:junit4] 2> WARNING: Will linger awaiting termination of 1 leaked thread(s). [junit4:junit4] 2> Aug 06, 2012 3:47:38 PM com.carrotsearch.randomizedtesting.ThreadLeakControl checkThreadLeaks [junit4:junit4] 2> SEVERE: 1 thread leaked from SUITE scope at org.apache.lucene.index.TestLazyProxSkipping: [junit4:junit4] 2> 1) Thread[id=116, name=LuceneTestCase-18-thread-1, state=WAITING, group=TGRP-TestLazyProxSkipping] [junit4:junit4] 2> at sun.misc.Unsafe.park(Native Method) [junit4:junit4] 2> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) [junit4:junit4] 2> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) [junit4:junit4] 2> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) [junit4:junit4] 2> at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043) [junit4:junit4] 2> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103) [junit4:junit4] 2> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [junit4:junit4] 2> at java.lang.Thread.run(Thread.java:722) [junit4:junit4] 2> Aug 06, 2012 3:47:38 PM com.carrotsearch.randomizedtesting.ThreadLeakControl tryToInterruptAll [junit4:junit4] 2> INFO: Starting to interrupt leaked threads: [junit4:junit4] 2> 1) Thread[id=116, name=LuceneTestCase-18-thread-1, state=WAITING,
group=TGRP-TestLazyProxSkipping] [junit4:junit4] 2> Aug 06, 2012 3:47:41 PM com.carrotsearch.randomizedtesting.ThreadLeakControl tryToInterruptAll [junit4:junit4] 2> SEVERE: There are still zombie threads that couldn't be terminated: [junit4:junit4] 2> 1) Thread[id=116, name=LuceneTestCase-18-thread-1, state=WAITING, group=TGRP-TestLazyProxSkipping] [junit4:junit4] 2> at sun.misc.Unsafe.park(Native Method) [junit4:junit4] 2> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) [junit4:junit4] 2> at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) [junit4:junit4] 2> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) [junit4:junit4] 2> at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043) [junit4:junit4] 2> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103) [junit4:junit4] 2> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [junit4:junit4] 2> at java.lang.Thread.run(Thread.java:722) [junit4:junit4] 2> NOTE: test params are: codec=Lucene40: {tokens=PostingsFormat(name=MockRandom)}, sim=RandomSimilarityProvider(queryNorm=false,coord=false): {tokens=DFR I(n)B3(800.0)}, locale=sl, timezone=America/Resolute [junit4:junit4] 2> NOTE: Windows 7 6.1 amd64/Oracle Corporation 1.7.0_03 (64-bit)/cpus=8,threads=2,free=130600992,total=261095424 [junit4:junit4] 2> NOTE: All tests run in this JVM: [TestBooleanOr, TestDirectory, TestMultiTermConstantScore, TestIndexFileDeleter, TestSetOnce, Nested1, TestStressIndexing2, TestRegexpRandom2, TestStressAdvance, TestSpansAdvanced, TestAssertions, TestFieldCacheRewriteMethod, TestPrefixInBooleanQuery, TestMultiPhraseQuery, TestMatchAllDocsQuery, TestLock, TestSimilarity2, TestNamedSPILoader, TestSort, TestBytesRefHash, TestOmitTf, TestVirtualMethod, TestLazyProxSkipping] [junit4:junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestLazyProxSkipping
-Dtests.seed=55A3CB2FF25AC1A5 -Dtests.slow=true -Dtests.locale=sl -Dtests.timezone=America/Resolute -Dtests.file.encoding=ISO-8859-1 [junit4:junit4] 2> [junit4:junit4] ERROR 0.00s J0 | TestLazyProxSkipping (suite) [junit4:junit4] Throwable #1: com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.lucene.index.TestLazyProxSkipping: [junit4:junit4] 1) Thread[id=116, name=LuceneTestCase-18-thread-1, state=WAITING, group=TGRP-TestLazyProxSkipping] [junit4:junit4] at sun.misc.Unsafe.park(Native Method) [junit4:junit4] at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) [junit4:junit4] at
[jira] [Commented] (LUCENE-4286) Add flag to CJKBigramFilter to allow indexing unigrams as well as bigrams
[ https://issues.apache.org/jira/browse/LUCENE-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429228#comment-13429228 ] Robert Muir commented on LUCENE-4286: --- The combined unigram+bigram technique is a general technique, which I think is useful to support. For examples see: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.12.6782 http://members.unine.ch/jacques.savoy/Papers/NTCIR6.pdf There are more references and studies linked from those. Tom would have to do tests for his index-time-only approach: I can't speak for that. Add flag to CJKBigramFilter to allow indexing unigrams as well as bigrams - Key: LUCENE-4286 URL: https://issues.apache.org/jira/browse/LUCENE-4286 Project: Lucene - Core Issue Type: Improvement Affects Versions: 4.0-ALPHA, 3.6.1 Reporter: Tom Burton-West Priority: Minor Fix For: 4.0, 5.0 Attachments: LUCENE-4286.patch, LUCENE-4286.patch Add an optional flag to the CJKBigramFilter to tell it to also output unigrams. This would allow indexing of both bigrams and unigrams, and at query time the analyzer could analyze queries as bigrams unless the query contained a single Han unigram. As an example, here is a configuration of a Solr fieldType with the analyzer for indexing with the indexUnigrams flag set and the analyzer for querying without the flag:

<fieldType name="CJK" autoGeneratePhraseQueries="false">
  <analyzer type="index">
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.CJKBigramFilterFactory" indexUnigrams="true" han="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.CJKBigramFilterFactory" han="true"/>
  </analyzer>
</fieldType>

Use case: About 10% of our queries that contain Han characters are single character queries. The CJKBigramFilter only outputs single characters when there are no adjacent bigrammable characters in the input.
This means we have to create a separate field to index Han unigrams in order to address single-character queries, and then write application code to search that separate field if we detect a single-character Han query. This is rather kludgey. With the optional flag, we could configure Solr as above. This is somewhat analogous to the flags in LUCENE-1370 for the ShingleFilter used to allow single word queries (although that uses word n-grams rather than character n-grams). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
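The index-time unigram+bigram technique discussed above can be illustrated without the Lucene API. This standalone sketch (class and method names are made up, and it ignores the filter's script detection and offset bookkeeping) emits overlapping character bigrams and, when the flag is set, the unigrams as well; a lone character is passed through either way, as CJKBigramFilter does when nothing adjacent is bigrammable:

```java
import java.util.ArrayList;
import java.util.List;

public class HanBigrams {

    // Emit overlapping bigrams; with outputUnigrams also emit each
    // single character, so single-character queries can still match.
    static List<String> tokens(String text, boolean outputUnigrams) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (outputUnigrams || text.length() == 1) {
                out.add(text.substring(i, i + 1)); // unigram
            }
            if (i + 1 < text.length()) {
                out.add(text.substring(i, i + 2)); // bigram
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("ABC", false)); // [AB, BC]
        System.out.println(tokens("ABC", true));  // [A, AB, B, BC, C]
        System.out.println(tokens("A", false));   // [A]
    }
}
```

With the flag on at index time and off at query time, a bigram query matches normally while a single-unigram query still finds the indexed unigrams, which is exactly the use case in the issue.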
[jira] [Commented] (LUCENE-4291) consider reducing jflex buffer sizes
[ https://issues.apache.org/jira/browse/LUCENE-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429231#comment-13429231 ] Robert Muir commented on LUCENE-4291: --- OK thanks, that explains it! I'd like to commit this if there are no objections. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4291) consider reducing jflex buffer sizes
[ https://issues.apache.org/jira/browse/LUCENE-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429234#comment-13429234 ] Steven Rowe commented on LUCENE-4291: --- bq. I'd like to commit this if there are no objections. +1, patch looks good. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429240#comment-13429240 ] Hoss Man commented on SOLR-1725: --- I (think i) fixed the assumptions in these tests to actually skip properly if the engines aren't available... Committed revision 1369874. - trunk Committed revision 1369875. - 4x -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4280) TestReaderClosed leaks threads
[ https://issues.apache.org/jira/browse/LUCENE-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429244#comment-13429244 ] Michael McCandless commented on LUCENE-4280: I committed a fix for TestLazyProxSkipping (it wasn't closing the reader). TestReaderClosed leaks threads -- Key: LUCENE-4280 URL: https://issues.apache.org/jira/browse/LUCENE-4280 Project: Lucene - Core Issue Type: Bug Reporter: Dawid Weiss Assignee: Robert Muir Priority: Minor {code} -ea -Dtests.seed=9449688B90185FA5 -Dtests.iters=100 {code} reproduces 100% for me, multiple thread leak out from newSearcher's internal threadfactory: {code} Aug 02, 2012 8:46:05 AM com.carrotsearch.randomizedtesting.ThreadLeakControl checkThreadLeaks SEVERE: 6 threads leaked from SUITE scope at org.apache.lucene.index.TestReaderClosed: 1) Thread[id=13, name=LuceneTestCase-1-thread-1, state=WAITING, group=TGRP-TestReaderClosed] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) 2) Thread[id=15, name=LuceneTestCase-3-thread-1, state=WAITING, group=TGRP-TestReaderClosed] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) 3) Thread[id=17, name=LuceneTestCase-5-thread-1, state=WAITING, group=TGRP-TestReaderClosed] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) 4) Thread[id=18, name=LuceneTestCase-6-thread-1, state=WAITING, group=TGRP-TestReaderClosed] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) 5) Thread[id=16, name=LuceneTestCase-4-thread-1, state=WAITING, group=TGRP-TestReaderClosed] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) 6) Thread[id=14, name=LuceneTestCase-2-thread-1, state=WAITING,
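The leak pattern behind these reports, and the shape of the fix Michael describes (close the resource so its thread pool shuts down), can be reduced to a few lines. `PooledSearcher` is a made-up stand-in for a searcher backed by its own executor, not Lucene code:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LeakDemo {

    // A resource that owns a thread pool. If close() is never called,
    // the pool's worker thread sits WAITING in LinkedBlockingQueue.take()
    // forever - exactly the zombie-thread stack in the test output above.
    static class PooledSearcher implements AutoCloseable {
        final ExecutorService pool = Executors.newFixedThreadPool(1);

        void search() {
            pool.submit(() -> { /* pretend to search a segment */ });
        }

        @Override
        public void close() {
            pool.shutdown(); // without this, the worker thread leaks
        }
    }

    public static void main(String[] args) throws Exception {
        PooledSearcher searcher = new PooledSearcher();
        searcher.search();
        searcher.close(); // the missing reader.close() in the failing test
        System.out.println(searcher.pool.isShutdown());                 // true
        System.out.println(searcher.pool.awaitTermination(5, TimeUnit.SECONDS)); // true
    }
}
```

Randomized-testing's ThreadLeakControl flags exactly this: any thread created inside the suite that is still alive when the suite ends.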
[jira] [Resolved] (LUCENE-4291) consider reducing jflex buffer sizes
[ https://issues.apache.org/jira/browse/LUCENE-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-4291. --- Resolution: Fixed Fix Version/s: 4.0, 5.0 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index
[ https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429287#comment-13429287 ] Robert Muir commented on SOLR-3684: --- FYI: I lowered the jflex buffer sizes from 32KB to 8KB in LUCENE-4291. So I think we should still: # Address the default Jetty threadpool size of max=10,000. This is the real issue. # See if we can deal with the crazy corner case so we can implement your patch (reuse by fieldtype), which I think is a good separate improvement. Frequently full gc while do pressure index -- Key: SOLR-3684 URL: https://issues.apache.org/jira/browse/SOLR-3684 Project: Solr Issue Type: Improvement Components: multicore Affects Versions: 4.0-ALPHA Environment: System: Linux Java process: 4G memory Jetty: 1000 threads Index: 20 fields Cores: 5 Reporter: Raintung Li Priority: Critical Labels: garbage, performance Fix For: 4.0 Attachments: patch.txt Original Estimate: 168h Remaining Estimate: 168h Recently we tested Solr indexing throughput and performance: 20 fields of the normal text_general type, 1000 Jetty threads, and 5 cores. After the test had run for some time, the Solr process throughput dropped very quickly. Investigating the root cause, we found the Java process constantly doing full GCs. In the heap dump, the main object is StandardTokenizer, which is saved in a CloseableThreadLocal by IndexSchema.SolrIndexAnalyzer. Solr uses PerFieldReuseStrategy as the default component reuse strategy, which means each field has its own StandardTokenizer if it uses the standard analyzer, and each StandardTokenizer occupies 32KB of memory because of its zzBuffer char array. The worst case: Total memory = live threads * cores * fields * 32KB. In the test case, that is 1000*5*20*32KB = 3.2G for StandardTokenizer alone, and those objects are only released when their thread dies. Suggestion: every request is handled by one thread, which means one document is only analyzed by one thread.
Each thread parses a document's fields one at a time, so fields of the same type can share a reused component. When the thread switches to another field of the same type, only the component's input stream is reset, which saves a lot of memory across same-type fields. Total memory then becomes = live threads * cores * (distinct field types) * 32KB. The source-code modification is simple; I can provide the patch for IndexSchema.java:

private class SolrIndexAnalyzer extends AnalyzerWrapper {

  /**
   * Implementation of {@link ReuseStrategy} that reuses components per field type by
   * maintaining a Map of TokenStreamComponents per analyzer.
   */
  private class SolrFieldReuseStrategy extends ReuseStrategy {

    /** {@inheritDoc} */
    @SuppressWarnings("unchecked")
    public TokenStreamComponents getReusableComponents(String fieldName) {
      Map<Analyzer, TokenStreamComponents> componentsPerField =
          (Map<Analyzer, TokenStreamComponents>) getStoredValue();
      return componentsPerField != null ? componentsPerField.get(analyzers.get(fieldName)) : null;
    }

    /** {@inheritDoc} */
    @SuppressWarnings("unchecked")
    public void setReusableComponents(String fieldName, TokenStreamComponents components) {
      Map<Analyzer, TokenStreamComponents> componentsPerField =
          (Map<Analyzer, TokenStreamComponents>) getStoredValue();
      if (componentsPerField == null) {
        componentsPerField = new HashMap<Analyzer, TokenStreamComponents>();
        setStoredValue(componentsPerField);
      }
      componentsPerField.put(analyzers.get(fieldName), components);
    }
  }

  protected final HashMap<String, Analyzer> analyzers;

  SolrIndexAnalyzer() {
    super(new SolrFieldReuseStrategy());
    analyzers = analyzerCache();
  }

  protected HashMap<String, Analyzer> analyzerCache() {
    HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
    for (SchemaField f : getFields().values()) {
      Analyzer analyzer = f.getType().getAnalyzer();
      cache.put(f.getName(), analyzer);
    }
    return cache;
  }

  @Override
  protected Analyzer getWrappedAnalyzer(String fieldName) {
    Analyzer analyzer = analyzers.get(fieldName);
    return analyzer != null ? analyzer : getDynamicFieldType(fieldName).getAnalyzer();
  }

  @Override
  protected TokenStreamComponents wrapComponents(String fieldName, TokenStreamComponents components) {
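The worst-case memory formula quoted in the issue description can be sanity-checked in a few lines (the class and method names here are illustrative):

```java
public class WorstCaseMemory {

    // Worst case from the issue: every live thread caches one tokenizer
    // per field per core, each holding a 32KB zzBuffer char array.
    static long worstCaseBytes(int threads, int cores, int fields, int bytesPerTokenizer) {
        return (long) threads * cores * fields * bytesPerTokenizer;
    }

    public static void main(String[] args) {
        long bytes = worstCaseBytes(1000, 5, 20, 32 * 1024);
        System.out.println(bytes); // 3276800000, roughly the 3.2G reported
    }
}
```

Reusing by field type instead of field name replaces the `fields` factor with the (much smaller) number of distinct field types, which is the whole point of the proposed patch.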
Re: svn commit: r1369892 [3/3] - in /lucene/dev/branches/branch_4x: ./ lucene/ lucene/analysis/ lucene/analysis/common/src/java/org/apache/lucene/analysis/charfilter/ lucene/analysis/common/src/java/o
Hi, see the diff below. Just to explain why the DFA changed: it seems the 3.4 backwards impl was previously %include'ing the wrong files; it was including them from the 'current' StandardTokenizer directory before. On Mon, Aug 6, 2012 at 1:36 PM, rm...@apache.org wrote: Modified: lucene/dev/branches/branch_4x/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/std34/UAX29URLEmailTokenizerImpl34.jflex URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/std34/UAX29URLEmailTokenizerImpl34.jflex?rev=1369892&r1=1369891&r2=1369892&view=diff ============================================================================== --- lucene/dev/branches/branch_4x/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/std34/UAX29URLEmailTokenizerImpl34.jflex (original) +++ lucene/dev/branches/branch_4x/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/std34/UAX29URLEmailTokenizerImpl34.jflex Mon Aug 6 17:36:34 2012 @@ -39,8 +39,9 @@ import org.apache.lucene.analysis.tokena %implements StandardTokenizerInterface %function getNextToken %char +%buffer 4096 -%include src/java/org/apache/lucene/analysis/standard/SUPPLEMENTARY.jflex-macro +%include SUPPLEMENTARY.jflex-macro ... -%include src/java/org/apache/lucene/analysis/standard/ASCIITLD.jflex-macro +%include ASCIITLD.jflex-macro -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[ANNOUNCE] Lucene/Solr @ ApacheCon Europe - August 13th Deadline for CFP and Travel Assistance applications
ApacheCon Europe will be happening 5-8 November 2012 in Sinsheim, Germany at the Rhein-Neckar-Arena. Early bird tickets go on sale this Monday, 6 August. http://www.apachecon.eu/ The Lucene/Solr track is shaping up to be quite impressive this year, so make your plans to attend and submit your session proposals ASAP! -- CALL FOR PAPERS -- The Call for Participation for ApacheCon Europe has been extended to 13 August! To submit a presentation and for more details, visit http://www.apachecon.eu/cfp/ Post a banner on your Website to show your support for ApacheCon Europe or North America (24-28 February 2013 in Portland, OR)! Download at http://www.apache.org/events/logos-banners/ We look forward to seeing you! -the Apache Conference Committee ApacheCon Planners --- TRAVEL ASSISTANCE --- We're pleased to announce that Travel Assistance (TAC) applications for ApacheCon Europe 2012 are now open! The Travel Assistance Committee exists to help those who would like to attend ApacheCon events but are unable to do so for financial reasons. For more info on this year's Travel Assistance application criteria, please visit the TAC website at http://www.apache.org/travel/ . Some important dates: the application period officially opened on 23rd July 2012, and applicants have until 13th August 2012 to submit their applications (which should contain as much supporting material as required to efficiently and accurately process your request); this will enable the Travel Assistance Committee to announce successful awards on or shortly after 24th August 2012. As always, TAC expects to deal with a range of applications from many diverse backgrounds, so we encourage anyone thinking about sending in a TAC application to get it in ASAP. We look forward to greeting everyone in Sinsheim, Germany in November.
[jira] [Created] (LUCENE-4292) TestPerfTasksLogic.testBGSearchTaskThreads assertion error
Dawid Weiss created LUCENE-4292: --- Summary: TestPerfTasksLogic.testBGSearchTaskThreads assertion error Key: LUCENE-4292 URL: https://issues.apache.org/jira/browse/LUCENE-4292 Project: Lucene - Core Issue Type: Bug Reporter: Dawid Weiss
{code}
[junit4:junit4] FAILURE 1.44s | TestPerfTasksLogic.testBGSearchTaskThreads
[junit4:junit4] Throwable #1: java.lang.AssertionError
[junit4:junit4]   at __randomizedtesting.SeedInfo.seed([73A6DA79EDD783F8:AE931FA55514525A]:0)
[junit4:junit4]   at org.junit.Assert.fail(Assert.java:92)
[junit4:junit4]   at org.junit.Assert.assertTrue(Assert.java:43)
[junit4:junit4]   at org.junit.Assert.assertTrue(Assert.java:54)
[junit4:junit4]   at org.apache.lucene.benchmark.byTask.TestPerfTasksLogic.testBGSearchTaskThreads(TestPerfTasksLogic.java:159)
[junit4:junit4]   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit4:junit4]   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit4:junit4]   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit4:junit4]   at java.lang.reflect.Method.invoke(Method.java:597)
[junit4:junit4]   at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
[junit4:junit4]   at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
[junit4:junit4]   at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
[junit4:junit4]   at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
[junit4:junit4]   at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
[junit4:junit4]   at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
[junit4:junit4]   at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
[junit4:junit4]   at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
[junit4:junit4]   at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
[junit4:junit4]   at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
[junit4:junit4]   at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
[junit4:junit4]   at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
[junit4:junit4]   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
[junit4:junit4]   at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:345)
[junit4:junit4]   at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:769)
[junit4:junit4]   at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:429)
[junit4:junit4]   at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
[junit4:junit4]   at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
[junit4:junit4]   at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
[junit4:junit4]   at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
[junit4:junit4]   at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
[junit4:junit4]   at
[jira] [Assigned] (LUCENE-4292) TestPerfTasksLogic.testBGSearchTaskThreads assertion error
[ https://issues.apache.org/jira/browse/LUCENE-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-4292: -- Assignee: Michael McCandless TestPerfTasksLogic.testBGSearchTaskThreads assertion error -- Key: LUCENE-4292 URL: https://issues.apache.org/jira/browse/LUCENE-4292 Project: Lucene - Core Issue Type: Bug Reporter: Dawid Weiss Assignee: Michael McCandless
Re: svn commit: r1369911 - /lucene/dev/trunk/lucene/benchmark/src/test/org/apache/lucene/benchmark/byTask/TestPerfTasksLogic.java
+// NOTE: cannot assert this, because on a super-slow +// system, it could be after waiting 0.5 seconds that

Thanks Mike. Interesting because it's not that super-slow windows machine. A dated 2 core AMD but I wouldn't say it's a snail. Dawid
[jira] [Resolved] (LUCENE-4292) TestPerfTasksLogic.testBGSearchTaskThreads assertion error
[ https://issues.apache.org/jira/browse/LUCENE-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-4292. Resolution: Fixed Fix Version/s: 5.0 4.0 I commented out the assertion for this test ... it's not valid. TestPerfTasksLogic.testBGSearchTaskThreads assertion error -- Key: LUCENE-4292 URL: https://issues.apache.org/jira/browse/LUCENE-4292 Project: Lucene - Core Issue Type: Bug Reporter: Dawid Weiss Assignee: Michael McCandless Fix For: 4.0, 5.0
Re: svn commit: r1369911 - /lucene/dev/trunk/lucene/benchmark/src/test/org/apache/lucene/benchmark/byTask/TestPerfTasksLogic.java
On Mon, Aug 6, 2012 at 2:28 PM, Dawid Weiss dawid.we...@gmail.com wrote: +// NOTE: cannot assert this, because on a super-slow +// system, it could be after waiting 0.5 seconds that Thanks Mike. Interesting because it's not that super-slow windows machine. A dated 2 core AMD but I wouldn't say it's a snail. Hmmm well somehow those 2 search threads weren't scheduled (enough) before the 0.5 seconds was up. This was the same case that previously would have led to deadlock (BG search threads hadn't started before the wait was done). Mike McCandless http://blog.mikemccandless.com
Re: svn commit: r1369911 - /lucene/dev/trunk/lucene/benchmark/src/test/org/apache/lucene/benchmark/byTask/TestPerfTasksLogic.java
Hmmm well somehow those 2 search threads weren't scheduled (enough) before the 0.5 seconds was up. Very likely. 500ms isn't that much when you have competing threads and some other processes in the background (which was possibly the case). D.
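One way to remove the timing assumption the thread is discussing (a sketch, not the actual benchmark code) is to have each background worker signal a latch once it has done a unit of work, and block on the latch instead of sleeping a fixed 0.5 seconds and asserting:

```java
import java.util.concurrent.CountDownLatch;

// Sketch: instead of "sleep 500ms, then assert the background search threads
// ran" (flaky on a loaded or slow machine), block on a CountDownLatch that
// each worker counts down after its first iteration. The wait then takes
// exactly as long as the scheduler needs, with no arbitrary deadline.
public class LatchedStartup {
    static int runSearchers(int threads) throws InterruptedException {
        CountDownLatch ranOnce = new CountDownLatch(threads);
        for (int i = 0; i < threads; i++) {
            new Thread(() -> {
                // ... one unit of (simulated) search work would go here ...
                ranOnce.countDown(); // signal: this thread has definitely run
            }).start();
        }
        ranOnce.await(); // no timing assumption: waits as long as needed
        return threads;  // at this point every thread is known to have run
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runSearchers(2));
    }
}
```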
[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429387#comment-13429387 ] Steven Rowe commented on SOLR-1725: --- After Hoss's commits, both ASF Jenkins Maven jobs have run, and under both jobs, tests that previously were failing under Maven due to the lack of a javascript engine in the classpath are now being skipped. After those jobs started, I committed a change to {{dev/nightly/common-maven.sh}} that includes the two rhino jars in the Maven JVM boot class path: r1369936. I've enqueued the Maven jobs again on ASF Jenkins. Script based UpdateRequestProcessorFactory -- Key: SOLR-1725 URL: https://issues.apache.org/jira/browse/SOLR-1725 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.4 Reporter: Uri Boness Priority: Critical Labels: UpdateProcessor Fix For: 4.0 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch A script based UpdateRequestProcessorFactory (uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code. The update request processor factory enables writing update processors in scripts located in the {{solr.solr.home}} directory. The factory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file names (so {{scriptA.js}} will be executed before {{scriptB.js}}).
The script language is resolved based on the script file extension (that is, a *.js file will be treated as a JavaScript script), therefore an extension is mandatory. Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those that are required by the processing logic. The following variables are defined as global variables for each script: * {{req}} - The SolrQueryRequest * {{rsp}} - The SolrQueryResponse * {{logger}} - A logger that can be used for logging purposes in the script -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (LUCENE-2510) migrate solr analysis factories to analyzers module
[ https://issues.apache.org/jira/browse/LUCENE-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429393#comment-13429393 ] Steven Rowe commented on LUCENE-2510: - Solr tests have been failing under Maven on ASF Jenkins since the LUCENE-4044 commits on 7/25, because the POMs for two analysis modules (morfologik and phonetic) didn't include {{$\{project.build.resources}}} definitions for {{src/resources/}}, the location of the SPI configuration files {{META-INF/services/o.a.l.analysis.util.*Factory}}. I've added {{src/resources/}} to these two modules' POMs: - r1369961: trunk - r1369980: branch_4x migrate solr analysis factories to analyzers module --- Key: LUCENE-2510 URL: https://issues.apache.org/jira/browse/LUCENE-2510 Project: Lucene - Core Issue Type: Task Components: modules/analysis Affects Versions: 4.0-ALPHA Reporter: Robert Muir Assignee: Uwe Schindler Fix For: 4.0, 5.0 Attachments: LUCENE-2510-movefactories.sh, LUCENE-2510-movefactories.sh, LUCENE-2510-multitermcomponent.patch, LUCENE-2510-multitermcomponent.patch, LUCENE-2510-parent-classes.patch, LUCENE-2510-parent-classes.patch, LUCENE-2510-parent-classes.patch, LUCENE-2510-resourceloader-bw.patch, LUCENE-2510-simplify-tests.patch, LUCENE-2510.patch, LUCENE-2510.patch, LUCENE-2510.patch In LUCENE-2413 all TokenStreams were consolidated into the analyzers module. This is a good step, but I think the next step is to put the Solr factories into the analyzers module, too. This would make analyzers artifacts plugins to both lucene and solr, with benefits such as: * users could use the old analyzers module with solr, too. This is a good step to use real library versions instead of Version for backwards compat. * analyzers modules such as smartcn and icu, that aren't currently available to solr users due to large file sizes or dependencies, would be simple optional plugins to solr and easily available to users that want them. 
Rough sketch in this thread: http://www.lucidimagination.com/search/document/3465a0e55ba94d58/solr_and_analyzers_module Practically, I haven't looked much and don't really have a plan for how this will work yet, so ideas are very welcome.
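The POM fix above matters because the analysis factories are discovered through the standard Java SPI mechanism: `ServiceLoader` reads provider class names from `META-INF/services/<interface-name>` files on the classpath, which is exactly what the missing `src/resources/` entries left out of the jars. A self-contained sketch of that lookup, using a provider interface that ships with the JDK so no Lucene jars are needed (the count printed depends on what is on the classpath):

```java
import java.nio.charset.spi.CharsetProvider;
import java.util.ServiceLoader;

// Sketch of the SPI discovery the POM fix restores: ServiceLoader scans
// META-INF/services/java.nio.charset.spi.CharsetProvider files on the
// classpath and instantiates the listed providers. Lucene's analysis
// factories are registered the same way, via
// META-INF/services/o.a.l.analysis.util.*Factory files.
public class SpiDemo {
    public static void main(String[] args) {
        ServiceLoader<CharsetProvider> loader = ServiceLoader.load(CharsetProvider.class);
        int n = 0;
        for (CharsetProvider p : loader) {
            n++; // each iteration is one provider found via a services file
        }
        System.out.println("providers found: " + n);
    }
}
```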
RE: svn commit: r1369987 - in /lucene/dev/nightly: common-maven.sh hudson-settings.sh
-RHINO_LIBS_DIR=/usr/home/hudson/tools/java/openjdk-missing-libs +RHINO_LIBS_DIR=$HOME/tools/java/openjdk-missing-libs Thanks Uwe. - Steve
Re: How do you interpret the values returned by RunAutomaton.getCharIntervals() ?
If you show the automaton with toDot or toString it should be clear where those codepoints come from. - Anders On 04-08-2012 02:34, Ashwin Jayaprakash wrote: Hi, I was playing with the RunAutomaton class and I was not sure about the meaning of the results returned by the RunAutomaton.getCharIntervals() method. The JavaDoc for that method says Returns array of codepoint class interval start points. I tried it on a simple regex string ("ij{2,5}\uE001k789opq") and I couldn't explain why there were 4 extra values returned - 0x3a (:), 0x6c (l), 0x72 (r) and 0xe002 (Unicode private use codepoint). These 4 characters are +1 step from the characters 9, k, q and 0xe001 respectively, all of which are in the regex from which the automaton was built. Does anyone know why this is happening? All the codepoints in the regex pattern have a length of just 1 char. So, why the extra chars? What I was trying to really do was to extract the identifiers in the pattern, which this method almost does except for some inexplicable, extra values. I was really looking for an array with 7, 8, 9, i, j, k, o, p, q, 0xe001. Code:
import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.RegExp;
import org.apache.lucene.util.automaton.RunAutomaton;
...
public static void main(String[] args) {
    String s = "ij{2,5}\uE001k789opq";
    RegExp r = new RegExp(s);
    Automaton a = r.toAutomaton();
    RunAutomaton ra = new RunAutomaton(a, Character.MAX_CODE_POINT, false) { };
    System.out.println("Char intervals for: " + s);
    for (int i : ra.getCharIntervals()) {
        System.out.println(Integer.toHexString(i) + " = " + new String(Character.toChars(i)));
    }
}
Output: Char intervals for: ij{2,5}?k789opq 0 = 37 = 7 38 = 8 39 = 9 3a = : 69 = i 6a = j 6b = k 6c = l 6f = o 70 = p 71 = q 72 = r e001 = ? e002 = ? Thanks, Ashwin.
-- Anders Moeller amoel...@cs.au.dk http://cs.au.dk/~amoeller
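Anders' pointer can be made concrete: the returned values are the start points of codepoint *classes* that partition the whole codepoint space. Each codepoint c used on a transition contributes two start points, c itself and c + 1 (the first codepoint after its class), plus 0 for the initial interval. A rough sketch of this (not the actual Lucene implementation, which works on transition min/max ranges) reproduces the "extra" values 0x3a, 0x6c, 0x72 and 0xe002 from the question:

```java
import java.util.TreeSet;

public class CharClassIntervals {
    public static void main(String[] args) {
        // Codepoints appearing on transitions of the automaton built from
        // the regex ij{2,5}\uE001k789opq:
        int[] used = {'7', '8', '9', 'i', 'j', 'k', 'o', 'p', 'q', 0xE001};
        // Every used codepoint c starts a class at c and ends it before
        // c + 1, so both are interval start points; 0 starts the first
        // interval. Adjacent codepoints (7,8,9) share boundaries, which the
        // sorted set deduplicates.
        TreeSet<Integer> starts = new TreeSet<>();
        starts.add(0);
        for (int c : used) {
            starts.add(c);
            starts.add(c + 1);
        }
        StringBuilder sb = new StringBuilder();
        for (int v : starts) {
            sb.append(Integer.toHexString(v)).append(' ');
        }
        System.out.println(sb.toString().trim());
        // prints: 0 37 38 39 3a 69 6a 6b 6c 6f 70 71 72 e001 e002
    }
}
```

This matches the observed getCharIntervals() output exactly, including the four values that are not in the regex.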
[jira] [Closed] (SOLR-3703) Escape character which is in the query, is getting ignored in solr 3.6 with lucene parser
[ https://issues.apache.org/jira/browse/SOLR-3703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] srinivas closed SOLR-3703. -- Resolution: Fixed Escape character which is in the query, is getting ignored in solr 3.6 with lucene parser - Key: SOLR-3703 URL: https://issues.apache.org/jira/browse/SOLR-3703 Project: Solr Issue Type: Bug Affects Versions: 3.6 Environment: Linux Reporter: srinivas I noticed the escape character which is in the query is getting ignored in solr 3.6 with the lucene parser. If I give edismax, then it is giving expected results for the following query. select?q=author:David\ Duke&defType=lucene Would render the same results as: select?q=author:(David OR Duke)&defType=lucene But select?q=author:David\ Duke&defType=edismax Would render the same results as: select?q=author:David Duke&defType=lucene Regards Srini
[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #49: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/49/ No tests ran. Build Log: [...truncated 8471 lines...]
[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429458#comment-13429458 ] Uwe Schindler commented on SOLR-1725: - I think, this is all fine: - Java 6 spec requires a JavaScript engine to be shipped with JDK, it is just missing from FreeBSD's package (there is an issue open upstream). If JavaScript is not there for Java 6 it is like missing UTF8 charset :-) - I strongly -1 shipping with additional scripting engines. No need for that. If user Foo wants to script Solr with engine Bar, he can add the SPI jar to the classpath. No need to ship. This is why SPI was invented! We should maybe only fix Solr's classloader to be set as context classloader, too. SPIs cannot be loaded from $SOLR_HOME/lib, because the context classloader does not see the jars. We fixed that for codecs and analyzer SPI jars in Solr, but the most correct solution would be to enable Solr's threads to see the ResourceLoader as context classloader. Then you can add scripting engines, XML parsers, charset providers, locales,... just like plugins or codecs or analyzer factories into the Solr home's lib folder without adding them to the WAR. Script based UpdateRequestProcessorFactory -- Key: SOLR-1725 URL: https://issues.apache.org/jira/browse/SOLR-1725 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.4 Reporter: Uri Boness Priority: Critical Labels: UpdateProcessor Fix For: 4.0
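Uwe's point about engines being found via SPI can be illustrated with plain JDK classes. The script update processor resolves the engine from the script file's extension; under the hood that is the standard javax.script lookup, which also discovers any engine jar dropped on the classpath. A small probe (a sketch; whether a JavaScript engine is found depends on the JDK and classpath, e.g. Nashorn was removed from the JDK itself in Java 15):

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

// Sketch of the engine-by-extension lookup the script update processor
// relies on: ScriptEngineManager uses the SPI to enumerate engines and
// match them to a file extension such as "js".
public class EngineProbe {
    public static void main(String[] args) {
        ScriptEngine js = new ScriptEngineManager().getEngineByExtension("js");
        System.out.println(js == null ? "no js engine" : "js engine available");
    }
}
```

Adding a third-party engine is then just a matter of putting its SPI jar on the classpath; no code change is needed, which is Uwe's argument for not bundling engines.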
[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429468#comment-13429468 ] Hoss Man commented on SOLR-1725: bq. Java 6 spec requires a JavaScript engine to be shipped with JDK i didn't know that ... i couldn't find anything in the docs that suggested certain engines were mandatory, hence the assumptions in the test (the maven tests just indicated that those assumptions were broken) bq. I strongly -1 shipping with additional scripting engines i didn't see anyone suggesting that ... no argument there. bq. We should maybe only fix Solr's classloader to be set as context classloader, too. that sounds like an orthogonal issue ... great idea, didn't know it was possible, please go ahead and do it, but let's track it in its own issue Script based UpdateRequestProcessorFactory -- Key: SOLR-1725 URL: https://issues.apache.org/jira/browse/SOLR-1725 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.4 Reporter: Uri Boness Priority: Critical Labels: UpdateProcessor Fix For: 4.0
[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429472#comment-13429472 ] Hoss Man commented on SOLR-1725: {quote} bq. I strongly -1 shipping with additional scripting engines i didn't see anyone suggesting that ... no argument there. {quote} sorry .. i overlooked that part of erik's comment .. i'm with Uwe: let's let users add their own script engines as plugins
[jira] [Commented] (LUCENE-4286) Add flag to CJKBigramFilter to allow indexing unigrams as well as bigrams
[ https://issues.apache.org/jira/browse/LUCENE-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429480#comment-13429480 ] Lance Norskog commented on LUCENE-4286: --- If you do unigrams and bigrams in separate fields, you can bias bigrams over unigrams. We did that with one customer and it really helped. Our text was technical and tended towards long words: lots of bigrams and trigrams. Have you tried the Smart Chinese toolkit? It produces far fewer bigrams. Our project worked well with it. I would try that, with misfires further broken into bigrams, over general bigramming. Cf. [SOLR-3653] about the misfires part. In general we found Chinese-language search a really hard problem, and doubly so when nobody on the team speaks Chinese.
Add flag to CJKBigramFilter to allow indexing unigrams as well as bigrams - Key: LUCENE-4286 URL: https://issues.apache.org/jira/browse/LUCENE-4286 Project: Lucene - Core Issue Type: Improvement Affects Versions: 4.0-ALPHA, 3.6.1 Reporter: Tom Burton-West Priority: Minor Fix For: 4.0, 5.0 Attachments: LUCENE-4286.patch, LUCENE-4286.patch
Add an optional flag to the CJKBigramFilter to tell it to also output unigrams. This would allow indexing of both bigrams and unigrams, and at query time the analyzer could analyze queries as bigrams unless the query contained a single Han unigram. As an example, here is a Solr fieldType configuration with the indexUnigrams flag set on the index-time analyzer and no flag on the query-time analyzer:

<fieldType name="CJK" autoGeneratePhraseQueries="false">
  <analyzer type="index">
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.CJKBigramFilterFactory" indexUnigrams="true" han="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.CJKBigramFilterFactory" han="true"/>
  </analyzer>
</fieldType>

Use case: About 10% of our queries that contain Han characters are single-character queries. The CJKBigramFilter only outputs single characters when there are no adjacent bigrammable characters in the input. This means we have to create a separate field to index Han unigrams in order to address single-character queries, and then write application code to search that separate field if we detect a single-character Han query. This is rather kludgey. With the optional flag, we could configure Solr as above. This is somewhat analogous to the flags in LUCENE-1370 for the ShingleFilter used to allow single word queries (although that uses word n-grams rather than character n-grams).
[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429487#comment-13429487 ] Uwe Schindler commented on SOLR-1725: - Hoss, you are right, it is not required that JS is available; the Java 6 spec says [http://download.oracle.com/otndocs/jcp/j2se-1.6.0-pr-spec-oth-JSpec/]: {quote} JSR 223: Scripting for the Java Platform A large percentage of Java developers also use scripting languages. While the Java language is suitable for many tasks, and especially for writing robust, long-lived applications, scripting languages are useful for many other tasks. JSR 223 defines a framework for connecting interpreters of arbitrary scripting languages to Java programs. It includes facilities for locating the available scripting engines, invoking scripts from Java code and vice versa, and making Java application objects visible to scripts. The framework is divided into two parts, the Scripting API and an optional Web Scripting Framework. This feature will incorporate just the Scripting API into this version of the Java SE platform. There will be no requirement that any particular scripting language be supported by the platform; implementors may choose to include support for the scripting language(s) of their choice as they see fit. [ JSR 223; javax.script ] {quote} But all JDKs on all platforms except FreeBSD contain them. So we should have the error messages printed on failure to look up the engine, and the assumption in the test as you committed. But as Erik says, too: No need to ship engines. It's just bloat because there are millions of them :-)
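The lookup behavior discussed above can be demonstrated with a few lines against the plain JSR 223 API (this is standard javax.script usage, not Solr code): the spec guarantees the API but no particular engine, and getEngineByName() signals a missing engine by returning null rather than throwing, which is exactly the case a caller has to report with a clear error message.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;

public class EngineProbe {
    public static void main(String[] args) {
        ScriptEngineManager mgr = new ScriptEngineManager();
        // The factory list may legitimately be empty (e.g. the FreeBSD JDKs
        // mentioned above) -- JSR 223 mandates only the API itself.
        for (ScriptEngineFactory f : mgr.getEngineFactories()) {
            System.out.println(f.getEngineName() + " handles " + f.getNames());
        }
        // Null, not an exception, means "no such engine is registered".
        ScriptEngine js = mgr.getEngineByName("javascript");
        System.out.println("javascript engine available: " + (js != null));
    }
}
```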
[jira] [Comment Edited] (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429487#comment-13429487 ] Uwe Schindler edited comment on SOLR-1725 at 8/6/12 10:11 PM, correcting the attribution in the closing line of the previous comment from "Erik" to "Hoss"; the comment text is otherwise unchanged.
[jira] [Created] (SOLR-3716) Make SolrResourceLoaders ClassLoader available as context class loader
Uwe Schindler created SOLR-3716: --- Summary: Make SolrResourceLoaders ClassLoader available as context class loader Key: SOLR-3716 URL: https://issues.apache.org/jira/browse/SOLR-3716 Project: Solr Issue Type: Bug Components: scripts and tools Reporter: Uwe Schindler Fix For: 4.0, 5.0
SOLR-1725 and other issues (recent changes to analysis factories and codecs) make it possible to plug in extensions like analyzer factories, codecs, scripting engines, or TIKA parsers (TIKA extraction plugin!!!) as SPIs. The current problem (we solved this already for codecs and analyzer factories with a classloader-reload hack: LUCENE-4259) is the following: you have to unpack the WAR file and repack it with the missing JAR files. If you do it the Solr way and put those jars into the $SOLR_HOME/lib folder like plugins, they are not seen. The problem is that plugins loaded by Solr are loaded using SolrResourceLoader's classloader (configurable via solrconfig.xml), but as this classloader is not also the context classloader, SPI does not look into it, so scripting engines, TIKA plugins, (previously codecs) are not seen. We should investigate how to manage setting the context classloader of all threads Solr ever sees to point to our own Solr classloader. When we do this, I also suggest shipping only the TIKA core libs, not tika-parsers and its big dependency hell. TIKA parsers are also loaded via SPI, so a user can download the TIKA parser distribution and drop it into $SOLR_HOME/lib. That way a user can also use only those extraction plugins that are really needed. The current Solr distribution mostly consists of JAR files (useless for many users) for the Solr extraction handler. We don't need to ship with all of them; we can just tell the user how to install the needed SPIs. The same goes for analysis-extras (the user only needs to copy the morphologic JAR or smartchinese JAR into $SOLR_HOME/lib - this works already!!!). No need for the whole contrib. Scripting engines are the same. We should just ship with some scripts (Ant-based) to download the JAR files into $SOLR_HOME.
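The fix being proposed boils down to the standard context-classloader idiom. A minimal sketch (the plugin loader here is a stand-in for SolrResourceLoader's actual classloader over $SOLR_HOME/lib, which is not shown):

```java
public class ContextLoaderSketch {
    public static void main(String[] args) {
        Thread t = Thread.currentThread();
        ClassLoader previous = t.getContextClassLoader();
        // Stand-in for the Solr plugin classloader; in Solr this would be
        // SolrResourceLoader's classloader, which can see $SOLR_HOME/lib.
        ClassLoader pluginLoader = ContextLoaderSketch.class.getClassLoader();
        try {
            t.setContextClassLoader(pluginLoader);
            // SPI lookups made here -- java.util.ServiceLoader,
            // javax.script.ScriptEngineManager, TIKA's parser discovery --
            // consult the context classloader and can now see plugin jars.
        } finally {
            t.setContextClassLoader(previous); // always restore
        }
        System.out.println("restored: " + (t.getContextClassLoader() == previous));
    }
}
```

The try/finally restore matters because request threads are pooled and reused; leaking a plugin loader into an unrelated request would be its own bug.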
[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429528#comment-13429528 ] Uwe Schindler commented on SOLR-1725: - I opened SOLR-3716.
[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429585#comment-13429585 ] Steven Rowe commented on SOLR-1725: --- On IRC, Uwe suggested adding the Rhino jars to {{$JAVA_HOME/jre/lib/ext/}} on the FreeBSD ASF Jenkins lucene slave ({{lucene.zones.apache.org}}) to allow both Ant and Maven build jobs to enable scripting tests. I copied {{js.jar}} and {{script-js.jar}} from {{/usr/home/hudson/tools/java/openjdk-missing-libs/}} to {{/usr/local/openjdk{6,7}/jre/lib/ext/}}, and the {{ScriptEngineTest}} tests under the Maven branch_4x job have succeeded, except for {{testJRuby()}}, which was skipped (as expected). I also removed {{js.jar}} and {{script-js.jar}} from {{~hudson/.ant/lib/}}.
[jira] [Comment Edited] (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429585#comment-13429585 ] Steven Rowe edited comment on SOLR-1725 at 8/6/12 11:44 PM (wiki-markup escaping fix only; the comment text is otherwise unchanged from the previous message).
[jira] [Comment Edited] (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429585#comment-13429585 ] Steven Rowe edited comment on SOLR-1725 at 8/6/12 11:45 PM (another small markup/spacing fix; the comment text is otherwise unchanged).
[jira] [Reopened] (SOLR-3647) DistrubtedQueue should use our Solr zk client rather than the std zk client.
[ https://issues.apache.org/jira/browse/SOLR-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reopened SOLR-3647: --- This was actually a fairly bad bug, as brought up on the user list a week or two back - the std zk client does not deal with connection loss well, but worse, once it's had a connection expiration you have to make a new client - you cannot use the old one. So if the distrib queue zk client ever gets expired, it will continually hit expiration exceptions as you try to use it again - so no nodes can publish states (other issues too, but that's a big one). This can put it in an infinite recovery loop.
DistrubtedQueue should use our Solr zk client rather than the std zk client. Key: SOLR-3647 URL: https://issues.apache.org/jira/browse/SOLR-3647 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.0, 5.0 This will let us easily do retries on connection loss.
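The distinction Mark draws is the crux: connection loss is transient and the same client can retry, while session expiration kills the client for good. A generic sketch of the retry-on-connection-loss pattern (the class and exception names here are invented stand-ins, not the actual SolrZkClient or ZooKeeper API):

```java
import java.util.concurrent.Callable;

// Illustrative retry wrapper; ConnLossException stands in for ZooKeeper's
// transient connection-loss case. Session *expiration* is deliberately not
// caught here -- per the comment above, an expired client is dead and the
// caller must build a new one rather than loop on the old handle.
public class RetryOnConnLoss {
    static class ConnLossException extends Exception {}

    static <T> T withRetries(Callable<T> op, int maxRetries) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return op.call();
            } catch (ConnLossException e) {
                if (attempt >= maxRetries) throw e;
                Thread.sleep(100L * (attempt + 1)); // simple linear backoff
            }
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated op: fails twice with connection loss, then succeeds.
        String result = withRetries(() -> {
            if (++calls[0] < 3) throw new ConnLossException();
            return "ok";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts"); // prints "ok after 3 attempts"
    }
}
```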
[jira] [Updated] (SOLR-3647) DistrubtedQueue should use our Solr zk client rather than the std zk client.
[ https://issues.apache.org/jira/browse/SOLR-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-3647: -- Issue Type: Bug (was: Improvement) DistrubtedQueue should use our Solr zk client rather than the std zk client. Key: SOLR-3647 URL: https://issues.apache.org/jira/browse/SOLR-3647 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.0, 5.0 This will let us easily do retries on connection loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-3647) DistrubtedQueue should use our Solr zk client rather than the std zk client.
[ https://issues.apache.org/jira/browse/SOLR-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved SOLR-3647. --- Resolution: Fixed reopened to change from improvement to bug DistrubtedQueue should use our Solr zk client rather than the std zk client. Key: SOLR-3647 URL: https://issues.apache.org/jira/browse/SOLR-3647 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.0, 5.0 This will let us easily do retries on connection loss. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3716) Make SolrResourceLoaders ClassLoader available as context class loader
[ https://issues.apache.org/jira/browse/SOLR-3716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429653#comment-13429653 ] Lance Norskog commented on SOLR-3716: - Thanks for flushing out another problem in how classpaths work. I have a small question: how would I add a Java SecurityManager class into this mix? I would like to set a security manager object for each core that governs the activities of code in that core: loading a 3-megabyte synonym file, loading a jar file that calls out to the DHS, whatever. (Why? A hosted Solr business is a lot easier if you can run someone's collection configs in a sandbox.) Make SolrResourceLoaders ClassLoader available as context class loader -- Key: SOLR-3716 URL: https://issues.apache.org/jira/browse/SOLR-3716 Project: Solr Issue Type: Bug Components: scripts and tools Reporter: Uwe Schindler Fix For: 4.0, 5.0 SOLR-1725 and other issues (recent changes to analysis factories and codecs) make it possible to plug in extensions like analyzer factories, codecs, scripting engines or TIKA parsers (TIKA extraction plugin!!!) as SPIs. The current problem (we solved this already for codecs and analyzer factories with a classloader-reload hack: LUCENE-4259) is the following: you have to unpack the WAR file and repack it with the missing JAR files. If you instead do it the Solr way and put those jars into the $SOLR_HOME/lib folder like plugins, they are not seen. The problem is that plugins loaded by Solr are loaded using SolrResourceLoader's classloader (configurable via solrconfig.xml), but as this classloader is not also the context classloader, SPI does not look into it, so scripting engines, TIKA plugins, (previously codecs) are not seen. We should investigate how to set the context classloader of all threads Solr ever sees to point to our own Solr classloader. When we do this, I also suggest we only ship the TIKA core libs but not tika-parsers and its big dependency hell. 
TIKA parsers are also loaded via SPI, so a user can download the TIKA parser distribution and drop it into $SOLR_HOME/lib. That way a user can also install only those extraction plugins really needed. The current Solr distribution ships a pile of JAR files (useless for many users) just for the Solr extraction handler. We don't need to ship all of them; we can just tell the user how to install the needed SPIs. The same goes for analysis-extras (a user only needs to copy the morphologic JAR or smartchinese JAR into $SOLR_HOME/lib - this works already!!!). No need for the whole contrib. Scripting engines are the same. We should just ship some scripts (ANT based) to download the JAR files into $SOLR_HOME. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
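The fix direction Uwe describes - installing the plugin classloader as a thread's context classloader so that SPI lookups consult it - follows the standard swap-and-restore pattern around `java.util.ServiceLoader`, which resolves providers through the current thread's context classloader. A minimal sketch (not Solr's actual code):

```java
import java.util.ServiceLoader;
import javax.script.ScriptEngineFactory;

public class ContextLoaderSketch {
    // Run an SPI lookup with the given loader installed as the context classloader,
    // restoring the previous loader afterwards. ServiceLoader.load(Class) consults
    // the context classloader, which is why plugins visible only to
    // SolrResourceLoader's own classloader are invisible to plain SPI lookups.
    public static int countEngines(ClassLoader pluginLoader) {
        Thread t = Thread.currentThread();
        ClassLoader prev = t.getContextClassLoader();
        try {
            t.setContextClassLoader(pluginLoader);
            int n = 0;
            for (ScriptEngineFactory f : ServiceLoader.load(ScriptEngineFactory.class)) {
                n++; // each provider found via the installed loader
            }
            return n;
        } finally {
            t.setContextClassLoader(prev); // always restore
        }
    }
}
```

In Solr's case the loader passed in would be SolrResourceLoader's classloader, and the swap would have to cover every thread Solr ever sees, which is the hard part the issue discusses.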
[JENKINS] Lucene-Solr-trunk-Windows (64bit/jdk1.7.0_05) - Build # 125 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows/125/ Java: 64bit/jdk1.7.0_05 -XX:+UseConcMarkSweepGC 1 tests failed. REGRESSION: org.apache.solr.spelling.suggest.SuggesterTest.testRebuild Error Message: Exception during query Stack Trace: java.lang.RuntimeException: Exception during query at __randomizedtesting.SeedInfo.seed([A9A31C1A44AB23F5:F286BE5970AB596F]:0) at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:486) at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:453) at org.apache.solr.spelling.suggest.SuggesterTest.testRebuild(SuggesterTest.java:105) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1995) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:818) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:877) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:891) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:825) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:671) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:697) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:736) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:747) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551) Caused by: java.lang.RuntimeException: REQUEST FAILED: xpath=//lst[@name='spellcheck']/lst[@name='suggestions']/lst[@name='ac']/int[@name='numFound'][.='2'] xml response was: <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst> <lst name="spellcheck"><lst name="suggestions"/></lst>
[jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index
[ https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429682#comment-13429682 ] Yonik Seeley commented on SOLR-3684: bq. Address this default jetty threadpool size of max=10,000. This is the real issue. I had thought that jetty reused a small number of threads - O(n_concurrent_connections) - regardless of what the max number of threads was? Frequently full gc while do pressure index -- Key: SOLR-3684 URL: https://issues.apache.org/jira/browse/SOLR-3684 Project: Solr Issue Type: Improvement Components: multicore Affects Versions: 4.0-ALPHA Environment: System: Linux Java process: 4G memory Jetty: 1000 threads Index: 20 field Core: 5 Reporter: Raintung Li Priority: Critical Labels: garbage, performance Fix For: 4.0 Attachments: patch.txt Original Estimate: 168h Remaining Estimate: 168h Recently we tested Solr indexing throughput and performance: 20 fields of the normal text_general type, 1000 Jetty threads, and 5 cores. After the test had run for some time, the Solr process's throughput dropped very quickly. Investigating the root cause, we found the Java process constantly doing full GCs. In the heap dump, the dominant object is StandardTokenizer, which is retained in a CloseableThreadLocal by IndexSchema.SolrIndexAnalyzer. Solr uses PerFieldReuseStrategy as the default component-reuse strategy, which means every field gets its own StandardTokenizer if it uses the standard analyzer, and each StandardTokenizer occupies 32KB of memory because of its zzBuffer char array. Worst case: Total memory = live threads * cores * fields * 32KB. In our test case that is 1000 * 5 * 20 * 32KB = 3.2G for StandardTokenizer, and those objects are only released when their thread dies. Suggestion: every request is handled by one thread, which means one document is only analyzed by one thread. Since a thread parses the document's fields one at a time, fields of the same type can reuse the same components: when the thread switches to another field of the same type, the analyzer only has to reset the component's input stream. This saves a lot of memory for same-typed fields: Total memory = live threads * cores * (distinct field types) * 32KB. The modification is simple; I can provide the patch for IndexSchema.java:

private class SolrIndexAnalyzer extends AnalyzerWrapper {

  private class SolrFieldReuseStrategy extends ReuseStrategy {
    /** {@inheritDoc} */
    @SuppressWarnings("unchecked")
    public TokenStreamComponents getReusableComponents(String fieldName) {
      Map<Analyzer, TokenStreamComponents> componentsPerField =
          (Map<Analyzer, TokenStreamComponents>) getStoredValue();
      return componentsPerField != null ? componentsPerField.get(analyzers.get(fieldName)) : null;
    }

    /** {@inheritDoc} */
    @SuppressWarnings("unchecked")
    public void setReusableComponents(String fieldName, TokenStreamComponents components) {
      Map<Analyzer, TokenStreamComponents> componentsPerField =
          (Map<Analyzer, TokenStreamComponents>) getStoredValue();
      if (componentsPerField == null) {
        componentsPerField = new HashMap<Analyzer, TokenStreamComponents>();
        setStoredValue(componentsPerField);
      }
      componentsPerField.put(analyzers.get(fieldName), components);
    }
  }

  protected final static HashMap<String, Analyzer> analyzers;

  /**
   * Implementation of {@link ReuseStrategy} that reuses components per field by
   * maintaining a Map of TokenStreamComponents per field name.
   */
  SolrIndexAnalyzer() {
    super(new SolrFieldReuseStrategy());
    analyzers = analyzerCache();
  }

  protected HashMap<String, Analyzer> analyzerCache() {
    HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
    for (SchemaField f : getFields().values()) {
      Analyzer analyzer = f.getType().getAnalyzer();
      cache.put(f.getName(), analyzer);
    }
    return cache;
  }

  @Override
  protected Analyzer getWrappedAnalyzer(String fieldName) {
    Analyzer analyzer = analyzers.get(fieldName);
    return analyzer != null ? analyzer : getDynamicFieldType(fieldName).getAnalyzer();
  }

  @Override
  protected TokenStreamComponents wrapComponents(String fieldName, TokenStreamComponents components) {
    return components;
  }
}

private class SolrQueryAnalyzer extends SolrIndexAnalyzer {
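The reporter's worst-case arithmetic checks out. This sketch just encodes the two formulas from the report; the "after" case plugs in a single distinct field type purely for illustration:

```java
public class AnalyzerMemorySketch {
    static final long TOKENIZER_KB = 32; // zzBuffer char array per StandardTokenizer

    // Worst case from the report: one retained tokenizer per live thread, per
    // core, per field (or per distinct field type, after the proposed change).
    public static long worstCaseKB(long threads, long cores, long fieldsOrTypes) {
        return threads * cores * fieldsOrTypes * TOKENIZER_KB;
    }
}
```

With 1000 threads, 5 cores, and 20 fields this gives 3,200,000 KB, i.e. the 3.2G figure in the report; collapsing the 20 fields to one shared type would cut it twentyfold.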
[jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index
[ https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429688#comment-13429688 ] Robert Muir commented on SOLR-3684: --- It does: I think the reuse is not the problem but the max? By default I think it keeps min threads always (default 10), but our max of 10,000 allows it to temporarily spike huge (versus blocking). From looking at the Jetty code, by default these will die off after 60s, which is fine, but we enrolled so many entries into e.g. Analyzer's or SegmentReader's CloseableThreadLocals that when they die off and the CTL does a purge, it's just a ton of garbage. Really there isn't much benefit here in using so many threads at indexing time (DWPT's max thread count is 8 unless changed in IndexWriterConfig, and raising it would have other bad side effects). At query time I think something closer to Jetty's default of 254 would actually be better too. But I looked at the history of this file, and it seems the reason it was set to 10,000 was to prevent a deadlock (SOLR-683)? Is there a better solution to this now so that we can reduce this max? Separately, I've been fixing the analyzers that do hog RAM, because machines are getting more cores, so I think it's worth it. But I think it would be nice if we could fix this max=10,000. Frequently full gc while do pressure index -- Key: SOLR-3684 URL: https://issues.apache.org/jira/browse/SOLR-3684 Project: Solr Issue Type: Improvement Components: multicore Affects Versions: 4.0-ALPHA Environment: System: Linux Java process: 4G memory Jetty: 1000 threads Index: 20 field Core: 5 Reporter: Raintung Li Priority: Critical Labels: garbage, performance Fix For: 4.0 Attachments: patch.txt Original Estimate: 168h Remaining Estimate: 168h
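The pool dynamics Robert describes - a small core of always-alive threads, a large max it can spike to under load, and idle threads above the core dying off after 60s - can be illustrated with a plain `java.util.concurrent` pool configured with Jetty-like numbers (an illustration of the mechanics only, not Jetty's actual QueuedThreadPool):

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolSketch {
    // core=10, max=254, 60s idle timeout: threads above the core count die off
    // after 60 seconds idle, matching the defaults described in the comment.
    // A SynchronousQueue makes the pool grow toward max under load instead of
    // queueing - i.e. it spikes the way max=10,000 would, just bounded.
    public static ThreadPoolExecutor jettyLikePool() {
        return new ThreadPoolExecutor(10, 254, 60, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>());
    }
}
```

The thread-local garbage problem above is exactly the interaction of such die-offs with per-thread analyzer state: every expiring thread abandons its CloseableThreadLocal entries.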
[JENKINS] Lucene-Solr-tests-only-4.x-java7 - Build # 260 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-4.x-java7/260/ 1 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.handler.TestReplicationHandler Error Message: ERROR: SolrIndexSearcher opens=76 closes=75 Stack Trace: java.lang.AssertionError: ERROR: SolrIndexSearcher opens=76 closes=75 at __randomizedtesting.SeedInfo.seed([48D5CDE332603C61]:0) at org.junit.Assert.fail(Assert.java:93) at org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:216) at org.apache.solr.SolrTestCaseJ4.afterClass(SolrTestCaseJ4.java:82) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1995) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:754) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551) Build Log: [...truncated 7236 lines...] [junit4:junit4] Suite: org.apache.solr.handler.TestReplicationHandler [junit4:junit4] (@BeforeClass output) [junit4:junit4] 2 7 T46 oejs.Server.doStart jetty-8.1.2.v20120308 [junit4:junit4] 2 12 T46 oejs.AbstractConnector.doStart Started SocketConnector@0.0.0.0:42529 [junit4:junit4] 2 13 T46 oasc.SolrResourceLoader.locateSolrHome JNDI not configured for solr (NoInitialContextEx) [junit4:junit4] 2 14 T46 oasc.SolrResourceLoader.locateSolrHome using system property solr.solr.home: ./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1344309052357/master [junit4:junit4] 2 15 T46 oasc.SolrResourceLoader.init new SolrResourceLoader for deduced Solr Home: './org.apache.solr.handler.TestReplicationHandler$SolrInstance-1344309052357/master/' [junit4:junit4] 2 49 T46 oass.SolrDispatchFilter.init SolrDispatchFilter.init() [junit4:junit4] 2 50 T46 oasc.SolrResourceLoader.locateSolrHome JNDI not configured for solr (NoInitialContextEx) [junit4:junit4] 2 50 T46 oasc.SolrResourceLoader.locateSolrHome using system property solr.solr.home: ./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1344309052357/master [junit4:junit4] 2 51 T46 oasc.CoreContainer$Initializer.initialize looking for solr.xml: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-4.x-java7/checkout/solr/build/solr-core/test/J1/./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1344309052357/master/solr.xml [junit4:junit4] 2 52 T46 oasc.CoreContainer.init New CoreContainer 451485183 [junit4:junit4] 2 52 T46 oasc.CoreContainer$Initializer.initialize no solr.xml file found - using default [junit4:junit4] 2 53 T46 oasc.CoreContainer.load Loading CoreContainer using Solr Home: './org.apache.solr.handler.TestReplicationHandler$SolrInstance-1344309052357/master/' [junit4:junit4] 2 53 T46 oasc.SolrResourceLoader.init new SolrResourceLoader for directory: './org.apache.solr.handler.TestReplicationHandler$SolrInstance-1344309052357/master/' [junit4:junit4] 2 88 T46 oasc.CoreContainer.load Registering Log Listener [junit4:junit4] 2 129 T46 oasc.CoreContainer.create Creating SolrCore 'collection1' using
VOTE: 4.0-BETA
Artifacts here: http://people.apache.org/~rmuir/staging_area/lucene-solr-4.0bRC0-rev1370099/ The list of changes since 4.0-ALPHA is pretty large: lots of important bugs were fixed. This passes the smoketester (if you use it, you must use python3 now), so here is my +1. I think we should get it out and iterate towards the final release. P.S.: I will clean up JIRA etc as discussed before, so I don't ruin Hossman's day. If we need to respin we can just move the additional issues into CHANGES/JIRA section and then respin. -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-3717) DirectoryFactory.close() is never called
Hoss Man created SOLR-3717: -- Summary: DirectoryFactory.close() is never called Key: SOLR-3717 URL: https://issues.apache.org/jira/browse/SOLR-3717 Project: Solr Issue Type: Bug Reporter: Hoss Man Fix For: 5.0, 4.0 While working on SOLR-3699 I noticed that DirectoryFactory implements Closeable (and thus has a close() method) but (unless I'm missing something) never gets closed. I suspect the code that used to close() the DirectoryFactory got refactored into oblivion when SolrCoreState was introduced and reloading a SolrCore started reusing the same DirectoryFactory. It seems like either DirectoryFactory should no longer have a close() method, or something at the CoreContainer level should ensure that all DirectoryFactories are closed when shutting down. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
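The second option the issue mentions - something at the container level ensuring every factory gets closed at shutdown - amounts to the common track-and-close pattern. A sketch with invented names, not Solr code:

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class TrackAndCloseSketch implements Closeable {
    private final List<Closeable> tracked = new ArrayList<Closeable>();

    // Register a factory (or any Closeable) so shutdown can reach it.
    public <T extends Closeable> T track(T c) {
        tracked.add(c);
        return c;
    }

    @Override
    public void close() throws IOException {
        // Close in reverse registration order, mirroring typical container teardown.
        for (int i = tracked.size() - 1; i >= 0; i--) {
            tracked.get(i).close();
        }
    }
}
```

The alternative fix the issue names - dropping close() from DirectoryFactory entirely - avoids the tracking but gives up any hook for releasing factory-held resources.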
[jira] [Updated] (SOLR-3699) SolrIndexWriter constructor leaks Directory if Exception creating IndexWriterConfig
[ https://issues.apache.org/jira/browse/SOLR-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-3699: --- Attachment: SOLR-3699.patch Figured out the problem in my last patch: I was ignorant of the full DirectoryFactory API and didn't realize I should be calling doneWithDirectory(). I think this new patch is good to go, but I don't want to commit w/o review from someone who understands the DirectoryFactory semantics better (I already opened SOLR-3717 because something looks wonky about the API, and don't want to mess up and just fix a symptom here instead of the real problem). SolrIndexWriter constructor leaks Directory if Exception creating IndexWriterConfig --- Key: SOLR-3699 URL: https://issues.apache.org/jira/browse/SOLR-3699 Project: Solr Issue Type: Bug Reporter: Robert Muir Fix For: 4.0 Attachments: SOLR-3699.patch, SOLR-3699.patch, SOLR-3699.patch in LUCENE-4278 I had to add a hack to force SimpleFSDir for CoreContainerCoreInitFailuresTest, because it doesn't close its Directory on certain errors. This might indicate a problem that leaks happen if certain errors happen (e.g. not handled in finally) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
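The leak pattern under discussion - a Directory acquired early in a constructor escaping cleanup when a later step (building the IndexWriterConfig) throws - has a standard remedy: the success-flag-in-finally idiom. A generic sketch with invented names, not the actual SolrIndexWriter fix:

```java
import java.io.Closeable;
import java.io.IOException;

public class GuardedInitSketch {
    // If riskyInit (standing in for IndexWriterConfig construction) throws,
    // the already-acquired resource is closed before the exception escapes,
    // so no half-constructed object leaks it.
    public static <T extends Closeable> T guard(T resource, Runnable riskyInit)
            throws IOException {
        boolean success = false;
        try {
            riskyInit.run();
            success = true;
            return resource;
        } finally {
            if (!success) {
                resource.close(); // close on every failure path
            }
        }
    }
}
```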
[jira] [Assigned] (SOLR-3699) SolrIndexWriter constructor leaks Directory if Exception creating IndexWriterConfig
[ https://issues.apache.org/jira/browse/SOLR-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man reassigned SOLR-3699: -- Assignee: Mark Miller Mark: can you sanity check this patch for me? SolrIndexWriter constructor leaks Directory if Exception creating IndexWriterConfig --- Key: SOLR-3699 URL: https://issues.apache.org/jira/browse/SOLR-3699 Project: Solr Issue Type: Bug Reporter: Robert Muir Assignee: Mark Miller Fix For: 4.0 Attachments: SOLR-3699.patch, SOLR-3699.patch, SOLR-3699.patch in LUCENE-4278 I had to add a hack to force SimpleFSDir for CoreContainerCoreInitFailuresTest, because it doesn't close its Directory on certain errors. This might indicate a problem that leaks happen if certain errors happen (e.g. not handled in finally) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Stemming Indonesian in Lucene
Hello, Have you looked at http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/id/IndonesianStemmer.java ? This uses a different algorithm, but maybe it gives you some ideas: http://www.illc.uva.nl/Publications/ResearchReports/MoL-2003-02.text.pdf On Sun, Aug 5, 2012 at 11:37 PM, Emiliana Suci emily_elz...@yahoo.com wrote: I am interested in implementing Indonesian stemming in Lucene. I see that Lucene has no Nazief and Adriani algorithm. I am still a beginner and am asking for directions to implement it. -- View this message in context: http://lucene.472066.n3.nabble.com/Stemming-Indonesian-in-Lucene-tp3999321.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
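For orientation: the Nazief-Adriani algorithm starts by stripping inflectional suffixes (particles such as -lah/-kah/-tah/-pun, then possessive pronouns -ku/-mu/-nya) before tackling derivational affixes. A toy version of just that first step, to show the shape of such rules - not a complete or faithful implementation (real stemmers also check candidate stems against a dictionary):

```java
public class ParticleStripSketch {
    private static final String[] PARTICLES = { "lah", "kah", "tah", "pun" };

    // Strip one inflectional particle suffix, keeping a minimum stem length
    // so short words are left alone.
    public static String stripParticle(String word) {
        for (String p : PARTICLES) {
            if (word.endsWith(p) && word.length() - p.length() >= 2) {
                return word.substring(0, word.length() - p.length());
            }
        }
        return word;
    }
}
```

For example, "bukulah" ("the book", emphatic) reduces to "buku" ("book"), while words without a particle pass through unchanged.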
[jira] [Commented] (LUCENE-4290) basic highlighter that uses postings offsets
[ https://issues.apache.org/jira/browse/LUCENE-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429916#comment-13429916 ] Robert Muir commented on LUCENE-4290: - I get some improvements here in performance (for non-prox queries) by hacking up luceneutil to test queries with postingshighlighter+offsets vs fastvectorhighlighter+vectors. However, I don't think this will be realistically useful until we have the new block layout from the pfor branch: prox queries are hurt by the interleaving in the stream (just like if you use payloads), unrelated to highlighting. I tried to do more experiments like 'wikibig' in luceneutil but I ran out of disk space. Once we have the block layout landed let's revisit this: it gives a much smaller index, faster indexing, and I think it will work well when that's sorted out. basic highlighter that uses postings offsets Key: LUCENE-4290 URL: https://issues.apache.org/jira/browse/LUCENE-4290 Project: Lucene - Core Issue Type: New Feature Components: modules/other Reporter: Robert Muir Attachments: LUCENE-4290.patch We added IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS so you can efficiently compress character offsets in the postings list, but nothing yet makes use of this. Here is a simple highlighter that uses them: it doesn't have many tests or fancy features, but I think it's ok for the sandbox/ (maybe with a couple more tests). Additionally, I didn't do any benchmarking. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
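The core idea of an offsets-based highlighter - slice the stored text directly using the character offsets carried in the postings, instead of re-analyzing the text - reduces to simple string surgery once the match offsets are in hand. A toy sketch of that final step, not the patch's code:

```java
public class OffsetHighlightSketch {
    // Wrap each (start, end) character range in <b>..</b>. Offsets are assumed
    // sorted and non-overlapping, as postings offsets within one field are.
    public static String highlight(String text, int[][] offsets) {
        StringBuilder sb = new StringBuilder();
        int pos = 0;
        for (int[] o : offsets) {
            sb.append(text, pos, o[0]);                       // text before the match
            sb.append("<b>").append(text, o[0], o[1]).append("</b>"); // the match itself
            pos = o[1];
        }
        return sb.append(text.substring(pos)).toString();     // trailing text
    }
}
```

The hard parts the issue actually covers - reading DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS postings and picking fragments - happen before this step.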
[jira] [Updated] (LUCENE-4216) Token X exceeds length of provided text sized X
[ https://issues.apache.org/jira/browse/LUCENE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ibrahim updated LUCENE-4216: Attachment: ArabicTokenizer.java ArabicAnalyzer.java Greatly appreciated. It worked out without the low-level implementation of incrementToken(). Token X exceeds length of provided text sized X --- Key: LUCENE-4216 URL: https://issues.apache.org/jira/browse/LUCENE-4216 Project: Lucene - Core Issue Type: Bug Components: modules/highlighter Affects Versions: 4.0-ALPHA Environment: Windows 7, jdk1.6.0_27 Reporter: Ibrahim Attachments: ArabicAnalyzer.java, ArabicTokenizer.java, ArabicTokenizer.java, myApp.zip I'm facing this exception: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token رأيكم exceeds length of provided text sized 170 at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:233) at classes.myApp$16$1.run(myApp.java:1508) I tried to find anything wrong in my code when I started migrating from Lucene 3.6 to 4.0, without success. I found similar issues with HTMLStripCharFilter, e.g. LUCENE-3690, LUCENE-2208, but not with SimpleHTMLFormatter, so I'm raising this here to see if there is really a bug or it is something wrong in my code with v4. The code that I'm using: final Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter("<font color=red>", "</font>"), new QueryScorer(query)); ... final TokenStream tokenStream = TokenSources.getAnyTokenStream(defaultSearcher.getIndexReader(), j, "Line", analyzer); final TextFragment[] frag = highlighter.getBestTextFragments(tokenStream, doc.get("Line"), false, 10); Please note that this is working fine with v3.6 -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org