RE: 3.0.3 Pre-Release Nuget Packages

2012-08-06 Thread Prescott Nasser
I also want to point out we brought back .NET 3.5 compatibility - hopefully 
that gets some people excited
  From: geobmx...@hotmail.com
 To: lucene-net-dev@lucene.apache.org; lucene-net-u...@lucene.apache.org
 Subject: 3.0.3 Pre-Release Nuget Packages
 Date: Mon, 6 Aug 2012 13:55:01 -0700
 
 
 
 
 Hey All, I've hidden the two deprecated NuGet packages (Lucene and Lucene 
 Contrib). I've also added pre-release (3.0.3-RC) packages for Lucene.Net and 
 Lucene.Net.Contrib. If you have time, I would ask that you take them for a 
 test drive and provide us any feedback you have. Thanks all, ~Prescott 

  

Stemming Indonesian in Lucene

2012-08-06 Thread Emiliana Suci
I am interested in implementing Indonesian stemming in Lucene. Looking at
Lucene, I see no implementation of the Nazief and Adriani algorithm. I am
still a beginner and would like directions on how to implement it.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Stemming-Indonesian-in-Lucene-tp3999321.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4289) highlighter idf calculation problems

2012-08-06 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429001#comment-13429001
 ] 

Uwe Schindler commented on LUCENE-4289:
---

Too funny, +1 to fix. Backport?

 highlighter idf calculation problems
 

 Key: LUCENE-4289
 URL: https://issues.apache.org/jira/browse/LUCENE-4289
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-4289.patch


 * highlighter uses numDocs instead of maxDoc
 * fastvectorhighlighter uses numDocs - numDeletedDocs instead of maxDoc (will 
 go negative if more than half of docs are marked deleted)
 * fastvectorhighlighter calls docFreq and computes IDF per-position when it 
 won't change (inefficient)
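
A hedged sketch of what the first two bullets imply, assuming the DefaultSimilarity-style IDF formula (illustrative only, not the attached patch):

{code}
// Illustrative only: compute IDF for highlighting from maxDoc() rather than
// numDocs() (or numDocs - numDeletedDocs), so deletions can never drive the
// value negative; docFreq is fetched once per term, not once per position.
float highlightIdf(IndexReader reader, Term term) throws IOException {
  final int docFreq = reader.docFreq(term);
  final int maxDoc = reader.maxDoc();   // counts deleted docs too, never shrinks
  return (float) (Math.log((double) maxDoc / (double) (docFreq + 1)) + 1.0);
}
{code}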

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2501) ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice

2012-08-06 Thread Gili Nachum (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429006#comment-13429006
 ] 

Gili Nachum commented on LUCENE-2501:
-

The issue is resolved successfully: even after increasing the degree of 
concurrency, I can no longer reproduce it with 16 threads on a 4-core machine. 
Thank you Michael!

 ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice
 --

 Key: LUCENE-2501
 URL: https://issues.apache.org/jira/browse/LUCENE-2501
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.0.1
Reporter: Tim Smith
 Attachments: LUCENE-2501.patch


 I'm seeing the following exception during indexing:
 {code}
 Caused by: java.lang.ArrayIndexOutOfBoundsException: 14
 at org.apache.lucene.index.ByteBlockPool.allocSlice(ByteBlockPool.java:118)
 at 
 org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:490)
 at 
 org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:511)
 at 
 org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:104)
 at 
 org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:120)
 at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:468)
 at 
 org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
 at 
 org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246)
 at 
 org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:774)
 at 
 org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:757)
 at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2085)
 ... 37 more
 {code}
 This seems to be caused by the following code:
 {code}
 final int level = slice[upto] & 15;
 final int newLevel = nextLevelArray[level];
 final int newSize = levelSizeArray[newLevel];
 {code}
 This can result in level being a value between 0 and 14, while the array 
 nextLevelArray is only of size 10. 
 I suspect the solution is either to cap the level at 10, or to add more 
 entries to nextLevelArray so it has 15 entries. 
 However, I don't know whether something deeper is going wrong and this is just 
 where the exception surfaces.
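
A hedged sketch of the second suggestion above, assuming the level arrays keep their current contents from ByteBlockPool (the attached patch and the committed fix may differ):

{code}
// Illustrative only: pad nextLevelArray to 16 entries so any 4-bit level tag
// (0..15) decodes safely instead of overrunning a 10-entry array.
final static int[] nextLevelArray = {1, 2, 3, 4, 5, 6, 7, 8, 9, 9,
                                     9, 9, 9, 9, 9, 9};
final static int[] levelSizeArray = {5, 14, 20, 30, 40, 40, 80, 80, 120, 200};

int newSliceSize(final byte[] slice, final int upto) {
  final int level = slice[upto] & 15;          // low 4 bits: 0..15
  final int newLevel = nextLevelArray[level];  // safe for all 16 tags
  return levelSizeArray[newLevel];             // newLevel is capped at 9
}
{code}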

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index

2012-08-06 Thread Raintung Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429015#comment-13429015
 ] 

Raintung Li commented on SOLR-3684:
---

For 1, I want to test the index throughput in SolrCloud: I start 1000 threads 
in JMeter, and the SolrCloud server's Jetty max threads is 1.
Usually in a pressure test the throughput reaches its maximum, then holds or 
declines smoothly, and the final state is stable. In this case the JVM looks 
hung up: it is always doing full GC, the caches for StandardTokenizer cost too 
much memory, and the threads stay alive so the caches can't be released; new 
requests keep coming, and the throughput becomes very bad.

For 2, how do I create the per-field analyzer? Is it the same analyzer? 
analyzer.tokenStream has been declared final, so how do I create token streams 
for the different fields? For one thread using the same token stream it is 
safe; TokenStreamComponents is the thread's cache. Could you give more 
information?



 Frequently full gc while do pressure index
 --

 Key: SOLR-3684
 URL: https://issues.apache.org/jira/browse/SOLR-3684
 Project: Solr
  Issue Type: Improvement
  Components: multicore
Affects Versions: 4.0-ALPHA
 Environment: System: Linux
 Java process: 4G memory
 Jetty: 1000 threads 
 Index: 20 field
 Core: 5
Reporter: Raintung Li
Priority: Critical
  Labels: garbage, performance
 Fix For: 4.0

 Attachments: patch.txt

   Original Estimate: 168h
  Remaining Estimate: 168h

 Recently we tested the Solr index throughput and performance: we configured 
 20 fields for the test (field type is the normal text_general), started 1000 
 threads against Jetty, and defined 5 cores.
 After the test had run for some time, the Solr process throughput dropped 
 very quickly. Checking the root cause, we found the Java process constantly 
 doing full GC. 
 Checking the heap dump, the main object is StandardTokenizer; it is saved in 
 a CloseableThreadLocal by IndexSchema.SolrIndexAnalyzer.
 Solr uses PerFieldReuseStrategy as the default reuse-component strategy, 
 which means each field gets its own StandardTokenizer if it uses the standard 
 analyzer, and each StandardTokenizer occupies 32KB of memory because of its 
 zzBuffer char array.
 The worst case: Total memory = live threads * cores * fields * 32KB.
 In the test case that is 1000 * 5 * 20 * 32KB = 3.2GB for StandardTokenizer, 
 and those objects can only be released when their threads die.
 Suggestion:
 Every request is handled by exactly one thread, so one document is analyzed 
 by only one thread. That thread parses the document's fields one by one, so 
 fields of the same type can share the same reused component: when the thread 
 switches to another field of the same type, it only resets the input stream 
 of that component, which can save a lot of memory for fields of the same type.
 Total memory will be = live threads * cores * (distinct field types) * 32KB.
 The source-code change is simple; I can provide the modification patch for 
 IndexSchema.java: 
 private class SolrIndexAnalyzer extends AnalyzerWrapper {

   private class SolrFieldReuseStrategy extends ReuseStrategy {
     /** {@inheritDoc} */
     @SuppressWarnings("unchecked")
     public TokenStreamComponents getReusableComponents(String fieldName) {
       Map<Analyzer, TokenStreamComponents> componentsPerField =
           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
       return componentsPerField != null
           ? componentsPerField.get(analyzers.get(fieldName)) : null;
     }

     /** {@inheritDoc} */
     @SuppressWarnings("unchecked")
     public void setReusableComponents(String fieldName, TokenStreamComponents components) {
       Map<Analyzer, TokenStreamComponents> componentsPerField =
           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
       if (componentsPerField == null) {
         componentsPerField = new HashMap<Analyzer, TokenStreamComponents>();
         setStoredValue(componentsPerField);
       }
       componentsPerField.put(analyzers.get(fieldName), components);
     }
   }

   protected final static HashMap<String, Analyzer> analyzers;

   /**
    * Implementation of {@link ReuseStrategy} that reuses components per field by
    * maintaining a Map of TokenStreamComponent per field name.
    */
   SolrIndexAnalyzer() {
     super(new SolrFieldReuseStrategy());
     analyzers = analyzerCache();
   }

   protected HashMap<String, Analyzer> analyzerCache() {
     HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
     for (SchemaField f : getFields().values()) {
       Analyzer analyzer = 

[jira] [Created] (LUCENE-4290) basic highlighter that uses postings offsets

2012-08-06 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-4290:
---

 Summary: basic highlighter that uses postings offsets
 Key: LUCENE-4290
 URL: https://issues.apache.org/jira/browse/LUCENE-4290
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/other
Reporter: Robert Muir


We added IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS so you can 
efficiently compress character offsets in the postings list, but nothing yet 
makes use of this.

Here is a simple highlighter that uses them: it doesn't have many tests or 
fancy features, but I think it's OK for the sandbox/ (maybe with a couple more 
tests).

Additionally, I didn't do any benchmarking.
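
As background, such a highlighter can only work on fields whose postings actually carry offsets; a minimal indexing sketch against the 4.0 field API (the field name and writer are illustrative):

{code}
// Illustrative only: index a field with character offsets in the postings,
// the prerequisite for a postings-based highlighter.
FieldType ft = new FieldType(TextField.TYPE_STORED);
ft.setIndexOptions(FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
ft.freeze();

Document doc = new Document();
doc.add(new Field("body", "text to be highlighted later", ft));
writer.addDocument(doc);   // writer: an open IndexWriter
{code}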

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4290) basic highlighter that uses postings offsets

2012-08-06 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4290:


Attachment: LUCENE-4290.patch

 basic highlighter that uses postings offsets
 

 Key: LUCENE-4290
 URL: https://issues.apache.org/jira/browse/LUCENE-4290
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/other
Reporter: Robert Muir
 Attachments: LUCENE-4290.patch


 We added IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS so you can 
 efficiently compress character offsets in the postings list, but nothing yet 
 makes use of this.
 Here is a simple highlighter that uses them: it doesn't have many tests or 
 fancy features, but I think it's OK for the sandbox/ (maybe with a couple more 
 tests).
 Additionally, I didn't do any benchmarking.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4216) Token X exceeds length of provided text sized X

2012-08-06 Thread Ibrahim (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ibrahim updated LUCENE-4216:


Attachment: ArabicTokenizer.java

I have decreased the offset by the difference in length before and after 
removing Tashkeel. As for the other point, I really do not know what it means. 
I have tested it in both cases with a multi-valued field (since the offset 
affects end()) and found it working.


 Token X exceeds length of provided text sized X
 ---

 Key: LUCENE-4216
 URL: https://issues.apache.org/jira/browse/LUCENE-4216
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 4.0-ALPHA
 Environment: Windows 7, jdk1.6.0_27
Reporter: Ibrahim
 Attachments: ArabicTokenizer.java, myApp.zip


 I'm facing this exception:
 org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token رأيكم 
 exceeds length of provided text sized 170
   at 
 org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:233)
   at classes.myApp$16$1.run(myApp.java:1508)
 I tried to find anything wrong in my code when migrating from Lucene 3.6 to 
 4.0, without success. I found similar issues with HTMLStripCharFilter, e.g. 
 LUCENE-3690 and LUCENE-2208, but not with SimpleHTMLFormatter, so I'm raising 
 this here to see whether there is really a bug or something wrong in my code 
 with v4. The code that I'm using:
 final Highlighter highlighter = new Highlighter(new 
 SimpleHTMLFormatter("<font color=red>", "</font>"), new QueryScorer(query));
 ...
 final TokenStream tokenStream = 
 TokenSources.getAnyTokenStream(defaultSearcher.getIndexReader(), j, "Line", 
 analyzer);
 final TextFragment[] frag = highlighter.getBestTextFragments(tokenStream, 
 doc.get("Line"), false, 10);
 Please note that this is working fine with v3.6.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4216) Token X exceeds length of provided text sized X

2012-08-06 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429024#comment-13429024
 ] 

Uwe Schindler commented on LUCENE-4216:
---

Hi,

{code:java}
/** A tokenizer that will return tokens in the arabic alphabet. This tokenizer
 * is a bit rude since it also filters digits and punctuation, even in an arabic
 * part of stream. Well... I've planned to write a
 * universal, highly configurable, character tokenizer.
 * @author Pierrick Brihaye, 2003
 */
{code}

You don't need to implement your own ArabicTokenizer; just subclass the 
abstract Lucene class CharTokenizer, which has all the functionality this 
comment in your source code asks for. The change is easy: subclass directly, 
remove all code except isArabicChar, and rename that method to isTokenChar (it 
takes int, not char, but that's just a cast). The Tashkeel stuff should be done 
with a PatternReplaceFilter wrapped on top of this Tokenizer; there is no need 
to have it in the Tokenizer itself, and it makes the code complex. Then you can 
be 100% sure that all offsets are correct. The code you use is a duplicate, and 
it is too risky to reinvent the wheel when a well-tested variant is available 
with the Lucene distribution. It is much easier, trust me: no need to implement 
any crazy reset() methods!
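
A hedged sketch of that suggestion, assuming Lucene 4.0's CharTokenizer API; the Unicode-block predicate below stands in for the original isArabicChar logic:

{code}
// Illustrative only: a CharTokenizer subclass plus a PatternReplaceFilter,
// replacing the hand-written ArabicTokenizer attached to this issue.
public final class SimpleArabicTokenizer extends CharTokenizer {
  public SimpleArabicTokenizer(Version matchVersion, Reader input) {
    super(matchVersion, input);
  }

  @Override
  protected boolean isTokenChar(int c) {   // was isArabicChar(char); takes int now
    return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.ARABIC;
  }
}

// Tashkeel removal as a filter on top instead of inside the tokenizer:
Tokenizer source = new SimpleArabicTokenizer(Version.LUCENE_40, reader);
TokenStream result = new PatternReplaceFilter(source,
    Pattern.compile("[\\u064B-\\u0652]"), "", true);   // diacritics range assumed
{code}

Because the tokenizer alone assigns offsets, stripping Tashkeel in a downstream filter leaves the offsets pointing at the original text, which is exactly what the highlighter needs.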

 Token X exceeds length of provided text sized X
 ---

 Key: LUCENE-4216
 URL: https://issues.apache.org/jira/browse/LUCENE-4216
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 4.0-ALPHA
 Environment: Windows 7, jdk1.6.0_27
Reporter: Ibrahim
 Attachments: ArabicTokenizer.java, myApp.zip


 I'm facing this exception:
 org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token رأيكم 
 exceeds length of provided text sized 170
   at 
 org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:233)
   at classes.myApp$16$1.run(myApp.java:1508)
 I tried to find anything wrong in my code when migrating from Lucene 3.6 to 
 4.0, without success. I found similar issues with HTMLStripCharFilter, e.g. 
 LUCENE-3690 and LUCENE-2208, but not with SimpleHTMLFormatter, so I'm raising 
 this here to see whether there is really a bug or something wrong in my code 
 with v4. The code that I'm using:
 final Highlighter highlighter = new Highlighter(new 
 SimpleHTMLFormatter("<font color=red>", "</font>"), new QueryScorer(query));
 ...
 final TokenStream tokenStream = 
 TokenSources.getAnyTokenStream(defaultSearcher.getIndexReader(), j, "Line", 
 analyzer);
 final TextFragment[] frag = highlighter.getBestTextFragments(tokenStream, 
 doc.get("Line"), false, 10);
 Please note that this is working fine with v3.6.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4216) Token X exceeds length of provided text sized X

2012-08-06 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429025#comment-13429025
 ] 

Uwe Schindler commented on LUCENE-4216:
---

It is also much more performant: your code creates regex matchers all the 
time and copies the token chars to new Strings all the time, instead of working 
directly on the CharTermAttribute (which extends CharSequence, so it can do 
regexes directly).
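
A small sketch of that point, assuming a TokenFilter holding a CharTermAttribute (illustrative, not code from the attachment):

{code}
// Illustrative only: reuse one Matcher directly on the CharTermAttribute
// (a CharSequence) instead of building a new String for every token.
private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
private final Matcher tashkeel = Pattern.compile("[\\u064B-\\u0652]").matcher("");

@Override
public boolean incrementToken() throws IOException {
  if (!input.incrementToken()) {
    return false;
  }
  tashkeel.reset(termAtt);   // no String copy; the term chars are read in place
  if (tashkeel.find()) {
    // ... rewrite the term buffer in place if needed ...
  }
  return true;
}
{code}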

 Token X exceeds length of provided text sized X
 ---

 Key: LUCENE-4216
 URL: https://issues.apache.org/jira/browse/LUCENE-4216
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 4.0-ALPHA
 Environment: Windows 7, jdk1.6.0_27
Reporter: Ibrahim
 Attachments: ArabicTokenizer.java, myApp.zip


 I'm facing this exception:
 org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token رأيكم 
 exceeds length of provided text sized 170
   at 
 org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:233)
   at classes.myApp$16$1.run(myApp.java:1508)
 I tried to find anything wrong in my code when migrating from Lucene 3.6 to 
 4.0, without success. I found similar issues with HTMLStripCharFilter, e.g. 
 LUCENE-3690 and LUCENE-2208, but not with SimpleHTMLFormatter, so I'm raising 
 this here to see whether there is really a bug or something wrong in my code 
 with v4. The code that I'm using:
 final Highlighter highlighter = new Highlighter(new 
 SimpleHTMLFormatter("<font color=red>", "</font>"), new QueryScorer(query));
 ...
 final TokenStream tokenStream = 
 TokenSources.getAnyTokenStream(defaultSearcher.getIndexReader(), j, "Line", 
 analyzer);
 final TextFragment[] frag = highlighter.getBestTextFragments(tokenStream, 
 doc.get("Line"), false, 10);
 Please note that this is working fine with v3.6.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index

2012-08-06 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429027#comment-13429027
 ] 

Mikhail Khludnev commented on SOLR-3684:


Hello, 

Q1 gives one more use case for SOLR-3585. It uses a dedicated thread pool with 
limited capacity to process updates, so the core challenge would be solved. 

Raintung, 
updating with a storm of small messages is not common in the search-engine 
world. The usual way is to collect them into bulks and index them with a 
modest number of threads. Sooner or later indexing hits the I/O limit, so 
there is no profit in occupying CPUs with a huge number of indexing threads.  
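
A minimal SolrJ sketch of that bulking pattern (the field names, queue, and bulk size are assumptions, not from this issue):

{code}
// Illustrative only: collect small updates into bulks and index them with a
// modest number of threads instead of one request per message.
List<SolrInputDocument> bulk = new ArrayList<SolrInputDocument>();
for (String[] row : pendingRows) {   // pendingRows: the queued small updates
  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("id", row[0]);
  doc.addField("text", row[1]);
  bulk.add(doc);
  if (bulk.size() >= 1000) {         // bulk threshold is illustrative
    server.add(bulk);                // one request indexes the whole bulk
    bulk.clear();
  }
}
if (!bulk.isEmpty()) {
  server.add(bulk);
}
server.commit();
{code}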


 Frequently full gc while do pressure index
 --

 Key: SOLR-3684
 URL: https://issues.apache.org/jira/browse/SOLR-3684
 Project: Solr
  Issue Type: Improvement
  Components: multicore
Affects Versions: 4.0-ALPHA
 Environment: System: Linux
 Java process: 4G memory
 Jetty: 1000 threads 
 Index: 20 field
 Core: 5
Reporter: Raintung Li
Priority: Critical
  Labels: garbage, performance
 Fix For: 4.0

 Attachments: patch.txt

   Original Estimate: 168h
  Remaining Estimate: 168h

 Recently we tested the Solr index throughput and performance: we configured 
 20 fields for the test (field type is the normal text_general), started 1000 
 threads against Jetty, and defined 5 cores.
 After the test had run for some time, the Solr process throughput dropped 
 very quickly. Checking the root cause, we found the Java process constantly 
 doing full GC. 
 Checking the heap dump, the main object is StandardTokenizer; it is saved in 
 a CloseableThreadLocal by IndexSchema.SolrIndexAnalyzer.
 Solr uses PerFieldReuseStrategy as the default reuse-component strategy, 
 which means each field gets its own StandardTokenizer if it uses the standard 
 analyzer, and each StandardTokenizer occupies 32KB of memory because of its 
 zzBuffer char array.
 The worst case: Total memory = live threads * cores * fields * 32KB.
 In the test case that is 1000 * 5 * 20 * 32KB = 3.2GB for StandardTokenizer, 
 and those objects can only be released when their threads die.
 Suggestion:
 Every request is handled by exactly one thread, so one document is analyzed 
 by only one thread. That thread parses the document's fields one by one, so 
 fields of the same type can share the same reused component: when the thread 
 switches to another field of the same type, it only resets the input stream 
 of that component, which can save a lot of memory for fields of the same type.
 Total memory will be = live threads * cores * (distinct field types) * 32KB.
 The source-code change is simple; I can provide the modification patch for 
 IndexSchema.java: 
 private class SolrIndexAnalyzer extends AnalyzerWrapper {

   private class SolrFieldReuseStrategy extends ReuseStrategy {
     /** {@inheritDoc} */
     @SuppressWarnings("unchecked")
     public TokenStreamComponents getReusableComponents(String fieldName) {
       Map<Analyzer, TokenStreamComponents> componentsPerField =
           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
       return componentsPerField != null
           ? componentsPerField.get(analyzers.get(fieldName)) : null;
     }

     /** {@inheritDoc} */
     @SuppressWarnings("unchecked")
     public void setReusableComponents(String fieldName, TokenStreamComponents components) {
       Map<Analyzer, TokenStreamComponents> componentsPerField =
           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
       if (componentsPerField == null) {
         componentsPerField = new HashMap<Analyzer, TokenStreamComponents>();
         setStoredValue(componentsPerField);
       }
       componentsPerField.put(analyzers.get(fieldName), components);
     }
   }

   protected final static HashMap<String, Analyzer> analyzers;

   /**
    * Implementation of {@link ReuseStrategy} that reuses components per field by
    * maintaining a Map of TokenStreamComponent per field name.
    */
   SolrIndexAnalyzer() {
     super(new SolrFieldReuseStrategy());
     analyzers = analyzerCache();
   }

   protected HashMap<String, Analyzer> analyzerCache() {
     HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
     for (SchemaField f : getFields().values()) {
       Analyzer analyzer = f.getType().getAnalyzer();
       cache.put(f.getName(), analyzer);
     }
     return cache;
   }

   @Override
   protected Analyzer getWrappedAnalyzer(String fieldName) {
     Analyzer analyzer = analyzers.get(fieldName);
     return analyzer != null ? analyzer : 
 

[jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index

2012-08-06 Thread Raintung Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429038#comment-13429038
 ] 

Raintung Li commented on SOLR-3684:
---

Hi Mikhail,

It isn't really a storm: only 1000 clients send messages, and we have three 
solr index servers; all servers show the same issues. 

My suggestion just aims to reduce wasteful memory, even though memory is cheap 
now. To improve performance and avoid the I/O limit we keep data in memory, 
but we also need to account for the memory usage even if the JVM manages the 
memory for us.

BTW, the default Jetty thread config is 1 in Solr; in this case every 
server has more than 1000 live threads.






 Frequently full gc while do pressure index
 --

 Key: SOLR-3684
 URL: https://issues.apache.org/jira/browse/SOLR-3684
 Project: Solr
  Issue Type: Improvement
  Components: multicore
Affects Versions: 4.0-ALPHA
 Environment: System: Linux
 Java process: 4G memory
 Jetty: 1000 threads 
 Index: 20 field
 Core: 5
Reporter: Raintung Li
Priority: Critical
  Labels: garbage, performance
 Fix For: 4.0

 Attachments: patch.txt

   Original Estimate: 168h
  Remaining Estimate: 168h

 Recently we tested the Solr index throughput and performance: we configured 
 20 fields for the test (field type is the normal text_general), started 1000 
 threads against Jetty, and defined 5 cores.
 After the test had run for some time, the Solr process throughput dropped 
 very quickly. Checking the root cause, we found the Java process constantly 
 doing full GC. 
 Checking the heap dump, the main object is StandardTokenizer; it is saved in 
 a CloseableThreadLocal by IndexSchema.SolrIndexAnalyzer.
 Solr uses PerFieldReuseStrategy as the default reuse-component strategy, 
 which means each field gets its own StandardTokenizer if it uses the standard 
 analyzer, and each StandardTokenizer occupies 32KB of memory because of its 
 zzBuffer char array.
 The worst case: Total memory = live threads * cores * fields * 32KB.
 In the test case that is 1000 * 5 * 20 * 32KB = 3.2GB for StandardTokenizer, 
 and those objects can only be released when their threads die.
 Suggestion:
 Every request is handled by exactly one thread, so one document is analyzed 
 by only one thread. That thread parses the document's fields one by one, so 
 fields of the same type can share the same reused component: when the thread 
 switches to another field of the same type, it only resets the input stream 
 of that component, which can save a lot of memory for fields of the same type.
 Total memory will be = live threads * cores * (distinct field types) * 32KB.
 The source-code change is simple; I can provide the modification patch for 
 IndexSchema.java: 
 private class SolrIndexAnalyzer extends AnalyzerWrapper {

   private class SolrFieldReuseStrategy extends ReuseStrategy {
     /** {@inheritDoc} */
     @SuppressWarnings("unchecked")
     public TokenStreamComponents getReusableComponents(String fieldName) {
       Map<Analyzer, TokenStreamComponents> componentsPerField =
           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
       return componentsPerField != null
           ? componentsPerField.get(analyzers.get(fieldName)) : null;
     }

     /** {@inheritDoc} */
     @SuppressWarnings("unchecked")
     public void setReusableComponents(String fieldName, TokenStreamComponents components) {
       Map<Analyzer, TokenStreamComponents> componentsPerField =
           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
       if (componentsPerField == null) {
         componentsPerField = new HashMap<Analyzer, TokenStreamComponents>();
         setStoredValue(componentsPerField);
       }
       componentsPerField.put(analyzers.get(fieldName), components);
     }
   }

   protected final static HashMap<String, Analyzer> analyzers;

   /**
    * Implementation of {@link ReuseStrategy} that reuses components per field by
    * maintaining a Map of TokenStreamComponent per field name.
    */
   SolrIndexAnalyzer() {
     super(new SolrFieldReuseStrategy());
     analyzers = analyzerCache();
   }

   protected HashMap<String, Analyzer> analyzerCache() {
     HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
     for (SchemaField f : getFields().values()) {
       Analyzer analyzer = f.getType().getAnalyzer();
       cache.put(f.getName(), analyzer);
     }
     return cache;
   }

   @Override
   protected Analyzer getWrappedAnalyzer(String fieldName) {
     Analyzer analyzer = analyzers.get(fieldName);
     return analyzer != 

[jira] [Resolved] (LUCENE-2501) ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice

2012-08-06 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2501.


   Resolution: Fixed
Fix Version/s: 3.6
   5.0
   4.0

Thanks for bringing closure, Gili.

 ArrayIndexOutOfBoundsException in ByteBlockPool.allocSlice
 --

 Key: LUCENE-2501
 URL: https://issues.apache.org/jira/browse/LUCENE-2501
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 3.0.1
Reporter: Tim Smith
 Fix For: 4.0, 5.0, 3.6

 Attachments: LUCENE-2501.patch


 I'm seeing the following exception during indexing:
 {code}
 Caused by: java.lang.ArrayIndexOutOfBoundsException: 14
 at org.apache.lucene.index.ByteBlockPool.allocSlice(ByteBlockPool.java:118)
 at 
 org.apache.lucene.index.TermsHashPerField.writeByte(TermsHashPerField.java:490)
 at 
 org.apache.lucene.index.TermsHashPerField.writeVInt(TermsHashPerField.java:511)
 at 
 org.apache.lucene.index.FreqProxTermsWriterPerField.writeProx(FreqProxTermsWriterPerField.java:104)
 at 
 org.apache.lucene.index.FreqProxTermsWriterPerField.newTerm(FreqProxTermsWriterPerField.java:120)
 at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:468)
 at 
 org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:174)
 at 
 org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246)
 at 
 org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:774)
 at 
 org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:757)
 at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2085)
 ... 37 more
 {code}
 This seems to be caused by the following code:
 {code}
 final int level = slice[upto] & 15;
 final int newLevel = nextLevelArray[level];
 final int newSize = levelSizeArray[newLevel];
 {code}
 This can result in level being a value between 0 and 14, while the array 
 nextLevelArray is only of size 10. 
 I suspect the solution is either to cap the level at 10, or to add more 
 entries to nextLevelArray so it has 15 entries. 
 However, I don't know whether something deeper is going wrong and this is just 
 where the exception surfaces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4290) basic highlighter that uses postings offsets

2012-08-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429049#comment-13429049
 ] 

Michael McCandless commented on LUCENE-4290:


Wow :)  This looks very nice!

Should we move EMPTY into DocsAndPositionsEnum?

This isn't just a cutover from term vectors to postings, right? It actually 
scores each passage as if it were its own hit/document matching a search? 
I.e., the passage ranking/selection differs from the two existing highlighters.

I like the EMPTY_INDEXREADER (so MTQs do no rewrite work).

 basic highlighter that uses postings offsets
 

 Key: LUCENE-4290
 URL: https://issues.apache.org/jira/browse/LUCENE-4290
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/other
Reporter: Robert Muir
 Attachments: LUCENE-4290.patch


 We added IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS so you can 
 efficiently compress character offsets in the postings list, but nothing yet 
 makes use of this.
 Here is a simple highlighter that uses them: it doesn't have many tests or 
 fancy features, but I think it's OK for the sandbox/ (maybe with a couple more 
 tests).
 Additionally, I didn't do any benchmarking.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-3684) Frequently full gc while do pressure index

2012-08-06 Thread Raintung Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429038#comment-13429038
 ] 

Raintung Li edited comment on SOLR-3684 at 8/6/12 9:51 AM:
---

Hi Mikhail,

It isn't really a storm: only 1000 clients send messages, and we have three 
solr index servers; all servers show the same issues. 

My suggestion just aims to reduce wasteful memory, even though memory is cheap 
now. To improve performance and avoid the I/O limit we keep data in memory, 
but we also need to account for the memory usage even if the JVM manages the 
memory for us.

BTW, the default Jetty thread config is 1 in Solr; in this case every 
server has more than 1000 live threads.






  was (Author: raintung.li):
Hi Mikhail,

It isn't really storm that only 1000 client send the message, and we have three 
solr index servers, and all servers have the same issues. 

My suggestion just want to reduce wasteful memory, although memory is cheap 
now. To improve the performance to avoid io limit, we save into the memory, but 
also need calculate the memory usage even if JVM help us to manage the memory.

BTW, the default Jetty thread config is 1 in the solr, in this cause the 
every server's alive threads are more than 1000.





  
 Frequently full gc while do pressure index
 --

 Key: SOLR-3684
 URL: https://issues.apache.org/jira/browse/SOLR-3684
 Project: Solr
  Issue Type: Improvement
  Components: multicore
Affects Versions: 4.0-ALPHA
 Environment: System: Linux
 Java process: 4G memory
 Jetty: 1000 threads 
 Index: 20 field
 Core: 5
Reporter: Raintung Li
Priority: Critical
  Labels: garbage, performance
 Fix For: 4.0

 Attachments: patch.txt

   Original Estimate: 168h
  Remaining Estimate: 168h

 Recently we tested the Solr index throughput and performance: we configured 
 20 fields for the test (field type is the normal text_general), started 1000 
 threads against Jetty, and defined 5 cores.
 After the test had run for some time, the Solr process throughput dropped 
 very quickly. Checking the root cause, we found the Java process constantly 
 doing full GC. 
 Checking the heap dump, the main object is StandardTokenizer; it is saved in 
 a CloseableThreadLocal by IndexSchema.SolrIndexAnalyzer.
 Solr uses PerFieldReuseStrategy as the default reuse-component strategy, 
 which means each field gets its own StandardTokenizer if it uses the standard 
 analyzer, and each StandardTokenizer occupies 32KB of memory because of its 
 zzBuffer char array.
 The worst case: Total memory = live threads * cores * fields * 32KB.
 In the test case that is 1000 * 5 * 20 * 32KB = 3.2GB for StandardTokenizer, 
 and those objects can only be released when their threads die.
 Suggestion:
 Every request is handled by exactly one thread, so one document is analyzed 
 by only one thread. That thread parses the document's fields one by one, so 
 fields of the same type can share the same reused component: when the thread 
 switches to another field of the same type, it only resets the input stream 
 of that component, which can save a lot of memory for fields of the same type.
 Total memory will be = live threads * cores * (distinct field types) * 32KB.
 The source-code change is simple; I can provide the modification patch for 
 IndexSchema.java: 
 private class SolrIndexAnalyzer extends AnalyzerWrapper {

   private class SolrFieldReuseStrategy extends ReuseStrategy {
     /** {@inheritDoc} */
     @SuppressWarnings("unchecked")
     public TokenStreamComponents getReusableComponents(String fieldName) {
       Map<Analyzer, TokenStreamComponents> componentsPerField =
           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
       return componentsPerField != null
           ? componentsPerField.get(analyzers.get(fieldName)) : null;
     }

     /** {@inheritDoc} */
     @SuppressWarnings("unchecked")
     public void setReusableComponents(String fieldName, TokenStreamComponents components) {
       Map<Analyzer, TokenStreamComponents> componentsPerField =
           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
       if (componentsPerField == null) {
         componentsPerField = new HashMap<Analyzer, TokenStreamComponents>();
         setStoredValue(componentsPerField);
       }
       componentsPerField.put(analyzers.get(fieldName), components);
     }
   }

   protected final static HashMap<String, Analyzer> analyzers;

   /**
    * Implementation of {@link ReuseStrategy} that reuses components per field by
    * maintaining a Map of 

[jira] [Updated] (SOLR-3473) Distributed deduplication broken

2012-08-06 Thread Markus Jelsma (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated SOLR-3473:


Attachment: SOLR-3473-trunk-2.patch

Hello - Could the deleteByQuery issue you mention be fixed with SOLR-3473? I've 
attached an updated patch for today's trunk. The previous patch was missing the 
signature field, but I added it to one schema. Now other tests seem to fail 
because they don't see the sig field but do use the update chain.

Anyway, it seems BasicDistributedZkTest passes, but I'm not very sure; there's 
too much log output, but it doesn't fail.

 Distributed deduplication broken
 

 Key: SOLR-3473
 URL: https://issues.apache.org/jira/browse/SOLR-3473
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud, update
Affects Versions: 4.0-ALPHA
Reporter: Markus Jelsma
 Fix For: 4.0

 Attachments: SOLR-3473-trunk-2.patch, SOLR-3473.patch, SOLR-3473.patch


 Solr's deduplication via the SignatureUpdateProcessor is broken for 
 distributed updates on SolrCloud.
 Mark Miller:
 {quote}
 Looking again at the SignatureUpdateProcessor code, I think that indeed this 
 won't currently work with distrib updates. Could you file a JIRA issue for 
 that? The problem is that we convert update commands into solr documents - 
 and that can cause a loss of info if an update proc modifies the update 
 command.
 I think the reason that you see a multiple values error when you try the 
 other order is because of the lack of a document clone (the other issue I 
 mentioned a few emails back). Addressing that won't solve your issue though - 
 we have to come up with a way to propagate the currently lost info on the 
 update command.
 {quote}
 Please see the ML thread for the full discussion: 
 http://lucene.472066.n3.nabble.com/SolrCloud-deduplication-td3984657.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4290) basic highlighter that uses postings offsets

2012-08-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429089#comment-13429089
 ] 

Robert Muir commented on LUCENE-4290:
-

{quote}
Should we move EMPTY into DocsAndPositionsEnum?
{quote}

Maybe it can be either moved or removed if the code is fixed :)

In this first patch it's used both as a sentinel for a stopping condition and as
a placeholder for "term doesn't exist in this segment". The former I think is
no longer necessary, and the latter is probably overkill.

{quote}
This isn't just a cutover from term vectors to postings right? It actually 
scores each passage as if it were its own hit/document matching a search? Ie 
the passage ranking/selection differs from the two existing highlighters.
{quote}

Right: I think it's different in a number of ways. I hope it should be really 
fast, but again I didn't even bother benchmarking yet.

It's also limited in some ways, since it's just a prototype.


 basic highlighter that uses postings offsets
 

 Key: LUCENE-4290
 URL: https://issues.apache.org/jira/browse/LUCENE-4290
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/other
Reporter: Robert Muir
 Attachments: LUCENE-4290.patch


 We added IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS so you can 
 efficiently compress character offsets in the postings list, but nothing yet 
 makes use of this.
 Here is a simple highlighter that uses them: it doesn't have many tests or 
 fancy features, but I think it's OK for the sandbox/ (maybe with a couple more 
 tests).
 Additionally, I didn't do any benchmarking.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-4.x-Windows (32bit/jdk1.7.0_05) - Build # 112 - Failure!

2012-08-06 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows/112/
Java: 32bit/jdk1.7.0_05 -server -XX:+UseSerialGC

All tests passed

Build Log:
[...truncated 20006 lines...]
javadocs-lint:

[...truncated 1674 lines...]
BUILD FAILED
C:\Jenkins\workspace\Lucene-Solr-4.x-Windows\build.xml:47: The following error 
occurred while executing this line:
C:\Jenkins\workspace\Lucene-Solr-4.x-Windows\lucene\build.xml:524: The 
following error occurred while executing this line:
C:\Jenkins\workspace\Lucene-Solr-4.x-Windows\lucene\build.xml:514: exec 
returned: 1

Total time: 41 minutes 52 seconds
Build step 'Invoke Ant' marked build as failure
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-3649) The javabin update request handler does not seem to be working properly when calling solrj method HttpSolrServer.deleteById(List<String> ids).

2012-08-06 Thread Sami Siren (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sami Siren updated SOLR-3649:
-

Attachment: SOLR-3649.patch

Here's a patch that fixes the test for deleting by multiple ids + a proposed 
fix.
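
For reference, a sketch of the call that exercises the bug (the server URL is assumed):

{code}
// Illustrative only: delete-by-id with a list of ids over javabin.
HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
server.deleteById(Arrays.asList("id1", "id2", "id3"));
server.commit();   // all three should be gone; before the fix only one was deleted
{code}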

 The javabin update request handler does not seem to be working properly when 
 calling solrj method HttpSolrServer.deleteById(List<String> ids).
 --

 Key: SOLR-3649
 URL: https://issues.apache.org/jira/browse/SOLR-3649
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Reporter: Mark Miller
Priority: Minor
 Fix For: 4.0, 5.0

 Attachments: SOLR-3649.patch


 A single Id gets deleted from the index as opposed to the full list. It
 appears properly in the logs - shows delete of all Ids sent, although all
 but one remain in the index.
 As reported on the mailing list 
 http://lucene.472066.n3.nabble.com/Solr-4-Alpha-SolrJ-Indexing-Issue-tp3995781p3996074.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3685) solrcloud crashes on startup due to excessive memory consumption

2012-08-06 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429132#comment-13429132
 ] 

Markus Jelsma commented on SOLR-3685:
-

Each node has two cores and allows only one warming searcher at any time. The 
problem is triggered on start-up after a graceful shutdown as well as after a 
hard power-off. I've seen it happen not only when the whole cluster is 
restarted (I don't think I've ever done that) but also when just one node of 
the 6-shard, 2-replica test cluster is.

The attached log is of one node being restarted out of the whole cluster.

Could the off-heap RAM be part of data being sent over the wire?

We've worked around the problem for now by getting more RAM.

 solrcloud crashes on startup due to excessive memory consumption
 

 Key: SOLR-3685
 URL: https://issues.apache.org/jira/browse/SOLR-3685
 Project: Solr
  Issue Type: Bug
  Components: replication (java), SolrCloud
Affects Versions: 4.0-ALPHA
 Environment: Debian GNU/Linux Squeeze 64bit
 Solr 5.0-SNAPSHOT 1365667M - markus - 2012-07-25 19:09:43
Reporter: Markus Jelsma
Priority: Critical
 Fix For: 4.1

 Attachments: info.log


 There's a serious problem with restarting nodes, not cleaning old or unused 
 index directories and sudden replication and Java being killed by the OS due 
 to excessive memory allocation. Since SOLR-1781 was fixed index directories 
 get cleaned up when a node is being restarted cleanly, however, old or unused 
 index directories still pile up if Solr crashes or is being killed by the OS, 
 happening here.
 We have a six-node 64-bit Linux test cluster with each node having two 
 shards. There's 512MB RAM available and no swap. Each index is roughly 27MB 
 so about 50MB per node, this fits easily and works fine. However, if a node 
 is being restarted, Solr will consistently crash because it immediately eats 
 up all RAM. If swap is enabled Solr will eat an additional few 100MB's right 
 after start up.
 This cannot be solved by restarting Solr, it will just crash again and leave 
 index directories in place until the disk is full. The only way i can restart 
 a node safely is to delete the index directories and have it replicate from 
 another node. If i then restart the node it will crash almost consistently.
 I'll attach a log of one of the nodes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory

2012-08-06 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429166#comment-13429166
 ] 

Erik Hatcher commented on SOLR-1725:


bq. How do these tests pass under Ant?

Maybe this is due to some libraries Ant itself includes in the classpath of 
the running tests?

I'll go ahead and re-open this issue so it is red-flagged as something we 
should resolve before the 4.0 final release.

Perhaps we can include a scripting implementation in Solr, at least for testing 
purposes but maybe also to ship with, to ensure this works out of the box on 
all JVMs.  jruby.jar would be nice to have handy always :)

 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
Assignee: Erik Hatcher
  Labels: UpdateProcessor
 Fix For: 4.0

 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch


 A script-based UpdateRequestProcessorFactory (uses the JDK6 script engine 
 support). The main goal of this plugin is to make it possible to 
 configure/write update processors without the need to write and package Java 
 code.
 The update request processor factory enables writing update processors as 
 scripts located in the {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}}, which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in the Solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 names (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved from the script file extension (that is, 
 a *.js file will be treated as a JavaScript script), therefore an extension 
 is mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - the SolrQueryRequest
  * {{rsp}} - the SolrQueryResponse
  * {{logger}} - a logger that can be used for logging purposes in the script
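
For orientation, a hedged sketch of the JDK6 scripting machinery such a factory builds on (illustrative only; the method and file names are assumptions, not the factory's actual code):

{code}
// Illustrative only: load a script by extension and invoke a processor method.
ScriptEngineManager manager = new ScriptEngineManager();
ScriptEngine engine = manager.getEngineByExtension("js");  // language from extension
engine.put("req", req);       // the globals listed above
engine.put("rsp", rsp);
engine.put("logger", logger);
engine.eval(new FileReader(new File(confDir, "scriptA.js")));
((Invocable) engine).invokeFunction("processAdd", cmd);    // mirrors the processor API
{code}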

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Reopened] (SOLR-1725) Script based UpdateRequestProcessorFactory

2012-08-06 Thread Erik Hatcher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher reopened SOLR-1725:


  Assignee: (was: Erik Hatcher)

re-opening to have the tests (specifically the failing Maven run) looked at.

 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
  Labels: UpdateProcessor
 Fix For: 4.0

 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch


 A script-based UpdateRequestProcessorFactory (uses the JDK6 script engine 
 support). The main goal of this plugin is to make it possible to 
 configure/write update processors without the need to write and package Java 
 code.
 The update request processor factory enables writing update processors as 
 scripts located in the {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}}, which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in the Solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 names (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved from the script file extension (that is, 
 a *.js file will be treated as a JavaScript script), therefore an extension 
 is mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - the SolrQueryRequest
  * {{rsp}} - the SolrQueryResponse
  * {{logger}} - a logger that can be used for logging purposes in the script

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-1725) Script based UpdateRequestProcessorFactory

2012-08-06 Thread Erik Hatcher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher updated SOLR-1725:
---

Priority: Critical  (was: Major)

marking the re-opening as critical to fix, hopefully at least before 4.0 
final.

 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
Priority: Critical
  Labels: UpdateProcessor
 Fix For: 4.0

 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch


 A script-based UpdateRequestProcessorFactory (uses the JDK6 script engine 
 support). The main goal of this plugin is to make it possible to 
 configure/write update processors without the need to write and package Java 
 code.
 The update request processor factory enables writing update processors as 
 scripts located in the {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}}, which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in the Solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 names (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved from the script file extension (that is, 
 a *.js file will be treated as a JavaScript script), therefore an extension 
 is mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - the SolrQueryRequest
  * {{rsp}} - the SolrQueryResponse
  * {{logger}} - a logger that can be used for logging purposes in the script

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index

2012-08-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429173#comment-13429173
 ] 

Robert Muir commented on SOLR-3684:
---

{quote}
BTW, the default Jetty thread config is 1 in the solr,
{quote}

Can we address this default thread config with a patch? This doesn't seem good: 
I guess if someone doesn't fix this, I can easily DoS Solrs into eating up all 
their RAM until rebooted. Something like 100 seems just fine for 
QueuedThreadPool, so it will block in such cases (and probably just end up 
being faster overall).
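
For reference, capping the pool would be a one-line change in the example {{jetty.xml}}; a hedged sketch (element layout per Jetty's {{QueuedThreadPool}}, with 100 being the value suggested above, not a tested recommendation):

{noformat}
<!-- etc/jetty.xml -->
<Set name="ThreadPool">
  <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
    <Set name="maxThreads">100</Set>
  </New>
</Set>
{noformat}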

{quote}
For 2, how do we create the per-field analyzer? Is it the same analyzer? 
analyzer.tokenStream has been declared final, so how do we create the 
tokenStream for the different fields? For one thread using the same 
tokenstream it is safe, since TokenStreamComponents is the thread's cache. 
Could you give more information?
{quote}

Well basically your patch should be a nice improvement about 99.9% of the time. 
There is a (maybe only theoretical)
case where someone has a lucene Analyzer MyAnalyzer configured as:
{quote}
<fieldType name="text_custom" class="solr.TextField">
  <analyzer class="com.mypackage.MyAnalyzer"/>
</fieldType>
...
<field name="foo" type="text_custom" .../>
<field name="bar" type="text_custom" .../>
...
{quote}

If MyAnalyzer has different behavior for foo versus bar, then 
reuse-by-field-type will be incorrect. I'll think
about a workaround; maybe nobody is even doing this or depends on this. But I 
just don't know if the same thing
could happen for custom fieldtypes or whatever. It's just the kind of thing that 
could be a sneaky bug in the future.

But I agree with the patch! I'll see if we can address it somehow.

Separately I think we should also open an issue to reduce these jflex buffer 
sizes. char[16k] seems like serious
overkill, the other tokenizers in lucene use char[4k].


 Frequently full gc while do pressure index
 --

 Key: SOLR-3684
 URL: https://issues.apache.org/jira/browse/SOLR-3684
 Project: Solr
  Issue Type: Improvement
  Components: multicore
Affects Versions: 4.0-ALPHA
 Environment: System: Linux
 Java process: 4G memory
 Jetty: 1000 threads 
 Index: 20 field
 Core: 5
Reporter: Raintung Li
Priority: Critical
  Labels: garbage, performance
 Fix For: 4.0

 Attachments: patch.txt

   Original Estimate: 168h
  Remaining Estimate: 168h

 Recently we tested the Solr indexing throughput and performance: we configured 
 20 fields for the test (the field type is the normal text_general), started 
 1000 threads for Jetty, and defined 5 cores.
 After the test had run for some time, the Solr process throughput dropped very 
 quickly. Checking the root cause, we found the Java process was constantly 
 doing full GCs.
 Checking the heap dump, the main object is StandardTokenizer; it is kept in 
 the CloseableThreadLocal by IndexSchema.SolrIndexAnalyzer.
 Solr uses PerFieldReuseStrategy as the default reuse-component strategy. That 
 means each field has its own StandardTokenizer if it uses the standard 
 analyzer, and each StandardTokenizer occupies 32KB of memory because of the 
 zzBuffer char array.
 The worst case: total memory = live threads*cores*fields*32KB
 In the test case, that is 1000*5*20*32KB = 3.2G for StandardTokenizer, and 
 those objects can only be released when their thread dies.
 Suggestion:
 Every request is handled by one thread, which means one document is analyzed 
 by only one thread. That thread parses the document’s fields step by step, so 
 fields of the same type can use the same reused component. When the thread 
 switches to another field of the same type, the analysis only resets the 
 input stream of the shared component, which can save a lot of memory for 
 same-typed fields.
 Total memory will then be = live threads*cores*(distinct field types)*32KB
 The source code modification is simple; I can provide the patch for 
 IndexSchema.java: 
 private class SolrIndexAnalyzer extends AnalyzerWrapper {
 
   private class SolrFieldReuseStrategy extends ReuseStrategy {
     /**
      * {@inheritDoc}
      */
     @SuppressWarnings("unchecked")
     public TokenStreamComponents getReusableComponents(String fieldName) {
       Map<Analyzer, TokenStreamComponents> componentsPerField =
           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
       return componentsPerField != null ?
           componentsPerField.get(analyzers.get(fieldName)) : null;
     }
     /**
      * {@inheritDoc}
      */
     @SuppressWarnings("unchecked")
     public void setReusableComponents(String fieldName,
         TokenStreamComponents components) {
       Map<Analyzer, TokenStreamComponents> componentsPerField =
           (Map<Analyzer, 
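
Since the quoted snippet is truncated, a minimal, hedged sketch of the same reuse-by-field-type idea follows. It mirrors the snippet above against the 4.0-alpha {{Analyzer.ReuseStrategy}} API; the {{analyzers}} map (field name to per-field-type Analyzer) is assumed from {{IndexSchema}}, and {{java.util}} imports are omitted as in the original.

{code}
// Sketch: cache TokenStreamComponents per Analyzer *instance* (i.e. per
// field type) instead of per field name, so fields sharing a type share
// one tokenizer per thread.
private class SolrFieldReuseStrategy extends ReuseStrategy {

  @SuppressWarnings("unchecked")
  public TokenStreamComponents getReusableComponents(String fieldName) {
    Map<Analyzer, TokenStreamComponents> perAnalyzer =
        (Map<Analyzer, TokenStreamComponents>) getStoredValue();
    // analyzers.get(fieldName) returns the shared per-type Analyzer,
    // which serves as the cache key here.
    return perAnalyzer != null
        ? perAnalyzer.get(analyzers.get(fieldName))
        : null;
  }

  @SuppressWarnings("unchecked")
  public void setReusableComponents(String fieldName,
                                    TokenStreamComponents components) {
    Map<Analyzer, TokenStreamComponents> perAnalyzer =
        (Map<Analyzer, TokenStreamComponents>) getStoredValue();
    if (perAnalyzer == null) {
      perAnalyzer = new HashMap<Analyzer, TokenStreamComponents>();
      setStoredValue(perAnalyzer);
    }
    perAnalyzer.put(analyzers.get(fieldName), components);
  }
}
{code}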

[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory

2012-08-06 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429176#comment-13429176
 ] 

Uwe Schindler commented on SOLR-1725:
-

Hi, this is not a problem at all. OpenJDK on FreeBSD contains no scripting 
engine, so it was added to Ant's lib path. This is why it works under Ant on 
the FreeBSD Jenkins. Rhino is the JavaScript engine, missing in OpenJDKs for 
legal reasons. Rhino is shipped with official JDKs and is mandatory, so that's 
a stupid FreeBSD issue. Steven should add it to the Maven builds, too.

You can resolve the issue.

 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
Priority: Critical
  Labels: UpdateProcessor
 Fix For: 4.0

 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch


 A script based UpdateRequestProcessorFactory (Uses JDK6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request processor factory enables writing update processors in 
 scripts located in the {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}} which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved based on the script file extension (that is, 
 a *.js file will be treated as a JavaScript script), therefore an extension 
 is mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - The SolrQueryRequest
  * {{rsp}} - The SolrQueryResponse
  * {{logger}} - A logger that can be used for logging purposes in the script

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory

2012-08-06 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429178#comment-13429178
 ] 

Steven Rowe commented on SOLR-1725:
---

Thanks Uwe, I'll add rhino to maven builds.

 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
Priority: Critical
  Labels: UpdateProcessor
 Fix For: 4.0

 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch


 A script based UpdateRequestProcessorFactory (Uses JDK6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request processor factory enables writing update processors in 
 scripts located in the {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}} which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved based on the script file extension (that is, 
 a *.js file will be treated as a JavaScript script), therefore an extension 
 is mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - The SolrQueryRequest
  * {{rsp}} - The SolrQueryResponse
  * {{logger}} - A logger that can be used for logging purposes in the script

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3703) Escape character which is in the query, is getting ignored in solr 3.6 with lucene parser

2012-08-06 Thread srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429183#comment-13429183
 ] 

srinivas commented on SOLR-3703:


Jack,

After adding autoGeneratePhraseQueries=true to the fieldType, we are good. 
Thanks a lot!! I will close this ticket.

Thanks
Srini
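
For reference, the change being described amounts to one attribute on the field type in schema.xml; a hedged sketch (the type name and the other attribute here are illustrative):

{noformat}
<fieldType name="text_general" class="solr.TextField"
           autoGeneratePhraseQueries="true" positionIncrementGap="100">
  ...
</fieldType>
{noformat}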

 Escape character which is in the query, is getting ignored in solr 3.6 with 
 lucene parser
 -

 Key: SOLR-3703
 URL: https://issues.apache.org/jira/browse/SOLR-3703
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.6
 Environment: Linux
Reporter: srinivas

 I noticed that the escape character in the query is getting ignored in Solr 
 3.6 with the lucene parser. If I use edismax, then it gives the expected 
 results. For the following query: 
 select?q=author:David\ Duke&defType=lucene 
 would render the same results as: 
 select?q=author:(David OR Duke)&defType=lucene 
 But 
 select?q=author:David\ Duke&defType=edismax 
 would render the same results as: 
 select?q=author:David Duke&defType=lucene 
 Regards
 Srini

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4291) consider reducing jflex buffer sizes

2012-08-06 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-4291:
---

 Summary: consider reducing jflex buffer sizes
 Key: LUCENE-4291
 URL: https://issues.apache.org/jira/browse/LUCENE-4291
 Project: Lucene - Core
  Issue Type: Task
  Components: modules/analysis
Reporter: Robert Muir


Spinoff from SOLR-3684.

Most lucene tokenizers have some buffer size, e.g. in 
CharTokenizer/ICUTokenizer it's char[4096].

But the jflex tokenizers use char[16384] by default, which seems overkill. I'm 
not sure we really see any performance bonus by having such a huge buffer size 
as a default.

There is a jflex parameter to set this: I think we should consider reducing it.

In a configuration like solr, tokenizers are reused per-thread-per-field,
so these can easily stack up in RAM.

Additionally CharFilters are not reused so the configuration in e.g.
HtmlStripCharFilter might not be great since it's per-document garbage.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4291) consider reducing jflex buffer sizes

2012-08-06 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429198#comment-13429198
 ] 

Steven Rowe commented on LUCENE-4291:
-

+1.  

For tokenizers, the buffer needs to be able to hold a token (and its trailing 
context, if lookahead is used), but nothing more.  16k tokens are likely 
extremely rare.  4k seems reasonable to me - it's still way bigger than most 
people are likely to hit over normal text input.

{{HTMLStripCharFilter}} is a bit different, since it buffers HTML constructs 
rather than tokens.  In the face of malformed input (e.g. an opening angle 
bracket '<' with no closing angle bracket '>'), the scanner might buffer the 
entire remaining input.  In contrast, {{LegacyHTMLStripCharFilter}}, the 
pre-JFlex implementation, limits this kind of buffering, to 8k max chars IIRC.


 consider reducing jflex buffer sizes
 

 Key: LUCENE-4291
 URL: https://issues.apache.org/jira/browse/LUCENE-4291
 Project: Lucene - Core
  Issue Type: Task
  Components: modules/analysis
Reporter: Robert Muir

 Spinoff from SOLR-3684.
 Most lucene tokenizers have some buffer size, e.g. in 
 CharTokenizer/ICUTokenizer it's char[4096].
 But the jflex tokenizers use char[16384] by default, which seems overkill. 
 I'm not sure we really see any performance bonus by having such a huge buffer 
 size as a default.
 There is a jflex parameter to set this: I think we should consider reducing 
 it.
 In a configuration like solr, tokenizers are reused per-thread-per-field,
 so these can easily stack up in RAM.
 Additionally CharFilters are not reused so the configuration in e.g.
 HtmlStripCharFilter might not be great since it's per-document garbage.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-4289) highlighter idf calculation problems

2012-08-06 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-4289.
-

   Resolution: Fixed
Fix Version/s: 3.6.2
   5.0
   4.0

I backported too. Note in 3.6 fast-vector-highlighter is unaffected, 
it doesn't compute IDF.

 highlighter idf calculation problems
 

 Key: LUCENE-4289
 URL: https://issues.apache.org/jira/browse/LUCENE-4289
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Fix For: 4.0, 5.0, 3.6.2

 Attachments: LUCENE-4289.patch


 * highlighter uses numDocs instead of maxDoc
 * fastvectorhighlighter uses numDocs - numDeletedDocs instead of maxDoc (will 
 go negative if more than half of docs are marked deleted)
 * fastvectorhighlighter calls docFreq and computes IDF per-position when it 
 won't change (inefficient)
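
For context, the classic Lucene idf is {{log(maxDoc / (docFreq + 1)) + 1}}; substituting numDocs understates it, and numDocs - numDeletedDocs subtracts deletions twice, going negative once more than half the docs are deleted. A hedged sketch of computing it once per term (standard IndexReader API; the helper class is invented):

{code}
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;

final class HighlightIdf {
  // Compute idf once per query term with maxDoc(), then reuse it for
  // every highlighted position instead of recomputing per position.
  static float idf(IndexReader reader, Term term) throws IOException {
    int docFreq = reader.docFreq(term); // docs containing the term
    int maxDoc = reader.maxDoc();       // includes deleted docs, never negative
    return (float) (Math.log(maxDoc / (double) (docFreq + 1)) + 1.0);
  }
}
{code}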

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4291) consider reducing jflex buffer sizes

2012-08-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429203#comment-13429203
 ] 

Robert Muir commented on LUCENE-4291:
-

{quote}
For tokenizers, the buffer needs to be able to hold a token (and its trailing 
context, if lookahead is used), but nothing more. 16k tokens are likely 
extremely rare. 4k seems reasonable to me - it's still way bigger than most 
people are likely to hit over normal text input.
{quote}

Yes, I think it's reasonable too, especially since maxTokenLength is 255 by 
default.

{quote}
HTMLStripCharFilter is a bit different, since it buffers HTML constructs rather 
than tokens. In the face of malformed input (e.g. an opening angle bracket '<' 
with no closing angle bracket '>'), the scanner might buffer the entire 
remaining input. In contrast, LegacyHTMLStripCharFilter, the pre-JFlex 
implementation, limits this kind of buffering, to 8k max chars IIRC.
{quote}

OK, I can leave this one alone. We can revisit if we can make CharFilters 
reusable (not simple to do cleanly today). It's not as much of an issue since 
nothing is hanging on to it.

I'll work up a patch.
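
For reference, the knob in question is a per-spec directive; a hedged sketch (assuming JFlex's {{%buffer}} option, which sizes the generated {{ZZ_BUFFERSIZE}}; 4096 is just the value discussed above):

{noformat}
/* options-and-declarations section of the .jflex spec */
%class StandardTokenizerImpl
%unicode
%buffer 4096
{noformat}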

 consider reducing jflex buffer sizes
 

 Key: LUCENE-4291
 URL: https://issues.apache.org/jira/browse/LUCENE-4291
 Project: Lucene - Core
  Issue Type: Task
  Components: modules/analysis
Reporter: Robert Muir

 Spinoff from SOLR-3684.
 Most lucene tokenizers have some buffer size, e.g. in 
 CharTokenizer/ICUTokenizer it's char[4096].
 But the jflex tokenizers use char[16384] by default, which seems overkill. 
 I'm not sure we really see any performance bonus by having such a huge buffer 
 size as a default.
 There is a jflex parameter to set this: I think we should consider reducing 
 it.
 In a configuration like solr, tokenizers are reused per-thread-per-field,
 so these can easily stack up in RAM.
 Additionally CharFilters are not reused so the configuration in e.g.
 HtmlStripCharFilter might not be great since it's per-document garbage.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-3715) improve tlog concurrency

2012-08-06 Thread Yonik Seeley (JIRA)
Yonik Seeley created SOLR-3715:
--

 Summary: improve tlog concurrency
 Key: SOLR-3715
 URL: https://issues.apache.org/jira/browse/SOLR-3715
 Project: Solr
  Issue Type: Improvement
Reporter: Yonik Seeley


Right now log record serialization is synchronized.  We can improve concurrency 
by serializing to a RAM buffer outside synchronization.  The cost will be RAM 
usage for buffering, and more complex concurrency in the tlog itself (i.e. we 
must ensure that a close does not happen without flushing all in-RAM buffers)
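
As a rough illustration of the proposed shape (a hedged sketch, not the eventual TransactionLog code; the class and field names here are invented):

{code}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.common.util.JavaBinCodec;

class BufferedTlogWriter {
  private final FileChannel channel;

  BufferedTlogWriter(FileChannel channel) {
    this.channel = channel;
  }

  /** Returns the file position the record was appended at. */
  long write(SolrInputDocument doc) throws IOException {
    // Expensive serialization runs with no lock held, so many threads
    // can serialize concurrently into private RAM buffers.
    ByteArrayOutputStream buffer = new ByteArrayOutputStream(8192);
    new JavaBinCodec().marshal(doc, buffer);
    byte[] record = buffer.toByteArray();

    // Only the append is synchronized; close() must take the same lock
    // so a close can never race an unflushed in-RAM buffer.
    synchronized (this) {
      long pos = channel.position();
      channel.write(ByteBuffer.wrap(record));
      return pos;
    }
  }
}
{code}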

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-3715) improve tlog concurrency

2012-08-06 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley reassigned SOLR-3715:
--

Assignee: Yonik Seeley

 improve tlog concurrency
 

 Key: SOLR-3715
 URL: https://issues.apache.org/jira/browse/SOLR-3715
 Project: Solr
  Issue Type: Improvement
Reporter: Yonik Seeley
Assignee: Yonik Seeley

 Right now log record serialization is synchronized.  We can improve 
 concurrency by serializing to a RAM buffer outside synchronization.  The cost 
 will be RAM usage for buffering, and more complex concurrency in the tlog 
 itself (i.e. we must ensure that a close does not happen without flushing all 
 in-RAM buffers)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0-ea-b49) - Build # 225 - Still Failing!

2012-08-06 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/225/
Java: 32bit/jdk1.8.0-ea-b49 -server -XX:+UseConcMarkSweepGC

1 tests failed.
REGRESSION:  org.apache.solr.servlet.SolrRequestParserTest.testStreamURL

Error Message:
connect timed out

Stack Trace:
java.net.SocketTimeoutException: connect timed out
at 
__randomizedtesting.SeedInfo.seed([B59DCD42307FDE67:ECA8F351445D1352]:0)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:395)
at 
sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1668)
at 
sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1663)
at java.security.AccessController.doPrivileged(Native Method)
at 
sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1662)
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1245)
at 
org.apache.solr.common.util.ContentStreamBase$URLStream.getStream(ContentStreamBase.java:85)
at 
org.apache.solr.servlet.SolrRequestParserTest.testStreamURL(SolrRequestParserTest.java:137)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:474)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1995)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:818)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:877)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:891)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:825)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:671)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:697)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:736)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:747)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
at 
org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 

[jira] [Updated] (LUCENE-4291) consider reducing jflex buffer sizes

2012-08-06 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4291:


Attachment: LUCENE-4291.patch

Here's a patch: with regenerations.

Note that, by default, 'ant jflex' gave me an error for all the includes (as of 
jflex r612).

So that's why you see changes like:
{noformat}
-%include 
src/java/org/apache/lucene/analysis/charfilter/HTMLCharacterEntities.jflex
+%include HTMLCharacterEntities.jflex
{noformat}

It seems jflex now expects these file paths to be relative to the input file?

 consider reducing jflex buffer sizes
 

 Key: LUCENE-4291
 URL: https://issues.apache.org/jira/browse/LUCENE-4291
 Project: Lucene - Core
  Issue Type: Task
  Components: modules/analysis
Reporter: Robert Muir
 Attachments: LUCENE-4291.patch


 Spinoff from SOLR-3684.
 Most lucene tokenizers have some buffer size, e.g. in 
 CharTokenizer/ICUTokenizer it's char[4096].
 But the jflex tokenizers use char[16384] by default, which seems overkill. 
 I'm not sure we really see any performance bonus by having such a huge buffer 
 size as a default.
 There is a jflex parameter to set this: I think we should consider reducing 
 it.
 In a configuration like solr, tokenizers are reused per-thread-per-field,
 so these can easily stack up in RAM.
 Additionally CharFilters are not reused so the configuration in e.g.
 HtmlStripCharFilter might not be great since it's per-document garbage.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory

2012-08-06 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429214#comment-13429214
 ] 

Steven Rowe commented on SOLR-1725:
---

bq. OpenJDK on FreeBSD contains no scripting engine, so it was added to Ant's 
lib path.

How?  I've found the necessary jars, at 
{{/usr/home/hudson/tools/java/openjdk-missing-libs/}}, but I can't see how 
Ant's lib path includes them.  I looked at {{~hudson/.profile}}, and {{lib/}} 
and {{bin/ant}} under {{/usr/home/hudson/tools/ant/apache-ant-1.8.2}} - none of 
these refer to the directory containing {{js.jar}} and {{script-js.jar}}.

I'm asking because I'd like to set Maven up similarly to Ant.


 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
Priority: Critical
  Labels: UpdateProcessor
 Fix For: 4.0

 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch


 A script based UpdateRequestProcessorFactory (Uses JDK6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request processor factory enables writing update processors in 
 scripts located in the {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}} which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved based on the script file extension (that is, 
 a *.js file will be treated as a JavaScript script), therefore an extension 
 is mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - The SolrQueryRequest
  * {{rsp}} - The SolrQueryResponse
  * {{logger}} - A logger that can be used for logging purposes in the script

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Build failed in Jenkins: Lucene-trunk-Linux-Java7-64 #105

2012-08-06 Thread builder
See builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64/105/

--
[...truncated 37784 lines...]

-clover.setup:
 [echo] Code coverage with Atlassian Clover enabled.
[ivy:cachepath] :: resolving dependencies :: 
com.cenqua.clover#clover-caller;working
[ivy:cachepath] confs: [master]
[ivy:cachepath] found com.cenqua.clover#clover;2.6.3 in public
[ivy:cachepath] :: resolution report :: resolve 14ms :: artifacts dl 0ms
-
|  |modules||   artifacts   |
|   conf   | number| search|dwnlded|evicted|| number|dwnlded|
-
|  master  |   1   |   0   |   0   |   0   ||   1   |   0   |
-
[clover-setup] Clover Version 2.6.3, built on November 20 2009 (build-778)
[clover-setup] Loaded from: /var/lib/jenkins/.ant/lib/clover-2.6.3.jar
[clover-setup] Clover: Open Source License registered to Apache.
[clover-setup] Clover is enabled with initstring 
'builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64/ws/checkout/lucene/build/clover/db/coverage.db'

clover:

compile-core:

compile-test-framework:

ivy-availability-check:

ivy-fail:

ivy-configure:
[ivy:configure] :: loading settings :: file = 
builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64/ws/checkout/lucene/ivy-settings.xml

resolve:

init:

compile-lucene-core:

compile-core:

common.compile-test:

install-junit4-taskdef:

-clover.disable:

-clover.setup:
 [echo] Code coverage with Atlassian Clover enabled.
[ivy:cachepath] :: resolving dependencies :: 
com.cenqua.clover#clover-caller;working
[ivy:cachepath] confs: [master]
[ivy:cachepath] found com.cenqua.clover#clover;2.6.3 in public
[ivy:cachepath] :: resolution report :: resolve 13ms :: artifacts dl 1ms
-
|  |modules||   artifacts   |
|   conf   | number| search|dwnlded|evicted|| number|dwnlded|
-
|  master  |   1   |   0   |   0   |   0   ||   1   |   0   |
-
[clover-setup] Clover Version 2.6.3, built on November 20 2009 (build-778)
[clover-setup] Loaded from: /var/lib/jenkins/.ant/lib/clover-2.6.3.jar
[clover-setup] Clover: Open Source License registered to Apache.
[clover-setup] Clover is enabled with initstring 
'builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64/ws/checkout/lucene/build/clover/db/coverage.db'

clover:

validate:

common.test:
[mkdir] Created dir: 
builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64/ws/checkout/lucene/build/suggest/test
[junit4:junit4] JUnit4 says olá! Master seed: C7DF8F67F5F8636C
[junit4:junit4] Executing 17 suites with 1 JVM.
[junit4:junit4] Suite: org.apache.lucene.search.suggest.fst.BytesRefSortersTest
[junit4:junit4] Completed in 1.37s, 2 tests
[junit4:junit4]  
[junit4:junit4] Suite: org.apache.lucene.search.suggest.fst.TestSort
[junit4:junit4] IGNOR/A 0.23s | TestSort.testLargerRandom
[junit4:junit4] Assumption #1: 'nightly' test group is disabled (@Nightly)
[junit4:junit4] Completed in 14.45s, 6 tests, 1 skipped
[junit4:junit4]  
[junit4:junit4] Suite: org.apache.lucene.search.suggest.fst.WFSTCompletionTest
[junit4:junit4] Completed in 4.20s, 2 tests
[junit4:junit4]  
[junit4:junit4] Suite: org.apache.lucene.search.spell.TestNGramDistance
[junit4:junit4] Completed in 0.62s, 4 tests
[junit4:junit4]  
[junit4:junit4] Suite: org.apache.lucene.search.suggest.fst.FSTCompletionTest
[junit4:junit4] Completed in 22.41s, 12 tests
[junit4:junit4]  
[junit4:junit4] Suite: org.apache.lucene.search.suggest.LookupBenchmarkTest
[junit4:junit4] Completed in 0.12s, 0 tests
[junit4:junit4]  
[junit4:junit4] Suite: 
org.apache.lucene.search.suggest.TestHighFrequencyDictionary
[junit4:junit4] Completed in 0.74s, 1 test
[junit4:junit4]  
[junit4:junit4] Suite: org.apache.lucene.search.spell.TestLuceneDictionary
[junit4:junit4] Completed in 1.96s, 6 tests
[junit4:junit4]  
[junit4:junit4] Suite: org.apache.lucene.search.spell.TestLevenshteinDistance
[junit4:junit4] Completed in 0.32s, 2 tests
[junit4:junit4]  
[junit4:junit4] Suite: org.apache.lucene.search.spell.TestSpellChecker
[junit4:junit4] Completed in 12.05s, 6 tests
[junit4:junit4]  
[junit4:junit4] Suite: org.apache.lucene.search.suggest.TestTermFreqIterator
[junit4:junit4] Completed in 5.11s, 3 tests
[junit4:junit4]  
[junit4:junit4] Suite: org.apache.lucene.search.spell.TestDirectSpellChecker
[junit4:junit4] Completed in 3.80s, 6 tests
[junit4:junit4]  
[junit4:junit4] Suite: org.apache.lucene.search.spell.TestWordBreakSpellChecker

[jira] [Commented] (LUCENE-4286) Add flag to CJKBigramFilter to allow indexing unigrams as well as bigrams

2012-08-06 Thread Tom Burton-West (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429217#comment-13429217
 ] 

Tom Burton-West commented on LUCENE-4286:
-

We haven't had a request for this specific feature from readers; we are just 
assuming that the 10% of Han queries in our logs that consist of a single 
character represent real use cases and we don't want such queries to produce 
zero results or produce misleading results.

Tom

 Add flag to CJKBigramFilter to allow indexing unigrams as well as bigrams
 -

 Key: LUCENE-4286
 URL: https://issues.apache.org/jira/browse/LUCENE-4286
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.0-ALPHA, 3.6.1
Reporter: Tom Burton-West
Priority: Minor
 Fix For: 4.0, 5.0

 Attachments: LUCENE-4286.patch, LUCENE-4286.patch


 Add an optional  flag to the CJKBigramFilter to tell it to also output 
 unigrams.   This would allow indexing of both bigrams and unigrams and at 
 query time the analyzer could analyze queries as bigrams unless the query 
 contained a single Han unigram.
 As an example, here is a configuration of a Solr fieldType with the 
 indexUnigrams flag set on the index-time analyzer and no flag on the 
 query-time analyzer: 
 <fieldType name="CJK" autoGeneratePhraseQueries="false">
   <analyzer type="index">
     <tokenizer class="solr.ICUTokenizerFactory"/>
     <filter class="solr.CJKBigramFilterFactory" indexUnigrams="true" han="true"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.ICUTokenizerFactory"/>
     <filter class="solr.CJKBigramFilterFactory" han="true"/>
   </analyzer>
 </fieldType>
 Use case: About 10% of our queries that contain Han characters are single 
 character queries.   The CJKBigram filter only outputs single characters when 
 there are no adjacent bigrammable characters in the input.  This means we 
 have to create a separate field to index Han unigrams in order to address 
 single character queries and then write application code to search that 
 separate field if we detect a single character Han query.  This is rather 
 kludgey.  With the optional flag, we could configure Solr as above. 
 This is somewhat analogous to the flags in LUCENE-1370 for the ShingleFilter 
 used to allow single word queries (although that uses word n-grams rather 
 than character n-grams.)
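
To make the flag's effect concrete, a hypothetical illustration (not from the issue) of the emitted terms for a run of three bigrammable Han characters, using the proposed indexUnigrams name:

{noformat}
input: 一二三
  without indexUnigrams: 一二  二三
  with indexUnigrams:    一  一二  二  二三  三
{noformat}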

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory

2012-08-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429219#comment-13429219
 ] 

Robert Muir commented on SOLR-1725:
---

{noformat}
[rcmuir@lucene /home/hudson/.ant/lib]$ ls -la
total 1843
drwxr-xr-x  2 hudson  hudson   5 Mar 30 15:46 .
drwxr-xr-x  3 hudson  hudson   8 May 13 12:41 ..
-rw-r--r--  1 hudson  hudson  947592 Mar 30 15:45 ivy-2.2.0.jar
-rw-r--r--  1 hudson  hudson  701049 Jul 27  2006 js.jar
-rw-r--r--  1 hudson  hudson   34607 Oct 16  2006 script-js.jar
{noformat}

 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
Priority: Critical
  Labels: UpdateProcessor
 Fix For: 4.0

 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch


 A script based UpdateRequestProcessorFactory (Uses JDK6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request processor factory enables writing update processors in 
 scripts located in the {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}} which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved based on the script file extension (that is, 
 a *.js file will be treated as a JavaScript script), therefore an extension 
 is mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - The SolrQueryRequest
  * {{rsp}} - The SolrQueryResponse
  * {{logger}} - A logger that can be used for logging purposes in the script

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory

2012-08-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429218#comment-13429218
 ] 

Robert Muir commented on SOLR-1725:
---

I think they are added to ~hudson/.ant/lib ?

 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
Priority: Critical
  Labels: UpdateProcessor
 Fix For: 4.0

 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch


 A script based UpdateRequestProcessorFactory (Uses JDK6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request processor factory enables writing update processors in 
 scripts located in the {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}} which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved based on the script file extension (that is, 
 a *.js file will be treated as a JavaScript script), therefore an extension 
 is mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - The SolrQueryRequest
  * {{rsp}} - The SolrQueryResponse
  * {{logger}} - A logger that can be used for logging purposes in the script

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Jenkins build is back to normal : Lucene-trunk-Linux-Java7-64 #106

2012-08-06 Thread builder
See builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64/106/


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4291) consider reducing jflex buffer sizes

2012-08-06 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429221#comment-13429221
 ] 

Steven Rowe commented on LUCENE-4291:
-

bq. 

Gerwin Klein recently fixed [JFlex issue 
3420809|http://sourceforge.net/tracker/?func=detail&aid=3420809&group_id=14929&atid=114929],
 with exactly this change.

 consider reducing jflex buffer sizes
 

 Key: LUCENE-4291
 URL: https://issues.apache.org/jira/browse/LUCENE-4291
 Project: Lucene - Core
  Issue Type: Task
  Components: modules/analysis
Reporter: Robert Muir
 Attachments: LUCENE-4291.patch


 Spinoff from SOLR-3684.
 Most lucene tokenizers have some buffer size, e.g. in 
 CharTokenizer/ICUTokenizer it's char[4096].
 But the jflex tokenizers use char[16384] by default, which seems overkill. 
 I'm not sure we really see any performance bonus by having such a huge buffer 
 size as a default.
 There is a jflex parameter to set this: I think we should consider reducing 
 it.
 In a configuration like solr, tokenizers are reused per-thread-per-field,
 so these can easily stack up in RAM.
 Additionally CharFilters are not reused so the configuration in e.g.
 HtmlStripCharFilter might not be great since it's per-document garbage.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-4291) consider reducing jflex buffer sizes

2012-08-06 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429221#comment-13429221
 ] 

Steven Rowe edited comment on LUCENE-4291 at 8/6/12 4:08 PM:
-

bq. It seems jflex now expects these file paths to be relative to the input 
file?

Gerwin Klein recently fixed [JFlex issue 
3420809|http://sourceforge.net/tracker/?func=detail&aid=3420809&group_id=14929&atid=114929],
 with exactly this change.

  was (Author: steve_rowe):
bq. 

Gerwin Klein recently fixed [JFlex issue 
3420809|http://sourceforge.net/tracker/?func=detail&aid=3420809&group_id=14929&atid=114929],
 with exactly this change.
  
 consider reducing jflex buffer sizes
 

 Key: LUCENE-4291
 URL: https://issues.apache.org/jira/browse/LUCENE-4291
 Project: Lucene - Core
  Issue Type: Task
  Components: modules/analysis
Reporter: Robert Muir
 Attachments: LUCENE-4291.patch


 Spinoff from SOLR-3684.
 Most lucene tokenizers have some buffer size, e.g. in 
 CharTokenizer/ICUTokenizer it's char[4096].
 But the jflex tokenizers use char[16384] by default, which seems overkill. 
 I'm not sure we really see any performance bonus by having such a huge buffer 
 size as a default.
 There is a jflex parameter to set this: I think we should consider reducing 
 it.
 In a configuration like solr, tokenizers are reused per-thread-per-field,
 so these can easily stack up in RAM.
 Additionally CharFilters are not reused so the configuration in e.g.
 HtmlStripCharFilter might not be great since it's per-document garbage.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory

2012-08-06 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429222#comment-13429222
 ] 

Steven Rowe commented on SOLR-1725:
---

Thanks Robert, I see them now.

 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
Priority: Critical
  Labels: UpdateProcessor
 Fix For: 4.0

 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch


 A script based UpdateRequestProcessorFactory (Uses JDK6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request processor factory enables writing update processors in 
 scripts located in the {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}} which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved based on the script file extension (that is, 
 a *.js file will be treated as a JavaScript script), therefore an extension 
 is mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - The SolrQueryRequest
  * {{rsp}} - The SolrQueryResponse
  * {{logger}} - A logger that can be used for logging purposes in the script

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4280) TestReaderClosed leaks threads

2012-08-06 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429227#comment-13429227
 ] 

Dawid Weiss commented on LUCENE-4280:
-

TestLazyProxSkipping again.
{code}
[junit4:junit4] Suite: org.apache.lucene.index.TestLazyProxSkipping
[junit4:junit4] OK  0.01s J0 | TestLazyProxSkipping.testSeek
[junit4:junit4] OK  1.05s J0 | TestLazyProxSkipping.testLazySkipping
[junit4:junit4] (@AfterClass output)
[junit4:junit4]   2 Aug 06, 2012 3:47:18 PM 
com.carrotsearch.randomizedtesting.ThreadLeakControl checkThreadLeaks
[junit4:junit4]   2 WARNING: Will linger awaiting termination of 1 leaked 
thread(s).
[junit4:junit4]   2 Aug 06, 2012 3:47:38 PM 
com.carrotsearch.randomizedtesting.ThreadLeakControl checkThreadLeaks
[junit4:junit4]   2 SEVERE: 1 thread leaked from SUITE scope at 
org.apache.lucene.index.TestLazyProxSkipping: 
[junit4:junit4]   21) Thread[id=116, name=LuceneTestCase-18-thread-1, 
state=WAITING, group=TGRP-TestLazyProxSkipping]
[junit4:junit4]   2 at sun.misc.Unsafe.park(Native Method)
[junit4:junit4]   2 at 
java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
[junit4:junit4]   2 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
[junit4:junit4]   2 at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
[junit4:junit4]   2 at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043)
[junit4:junit4]   2 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103)
[junit4:junit4]   2 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
[junit4:junit4]   2 at java.lang.Thread.run(Thread.java:722)
[junit4:junit4]   2 Aug 06, 2012 3:47:38 PM 
com.carrotsearch.randomizedtesting.ThreadLeakControl tryToInterruptAll
[junit4:junit4]   2 INFO: Starting to interrupt leaked threads:
[junit4:junit4]   21) Thread[id=116, name=LuceneTestCase-18-thread-1, 
state=WAITING, group=TGRP-TestLazyProxSkipping]
[junit4:junit4]   2 Aug 06, 2012 3:47:41 PM 
com.carrotsearch.randomizedtesting.ThreadLeakControl tryToInterruptAll
[junit4:junit4]   2 SEVERE: There are still zombie threads that couldn't be 
terminated:
[junit4:junit4]   21) Thread[id=116, name=LuceneTestCase-18-thread-1, 
state=WAITING, group=TGRP-TestLazyProxSkipping]
[junit4:junit4]   2 at sun.misc.Unsafe.park(Native Method)
[junit4:junit4]   2 at 
java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
[junit4:junit4]   2 at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
[junit4:junit4]   2 at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
[junit4:junit4]   2 at 
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043)
[junit4:junit4]   2 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103)
[junit4:junit4]   2 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
[junit4:junit4]   2 at java.lang.Thread.run(Thread.java:722)
[junit4:junit4]   2 NOTE: test params are: codec=Lucene40: 
{tokens=PostingsFormat(name=MockRandom)}, 
sim=RandomSimilarityProvider(queryNorm=false,coord=false): {tokens=DFR 
I(n)B3(800.0)}, locale=sl, timezone=America/Resolute
[junit4:junit4]   2 NOTE: Windows 7 6.1 amd64/Oracle Corporation 1.7.0_03 
(64-bit)/cpus=8,threads=2,free=130600992,total=261095424
[junit4:junit4]   2 NOTE: All tests run in this JVM: [TestBooleanOr, 
TestDirectory, TestMultiTermConstantScore, TestIndexFileDeleter, TestSetOnce, 
Nested1, TestStressIndexing2, TestRegexpRandom2, TestStressAdvance, 
TestSpansAdvanced, TestAssertions, TestFieldCacheRewriteMethod, 
TestPrefixInBooleanQuery, TestMultiPhraseQuery, TestMatchAllDocsQuery, 
TestLock, TestSimilarity2, TestNamedSPILoader, TestSort, TestBytesRefHash, 
TestOmitTf, TestVirtualMethod, TestLazyProxSkipping]
[junit4:junit4]   2 NOTE: reproduce with: ant test  
-Dtestcase=TestLazyProxSkipping -Dtests.seed=55A3CB2FF25AC1A5 -Dtests.slow=true 
-Dtests.locale=sl -Dtests.timezone=America/Resolute 
-Dtests.file.encoding=ISO-8859-1
[junit4:junit4]   2 
[junit4:junit4] ERROR   0.00s J0 | TestLazyProxSkipping (suite)
[junit4:junit4] Throwable #1: 
com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE 
scope at org.apache.lucene.index.TestLazyProxSkipping: 
[junit4:junit4]1) Thread[id=116, name=LuceneTestCase-18-thread-1, 
state=WAITING, group=TGRP-TestLazyProxSkipping]
[junit4:junit4] at sun.misc.Unsafe.park(Native Method)
[junit4:junit4] at 
java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
[junit4:junit4] at 

[jira] [Commented] (LUCENE-4286) Add flag to CJKBigramFilter to allow indexing unigrams as well as bigrams

2012-08-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429228#comment-13429228
 ] 

Robert Muir commented on LUCENE-4286:
-

The combined unigram+bigram technique is a general technique, which I think is 
useful to support.

For examples see:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.12.6782
http://members.unine.ch/jacques.savoy/Papers/NTCIR6.pdf

There are more references and studies linked from those.

Tom would have to do tests for his index-time-only approach: I can't speak 
for that.
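
A minimal sketch of the index-time side of that technique in plain Lucene. The boolean "also emit unigrams" argument is hypothetical here, standing in for whatever switch the attached patch actually defines:

{code}
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.cjk.CJKBigramFilter;
import org.apache.lucene.analysis.icu.segmentation.ICUTokenizer;

public class CJKUnigramSketch {
  // Index-time analyzer: Han bigrams plus (hypothetically) Han unigrams.
  public static Analyzer indexAnalyzer() {
    return new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer tok = new ICUTokenizer(reader);
        // third argument is the hypothetical "output unigrams too" flag
        TokenStream ts = new CJKBigramFilter(tok, CJKBigramFilter.HAN, true);
        return new TokenStreamComponents(tok, ts);
      }
    };
  }
}
{code}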

 Add flag to CJKBigramFilter to allow indexing unigrams as well as bigrams
 -

 Key: LUCENE-4286
 URL: https://issues.apache.org/jira/browse/LUCENE-4286
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.0-ALPHA, 3.6.1
Reporter: Tom Burton-West
Priority: Minor
 Fix For: 4.0, 5.0

 Attachments: LUCENE-4286.patch, LUCENE-4286.patch


 Add an optional  flag to the CJKBigramFilter to tell it to also output 
 unigrams.   This would allow indexing of both bigrams and unigrams and at 
 query time the analyzer could analyze queries as bigrams unless the query 
 contained a single Han unigram.
 As an example, here is a configuration of a Solr fieldType whose index 
 analyzer has the indexUnigrams flag set and whose query analyzer does 
 not: 
 <fieldType name="CJK" autoGeneratePhraseQueries="false">
   <analyzer type="index">
     <tokenizer class="solr.ICUTokenizerFactory"/>
     <filter class="solr.CJKBigramFilterFactory" indexUnigrams="true" 
 han="true"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.ICUTokenizerFactory"/>
     <filter class="solr.CJKBigramFilterFactory" han="true"/>
   </analyzer>
 </fieldType>
 Use case: About 10% of our queries that contain Han characters are single 
 character queries.   The CJKBigram filter only outputs single characters when 
 there are no adjacent bigrammable characters in the input.  This means we 
 have to create a separate field to index Han unigrams in order to address 
 single character queries and then write application code to search that 
 separate field if we detect a single character Han query.  This is rather 
 kludgey.  With the optional flag, we could configure Solr as above. 
 This is somewhat analogous to the flags in LUCENE-1370 for the ShingleFilter 
 used to allow single word queries (although that uses word n-grams rather 
 than character n-grams.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4291) consider reducing jflex buffer sizes

2012-08-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429231#comment-13429231
 ] 

Robert Muir commented on LUCENE-4291:
-

OK thanks, that explains it! I'd like to commit this if there are no objections.
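
(Concretely, the knob is jflex's %buffer directive; the commit diff further down this digest sets %buffer 4096 in the affected .jflex grammars.)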

 consider reducing jflex buffer sizes
 

 Key: LUCENE-4291
 URL: https://issues.apache.org/jira/browse/LUCENE-4291
 Project: Lucene - Core
  Issue Type: Task
  Components: modules/analysis
Reporter: Robert Muir
 Attachments: LUCENE-4291.patch


 Spinoff from SOLR-3684.
 Most lucene tokenizers have some buffer size, e.g. in 
 CharTokenizer/ICUTokenizer its char[4096].
 But the jflex tokenizers use char[16384] by default, which seems overkill. 
 I'm not sure we really see any performance bonus by having such a huge buffer 
 size as a default.
 There is a jflex parameter to set this: I think we should consider reducing 
 it.
 In a configuration like solr, tokenizers are reused per-thread-per-field,
 so these can easily stack up in RAM.
 Additionally CharFilters are not reused so the configuration in e.g.
 HtmlStripCharFilter might not be great since its per-document garbage.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4291) consider reducing jflex buffer sizes

2012-08-06 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429234#comment-13429234
 ] 

Steven Rowe commented on LUCENE-4291:
-

bq. I'd like to commit this if there are no objections.

+1, patch looks good.

 consider reducing jflex buffer sizes
 

 Key: LUCENE-4291
 URL: https://issues.apache.org/jira/browse/LUCENE-4291
 Project: Lucene - Core
  Issue Type: Task
  Components: modules/analysis
Reporter: Robert Muir
 Attachments: LUCENE-4291.patch


 Spinoff from SOLR-3684.
 Most lucene tokenizers have some buffer size, e.g. in 
 CharTokenizer/ICUTokenizer its char[4096].
 But the jflex tokenizers use char[16384] by default, which seems overkill. 
 I'm not sure we really see any performance bonus by having such a huge buffer 
 size as a default.
 There is a jflex parameter to set this: I think we should consider reducing 
 it.
 In a configuration like solr, tokenizers are reused per-thread-per-field,
 so these can easily stack up in RAM.
 Additionally CharFilters are not reused so the configuration in e.g.
 HtmlStripCharFilter might not be great since its per-document garbage.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory

2012-08-06 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429240#comment-13429240
 ] 

Hoss Man commented on SOLR-1725:


I (think i) fixed the assumptions in these tests to actually skip properly if 
the engines aren't available...

Committed revision 1369874. - trunk
Committed revision 1369875. - 4x


 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
Priority: Critical
  Labels: UpdateProcessor
 Fix For: 4.0

 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch


 A script based UpdateRequestProcessorFactory (uses the JDK 6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request processor factory enables writing update processors as 
 scripts located in the {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}}, which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 names (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved from the script file extension (that is, a 
 *.js file will be treated as a JavaScript script), so an extension is 
 mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - The SolrQueryRequest
  * {{rsp}} - The SolrQueryResponse
  * {{logger}} - A logger that can be used for logging purposes in the script
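
For illustration, a minimal script in the spirit of the above; the file name and field are hypothetical, and the function name mirrors {{UpdateRequestProcessor}}'s processAdd:

{code}
// conf/example.js (hypothetical file name)
function processAdd(cmd) {
  var doc = cmd.solrDoc;                    // the SolrInputDocument being added
  doc.setField("script_processed_b", true); // hypothetical marker field
  logger.info("processed doc " + doc.getFieldValue("id"));
}
{code}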

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4280) TestReaderClosed leaks threads

2012-08-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429244#comment-13429244
 ] 

Michael McCandless commented on LUCENE-4280:


I committed a fix for TestLazyProxSkipping (it wasn't closing the reader).

 TestReaderClosed leaks threads
 --

 Key: LUCENE-4280
 URL: https://issues.apache.org/jira/browse/LUCENE-4280
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Dawid Weiss
Assignee: Robert Muir
Priority: Minor

 {code}
 -ea
 -Dtests.seed=9449688B90185FA5
 -Dtests.iters=100
 {code}
 reproduces 100% for me, multiple thread leak out from newSearcher's internal 
 threadfactory:
 {code}
 Aug 02, 2012 8:46:05 AM com.carrotsearch.randomizedtesting.ThreadLeakControl 
 checkThreadLeaks
 SEVERE: 6 threads leaked from SUITE scope at 
 org.apache.lucene.index.TestReaderClosed: 
1) Thread[id=13, name=LuceneTestCase-1-thread-1, state=WAITING, 
 group=TGRP-TestReaderClosed]
 at sun.misc.Unsafe.park(Native Method)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
 at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
 at 
 java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:722)
2) Thread[id=15, name=LuceneTestCase-3-thread-1, state=WAITING, 
 group=TGRP-TestReaderClosed]
 at sun.misc.Unsafe.park(Native Method)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
 at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
 at 
 java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:722)
3) Thread[id=17, name=LuceneTestCase-5-thread-1, state=WAITING, 
 group=TGRP-TestReaderClosed]
 at sun.misc.Unsafe.park(Native Method)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
 at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
 at 
 java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:722)
4) Thread[id=18, name=LuceneTestCase-6-thread-1, state=WAITING, 
 group=TGRP-TestReaderClosed]
 at sun.misc.Unsafe.park(Native Method)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
 at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
 at 
 java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:722)
5) Thread[id=16, name=LuceneTestCase-4-thread-1, state=WAITING, 
 group=TGRP-TestReaderClosed]
 at sun.misc.Unsafe.park(Native Method)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
 at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
 at 
 java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:722)
6) Thread[id=14, name=LuceneTestCase-2-thread-1, state=WAITING, 
 

[jira] [Resolved] (LUCENE-4291) consider reducing jflex buffer sizes

2012-08-06 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-4291.
-

   Resolution: Fixed
Fix Version/s: 5.0
   4.0

 consider reducing jflex buffer sizes
 

 Key: LUCENE-4291
 URL: https://issues.apache.org/jira/browse/LUCENE-4291
 Project: Lucene - Core
  Issue Type: Task
  Components: modules/analysis
Reporter: Robert Muir
 Fix For: 4.0, 5.0

 Attachments: LUCENE-4291.patch


 Spinoff from SOLR-3684.
 Most lucene tokenizers have some buffer size, e.g. in 
 CharTokenizer/ICUTokenizer its char[4096].
 But the jflex tokenizers use char[16384] by default, which seems overkill. 
 I'm not sure we really see any performance bonus by having such a huge buffer 
 size as a default.
 There is a jflex parameter to set this: I think we should consider reducing 
 it.
 In a configuration like solr, tokenizers are reused per-thread-per-field,
 so these can easily stack up in RAM.
 Additionally CharFilters are not reused so the configuration in e.g.
 HtmlStripCharFilter might not be great since its per-document garbage.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index

2012-08-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429287#comment-13429287
 ] 

Robert Muir commented on SOLR-3684:
---

FYI: I lowered the jflex buffer sizes from 32kb to 8kb in LUCENE-4291.

So I think we should still:
# Address this default jetty threadpool size of max=10,000. This is the real 
issue.
# See if we can deal with the crazy corner case so we can impl your patch 
(reuse by fieldtype), which I think is a good separate improvement.
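
For reference, the Jetty-side knob is the thread pool in Solr's example/etc/jetty.xml; a hedged sketch with an illustrative cap (element names follow Jetty's QueuedThreadPool, the 500 is made up):

{code}
<Set name="ThreadPool">
  <New class="org.eclipse.jetty.util.thread.QueuedThreadPool">
    <Set name="maxThreads">500</Set>
  </New>
</Set>
{code}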


 Frequently full gc while do pressure index
 --

 Key: SOLR-3684
 URL: https://issues.apache.org/jira/browse/SOLR-3684
 Project: Solr
  Issue Type: Improvement
  Components: multicore
Affects Versions: 4.0-ALPHA
 Environment: System: Linux
 Java process: 4G memory
 Jetty: 1000 threads 
 Index: 20 field
 Core: 5
Reporter: Raintung Li
Priority: Critical
  Labels: garbage, performance
 Fix For: 4.0

 Attachments: patch.txt

   Original Estimate: 168h
  Remaining Estimate: 168h

 Recently we tested Solr index throughput and performance: 20 fields to 
 index, the field type is the normal text_general, 1000 threads for Jetty, 
 and 5 cores defined.
 After the test had run for some time, the Solr process's throughput dropped 
 very quickly. Checking the root cause, we found the Java process constantly 
 doing full GCs.
 In the heap dump, the main object is StandardTokenizer, which is held in a 
 CloseableThreadLocal by IndexSchema.SolrIndexAnalyzer.
 Solr uses PerFieldReuseStrategy as the default reuse component strategy, 
 which means each field has its own StandardTokenizer if it uses the standard 
 analyzer, and each StandardTokenizer occupies 32KB of memory for its 
 zzBuffer char array.
 The worst case: Total memory = live threads*cores*fields*32KB
 In this test case that is 1000*5*20*32KB = 3.2G for StandardTokenizer alone, 
 and those objects can only be released when their thread dies.
 Suggestion:
 Every request is handled by one thread, so one document is only analyzed by 
 one thread, and the thread parses the document's fields step by step. Fields 
 of the same type can therefore share the same reused component: when the 
 thread switches to another field of the same type, only the component's 
 input stream needs to be reset. This saves a lot of memory for fields of the 
 same type.
 Total memory will be = live threads*cores*(distinct field types)*32KB
 The source code modification is simple; I can provide the patch for 
 IndexSchema.java: 
 private class SolrIndexAnalyzer extends AnalyzerWrapper {
 
   /**
    * Implementation of {@link ReuseStrategy} that reuses components per 
    * field type by keying the stored TokenStreamComponents on the field's 
    * Analyzer rather than on the field name.
    */
   private class SolrFieldReuseStrategy extends ReuseStrategy {
     /** {@inheritDoc} */
     @SuppressWarnings("unchecked")
     public TokenStreamComponents getReusableComponents(String fieldName) {
       Map<Analyzer, TokenStreamComponents> componentsPerField =
           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
       return componentsPerField != null
           ? componentsPerField.get(analyzers.get(fieldName)) : null;
     }
 
     /** {@inheritDoc} */
     @SuppressWarnings("unchecked")
     public void setReusableComponents(String fieldName,
         TokenStreamComponents components) {
       Map<Analyzer, TokenStreamComponents> componentsPerField =
           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
       if (componentsPerField == null) {
         componentsPerField = new HashMap<Analyzer, TokenStreamComponents>();
         setStoredValue(componentsPerField);
       }
       componentsPerField.put(analyzers.get(fieldName), components);
     }
   }
 
   // Field name -> analyzer, built once from the schema in the constructor.
   protected final HashMap<String, Analyzer> analyzers;
 
   SolrIndexAnalyzer() {
     super(new SolrFieldReuseStrategy());
     analyzers = analyzerCache();
   }
 
   protected HashMap<String, Analyzer> analyzerCache() {
     HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
     for (SchemaField f : getFields().values()) {
       Analyzer analyzer = f.getType().getAnalyzer();
       cache.put(f.getName(), analyzer);
     }
     return cache;
   }
 
   @Override
   protected Analyzer getWrappedAnalyzer(String fieldName) {
     Analyzer analyzer = analyzers.get(fieldName);
     return analyzer != null ? analyzer
         : getDynamicFieldType(fieldName).getAnalyzer();
   }
 
   @Override
   protected TokenStreamComponents wrapComponents(String fieldName,
       TokenStreamComponents components) {
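
A back-of-the-envelope check of the worst-case figure quoted above (an illustrative sketch, not part of the patch):

{code}
public class WorstCaseMath {
  public static void main(String[] args) {
    // one 32KB zzBuffer per live thread, per core, per field
    long threads = 1000, cores = 5, fields = 20, perTokenizer = 32 * 1024;
    long totalBytes = threads * cores * fields * perTokenizer;
    // prints ~3.05 GiB, i.e. the ~3.2G reported above
    System.out.println(totalBytes / (1024.0 * 1024 * 1024) + " GiB");
  }
}
{code}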
   

Re: svn commit: r1369892 [3/3] - in /lucene/dev/branches/branch_4x: ./ lucene/ lucene/analysis/ lucene/analysis/common/src/java/org/apache/lucene/analysis/charfilter/ lucene/analysis/common/src/java/o

2012-08-06 Thread Robert Muir
Hi, see the diff below. Just to explain why the DFA changed, the 3.4
backwards impl was previously %include'ing the wrong files it seems,
it was including them from the 'current' StandardTokenizer directory
before.

On Mon, Aug 6, 2012 at 1:36 PM,  rm...@apache.org wrote:
 Modified: 
 lucene/dev/branches/branch_4x/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/std34/UAX29URLEmailTokenizerImpl34.jflex
 URL: 
 http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/std34/UAX29URLEmailTokenizerImpl34.jflex?rev=1369892&r1=1369891&r2=1369892&view=diff
 ==
 --- 
 lucene/dev/branches/branch_4x/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/std34/UAX29URLEmailTokenizerImpl34.jflex
  (original)
 +++ 
 lucene/dev/branches/branch_4x/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/std34/UAX29URLEmailTokenizerImpl34.jflex
  Mon Aug  6 17:36:34 2012
 @@ -39,8 +39,9 @@ import org.apache.lucene.analysis.tokena
  %implements StandardTokenizerInterface
  %function getNextToken
  %char
 +%buffer 4096

 -%include 
 src/java/org/apache/lucene/analysis/standard/SUPPLEMENTARY.jflex-macro
 +%include SUPPLEMENTARY.jflex-macro
...
 -%include src/java/org/apache/lucene/analysis/standard/ASCIITLD.jflex-macro
 +%include ASCIITLD.jflex-macro




-- 
lucidimagination.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[ANNOUNCE] Lucene/Solr @ ApacheCon Europe - August 13th Deadline for CFP and Travel Assistance applications

2012-08-06 Thread Chris Hostetter


ApacheCon Europe will be happening 5-8 November 2012 in Sinsheim, Germany 
at the Rhein-Neckar-Arena.  Early bird tickets go on sale this Monday, 6 
August.


  http://www.apachecon.eu/

The Lucene/Solr track is shaping up to be quite impressive this year, so 
make your plans to attend and submit your session proposals ASAP!


-- CALL FOR PAPERS --

The Call for Participation for ApacheCon Europe has been extended to 13 
August!


To submit a presentation and for more details, visit 
http://www.apachecon.eu/cfp/


Post a banner on your Website to show your support for ApacheCon Europe or 
North America (24-28 February 2013 in Portland, OR)! Download at 
http://www.apache.org/events/logos-banners/


We look forward to seeing you!

 -the Apache Conference Committee & ApacheCon Planners

--- TRAVEL ASSISTANCE ---

We're pleased to announce Travel Assistance (TAC) applications for 
ApacheCon Europe 2012 are now open!


The Travel Assistance Committee exists to help those that would like to 
attend ApacheCon events, but are unable to do so for financial reasons. 
For more info on this year's Travel Assistance application criteria please 
visit the TAC website at  http://www.apache.org/travel/ .


Some important dates... The original application period officially opened 
on 23rd July, 2012. Applicants have until the 13th August 2012 to submit 
their applications (which should contain as much supporting material as 
required to efficiently and accurately process your request); this will 
enable the Travel Assistance Committee to announce successful awards on or 
shortly after the 24th August, 2012.


As always TAC expects to deal with a range of applications from many 
diverse backgrounds so we encourage (as always) anyone thinking about 
sending in a TAC application to get it in ASAP.


We look forward to greeting everyone in Sinsheim, Germany in November.



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4292) TestPerfTasksLogic.testBGSearchTaskThreads assertion error

2012-08-06 Thread Dawid Weiss (JIRA)
Dawid Weiss created LUCENE-4292:
---

 Summary: TestPerfTasksLogic.testBGSearchTaskThreads assertion error
 Key: LUCENE-4292
 URL: https://issues.apache.org/jira/browse/LUCENE-4292
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Dawid Weiss


{code}
build   06-Aug-2012 19:45:55[junit4:junit4] FAILURE 1.44s | 
TestPerfTasksLogic.testBGSearchTaskThreads
build   06-Aug-2012 19:45:55[junit4:junit4] Throwable #1: 
java.lang.AssertionError
build   06-Aug-2012 19:45:55[junit4:junit4]at 
__randomizedtesting.SeedInfo.seed([73A6DA79EDD783F8:AE931FA55514525A]:0)
build   06-Aug-2012 19:45:55[junit4:junit4]at 
org.junit.Assert.fail(Assert.java:92)
build   06-Aug-2012 19:45:55[junit4:junit4]at 
org.junit.Assert.assertTrue(Assert.java:43)
build   06-Aug-2012 19:45:55[junit4:junit4]at 
org.junit.Assert.assertTrue(Assert.java:54)
build   06-Aug-2012 19:45:55[junit4:junit4]at 
org.apache.lucene.benchmark.byTask.TestPerfTasksLogic.testBGSearchTaskThreads(TestPerfTasksLogic.java:159)
build   06-Aug-2012 19:45:55[junit4:junit4]at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
build   06-Aug-2012 19:45:55[junit4:junit4]at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
build   06-Aug-2012 19:45:55[junit4:junit4]at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
build   06-Aug-2012 19:45:55[junit4:junit4]at 
java.lang.reflect.Method.invoke(Method.java:597)
build   06-Aug-2012 19:45:55[junit4:junit4]at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
build   06-Aug-2012 19:45:55[junit4:junit4]at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
build   06-Aug-2012 19:45:55[junit4:junit4]at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
build   06-Aug-2012 19:45:55[junit4:junit4]at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
build   06-Aug-2012 19:45:55[junit4:junit4]at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
build   06-Aug-2012 19:45:55[junit4:junit4]at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
build   06-Aug-2012 19:45:55[junit4:junit4]at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
build   06-Aug-2012 19:45:55[junit4:junit4]at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
build   06-Aug-2012 19:45:55[junit4:junit4]at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
build   06-Aug-2012 19:45:55[junit4:junit4]at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
build   06-Aug-2012 19:45:55[junit4:junit4]at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
build   06-Aug-2012 19:45:55[junit4:junit4]at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
build   06-Aug-2012 19:45:55[junit4:junit4]at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
build   06-Aug-2012 19:45:55[junit4:junit4]at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:345)
build   06-Aug-2012 19:45:55[junit4:junit4]at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:769)
build   06-Aug-2012 19:45:55[junit4:junit4]at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:429)
build   06-Aug-2012 19:45:55[junit4:junit4]at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
build   06-Aug-2012 19:45:55[junit4:junit4]at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
build   06-Aug-2012 19:45:55[junit4:junit4]at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
build   06-Aug-2012 19:45:55[junit4:junit4]at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
build   06-Aug-2012 19:45:55[junit4:junit4]at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
build   06-Aug-2012 19:45:55[junit4:junit4]at 

[jira] [Assigned] (LUCENE-4292) TestPerfTasksLogic.testBGSearchTaskThreads assertion error

2012-08-06 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-4292:
--

Assignee: Michael McCandless

 TestPerfTasksLogic.testBGSearchTaskThreads assertion error
 --

 Key: LUCENE-4292
 URL: https://issues.apache.org/jira/browse/LUCENE-4292
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Dawid Weiss
Assignee: Michael McCandless

 {code}
 build 06-Aug-2012 19:45:55[junit4:junit4] FAILURE 1.44s | 
 TestPerfTasksLogic.testBGSearchTaskThreads
 build 06-Aug-2012 19:45:55[junit4:junit4] Throwable #1: 
 java.lang.AssertionError
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 __randomizedtesting.SeedInfo.seed([73A6DA79EDD783F8:AE931FA55514525A]:0)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 org.junit.Assert.fail(Assert.java:92)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 org.junit.Assert.assertTrue(Assert.java:43)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 org.junit.Assert.assertTrue(Assert.java:54)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 org.apache.lucene.benchmark.byTask.TestPerfTasksLogic.testBGSearchTaskThreads(TestPerfTasksLogic.java:159)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 java.lang.reflect.Method.invoke(Method.java:597)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:345)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:769)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:429)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 

Re: svn commit: r1369911 - /lucene/dev/trunk/lucene/benchmark/src/test/org/apache/lucene/benchmark/byTask/TestPerfTasksLogic.java

2012-08-06 Thread Dawid Weiss
 +// NOTE: cannot assert this, because on a super-slow
 +// system, it could be after waiting 0.5 seconds that

Thanks Mike. Interesting because it's not that super-slow windows
machine. A dated 2 core AMD but I wouldn't say it's a snail.

Dawid

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-4292) TestPerfTasksLogic.testBGSearchTaskThreads assertion error

2012-08-06 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-4292.


   Resolution: Fixed
Fix Version/s: 5.0
   4.0

I commented out the assertion for this test ... it's not valid.

 TestPerfTasksLogic.testBGSearchTaskThreads assertion error
 --

 Key: LUCENE-4292
 URL: https://issues.apache.org/jira/browse/LUCENE-4292
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Dawid Weiss
Assignee: Michael McCandless
 Fix For: 4.0, 5.0


 {code}
 build 06-Aug-2012 19:45:55[junit4:junit4] FAILURE 1.44s | 
 TestPerfTasksLogic.testBGSearchTaskThreads
 build 06-Aug-2012 19:45:55[junit4:junit4] Throwable #1: 
 java.lang.AssertionError
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 __randomizedtesting.SeedInfo.seed([73A6DA79EDD783F8:AE931FA55514525A]:0)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 org.junit.Assert.fail(Assert.java:92)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 org.junit.Assert.assertTrue(Assert.java:43)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 org.junit.Assert.assertTrue(Assert.java:54)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 org.apache.lucene.benchmark.byTask.TestPerfTasksLogic.testBGSearchTaskThreads(TestPerfTasksLogic.java:159)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 java.lang.reflect.Method.invoke(Method.java:597)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:345)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:769)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:429)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
 build 06-Aug-2012 19:45:55[junit4:junit4]at 
 com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
 

Re: svn commit: r1369911 - /lucene/dev/trunk/lucene/benchmark/src/test/org/apache/lucene/benchmark/byTask/TestPerfTasksLogic.java

2012-08-06 Thread Michael McCandless
On Mon, Aug 6, 2012 at 2:28 PM, Dawid Weiss dawid.we...@gmail.com wrote:
 +// NOTE: cannot assert this, because on a super-slow
 +// system, it could be after waiting 0.5 seconds that

 Thanks Mike. Interesting because it's not that super-slow windows
 machine. A dated 2 core AMD but I wouldn't say it's a snail.

Hmmm well somehow those 2 search threads weren't scheduled (enough)
before the 0.5 seconds was up.

This was the same case that previously would have led to deadlock (BG
search threads hadn't started before the wait was done).

Mike McCandless

http://blog.mikemccandless.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: svn commit: r1369911 - /lucene/dev/trunk/lucene/benchmark/src/test/org/apache/lucene/benchmark/byTask/TestPerfTasksLogic.java

2012-08-06 Thread Dawid Weiss
 Hmmm well somehow those 2 search threads weren't scheduled (enough)
 before the 0.5 seconds was up.

Very likely. 500ms isn't that much when you have competing threads and
some other processes in the background (which was possibly the case).

D.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory

2012-08-06 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429387#comment-13429387
 ] 

Steven Rowe commented on SOLR-1725:
---

After Hoss's commits, both ASF Jenkins Maven jobs have run, and under both 
jobs, tests that previously were failing under Maven due to the lack of a 
javascript engine in the classpath are now being skipped.

After those jobs started, I committed a change to 
{{dev/nightly/common-maven.sh}} that includes the two rhino jars in the Maven 
JVM boot class path: r1369936.

I've enqueued the Maven jobs again on ASF Jenkins.

 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
Priority: Critical
  Labels: UpdateProcessor
 Fix For: 4.0

 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch


 A script based UpdateRequestProcessorFactory (uses the JDK 6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request processor factory enables writing update processors as 
 scripts located in the {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}}, which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 names (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved from the script file extension (that is, a 
 *.js file will be treated as a JavaScript script), so an extension is 
 mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - The SolrQueryRequest
  * {{rsp}} - The SolrQueryResponse
  * {{logger}} - A logger that can be used for logging purposes in the script

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2510) migrate solr analysis factories to analyzers module

2012-08-06 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429393#comment-13429393
 ] 

Steven Rowe commented on LUCENE-2510:
-

Solr tests have been failing under Maven on ASF Jenkins since the LUCENE-4044 
commits on 7/25, because the POMs for two analysis modules (morfologik and 
phonetic) didn't include {{$\{project.build.resources}}} definitions for 
{{src/resources/}}, the location of the SPI configuration files 
{{META-INF/services/o.a.l.analysis.util.*Factory}}.  

I've added {{src/resources/}} to these two modules' POMs:

- r1369961: trunk
- r1369980: branch_4x


 migrate solr analysis factories to analyzers module
 ---

 Key: LUCENE-2510
 URL: https://issues.apache.org/jira/browse/LUCENE-2510
 Project: Lucene - Core
  Issue Type: Task
  Components: modules/analysis
Affects Versions: 4.0-ALPHA
Reporter: Robert Muir
Assignee: Uwe Schindler
 Fix For: 4.0, 5.0

 Attachments: LUCENE-2510-movefactories.sh, 
 LUCENE-2510-movefactories.sh, LUCENE-2510-multitermcomponent.patch, 
 LUCENE-2510-multitermcomponent.patch, LUCENE-2510-parent-classes.patch, 
 LUCENE-2510-parent-classes.patch, LUCENE-2510-parent-classes.patch, 
 LUCENE-2510-resourceloader-bw.patch, LUCENE-2510-simplify-tests.patch, 
 LUCENE-2510.patch, LUCENE-2510.patch, LUCENE-2510.patch


 In LUCENE-2413 all TokenStreams were consolidated into the analyzers module.
 This is a good step, but I think the next step is to put the Solr factories 
 into the analyzers module, too.
 This would make analyzers artifacts plugins to both lucene and solr, with 
 benefits such as:
 * users could use the old analyzers module with solr, too. This is a good 
 step to use real library versions instead of Version for backwards compat.
 * analyzers modules such as smartcn and icu, that aren't currently available 
 to solr users due to large file sizes or dependencies, would be simple 
 optional plugins to solr and easily available to users that want them.
 Rough sketch in this thread: 
 http://www.lucidimagination.com/search/document/3465a0e55ba94d58/solr_and_analyzers_module
 Practically, I havent looked much and don't really have a plan for how this 
 will work yet, so ideas are very welcome.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: svn commit: r1369987 - in /lucene/dev/nightly: common-maven.sh hudson-settings.sh

2012-08-06 Thread Steven A Rowe
 -RHINO_LIBS_DIR=/usr/home/hudson/tools/java/openjdk-missing-libs
 +RHINO_LIBS_DIR=$HOME/tools/java/openjdk-missing-libs

Thanks Uwe. - Steve


Re: How do you interpret the values returned by RunAutomaton.getCharIntervals() ?

2012-08-06 Thread Anders Møller
If you show the automaton with toDot or toString it should be clear 
where those codepoints come from.


- Anders
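
As far as I can tell, getCharIntervals() returns the sorted start points of
the automaton's codepoint classes, and every transition interval [min, max]
contributes both min and max+1 (the start of the class that follows), which
is why each accepted codepoint shows up together with its successor. A quick
sketch of the suggestion above:

   import org.apache.lucene.util.automaton.Automaton;
   import org.apache.lucene.util.automaton.RegExp;

   public class DumpAutomaton {
       public static void main(String[] args) {
           // Graphviz output: each edge is labeled with a codepoint interval,
           // whose endpoints are where the start points come from.
           Automaton a = new RegExp("ij{2,5}\uE001k789opq").toAutomaton();
           System.out.println(a.toDot());
       }
   }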

On 04-08-2012 02:34, Ashwin Jayaprakash wrote:

Hi, I was playing with the RunAutomaton class and I was not sure about
the meaning of the results returned by the
RunAutomaton.getCharIntervals() method.

The JavaDoc for that method says "Returns array of codepoint class
interval start points.". I tried it on a simple regex string
("ij{2,5}\uE001k789opq") and I couldn't explain why there were 4 extra
values returned - 0x3a (:), 0x6c (l), 0x72 (r) and 0xe002 (Unicode
private use codepoint). These 4 characters were +1 step from the
characters 9, k, q and 0xe001 respectively, all of which are in the
regex from which the automaton was built.

Does anyone know why this is happening? All the codepoints in the regex
pattern have a length of just 1 char. So, why the extra chars?

What I was really trying to do was to extract the identifiers in the
pattern, which this method almost does except for some inexplicable,
extra values. I was really looking for an array with 7, 8, 9, i, j, k,
o, p, q, 0xe001.

Code:
   import org.apache.lucene.util.automaton.Automaton;
   import org.apache.lucene.util.automaton.RegExp;
   import org.apache.lucene.util.automaton.RunAutomaton;

   ... ..

   public static void main(String[] args) {
       String s = "ij{2,5}\uE001k789opq";

       RegExp r = new RegExp(s);
       Automaton a = r.toAutomaton();
       RunAutomaton ra = new RunAutomaton(a,
               Character.MAX_CODE_POINT, false) {
       };

       System.out.println("Char intervals for: " + s);
       for (int i : ra.getCharIntervals()) {
           System.out.println("  " + Integer.toHexString(i) + " = "
                   + new String(Character.toChars(i)));
       }
   }

Output:
   Char intervals for: ij{2,5}?k789opq
 0 =
 37 = 7
 38 = 8
 39 = 9
 3a = :
 69 = i
 6a = j
 6b = k
 6c = l
 6f = o
 70 = p
 71 = q
 72 = r
 e001 = ?
 e002 = ?


Thanks,
Ashwin.



--
Anders Moeller
amoel...@cs.au.dk
http://cs.au.dk/~amoeller

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (SOLR-3703) Escape character which is in the query, is getting ignored in solr 3.6 with lucene parser

2012-08-06 Thread srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

srinivas closed SOLR-3703.
--

Resolution: Fixed

 Escape character which is in the query, is getting ignored in solr 3.6 with 
 lucene parser
 -

 Key: SOLR-3703
 URL: https://issues.apache.org/jira/browse/SOLR-3703
 Project: Solr
  Issue Type: Bug
Affects Versions: 3.6
 Environment: Linux
Reporter: srinivas

 I noticed that an escape character in the query is getting ignored in Solr 
 3.6 with the lucene parser, while edismax gives the expected results. 
 select?q=author:David\ Duke&defType=lucene 
 would render the same results as: 
 select?q=author:(David OR Duke)&defType=lucene 
 But 
 select?q=author:David\ Duke&defType=edismax 
 would render the same results as: 
 select?q=author:David Duke&defType=lucene 
 Regards
 Srini

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #49: POMs out of sync

2012-08-06 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/49/

No tests ran.

Build Log:
[...truncated 8471 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory

2012-08-06 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429458#comment-13429458
 ] 

Uwe Schindler commented on SOLR-1725:
-

I think this is all fine:
- Java 6 spec requires a JavaScript engine to be shipped with JDK; it is just 
missing from FreeBSD's package (there is an issue open upstream). JavaScript 
missing from Java 6 is like a missing UTF-8 charset :-)
- I strongly -1 shipping with additional scripting engines. No need for that. 
If user Foo wants to script Solr with engine Bar, he can add the SPI jar to 
the classpath. No need to ship it. This is why SPI was invented!

We should maybe only fix Solr's classloader to be set as context classloader, 
too. SPIs cannot currently be loaded from $SOLR_HOME/lib, because the context 
classloader does not see those jars. We fixed that for codec and analyzer SPI 
jars in Solr, but the most correct solution would be to let Solr's threads 
see the ResourceLoader as their context classloader. Then you could add 
scripting engines, XML parsers, charset providers, locales, ... just like 
plugins, codecs, or analyzer factories into the Solr home's lib folder 
without adding them to the WAR.
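
A rough sketch of the classloader fix described above; the resourceLoader variable is illustrative, standing in for the core's SolrResourceLoader:

{code}
import javax.script.ScriptEngineManager;

// Hypothetical: install the resource loader's class loader as the thread's
// context class loader, so SPI lookups (script engines, XML parsers, ...)
// can discover jars dropped into $SOLR_HOME/lib.
ClassLoader pluginLoader = resourceLoader.getClassLoader();
Thread.currentThread().setContextClassLoader(pluginLoader);
ScriptEngineManager mgr = new ScriptEngineManager(); // now sees plugin engines
{code}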

 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
Priority: Critical
  Labels: UpdateProcessor
 Fix For: 4.0

 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch


 A script based UpdateRequestProcessorFactory (uses the JDK 6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request processor factory enables writing update processors as 
 scripts located in the {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}}, which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 names (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved from the script file extension (that is, a 
 *.js file will be treated as a JavaScript script), so an extension is 
 mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - The SolrQueryRequest
  * {{rsp}} - The SolrQueryResponse
  * {{logger}} - A logger that can be used for logging purposes in the script

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory

2012-08-06 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429468#comment-13429468
 ] 

Hoss Man commented on SOLR-1725:


bq. Java 6 spec requires a JavaScript engine to be shipped with JDK

i didn't know that ... i couldn't find anything in the docs that suggested 
certain engines were mandatory, hence the assumptions in the test (the maven 
tests just indicated that those assumptions were broken)

bq. I strongly -1 shipping with additional scripting engines
i didn't see anyone suggesting that ... no argument there.

bq. We should maybe only fix Solr's classloader to be set as context 
classloader, too.

that sounds like an orthogonal issue ... great idea, didn't know it was 
possible, please go ahead and do it, but let's track it in its own issue

 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
Priority: Critical
  Labels: UpdateProcessor
 Fix For: 4.0

 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch


 A script based UpdateRequestProcessorFactory (uses the JDK 6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request processor factory enables writing update processors as 
 scripts located in the {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}}, which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 names (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved from the script file extension (that is, a 
 *.js file will be treated as a JavaScript script), so an extension is 
 mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - The SolrQueryRequest
  * {{rsp}} - The SolrQueryResponse
  * {{logger}} - A logger that can be used for logging purposes in the script

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory

2012-08-06 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429472#comment-13429472
 ] 

Hoss Man commented on SOLR-1725:


{quote}
bq. I strongly -1 shipping with additional scripting engines

i didn't see anyone suggesting that ... no argument there.
{quote}

sorry .. i overlooked that part of erik's comment .. i'm with Uwe: let's let 
users add their own script engines as plugins

 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
Priority: Critical
  Labels: UpdateProcessor
 Fix For: 4.0

 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch


 A script based UpdateRequestProcessorFactory (Uses JDK6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request processor factory enables writing update processors in 
 scripts located in the {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}} which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved based on the script file extension (that is, 
 a *.js file will be treated as a JavaScript script), therefore an extension 
 is mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - The SolrQueryRequest
  * {{rsp}}- The SolrQueryResponse
  * {{logger}} - A logger that can be used for logging purposes in the script

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4286) Add flag to CJKBigramFilter to allow indexing unigrams as well as bigrams

2012-08-06 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429480#comment-13429480
 ] 

Lance Norskog commented on LUCENE-4286:
---

If you do unigrams and bigrams in separate fields, you can bias bigrams over 
unigrams. We did that with one customer and it really helped. Our text was 
technical and tended towards long words: lots of bigrams & trigrams. Have you 
tried the Smart Chinese toolkit? It produces a lot fewer bigrams. Our project 
worked well with it. I would try that, with misfires further broken into 
bigrams, over general bigramming. Cf. [SOLR-3653] about the misfires part.

In general we found Chinese-language search a really hard problem, and doubly 
so when nobody on the team speaks Chinese. 


 Add flag to CJKBigramFilter to allow indexing unigrams as well as bigrams
 -

 Key: LUCENE-4286
 URL: https://issues.apache.org/jira/browse/LUCENE-4286
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.0-ALPHA, 3.6.1
Reporter: Tom Burton-West
Priority: Minor
 Fix For: 4.0, 5.0

 Attachments: LUCENE-4286.patch, LUCENE-4286.patch


 Add an optional flag to the CJKBigramFilter to tell it to also output 
 unigrams. This would allow indexing of both bigrams and unigrams, and at 
 query time the analyzer could analyze queries as bigrams unless the query 
 contained a single Han unigram.
 As an example, here is a Solr fieldType configuration with the analyzer for 
 indexing with the indexUnigrams flag set and the analyzer for querying 
 without the flag:
 <fieldType name="CJK" autoGeneratePhraseQueries="false">
   <analyzer type="index">
     <tokenizer class="solr.ICUTokenizerFactory"/>
     <filter class="solr.CJKBigramFilterFactory" indexUnigrams="true" han="true"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.ICUTokenizerFactory"/>
     <filter class="solr.CJKBigramFilterFactory" han="true"/>
   </analyzer>
 </fieldType>
 Use case: About 10% of our queries that contain Han characters are single 
 character queries.   The CJKBigram filter only outputs single characters when 
 there are no adjacent bigrammable characters in the input.  This means we 
 have to create a separate field to index Han unigrams in order to address 
 single character queries and then write application code to search that 
 separate field if we detect a single character Han query.  This is rather 
 kludgey. With the optional flag, we could configure Solr as above. 
 This is somewhat analogous to the flags in LUCENE-1370 for the ShingleFilter 
 used to allow single word queries (although that uses word n-grams rather 
 than character n-grams).
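
 To compare the two chains, a small hedged helper like the sketch below can 
 dump the terms an analyzer emits (standard Lucene TokenStream consumption; 
 the field name "f" is arbitrary):

 import java.io.StringReader;
 import org.apache.lucene.analysis.Analyzer;
 import org.apache.lucene.analysis.TokenStream;
 import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

 // Print every term an analyzer emits, so the bigram-only chain can be
 // compared with the bigram+unigram chain.
 public class PrintTerms {
   public static void print(Analyzer analyzer, String text) throws Exception {
     TokenStream ts = analyzer.tokenStream("f", new StringReader(text));
     CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
     ts.reset();
     while (ts.incrementToken()) {
       System.out.println(term.toString());
     }
     ts.end();
     ts.close();
   }
 }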

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory

2012-08-06 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429487#comment-13429487
 ] 

Uwe Schindler commented on SOLR-1725:
-

Hoss, you are right, it is not required that JS is available, the Java 6 specs 
says [http://download.oracle.com/otndocs/jcp/j2se-1.6.0-pr-spec-oth-JSpec/]:

{quote}
JSR 223: Scripting for the Java Platform  
A large percentage of Java developers also use scripting languages. While the 
Java language is suitable for many tasks, and especially for writing robust, 
long-lived applications, scripting languages are useful for many other tasks.

JSR 223 defines a framework for connecting interpreters of arbitrary scripting 
languages to Java programs. It includes facilities for locating the available 
scripting engines, invoking scripts from Java code and vice versa, and making 
Java application objects visible to scripts. The framework is divided into two 
parts, the Scripting API and an optional Web Scripting Framework. This feature 
will incorporate just the Scripting API into this version of the Java SE 
platform.

There will be no requirement that any particular scripting language be 
supported by the platform; implementors may choose to include support for the 
scripting language(s) of their choice as they see fit.

[ JSR 223; javax.script ]
{quote}

But all JDKs on all platforms except FreeBSD contain them. So we should have 
the error messages printed on failure to look up an engine, and the assumption 
in the test, as you committed.

But as Erik says, too: No need to ship engines. It's just bloat because there 
are millions of them :-)

 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
Priority: Critical
  Labels: UpdateProcessor
 Fix For: 4.0

 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch


 A script based UpdateRequestProcessorFactory (Uses JDK6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request processor factory enables writing update processors in 
 scripts located in the {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}} which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved based on the script file extension (that is, 
 a *.js file will be treated as a JavaScript script), therefore an extension 
 is mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - The SolrQueryRequest
  * {{rsp}}- The SolrQueryResponse
  * {{logger}} - A logger that can be used for logging purposes in the script

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-1725) Script based UpdateRequestProcessorFactory

2012-08-06 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429487#comment-13429487
 ] 

Uwe Schindler edited comment on SOLR-1725 at 8/6/12 10:11 PM:
--

Hoss, you are right, it is not required that JS is available, the Java 6 specs 
says [http://download.oracle.com/otndocs/jcp/j2se-1.6.0-pr-spec-oth-JSpec/]:

{quote}
JSR 223: Scripting for the Java Platform  
A large percentage of Java developers also use scripting languages. While the 
Java language is suitable for many tasks, and especially for writing robust, 
long-lived applications, scripting languages are useful for many other tasks.

JSR 223 defines a framework for connecting interpreters of arbitrary scripting 
languages to Java programs. It includes facilities for locating the available 
scripting engines, invoking scripts from Java code and vice versa, and making 
Java application objects visible to scripts. The framework is divided into two 
parts, the Scripting API and an optional Web Scripting Framework. This feature 
will incorporate just the Scripting API into this version of the Java SE 
platform.

There will be no requirement that any particular scripting language be 
supported by the platform; implementors may choose to include support for the 
scripting language(s) of their choice as they see fit.

[ JSR 223; javax.script ]
{quote}

But all JDKs on all platforms except FreeBSD contain them. So we should have 
the error messages printed on failure to look up an engine, and the assumption 
in the test, as you committed.

But as Hoss says, too: No need to ship engines. It's just bloat because there 
are millions of them :-)

  was (Author: thetaphi):
Hoss, you are right, it is not required that JS is available, the Java 6 
specs says 
[http://download.oracle.com/otndocs/jcp/j2se-1.6.0-pr-spec-oth-JSpec/]:

{quote}
JSR 223: Scripting for the Java Platform  
A large percentage of Java developers also use scripting languages. While the 
Java language is suitable for many tasks, and especially for writing robust, 
long-lived applications, scripting languages are useful for many other tasks.

JSR 223 defines a framework for connecting interpreters of arbitrary scripting 
languages to Java programs. It includes facilities for locating the available 
scripting engines, invoking scripts from Java code and vice versa, and making 
Java application objects visible to scripts. The framework is divided into two 
parts, the Scripting API and an optional Web Scripting Framework. This feature 
will incorporate just the Scripting API into this version of the Java SE 
platform.

There will be no requirement that any particular scripting language be 
supported by the platform; implementors may choose to include support for the 
scripting language(s) of their choice as they see fit.

[ JSR 223; javax.script ]
{quote}

But all JDKs on all platforms except FreeBSD contain them. So we should have 
the error messages printed on failure to look up an engine, and the assumption 
in the test, as you committed.

But as Erik says, too: No need to ship engines. It's just bloat because there 
are millions of them :-)
  
 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
Priority: Critical
  Labels: UpdateProcessor
 Fix For: 4.0

 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch


 A script based UpdateRequestProcessorFactory (Uses JDK6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request processor factory enables writing update processors in 
 scripts located in the {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}} which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved based on the script file extension (that is, 
 a *.js file will be treated as a JavaScript script), therefore an extension 
 is mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the 

[jira] [Created] (SOLR-3716) Make SolrResourceLoaders ClassLoader available as context class loader

2012-08-06 Thread Uwe Schindler (JIRA)
Uwe Schindler created SOLR-3716:
---

 Summary: Make SolrResourceLoaders ClassLoader available as context 
class loader
 Key: SOLR-3716
 URL: https://issues.apache.org/jira/browse/SOLR-3716
 Project: Solr
  Issue Type: Bug
  Components: scripts and tools
Reporter: Uwe Schindler
 Fix For: 4.0, 5.0


SOLR-1725 and other issues (recent changes to analysis factories and codecs) 
make it possible to plug in extensions like analyzer factories, codecs, 
scripting engines or TIKA parsers (TIKA extraction plugin!!!) as SPIs. The 
current problem (we solved this already for codecs and analyzer factories with 
a classloader-reload hack: LUCENE-4259) is the following:

You have to unpack the WAR file and repack it with the missing JAR files. If you 
do it the Solr way and put those jars into the $SOLR_HOME/lib folder like 
plugins, they are not seen. The problem is that plugins loaded by solr are 
loaded using SolrResourceLoader's classloader (configurable via 
solrconfig.xml), but as this classloader is not also context classloader, SPI 
does not look into it, so scripting engines, TIKA plugins, (previously codecs) 
are not seen.

We should investigate how to manage setting the context classloader of all 
threads solr ever sees to point to our own solr classloader.

When we do this, I also suggest shipping only the TIKA core libs, not 
tika-parsers and their big dependency hell. TIKA parsers are also loaded via SPI, 
so users can download the TIKA parser distribution and drop it into $SOLR_HOME/lib. 
That way a user can also use only those extraction plugins that are really needed. 
The current Solr distribution mostly ships JAR files for the Solr extraction 
handler that many users never need. We don't need to ship all of them; we 
can just tell the user how to install the needed SPIs. The same for 
analysis-extras (the user only needs to copy the morphologic JAR or smartchinese 
JAR into $SOLR_HOME/lib - this works already!!!). No need for the whole contrib. 
The same goes for scripting engines.

We should just ship with some scripts (ANT based) to download the JAR files 
into $SOLR_HOME.
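
As a rough illustration of why the context classloader matters for SPI, here is 
a hedged Java sketch (illustrative, not Solr code; the lib directory argument 
stands in for $SOLR_HOME/lib):

import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;
import javax.script.ScriptEngineFactory;

public class ContextClassLoaderDemo {
  public static void main(String[] args) throws Exception {
    // Hypothetical plugin dir standing in for $SOLR_HOME/lib
    File libDir = new File(args.length > 0 ? args[0] : "lib");
    List<URL> urls = new ArrayList<URL>();
    File[] files = libDir.listFiles();
    if (files != null) {
      for (File f : files) {
        if (f.getName().endsWith(".jar")) {
          urls.add(f.toURI().toURL());
        }
      }
    }
    ClassLoader plugins =
        new URLClassLoader(urls.toArray(new URL[urls.size()]),
            ContextClassLoaderDemo.class.getClassLoader());

    Thread t = Thread.currentThread();
    ClassLoader previous = t.getContextClassLoader();
    t.setContextClassLoader(plugins); // SPI lookups now consult the plugin jars
    try {
      // ServiceLoader.load(Class) uses the context classloader by default
      for (ScriptEngineFactory f : ServiceLoader.load(ScriptEngineFactory.class)) {
        System.out.println(f.getEngineName() + " " + f.getExtensions());
      }
    } finally {
      t.setContextClassLoader(previous); // always restore
    }
  }
}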

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory

2012-08-06 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429528#comment-13429528
 ] 

Uwe Schindler commented on SOLR-1725:
-

I opened SOLR-3716.

 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
Priority: Critical
  Labels: UpdateProcessor
 Fix For: 4.0

 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch


 A script based UpdateRequestProcessorFactory (Uses JDK6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request processor factory enables writing update processors in 
 scripts located in the {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}} which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved based on the script file extension (that is, 
 a *.js file will be treated as a JavaScript script), therefore an extension 
 is mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - The SolrQueryRequest
  * {{rsp}}- The SolrQueryResponse
  * {{logger}} - A logger that can be used for logging purposes in the script

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1725) Script based UpdateRequestProcessorFactory

2012-08-06 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429585#comment-13429585
 ] 

Steven Rowe commented on SOLR-1725:
---

On IRC, Uwe suggested adding the Rhino jars to {{$JAVA_HOME/jre/lib/ext/}} on 
the FreeBSD ASF Jenkins lucene slave ({{lucene.zones.apache.org}}) to allow 
both Ant and Maven build jobs to enable scripting tests.  I copied {{js.jar}} 
and {{script-js.jar}} from 
{{/usr/home/hudson/tools/java/openjdk-missing-libs/}} to 
{{/usr/local/openjdk{6,7}/jre/lib/ext/}}, and the {{ScriptEngineTest}} tests 
under the Maven branch_4x job have succeeded, except for{{testJRuby()}}, which 
was skipped (as expected).

I also removed {{js.jar}} and {{script-js.jar}} from {{~hudson/.ant/lib/}}.

 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
Priority: Critical
  Labels: UpdateProcessor
 Fix For: 4.0

 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch


 A script based UpdateRequestProcessorFactory (Uses JDK6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request processor factory enables writing update processors in 
 scripts located in the {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}} which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved based on the script file extension (that is, 
 a *.js file will be treated as a JavaScript script), therefore an extension 
 is mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - The SolrQueryRequest
  * {{rsp}}- The SolrQueryResponse
  * {{logger}} - A logger that can be used for logging purposes in the script

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-1725) Script based UpdateRequestProcessorFactory

2012-08-06 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429585#comment-13429585
 ] 

Steven Rowe edited comment on SOLR-1725 at 8/6/12 11:44 PM:


On IRC, Uwe suggested adding the Rhino jars to {{$JAVA_HOME/jre/lib/ext/}} on 
the FreeBSD ASF Jenkins lucene slave ({{lucene.zones.apache.org}}) to allow 
both Ant and Maven build jobs to enable scripting tests.  I copied {{js.jar}} 
and {{script-js.jar}} from 
{{/usr/home/hudson/tools/java/openjdk-missing-libs/}} to 
{{/usr/local/openjdk\{6,7}/jre/lib/ext/}}, and the {{ScriptEngineTest}} tests 
under the Maven branch_4x job have succeeded, except for{{testJRuby()}}, which 
was skipped (as expected).

I also removed {{js.jar}} and {{script-js.jar}} from {{~hudson/.ant/lib/}}.

  was (Author: steve_rowe):
On IRC, Uwe suggested adding the Rhino jars to {{$JAVA_HOME/jre/lib/ext/}} 
on the FreeBSD ASF Jenkins lucene slave ({{lucene.zones.apache.org}}) to allow 
both Ant and Maven build jobs to enable scripting tests.  I copied {{js.jar}} 
and {{script-js.jar}} from 
{{/usr/home/hudson/tools/java/openjdk-missing-libs/}} to 
{{/usr/local/openjdk{6,7}/jre/lib/ext/}}, and the {{ScriptEngineTest}} tests 
under the Maven branch_4x job have succeeded, except for{{testJRuby()}}, which 
was skipped (as expected).

I also removed {{js.jar}} and {{script-js.jar}} from {{~hudson/.ant/lib/}}.
  
 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
Priority: Critical
  Labels: UpdateProcessor
 Fix For: 4.0

 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch


 A script based UpdateRequestProcessorFactory (Uses JDK6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request processor factory enables writing update processors in 
 scripts located in the {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}} which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved based on the script file extension (that is, 
 a *.js file will be treated as a JavaScript script), therefore an extension 
 is mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - The SolrQueryRequest
  * {{rsp}}- The SolrQueryResponse
  * {{logger}} - A logger that can be used for logging purposes in the script

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-1725) Script based UpdateRequestProcessorFactory

2012-08-06 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429585#comment-13429585
 ] 

Steven Rowe edited comment on SOLR-1725 at 8/6/12 11:45 PM:


On IRC, Uwe suggested adding the Rhino jars to {{$JAVA_HOME/jre/lib/ext/}} on 
the FreeBSD ASF Jenkins lucene slave ({{lucene.zones.apache.org}}) to allow 
both Ant and Maven build jobs to enable scripting tests.  I copied {{js.jar}} 
and {{script-js.jar}} from 
{{/usr/home/hudson/tools/java/openjdk-missing-libs/}} to 
{{/usr/local/openjdk\{6,7}/jre/lib/ext/}}, and the {{ScriptEngineTest}} tests 
under the Maven branch_4x job have succeeded, except for {{testJRuby()}}, which 
was skipped (as expected).

I also removed {{js.jar}} and {{script-js.jar}} from {{~hudson/.ant/lib/}}.

  was (Author: steve_rowe):
On IRC, Uwe suggested adding the Rhino jars to {{$JAVA_HOME/jre/lib/ext/}} 
on the FreeBSD ASF Jenkins lucene slave ({{lucene.zones.apache.org}}) to allow 
both Ant and Maven build jobs to enable scripting tests.  I copied {{js.jar}} 
and {{script-js.jar}} from 
{{/usr/home/hudson/tools/java/openjdk-missing-libs/}} to 
{{/usr/local/openjdk\{6,7}/jre/lib/ext/}}, and the {{ScriptEngineTest}} tests 
under the Maven branch_4x job have succeeded, except for{{testJRuby()}}, which 
was skipped (as expected).

I also removed {{js.jar}} and {{script-js.jar}} from {{~hudson/.ant/lib/}}.
  
 Script based UpdateRequestProcessorFactory
 --

 Key: SOLR-1725
 URL: https://issues.apache.org/jira/browse/SOLR-1725
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Uri Boness
Priority: Critical
  Labels: UpdateProcessor
 Fix For: 4.0

 Attachments: SOLR-1725-rev1.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, 
 SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch


 A script based UpdateRequestProcessorFactory (Uses JDK6 script engine 
 support). The main goal of this plugin is to be able to configure/write 
 update processors without the need to write and package Java code.
 The update request processor factory enables writing update processors in 
 scripts located in the {{solr.solr.home}} directory. The factory accepts one 
 (mandatory) configuration parameter named {{scripts}} which accepts a 
 comma-separated list of file names. It will look for these files under the 
 {{conf}} directory in solr home. When multiple scripts are defined, their 
 execution order is defined by the lexicographical order of the script file 
 name (so {{scriptA.js}} will be executed before {{scriptB.js}}).
 The script language is resolved based on the script file extension (that is, 
 a *.js file will be treated as a JavaScript script), therefore an extension 
 is mandatory.
 Each script file is expected to have one or more methods with the same 
 signature as the methods in the {{UpdateRequestProcessor}} interface. It is 
 *not* required to define all methods, only those that are required by the 
 processing logic.
 The following variables are defined as global variables for each script:
  * {{req}} - The SolrQueryRequest
  * {{rsp}}- The SolrQueryResponse
  * {{logger}} - A logger that can be used for logging purposes in the script

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Reopened] (SOLR-3647) DistrubtedQueue should use our Solr zk client rather than the std zk client.

2012-08-06 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reopened SOLR-3647:
---


This was actually a fairly bad bug, as brought up on the user list a week or two 
back - the std zk client does not deal with connection loss well, but worse, 
once it's had a connection expiration, you have to make a new client - you 
cannot use the old one. So if the distrib queue zk client ever gets expired, it 
will continually hit expiration exceptions as you try to use it again - so no 
nodes can publish states (other issues too, but that's a big one). This can put 
it in an infinite recovery loop.
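
For readers following along, the retry behavior the Solr zk client adds looks 
roughly like the sketch below (illustrative names, not the actual SolrZkClient 
code; note that session expiration is a separate case that requires a brand 
new client):

import org.apache.zookeeper.KeeperException;

// Retry an operation on connection loss. Session *expiration* is different:
// an expired ZooKeeper client can never be reused and must be recreated.
public class ZkRetry {
  public interface ZkOperation<T> {
    T execute() throws KeeperException, InterruptedException;
  }

  public static <T> T retryOnConnLoss(ZkOperation<T> op, int maxRetries)
      throws KeeperException, InterruptedException {
    for (int attempt = 0; ; attempt++) {
      try {
        return op.execute();
      } catch (KeeperException.ConnectionLossException e) {
        if (attempt >= maxRetries) throw e;  // give up after maxRetries
        Thread.sleep(1000L * (attempt + 1)); // back off, then retry
      }
    }
  }
}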

 DistrubtedQueue should use our Solr zk client rather than the std zk client.
 

 Key: SOLR-3647
 URL: https://issues.apache.org/jira/browse/SOLR-3647
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.0, 5.0


 This will let us easily do retries on connection loss.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3647) DistrubtedQueue should use our Solr zk client rather than the std zk client.

2012-08-06 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-3647:
--

Issue Type: Bug  (was: Improvement)

 DistrubtedQueue should use our Solr zk client rather than the std zk client.
 

 Key: SOLR-3647
 URL: https://issues.apache.org/jira/browse/SOLR-3647
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.0, 5.0


 This will let us easily do retries on connection loss.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-3647) DistrubtedQueue should use our Solr zk client rather than the std zk client.

2012-08-06 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved SOLR-3647.
---

Resolution: Fixed

Reopened to change from improvement to bug.

 DistrubtedQueue should use our Solr zk client rather than the std zk client.
 

 Key: SOLR-3647
 URL: https://issues.apache.org/jira/browse/SOLR-3647
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.0, 5.0


 This will let us easily do retries on connection loss.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3716) Make SolrResourceLoaders ClassLoader available as context class loader

2012-08-06 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429653#comment-13429653
 ] 

Lance Norskog commented on SOLR-3716:
-

Thanks for flushing out another problem in how classpaths work.

I have a small question: how would I add a Java SecurityManager class into this 
mix? I would like to set a security manager object for each core that governs 
the activities of code in that core: loading a 3-megabyte synonym file, loading 
a jar file that calls out to the DHS, whatever. (Why? A hosted Solr business is 
a lot easier if you can run someone's collection configs in a sandbox.)
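
(For what it's worth, the bare mechanism is small; the sketch below is a toy 
that denies all outbound connections JVM-wide. Per-core sandboxing would need 
per-ProtectionDomain policies and is much more involved.)

// Toy sketch: a SecurityManager that denies all outbound network connections.
public class DenyNetworkManager extends SecurityManager {
  @Override
  public void checkConnect(String host, int port) {
    throw new SecurityException("network access denied: " + host + ":" + port);
  }
}

// Installed once for the whole JVM:
// System.setSecurityManager(new DenyNetworkManager());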

 Make SolrResourceLoaders ClassLoader available as context class loader
 --

 Key: SOLR-3716
 URL: https://issues.apache.org/jira/browse/SOLR-3716
 Project: Solr
  Issue Type: Bug
  Components: scripts and tools
Reporter: Uwe Schindler
 Fix For: 4.0, 5.0


 SOLR-1725 and other issues (recent changes to analysis factories and codecs) 
 make it possible to plug in extensions like analyzer factories, codecs, 
 scripting engines or TIKA parsers (TIKA extraction plugin!!!) as SPIs. The 
 current problem (we solved this already for codecs and analyzer factories 
 with a classloader-reload hack: LUCENE-4259) is the following:
 You have to unpack the WAR file and repack it with the missing JAR files. If 
 you do it the Solr way and put those jars into the $SOLR_HOME/lib folder 
 like plugins, they are not seen. The problem is that plugins loaded by solr 
 are loaded using SolrResourceLoader's classloader (configurable via 
 solrconfig.xml), but as this classloader is not also context classloader, SPI 
 does not look into it, so scripting engines, TIKA plugins, (previously 
 codecs) are not seen.
 We should investigate how to manage setting the context classloader of all 
 threads solr ever sees to point to our own solr classloader.
 When we do this, I also suggest shipping only the TIKA core libs, not 
 tika-parsers and their big dependency hell. TIKA parsers are also loaded via 
 SPI, so users can download the TIKA parser distribution and drop it into 
 $SOLR_HOME/lib. That way a user can also use only those extraction plugins 
 that are really needed. The current Solr distribution mostly ships JAR files 
 for the Solr extraction handler that many users never need. We don't need to 
 ship all of them; we can just tell the user how to install the needed SPIs. 
 The same for analysis-extras (the user only needs to copy the morphologic JAR 
 or smartchinese JAR into $SOLR_HOME/lib - this works already!!!). No need for 
 the whole contrib. The same goes for scripting engines.
 We should just ship with some scripts (ANT based) to download the JAR files 
 into $SOLR_HOME.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-trunk-Windows (64bit/jdk1.7.0_05) - Build # 125 - Failure!

2012-08-06 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows/125/
Java: 64bit/jdk1.7.0_05 -XX:+UseConcMarkSweepGC

1 tests failed.
REGRESSION:  org.apache.solr.spelling.suggest.SuggesterTest.testRebuild

Error Message:
Exception during query

Stack Trace:
java.lang.RuntimeException: Exception during query
at 
__randomizedtesting.SeedInfo.seed([A9A31C1A44AB23F5:F286BE5970AB596F]:0)
at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:486)
at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:453)
at 
org.apache.solr.spelling.suggest.SuggesterTest.testRebuild(SuggesterTest.java:105)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1995)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:818)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:877)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:891)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:825)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:671)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:697)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:736)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:747)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
at 
org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)
Caused by: java.lang.RuntimeException: REQUEST FAILED: 
xpath=//lst[@name='spellcheck']/lst[@name='suggestions']/lst[@name='ac']/int[@name='numFound'][.='2']
xml response was: <?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst><lst name="spellcheck"><lst name="suggestions"/></lst>

[jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index

2012-08-06 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429682#comment-13429682
 ] 

Yonik Seeley commented on SOLR-3684:


bq. Address this default jetty threadpool size of max=10,000. This is the real 
issue.

I had thought that jetty reused a small number of threads - 
O(n_concurrent_connections) - regardless of what the max number of threads was?

 Frequently full gc while do pressure index
 --

 Key: SOLR-3684
 URL: https://issues.apache.org/jira/browse/SOLR-3684
 Project: Solr
  Issue Type: Improvement
  Components: multicore
Affects Versions: 4.0-ALPHA
 Environment: System: Linux
 Java process: 4G memory
 Jetty: 1000 threads 
 Index: 20 field
 Core: 5
Reporter: Raintung Li
Priority: Critical
  Labels: garbage, performance
 Fix For: 4.0

 Attachments: patch.txt

   Original Estimate: 168h
  Remaining Estimate: 168h

 Recently we tested Solr indexing throughput and performance: we configured 20 
 fields for the test (the field type is the normal text_general), started 1000 
 threads for Jetty, and defined 5 cores.
 After the test had run for some time, the Solr process throughput dropped very 
 quickly. Investigating the root cause, we found the Java process constantly 
 doing full GCs. 
 Checking the heap dump, the main object is StandardTokenizer, which is saved 
 in a CloseableThreadLocal by IndexSchema.SolrIndexAnalyzer.
 Solr uses PerFieldReuseStrategy as the default reuse strategy, which means 
 each field has its own StandardTokenizer if it uses the standard analyzer, and 
 each StandardTokenizer occupies 32KB of memory because of its zzBuffer char 
 array.
 The worst case: Total memory = live threads*cores*fields*32KB
 In the test case, that is 1000*5*20*32KB = 3.2GB for StandardTokenizer, and 
 those objects can only be released when their thread dies.
 Suggestion:
 Each request is handled by one thread, which means one document is only 
 analyzed by one thread. A thread parses the document's fields one by one, so 
 fields of the same type can reuse the same components. When the thread 
 switches to another field of the same type, only the input stream of the 
 reused component needs to be reset, which saves a lot of memory for fields of 
 the same type.
 Total memory will be = live threads*cores*(different field types)*32KB
 The modification is simple; I can provide a patch for IndexSchema.java: 
 private class SolrIndexAnalyzer extends AnalyzerWrapper {
 
   private class SolrFieldReuseStrategy extends ReuseStrategy {
     /**
      * {@inheritDoc}
      */
     @SuppressWarnings("unchecked")
     public TokenStreamComponents getReusableComponents(String fieldName) {
       // Per-thread map keyed by the field's analyzer, so fields that share
       // an analyzer (i.e. the same field type) share one set of components.
       Map<Analyzer, TokenStreamComponents> componentsPerField =
           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
       return componentsPerField != null ?
           componentsPerField.get(analyzers.get(fieldName)) : null;
     }
     /**
      * {@inheritDoc}
      */
     @SuppressWarnings("unchecked")
     public void setReusableComponents(String fieldName,
         TokenStreamComponents components) {
       Map<Analyzer, TokenStreamComponents> componentsPerField =
           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
       if (componentsPerField == null) {
         componentsPerField = new HashMap<Analyzer, TokenStreamComponents>();
         setStoredValue(componentsPerField);
       }
       componentsPerField.put(analyzers.get(fieldName), components);
     }
   }
 
   // Cache of the analyzer declared for each field in the schema.
   protected final HashMap<String, Analyzer> analyzers;
   /**
    * Implementation of {@link ReuseStrategy} that reuses components per-field
    * by maintaining a Map of TokenStreamComponents per field name.
    */
 
   SolrIndexAnalyzer() {
     super(new SolrFieldReuseStrategy());
     analyzers = analyzerCache();
   }
   protected HashMap<String, Analyzer> analyzerCache() {
     HashMap<String, Analyzer> cache = new HashMap<String, Analyzer>();
     for (SchemaField f : getFields().values()) {
       Analyzer analyzer = f.getType().getAnalyzer();
       cache.put(f.getName(), analyzer);
     }
     return cache;
   }
   @Override
   protected Analyzer getWrappedAnalyzer(String fieldName) {
     Analyzer analyzer = analyzers.get(fieldName);
     return analyzer != null ? analyzer :
         getDynamicFieldType(fieldName).getAnalyzer();
   }
   @Override
   protected TokenStreamComponents wrapComponents(String fieldName,
       TokenStreamComponents components) {
     return components;
   }
 }
 private class SolrQueryAnalyzer extends SolrIndexAnalyzer {

[jira] [Commented] (SOLR-3684) Frequently full gc while do pressure index

2012-08-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429688#comment-13429688
 ] 

Robert Muir commented on SOLR-3684:
---

It does: I think the reuse is not the problem, but the max?

By default I think it keeps min threads always (default 10), but our max of 
10,000 allows it to temporarily spike huge (versus blocking). From looking at 
the jetty code, by default these will die off after 60s, which is fine, but we 
enrolled so many entries into e.g. Analyzer's or SegmentReader's 
CloseableThreadLocals that, when they die off and the CTL does a purge, it's 
just a ton of garbage.

Really there isn't much benefit here in using so many threads at indexing time 
(DWPT's max threads is 8 unless changed in IndexWriterConfig, and raising it 
would have other bad side effects). At query time I think something closer to 
jetty's default of 254 would actually be better too.

But I looked at the history of this file, and it seems the reason it was set to 
10,000 was to prevent a deadlock (SOLR-683)? Is there a better solution to this 
now so that we can reduce this max?

Separately, I've been fixing the analyzers that do hog RAM because machines are 
getting more cores, so I think it's worth it. But I think it would be nice if 
we could fix this max=10,000.
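
For reference, capping the pool in embedded Jetty 8 looks roughly like the 
sketch below (API as in the Jetty 8 the tests run on is assumed; the specific 
numbers are only examples):

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.util.thread.QueuedThreadPool;

// Bound the request pool near Jetty's defaults instead of max=10,000.
public class BoundedJetty {
  public static void main(String[] args) throws Exception {
    QueuedThreadPool pool = new QueuedThreadPool();
    pool.setMinThreads(10);   // Jetty's default minimum
    pool.setMaxThreads(254);  // closer to Jetty's default maximum
    Server server = new Server(8983);
    server.setThreadPool(pool); // must be set before start()
    server.start();
    server.join();
  }
}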

 Frequently full gc while do pressure index
 --

 Key: SOLR-3684
 URL: https://issues.apache.org/jira/browse/SOLR-3684
 Project: Solr
  Issue Type: Improvement
  Components: multicore
Affects Versions: 4.0-ALPHA
 Environment: System: Linux
 Java process: 4G memory
 Jetty: 1000 threads 
 Index: 20 field
 Core: 5
Reporter: Raintung Li
Priority: Critical
  Labels: garbage, performance
 Fix For: 4.0

 Attachments: patch.txt

   Original Estimate: 168h
  Remaining Estimate: 168h

 Recently we tested Solr indexing throughput and performance: we configured 20 
 fields for the test (the field type is the normal text_general), started 1000 
 threads for Jetty, and defined 5 cores.
 After the test had run for some time, the Solr process throughput dropped very 
 quickly. Investigating the root cause, we found the Java process constantly 
 doing full GCs. 
 Checking the heap dump, the main object is StandardTokenizer, which is saved 
 in a CloseableThreadLocal by IndexSchema.SolrIndexAnalyzer.
 Solr uses PerFieldReuseStrategy as the default reuse strategy, which means 
 each field has its own StandardTokenizer if it uses the standard analyzer, and 
 each StandardTokenizer occupies 32KB of memory because of its zzBuffer char 
 array.
 The worst case: Total memory = live threads*cores*fields*32KB
 In the test case, that is 1000*5*20*32KB = 3.2GB for StandardTokenizer, and 
 those objects can only be released when their thread dies.
 Suggestion:
 Each request is handled by one thread, which means one document is only 
 analyzed by one thread. A thread parses the document's fields one by one, so 
 fields of the same type can reuse the same components. When the thread 
 switches to another field of the same type, only the input stream of the 
 reused component needs to be reset, which saves a lot of memory for fields of 
 the same type.
 Total memory will be = live threads*cores*(different field types)*32KB
 The modification is simple; I can provide a patch for IndexSchema.java: 
 private class SolrIndexAnalyzer extends AnalyzerWrapper {
 
   private class SolrFieldReuseStrategy extends ReuseStrategy {
     /**
      * {@inheritDoc}
      */
     @SuppressWarnings("unchecked")
     public TokenStreamComponents getReusableComponents(String fieldName) {
       Map<Analyzer, TokenStreamComponents> componentsPerField =
           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
       return componentsPerField != null ?
           componentsPerField.get(analyzers.get(fieldName)) : null;
     }
     /**
      * {@inheritDoc}
      */
     @SuppressWarnings("unchecked")
     public void setReusableComponents(String fieldName,
         TokenStreamComponents components) {
       Map<Analyzer, TokenStreamComponents> componentsPerField =
           (Map<Analyzer, TokenStreamComponents>) getStoredValue();
       if (componentsPerField == null) {
         componentsPerField = new HashMap<Analyzer, TokenStreamComponents>();
         setStoredValue(componentsPerField);
       }
       componentsPerField.put(analyzers.get(fieldName), components);
     }
   }
 
   protected final HashMap<String, Analyzer> analyzers;
   /**
    * Implementation of {@link ReuseStrategy} that reuses components per-field
    * by maintaining a Map of TokenStreamComponents per field name.

[JENKINS] Lucene-Solr-tests-only-4.x-java7 - Build # 260 - Failure

2012-08-06 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-4.x-java7/260/

1 tests failed.
FAILED:  
junit.framework.TestSuite.org.apache.solr.handler.TestReplicationHandler

Error Message:
ERROR: SolrIndexSearcher opens=76 closes=75

Stack Trace:
java.lang.AssertionError: ERROR: SolrIndexSearcher opens=76 closes=75
at __randomizedtesting.SeedInfo.seed([48D5CDE332603C61]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:216)
at org.apache.solr.SolrTestCaseJ4.afterClass(SolrTestCaseJ4.java:82)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1995)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:754)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
at 
org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)




Build Log:
[...truncated 7236 lines...]
[junit4:junit4] Suite: org.apache.solr.handler.TestReplicationHandler
[junit4:junit4] (@BeforeClass output)
[junit4:junit4]   2 7 T46 oejs.Server.doStart jetty-8.1.2.v20120308
[junit4:junit4]   2 12 T46 oejs.AbstractConnector.doStart Started 
SocketConnector@0.0.0.0:42529
[junit4:junit4]   2 13 T46 oasc.SolrResourceLoader.locateSolrHome JNDI not 
configured for solr (NoInitialContextEx)
[junit4:junit4]   2 14 T46 oasc.SolrResourceLoader.locateSolrHome using system 
property solr.solr.home: 
./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1344309052357/master
[junit4:junit4]   2 15 T46 oasc.SolrResourceLoader.init new 
SolrResourceLoader for deduced Solr Home: 
'./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1344309052357/master/'
[junit4:junit4]   2 49 T46 oass.SolrDispatchFilter.init 
SolrDispatchFilter.init()
[junit4:junit4]   2 50 T46 oasc.SolrResourceLoader.locateSolrHome JNDI not 
configured for solr (NoInitialContextEx)
[junit4:junit4]   2 50 T46 oasc.SolrResourceLoader.locateSolrHome using system 
property solr.solr.home: 
./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1344309052357/master
[junit4:junit4]   2 51 T46 oasc.CoreContainer$Initializer.initialize looking 
for solr.xml: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-4.x-java7/checkout/solr/build/solr-core/test/J1/./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1344309052357/master/solr.xml
[junit4:junit4]   2 52 T46 oasc.CoreContainer.init New CoreContainer 
451485183
[junit4:junit4]   2 52 T46 oasc.CoreContainer$Initializer.initialize no 
solr.xml file found - using default
[junit4:junit4]   2 53 T46 oasc.CoreContainer.load Loading CoreContainer using 
Solr Home: 
'./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1344309052357/master/'
[junit4:junit4]   2 53 T46 oasc.SolrResourceLoader.init new 
SolrResourceLoader for directory: 
'./org.apache.solr.handler.TestReplicationHandler$SolrInstance-1344309052357/master/'
[junit4:junit4]   2 88 T46 oasc.CoreContainer.load Registering Log Listener
[junit4:junit4]   2 129 T46 oasc.CoreContainer.create Creating SolrCore 
'collection1' using 

VOTE: 4.0-BETA

2012-08-06 Thread Robert Muir
Artifacts here:
http://people.apache.org/~rmuir/staging_area/lucene-solr-4.0bRC0-rev1370099/

The list of changes since 4.0-ALPHA is pretty large: lots of important
bugs were fixed.

This passes the smoketester (if you use it, you must use python3 now),
so here is my +1. I think we should get it out and iterate towards the
final release.

P.S.: I will clean up JIRA etc. as discussed before, so I don't ruin
Hossman's day. If we need to respin, we can just move the additional
issues into the CHANGES/JIRA section and then respin.

-- 
lucidimagination.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-3717) DirectoryFactory.close() is never called

2012-08-06 Thread Hoss Man (JIRA)
Hoss Man created SOLR-3717:
--

 Summary: DirectoryFactory.close() is never called
 Key: SOLR-3717
 URL: https://issues.apache.org/jira/browse/SOLR-3717
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
 Fix For: 5.0, 4.0


While working on SOLR-3699 I noticed that DirectoryFactory implements Closeable
(and thus has a close() method) but, unless I'm missing something, it never gets
closed.

I suspect the code that used to close() the DirectoryFactory got refactored into
oblivion when SolrCoreState was introduced, and reloading a SolrCore started
reusing the same DirectoryFactory.

It seems like either DirectoryFactory should no longer have a close() method, or
something at the CoreContainer level should ensure that all DirectoryFactories
are closed when shutting down. A rough sketch of the latter option follows below.
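
For illustration only, here is a minimal sketch of what the CoreContainer-level
option could look like. This is hypothetical: the shutdown() hook and the
directoryFactories collection are illustrative names, not the actual Solr API.

// Hypothetical sketch, not the real CoreContainer code: close every
// DirectoryFactory when the container shuts down.
public void shutdown() {
  for (DirectoryFactory factory : directoryFactories) {
    try {
      factory.close(); // lets the factory release any Directory instances it still holds
    } catch (IOException e) {
      log.error("Error closing DirectoryFactory on shutdown", e);
    }
  }
}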

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3699) SolrIndexWriter constructor leaks Directory if Exception creating IndexWriterConfig

2012-08-06 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-3699:
---

Attachment: SOLR-3699.patch

Figured out the problem in my last patch: I was ignorant of the full
DirectoryFactory API and didn't realize I should be calling doneWithDirectory().

I think this new patch is good to go, but I don't want to commit without review
from someone who understands the DirectoryFactory semantics better (I already
opened SOLR-3717 because something looks wonky about the API, and I don't want
to mess up and just fix a symptom here instead of the real problem).
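
For readers following along, the shape of the fix is roughly the following. This
is a hand-written sketch of the pattern, assuming the 4.x DirectoryFactory
methods get()/doneWithDirectory()/release(); it is not the attached patch, and
buildConfig() is a hypothetical helper standing in for the config creation:

// Sketch: release the Directory if building the IndexWriter fails.
Directory dir = directoryFactory.get(path, lockType);
try {
  IndexWriterConfig iwc = buildConfig(schema); // the step that may throw
  this.writer = new IndexWriter(dir, iwc);
} catch (Exception e) {
  directoryFactory.doneWithDirectory(dir); // tell the factory we are done with it
  directoryFactory.release(dir);           // drop our reference so it can be closed
  throw new RuntimeException(e);
}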

 SolrIndexWriter constructor leaks Directory if Exception creating 
 IndexWriterConfig
 ---

 Key: SOLR-3699
 URL: https://issues.apache.org/jira/browse/SOLR-3699
 Project: Solr
  Issue Type: Bug
Reporter: Robert Muir
 Fix For: 4.0

 Attachments: SOLR-3699.patch, SOLR-3699.patch, SOLR-3699.patch


 In LUCENE-4278 I had to add a hack to force SimpleFSDir for
 CoreContainerCoreInitFailuresTest, because it doesn't close its Directory on
 certain errors.
 This might indicate that leaks can occur when certain errors happen
 (e.g. are not handled in a finally block).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-3699) SolrIndexWriter constructor leaks Directory if Exception creating IndexWriterConfig

2012-08-06 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man reassigned SOLR-3699:
--

Assignee: Mark Miller

Mark: can you sanity check this patch for me?

 SolrIndexWriter constructor leaks Directory if Exception creating 
 IndexWriterConfig
 ---

 Key: SOLR-3699
 URL: https://issues.apache.org/jira/browse/SOLR-3699
 Project: Solr
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Mark Miller
 Fix For: 4.0

 Attachments: SOLR-3699.patch, SOLR-3699.patch, SOLR-3699.patch


 In LUCENE-4278 I had to add a hack to force SimpleFSDir for
 CoreContainerCoreInitFailuresTest, because it doesn't close its Directory on
 certain errors.
 This might indicate that leaks can occur when certain errors happen
 (e.g. are not handled in a finally block).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Stemming Indonesian in Lucene

2012-08-06 Thread Robert Muir
Hello,

Have you looked at
http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/id/IndonesianStemmer.java
?

This uses a different algorithm, but maybe it gives you some ideas:
http://www.illc.uva.nl/Publications/ResearchReports/MoL-2003-02.text.pdf
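
If you just want stemmed Indonesian tokens (rather than the Nazief and Adriani
algorithm specifically), here is a minimal usage sketch of the existing
analyzer, assuming Lucene 4.x on the classpath; the field name and input text
are illustrative:

import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.id.IndonesianAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class StemDemo {
  public static void main(String[] args) throws Exception {
    IndonesianAnalyzer analyzer = new IndonesianAnalyzer(Version.LUCENE_40);
    TokenStream ts = analyzer.tokenStream("body", new StringReader("pembelajaran"));
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    ts.reset(); // required before the first incrementToken()
    while (ts.incrementToken()) {
      System.out.println(term.toString()); // prints each stemmed term
    }
    ts.end();
    ts.close();
  }
}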

On Sun, Aug 5, 2012 at 11:37 PM, Emiliana Suci emily_elz...@yahoo.com wrote:
 I am interested in implementing Indonesian stemming in Lucene. I see that
 Lucene has no implementation of the Nazief and Adriani algorithm. I am still
 a beginner and am asking for directions on how to implement it.



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Stemming-Indonesian-in-Lucene-tp3999321.html
 Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




-- 
lucidimagination.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4290) basic highlighter that uses postings offsets

2012-08-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13429916#comment-13429916
 ] 

Robert Muir commented on LUCENE-4290:
-

I get some performance improvements here (for non-prox queries) by hacking up
luceneutil to test queries with postingshighlighter+offsets vs
fastvectorhighlighter+vectors.

However, I don't think this will be realistically useful until we have the new
block layout from the pfor branch: prox queries are hurt by the interleaving in
the stream (just as when you use payloads), which is unrelated to highlighting.

I tried to run more experiments like 'wikibig' in luceneutil but I ran out of
disk space.

Once the block layout has landed, let's revisit this: it gives a much smaller
index and faster indexing, and I think it will work well once that is sorted
out.

 basic highlighter that uses postings offsets
 

 Key: LUCENE-4290
 URL: https://issues.apache.org/jira/browse/LUCENE-4290
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/other
Reporter: Robert Muir
 Attachments: LUCENE-4290.patch


 We added IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS so you can
 efficiently compress character offsets in the postings list, but nothing yet
 makes use of this.
 Here is a simple highlighter that uses them: it doesn't have many tests or
 fancy features, but I think it's OK for the sandbox/ (maybe with a couple
 more tests).
 Additionally, I didn't do any benchmarking.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4216) Token X exceeds length of provided text sized X

2012-08-06 Thread Ibrahim (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ibrahim updated LUCENE-4216:


Attachment: ArabicTokenizer.java
ArabicAnalyzer.java

Greatly appreciated. It worked out without the low-level implementation of
incrementToken().
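
For anyone hitting the same thing: the usual way to avoid writing
incrementToken() by hand is to extend CharTokenizer and override only
isTokenChar(). A rough sketch under that assumption (the class name and
character test are illustrative; this is not the attached file):

import java.io.Reader;
import org.apache.lucene.analysis.util.CharTokenizer;
import org.apache.lucene.util.Version;

// Sketch: keep runs of letters together without implementing
// incrementToken() directly; CharTokenizer does the buffering.
public final class SimpleArabicTokenizer extends CharTokenizer {
  public SimpleArabicTokenizer(Version matchVersion, Reader in) {
    super(matchVersion, in);
  }

  @Override
  protected boolean isTokenChar(int c) {
    return Character.isLetter(c); // Arabic letters count as letters here
  }
}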

 Token X exceeds length of provided text sized X
 ---

 Key: LUCENE-4216
 URL: https://issues.apache.org/jira/browse/LUCENE-4216
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 4.0-ALPHA
 Environment: Windows 7, jdk1.6.0_27
Reporter: Ibrahim
 Attachments: ArabicAnalyzer.java, ArabicTokenizer.java, 
 ArabicTokenizer.java, myApp.zip


 I'm facing this exception:
 org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token رأيكم exceeds length of provided text sized 170
   at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:233)
   at classes.myApp$16$1.run(myApp.java:1508)
 I tried to find anything wrong in my code while migrating from Lucene 3.6 to
 4.0, without success. I found similar issues with HTMLStripCharFilter, e.g.
 LUCENE-3690 and LUCENE-2208, but not with SimpleHTMLFormatter, so I'm raising
 this here to see whether there is really a bug or something wrong in my code
 with v4. The code that I'm using:
 final Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter("<font color=red>", "</font>"), new QueryScorer(query));
 ...
 final TokenStream tokenStream = TokenSources.getAnyTokenStream(defaultSearcher.getIndexReader(), j, Line, analyzer);
 final TextFragment[] frag = highlighter.getBestTextFragments(tokenStream, doc.get(Line), false, 10);
 Please note that this is working fine with v3.6.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org