date:20130619


[ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687883#comment-13687883
 ] 

Robert Muir commented on LUCENE-4583:
-

good god no.

DocValues are not stored fields... 

This reinforces the value of the limit!

 StraightBytesDocValuesField fails if bytes  32k
 

 Key: LUCENE-4583
 URL: https://issues.apache.org/jira/browse/LUCENE-4583
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Priority: Critical
 Fix For: 4.4

 Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch, LUCENE-4583.patch


 I didn't observe any limitations on the size of a bytes based DocValues field 
 value in the docs.  It appears that the limit is 32k, although I didn't get 
 any friendly error telling me that was the limit.  32k is kind of small IMO; 
 I suspect this limit is unintended and as such is a bug.The following 
 test fails:
 {code:java}
   public void testBigDocValue() throws IOException {
 Directory dir = newDirectory();
 IndexWriter writer = new IndexWriter(dir, writerConfig(false));
 Document doc = new Document();
 BytesRef bytes = new BytesRef((4+4)*4097);//4096 works
 bytes.length = bytes.bytes.length;//byte data doesn't matter
 doc.add(new StraightBytesDocValuesField(dvField, bytes));
 writer.addDocument(doc);
 writer.commit();
 writer.close();
 DirectoryReader reader = DirectoryReader.open(dir);
 DocValues docValues = MultiDocValues.getDocValues(reader, dvField);
 //FAILS IF BYTES IS BIG!
 docValues.getSource().getBytes(0, bytes);
 reader.close();
 dir.close();
   }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes 32k

2013-06-19 Thread selckin (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687894#comment-13687894
 ] 

selckin commented on LUCENE-4583:
-

Ok, from the talks i watched on them  other info gathered it seemed like it 
would be a good fit, guess i really missed the point somewhere, can't find much 
info in the javadocs either, but guess this is for the user list and i 
shouldn't pollute this issue

 StraightBytesDocValuesField fails if bytes  32k
 

 Key: LUCENE-4583
 URL: https://issues.apache.org/jira/browse/LUCENE-4583
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Priority: Critical
 Fix For: 4.4

 Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch, LUCENE-4583.patch


 I didn't observe any limitations on the size of a bytes based DocValues field 
 value in the docs.  It appears that the limit is 32k, although I didn't get 
 any friendly error telling me that was the limit.  32k is kind of small IMO; 
 I suspect this limit is unintended and as such is a bug.The following 
 test fails:
 {code:java}
   public void testBigDocValue() throws IOException {
 Directory dir = newDirectory();
 IndexWriter writer = new IndexWriter(dir, writerConfig(false));
 Document doc = new Document();
 BytesRef bytes = new BytesRef((4+4)*4097);//4096 works
 bytes.length = bytes.bytes.length;//byte data doesn't matter
 doc.add(new StraightBytesDocValuesField(dvField, bytes));
 writer.addDocument(doc);
 writer.commit();
 writer.close();
 DirectoryReader reader = DirectoryReader.open(dir);
 DocValues docValues = MultiDocValues.getDocValues(reader, dvField);
 //FAILS IF BYTES IS BIG!
 docValues.getSource().getBytes(0, bytes);
 reader.close();
 dir.close();
   }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Resolved] (LUCENE-4583) StraightBytesDocValuesField fails if bytes 32k


 [ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-4583.
-

Resolution: Not A Problem

 StraightBytesDocValuesField fails if bytes  32k
 

 Key: LUCENE-4583
 URL: https://issues.apache.org/jira/browse/LUCENE-4583
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Priority: Critical
 Fix For: 4.4

 Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch, LUCENE-4583.patch


 I didn't observe any limitations on the size of a bytes based DocValues field 
 value in the docs.  It appears that the limit is 32k, although I didn't get 
 any friendly error telling me that was the limit.  32k is kind of small IMO; 
 I suspect this limit is unintended and as such is a bug.The following 
 test fails:
 {code:java}
   public void testBigDocValue() throws IOException {
 Directory dir = newDirectory();
 IndexWriter writer = new IndexWriter(dir, writerConfig(false));
 Document doc = new Document();
 BytesRef bytes = new BytesRef((4+4)*4097);//4096 works
 bytes.length = bytes.bytes.length;//byte data doesn't matter
 doc.add(new StraightBytesDocValuesField(dvField, bytes));
 writer.addDocument(doc);
 writer.commit();
 writer.close();
 DirectoryReader reader = DirectoryReader.open(dir);
 DocValues docValues = MultiDocValues.getDocValues(reader, dvField);
 //FAILS IF BYTES IS BIG!
 docValues.getSource().getBytes(0, bytes);
 reader.close();
 dir.close();
   }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-5030) FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters

2013-06-19 Thread Artem Lukanin (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Lukanin updated LUCENE-5030:
--

Attachment: nonlatin_fuzzySuggester.patch

now tests in FuzzySuggesterTest and AnalyzingSuggesterTest pass, except for 
AnalyzingSuggesterTest.testRandom (when preserveSep = true).

If I enable VERBOSE, I see, that suggestions are correct. I guess, there is a 
bug in the test, but I cannot find it.

Can you please review?

 FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work 
 correctly for 1-byte (like English) and multi-byte (non-Latin) letters
 

 Key: LUCENE-5030
 URL: https://issues.apache.org/jira/browse/LUCENE-5030
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.3
Reporter: Artem Lukanin
 Attachments: nonlatin_fuzzySuggester1.patch, 
 nonlatin_fuzzySuggester2.patch, nonlatin_fuzzySuggester3.patch, 
 nonlatin_fuzzySuggester4.patch, nonlatin_fuzzySuggester.patch, 
 nonlatin_fuzzySuggester.patch


 There is a limitation in the current FuzzySuggester implementation: it 
 computes edits in UTF-8 space instead of Unicode character (code point) 
 space. 
 This should be fixable: we'd need to fix TokenStreamToAutomaton to work in 
 Unicode character space, then fix FuzzySuggester to do the same steps that 
 FuzzyQuery does: do the LevN expansion in Unicode character space, then 
 convert that automaton to UTF-8, then intersect with the suggest FST.
 See the discussion here: 
 http://lucene.472066.n3.nabble.com/minFuzzyLength-in-FuzzySuggester-behaves-differently-for-English-and-Russian-td4067018.html#none

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5030) FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters


[ 
https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687902#comment-13687902
 ] 

Robert Muir commented on LUCENE-5030:
-

I dont think changing SEP_LABEL from a single byte to 4 bytes is necessarily a 
good idea.

I think benchmarks (size and speed) should be run on this change before we jump 
into it, I'm also concerned about the determinization and shit being in the 
middle of an autosuggest request... this seems like it would be way way too 
slow.

 FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work 
 correctly for 1-byte (like English) and multi-byte (non-Latin) letters
 

 Key: LUCENE-5030
 URL: https://issues.apache.org/jira/browse/LUCENE-5030
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.3
Reporter: Artem Lukanin
 Attachments: nonlatin_fuzzySuggester1.patch, 
 nonlatin_fuzzySuggester2.patch, nonlatin_fuzzySuggester3.patch, 
 nonlatin_fuzzySuggester4.patch, nonlatin_fuzzySuggester.patch, 
 nonlatin_fuzzySuggester.patch


 There is a limitation in the current FuzzySuggester implementation: it 
 computes edits in UTF-8 space instead of Unicode character (code point) 
 space. 
 This should be fixable: we'd need to fix TokenStreamToAutomaton to work in 
 Unicode character space, then fix FuzzySuggester to do the same steps that 
 FuzzyQuery does: do the LevN expansion in Unicode character space, then 
 convert that automaton to UTF-8, then intersect with the suggest FST.
 See the discussion here: 
 http://lucene.472066.n3.nabble.com/minFuzzyLength-in-FuzzySuggester-behaves-differently-for-English-and-Russian-td4067018.html#none

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Estimating Solr memory requirements

2013-06-19 Thread Erick Erickson

OK, I seem to have stalled on this. Over part of the winter, I put
together a Swing-based program to help estimate Solr/Lucene memory
requirements, with all the usual caveats see:
https://github.com/ErickErickson/SolrMemoryEsitmator.

I have notes to myself that it's still deficient in several areas:
FieldValueCache estimates
tlog requirements
Memory required to re-open a searcher
Position and term vector memory requirements
And whatever I haven't thought about yet.

Of course it builds on Grant's spreadsheet (reads steals from it
shamelessly!) I'm hoping to have a friendlier interface. And _of
course_ I'd be willing to donate it to Solr as a util/contrib/whatever
if it fits.

So, what I'm about here is a few things:

 Anyone who wants to try it feel free. The build instructions are at the 
 above, but the short form is to clone it, ant jar and java -jar 
 dist/estimator.jar. Enter some field info and hit the Add/Save button then 
 hit the Dump calcs button to see what it does currently.

It also saves the estimates away in a file and shows all the steps it
goes through to perform the calculations. It'll also make rudimentary
field definitions from the entered data. You can come back to it later
and add to what you've already done.

 Make any improvements you see fit, particular to flesh out the deficiencies 
 listed above.

 Anyone who has, you know, graphic design/Swing skills please feel free to 
 make it better. I'm a newbie as far as using Swing is concerned, and the way 
 I align buttons and checkboxes is pretty hacky. But it works

 Any suggestions anyone wants to make. Suggestions in code are nicest of 
 course, but algorithms for calculating, say, position and tv memory usage 
 would be great as well! Isolated code snippets that I could incorporate would 
 be great too.

 Any info where I've gotten the calculations wrong or don't show enough info 
 to actually figure out whether they're correct or not.

Note that the goal for this is to give a rough idea of memory
requirements and be easy to use. The spreadsheet is a bit daunting to
someone who knows nothing about Solr so this might be an easier way to
get into it.

Thanks,
Erick

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Resolved] (SOLR-4939) Not able to import oracle DB on RedHat

2013-06-19 Thread Erick Erickson (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-4939.
--

Resolution: Invalid

Please raise this issue on the user's list first to determine whether it's a 
bona-fide bug, I suspect a configuration error. If it is really a bug, we can 
re-open this.

 Not able to import oracle DB on RedHat
 --

 Key: SOLR-4939
 URL: https://issues.apache.org/jira/browse/SOLR-4939
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.3.1
 Environment: Redhat Linux
Reporter: Subhash Karemore

 I have configured my RedHat system for Solr. After that I started the solr, 
 it is started properly. I have to import the Oracle DB for indexing. My data 
 config file is.
 dataConfig
   dataSource type=JdbcDataSource 
 driver=oracle.jdbc.driver.OracleDriver 
 url=jdbc:oracle:thin:@//hostname:2126/DBNAme user=user 
 password=Passwd batchSize=1 /
   document
   entity name=table1 query=SELECT ID, col2, col3 FROM table1 
 WHERE rownum BETWEEN 1 AND 1000 
   field column=ID name=id /
   field column=col2 name=col2 /
   field column=col3 name=col3 /
   /entity
   /document
 /dataConfig
 I have done similar changes for schema.xml file.
 I have copied the solr-dataimporthandler-4.3.0.jar, 
 solr-dataimporthandler-extras-4.3.0.jar, solr-solrj-4.3.0.jar from dist 
 folder to ../lib folder. Also I have downloaded ojdbc6.jar and put in same 
 folder.
 With this setting, it is working properly on Windows. However on RedHat, it 
 is not working. It is giving me errors when I try to index DB.
 Below are the errors which I got on console.
 ERROR org.apache.solr.handler.dataimport.DocBuilder  â Exception while 
 processing: table1 document : 
 SolrInputDocument[]:org.apache.solr.handler.dataimport.DataImportHandlerException:
  Unable to execute query: SELECT ID, col2, col3 FROM table1 WHERE rownum 
 BETWEEN 1 AND 1000 Processing Document # 1
 at 
 org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:253)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38)
 at 
 org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
 at 
 org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
 at 
 org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227)
 at 
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
 at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
 at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)
 Caused by: java.sql.SQLRecoverableException: IO Error: The Network Adapter 
 could not establish the connection
 at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:458)
 at 
 oracle.jdbc.driver.PhysicalConnection.init(PhysicalConnection.java:546)
 at oracle.jdbc.driver.T4CConnection.init(T4CConnection.java:236)
 at 
 oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
 at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:521)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:161)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:127)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:366)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.access$200(JdbcDataSource.java:38)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:240)
 ... 12 more
 Caused by: oracle.net.ns.NetException: The Network Adapter could not 
 establish the connection
 at oracle.net.nt.ConnStrategy.execute(ConnStrategy.java:392)
 at 
 oracle.net.resolver.AddrResolution.resolveAndExecute(AddrResolution.java:434)
 at oracle.net.ns.NSProtocol.establishConnection(NSProtocol.java:687)
 at

[jira] [Commented] (LUCENE-5030) FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters

2013-06-19 Thread Artem Lukanin (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687917#comment-13687917
 ] 

Artem Lukanin commented on LUCENE-5030:
---

Possibly we should change it to INFO_SEP2 (U+001E) as Michael suggested for 
TokenStreamToAutomaton?
Do you like 0x10 and 0x10fffe separators in TokenStreamToAutomaton? Won't 
they slow down the process?
I guess, Michael is the man, who runs benchmarks regularly? I don't know, how 
to do it...

 FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work 
 correctly for 1-byte (like English) and multi-byte (non-Latin) letters
 

 Key: LUCENE-5030
 URL: https://issues.apache.org/jira/browse/LUCENE-5030
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.3
Reporter: Artem Lukanin
 Attachments: nonlatin_fuzzySuggester1.patch, 
 nonlatin_fuzzySuggester2.patch, nonlatin_fuzzySuggester3.patch, 
 nonlatin_fuzzySuggester4.patch, nonlatin_fuzzySuggester.patch, 
 nonlatin_fuzzySuggester.patch


 There is a limitation in the current FuzzySuggester implementation: it 
 computes edits in UTF-8 space instead of Unicode character (code point) 
 space. 
 This should be fixable: we'd need to fix TokenStreamToAutomaton to work in 
 Unicode character space, then fix FuzzySuggester to do the same steps that 
 FuzzyQuery does: do the LevN expansion in Unicode character space, then 
 convert that automaton to UTF-8, then intersect with the suggest FST.
 See the discussion here: 
 http://lucene.472066.n3.nabble.com/minFuzzyLength-in-FuzzySuggester-behaves-differently-for-English-and-Russian-td4067018.html#none

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4939) Not able to import oracle DB on RedHat


[ 
https://issues.apache.org/jira/browse/SOLR-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687924#comment-13687924
 ] 

Uwe Schindler commented on SOLR-4939:
-

Check your firewall! I think your server may not have TCP access to the 
database server.

 Not able to import oracle DB on RedHat
 --

 Key: SOLR-4939
 URL: https://issues.apache.org/jira/browse/SOLR-4939
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.3.1
 Environment: Redhat Linux
Reporter: Subhash Karemore

 I have configured my RedHat system for Solr. After that I started the solr, 
 it is started properly. I have to import the Oracle DB for indexing. My data 
 config file is.
 dataConfig
   dataSource type=JdbcDataSource 
 driver=oracle.jdbc.driver.OracleDriver 
 url=jdbc:oracle:thin:@//hostname:2126/DBNAme user=user 
 password=Passwd batchSize=1 /
   document
   entity name=table1 query=SELECT ID, col2, col3 FROM table1 
 WHERE rownum BETWEEN 1 AND 1000 
   field column=ID name=id /
   field column=col2 name=col2 /
   field column=col3 name=col3 /
   /entity
   /document
 /dataConfig
 I have done similar changes for schema.xml file.
 I have copied the solr-dataimporthandler-4.3.0.jar, 
 solr-dataimporthandler-extras-4.3.0.jar, solr-solrj-4.3.0.jar from dist 
 folder to ../lib folder. Also I have downloaded ojdbc6.jar and put in same 
 folder.
 With this setting, it is working properly on Windows. However on RedHat, it 
 is not working. It is giving me errors when I try to index DB.
 Below are the errors which I got on console.
 ERROR org.apache.solr.handler.dataimport.DocBuilder  â Exception while 
 processing: table1 document : 
 SolrInputDocument[]:org.apache.solr.handler.dataimport.DataImportHandlerException:
  Unable to execute query: SELECT ID, col2, col3 FROM table1 WHERE rownum 
 BETWEEN 1 AND 1000 Processing Document # 1
 at 
 org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:253)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38)
 at 
 org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
 at 
 org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
 at 
 org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227)
 at 
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
 at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
 at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)
 Caused by: java.sql.SQLRecoverableException: IO Error: The Network Adapter 
 could not establish the connection
 at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:458)
 at 
 oracle.jdbc.driver.PhysicalConnection.init(PhysicalConnection.java:546)
 at oracle.jdbc.driver.T4CConnection.init(T4CConnection.java:236)
 at 
 oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
 at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:521)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:161)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:127)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:366)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.access$200(JdbcDataSource.java:38)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:240)
 ... 12 more
 Caused by: oracle.net.ns.NetException: The Network Adapter could not 
 establish the connection
 at oracle.net.nt.ConnStrategy.execute(ConnStrategy.java:392)
 at 
 oracle.net.resolver.AddrResolution.resolveAndExecute(AddrResolution.java:434)
 at oracle.net.ns.NSProtocol.establishConnection(NSProtocol.java:687)
 at oracle.net.ns.NSProtocol.connect(NSProtocol.java:247)

[jira] [Created] (SOLR-4940) Cluster crashed for : queries with large page number (OOM)

2013-06-19 Thread Bjoern Ebers (JIRA)

Bjoern Ebers created SOLR-4940:
--

 Summary: Cluster crashed for *:* queries with large page number 
(OOM)
 Key: SOLR-4940
 URL: https://issues.apache.org/jira/browse/SOLR-4940
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0
 Environment: One collection is sharded by 8 high mem machines.
Each shard has one replica (additional 8 machines).
The Solr instances are started with -Xmx16384m -Xms4096m.
The index contains around 230-240 million documents.
All Solr instances are connected to a ZooKeeper ensemble with 5 instances.
Reporter: Bjoern Ebers
Priority: Critical


executing the query on the large index: q=*:*page=1000max=1000
this cause to an OOM and crashed the whole cluster!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4940) Cluster crashed for : queries with large page number (OOM)


[ 
https://issues.apache.org/jira/browse/SOLR-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687938#comment-13687938
 ] 

Uwe Schindler commented on SOLR-4940:
-

see SOLR-1726

The main issue is: full-text search engine are only good in returning 
top-ranking results. If you increase the window of top-ranking results the 
underlying algortithms, which are optimized to do the find top-n fast, will 
need lots of memeory and get slow.

 Cluster crashed for *:* queries with large page number (OOM)
 

 Key: SOLR-4940
 URL: https://issues.apache.org/jira/browse/SOLR-4940
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0
 Environment: One collection is sharded by 8 high mem machines.
 Each shard has one replica (additional 8 machines).
 The Solr instances are started with -Xmx16384m -Xms4096m.
 The index contains around 230-240 million documents.
 All Solr instances are connected to a ZooKeeper ensemble with 5 instances.
Reporter: Bjoern Ebers
Priority: Critical

 executing the query on the large index: q=*:*page=1000max=1000
 this cause to an OOM and crashed the whole cluster!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Looking for community guidance on SOLR-4872

2013-06-19 Thread Benson Margulies

I write to seek guidance from the dev community on SOLR-4872.

This JIRA concerns lifecycle management for Solr schema components:
tokenizers, token filters, and char filters.

If you read the comments, you'll find three opinions from committers. What
follows are précis: read the JIRA to get the details.

Hoss is in favor of having close methods on these components and arranging
to have them called when a schema is torn down. Hoss is opposed to allowing
these objects to be SolrCoreAware.

Yonik is opposed to having such close methods and prefers SolrCoreAware, or
something like it, or letting component implementors use finalizers.

Rob Muir thinks that there should be a fix to the related LUCENE-2145,
which I see as complementary to this.

So, here I am. I'm not a committer. I'm a builder of Solr plugins, and,
from that standpoint, I think that there should be a lifecycle somehow,
because I try to apply a general principle of avoiding finalizers, and
because in some cases their unpredictable schedule can be a practical
problem.

Is there a committer in this community who is willing to work with me on
this? As things are, I can't see how to proceed, since I'm suspended
between two committers with apparently opposed views.

I have already implemented what I think of as the hard part, and, indeed,
the foundation of either approach. I have a close lifecycle that extends
down to the IndexSchema object and the TokenizerChain. So it remains to
decide whether that should in turn call ordinary close methods on the
tokenizers, token filters, and char filters, or rather look for some
optional lifecycle interface.

List your chair on https://lucene.apache.org/whoweare.html?

2013-06-19 Thread Benson Margulies

A small suggestion: identify the VP on the list of PMC and committers.

[lucene 4.3.1] solr webapp is put to null directory on maven build

Hello,

executing 'package' on Apache Solr Search Server pom
(maven-build/solr/webapp/pom.xml) puts the webapp into a null sub-directory.

Apache Maven 3.0.4
OS: Ubuntu 12.04 LTS

Thanks,

Dmitry Kan

Re: [lucene 4.3.1] solr webapp is put to null directory on maven build

also: ${build-directory} is not set anywhere in the project.


On 19 June 2013 16:23, Dmitry Kan dmitry.luc...@gmail.com wrote:

 Hello,

 executing 'package' on Apache Solr Search Server pom
 (maven-build/solr/webapp/pom.xml) puts the webapp into a null sub-directory.

 Apache Maven 3.0.4
 OS: Ubuntu 12.04 LTS

 Thanks,

 Dmitry Kan

[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.7.0_21) - Build # 6138 - Still Failing!

2013-06-19 Thread Policeman Jenkins Server

Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/6138/
Java: 32bit/jdk1.7.0_21 -server -XX:+UseParallelGC

1 tests failed.
REGRESSION:  org.apache.lucene.index.TestFieldsReader.testExceptions

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
at 
__randomizedtesting.SeedInfo.seed([A3AC19F388354DBF:D5AD4B5B20483309]:0)
at org.apache.lucene.util.BytesRef.copyBytes(BytesRef.java:196)
at org.apache.lucene.util.BytesRef.deepCopyOf(BytesRef.java:343)
at 
org.apache.lucene.codecs.lucene3x.TermBuffer.toTerm(TermBuffer.java:113)
at 
org.apache.lucene.codecs.lucene3x.SegmentTermEnum.term(SegmentTermEnum.java:184)
at 
org.apache.lucene.codecs.lucene3x.Lucene3xFields$PreTermsEnum.next(Lucene3xFields.java:863)
at 
org.apache.lucene.index.MultiTermsEnum.pushTop(MultiTermsEnum.java:292)
at org.apache.lucene.index.MultiTermsEnum.next(MultiTermsEnum.java:318)
at org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:103)
at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
at 
org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:365)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:98)
at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3767)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3371)
at 
org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:40)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1887)
at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1697)
at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650)
at 
org.apache.lucene.index.TestFieldsReader.testExceptions(TestFieldsReader.java:204)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)




Build Log:
[...truncated 355 lines...]
[junit4:junit4] Suite: org.apache.lucene.index.TestFieldsReader
[junit4:junit4]   2 NOTE: reproduce with: ant test  
-Dtestcase=TestFieldsReader -Dtests.method=testExceptions 
-Dtests.seed=A3AC19F388354DBF -Dtests.multiplier=3 -Dtests.slow=true 
-Dtests.locale=mt_MT -Dtests.timezone=Europe/Samara 
-Dtests.file.encoding=ISO-8859-1
[junit4:junit4] ERROR   1.49s J0 | TestFieldsReader.testExceptions 
[junit4:junit4] Throwable #1: java.lang.OutOfMemoryError: Java heap space
[junit4:junit4]at 
__randomizedtesting.SeedInfo.seed([A3AC19F388354DBF:D5AD4B5B20483309]:0)
[junit4:junit4]at 
org.apache.lucene.util.BytesRef.copyBytes(BytesRef.java:196)
[junit4:junit4]at 
org.apache.lucene.util.BytesRef.deepCopyOf(BytesRef.java:343)
[junit4:junit4]at 
org.apache.lucene.codecs.lucene3x.TermBuffer.toTerm(TermBuffer.java:113)
[junit4:junit4]at 
org.apache.lucene.codecs.lucene3x.SegmentTermEnum.term(SegmentTermEnum.java:184)
[junit4:junit4]at 
org.apache.lucene.codecs.lucene3x.Lucene3xFields$PreTermsEnum.next(Lucene3xFields.java:863)
[junit4:junit4]at 
org.apache.lucene.index.MultiTermsEnum.pushTop(MultiTermsEnum.java:292)
[junit4:junit4]at 
org.apache.lucene.index.MultiTermsEnum.next(MultiTermsEnum.java:318)
[junit4:junit4]at 
org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:103)
[junit4:junit4]at 
org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
[junit4:junit4]at 
org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:365)
[junit4:junit4]at 
org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:98)
[junit4:junit4]

Re: Reestablishing a Solr node that ran on a completely crashed machine

2013-06-19 Thread Mark Miller


On Jun 19, 2013, at 2:20 AM, Per Steffensen st...@designware.dk wrote:

 On 6/18/13 2:15 PM, Mark Miller wrote:
 I don't know what the best method to use now is, but the slightly longer 
 term plan is to:
 
 * Have a new mode where you cannot preconfigure cores, only use the 
 collection's API.
 * ZK becomes the cluster state truth.
 * The Overseer takes actions to ensure cores live/die in different places 
 based on the truth in ZK.
 Not that we have to decide on this now, but I guess in my scenario I do not 
 see why the Overseer should be involved. The replica is already assigned to 
 run on the replaced machine with a specific IP/hostname (actually a 
 specific Solr node-name), so I guess that the Solr node itself on this 
 new/replaced machine should just go look in ZK when it starts up and realize 
 that it ought to run this and that replica and start loading them itself. I 
 recognize that the Overseer should/could be involved in relocating replica 
 for different reasons - loadbalancing, rack-awareness etc. But in cases where 
 a replica is already assigned to a certain node-name according to ZK state, 
 but the node is not preconfigured (in solr.xml) to run this replica, the node 
 itself should just realize that it ought to run it anyway and load it. But it 
 probably have to be thought through well. Just my immediate thoughts.

Specific node names have since been essentially deprecated - auto assigned 
generic node names are what we have transitioned to. You should easily be able 
to host a shard with a machine that has a different address without confusion. 

By and large, the Overseer will be able too assume responsibility for 
assignments (though I'm sure how much it will do will be configurable) at a 
high level. It will be able to do things like look at maxShardsPerNode and 
replicationFactor and periodically follow rules to make adjustments. 

The Overseer being in charge is more a conceptual idea though, not the 
implementation. When a core starts up and checks with ZK and sees the 
collection it belongs to no longer exists or something, it likely to just not 
load rather than wait for an Overseer to spot and it remove it later.

- Mark
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-4921) Support for Adding Documents via the Solr UI

2013-06-19 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-4921:
--

Attachment: SOLR-4921.patch

Patch has the following improvements
# Better Layout
# Result Reporting, including errors
# Various other little fixes

You should be able to submit a variety of document types at this point and see 
the response.

Left to do:
# Icon for Collection drop down
# Wizard implementation
# General cleanup, comments
# File Upload
# Other things I've forgotten

 Support for Adding Documents via the Solr UI
 

 Key: SOLR-4921
 URL: https://issues.apache.org/jira/browse/SOLR-4921
 Project: Solr
  Issue Type: New Feature
  Components: web gui
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 4.4

 Attachments: SOLR-4921.patch, SOLR-4921.patch, SOLR-4921.patch, 
 SOLR-4921.patch, SOLR-4921.patch, SOLR-4921.patch, SOLR-4921.patch


 For demos and prototyping, it would be nice if we could add documents via the 
 admin UI.
 Various things to support:
 1. Uploading XML, JSON, CSV, etc.
 2. Optionally also do file upload

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: [lucene 4.3.1] solr webapp is put to null directory on maven build

After adding:

build-directorytarget/build-directory

the war file is put into the target subdir.


On a side note:

running solr with maven jetty plugin seem to work, which required two
artifacts (couldn't figure out where does jetty store the lib dir in this
mode):

command. mvn jetty:run-war

(configured in the jetty-maven-plugin):

  dependencies
dependency
  groupIdch.qos.logback/groupId
  artifactIdlogback-classic/artifactId
  version1.0.13/version
/dependency
dependency
  groupIdtomcat/groupId
  artifactIdcommons-logging/artifactId
  version4.0.6/version
/dependency
  /dependencies


when starting the webapp, however, solr tries to create a collection1:

17:02:53.108 [coreLoadExecutor-3-thread-1] INFO
 org.apache.solr.core.CoreContainer - Creating SolrCore 'collection1' using
instanceDir: ${top-level}/solr/example/solr/collection1

Apparently, ${top-level} var isn't defined either.




On 19 June 2013 16:25, Dmitry Kan dmitry.luc...@gmail.com wrote:

 also: ${build-directory} is not set anywhere in the project.


 On 19 June 2013 16:23, Dmitry Kan dmitry.luc...@gmail.com wrote:

 Hello,

 executing 'package' on Apache Solr Search Server pom
 (maven-build/solr/webapp/pom.xml) puts the webapp into a null sub-directory.

 Apache Maven 3.0.4
 OS: Ubuntu 12.04 LTS

 Thanks,

 Dmitry Kan

Re: List your chair on https://lucene.apache.org/whoweare.html?

2013-06-19 Thread Yonik Seeley

On Wed, Jun 19, 2013 at 8:56 AM, Benson Margulies bimargul...@gmail.com wrote:
 A small suggestion: identify the VP on the list of PMC and committers.

Why?
To the outside, this might suggest some sort of specialness that
doesn't exist for day to day development activities.
If someone has business with the PMC, they should email the PMC, not
individuals.

-Yonik
http://lucidworks.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: IndexWriter commit user data takes a map

2013-06-19 Thread Steve Rowe

Hi Varun,

LUCENE-4575 did not change IW's user data to a Map.  That was done in 
LUCENE-1654.

Steve

On Jun 19, 2013, at 6:57 AM, Varun Thacker varunthacker1...@gmail.com wrote:

 I was just curious as to why IW.setCommitData uses a map ?
 
 Looking back at LUCENE-1382 when committing user data was introduced it took 
 a string. 
 
 In LUCENE-4575 it was refactored and changed to a Map. From the comments I 
 couldn't really figure out why was it changed. 
 
 -- 
 
 
 Regards,
 Varun Thacker
 http://www.vthacker.in/


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: [lucene 4.3.1] solr webapp is put to null directory on maven build

2013-06-19 Thread Steve Rowe

Thanks for reporting, Dmitry, I'll take a look. - Steve

On Jun 19, 2013, at 10:06 AM, Dmitry Kan dmitry.luc...@gmail.com wrote:

 After adding:
 
 build-directorytarget/build-directory
 
 the war file is put into the target subdir.
 
 
 On a side note:
 
 running solr with maven jetty plugin seem to work, which required two 
 artifacts (couldn't figure out where does jetty store the lib dir in this 
 mode):
 
 command. mvn jetty:run-war
 
 (configured in the jetty-maven-plugin):
 
   dependencies
 dependency
   groupIdch.qos.logback/groupId
   artifactIdlogback-classic/artifactId
   version1.0.13/version
 /dependency
 dependency
   groupIdtomcat/groupId
   artifactIdcommons-logging/artifactId
   version4.0.6/version
 /dependency
   /dependencies
 
 
 when starting the webapp, however, solr tries to create a collection1:
 
 17:02:53.108 [coreLoadExecutor-3-thread-1] INFO  
 org.apache.solr.core.CoreContainer - Creating SolrCore 'collection1' using 
 instanceDir: ${top-level}/solr/example/solr/collection1
 
 Apparently, ${top-level} var isn't defined either.
 
 
 
 
 On 19 June 2013 16:25, Dmitry Kan dmitry.luc...@gmail.com wrote:
 also: ${build-directory} is not set anywhere in the project.
 
 
 On 19 June 2013 16:23, Dmitry Kan dmitry.luc...@gmail.com wrote:
 Hello,
 
 executing 'package' on Apache Solr Search Server pom 
 (maven-build/solr/webapp/pom.xml) puts the webapp into a null sub-directory.
 
 Apache Maven 3.0.4
 OS: Ubuntu 12.04 LTS
 
 Thanks,
 
 Dmitry Kan
 
 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: List your chair on https://lucene.apache.org/whoweare.html?

2013-06-19 Thread Simon Willnauer

+1 on not specially marking it. If you really wanna know you can figure it
out via the asf website. I agree with yonik that the PMC should be
contacted!

simon


On Wed, Jun 19, 2013 at 4:13 PM, Yonik Seeley yo...@lucidworks.com wrote:

 On Wed, Jun 19, 2013 at 8:56 AM, Benson Margulies bimargul...@gmail.com
 wrote:
  A small suggestion: identify the VP on the list of PMC and committers.

 Why?
 To the outside, this might suggest some sort of specialness that
 doesn't exist for day to day development activities.
 If someone has business with the PMC, they should email the PMC, not
 individuals.

 -Yonik
 http://lucidworks.com

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org

Re: List your chair on https://lucene.apache.org/whoweare.html?

2013-06-19 Thread Mark Miller


On Jun 19, 2013, at 11:01 AM, Simon Willnauer simon.willna...@gmail.com wrote:

 +1 on not specially marking it. 

+1 - I like the way we currently handle this. 

- Mark


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-4921) Support for Adding Documents via the Solr UI

2013-06-19 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-4921:
--

Attachment: SOLR-4921.patch

Here's a start on file upload.  It kind of works right now if you hit the 
submit button twice (after changing the QT option to /update/extract).  There 
seems to be some oddities with variable bindings for creating the document_url 
based off of the handler path.

 Support for Adding Documents via the Solr UI
 

 Key: SOLR-4921
 URL: https://issues.apache.org/jira/browse/SOLR-4921
 Project: Solr
  Issue Type: New Feature
  Components: web gui
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 4.4

 Attachments: SOLR-4921.patch, SOLR-4921.patch, SOLR-4921.patch, 
 SOLR-4921.patch, SOLR-4921.patch, SOLR-4921.patch, SOLR-4921.patch, 
 SOLR-4921.patch


 For demos and prototyping, it would be nice if we could add documents via the 
 admin UI.
 Various things to support:
 1. Uploading XML, JSON, CSV, etc.
 2. Optionally also do file upload

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

solrj content-length header missing

2013-06-19 Thread Payne, Joe

We are trying to use Nginx to do load balancing and it does not like that the 
content-length header is missing on a POST with an add.../add document.  I 
looked in the code and did not find anything about setting the header. 
(http://svn.apache.org/viewvc/lucene/dev/trunk/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrServer.java?view=markup).
  Are there plans to add the content-length header in future versions?
Joe



This e-mail message, including any attachments, is for the sole use of the 
intended recipient(s) and may contain information that is confidential and 
protected by law from unauthorized disclosure. Any unauthorized review, use, 
disclosure or distribution is prohibited. If you are not the intended 
recipient, please contact the sender by reply e-mail and destroy all copies of 
the original message.

[jira] [Commented] (SOLR-4916) Add support to write and read Solr index files and transaction log files to and from HDFS.


[ 
https://issues.apache.org/jira/browse/SOLR-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688094#comment-13688094
 ] 

Mark Miller commented on SOLR-4916:
---

It doesn't greatly affect other parts of Solr, it's not some big experimental 
change, so I intend to first commit to 5x and see how jenkins likes things and 
then backport to 4.x.

A lot of the core changes for this have slowly gone into 4.x long ago - 
including issues around making custom Directories first class in Solr and other 
little changes.

This builds to run against Apache Hadoop. I don't suspect that will be easily 
'pluggable', but it will be easy enough to change the ivy files to point to 
another Hadoop distro, fix any compile time errors (if there are any), run the 
tests, and build Solr.

Because our dependency is on client code that talks to hdfs, I suspect that it 
will work fine as is with most distros based on the same version of Apache 
Hadoop - and probably other versions as well in many cases.



 Add support to write and read Solr index files and transaction log files to 
 and from HDFS.
 --

 Key: SOLR-4916
 URL: https://issues.apache.org/jira/browse/SOLR-4916
 Project: Solr
  Issue Type: New Feature
Reporter: Mark Miller
Assignee: Mark Miller
 Attachments: SOLR-4916.patch, SOLR-4916.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4934) Prevent runtime failure if users use initargs useCompoundFile setting on LogMergePolicy or TieredMergePolicy

2013-06-19 Thread Shawn Heisey (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688104#comment-13688104
 ] 

Shawn Heisey commented on SOLR-4934:


I was getting ready to file an issue, glad I found this before doing so.  The 
only thing I knew was that LUCENE-5038 had caused Solr to make compound files 
and the useCompoundFile setting under indexConfig that I found in the branch_4x 
example wasn't turning it off.

A connected discussion, for which I can file an issue if necessary: Assuming 
there are plenty of file descriptors available, will a user get better 
performance from compound files or separate files?  Is it dependent on other 
factors like filesystem choice, or is one a clear winner?  The outcome of that 
discussion should decide what Solr's default is when no related config options 
are used.


 Prevent runtime failure if users use initargs useCompoundFile setting on 
 LogMergePolicy or TieredMergePolicy
 --

 Key: SOLR-4934
 URL: https://issues.apache.org/jira/browse/SOLR-4934
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 5.0, 4.4


 * LUCENE-5038 eliminated setUseCompoundFile(boolean) from the built in 
 MergePolicies
 * existing users may have configs that use mergePolicy init args to try and 
 call that setter
 * we already do some explicit checks for these MergePolices in 
 SolrIndexConfig to deal with legacy syntax
 * update the existing logic to remove useCompoundFile from the MergePolicy 
 initArgs for these known policies if found, and log a warning.
 (NOTE: i don't want to arbitrarily remove useCompoundFile from the initArgs 
 regardless of class in case someone has a custom MergePolicy that implements 
 that logic -- that would suck)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes 32k

2013-06-19 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688113#comment-13688113
 ] 

David Smiley commented on LUCENE-4583:
--

Should the closed status and resolution change to not a problem mean that 
[~mikemccand] improvement's in his patch here (that don't change the limit) 
won't get applied?  They looked good to me.  And you?

 StraightBytesDocValuesField fails if bytes  32k
 

 Key: LUCENE-4583
 URL: https://issues.apache.org/jira/browse/LUCENE-4583
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Priority: Critical
 Fix For: 4.4

 Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch, LUCENE-4583.patch


 I didn't observe any limitations on the size of a bytes based DocValues field 
 value in the docs.  It appears that the limit is 32k, although I didn't get 
 any friendly error telling me that was the limit.  32k is kind of small IMO; 
 I suspect this limit is unintended and as such is a bug.The following 
 test fails:
 {code:java}
   public void testBigDocValue() throws IOException {
 Directory dir = newDirectory();
 IndexWriter writer = new IndexWriter(dir, writerConfig(false));
 Document doc = new Document();
 BytesRef bytes = new BytesRef((4+4)*4097);//4096 works
 bytes.length = bytes.bytes.length;//byte data doesn't matter
 doc.add(new StraightBytesDocValuesField(dvField, bytes));
 writer.addDocument(doc);
 writer.commit();
 writer.close();
 DirectoryReader reader = DirectoryReader.open(dir);
 DocValues docValues = MultiDocValues.getDocValues(reader, dvField);
 //FAILS IF BYTES IS BIG!
 docValues.getSource().getBytes(0, bytes);
 reader.close();
 dir.close();
   }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5067) add a BaseDirectoryTestCase


[ 
https://issues.apache.org/jira/browse/LUCENE-5067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688126#comment-13688126
 ] 

Michael McCandless commented on LUCENE-5067:


+1

 add a BaseDirectoryTestCase
 ---

 Key: LUCENE-5067
 URL: https://issues.apache.org/jira/browse/LUCENE-5067
 Project: Lucene - Core
  Issue Type: Test
Reporter: Robert Muir

 Currently most directory code is tested indirectly. But there are still 
 corner cases like LUCENE-5066, NRCachingDirectory.testNoDir, 
 TestRAMDirectory.testSeekToEOFThenBack, that only target specific directories 
 where some user reported the bug. If one of our other directories has these 
 bugs, the best we can hope for is some other lucene test will trip it 
 indirectly and we will find it after lots of debugging...
 Instead we should herd up all these tests into a base class and test every 
 directory explicitly and directly with it (like we do with the codec API).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

RE: solrj content-length header missing

2013-06-19 Thread Uwe Schindler

Hi,

 

POST (or any other request that sends data to HTTP endpoint) always needs the 
length of the body, but there are two options:

-  If you know the length you *may* set it before (this was required in 
HTTP/1.0).

-  HTTP/1.1 added chunked transfer encoding, so the POST data is sent 
as smaller chunks, each with its own length header. This is the preferred way 
to send content, if the size is not known (which is not the case for data sent 
by the solr client library without buffering it completely which has a negative 
impact on response times and memory requirements). Depending on the size of the 
POST data, HttpSolrServer decides internally if it can set content-length (if 
the body is smaller than the buffer size and chunking is not needed) or not. 
This is handled by the underlying HttpClient library ( http://hc.apache.org/ 
http://hc.apache.org/).

 

What is the problem / error message of nginx?

 

Uwe

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

 http://www.thetaphi.de/ http://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: Payne, Joe [mailto:joe.pa...@kroger.com] 
Sent: Wednesday, June 19, 2013 5:53 PM
To: dev@lucene.apache.org
Subject: solrj content-length header missing

 

We are trying to use Nginx to do load balancing and it does not like that the 
content-length header is missing on a POST with an add…/add document.  I 
looked in the code and did not find anything about setting the header. 
(http://svn.apache.org/viewvc/lucene/dev/trunk/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrServer.java?view=markup).
  Are there plans to add the content-length header in future versions?

Joe

 

  _  


This e-mail message, including any attachments, is for the sole use of the 
intended recipient(s) and may contain information that is confidential and 
protected by law from unauthorized disclosure. Any unauthorized review, use, 
disclosure or distribution is prohibited. If you are not the intended 
recipient, please contact the sender by reply e-mail and destroy all copies of 
the original message.

[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-19 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688146#comment-13688146
 ] 

Yonik Seeley commented on SOLR-4926:


I hacked the lucene IWC and MergePolicy classes to never use compound format, 
and then started ChaosMonkeySafeLeaderTest tests in a loop.
11 passes in a row so far, so it definitely looks like these failures are 
related to the compound file format.

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.


[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688150#comment-13688150
 ] 

Uwe Schindler commented on SOLR-4926:
-

How does this test depend on CFS or not?

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Comment Edited] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.


[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688150#comment-13688150
 ] 

Uwe Schindler edited comment on SOLR-4926 at 6/19/13 4:53 PM:
--

How does this test depend on CFS or not? So it looks like replication does not 
work correctly with CFS, which is a serious bug!

  was (Author: thetaphi):
How does this test depend on CFS or not?
  
 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

2013-06-19 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688151#comment-13688151
 ] 

Yonik Seeley commented on SOLR-4926:


bq. How does this test depend on CFS or not?

That's the million dollar question :-)  It does not, explicitly, but it seems 
like the use of CFS somehow causes replication to fail.

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

RE: solrj content-length header missing

2013-06-19 Thread Payne, Joe

This is happening on version 1.2.7 of Nginx.  Newer versions do not produce 
this error, but getting that updated is another battle.  The error message it 
returns is 411: Length Required.

From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Wednesday, June 19, 2013 12:29 PM
To: dev@lucene.apache.org
Subject: RE: solrj content-length header missing

Hi,

POST (or any other request that sends data to HTTP endpoint) always needs the 
length of the body, but there are two options:

-  If you know the length you *may* set it before (this was required in 
HTTP/1.0).

-  HTTP/1.1 added chunked transfer encoding, so the POST data is sent 
as smaller chunks, each with its own length header. This is the preferred way 
to send content, if the size is not known (which is not the case for data sent 
by the solr client library without buffering it completely which has a negative 
impact on response times and memory requirements). Depending on the size of the 
POST data, HttpSolrServer decides internally if it can set content-length (if 
the body is smaller than the buffer size and chunking is not needed) or not. 
This is handled by the underlying HttpClient library (http://hc.apache.org/).

What is the problem / error message of nginx?

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.dehttp://www.thetaphi.de/
eMail: u...@thetaphi.demailto:u...@thetaphi.de

From: Payne, Joe [mailto:joe.pa...@kroger.com]
Sent: Wednesday, June 19, 2013 5:53 PM
To: dev@lucene.apache.orgmailto:dev@lucene.apache.org
Subject: solrj content-length header missing

We are trying to use Nginx to do load balancing and it does not like that the 
content-length header is missing on a POST with an add…/add document.  I 
looked in the code and did not find anything about setting the header. 
(http://svn.apache.org/viewvc/lucene/dev/trunk/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrServer.java?view=markup).
  Are there plans to add the content-length header in future versions?
Joe

This e-mail message, including any attachments, is for the sole use of the 
intended recipient(s) and may contain information that is confidential and 
protected by law from unauthorized disclosure. Any unauthorized review, use, 
disclosure or distribution is prohibited. If you are not the intended 
recipient, please contact the sender by reply e-mail and destroy all copies of 
the original message.

This e-mail message, including any attachments, is for the sole use of the 
intended recipient(s) and may contain information that is confidential and 
protected by law from unauthorized disclosure. Any unauthorized review, use, 
disclosure or distribution is prohibited. If you are not the intended 
recipient, please contact the sender by reply e-mail and destroy all copies of 
the original message.

[jira] [Comment Edited] (SOLR-4916) Add support to write and read Solr index files and transaction log files to and from HDFS.


[ 
https://issues.apache.org/jira/browse/SOLR-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688094#comment-13688094
 ] 

Mark Miller edited comment on SOLR-4916 at 6/19/13 4:59 PM:


It doesn't greatly affect other parts of Solr, it's not some big experimental 
change, so I intend to first commit to 5x and see how jenkins likes things and 
then backport to 4.x.

A lot of the core changes for this have slowly gone into 4.x long ago - 
including issues around making custom Directories first class in Solr and other 
little changes.

This builds to run against Apache Hadoop 2.0.5-alpha. I don't suspect that will 
be easily 'pluggable', but it will be easy enough to change the ivy files to 
point to another Hadoop distro, fix any compile time errors (if there are any), 
run the tests, and build Solr.

Because our dependency is on client code that talks to hdfs, I suspect that it 
will work fine as is with most distros based on the same version of Apache 
Hadoop - and probably other versions as well in many cases.


  was (Author: markrmil...@gmail.com):
It doesn't greatly affect other parts of Solr, it's not some big 
experimental change, so I intend to first commit to 5x and see how jenkins 
likes things and then backport to 4.x.

A lot of the core changes for this have slowly gone into 4.x long ago - 
including issues around making custom Directories first class in Solr and other 
little changes.

This builds to run against Apache Hadoop. I don't suspect that will be easily 
'pluggable', but it will be easy enough to change the ivy files to point to 
another Hadoop distro, fix any compile time errors (if there are any), run the 
tests, and build Solr.

Because our dependency is on client code that talks to hdfs, I suspect that it 
will work fine as is with most distros based on the same version of Apache 
Hadoop - and probably other versions as well in many cases.


  
 Add support to write and read Solr index files and transaction log files to 
 and from HDFS.
 --

 Key: SOLR-4916
 URL: https://issues.apache.org/jira/browse/SOLR-4916
 Project: Solr
  Issue Type: New Feature
Reporter: Mark Miller
Assignee: Mark Miller
 Attachments: SOLR-4916.patch, SOLR-4916.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5006) Simplify / understand IndexWriter/DocumentsWriter synchronization


[ 
https://issues.apache.org/jira/browse/LUCENE-5006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688159#comment-13688159
 ] 

Michael McCandless commented on LUCENE-5006:


+1, thanks Simon!

 Simplify / understand IndexWriter/DocumentsWriter synchronization
 -

 Key: LUCENE-5006
 URL: https://issues.apache.org/jira/browse/LUCENE-5006
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
Assignee: Simon Willnauer
 Attachments: LUCENE-5006.patch, LUCENE-5006.patch


 The concurrency in IW/DW/BD is terrifying: there are many locks involved, not 
 just intrinsic locks but IW also has fullFlushLock, commitLock, and there are 
 no clear rules about lock order to avoid deadlocks like LUCENE-5002.
 We have to somehow simplify this, and define the allowed concurrent behavior 
 eg when an app calls deleteAll while other threads are indexing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes 32k


[ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688162#comment-13688162
 ] 

Michael McCandless commented on LUCENE-4583:


I still think we should fix the limitation in core; this way apps that want to 
store large binary fields per-doc are able to use a custom DVFormat.

 StraightBytesDocValuesField fails if bytes  32k
 

 Key: LUCENE-4583
 URL: https://issues.apache.org/jira/browse/LUCENE-4583
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Priority: Critical
 Fix For: 4.4

 Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch, LUCENE-4583.patch


 I didn't observe any limitations on the size of a bytes based DocValues field 
 value in the docs.  It appears that the limit is 32k, although I didn't get 
 any friendly error telling me that was the limit.  32k is kind of small IMO; 
 I suspect this limit is unintended and as such is a bug.The following 
 test fails:
 {code:java}
   public void testBigDocValue() throws IOException {
 Directory dir = newDirectory();
 IndexWriter writer = new IndexWriter(dir, writerConfig(false));
 Document doc = new Document();
 BytesRef bytes = new BytesRef((4+4)*4097);//4096 works
 bytes.length = bytes.bytes.length;//byte data doesn't matter
 doc.add(new StraightBytesDocValuesField(dvField, bytes));
 writer.addDocument(doc);
 writer.commit();
 writer.close();
 DirectoryReader reader = DirectoryReader.open(dir);
 DocValues docValues = MultiDocValues.getDocValues(reader, dvField);
 //FAILS IF BYTES IS BIG!
 docValues.getSource().getBytes(0, bytes);
 reader.close();
 dir.close();
   }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

RE: solrj content-length header missing

2013-06-19 Thread Uwe Schindler

See:  http://www.lamnk.com/blog/computer/fix-nginx-411-length-required-error/ 
http://www.lamnk.com/blog/computer/fix-nginx-411-length-required-error/

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

 http://www.thetaphi.de/ http://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: Payne, Joe [mailto:joe.pa...@kroger.com] 
Sent: Wednesday, June 19, 2013 6:59 PM
To: dev@lucene.apache.org
Subject: RE: solrj content-length header missing

 

This is happening on version 1.2.7 of Nginx.  Newer versions do not produce 
this error, but getting that updated is another battle.  The error message it 
returns is 411: Length Required.

 

From: Uwe Schindler [mailto:u...@thetaphi.de] 
Sent: Wednesday, June 19, 2013 12:29 PM
To: dev@lucene.apache.org
Subject: RE: solrj content-length header missing

 

Hi,

 

POST (or any other request that sends data to HTTP endpoint) always needs the 
length of the body, but there are two options:

-  If you know the length you *may* set it before (this was required in 
HTTP/1.0).

-  HTTP/1.1 added chunked transfer encoding, so the POST data is sent 
as smaller chunks, each with its own length header. This is the preferred way 
to send content, if the size is not known (which is not the case for data sent 
by the solr client library without buffering it completely which has a negative 
impact on response times and memory requirements). Depending on the size of the 
POST data, HttpSolrServer decides internally if it can set content-length (if 
the body is smaller than the buffer size and chunking is not needed) or not. 
This is handled by the underlying HttpClient library ( http://hc.apache.org/ 
http://hc.apache.org/).

 

What is the problem / error message of nginx?

 

Uwe

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

 http://www.thetaphi.de/ http://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: Payne, Joe [mailto:joe.pa...@kroger.com] 
Sent: Wednesday, June 19, 2013 5:53 PM
To: dev@lucene.apache.org
Subject: solrj content-length header missing

 

We are trying to use Nginx to do load balancing and it does not like that the 
content-length header is missing on a POST with an add…/add document.  I 
looked in the code and did not find anything about setting the header. 
(http://svn.apache.org/viewvc/lucene/dev/trunk/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrServer.java?view=markup).
  Are there plans to add the content-length header in future versions?

Joe

 

  _  


This e-mail message, including any attachments, is for the sole use of the 
intended recipient(s) and may contain information that is confidential and 
protected by law from unauthorized disclosure. Any unauthorized review, use, 
disclosure or distribution is prohibited. If you are not the intended 
recipient, please contact the sender by reply e-mail and destroy all copies of 
the original message.

 

  _  


This e-mail message, including any attachments, is for the sole use of the 
intended recipient(s) and may contain information that is confidential and 
protected by law from unauthorized disclosure. Any unauthorized review, use, 
disclosure or distribution is prohibited. If you are not the intended 
recipient, please contact the sender by reply e-mail and destroy all copies of 
the original message.

[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes 32k

2013-06-19 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688163#comment-13688163
 ] 

Yonik Seeley commented on LUCENE-4583:
--

bq. I still think we should fix the limitation in core; this way apps that want 
to store large binary fields per-doc are able to use a custom DVFormat.

+1
arbitrary limits are not a feature.

 StraightBytesDocValuesField fails if bytes  32k
 

 Key: LUCENE-4583
 URL: https://issues.apache.org/jira/browse/LUCENE-4583
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Priority: Critical
 Fix For: 4.4

 Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch, LUCENE-4583.patch


 I didn't observe any limitations on the size of a bytes based DocValues field 
 value in the docs.  It appears that the limit is 32k, although I didn't get 
 any friendly error telling me that was the limit.  32k is kind of small IMO; 
 I suspect this limit is unintended and as such is a bug.The following 
 test fails:
 {code:java}
   public void testBigDocValue() throws IOException {
 Directory dir = newDirectory();
 IndexWriter writer = new IndexWriter(dir, writerConfig(false));
 Document doc = new Document();
 BytesRef bytes = new BytesRef((4+4)*4097);//4096 works
 bytes.length = bytes.bytes.length;//byte data doesn't matter
 doc.add(new StraightBytesDocValuesField(dvField, bytes));
 writer.addDocument(doc);
 writer.commit();
 writer.close();
 DirectoryReader reader = DirectoryReader.open(dir);
 DocValues docValues = MultiDocValues.getDocValues(reader, dvField);
 //FAILS IF BYTES IS BIG!
 docValues.getSource().getBytes(0, bytes);
 reader.close();
 dir.close();
   }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

RE: solrj content-length header missing

2013-06-19 Thread Payne, Joe

Thank you.  I will try that.

From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Wednesday, June 19, 2013 1:07 PM
To: dev@lucene.apache.org
Subject: RE: solrj content-length header missing

See: http://www.lamnk.com/blog/computer/fix-nginx-411-length-required-error/

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.dehttp://www.thetaphi.de/
eMail: u...@thetaphi.demailto:u...@thetaphi.de

From: Payne, Joe [mailto:joe.pa...@kroger.com]
Sent: Wednesday, June 19, 2013 6:59 PM
To: dev@lucene.apache.orgmailto:dev@lucene.apache.org
Subject: RE: solrj content-length header missing

This is happening on version 1.2.7 of Nginx.  Newer versions do not produce 
this error, but getting that updated is another battle.  The error message it 
returns is 411: Length Required.

From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Wednesday, June 19, 2013 12:29 PM
To: dev@lucene.apache.orgmailto:dev@lucene.apache.org
Subject: RE: solrj content-length header missing

Hi,

POST (or any other request that sends data to HTTP endpoint) always needs the 
length of the body, but there are two options:

-  If you know the length you *may* set it before (this was required in 
HTTP/1.0).

-  HTTP/1.1 added chunked transfer encoding, so the POST data is sent 
as smaller chunks, each with its own length header. This is the preferred way 
to send content, if the size is not known (which is not the case for data sent 
by the solr client library without buffering it completely which has a negative 
impact on response times and memory requirements). Depending on the size of the 
POST data, HttpSolrServer decides internally if it can set content-length (if 
the body is smaller than the buffer size and chunking is not needed) or not. 
This is handled by the underlying HttpClient library (http://hc.apache.org/).

What is the problem / error message of nginx?

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.dehttp://www.thetaphi.de/
eMail: u...@thetaphi.demailto:u...@thetaphi.de

From: Payne, Joe [mailto:joe.pa...@kroger.com]
Sent: Wednesday, June 19, 2013 5:53 PM
To: dev@lucene.apache.orgmailto:dev@lucene.apache.org
Subject: solrj content-length header missing

We are trying to use Nginx to do load balancing and it does not like that the 
content-length header is missing on a POST with an add…/add document.  I 
looked in the code and did not find anything about setting the header. 
(http://svn.apache.org/viewvc/lucene/dev/trunk/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrServer.java?view=markup).
  Are there plans to add the content-length header in future versions?
Joe

This e-mail message, including any attachments, is for the sole use of the 
intended recipient(s) and may contain information that is confidential and 
protected by law from unauthorized disclosure. Any unauthorized review, use, 
disclosure or distribution is prohibited. If you are not the intended 
recipient, please contact the sender by reply e-mail and destroy all copies of 
the original message.

This e-mail message, including any attachments, is for the sole use of the 
intended recipient(s) and may contain information that is confidential and 
protected by law from unauthorized disclosure. Any unauthorized review, use, 
disclosure or distribution is prohibited. If you are not the intended 
recipient, please contact the sender by reply e-mail and destroy all copies of 
the original message.

This e-mail message, including any attachments, is for the sole use of the 
intended recipient(s) and may contain information that is confidential and 
protected by law from unauthorized disclosure. Any unauthorized review, use, 
disclosure or distribution is prohibited. If you are not the intended 
recipient, please contact the sender by reply e-mail and destroy all copies of 
the original message.

RE: solrj content-length header missing

2013-06-19 Thread Uwe Schindler

Reading further, see the following statement:

http://wiki.nginx.org/NginxHttpChunkinModule

 

Status

This module is no longer needed for Nginx 1.3.9+ because since 1.3.9, the Nginx 
core already has built-in support for the chunked request bodies.

And this module is now only maintained for Nginx versions older than 1.3.9.

 

So you could install this module to make it work. The bug is on Nginx side, the 
older versions do not support chunked encoding which is *required* by the 
HTTP/1.1 spec! So clear usability failure.

 

Solr does not know body length without buffering, so cannot send length (see my 
mails before).

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

 http://www.thetaphi.de/ http://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: Uwe Schindler [mailto:u...@thetaphi.de] 
Sent: Wednesday, June 19, 2013 7:07 PM
To: dev@lucene.apache.org
Subject: RE: solrj content-length header missing

 

See:  http://www.lamnk.com/blog/computer/fix-nginx-411-length-required-error/ 
http://www.lamnk.com/blog/computer/fix-nginx-411-length-required-error/

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

 http://www.thetaphi.de/ http://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: Payne, Joe [mailto:joe.pa...@kroger.com] 
Sent: Wednesday, June 19, 2013 6:59 PM
To: dev@lucene.apache.org
Subject: RE: solrj content-length header missing

 

This is happening on version 1.2.7 of Nginx.  Newer versions do not produce 
this error, but getting that updated is another battle.  The error message it 
returns is 411: Length Required.

 

From: Uwe Schindler [mailto:u...@thetaphi.de] 
Sent: Wednesday, June 19, 2013 12:29 PM
To: dev@lucene.apache.org
Subject: RE: solrj content-length header missing

 

Hi,

 

POST (or any other request that sends data to HTTP endpoint) always needs the 
length of the body, but there are two options:

-  If you know the length you *may* set it before (this was required in 
HTTP/1.0).

-  HTTP/1.1 added chunked transfer encoding, so the POST data is sent 
as smaller chunks, each with its own length header. This is the preferred way 
to send content, if the size is not known (which is not the case for data sent 
by the solr client library without buffering it completely which has a negative 
impact on response times and memory requirements). Depending on the size of the 
POST data, HttpSolrServer decides internally if it can set content-length (if 
the body is smaller than the buffer size and chunking is not needed) or not. 
This is handled by the underlying HttpClient library ( http://hc.apache.org/ 
http://hc.apache.org/).

 

What is the problem / error message of nginx?

 

Uwe

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

 http://www.thetaphi.de/ http://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: Payne, Joe [mailto:joe.pa...@kroger.com] 
Sent: Wednesday, June 19, 2013 5:53 PM
To: dev@lucene.apache.org
Subject: solrj content-length header missing

 

We are trying to use Nginx to do load balancing and it does not like that the 
content-length header is missing on a POST with an add…/add document.  I 
looked in the code and did not find anything about setting the header. 
(http://svn.apache.org/viewvc/lucene/dev/trunk/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrServer.java?view=markup).
  Are there plans to add the content-length header in future versions?

Joe

 

  _  


This e-mail message, including any attachments, is for the sole use of the 
intended recipient(s) and may contain information that is confidential and 
protected by law from unauthorized disclosure. Any unauthorized review, use, 
disclosure or distribution is prohibited. If you are not the intended 
recipient, please contact the sender by reply e-mail and destroy all copies of 
the original message.

 

  _  


This e-mail message, including any attachments, is for the sole use of the 
intended recipient(s) and may contain information that is confidential and 
protected by law from unauthorized disclosure. Any unauthorized review, use, 
disclosure or distribution is prohibited. If you are not the intended 
recipient, please contact the sender by reply e-mail and destroy all copies of 
the original message.

[jira] [Commented] (LUCENE-5066) TestFieldsReader fails in 4.x with OOM


[ 
https://issues.apache.org/jira/browse/LUCENE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688168#comment-13688168
 ] 

Michael McCandless commented on LUCENE-5066:


+1 patch looks good

Maybe we should pull out a public static final MAX_TERM_LENGTH_BYTES
in IndexWriter?  And DWPT references that, and this added assert in
TermBuffer.java uses it too?  Shai needed to use it recently as well...


 TestFieldsReader fails in 4.x with OOM
 --

 Key: LUCENE-5066
 URL: https://issues.apache.org/jira/browse/LUCENE-5066
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-5066.patch


 Its FaultyIndexInput is broken (doesn't implement seek/clone correctly).
 This causes it to read bogus data and try to allocate an enormous byte[] for 
 a term.
 The bug was previously hidden:
 FaultyDirectory doesnt override openSlice, so CFS must not be used at flush 
 if you want to trigger the bug.
 FailtyIndexInput's clone is broken, it uses new but doesn't seek the clone 
 to the right place. This causes a disaster with BufferedIndexInput (which it 
 extends), because BufferedIndexInput (not just the delegate) must know its 
 position since it has seek-within-block etc code...
 It seems with this test (very simple one), that only 3.x codec triggers it 
 because its term dict relies upon clone()'s being seek'd to right place. 
 I'm not sure what other codecs rely upon this, but imo we should also add a 
 low-level test for directories that does something like this to ensure its 
 really tested:
 {code}
 dir.createOutput(x);
 dir.openInput(x);
 input.seek(somewhere);
 clone = input.clone();
 assertEquals(somewhere, clone.getFilePointer());
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Comment Edited] (SOLR-4934) Prevent runtime failure if users use initargs useCompoundFile setting on LogMergePolicy or TieredMergePolicy

[
https://issues.apache.org/jira/browse/SOLR-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688179#comment-13688179
]

Hoss Man edited comment on SOLR-4934 at 6/19/13 5:25 PM:
-

bq. The only thing I knew was that LUCENE-5038 had caused Solr to make compound
files and the useCompoundFile setting under indexConfig that I found in the
branch_4x example wasn't turning it off.

Oh ... hmmm, yeah ... i hadn't noticed that. definitely a bug there. I've
opened SOLR-4941 to track that, and we'll leave this issue specifically about
the broken initargs config option.

*EDIT:* fixed issue number

was (Author: hossman):
bq. The only thing I knew was that LUCENE-5038 had caused Solr to make
compound files and the useCompoundFile setting under indexConfig that I found
in the branch_4x example wasn't turning it off.

Oh ... hmmm, yeah ... i hadn't noticed that. definitely a bug there. I've
opened SOLR-4926 to track that, and we'll leave this issue specifically about
the broken initargs config option.

Prevent runtime failure if users use initargs useCompoundFile setting on
LogMergePolicy or TieredMergePolicy
--

Key: SOLR-4934
URL: https://issues.apache.org/jira/browse/SOLR-4934
Project: Solr
Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
Fix For: 5.0, 4.4

* LUCENE-5038 eliminated setUseCompoundFile(boolean) from the built in
MergePolicies
* existing users may have configs that use mergePolicy init args to try and
call that setter
* we already do some explicit checks for these MergePolices in
SolrIndexConfig to deal with legacy syntax
* update the existing logic to remove useCompoundFile from the MergePolicy
initArgs for these known policies if found, and log a warning.
(NOTE: i don't want to arbitrarily remove useCompoundFile from the initArgs
regardless of class in case someone has a custom MergePolicy that implements
that logic -- that would suck)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Assigned] (SOLR-4941) useCompoundFile default has changed, simple config option no longer seems to work


 [ 
https://issues.apache.org/jira/browse/SOLR-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man reassigned SOLR-4941:
--

Assignee: Hoss Man

 useCompoundFile default has changed, simple config option no longer seems to 
 work
 -

 Key: SOLR-4941
 URL: https://issues.apache.org/jira/browse/SOLR-4941
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man

 Spin off of SOLR-4934.  We should updated tests to ensure that the various 
 ways of specifying useCompoundFile as well as the expected default are 
 working properly after LUCENE-5038

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4934) Prevent runtime failure if users use initargs useCompoundFile setting on LogMergePolicy or TieredMergePolicy


[ 
https://issues.apache.org/jira/browse/SOLR-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688179#comment-13688179
 ] 

Hoss Man commented on SOLR-4934:


bq. The only thing I knew was that LUCENE-5038 had caused Solr to make compound 
files and the useCompoundFile setting under indexConfig that I found in the 
branch_4x example wasn't turning it off.

Oh ... hmmm, yeah ... i hadn't noticed that.  definitely a bug there.  I've 
opened SOLR-4926 to track that, and we'll leave this issue specifically about 
the broken initargs config option.



 Prevent runtime failure if users use initargs useCompoundFile setting on 
 LogMergePolicy or TieredMergePolicy
 --

 Key: SOLR-4934
 URL: https://issues.apache.org/jira/browse/SOLR-4934
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 5.0, 4.4


 * LUCENE-5038 eliminated setUseCompoundFile(boolean) from the built in 
 MergePolicies
 * existing users may have configs that use mergePolicy init args to try and 
 call that setter
 * we already do some explicit checks for these MergePolices in 
 SolrIndexConfig to deal with legacy syntax
 * update the existing logic to remove useCompoundFile from the MergePolicy 
 initArgs for these known policies if found, and log a warning.
 (NOTE: i don't want to arbitrarily remove useCompoundFile from the initArgs 
 regardless of class in case someone has a custom MergePolicy that implements 
 that logic -- that would suck)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Created] (SOLR-4941) useCompoundFile default has changed, simple config option no longer seems to work

Hoss Man created SOLR-4941:
--

 Summary: useCompoundFile default has changed, simple config option 
no longer seems to work
 Key: SOLR-4941
 URL: https://issues.apache.org/jira/browse/SOLR-4941
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man


Spin off of SOLR-4934.  We should updated tests to ensure that the various ways 
of specifying useCompoundFile as well as the expected default are working 
properly after LUCENE-5038

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Resolved] (SOLR-4934) Prevent runtime failure if users use initargs useCompoundFile setting on LogMergePolicy or TieredMergePolicy


 [ 
https://issues.apache.org/jira/browse/SOLR-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-4934.


Resolution: Fixed

merged r1494348 - 4x as r1494696

 Prevent runtime failure if users use initargs useCompoundFile setting on 
 LogMergePolicy or TieredMergePolicy
 --

 Key: SOLR-4934
 URL: https://issues.apache.org/jira/browse/SOLR-4934
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 5.0, 4.4


 * LUCENE-5038 eliminated setUseCompoundFile(boolean) from the built in 
 MergePolicies
 * existing users may have configs that use mergePolicy init args to try and 
 call that setter
 * we already do some explicit checks for these MergePolices in 
 SolrIndexConfig to deal with legacy syntax
 * update the existing logic to remove useCompoundFile from the MergePolicy 
 initArgs for these known policies if found, and log a warning.
 (NOTE: i don't want to arbitrarily remove useCompoundFile from the initArgs 
 regardless of class in case someone has a custom MergePolicy that implements 
 that logic -- that would suck)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5030) FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters


[ 
https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688185#comment-13688185
 ] 

Michael McCandless commented on LUCENE-5030:


The easy performance tester to run is
lucene/suggest/src/test/org/apache/lucene/search/suggest/LookupBenchmarkTest.java
... we should test that first I think?  I can also run one based on
FreeDB ... the sources are in luceneutil
(https://code.google.com/a/apache-extras.org/p/luceneutil/ ).

If the perf hit is too much then one option would be to make it
optional (whether we count edits in Unicode space UTF-8 space), or
maybe just another suggester class (FuzzyUnicodeSuggester?).

I think we can use INFO_SEP: yes, this is used for PAYLOAD_SEP, but
that only means the incoming surfaceForm cannot contain this char, I
think?  So ... I think we are free to use it in the analyzed form?  Or
did something go wrong when you tried?

Whichever chars we use (steal), we should add checks that these chars do not
occur in the input...


 FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work 
 correctly for 1-byte (like English) and multi-byte (non-Latin) letters
 

 Key: LUCENE-5030
 URL: https://issues.apache.org/jira/browse/LUCENE-5030
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.3
Reporter: Artem Lukanin
 Attachments: nonlatin_fuzzySuggester1.patch, 
 nonlatin_fuzzySuggester2.patch, nonlatin_fuzzySuggester3.patch, 
 nonlatin_fuzzySuggester4.patch, nonlatin_fuzzySuggester.patch, 
 nonlatin_fuzzySuggester.patch


 There is a limitation in the current FuzzySuggester implementation: it 
 computes edits in UTF-8 space instead of Unicode character (code point) 
 space. 
 This should be fixable: we'd need to fix TokenStreamToAutomaton to work in 
 Unicode character space, then fix FuzzySuggester to do the same steps that 
 FuzzyQuery does: do the LevN expansion in Unicode character space, then 
 convert that automaton to UTF-8, then intersect with the suggest FST.
 See the discussion here: 
 http://lucene.472066.n3.nabble.com/minFuzzyLength-in-FuzzySuggester-behaves-differently-for-English-and-Russian-td4067018.html#none

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4934) Prevent runtime failure if users use initargs useCompoundFile setting on LogMergePolicy or TieredMergePolicy


[ 
https://issues.apache.org/jira/browse/SOLR-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688190#comment-13688190
 ] 

Uwe Schindler commented on SOLR-4934:
-

bq. Assuming there are plenty of file descriptors available, will a user get 
better performance from compound files or separate files?

Searching on the index will have no negative impact. IndexInputSlicer returns 
optimized indexinputs that remove the whole file offset stuff. Indexing speed 
is identical, too, but merging (done in background) is more expensive.

 Prevent runtime failure if users use initargs useCompoundFile setting on 
 LogMergePolicy or TieredMergePolicy
 --

 Key: SOLR-4934
 URL: https://issues.apache.org/jira/browse/SOLR-4934
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 5.0, 4.4


 * LUCENE-5038 eliminated setUseCompoundFile(boolean) from the built in 
 MergePolicies
 * existing users may have configs that use mergePolicy init args to try and 
 call that setter
 * we already do some explicit checks for these MergePolices in 
 SolrIndexConfig to deal with legacy syntax
 * update the existing logic to remove useCompoundFile from the MergePolicy 
 initArgs for these known policies if found, and log a warning.
 (NOTE: i don't want to arbitrarily remove useCompoundFile from the initArgs 
 regardless of class in case someone has a custom MergePolicy that implements 
 that logic -- that would suck)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4939) Not able to import oracle DB on RedHat

2013-06-19 Thread Subhash Karemore (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688201#comment-13688201
 ] 

Subhash Karemore commented on SOLR-4939:


Hi,

I think you are right. I am not too much familiar with linux environment.  
Could you please tell me exact command for allowing TCP connection so that I 
should able to connect to remote oracle DB using java. I searched lot for this 
problem, however I didn't find the exact command/solution.

I appreciate your help.

Regards,
Subhash



 Not able to import oracle DB on RedHat
 --

 Key: SOLR-4939
 URL: https://issues.apache.org/jira/browse/SOLR-4939
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.3.1
 Environment: Redhat Linux
Reporter: Subhash Karemore

 I have configured my RedHat system for Solr. After that I started the solr, 
 it is started properly. I have to import the Oracle DB for indexing. My data 
 config file is.
 dataConfig
   dataSource type=JdbcDataSource 
 driver=oracle.jdbc.driver.OracleDriver 
 url=jdbc:oracle:thin:@//hostname:2126/DBNAme user=user 
 password=Passwd batchSize=1 /
   document
   entity name=table1 query=SELECT ID, col2, col3 FROM table1 
 WHERE rownum BETWEEN 1 AND 1000 
   field column=ID name=id /
   field column=col2 name=col2 /
   field column=col3 name=col3 /
   /entity
   /document
 /dataConfig
 I have done similar changes for schema.xml file.
 I have copied the solr-dataimporthandler-4.3.0.jar, 
 solr-dataimporthandler-extras-4.3.0.jar, solr-solrj-4.3.0.jar from dist 
 folder to ../lib folder. Also I have downloaded ojdbc6.jar and put in same 
 folder.
 With this setting, it is working properly on Windows. However on RedHat, it 
 is not working. It is giving me errors when I try to index DB.
 Below are the errors which I got on console.
 ERROR org.apache.solr.handler.dataimport.DocBuilder  â Exception while 
 processing: table1 document : 
 SolrInputDocument[]:org.apache.solr.handler.dataimport.DataImportHandlerException:
  Unable to execute query: SELECT ID, col2, col3 FROM table1 WHERE rownum 
 BETWEEN 1 AND 1000 Processing Document # 1
 at 
 org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:253)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38)
 at 
 org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
 at 
 org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
 at 
 org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227)
 at 
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
 at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
 at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)
 Caused by: java.sql.SQLRecoverableException: IO Error: The Network Adapter 
 could not establish the connection
 at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:458)
 at 
 oracle.jdbc.driver.PhysicalConnection.init(PhysicalConnection.java:546)
 at oracle.jdbc.driver.T4CConnection.init(T4CConnection.java:236)
 at 
 oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
 at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:521)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:161)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:127)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:366)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.access$200(JdbcDataSource.java:38)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:240)
 ... 12 more
 Caused by: oracle.net.ns.NetException: The Network Adapter could not 
 establish the connection
 at

[jira] [Commented] (SOLR-4939) Not able to import oracle DB on RedHat


[ 
https://issues.apache.org/jira/browse/SOLR-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688217#comment-13688217
 ] 

Uwe Schindler commented on SOLR-4939:
-

Ask your firewall administrator, we have no idea about your environment and 
cannot help!

A quick test if it works at all is to enter the following on shell (needs 
netcat installed):

{code}
nc hostname_of_oracle_server 2126
{code}

If this also timeouts, ask somebody who knows your network.

 Not able to import oracle DB on RedHat
 --

 Key: SOLR-4939
 URL: https://issues.apache.org/jira/browse/SOLR-4939
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.3.1
 Environment: Redhat Linux
Reporter: Subhash Karemore

 I have configured my RedHat system for Solr. After that I started the solr, 
 it is started properly. I have to import the Oracle DB for indexing. My data 
 config file is.
 dataConfig
   dataSource type=JdbcDataSource 
 driver=oracle.jdbc.driver.OracleDriver 
 url=jdbc:oracle:thin:@//hostname:2126/DBNAme user=user 
 password=Passwd batchSize=1 /
   document
   entity name=table1 query=SELECT ID, col2, col3 FROM table1 
 WHERE rownum BETWEEN 1 AND 1000 
   field column=ID name=id /
   field column=col2 name=col2 /
   field column=col3 name=col3 /
   /entity
   /document
 /dataConfig
 I have done similar changes for schema.xml file.
 I have copied the solr-dataimporthandler-4.3.0.jar, 
 solr-dataimporthandler-extras-4.3.0.jar, solr-solrj-4.3.0.jar from dist 
 folder to ../lib folder. Also I have downloaded ojdbc6.jar and put in same 
 folder.
 With this setting, it is working properly on Windows. However on RedHat, it 
 is not working. It is giving me errors when I try to index DB.
 Below are the errors which I got on console.
 ERROR org.apache.solr.handler.dataimport.DocBuilder  â Exception while 
 processing: table1 document : 
 SolrInputDocument[]:org.apache.solr.handler.dataimport.DataImportHandlerException:
  Unable to execute query: SELECT ID, col2, col3 FROM table1 WHERE rownum 
 BETWEEN 1 AND 1000 Processing Document # 1
 at 
 org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:253)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38)
 at 
 org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
 at 
 org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
 at 
 org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227)
 at 
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
 at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
 at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)
 Caused by: java.sql.SQLRecoverableException: IO Error: The Network Adapter 
 could not establish the connection
 at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:458)
 at 
 oracle.jdbc.driver.PhysicalConnection.init(PhysicalConnection.java:546)
 at oracle.jdbc.driver.T4CConnection.init(T4CConnection.java:236)
 at 
 oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
 at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:521)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:161)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:127)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:366)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource.access$200(JdbcDataSource.java:38)
 at 
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:240)
 ... 12 more
 Caused by: oracle.net.ns.NetException: The Network Adapter could not 
 establish the connection
 at oracle.net.nt.ConnStrategy.execute(ConnStrategy.java:392)
 at

[jira] [Commented] (SOLR-4941) useCompoundFile default has changed, simple config option no longer seems to work


[ 
https://issues.apache.org/jira/browse/SOLR-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688270#comment-13688270
 ] 

Hoss Man commented on SOLR-4941:


I understand what happend now..

when simon asked on the mailing list for help reviewing the solr changes 
affected by  LUCENE-5038 i didn't fully understand the scope of the change, and 
only focused on how it affected the existing MergePolicy settings (SOLR-4934) 
-- but i only noticed that setUseCompoundFile had been removed from the merge 
policies in facvor us only using the ratio -- i didn't realize that 
setUseCompoundFile was actaully moved to IndexWriterConfig.

i'll work up a patch to make the existing solr settings apply to the 
IndexWriterConfig.

 useCompoundFile default has changed, simple config option no longer seems to 
 work
 -

 Key: SOLR-4941
 URL: https://issues.apache.org/jira/browse/SOLR-4941
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man

 Spin off of SOLR-4934.  We should updated tests to ensure that the various 
 ways of specifying useCompoundFile as well as the expected default are 
 working properly after LUCENE-5038

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: IndexWriter commit user data takes a map

2013-06-19 Thread Varun Thacker

Hi Steve,

Thanks for pointing it out.

I was actually looking at SOLR-2701 when I thought about why have a Map
instead of a string identifier.

So I'm guessing this should be left untouched?



On Wed, Jun 19, 2013 at 7:55 PM, Steve Rowe sar...@gmail.com wrote:

 Hi Varun,

 LUCENE-4575 did not change IW's user data to a Map.  That was done in
 LUCENE-1654.

 Steve

 On Jun 19, 2013, at 6:57 AM, Varun Thacker varunthacker1...@gmail.com
 wrote:

  I was just curious as to why IW.setCommitData uses a map ?
 
  Looking back at LUCENE-1382 when committing user data was introduced it
 took a string.
 
  In LUCENE-4575 it was refactored and changed to a Map. From the comments
 I couldn't really figure out why was it changed.
 
  --
 
 
  Regards,
  Varun Thacker
  http://www.vthacker.in/


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




-- 


Regards,
Varun Thacker
http://www.vthacker.in/

[jira] [Commented] (SOLR-1301) Solr + Hadoop

2013-06-19 Thread Alexander Kanarsky (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688330#comment-13688330
]

Alexander Kanarsky commented on SOLR-1301:
--

[~otis], do you mean to use the Solr query result as an MapReduce job input?

Solr + Hadoop
-

Key: SOLR-1301
URL: https://issues.apache.org/jira/browse/SOLR-1301
Project: Solr
Issue Type: Improvement
Affects Versions: 1.4
Reporter: Andrzej Bialecki
Fix For: 4.4

Attachments: commons-logging-1.0.4.jar,
commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar,
hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch,
log4j-1.2.15.jar, README.txt, SOLR-1301-hadoop-0-20.patch,
SOLR-1301-hadoop-0-20.patch, SOLR-1301.patch, SOLR-1301.patch,
SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch,
SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch,
SOLR-1301.patch, SolrRecordWriter.java

This patch contains a contrib module that provides distributed indexing
(using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is
twofold:
* provide an API that is familiar to Hadoop developers, i.e. that of
OutputFormat
* avoid unnecessary export and (de)serialization of data maintained on HDFS.
SolrOutputFormat consumes data produced by reduce tasks directly, without
storing it in intermediate files. Furthermore, by using an
EmbeddedSolrServer, the indexing task is split into as many parts as there
are reducers, and the data to be indexed is not sent over the network.
Design
--
Key/value pairs produced by reduce tasks are passed to SolrOutputFormat,
which in turn uses SolrRecordWriter to write this data. SolrRecordWriter
instantiates an EmbeddedSolrServer, and it also instantiates an
implementation of SolrDocumentConverter, which is responsible for turning
Hadoop (key, value) into a SolrInputDocument. This data is then added to a
batch, which is periodically submitted to EmbeddedSolrServer. When reduce
task completes, and the OutputFormat is closed, SolrRecordWriter calls
commit() and optimize() on the EmbeddedSolrServer.
The API provides facilities to specify an arbitrary existing solr.home
directory, from which the conf/ and lib/ files will be taken.
This process results in the creation of as many partial Solr home directories
as there were reduce tasks. The output shards are placed in the output
directory on the default filesystem (e.g. HDFS). Such part-N directories
can be used to run N shard servers. Additionally, users can specify the
number of reduce tasks, in particular 1 reduce task, in which case the output
will consist of a single shard.
An example application is provided that processes large CSV files and uses
this API. It uses a custom CSV processing to avoid (de)serialization overhead.
This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this
issue, you should put it in contrib/hadoop/lib.
Note: the development of this patch was sponsored by an anonymous contributor
and approved for release under Apache License.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Comment Edited] (SOLR-1301) Solr + Hadoop

2013-06-19 Thread Alexander Kanarsky (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688330#comment-13688330
]

Alexander Kanarsky edited comment on SOLR-1301 at 6/19/13 7:17 PM:
---

[~otis], do you mean to use the Solr query result as an MapReduce job input?
Also, regarding the SOLR-1045, it is a different approach (in Map phase vs.
Reduce phase- great explanation by Ted is up here:
https://issues.apache.org/jira/browse/SOLR-1301#comment-12828961)

was (Author: kanarsky):
[~otis], do you mean to use the Solr query result as an MapReduce job input?

Solr + Hadoop
-

Key: SOLR-1301
URL: https://issues.apache.org/jira/browse/SOLR-1301
Project: Solr
Issue Type: Improvement
Affects Versions: 1.4
Reporter: Andrzej Bialecki
Fix For: 4.4

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Created] (LUCENE-5069) Can/should we store NumericField's precisionStep in the index?

Michael McCandless created LUCENE-5069:
--

 Summary: Can/should we store NumericField's precisionStep in the 
index?
 Key: LUCENE-5069
 URL: https://issues.apache.org/jira/browse/LUCENE-5069
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless


I was just helping a user (buzzkills) on IRC on why NumericRangeQuery was 
failing to hit the expected docs ... and it was because s/he had indexed with 
precStep=4 but searched with precStep=1.

Then we wondered if it'd be possible to somehow catch this, e.g. we could maybe 
store precStep in FieldInfo, and then fail at search time if you use a 
non-matching precStep?

I think you can index fine and then search on a multiple of that?  E.g., I can 
index with precStep=2 but search with precStep=8?  But indexing with precStep=4 
and searching precStep=1 won't work ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

[
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688339#comment-13688339
]

Mark Miller commented on SOLR-4926:
---

bq. the use of CFS somehow causes replication to fail

Yeah, this is what I'm seeing - I just caught a really good sample case with
decent logging.

The recovering replica commits on the leader and that leader then has 126 docs
to replicate.

16 documents end up on the relica after the replication - 110 short.

The leader is on gen 3, the replica on gen 1.

Perhaps a red herring, but in the many cases of this I've looked at, oddly, no
buffered docs are ever replayed after that - though I have seen buffered docs
replayed in those same runs when the replication did not fail. Weird
observation.

Anyway, I need to turn on more replication level logging I think.

I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
-

Key: SOLR-4926
URL: https://issues.apache.org/jira/browse/SOLR-4926
Project: Solr
Issue Type: Bug
Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
Fix For: 5.0, 4.4

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Comment Edited] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.

[
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688339#comment-13688339
]

Mark Miller edited comment on SOLR-4926 at 6/19/13 7:23 PM:

bq. the use of CFS somehow causes replication to fail

Yeah, this is what I'm seeing - I just caught a really good sample case with
decent logging.

The recovering replica commits on the leader and that leader then has 126 docs
to replicate.

16 documents end up on the relica after the replication - 110 short.

Before the replication, the leader is on gen 3, the replica on gen 1.

Anyway, I need to turn on more replication level logging I think.

was (Author: markrmil...@gmail.com):
bq. the use of CFS somehow causes replication to fail

Yeah, this is what I'm seeing - I just caught a really good sample case with
decent logging.

The recovering replica commits on the leader and that leader then has 126 docs
to replicate.

16 documents end up on the relica after the replication - 110 short.

The leader is on gen 3, the replica on gen 1.

Anyway, I need to turn on more replication level logging I think.

I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
-

Key: SOLR-4926
URL: https://issues.apache.org/jira/browse/SOLR-4926
Project: Solr
Issue Type: Bug
Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
Fix For: 5.0, 4.4

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5069) Can/should we store NumericField's precisionStep in the index?

[
https://issues.apache.org/jira/browse/LUCENE-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688346#comment-13688346
]

Uwe Schindler commented on LUCENE-5069:
---

I think we can do this. I had the same in mind, but lots of people were
against for schema reasons (you know, no schema info in index). If we save
precision step we should also save type like we do for stored fields.

The search works with multiple of original precision step is correct, btw

While indexing, adding a new item with different step should also fail. The
check on indexing show would be done in the TermsEnum initialization of mtq's
getTermsEnum().

Can/should we store NumericField's precisionStep in the index?
--

Key: LUCENE-5069
URL: https://issues.apache.org/jira/browse/LUCENE-5069
Project: Lucene - Core
Issue Type: Improvement
Reporter: Michael McCandless

I was just helping a user (buzzkills) on IRC on why NumericRangeQuery was
failing to hit the expected docs ... and it was because s/he had indexed with
precStep=4 but searched with precStep=1.
Then we wondered if it'd be possible to somehow catch this, e.g. we could
maybe store precStep in FieldInfo, and then fail at search time if you use a
non-matching precStep?
I think you can index fine and then search on a multiple of that? E.g., I
can index with precStep=2 but search with precStep=8? But indexing with
precStep=4 and searching precStep=1 won't work ...

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Comment Edited] (LUCENE-5069) Can/should we store NumericField's precisionStep in the index?

[
https://issues.apache.org/jira/browse/LUCENE-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688346#comment-13688346
]

Uwe Schindler edited comment on LUCENE-5069 at 6/19/13 7:30 PM:

The search works with multiple of original precision step is correct, btw

While indexing, adding a new item with different step should also fail. The
check on searching would be done in the TermsEnum initialization of mtq's
getTermsEnum().

was (Author: thetaphi):
I think we can do this. I had the same in mind, but lots of people were
against for schema reasons (you know, no schema info in index). If we save
precision step we should also save type like we do for stored fields.

The search works with multiple of original precision step is correct, btw

While indexing, adding a new item with different step should also fail. The
check on indexing show would be done in the TermsEnum initialization of mtq's
getTermsEnum().

Can/should we store NumericField's precisionStep in the index?
--

Key: LUCENE-5069
URL: https://issues.apache.org/jira/browse/LUCENE-5069
Project: Lucene - Core
Issue Type: Improvement
Reporter: Michael McCandless

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Estimating Solr memory requirements

Hi Erick,

Is typo in the title on purpose?


On 19 June 2013 15:09, Erick Erickson erickerick...@gmail.com wrote:

 OK, I seem to have stalled on this. Over part of the winter, I put
 together a Swing-based program to help estimate Solr/Lucene memory
 requirements, with all the usual caveats see:
 https://github.com/ErickErickson/SolrMemoryEsitmator.

 I have notes to myself that it's still deficient in several areas:
 FieldValueCache estimates
 tlog requirements
 Memory required to re-open a searcher
 Position and term vector memory requirements
 And whatever I haven't thought about yet.

 Of course it builds on Grant's spreadsheet (reads steals from it
 shamelessly!) I'm hoping to have a friendlier interface. And _of
 course_ I'd be willing to donate it to Solr as a util/contrib/whatever
 if it fits.

 So, what I'm about here is a few things:

  Anyone who wants to try it feel free. The build instructions are at the
 above, but the short form is to clone it, ant jar and java -jar
 dist/estimator.jar. Enter some field info and hit the Add/Save button
 then hit the Dump calcs button to see what it does currently.

 It also saves the estimates away in a file and shows all the steps it
 goes through to perform the calculations. It'll also make rudimentary
 field definitions from the entered data. You can come back to it later
 and add to what you've already done.

  Make any improvements you see fit, particular to flesh out the
 deficiencies listed above.

  Anyone who has, you know, graphic design/Swing skills please feel free
 to make it better. I'm a newbie as far as using Swing is concerned, and the
 way I align buttons and checkboxes is pretty hacky. But it works

  Any suggestions anyone wants to make. Suggestions in code are nicest of
 course, but algorithms for calculating, say, position and tv memory usage
 would be great as well! Isolated code snippets that I could incorporate
 would be great too.

  Any info where I've gotten the calculations wrong or don't show enough
 info to actually figure out whether they're correct or not.

 Note that the goal for this is to give a rough idea of memory
 requirements and be easy to use. The spreadsheet is a bit daunting to
 someone who knows nothing about Solr so this might be an easier way to
 get into it.

 Thanks,
 Erick

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5069) Can/should we store NumericField's precisionStep in the index?

2013-06-19 Thread Adrien Grand (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688355#comment-13688355
]

Adrien Grand commented on LUCENE-5069:
--

bq. While indexing, adding a new item with different step should also fail.

+1 This motivation is enough to me to store the precision step in the field
info.

Can/should we store NumericField's precisionStep in the index?
--

Key: LUCENE-5069
URL: https://issues.apache.org/jira/browse/LUCENE-5069
Project: Lucene - Core
Issue Type: Improvement
Reporter: Michael McCandless

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5069) Can/should we store NumericField's precisionStep in the index?


[ 
https://issues.apache.org/jira/browse/LUCENE-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688359#comment-13688359
 ] 

Uwe Schindler commented on LUCENE-5069:
---

With this info in FieldInfo we could automatically select the right precision 
step for each atomic reader processed while the query runs. 

 Can/should we store NumericField's precisionStep in the index?
 --

 Key: LUCENE-5069
 URL: https://issues.apache.org/jira/browse/LUCENE-5069
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless

 I was just helping a user (buzzkills) on IRC on why NumericRangeQuery was 
 failing to hit the expected docs ... and it was because s/he had indexed with 
 precStep=4 but searched with precStep=1.
 Then we wondered if it'd be possible to somehow catch this, e.g. we could 
 maybe store precStep in FieldInfo, and then fail at search time if you use a 
 non-matching precStep?
 I think you can index fine and then search on a multiple of that?  E.g., I 
 can index with precStep=2 but search with precStep=8?  But indexing with 
 precStep=4 and searching precStep=1 won't work ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-4941) useCompoundFile default has changed, simple config option no longer seems to work


 [ 
https://issues.apache.org/jira/browse/SOLR-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-4941:
---

Attachment: infostream.txt
SOLR-4941.patch

Patch that improves the tests and updates the logic added in SOLR-4934 so that 
if there is explicit useCompoundFile configuration as an init arg for a (known) 
MergePolicy we pass that to the IndexWriterConfig's setUseCompoundFile method 
and log a warning instead of just ignoring it.

patch also removes the warnings about the simple legacy useCompoundFile 
syntax since that actually makes sense now that it's a setting on IWC.

I've also updated the tests to inspect the useCompoundFile on the IWC as well 
as checking the results of adding some segments.

there is still a failure in testTieredMergePolicyConfig where (as i understand 
it from talking to mike on IRC) the merged segment after the optimize command 
should *not* be in CFS format because of the noCFSRatio setting -- but the 
merged segment is still in CFS. i've attached the infostream log from running 
ant test -Dtestcase=TestMergePolicyConfig 
-Dtests.method=testTieredMergePolicyConfig to see if it helps illuminate the 
problem ... i suspect it's either a test bug because i still missunderstand 
something about how the MergePolicy settings come into play, or a genuine bug 
in the lower level TieredMP code -- i don't see how it could be specific to the 
solr config parsing logic since the IWC and TMP getters say they got the 
expected settings.

(NOTE: the patch includes a nocommit in solrconfig-mergepolicy.xml to turn off 
the infostream before committing)

 useCompoundFile default has changed, simple config option no longer seems to 
 work
 -

 Key: SOLR-4941
 URL: https://issues.apache.org/jira/browse/SOLR-4941
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: infostream.txt, SOLR-4941.patch


 Spin off of SOLR-4934.  We should updated tests to ensure that the various 
 ways of specifying useCompoundFile as well as the expected default are 
 working properly after LUCENE-5038

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5069) Can/should we store NumericField's precisionStep in the index?

[
https://issues.apache.org/jira/browse/LUCENE-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688393#comment-13688393
]

Robert Muir commented on LUCENE-5069:
-

{quote}
I had the same in mind, but lots of people were against for schema reasons
(you know, no schema info in index). If we save precision step we should also
save type like we do for stored fields.
{quote}

Count me as one of those: I'm worried how the issue has already jumped to this.

Can/should we store NumericField's precisionStep in the index?
--

Key: LUCENE-5069
URL: https://issues.apache.org/jira/browse/LUCENE-5069
Project: Lucene - Core
Issue Type: Improvement
Reporter: Michael McCandless

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-3838) Admin UI - Multiple filter queries are not supported in Query UI


 [ 
https://issues.apache.org/jira/browse/SOLR-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) updated SOLR-3838:


Attachment: SOLR-3838.patch

Updated Patch, includes the focus on last possible row after deletion-change.

will commit that shortly

 Admin UI - Multiple filter queries are not supported in Query UI
 

 Key: SOLR-3838
 URL: https://issues.apache.org/jira/browse/SOLR-3838
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Affects Versions: 4.0-BETA
Reporter: Jack Krupansky
Assignee: Stefan Matheis (steffkes)
 Fix For: 5.0, 4.4

 Attachments: screenshot-1.jpg, SOLR-3838.patch, SOLR-3838.patch, 
 SOLR-3838.patch, SOLR-3838.patch


 The Solr Admin Query UI has only a single fq input field, which does not 
 permit the user to enter multiple filter query parameters.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5069) Can/should we store NumericField's precisionStep in the index?


[ 
https://issues.apache.org/jira/browse/LUCENE-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688405#comment-13688405
 ] 

Robert Muir commented on LUCENE-5069:
-

{quote}
With this info in FieldInfo we could automatically select the right precision 
step for each atomic reader processed while the query runs. 
{quote}

The problem is its too late: QueryParser/Query are independent of readers: so 
they dont know to generate the correct query (e.g. NumericRangeQuery instead of 
TermRangeQuery) in the first place!

So this issue misses the forest for the trees, sorry, -1 to a halfass schema 
that brings all of the problems of a schema and none of the benefits!

 Can/should we store NumericField's precisionStep in the index?
 --

 Key: LUCENE-5069
 URL: https://issues.apache.org/jira/browse/LUCENE-5069
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless

 I was just helping a user (buzzkills) on IRC on why NumericRangeQuery was 
 failing to hit the expected docs ... and it was because s/he had indexed with 
 precStep=4 but searched with precStep=1.
 Then we wondered if it'd be possible to somehow catch this, e.g. we could 
 maybe store precStep in FieldInfo, and then fail at search time if you use a 
 non-matching precStep?
 I think you can index fine and then search on a multiple of that?  E.g., I 
 can index with precStep=2 but search with precStep=8?  But indexing with 
 precStep=4 and searching precStep=1 won't work ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Resolved] (SOLR-3838) Admin UI - Multiple filter queries are not supported in Query UI


 [ 
https://issues.apache.org/jira/browse/SOLR-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) resolved SOLR-3838.
-

Resolution: Implemented

Committed in ..
trunk: r1494762
branch_4x: r1494763

 Admin UI - Multiple filter queries are not supported in Query UI
 

 Key: SOLR-3838
 URL: https://issues.apache.org/jira/browse/SOLR-3838
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Affects Versions: 4.0-BETA
Reporter: Jack Krupansky
Assignee: Stefan Matheis (steffkes)
 Fix For: 5.0, 4.4

 Attachments: screenshot-1.jpg, SOLR-3838.patch, SOLR-3838.patch, 
 SOLR-3838.patch, SOLR-3838.patch


 The Solr Admin Query UI has only a single fq input field, which does not 
 permit the user to enter multiple filter query parameters.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Resolved] (SOLR-4456) Admin UI: Displays dashboard even if Solr is down


 [ 
https://issues.apache.org/jira/browse/SOLR-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) resolved SOLR-4456.
-

   Resolution: Fixed
Fix Version/s: 5.0

committed the current state in
trunk r1494765
branch_4x r1494768

if there a suggestions for tweaking take, please open a new ticket for that

 Admin UI: Displays dashboard even if Solr is down
 -

 Key: SOLR-4456
 URL: https://issues.apache.org/jira/browse/SOLR-4456
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 4.1
Reporter: Jan Høydahl
Assignee: Stefan Matheis (steffkes)
 Fix For: 5.0, 4.4

 Attachments: SOLR-4456.patch, SOLR-4456.patch, SOLR-4456.patch


 1. Run Solr and bruing up the Admin dashboard
 2. Stop Solr
 3. Click around the Admin GUI. It apparently works, but displays a spinning 
 wheel for most panels
 4. Click on Dashboard. An old cached dashboard is displayed
 What should happen is that once connection to Solr is lost, the whole Admin 
 UI displays a large red box CONNECTION LOST or something :) 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Reopened] (LUCENE-4583) StraightBytesDocValuesField fails if bytes 32k


 [ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reopened LUCENE-4583:



 StraightBytesDocValuesField fails if bytes  32k
 

 Key: LUCENE-4583
 URL: https://issues.apache.org/jira/browse/LUCENE-4583
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Priority: Critical
 Fix For: 4.4

 Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
 LUCENE-4583.patch, LUCENE-4583.patch


 I didn't observe any limitations on the size of a bytes based DocValues field 
 value in the docs.  It appears that the limit is 32k, although I didn't get 
 any friendly error telling me that was the limit.  32k is kind of small IMO; 
 I suspect this limit is unintended and as such is a bug.The following 
 test fails:
 {code:java}
   public void testBigDocValue() throws IOException {
 Directory dir = newDirectory();
 IndexWriter writer = new IndexWriter(dir, writerConfig(false));
 Document doc = new Document();
 BytesRef bytes = new BytesRef((4+4)*4097);//4096 works
 bytes.length = bytes.bytes.length;//byte data doesn't matter
 doc.add(new StraightBytesDocValuesField(dvField, bytes));
 writer.addDocument(doc);
 writer.commit();
 writer.close();
 DirectoryReader reader = DirectoryReader.open(dir);
 DocValues docValues = MultiDocValues.getDocValues(reader, dvField);
 //FAILS IF BYTES IS BIG!
 docValues.getSource().getBytes(0, bytes);
 reader.close();
 dir.close();
   }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.


[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688440#comment-13688440
 ] 

Mark Miller commented on SOLR-4926:
---

Reviewing some more sample fails of RecoveryZkTest:

It actually looks like after the replication we end up with one commit point 
back - eg we are trying to replicate gen 3 and replica moves from gen 1 to gen 
2.

- Mark

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4941) useCompoundFile default has changed, simple config option no longer seems to work


[ 
https://issues.apache.org/jira/browse/SOLR-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688444#comment-13688444
 ] 

Michael McCandless commented on SOLR-4941:
--

Indeed I can see that TMP has noCFSRatio=0.6, and two segments are flushed  
turned into CFS, then those two segments are merged, and then the merged 
segment is turned into a CFS.

I think this means that the merged segment's files (pre-CFS) are  0.6 the size 
of the two flushed CFS segments ... e.g. maybe the CFS headers of the first 2 
segments are tipping the scale?  Try indexing more docs for each segment maybe?

 useCompoundFile default has changed, simple config option no longer seems to 
 work
 -

 Key: SOLR-4941
 URL: https://issues.apache.org/jira/browse/SOLR-4941
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: infostream.txt, SOLR-4941.patch


 Spin off of SOLR-4934.  We should updated tests to ensure that the various 
 ways of specifying useCompoundFile as well as the expected default are 
 working properly after LUCENE-5038

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Resolved] (SOLR-4719) Admin UI - Default to wt=json on Query-Screen


 [ 
https://issues.apache.org/jira/browse/SOLR-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) resolved SOLR-4719.
-

   Resolution: Implemented
Fix Version/s: 5.0

committed in 
trunk r1494772
branch_4x r1494774

 Admin UI - Default to wt=json on Query-Screen
 -

 Key: SOLR-4719
 URL: https://issues.apache.org/jira/browse/SOLR-4719
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Reporter: Stefan Matheis (steffkes)
Assignee: Stefan Matheis (steffkes)
Priority: Minor
 Fix For: 5.0, 4.4


 I didn't really notice that we're still using {{wt=xml}} as default on the 
 Query-Screen .. i suggest we change that to {{wt=json}} .. it's 2013 =)
 Syntax-Highlight would still work, even if one tries the 
 example-configuration where the content-type is overwritten with text/plain, 
 since it's based on the selection on the left side :)
 Any objections?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Resolved] (SOLR-3546) Add index page to Admin UI


 [ 
https://issues.apache.org/jira/browse/SOLR-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) resolved SOLR-3546.
-

Resolution: Duplicate
  Assignee: Stefan Matheis (steffkes)

 Add index page to Admin UI
 --

 Key: SOLR-3546
 URL: https://issues.apache.org/jira/browse/SOLR-3546
 Project: Solr
  Issue Type: New Feature
  Components: web gui
Reporter: Lance Norskog
Assignee: Stefan Matheis (steffkes)
Priority: Minor

 It would be great to index a file by uploading it. In designing schemas and 
 testing features I often make one or two test documents. It would be great to 
 upload these directly from the UI.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-2440) Schema Browser more user friendly


[ 
https://issues.apache.org/jira/browse/SOLR-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688456#comment-13688456
 ] 

Stefan Matheis (steffkes) commented on SOLR-2440:
-

[~jcodina] WDYT? If it's covered i'm going to close this one

 Schema Browser more user friendly
 -

 Key: SOLR-2440
 URL: https://issues.apache.org/jira/browse/SOLR-2440
 Project: Solr
  Issue Type: New Feature
  Components: web gui
Affects Versions: 1.4.1
 Environment: The schema browser of the admin web application
Reporter: Joan Codina
Priority: Minor
  Labels: browser, schema
 Fix For: 4.4

 Attachments: LUCENE_4_schema_jsp.patch, LUCENE_4_screen_css.patch, 
 schema_jsp.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 The schema browser has some drawbacks
 * Does not sort the fields (the actual sorting seems arbritrary)
 * Capitalises all field names. Making difficult the match
 * Does not allow a drill down
 This small patch solves the three issues: 
 #  Changes the Css to do not capitalise the links
 #  Sorts the field names
 #  It replaces the tokens by links to a search query with that token
 that's all  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Estimating Solr memory requirements

2013-06-19 Thread Erick Erickson

Nope, never even noticed it until now. That's the right URL though,
typo and all

Someday I may even fix it G...

Thanks,
Erick

On Wed, Jun 19, 2013 at 3:35 PM, Dmitry Kan dmitry.luc...@gmail.com wrote:
 Hi Erick,

 Is typo in the title on purpose?


 On 19 June 2013 15:09, Erick Erickson erickerick...@gmail.com wrote:

 OK, I seem to have stalled on this. Over part of the winter, I put
 together a Swing-based program to help estimate Solr/Lucene memory
 requirements, with all the usual caveats see:
 https://github.com/ErickErickson/SolrMemoryEsitmator.

 I have notes to myself that it's still deficient in several areas:
 FieldValueCache estimates
 tlog requirements
 Memory required to re-open a searcher
 Position and term vector memory requirements
 And whatever I haven't thought about yet.

 Of course it builds on Grant's spreadsheet (reads steals from it
 shamelessly!) I'm hoping to have a friendlier interface. And _of
 course_ I'd be willing to donate it to Solr as a util/contrib/whatever
 if it fits.

 So, what I'm about here is a few things:

  Anyone who wants to try it feel free. The build instructions are at the
  above, but the short form is to clone it, ant jar and java -jar
  dist/estimator.jar. Enter some field info and hit the Add/Save button
  then hit the Dump calcs button to see what it does currently.

 It also saves the estimates away in a file and shows all the steps it
 goes through to perform the calculations. It'll also make rudimentary
 field definitions from the entered data. You can come back to it later
 and add to what you've already done.

  Make any improvements you see fit, particular to flesh out the
  deficiencies listed above.

  Anyone who has, you know, graphic design/Swing skills please feel free
  to make it better. I'm a newbie as far as using Swing is concerned, and the
  way I align buttons and checkboxes is pretty hacky. But it works

  Any suggestions anyone wants to make. Suggestions in code are nicest of
  course, but algorithms for calculating, say, position and tv memory usage
  would be great as well! Isolated code snippets that I could incorporate
  would be great too.

  Any info where I've gotten the calculations wrong or don't show enough
  info to actually figure out whether they're correct or not.

 Note that the goal for this is to give a rough idea of memory
 requirements and be easy to use. The spreadsheet is a bit daunting to
 someone who knows nothing about Solr so this might be an easier way to
 get into it.

 Thanks,
 Erick

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.


[ 
https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688522#comment-13688522
 ] 

Mark Miller commented on SOLR-4926:
---

In the case where the slave is on gen 2, it did just download the files for gen 
3 - so it seems we are not picking up the latest commit point somehow..

 I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
 -

 Key: SOLR-4926
 URL: https://issues.apache.org/jira/browse/SOLR-4926
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0, 4.4




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5066) TestFieldsReader fails in 4.x with OOM


[ 
https://issues.apache.org/jira/browse/LUCENE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688532#comment-13688532
 ] 

Robert Muir commented on LUCENE-5066:
-

I mentioned this in the email: should we do it here under this issue?

re above: I think we should spin off an issue to improve the codec checks (so 
we get assert fails at least, rather than OOM), i imagine this would be part of 
that issue, but can do it here too.

 TestFieldsReader fails in 4.x with OOM
 --

 Key: LUCENE-5066
 URL: https://issues.apache.org/jira/browse/LUCENE-5066
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-5066.patch


 Its FaultyIndexInput is broken (doesn't implement seek/clone correctly).
 This causes it to read bogus data and try to allocate an enormous byte[] for 
 a term.
 The bug was previously hidden:
 FaultyDirectory doesnt override openSlice, so CFS must not be used at flush 
 if you want to trigger the bug.
 FailtyIndexInput's clone is broken, it uses new but doesn't seek the clone 
 to the right place. This causes a disaster with BufferedIndexInput (which it 
 extends), because BufferedIndexInput (not just the delegate) must know its 
 position since it has seek-within-block etc code...
 It seems with this test (very simple one), that only 3.x codec triggers it 
 because its term dict relies upon clone()'s being seek'd to right place. 
 I'm not sure what other codecs rely upon this, but imo we should also add a 
 low-level test for directories that does something like this to ensure its 
 really tested:
 {code}
 dir.createOutput(x);
 dir.openInput(x);
 input.seek(somewhere);
 clone = input.clone();
 assertEquals(somewhere, clone.getFilePointer());
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-4941) useCompoundFile default has changed, simple config option no longer seems to work

[
https://issues.apache.org/jira/browse/SOLR-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Hoss Man updated SOLR-4941:
---

Attachment: SOLR-4941.patch

bq. maybe the CFS headers of the first 2 segments are tipping the scale? Try
indexing more docs for each segment maybe?

yeah .. i guess i was just naive in considering 0.6 a low enough threshold.

i increased the size of the docs and the number of docs per segment -- and when
that still didn't work i also decreased the ratio to 0.1 and that seemed to do
the trick.

updated patch fixes the test, removes the nocommit, and updates the upgrading
instructions in CHANGES.txt (still need an explicit Bug Fix entry though)

still running more test iters, but i think this is pretty good.

useCompoundFile default has changed, simple config option no longer seems to
work
-

Key: SOLR-4941
URL: https://issues.apache.org/jira/browse/SOLR-4941
Project: Solr
Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
Attachments: infostream.txt, SOLR-4941.patch, SOLR-4941.patch

Spin off of SOLR-4934. We should updated tests to ensure that the various
ways of specifying useCompoundFile as well as the expected default are
working properly after LUCENE-5038

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-4941) useCompoundFile default has changed, simple config option no longer seems to work


 [ 
https://issues.apache.org/jira/browse/SOLR-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-4941:
---

Fix Version/s: 4.4
   5.0

 useCompoundFile default has changed, simple config option no longer seems to 
 work
 -

 Key: SOLR-4941
 URL: https://issues.apache.org/jira/browse/SOLR-4941
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 5.0, 4.4

 Attachments: infostream.txt, SOLR-4941.patch, SOLR-4941.patch


 Spin off of SOLR-4934.  We should updated tests to ensure that the various 
 ways of specifying useCompoundFile as well as the expected default are 
 working properly after LUCENE-5038

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5069) Can/should we store NumericField's precisionStep in the index?

2013-06-19 Thread Adriano Crestani (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688589#comment-13688589
]

Adriano Crestani commented on LUCENE-5069:
--

Couldn't the standard flexible query parser be used for that? I know you can
configure numeric fields in it before parsing a query. I think there is a wiki
about it, just can't find it, maybe Uwe remembers where it is. For now you can
take a look at TestNumericQueryParser.

Can/should we store NumericField's precisionStep in the index?
--

Key: LUCENE-5069
URL: https://issues.apache.org/jira/browse/LUCENE-5069
Project: Lucene - Core
Issue Type: Improvement
Reporter: Michael McCandless

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4618) Integrate LucidWorks' Solr Reference Guide with Solr documentation


[ 
https://issues.apache.org/jira/browse/SOLR-4618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688623#comment-13688623
 ] 

Hoss Man commented on SOLR-4618:


FYI: Things have kind of been in a holding pattern for a while now ... first i 
was waiting for some confirmation from Infra to proceed, then Gavin in Infra 
said he wanted to do a full backup first and be online during the import, then 
after playing jira  irc message tag for a bit (Gavin and i are in diametricly 
opposed timezones) Infra announced that they are upgrading CWIKI to Confluence 
5.x.

I _think_ the current plan is to import the data into the current wiki sometime 
in the next day or so before the upgrade, but it may happen as part of the 
ugprade, or perhaps after the upgrade ... i really don't know.

 Integrate LucidWorks' Solr Reference Guide with Solr documentation
 --

 Key: SOLR-4618
 URL: https://issues.apache.org/jira/browse/SOLR-4618
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Affects Versions: 4.1
Reporter: Cassandra Targett
Assignee: Hoss Man
 Attachments: NewSolrStyle.css, SolrRefGuide4.1-ASF.zip, 
 SolrRefGuide.4.3.zip


 LucidWorks would like to donate the Apache Solr Reference Guide, maintained 
 by LucidWorks tech writers, to the Solr community. It was first produced in 
 2009 as a download-only PDF for Solr 1.4, but since 2011 it has been online 
 at http://docs.lucidworks.com/display/solr/ and updated for Solr 3.x releases 
 and for Solr 4.0 and 4.1.
 I've prepared an XML export from our Confluence installation, which can be 
 easily imported into the Apache Confluence installation by someone with 
 system admin rights. The doc has not yet been updated for 4.2, so it covers 
 Solr 4.1 so far. I'll add some additional technical notes about the export 
 itself in a comment. 
 Since we use Confluence at LucidWorks, I can also offer assistance getting 
 Confluence set up, importing this package into it, or any other help needed 
 for the community to start using this. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5069) Can/should we store NumericField's precisionStep in the index?


[ 
https://issues.apache.org/jira/browse/LUCENE-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688631#comment-13688631
 ] 

Robert Muir commented on LUCENE-5069:
-

Sure but then you basically have 2 schemas :)

Alternatively we could argue numericrangequery is something that a QP should 
never generate anyway: instead maybe QP's should only worry about user intent 
and generate RangeQuery, which rewrite()s to the correct type...

My point is we should just think these things thru without introducing 
additional schema-like things into lucene, since we already have enough of them 
(Analyzer configuration for example, is a form of schema, maintained by the 
user).

 Can/should we store NumericField's precisionStep in the index?
 --

 Key: LUCENE-5069
 URL: https://issues.apache.org/jira/browse/LUCENE-5069
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless

 I was just helping a user (buzzkills) on IRC on why NumericRangeQuery was 
 failing to hit the expected docs ... and it was because s/he had indexed with 
 precStep=4 but searched with precStep=1.
 Then we wondered if it'd be possible to somehow catch this, e.g. we could 
 maybe store precStep in FieldInfo, and then fail at search time if you use a 
 non-matching precStep?
 I think you can index fine and then search on a multiple of that?  E.g., I 
 can index with precStep=2 but search with precStep=8?  But indexing with 
 precStep=4 and searching precStep=1 won't work ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #885: POMs out of sync

2013-06-19 Thread Apache Jenkins Server

Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/885/

1 tests failed.
REGRESSION:  org.apache.solr.cloud.SyncSliceTest.testDistribSearch

Error Message:
shard1 is not consistent.  Got 305 from 
http://127.0.0.1:64102/g_d/x/collection1lastClient and got 253 from 
http://127.0.0.1:63228/g_d/x/collection1

Stack Trace:
java.lang.AssertionError: shard1 is not consistent.  Got 305 from 
http://127.0.0.1:64102/g_d/x/collection1lastClient and got 253 from 
http://127.0.0.1:63228/g_d/x/collection1
at 
__randomizedtesting.SeedInfo.seed([201755EC8EA7E3B9:A1F1DBF4F9F88385]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.solr.cloud.AbstractFullDistribZkTestBase.checkShardConsistency(AbstractFullDistribZkTestBase.java:1018)
at org.apache.solr.cloud.SyncSliceTest.doTest(SyncSliceTest.java:238)




Build Log:
[...truncated 23632 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Resolved] (SOLR-4941) useCompoundFile default has changed, simple config option no longer seems to work


 [ 
https://issues.apache.org/jira/browse/SOLR-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-4941.


Resolution: Fixed

Committed revision 1494837.
Committed revision 1494839.


 useCompoundFile default has changed, simple config option no longer seems to 
 work
 -

 Key: SOLR-4941
 URL: https://issues.apache.org/jira/browse/SOLR-4941
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 5.0, 4.4

 Attachments: infostream.txt, SOLR-4941.patch, SOLR-4941.patch


 Spin off of SOLR-4934.  We should updated tests to ensure that the various 
 ways of specifying useCompoundFile as well as the expected default are 
 working properly after LUCENE-5038

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Created] (SOLR-4942) Add more randomized testing of compound file format and random merge policies

Hoss Man created SOLR-4942:
--

 Summary: Add more randomized testing of compound file format and 
random merge policies
 Key: SOLR-4942
 URL: https://issues.apache.org/jira/browse/SOLR-4942
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
Assignee: Hoss Man


SOLR-4926 seems to have uncovered some sporadic cloud/replication bugs related 
to using compound files.

We should updated SolrTestCaseJ4 and the majority of our test configs to better 
randomize the usage of compound files and merge policies.

Step #1...

* update test configs to use 
{{useCompoundFile${useCompoundFile:false}/useCompoundFile}}
* update SolrTestCaseJ4 to toggle that sys property randomly

Step #2...

* add a new RandomMergePolicy that implements MergePolicy by proxying to 
another instance selected at creation using one of the 
LuceneTestCase.new...MergePolicy methods
* updated test configs to refer to this new MergePolicy
* borrow the tests.shardhandler.randomSeed logic in SolrTestCaseJ4 to give 
our RandomMergePolicy a consistent seed at runtime.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.