[jira] [Closed] (PYLUCENE-25) JCC: NameError: global name 'StringWriter' is not defined occurs when java exception raised
[ https://issues.apache.org/jira/browse/PYLUCENE-25?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilia Meerovich closed PYLUCENE-25. -- Resolution: Implemented JCC: NameError: global name 'StringWriter' is not defined occurs when java exception raised - Key: PYLUCENE-25 URL: https://issues.apache.org/jira/browse/PYLUCENE-25 Project: PyLucene Issue Type: Bug Reporter: Ilia Meerovich Labels: jcc I used JCC and tried to run the generated Python code. I noticed that when a Java exception occurs, Python throws a NameError: NameError: global name 'StringWriter' is not defined It looks like __init__.py needs to be adapted to the full-names feature. I found that somebody already sent an email regarding a similar failure: http://mail-archives.apache.org/mod_mbox/lucene-pylucene-dev/201302.mbox/%3Calpine.OSX.2.01.1302041320590.1972@yuzu.local%3E
Re: Reestablishing a Solr node that ran on a completely crashed machine
On 6/18/13 2:15 PM, Mark Miller wrote:
> I don't know what the best method to use now is, but the slightly longer term plan is to:
> * Have a new mode where you cannot preconfigure cores, only use the collection's API.
> * ZK becomes the cluster state truth.
> * The Overseer takes actions to ensure cores live/die in different places based on the truth in ZK.
> - Mark

Not that we have to decide on this now, but I guess in my scenario I do not see why the Overseer should be involved. The replica is already assigned to run on the replaced machine with a specific IP/hostname (actually a specific Solr node-name), so I guess that the Solr node itself on this new/replaced machine should just go look in ZK when it starts up, realize that it ought to run this and that replica, and start loading them itself. I recognize that the Overseer should/could be involved in relocating replicas for different reasons - load balancing, rack awareness etc. But in cases where a replica is already assigned to a certain node-name according to ZK state, yet the node is not preconfigured (in solr.xml) to run this replica, the node itself should just realize that it ought to run it anyway and load it. But it probably has to be thought through well. Just my immediate thoughts.

Regards, Per Steffensen
[jira] [Commented] (SOLR-4792) stop shipping a war in 5.0
[ https://issues.apache.org/jira/browse/SOLR-4792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687659#comment-13687659 ] Noble Paul commented on SOLR-4792: -- Thanks Shawn for pointing me to the list. Seriously, I was asleep at the wheel. Mark Miller nicely captured everything I have to say on this subject and I have very little to add. I always wanted Solr to be a standalone app. +1 stop shipping a war in 5.0 -- Key: SOLR-4792 URL: https://issues.apache.org/jira/browse/SOLR-4792 Project: Solr Issue Type: Task Components: Build Reporter: Robert Muir Assignee: Robert Muir Fix For: 5.0 Attachments: SOLR-4792.patch see the vote on the developer list. This is the first step: if we stop shipping a war then we are free to do anything we want.
[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k
[ https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687730#comment-13687730 ] selckin commented on LUCENE-4583: - A few comments up someone asked for a use case, shouldn't something like http://www.elasticsearch.org/guide/reference/mapping/source-field/ be a perfect thing to use BinaryDocValues for? I was trying to store something similar using DiskDocValuesFormat and hit the 32k limit StraightBytesDocValuesField fails if bytes > 32k Key: LUCENE-4583 URL: https://issues.apache.org/jira/browse/LUCENE-4583 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.0, 4.1, 5.0 Reporter: David Smiley Priority: Critical Fix For: 4.4 Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch I didn't observe any limitations on the size of a bytes based DocValues field value in the docs. It appears that the limit is 32k, although I didn't get any friendly error telling me that was the limit. 32k is kind of small IMO; I suspect this limit is unintended and as such is a bug. The following test fails:
{code:java}
public void testBigDocValue() throws IOException {
  Directory dir = newDirectory();
  IndexWriter writer = new IndexWriter(dir, writerConfig(false));
  Document doc = new Document();
  BytesRef bytes = new BytesRef((4+4)*4097); // 4096 works
  bytes.length = bytes.bytes.length; // byte data doesn't matter
  doc.add(new StraightBytesDocValuesField("dvField", bytes));
  writer.addDocument(doc);
  writer.commit();
  writer.close();

  DirectoryReader reader = DirectoryReader.open(dir);
  DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
  // FAILS IF BYTES IS BIG!
  docValues.getSource().getBytes(0, bytes);
  reader.close();
  dir.close();
}
{code}
[jira] [Resolved] (LUCENE-5064) Add PagedMutable
[ https://issues.apache.org/jira/browse/LUCENE-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-5064. -- Resolution: Fixed Add PagedMutable Key: LUCENE-5064 URL: https://issues.apache.org/jira/browse/LUCENE-5064 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Fix For: 4.4 Attachments: LUCENE-5064.patch In the same way that we now have a PagedGrowableWriter, we could have a PagedMutable which would behave just like PackedInts.Mutable but would support more than 2B values.
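A minimal sketch of how such a class might be used, mirroring the existing PagedGrowableWriter; the constructor signature and the long-indexed get/set are assumptions based on the issue description, not a reading of the committed patch:
{code:java}
import org.apache.lucene.util.packed.PackedInts;
import org.apache.lucene.util.packed.PagedMutable;

public class PagedMutableSketch {
  public static void main(String[] args) {
    // A packed array of 5 billion 16-bit values: more entries than an int
    // can address, which is exactly what a plain PackedInts.Mutable cannot do.
    long size = 5000000000L;
    PagedMutable values = new PagedMutable(size, 1 << 20, 16, PackedInts.COMPACT);
    values.set(3000000000L, 42);                 // indices are longs, not ints
    System.out.println(values.get(3000000000L)); // 42
  }
}
{code}
(Allocating 5B 16-bit values needs roughly 10GB, so this is illustrative rather than something to actually run on a laptop.)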
[jira] [Updated] (LUCENE-5006) Simplify / understand IndexWriter/DocumentsWriter synchronization
[ https://issues.apache.org/jira/browse/LUCENE-5006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-5006: Attachment: LUCENE-5006.patch Here is a cleaned-up version of the patch. I removed the accidentally added (leftover) int[] from BytesRefHash, which was indeed unintended. I also removed all the leftovers like the forcePurge and applyDeletes flags; they were still in there from a previous iteration without the Queue. I changed _maybeMerge_ to _hasEvents_ consistently. The changes in DWPT and DWPTThreadPool are mainly due to the fact that I moved the creation of DWPT into DW and out of the ThreadPool. The ThreadPool only maintains the ThreadState instances but is not responsible for creating the actual DWPT. DWPT is no longer reusable; we never really reused them anyway, but if they were initialized and we did a full flush we kept using them with a new DeleteQueue, which is gone now. This is nice since DWPT is now solely initialized in its Ctor. This includes the segment name, which we obtain from IW when the DWPT is created. This remains the only place where we sync on IW, which is done in updateDocument right now. I think this patch is a step in the right direction towards making this simpler. At the end of the day I'd want to change the lifetime of a DW to be a single flush and replace the entire DW once we flush or reopen. This would make a lot of logic much simpler, but I don't want to make this big change at once, so maybe we should work to get the current patch into trunk and let it bake in a bit. Simplify / understand IndexWriter/DocumentsWriter synchronization - Key: LUCENE-5006 URL: https://issues.apache.org/jira/browse/LUCENE-5006 Project: Lucene - Core Issue Type: Bug Reporter: Michael McCandless Assignee: Simon Willnauer Attachments: LUCENE-5006.patch, LUCENE-5006.patch The concurrency in IW/DW/BD is terrifying: there are many locks involved, not just intrinsic locks (IW also has fullFlushLock and commitLock), and there are no clear rules about lock order to avoid deadlocks like LUCENE-5002. We have to somehow simplify this, and define the allowed concurrent behavior, e.g. when an app calls deleteAll while other threads are indexing.
[jira] [Commented] (LUCENE-5030) FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687822#comment-13687822 ] Artem Lukanin commented on LUCENE-5030: --- I see that some tests in AnalyzingSuggesterTest fail, so I have to look into what's wrong... FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters Key: LUCENE-5030 URL: https://issues.apache.org/jira/browse/LUCENE-5030 Project: Lucene - Core Issue Type: Bug Affects Versions: 4.3 Reporter: Artem Lukanin Attachments: nonlatin_fuzzySuggester1.patch, nonlatin_fuzzySuggester2.patch, nonlatin_fuzzySuggester3.patch, nonlatin_fuzzySuggester4.patch, nonlatin_fuzzySuggester.patch There is a limitation in the current FuzzySuggester implementation: it computes edits in UTF-8 space instead of Unicode character (code point) space. This should be fixable: we'd need to fix TokenStreamToAutomaton to work in Unicode character space, then fix FuzzySuggester to do the same steps that FuzzyQuery does: do the LevN expansion in Unicode character space, then convert that automaton to UTF-8, then intersect with the suggest FST. See the discussion here: http://lucene.472066.n3.nabble.com/minFuzzyLength-in-FuzzySuggester-behaves-differently-for-English-and-Russian-td4067018.html#none
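For anyone following along, the core of the problem is easy to demonstrate outside the suggester (illustrative code, not the patch): a one-character typo in Cyrillic is two byte-level edits in UTF-8, so an automaton that measures edits in bytes burns through maxEdits twice as fast for Russian input as for English.
{code:java}
import java.nio.charset.StandardCharsets;

public class EditSpaceDemo {
  public static void main(String[] args) {
    String a = "привет", b = "привед";       // one code point differs
    System.out.println(codePointDiff(a, b)); // 1 edit in Unicode space
    byte[] ba = a.getBytes(StandardCharsets.UTF_8);
    byte[] bb = b.getBytes(StandardCharsets.UTF_8);
    System.out.println(byteDiff(ba, bb));    // 2 edits in UTF-8 space:
                                             // 'т' is D1 82, 'д' is D0 B4
  }
  // positional mismatch count in code point space (demo only, both BMP)
  static int codePointDiff(String a, String b) {
    int n = Math.min(a.length(), b.length()), d = 0;
    for (int i = 0; i < n; ) {
      int ca = a.codePointAt(i), cb = b.codePointAt(i);
      if (ca != cb) d++;
      i += Character.charCount(ca);
    }
    return d;
  }
  // positional mismatch count in byte space
  static int byteDiff(byte[] a, byte[] b) {
    int d = 0;
    for (int i = 0; i < Math.min(a.length, b.length); i++) if (a[i] != b[i]) d++;
    return d;
  }
}
{code}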
[jira] [Created] (SOLR-4939) Not able to import oracle DB on RedHat
Subhash Karemore created SOLR-4939: -- Summary: Not able to import oracle DB on RedHat Key: SOLR-4939 URL: https://issues.apache.org/jira/browse/SOLR-4939 Project: Solr Issue Type: Bug Affects Versions: 4.3.1 Environment: Redhat Linux Reporter: Subhash Karemore I have configured my RedHat system for Solr. After that I started Solr; it started properly. I have to import the Oracle DB for indexing. My data config file is:
<dataConfig>
  <dataSource type="JdbcDataSource" driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@//hostname:2126/DBNAme" user="user" password="Passwd" batchSize="1"/>
  <document>
    <entity name="table1" query="SELECT ID, col2, col3 FROM table1 WHERE rownum BETWEEN 1 AND 1000">
      <field column="ID" name="id"/>
      <field column="col2" name="col2"/>
      <field column="col3" name="col3"/>
    </entity>
  </document>
</dataConfig>
I have done similar changes for the schema.xml file. I have copied solr-dataimporthandler-4.3.0.jar, solr-dataimporthandler-extras-4.3.0.jar, and solr-solrj-4.3.0.jar from the dist folder to the ../lib folder. Also I have downloaded ojdbc6.jar and put it in the same folder. With this setting, it is working properly on Windows. However on RedHat, it is not working. It is giving me errors when I try to index the DB. Below are the errors which I got on the console.
ERROR org.apache.solr.handler.dataimport.DocBuilder - Exception while processing: table1 document : SolrInputDocument[]:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT ID, col2, col3 FROM table1 WHERE rownum BETWEEN 1 AND 1000 Processing Document # 1
  at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
  at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
  at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
  at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38)
  at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
  at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
  at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
  at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465)
  at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
  at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319)
  at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227)
  at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
  at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
  at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)
Caused by: java.sql.SQLRecoverableException: IO Error: The Network Adapter could not establish the connection
  at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:458)
  at oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:546)
  at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:236)
  at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
  at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:521)
  at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:161)
  at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:127)
  at org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:366)
  at org.apache.solr.handler.dataimport.JdbcDataSource.access$200(JdbcDataSource.java:38)
  at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:240)
  ... 12 more
Caused by: oracle.net.ns.NetException: The Network Adapter could not establish the connection
  at oracle.net.nt.ConnStrategy.execute(ConnStrategy.java:392)
  at oracle.net.resolver.AddrResolution.resolveAndExecute(AddrResolution.java:434)
  at oracle.net.ns.NSProtocol.establishConnection(NSProtocol.java:687)
  at oracle.net.ns.NSProtocol.connect(NSProtocol.java:247)
  at oracle.jdbc.driver.T4CConnection.connect(T4CConnection.java:1102)
  at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:320)
  ... 21 more
Caused by: java.net.ConnectException: Connection timed out
  at java.net.PlainSocketImpl.socketConnect(Native Method)
  at
IndexWriter commit user data takes a map
I was just curious as to why IW.setCommitData uses a Map? Looking back at LUCENE-1382, when committing user data was introduced it took a String. In LUCENE-4575 it was refactored and changed to a Map. From the comments I couldn't really figure out why it was changed. -- Regards, Varun Thacker http://www.vthacker.in/
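For context, a minimal sketch of the API being discussed, as I read the 4.x javadocs; the round-trip via IndexCommit.getUserData() is the part a single String could not do as cleanly for multiple values:
{code:java}
import java.util.Collections;
import java.util.Map;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;

public class CommitUserDataSketch {
  // Attach user data to a commit and read it back. A Map lets callers
  // record several independent pieces of metadata (e.g. an epoch and a
  // source id) without inventing their own encoding inside one String.
  static void commitWithUserData(IndexWriter writer, Directory dir) throws Exception {
    writer.setCommitData(Collections.singletonMap("replication-epoch", "42"));
    writer.commit();
    for (IndexCommit commit : DirectoryReader.listCommits(dir)) {
      Map<String, String> userData = commit.getUserData();
      System.out.println(userData.get("replication-epoch"));
    }
  }
}
{code}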
[jira] [Created] (LUCENE-5068) QueryParserUtil.escape() does not escape forward slash
Matias Holte created LUCENE-5068: Summary: QueryParserUtil.escape() does not escape forward slash Key: LUCENE-5068 URL: https://issues.apache.org/jira/browse/LUCENE-5068 Project: Lucene - Core Issue Type: Bug Components: core/queryparser Affects Versions: 4.0 Reporter: Matias Holte Priority: Minor QueryParserUtil.escape() and QueryParser.escape() have different implementations. Most importantly, the former omits escaping the forward slash (/). This in turn caused errors in the query parser when a query ended with a forward slash.
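A quick way to reproduce the discrepancy (a sketch; class locations are the 4.x classic and flexible query parser modules):
{code:java}
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.queryparser.flexible.standard.QueryParserUtil;

public class EscapeDemo {
  public static void main(String[] args) {
    String raw = "http://example.org/";
    // The classic parser escapes '/' (the regex-query delimiter since 4.0)...
    System.out.println(QueryParser.escape(raw));
    // ...while, per this report, the flexible parser's utility leaves it
    // bare, so a query ending in '/' blows up at parse time.
    System.out.println(QueryParserUtil.escape(raw));
  }
}
{code}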
[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k
[ https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687883#comment-13687883 ] Robert Muir commented on LUCENE-4583: - good god no. DocValues are not stored fields... This reinforces the value of the limit!
[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k
[ https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687894#comment-13687894 ] selckin commented on LUCENE-4583: - Ok, from the talks I watched on them and other info gathered, it seemed like it would be a good fit; guess I really missed the point somewhere. Can't find much info in the javadocs either, but I guess this is for the user list and I shouldn't pollute this issue.
[jira] [Resolved] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k
[ https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-4583. - Resolution: Not A Problem
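For readers landing here with the same use case: the takeaway from Robert's comment is that a _source-style blob belongs in a stored field, roughly as below (a sketch; the field name is made up):
{code:java}
import java.nio.charset.StandardCharsets;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.StoredField;

public class SourceFieldSketch {
  // Stored fields are row-oriented and fetched for a handful of hits, and
  // they are not subject to the 32k per-value limit that doc values enforce.
  static Document sourceDoc(String json) {
    Document doc = new Document();
    doc.add(new StoredField("_source", json.getBytes(StandardCharsets.UTF_8)));
    return doc;
  }
}
{code}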
[jira] [Updated] (LUCENE-5030) FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artem Lukanin updated LUCENE-5030: -- Attachment: nonlatin_fuzzySuggester.patch Now the tests in FuzzySuggesterTest and AnalyzingSuggesterTest pass, except for AnalyzingSuggesterTest.testRandom (when preserveSep = true). If I enable VERBOSE, I see that the suggestions are correct. I guess there is a bug in the test, but I cannot find it. Can you please review?
[jira] [Commented] (LUCENE-5030) FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687902#comment-13687902 ] Robert Muir commented on LUCENE-5030: - I don't think changing SEP_LABEL from a single byte to 4 bytes is necessarily a good idea. I think benchmarks (size and speed) should be run on this change before we jump into it. I'm also concerned about the determinization and shit being in the middle of an autosuggest request... this seems like it would be way way too slow.
Estimating Solr memory requirements
OK, I seem to have stalled on this. Over part of the winter, I put together a Swing-based program to help estimate Solr/Lucene memory requirements, with all the usual caveats; see: https://github.com/ErickErickson/SolrMemoryEsitmator. I have notes to myself that it's still deficient in several areas:
- FieldValueCache estimates
- tlog requirements
- Memory required to re-open a searcher
- Position and term vector memory requirements
- And whatever I haven't thought about yet.
Of course it builds on Grant's spreadsheet (read: steals from it shamelessly!). I'm hoping to have a friendlier interface. And _of course_ I'd be willing to donate it to Solr as a util/contrib/whatever if it fits. So, what I'm after here is a few things:
- Anyone who wants to try it, feel free. The build instructions are at the above, but the short form is to clone it, "ant jar" and "java -jar dist/estimator.jar". Enter some field info, hit the Add/Save button, then hit the Dump calcs button to see what it does currently. It also saves the estimates away in a file and shows all the steps it goes through to perform the calculations. It'll also make rudimentary field definitions from the entered data. You can come back to it later and add to what you've already done.
- Make any improvements you see fit, particularly to flesh out the deficiencies listed above.
- Anyone who has, you know, graphic design/Swing skills, please feel free to make it better. I'm a newbie as far as using Swing is concerned, and the way I align buttons and checkboxes is pretty hacky. But it works.
- Any suggestions anyone wants to make. Suggestions in code are nicest of course, but algorithms for calculating, say, position and tv memory usage would be great as well! Isolated code snippets that I could incorporate would be great too.
- Any info where I've gotten the calculations wrong or don't show enough info to actually figure out whether they're correct or not.
Note that the goal for this is to give a rough idea of memory requirements and be easy to use. The spreadsheet is a bit daunting to someone who knows nothing about Solr, so this might be an easier way to get into it. Thanks, Erick
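To give a flavor of the arithmetic involved (my own back-of-the-envelope figures, not the tool's actual formulas), a FieldCache-style estimate for an untokenized string field looks something like:
{code:java}
public class FieldCacheEstimateSketch {
  // Rough estimate: one int ord per doc, plus the unique values themselves
  // (UTF-16 chars) and an offset per unique value. Illustrative only.
  static long estimateStringFieldCacheBytes(long numDocs, long numUnique, double avgChars) {
    long ordArray = numDocs * 4L;
    long values = (long) (numUnique * avgChars * 2);
    long offsets = numUnique * 4L;
    return ordArray + values + offsets;
  }

  public static void main(String[] args) {
    // 100M docs, 1M unique values, avg 16 chars: roughly 436MB for one field.
    System.out.println(estimateStringFieldCacheBytes(100000000L, 1000000L, 16));
  }
}
{code}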
[jira] [Resolved] (SOLR-4939) Not able to import oracle DB on RedHat
[ https://issues.apache.org/jira/browse/SOLR-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson resolved SOLR-4939. -- Resolution: Invalid Please raise this issue on the user's list first to determine whether it's a bona-fide bug; I suspect a configuration error. If it is really a bug, we can re-open this.
[jira] [Commented] (LUCENE-5030) FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687917#comment-13687917 ] Artem Lukanin commented on LUCENE-5030: --- Possibly we should change it to INFO_SEP2 (U+001E), as Michael suggested for TokenStreamToAutomaton? Do you like the 0x10 and 0x10fffe separators in TokenStreamToAutomaton? Won't they slow down the process? I guess Michael is the one who runs benchmarks regularly? I don't know how to do it...
[jira] [Commented] (SOLR-4939) Not able to import oracle DB on RedHat
[ https://issues.apache.org/jira/browse/SOLR-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687924#comment-13687924 ] Uwe Schindler commented on SOLR-4939: - Check your firewall! I think your server may not have TCP access to the database server.
[jira] [Created] (SOLR-4940) Cluster crashed for *:* queries with large page number (OOM)
Bjoern Ebers created SOLR-4940: -- Summary: Cluster crashed for *:* queries with large page number (OOM) Key: SOLR-4940 URL: https://issues.apache.org/jira/browse/SOLR-4940 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.0 Environment: One collection is sharded across 8 high-mem machines. Each shard has one replica (an additional 8 machines). The Solr instances are started with -Xmx16384m -Xms4096m. The index contains around 230-240 million documents. All Solr instances are connected to a ZooKeeper ensemble with 5 instances. Reporter: Bjoern Ebers Priority: Critical Executing this query on the large index: q=*:*&page=1000&max=1000 caused an OOM and crashed the whole cluster!
[jira] [Commented] (SOLR-4940) Cluster crashed for *:* queries with large page number (OOM)
[ https://issues.apache.org/jira/browse/SOLR-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687938#comment-13687938 ] Uwe Schindler commented on SOLR-4940: - See SOLR-1726. The main issue is: full-text search engines are only good at returning top-ranking results. If you increase the window of top-ranking results, the underlying algorithms, which are optimized to find the top-n fast, will need lots of memory and get slow.
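The blow-up is easy to quantify (illustrative arithmetic, not Solr's exact bookkeeping): to serve page p of n rows, each shard must return its top p*n candidates and the coordinator merges all of them.
{code:java}
public class DeepPagingCostSketch {
  // With page=1000, rows=1000 and 8 shards, the coordinator alone holds
  // 8 * 1000 * 1000 = 8M ranked entries for a single request; at a few
  // tens of bytes per entry that is hundreds of MB, hence the OOM.
  static long mergedEntries(int shards, int page, int rows) {
    return (long) shards * page * rows;
  }
}
{code}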
Looking for community guidance on SOLR-4872
I write to seek guidance from the dev community on SOLR-4872. This JIRA concerns lifecycle management for Solr schema components: tokenizers, token filters, and char filters. If you read the comments, you'll find three opinions from committers. What follows are précis: read the JIRA to get the details. Hoss is in favor of having close methods on these components and arranging to have them called when a schema is torn down. Hoss is opposed to allowing these objects to be SolrCoreAware. Yonik is opposed to having such close methods and prefers SolrCoreAware, or something like it, or letting component implementors use finalizers. Rob Muir thinks that there should be a fix to the related LUCENE-2145, which I see as complementary to this. So, here I am. I'm not a committer. I'm a builder of Solr plugins, and, from that standpoint, I think that there should be a lifecycle somehow, because I try to apply a general principle of avoiding finalizers, and because in some cases their unpredictable schedule can be a practical problem. Is there a committer in this community who is willing to work with me on this? As things are, I can't see how to proceed, since I'm suspended between two committers with apparently opposed views. I have already implemented what I think of as the hard part, and, indeed, the foundation of either approach. I have a close lifecycle that extends down to the IndexSchema object and the TokenizerChain. So it remains to decide whether that should in turn call ordinary close methods on the tokenizers, token filters, and char filters, or rather look for some optional lifecycle interface.
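For concreteness, here is one possible shape of the close lifecycle Hoss describes; this interface is entirely hypothetical, not an existing Solr API:
{code:java}
// Hypothetical opt-in lifecycle for schema components (tokenizers, token
// filters, char filters). IndexSchema would call close() exactly once on
// every factory implementing this when the schema is torn down, removing
// the need for finalizers with their unpredictable schedule.
public interface CloseableSchemaComponent {
  void close();
}
{code}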
List your chair on https://lucene.apache.org/whoweare.html?
A small suggestion: identify the VP on the list of PMC and committers.
[lucene 4.3.1] solr webapp is put to null directory on maven build
Hello, executing 'package' on Apache Solr Search Server pom (maven-build/solr/webapp/pom.xml) puts the webapp into a null sub-directory. Apache Maven 3.0.4 OS: Ubuntu 12.04 LTS Thanks, Dmitry Kan
Re: [lucene 4.3.1] solr webapp is put to null directory on maven build
also: ${build-directory} is not set anywhere in the project. On 19 June 2013 16:23, Dmitry Kan dmitry.luc...@gmail.com wrote: Hello, executing 'package' on Apache Solr Search Server pom (maven-build/solr/webapp/pom.xml) puts the webapp into a null sub-directory. Apache Maven 3.0.4 OS: Ubuntu 12.04 LTS Thanks, Dmitry Kan
[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.7.0_21) - Build # 6138 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/6138/ Java: 32bit/jdk1.7.0_21 -server -XX:+UseParallelGC

1 tests failed. REGRESSION: org.apache.lucene.index.TestFieldsReader.testExceptions

Error Message: Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
  at __randomizedtesting.SeedInfo.seed([A3AC19F388354DBF:D5AD4B5B20483309]:0)
  at org.apache.lucene.util.BytesRef.copyBytes(BytesRef.java:196)
  at org.apache.lucene.util.BytesRef.deepCopyOf(BytesRef.java:343)
  at org.apache.lucene.codecs.lucene3x.TermBuffer.toTerm(TermBuffer.java:113)
  at org.apache.lucene.codecs.lucene3x.SegmentTermEnum.term(SegmentTermEnum.java:184)
  at org.apache.lucene.codecs.lucene3x.Lucene3xFields$PreTermsEnum.next(Lucene3xFields.java:863)
  at org.apache.lucene.index.MultiTermsEnum.pushTop(MultiTermsEnum.java:292)
  at org.apache.lucene.index.MultiTermsEnum.next(MultiTermsEnum.java:318)
  at org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:103)
  at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
  at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:365)
  at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:98)
  at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3767)
  at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3371)
  at org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:40)
  at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1887)
  at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1697)
  at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1650)
  at org.apache.lucene.index.TestFieldsReader.testExceptions(TestFieldsReader.java:204)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:601)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
  at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
  at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
  at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
  at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
  at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)

Build Log: [...truncated 355 lines...]
[junit4:junit4] Suite: org.apache.lucene.index.TestFieldsReader
[junit4:junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestFieldsReader -Dtests.method=testExceptions -Dtests.seed=A3AC19F388354DBF -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=mt_MT -Dtests.timezone=Europe/Samara -Dtests.file.encoding=ISO-8859-1
[junit4:junit4] ERROR 1.49s J0 | TestFieldsReader.testExceptions
[junit4:junit4] Throwable #1: java.lang.OutOfMemoryError: Java heap space
[junit4:junit4]   at __randomizedtesting.SeedInfo.seed([A3AC19F388354DBF:D5AD4B5B20483309]:0)
[junit4:junit4]   at org.apache.lucene.util.BytesRef.copyBytes(BytesRef.java:196)
[junit4:junit4]   at org.apache.lucene.util.BytesRef.deepCopyOf(BytesRef.java:343)
[junit4:junit4]   at org.apache.lucene.codecs.lucene3x.TermBuffer.toTerm(TermBuffer.java:113)
[junit4:junit4]   at org.apache.lucene.codecs.lucene3x.SegmentTermEnum.term(SegmentTermEnum.java:184)
[junit4:junit4]   at org.apache.lucene.codecs.lucene3x.Lucene3xFields$PreTermsEnum.next(Lucene3xFields.java:863)
[junit4:junit4]   at org.apache.lucene.index.MultiTermsEnum.pushTop(MultiTermsEnum.java:292)
[junit4:junit4]   at org.apache.lucene.index.MultiTermsEnum.next(MultiTermsEnum.java:318)
[junit4:junit4]   at org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:103)
[junit4:junit4]   at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
[junit4:junit4]   at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:365)
[junit4:junit4]   at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:98)
[junit4:junit4]
Re: Reestablishing a Solr node that ran on a completely crashed machine
On Jun 19, 2013, at 2:20 AM, Per Steffensen st...@designware.dk wrote: On 6/18/13 2:15 PM, Mark Miller wrote: I don't know what the best method to use now is, but the slightly longer term plan is to: * Have a new mode where you cannot preconfigure cores, only use the collection's API. * ZK becomes the cluster state truth. * The Overseer takes actions to ensure cores live/die in different places based on the truth in ZK. Not that we have to decide on this now, but I guess in my scenario I do not see why the Overseer should be involved. The replica is already assigned to run on the replaced machine with a specific IP/hostname (actually a specific Solr node-name), so I guess that the Solr node itself on this new/replaced machine should just go look in ZK when it starts up, realize that it ought to run this and that replica, and start loading them itself. I recognize that the Overseer should/could be involved in relocating replicas for different reasons - load balancing, rack awareness etc. But in cases where a replica is already assigned to a certain node-name according to ZK state, yet the node is not preconfigured (in solr.xml) to run this replica, the node itself should just realize that it ought to run it anyway and load it. But it probably has to be thought through well. Just my immediate thoughts.

Specific node names have since been essentially deprecated - auto-assigned generic node names are what we have transitioned to. You should easily be able to host a shard with a machine that has a different address without confusion. By and large, the Overseer will be able to assume responsibility for assignments (though I'm sure how much it will do will be configurable) at a high level. It will be able to do things like look at maxShardsPerNode and replicationFactor and periodically follow rules to make adjustments. The Overseer being in charge is more a conceptual idea though, not the implementation. When a core starts up and checks with ZK and sees the collection it belongs to no longer exists or something, it's likely to just not load rather than wait for an Overseer to spot it and remove it later. - Mark
[jira] [Updated] (SOLR-4921) Support for Adding Documents via the Solr UI
[ https://issues.apache.org/jira/browse/SOLR-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated SOLR-4921: -- Attachment: SOLR-4921.patch Patch has the following improvements:
# Better Layout
# Result Reporting, including errors
# Various other little fixes
You should be able to submit a variety of document types at this point and see the response. Left to do:
# Icon for Collection drop down
# Wizard implementation
# General cleanup, comments
# File Upload
# Other things I've forgotten
Support for Adding Documents via the Solr UI Key: SOLR-4921 URL: https://issues.apache.org/jira/browse/SOLR-4921 Project: Solr Issue Type: New Feature Components: web gui Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 4.4 Attachments: SOLR-4921.patch, SOLR-4921.patch, SOLR-4921.patch, SOLR-4921.patch, SOLR-4921.patch, SOLR-4921.patch, SOLR-4921.patch For demos and prototyping, it would be nice if we could add documents via the admin UI. Various things to support: 1. Uploading XML, JSON, CSV, etc. 2. Optionally also do file upload
Re: [lucene 4.3.1] solr webapp is put to null directory on maven build
After adding:
<build-directory>target</build-directory>
the war file is put into the target subdir. On a side note: running Solr with the maven jetty plugin seems to work, which required two artifacts (couldn't figure out where jetty stores the lib dir in this mode). Command: mvn jetty:run-war (configured in the jetty-maven-plugin):
<dependencies>
  <dependency>
    <groupId>ch.qos.logback</groupId>
    <artifactId>logback-classic</artifactId>
    <version>1.0.13</version>
  </dependency>
  <dependency>
    <groupId>tomcat</groupId>
    <artifactId>commons-logging</artifactId>
    <version>4.0.6</version>
  </dependency>
</dependencies>
When starting the webapp, however, Solr tries to create a collection1: 17:02:53.108 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.CoreContainer - Creating SolrCore 'collection1' using instanceDir: ${top-level}/solr/example/solr/collection1 Apparently, the ${top-level} var isn't defined either. On 19 June 2013 16:25, Dmitry Kan dmitry.luc...@gmail.com wrote: also: ${build-directory} is not set anywhere in the project. On 19 June 2013 16:23, Dmitry Kan dmitry.luc...@gmail.com wrote: Hello, executing 'package' on Apache Solr Search Server pom (maven-build/solr/webapp/pom.xml) puts the webapp into a null sub-directory. Apache Maven 3.0.4 OS: Ubuntu 12.04 LTS Thanks, Dmitry Kan
Re: List your chair on https://lucene.apache.org/whoweare.html?
On Wed, Jun 19, 2013 at 8:56 AM, Benson Margulies bimargul...@gmail.com wrote: A small suggestion: identify the VP on the list of PMC and committers. Why? To the outside, this might suggest some sort of specialness that doesn't exist for day to day development activities. If someone has business with the PMC, they should email the PMC, not individuals. -Yonik http://lucidworks.com
Re: IndexWriter commit user data takes a map
Hi Varun, LUCENE-4575 did not change IW's user data to a Map. That was done in LUCENE-1654. Steve On Jun 19, 2013, at 6:57 AM, Varun Thacker varunthacker1...@gmail.com wrote: I was just curious as to why IW.setCommitData uses a Map? Looking back at LUCENE-1382, when committing user data was introduced it took a String. In LUCENE-4575 it was refactored and changed to a Map. From the comments I couldn't really figure out why it was changed. -- Regards, Varun Thacker http://www.vthacker.in/
Re: [lucene 4.3.1] solr webapp is put to null directory on maven build
Thanks for reporting, Dmitry, I'll take a look. - Steve On Jun 19, 2013, at 10:06 AM, Dmitry Kan dmitry.luc...@gmail.com wrote: [...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: List your chair on https://lucene.apache.org/whoweare.html?
+1 on not specially marking it. If you really wanna know you can figure it out via the asf website. I agree with yonik that the PMC should be contacted! simon On Wed, Jun 19, 2013 at 4:13 PM, Yonik Seeley yo...@lucidworks.com wrote: On Wed, Jun 19, 2013 at 8:56 AM, Benson Margulies bimargul...@gmail.com wrote: A small suggestion: identify the VP on the list of PMC and committers. Why? To the outside, this might suggest some sort of specialness that doesn't exist for day to day development activities. If someone has business with the PMC, they should email the PMC, not individuals. -Yonik http://lucidworks.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: List your chair on https://lucene.apache.org/whoweare.html?
On Jun 19, 2013, at 11:01 AM, Simon Willnauer simon.willna...@gmail.com wrote: +1 on not specially marking it. +1 - I like the way we currently handle this. - Mark - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4921) Support for Adding Documents via the Solr UI
[ https://issues.apache.org/jira/browse/SOLR-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated SOLR-4921: -- Attachment: SOLR-4921.patch Here's a start on file upload. It kind of works right now if you hit the submit button twice (after changing the QT option to /update/extract). There seem to be some oddities with the variable bindings for creating the document_url based off of the handler path. Support for Adding Documents via the Solr UI Key: SOLR-4921 URL: https://issues.apache.org/jira/browse/SOLR-4921 Project: Solr Issue Type: New Feature Components: web gui Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 4.4 Attachments: SOLR-4921.patch, SOLR-4921.patch, SOLR-4921.patch, SOLR-4921.patch, SOLR-4921.patch, SOLR-4921.patch, SOLR-4921.patch, SOLR-4921.patch For demos and prototyping, it would be nice if we could add documents via the admin UI. Various things to support: 1. Uploading XML, JSON, CSV, etc. 2. Optionally also do file upload -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
solrj content-length header missing
We are trying to use Nginx to do load balancing and it does not like that the content-length header is missing on a POST with an <add>...</add> document. I looked in the code and did not find anything about setting the header. (http://svn.apache.org/viewvc/lucene/dev/trunk/solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrServer.java?view=markup). Are there plans to add the content-length header in future versions? Joe This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain information that is confidential and protected by law from unauthorized disclosure. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message.
[jira] [Commented] (SOLR-4916) Add support to write and read Solr index files and transaction log files to and from HDFS.
[ https://issues.apache.org/jira/browse/SOLR-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688094#comment-13688094 ] Mark Miller commented on SOLR-4916: --- It doesn't greatly affect other parts of Solr, it's not some big experimental change, so I intend to first commit to 5x and see how jenkins likes things and then backport to 4.x. A lot of the core changes for this have slowly gone into 4.x long ago - including issues around making custom Directories first class in Solr and other little changes. This builds to run against Apache Hadoop. I don't suspect that will be easily 'pluggable', but it will be easy enough to change the ivy files to point to another Hadoop distro, fix any compile time errors (if there are any), run the tests, and build Solr. Because our dependency is on client code that talks to hdfs, I suspect that it will work fine as is with most distros based on the same version of Apache Hadoop - and probably other versions as well in many cases. Add support to write and read Solr index files and transaction log files to and from HDFS. -- Key: SOLR-4916 URL: https://issues.apache.org/jira/browse/SOLR-4916 Project: Solr Issue Type: New Feature Reporter: Mark Miller Assignee: Mark Miller Attachments: SOLR-4916.patch, SOLR-4916.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4934) Prevent runtime failure if users use initargs useCompoundFile setting on LogMergePolicy or TieredMergePolicy
[ https://issues.apache.org/jira/browse/SOLR-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688104#comment-13688104 ] Shawn Heisey commented on SOLR-4934: I was getting ready to file an issue, glad I found this before doing so. The only thing I knew was that LUCENE-5038 had caused Solr to make compound files and the useCompoundFile setting under indexConfig that I found in the branch_4x example wasn't turning it off. A connected discussion, for which I can file an issue if necessary: Assuming there are plenty of file descriptors available, will a user get better performance from compound files or separate files? Is it dependent on other factors like filesystem choice, or is one a clear winner? The outcome of that discussion should decide what Solr's default is when no related config options are used. Prevent runtime failure if users use initargs useCompoundFile setting on LogMergePolicy or TieredMergePolicy -- Key: SOLR-4934 URL: https://issues.apache.org/jira/browse/SOLR-4934 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Fix For: 5.0, 4.4 * LUCENE-5038 eliminated setUseCompoundFile(boolean) from the built in MergePolicies * existing users may have configs that use mergePolicy init args to try and call that setter * we already do some explicit checks for these MergePolices in SolrIndexConfig to deal with legacy syntax * update the existing logic to remove useCompoundFile from the MergePolicy initArgs for these known policies if found, and log a warning. (NOTE: i don't want to arbitrarily remove useCompoundFile from the initArgs regardless of class in case someone has a custom MergePolicy that implements that logic -- that would suck) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k
[ https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688113#comment-13688113 ] David Smiley commented on LUCENE-4583: -- Does the closed status and resolution change to "not a problem" mean that [~mikemccand]'s improvements in his patch here (that don't change the limit) won't get applied? They looked good to me. And you? StraightBytesDocValuesField fails if bytes > 32k Key: LUCENE-4583 URL: https://issues.apache.org/jira/browse/LUCENE-4583 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.0, 4.1, 5.0 Reporter: David Smiley Priority: Critical Fix For: 4.4 Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch I didn't observe any limitations on the size of a bytes based DocValues field value in the docs. It appears that the limit is 32k, although I didn't get any friendly error telling me that was the limit. 32k is kind of small IMO; I suspect this limit is unintended and as such is a bug. The following test fails:

{code:java}
public void testBigDocValue() throws IOException {
  Directory dir = newDirectory();
  IndexWriter writer = new IndexWriter(dir, writerConfig(false));
  Document doc = new Document();
  BytesRef bytes = new BytesRef((4+4)*4097); //4096 works
  bytes.length = bytes.bytes.length; //byte data doesn't matter
  doc.add(new StraightBytesDocValuesField("dvField", bytes));
  writer.addDocument(doc);
  writer.commit();
  writer.close();
  DirectoryReader reader = DirectoryReader.open(dir);
  DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
  //FAILS IF BYTES IS BIG!
  docValues.getSource().getBytes(0, bytes);
  reader.close();
  dir.close();
}
{code}

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5067) add a BaseDirectoryTestCase
[ https://issues.apache.org/jira/browse/LUCENE-5067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688126#comment-13688126 ] Michael McCandless commented on LUCENE-5067: +1 add a BaseDirectoryTestCase --- Key: LUCENE-5067 URL: https://issues.apache.org/jira/browse/LUCENE-5067 Project: Lucene - Core Issue Type: Test Reporter: Robert Muir Currently most directory code is tested indirectly. But there are still corner cases like LUCENE-5066, NRTCachingDirectory.testNoDir, TestRAMDirectory.testSeekToEOFThenBack, that only target specific directories where some user reported the bug. If one of our other directories has these bugs, the best we can hope for is that some other lucene test will trip it indirectly and we will find it after lots of debugging... Instead we should herd up all these tests into a base class and test every directory explicitly and directly with it (like we do with the codec API). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: solrj content-length header missing
Hi,

POST (or any other request that sends data to an HTTP endpoint) always needs the length of the body, but there are two options:
- If you know the length, you *may* set it up front (this was required in HTTP/1.0).
- HTTP/1.1 added chunked transfer encoding, so the POST data is sent as smaller chunks, each with its own length header. This is the preferred way to send content when the size is not known in advance (which is the case for data sent by the Solr client library, unless it buffers the body completely; complete buffering would have a negative impact on response times and memory requirements).

Depending on the size of the POST data, HttpSolrServer decides internally whether it can set content-length (if the body is smaller than the buffer size and chunking is not needed) or not. This is handled by the underlying HttpClient library (http://hc.apache.org/).

What is the problem / error message of nginx?

Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de From: Payne, Joe [mailto:joe.pa...@kroger.com] Sent: Wednesday, June 19, 2013 5:53 PM To: dev@lucene.apache.org Subject: solrj content-length header missing [...]
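To make the two modes concrete, here is a minimal Java sketch using the JDK's HttpURLConnection rather than SolrJ/HttpClient, purely for illustration; the URL and payload are made up:

{code:java}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PostModes {
  public static void main(String[] args) throws Exception {
    byte[] body = "<add><doc><field name=\"id\">1</field></doc></add>"
        .getBytes(StandardCharsets.UTF_8);

    HttpURLConnection conn =
        (HttpURLConnection) new URL("http://localhost:8983/solr/update").openConnection();
    conn.setDoOutput(true);
    conn.setRequestMethod("POST");
    conn.setRequestProperty("Content-Type", "text/xml");

    // Option 1: length known up front -> a Content-Length header is sent.
    conn.setFixedLengthStreamingMode(body.length);
    // Option 2 (instead): stream without knowing the total size ->
    // "Transfer-Encoding: chunked", no Content-Length header.
    // conn.setChunkedStreamingMode(4096);

    try (OutputStream out = conn.getOutputStream()) {
      out.write(body);
    }
    System.out.println("HTTP " + conn.getResponseCode());
  }
}
{code}

A proxy that only understands HTTP/1.0 semantics will accept the first variant but reject the second with 411, which is exactly the Nginx behavior described below.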
[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
[ https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688146#comment-13688146 ] Yonik Seeley commented on SOLR-4926: I hacked the lucene IWC and MergePolicy classes to never use compound format, and then started ChaosMonkeySafeLeaderTest tests in a loop. 11 passes in a row so far, so it definitely looks like these failures are related to the compound file format. I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk. - Key: SOLR-4926 URL: https://issues.apache.org/jira/browse/SOLR-4926 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Priority: Blocker Fix For: 5.0, 4.4 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
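(For reference, the same "never use compound format" setup should be reachable without patching Lucene, via the public config knobs on trunk/4x after LUCENE-5038; treat the exact setter names below as my recollection of that API, not gospel.)

{code:java}
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.util.Version;

public class NoCfsConfig {
  public static IndexWriterConfig newConfig() {
    IndexWriterConfig iwc =
        new IndexWriterConfig(Version.LUCENE_44, new WhitespaceAnalyzer(Version.LUCENE_44));
    iwc.setUseCompoundFile(false);  // newly flushed segments: no .cfs
    TieredMergePolicy mp = new TieredMergePolicy();
    mp.setNoCFSRatio(0.0);          // merged segments: never packed into .cfs
    iwc.setMergePolicy(mp);
    return iwc;
  }
}
{code}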
[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
[ https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688150#comment-13688150 ] Uwe Schindler commented on SOLR-4926: - How does this test depend on CFS or not? I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk. - Key: SOLR-4926 URL: https://issues.apache.org/jira/browse/SOLR-4926 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Priority: Blocker Fix For: 5.0, 4.4 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
[ https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688150#comment-13688150 ] Uwe Schindler edited comment on SOLR-4926 at 6/19/13 4:53 PM: -- How does this test depend on CFS or not? So it looks like replication does not work correctly with CFS, which is a serious bug! was (Author: thetaphi): How does this test depend on CFS or not? I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk. - Key: SOLR-4926 URL: https://issues.apache.org/jira/browse/SOLR-4926 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Priority: Blocker Fix For: 5.0, 4.4 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
[ https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688151#comment-13688151 ] Yonik Seeley commented on SOLR-4926: bq. How does this test depend on CFS or not? That's the million dollar question :-) It does not, explicitly, but it seems like the use of CFS somehow causes replication to fail. I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk. - Key: SOLR-4926 URL: https://issues.apache.org/jira/browse/SOLR-4926 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Priority: Blocker Fix For: 5.0, 4.4 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: solrj content-length header missing
This is happening on version 1.2.7 of Nginx. Newer versions do not produce this error, but getting that updated is another battle. The error message it returns is 411: Length Required. From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: Wednesday, June 19, 2013 12:29 PM To: dev@lucene.apache.org Subject: RE: solrj content-length header missing [...] This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain information that is confidential and protected by law from unauthorized disclosure. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message.
[jira] [Comment Edited] (SOLR-4916) Add support to write and read Solr index files and transaction log files to and from HDFS.
[ https://issues.apache.org/jira/browse/SOLR-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688094#comment-13688094 ] Mark Miller edited comment on SOLR-4916 at 6/19/13 4:59 PM: It doesn't greatly affect other parts of Solr, it's not some big experimental change, so I intend to first commit to 5x and see how jenkins likes things and then backport to 4.x. A lot of the core changes for this have slowly gone into 4.x long ago - including issues around making custom Directories first class in Solr and other little changes. This builds to run against Apache Hadoop 2.0.5-alpha. I don't suspect that will be easily 'pluggable', but it will be easy enough to change the ivy files to point to another Hadoop distro, fix any compile time errors (if there are any), run the tests, and build Solr. Because our dependency is on client code that talks to hdfs, I suspect that it will work fine as is with most distros based on the same version of Apache Hadoop - and probably other versions as well in many cases. was (Author: markrmil...@gmail.com): It doesn't greatly affect other parts of Solr, it's not some big experimental change, so I intend to first commit to 5x and see how jenkins likes things and then backport to 4.x. A lot of the core changes for this have slowly gone into 4.x long ago - including issues around making custom Directories first class in Solr and other little changes. This builds to run against Apache Hadoop. I don't suspect that will be easily 'pluggable', but it will be easy enough to change the ivy files to point to another Hadoop distro, fix any compile time errors (if there are any), run the tests, and build Solr. Because our dependency is on client code that talks to hdfs, I suspect that it will work fine as is with most distros based on the same version of Apache Hadoop - and probably other versions as well in many cases. Add support to write and read Solr index files and transaction log files to and from HDFS. -- Key: SOLR-4916 URL: https://issues.apache.org/jira/browse/SOLR-4916 Project: Solr Issue Type: New Feature Reporter: Mark Miller Assignee: Mark Miller Attachments: SOLR-4916.patch, SOLR-4916.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5006) Simplify / understand IndexWriter/DocumentsWriter synchronization
[ https://issues.apache.org/jira/browse/LUCENE-5006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688159#comment-13688159 ] Michael McCandless commented on LUCENE-5006: +1, thanks Simon! Simplify / understand IndexWriter/DocumentsWriter synchronization - Key: LUCENE-5006 URL: https://issues.apache.org/jira/browse/LUCENE-5006 Project: Lucene - Core Issue Type: Bug Reporter: Michael McCandless Assignee: Simon Willnauer Attachments: LUCENE-5006.patch, LUCENE-5006.patch The concurrency in IW/DW/BD is terrifying: there are many locks involved, not just intrinsic locks but IW also has fullFlushLock, commitLock, and there are no clear rules about lock order to avoid deadlocks like LUCENE-5002. We have to somehow simplify this, and define the allowed concurrent behavior eg when an app calls deleteAll while other threads are indexing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k
[ https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688162#comment-13688162 ] Michael McCandless commented on LUCENE-4583: I still think we should fix the limitation in core; this way apps that want to store large binary fields per-doc are able to use a custom DVFormat. StraightBytesDocValuesField fails if bytes > 32k Key: LUCENE-4583 URL: https://issues.apache.org/jira/browse/LUCENE-4583 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.0, 4.1, 5.0 Reporter: David Smiley Priority: Critical Fix For: 4.4 Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch I didn't observe any limitations on the size of a bytes based DocValues field value in the docs. It appears that the limit is 32k, although I didn't get any friendly error telling me that was the limit. 32k is kind of small IMO; I suspect this limit is unintended and as such is a bug. The following test fails:

{code:java}
public void testBigDocValue() throws IOException {
  Directory dir = newDirectory();
  IndexWriter writer = new IndexWriter(dir, writerConfig(false));
  Document doc = new Document();
  BytesRef bytes = new BytesRef((4+4)*4097); //4096 works
  bytes.length = bytes.bytes.length; //byte data doesn't matter
  doc.add(new StraightBytesDocValuesField("dvField", bytes));
  writer.addDocument(doc);
  writer.commit();
  writer.close();
  DirectoryReader reader = DirectoryReader.open(dir);
  DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
  //FAILS IF BYTES IS BIG!
  docValues.getSource().getBytes(0, bytes);
  reader.close();
  dir.close();
}
{code}

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
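As a concrete shape for that, a minimal sketch of the per-field hook such an app would use; "MyLargeBinaryDVFormat" is hypothetical and stands in for whatever custom DocValuesFormat the app registers via SPI, so only the override pattern is the point here:

{code:java}
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.codecs.lucene42.Lucene42Codec;

public class LargeBinaryCodec extends Lucene42Codec {
  @Override
  public DocValuesFormat getDocValuesFormatForField(String field) {
    if ("dvField".equals(field)) {
      // Hypothetical SPI name; the custom format would lift the 32k cap.
      return DocValuesFormat.forName("MyLargeBinaryDVFormat");
    }
    return super.getDocValuesFormatForField(field);
  }
}
// usage: indexWriterConfig.setCodec(new LargeBinaryCodec());
{code}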
RE: solrj content-length header missing
See: http://www.lamnk.com/blog/computer/fix-nginx-411-length-required-error/ - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de From: Payne, Joe [mailto:joe.pa...@kroger.com] Sent: Wednesday, June 19, 2013 6:59 PM To: dev@lucene.apache.org Subject: RE: solrj content-length header missing [...]
[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k
[ https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688163#comment-13688163 ] Yonik Seeley commented on LUCENE-4583: -- bq. I still think we should fix the limitation in core; this way apps that want to store large binary fields per-doc are able to use a custom DVFormat. +1, arbitrary limits are not a feature. StraightBytesDocValuesField fails if bytes > 32k Key: LUCENE-4583 URL: https://issues.apache.org/jira/browse/LUCENE-4583 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.0, 4.1, 5.0 Reporter: David Smiley Priority: Critical Fix For: 4.4 Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch I didn't observe any limitations on the size of a bytes based DocValues field value in the docs. It appears that the limit is 32k, although I didn't get any friendly error telling me that was the limit. 32k is kind of small IMO; I suspect this limit is unintended and as such is a bug. The following test fails:

{code:java}
public void testBigDocValue() throws IOException {
  Directory dir = newDirectory();
  IndexWriter writer = new IndexWriter(dir, writerConfig(false));
  Document doc = new Document();
  BytesRef bytes = new BytesRef((4+4)*4097); //4096 works
  bytes.length = bytes.bytes.length; //byte data doesn't matter
  doc.add(new StraightBytesDocValuesField("dvField", bytes));
  writer.addDocument(doc);
  writer.commit();
  writer.close();
  DirectoryReader reader = DirectoryReader.open(dir);
  DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
  //FAILS IF BYTES IS BIG!
  docValues.getSource().getBytes(0, bytes);
  reader.close();
  dir.close();
}
{code}

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: solrj content-length header missing
Thank you. I will try that. From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: Wednesday, June 19, 2013 1:07 PM To: dev@lucene.apache.org Subject: RE: solrj content-length header missing [...]
This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain information that is confidential and protected by law from unauthorized disclosure. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message.
RE: solrj content-length header missing
Reading further, see the following statement from http://wiki.nginx.org/NginxHttpChunkinModule: "Status: This module is no longer needed for Nginx 1.3.9+ because since 1.3.9, the Nginx core already has built-in support for the chunked request bodies. And this module is now only maintained for Nginx versions older than 1.3.9." So you could install this module to make it work. The bug is on the Nginx side: the older versions do not support chunked encoding, which is *required* by the HTTP/1.1 spec! So it is a clear usability failure there. Solr does not know the body length without buffering, so it cannot send the length (see my earlier mails). - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: Wednesday, June 19, 2013 7:07 PM To: dev@lucene.apache.org Subject: RE: solrj content-length header missing [...]
[jira] [Commented] (LUCENE-5066) TestFieldsReader fails in 4.x with OOM
[ https://issues.apache.org/jira/browse/LUCENE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688168#comment-13688168 ] Michael McCandless commented on LUCENE-5066: +1 patch looks good Maybe we should pull out a public static final MAX_TERM_LENGTH_BYTES in IndexWriter? And DWPT references that, and this added assert in TermBuffer.java uses it too? Shai needed to use it recently as well... TestFieldsReader fails in 4.x with OOM -- Key: LUCENE-5066 URL: https://issues.apache.org/jira/browse/LUCENE-5066 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5066.patch Its FaultyIndexInput is broken (doesn't implement seek/clone correctly). This causes it to read bogus data and try to allocate an enormous byte[] for a term. The bug was previously hidden: FaultyDirectory doesn't override openSlice, so CFS must not be used at flush if you want to trigger the bug. FaultyIndexInput's clone is broken: it uses new but doesn't seek the clone to the right place. This causes a disaster with BufferedIndexInput (which it extends), because BufferedIndexInput (not just the delegate) must know its position since it has seek-within-block etc code... It seems with this test (very simple one), that only the 3.x codec triggers it because its term dict relies upon clone()'s being seek'd to the right place. I'm not sure what other codecs rely upon this, but imo we should also add a low-level test for directories that does something like this to ensure it's really tested:

{code}
dir.createOutput(x);
dir.openInput(x);
input.seek(somewhere);
clone = input.clone();
assertEquals(somewhere, clone.getFilePointer());
{code}

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
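A rough sketch of what that suggestion might look like; the constant name comes from the comment above, the value 32766 is DWPT's current BYTE_BLOCK_SIZE - 2 limit, and none of this is committed code:

{code:java}
public class IndexWriterSketch {
  /** Proposed: maximum length, in UTF-8 bytes, of an indexed term. */
  public static final int MAX_TERM_LENGTH_BYTES = 32766; // i.e. BYTE_BLOCK_SIZE - 2

  /** TermBuffer's new assert would then reference the shared constant. */
  static void checkTermLength(int totalLength) {
    assert totalLength <= MAX_TERM_LENGTH_BYTES : "term too long: " + totalLength;
  }
}
{code}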
[jira] [Comment Edited] (SOLR-4934) Prevent runtime failure if users use initargs useCompoundFile setting on LogMergePolicy or TieredMergePolicy
[ https://issues.apache.org/jira/browse/SOLR-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688179#comment-13688179 ] Hoss Man edited comment on SOLR-4934 at 6/19/13 5:25 PM: - bq. The only thing I knew was that LUCENE-5038 had caused Solr to make compound files and the useCompoundFile setting under indexConfig that I found in the branch_4x example wasn't turning it off. Oh ... hmmm, yeah ... i hadn't noticed that. definitely a bug there. I've opened SOLR-4941 to track that, and we'll leave this issue specifically about the broken initargs config option. *EDIT:* fixed issue number was (Author: hossman): bq. The only thing I knew was that LUCENE-5038 had caused Solr to make compound files and the useCompoundFile setting under indexConfig that I found in the branch_4x example wasn't turning it off. Oh ... hmmm, yeah ... i hadn't noticed that. definitely a bug there. I've opened SOLR-4926 to track that, and we'll leave this issue specifically about the broken initargs config option. Prevent runtime failure if users use initargs useCompoundFile setting on LogMergePolicy or TieredMergePolicy -- Key: SOLR-4934 URL: https://issues.apache.org/jira/browse/SOLR-4934 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Fix For: 5.0, 4.4 * LUCENE-5038 eliminated setUseCompoundFile(boolean) from the built in MergePolicies * existing users may have configs that use mergePolicy init args to try and call that setter * we already do some explicit checks for these MergePolices in SolrIndexConfig to deal with legacy syntax * update the existing logic to remove useCompoundFile from the MergePolicy initArgs for these known policies if found, and log a warning. (NOTE: i don't want to arbitrarily remove useCompoundFile from the initArgs regardless of class in case someone has a custom MergePolicy that implements that logic -- that would suck) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-4941) useCompoundFile default has changed, simple config option no longer seems to work
[ https://issues.apache.org/jira/browse/SOLR-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man reassigned SOLR-4941: -- Assignee: Hoss Man useCompoundFile default has changed, simple config option no longer seems to work - Key: SOLR-4941 URL: https://issues.apache.org/jira/browse/SOLR-4941 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Spin off of SOLR-4934. We should update tests to ensure that the various ways of specifying useCompoundFile, as well as the expected default, are working properly after LUCENE-5038 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4934) Prevent runtime failure if users use initargs useCompoundFile setting on LogMergePolicy or TieredMergePolicy
[ https://issues.apache.org/jira/browse/SOLR-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688179#comment-13688179 ] Hoss Man commented on SOLR-4934: bq. The only thing I knew was that LUCENE-5038 had caused Solr to make compound files and the useCompoundFile setting under indexConfig that I found in the branch_4x example wasn't turning it off. Oh ... hmmm, yeah ... i hadn't noticed that. definitely a bug there. I've opened SOLR-4926 to track that, and we'll leave this issue specifically about the broken initargs config option. Prevent runtime failure if users use initargs useCompoundFile setting on LogMergePolicy or TieredMergePolicy -- Key: SOLR-4934 URL: https://issues.apache.org/jira/browse/SOLR-4934 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Fix For: 5.0, 4.4 * LUCENE-5038 eliminated setUseCompoundFile(boolean) from the built in MergePolicies * existing users may have configs that use mergePolicy init args to try and call that setter * we already do some explicit checks for these MergePolices in SolrIndexConfig to deal with legacy syntax * update the existing logic to remove useCompoundFile from the MergePolicy initArgs for these known policies if found, and log a warning. (NOTE: i don't want to arbitrarily remove useCompoundFile from the initArgs regardless of class in case someone has a custom MergePolicy that implements that logic -- that would suck) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-4941) useCompoundFile default has changed, simple config option no longer seems to work
Hoss Man created SOLR-4941: -- Summary: useCompoundFile default has changed, simple config option no longer seems to work Key: SOLR-4941 URL: https://issues.apache.org/jira/browse/SOLR-4941 Project: Solr Issue Type: Bug Reporter: Hoss Man Spin off of SOLR-4934. We should update tests to ensure that the various ways of specifying useCompoundFile, as well as the expected default, are working properly after LUCENE-5038 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-4934) Prevent runtime failure if users use initargs useCompoundFile setting on LogMergePolicy or TieredMergePolicy
[ https://issues.apache.org/jira/browse/SOLR-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-4934. Resolution: Fixed merged r1494348 - 4x as r1494696 Prevent runtime failure if users use initargs useCompoundFile setting on LogMergePolicy or TieredMergePolicy -- Key: SOLR-4934 URL: https://issues.apache.org/jira/browse/SOLR-4934 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Fix For: 5.0, 4.4 * LUCENE-5038 eliminated setUseCompoundFile(boolean) from the built in MergePolicies * existing users may have configs that use mergePolicy init args to try and call that setter * we already do some explicit checks for these MergePolices in SolrIndexConfig to deal with legacy syntax * update the existing logic to remove useCompoundFile from the MergePolicy initArgs for these known policies if found, and log a warning. (NOTE: i don't want to arbitrarily remove useCompoundFile from the initArgs regardless of class in case someone has a custom MergePolicy that implements that logic -- that would suck) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5030) FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters
[ https://issues.apache.org/jira/browse/LUCENE-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688185#comment-13688185 ] Michael McCandless commented on LUCENE-5030: The easy performance tester to run is lucene/suggest/src/test/org/apache/lucene/search/suggest/LookupBenchmarkTest.java ... we should test that first, I think? I can also run one based on FreeDB ... the sources are in luceneutil (https://code.google.com/a/apache-extras.org/p/luceneutil/). If the perf hit is too much then one option would be to make it optional (whether we count edits in Unicode space or UTF-8 space), or maybe just another suggester class (FuzzyUnicodeSuggester?). I think we can use INFO_SEP: yes, this is used for PAYLOAD_SEP, but that only means the incoming surfaceForm cannot contain this char, I think? So ... I think we are free to use it in the analyzed form? Or did something go wrong when you tried? Whichever chars we use (steal), we should add checks that these chars do not occur in the input... FuzzySuggester has to operate FSTs of Unicode-letters, not UTF-8, to work correctly for 1-byte (like English) and multi-byte (non-Latin) letters Key: LUCENE-5030 URL: https://issues.apache.org/jira/browse/LUCENE-5030 Project: Lucene - Core Issue Type: Bug Affects Versions: 4.3 Reporter: Artem Lukanin Attachments: nonlatin_fuzzySuggester1.patch, nonlatin_fuzzySuggester2.patch, nonlatin_fuzzySuggester3.patch, nonlatin_fuzzySuggester4.patch, nonlatin_fuzzySuggester.patch, nonlatin_fuzzySuggester.patch There is a limitation in the current FuzzySuggester implementation: it computes edits in UTF-8 space instead of Unicode character (code point) space. This should be fixable: we'd need to fix TokenStreamToAutomaton to work in Unicode character space, then fix FuzzySuggester to do the same steps that FuzzyQuery does: do the LevN expansion in Unicode character space, then convert that automaton to UTF-8, then intersect with the suggest FST. See the discussion here: http://lucene.472066.n3.nabble.com/minFuzzyLength-in-FuzzySuggester-behaves-differently-for-English-and-Russian-td4067018.html#none -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
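The FuzzyQuery-style pipeline the description refers to, as a minimal standalone sketch; the example word and edit distance are arbitrary, and the intersection with the suggest FST is elided:

{code:java}
import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.LevenshteinAutomata;
import org.apache.lucene.util.automaton.UTF32ToUTF8;

public class UnicodeLevDemo {
  public static void main(String[] args) {
    // 1. Build the LevN automaton over Unicode code points, as FuzzyQuery does...
    Automaton lev = new LevenshteinAutomata("москва", true).toAutomaton(1);
    // 2. ...then convert it to UTF-8 byte space, so one multi-byte character
    // counts as one edit rather than several byte-level edits...
    Automaton utf8 = new UTF32ToUTF8().convert(lev);
    // 3. ...and only then intersect with the (UTF-8 based) suggest FST.
    System.out.println("states: " + utf8.getNumberOfStates());
  }
}
{code}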
[jira] [Commented] (SOLR-4934) Prevent runtime failure if users use initargs useCompoundFile setting on LogMergePolicy or TieredMergePolicy
[ https://issues.apache.org/jira/browse/SOLR-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688190#comment-13688190 ] Uwe Schindler commented on SOLR-4934: - bq. Assuming there are plenty of file descriptors available, will a user get better performance from compound files or separate files? Searching on the index will have no negative impact: IndexInputSlicer returns optimized IndexInputs that remove the whole file-offset handling. Indexing speed is identical, too, but merging (done in the background) is more expensive. Prevent runtime failure if users use initargs useCompoundFile setting on LogMergePolicy or TieredMergePolicy -- Key: SOLR-4934 URL: https://issues.apache.org/jira/browse/SOLR-4934 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Fix For: 5.0, 4.4 * LUCENE-5038 eliminated setUseCompoundFile(boolean) from the built in MergePolicies * existing users may have configs that use mergePolicy init args to try and call that setter * we already do some explicit checks for these MergePolices in SolrIndexConfig to deal with legacy syntax * update the existing logic to remove useCompoundFile from the MergePolicy initArgs for these known policies if found, and log a warning. (NOTE: i don't want to arbitrarily remove useCompoundFile from the initArgs regardless of class in case someone has a custom MergePolicy that implements that logic -- that would suck) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
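To illustrate what that slicing looks like at the Directory level, a small sketch against the 4.x store API; the file name, offset and length are made up, and the exact method shapes are from memory, so treat them as assumptions:

{code:java}
import java.io.File;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;

public class SliceDemo {
  public static void main(String[] args) throws Exception {
    Directory dir = FSDirectory.open(new File("/path/to/index")); // made-up path
    // The slicer hands out IndexInputs bound to a sub-range of the .cfs file,
    // so reads inside a compound-file entry need no per-call offset arithmetic.
    Directory.IndexInputSlicer slicer = dir.createSlicer("_0.cfs", IOContext.READ);
    IndexInput in = slicer.openSlice("some entry", 0, 128);
    try {
      System.out.println("slice length: " + in.length());
    } finally {
      in.close();
      slicer.close();
      dir.close();
    }
  }
}
{code}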
[jira] [Commented] (SOLR-4939) Not able to import oracle DB on RedHat
[ https://issues.apache.org/jira/browse/SOLR-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688201#comment-13688201 ] Subhash Karemore commented on SOLR-4939: Hi, I think you are right. I am not too familiar with the Linux environment. Could you please tell me the exact command for allowing the TCP connection, so that I am able to connect to the remote Oracle DB using Java? I searched a lot for this problem, but I didn't find the exact command/solution. I appreciate your help. Regards, Subhash Not able to import oracle DB on RedHat -- Key: SOLR-4939 URL: https://issues.apache.org/jira/browse/SOLR-4939 Project: Solr Issue Type: Bug Affects Versions: 4.3.1 Environment: Redhat Linux Reporter: Subhash Karemore I have configured my RedHat system for Solr. After that I started Solr, and it started properly. I have to import the Oracle DB for indexing. My data config file is:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@//hostname:2126/DBNAme" user="user" password="Passwd" batchSize="1" />
  <document>
    <entity name="table1" query="SELECT ID, col2, col3 FROM table1 WHERE rownum BETWEEN 1 AND 1000">
      <field column="ID" name="id" />
      <field column="col2" name="col2" />
      <field column="col3" name="col3" />
    </entity>
  </document>
</dataConfig>

I have made similar changes to the schema.xml file. I have copied solr-dataimporthandler-4.3.0.jar, solr-dataimporthandler-extras-4.3.0.jar, and solr-solrj-4.3.0.jar from the dist folder to the ../lib folder. I have also downloaded ojdbc6.jar and put it in the same folder. With this setting, it works properly on Windows. However, on RedHat it is not working; it gives me errors when I try to index the DB. Below are the errors I got on the console:

ERROR org.apache.solr.handler.dataimport.DocBuilder - Exception while processing: table1 document : SolrInputDocument[]:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT ID, col2, col3 FROM table1 WHERE rownum BETWEEN 1 AND 1000 Processing Document # 1
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253)
at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)
Caused by: java.sql.SQLRecoverableException: IO Error: The Network Adapter could not establish the connection
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:458)
at oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:546)
at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:236)
at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)
at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:521)
at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:161)
at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:127)
at org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:366)
at org.apache.solr.handler.dataimport.JdbcDataSource.access$200(JdbcDataSource.java:38)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:240)
... 12 more
Caused by: oracle.net.ns.NetException: The Network Adapter could not establish the connection at
[jira] [Commented] (SOLR-4939) Not able to import oracle DB on RedHat
[ https://issues.apache.org/jira/browse/SOLR-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688217#comment-13688217 ] Uwe Schindler commented on SOLR-4939: - Ask your firewall administrator; we have no idea about your environment and cannot help! A quick test of whether it works at all is to enter the following in a shell (needs netcat installed): {code} nc hostname_of_oracle_server 2126 {code} If this also times out, ask somebody who knows your network. Not able to import oracle DB on RedHat -- Key: SOLR-4939 URL: https://issues.apache.org/jira/browse/SOLR-4939 Project: Solr Issue Type: Bug Affects Versions: 4.3.1 Environment: Redhat Linux Reporter: Subhash Karemore I have configured my RedHat system for Solr. After that I started Solr, and it started properly. I have to import the Oracle DB for indexing. My data config file is: <dataConfig> <dataSource type="JdbcDataSource" driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@//hostname:2126/DBNAme" user="user" password="Passwd" batchSize="1" /> <document> <entity name="table1" query="SELECT ID, col2, col3 FROM table1 WHERE rownum BETWEEN 1 AND 1000"> <field column="ID" name="id" /> <field column="col2" name="col2" /> <field column="col3" name="col3" /> </entity> </document> </dataConfig> I have made similar changes to the schema.xml file. I have copied solr-dataimporthandler-4.3.0.jar, solr-dataimporthandler-extras-4.3.0.jar and solr-solrj-4.3.0.jar from the dist folder to the ../lib folder. I have also downloaded ojdbc6.jar and put it in the same folder. With this setup it works properly on Windows; however, on RedHat it does not. It gives me errors when I try to index the DB. Below are the errors which I got on the console. ERROR org.apache.solr.handler.dataimport.DocBuilder - Exception while processing: table1 document : SolrInputDocument[]:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: SELECT ID, col2, col3 FROM table1 WHERE rownum BETWEEN 1 AND 1000 Processing Document # 1 at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:253) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:38) at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468) Caused by: java.sql.SQLRecoverableException: IO Error: The Network Adapter could not establish the connection at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:458) at oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:546) at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:236) at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32) at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:521) at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:161) at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:127) at org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:366) at org.apache.solr.handler.dataimport.JdbcDataSource.access$200(JdbcDataSource.java:38) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:240) ... 12 more Caused by: oracle.net.ns.NetException: The Network Adapter could not establish the connection at oracle.net.nt.ConnStrategy.execute(ConnStrategy.java:392) at
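If netcat is not available, a plain TCP check from Java exercises the same network path the Oracle JDBC driver uses. This is a minimal sketch; the host and port are placeholders copied from the JDBC URL above:
{code:java}
import java.net.InetSocketAddress;
import java.net.Socket;

public class TcpCheck {
    public static void main(String[] args) throws Exception {
        // Placeholders: use the listener host/port from the jdbc:oracle:thin URL.
        String host = "hostname";
        int port = 2126;
        try (Socket socket = new Socket()) {
            // Fail after 5 seconds rather than waiting for the OS-level TCP timeout.
            socket.connect(new InetSocketAddress(host, port), 5000);
            System.out.println("TCP connection to " + host + ":" + port + " succeeded");
        }
    }
}
{code}
A ConnectException or SocketTimeoutException here points at a firewall or routing problem rather than at Solr or the DataImportHandler.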
[jira] [Commented] (SOLR-4941) useCompoundFile default has changed, simple config option no longer seems to work
[ https://issues.apache.org/jira/browse/SOLR-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688270#comment-13688270 ] Hoss Man commented on SOLR-4941: I understand what happened now.. when simon asked on the mailing list for help reviewing the solr changes affected by LUCENE-5038 i didn't fully understand the scope of the change, and only focused on how it affected the existing MergePolicy settings (SOLR-4934) -- but i only noticed that setUseCompoundFile had been removed from the merge policies in favor of only using the ratio -- i didn't realize that setUseCompoundFile was actually moved to IndexWriterConfig. i'll work up a patch to make the existing solr settings apply to the IndexWriterConfig. useCompoundFile default has changed, simple config option no longer seems to work - Key: SOLR-4941 URL: https://issues.apache.org/jira/browse/SOLR-4941 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Spin off of SOLR-4934. We should update tests to ensure that the various ways of specifying useCompoundFile as well as the expected default are working properly after LUCENE-5038 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
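For reference, the post-LUCENE-5038 arrangement described above looks roughly like this. A minimal sketch against the 4.4-era API; the analyzer, version constant, and ratio value are illustrative only:
{code:java}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.util.Version;

public class CompoundFileConfigSketch {
    public static IndexWriterConfig newConfig() {
        IndexWriterConfig iwc =
            new IndexWriterConfig(Version.LUCENE_44, new StandardAnalyzer(Version.LUCENE_44));
        // The on/off switch now lives on the writer config ...
        iwc.setUseCompoundFile(true);
        // ... while the merge policy keeps only the size-based ratio.
        TieredMergePolicy tmp = new TieredMergePolicy();
        tmp.setNoCFSRatio(0.6);
        iwc.setMergePolicy(tmp);
        return iwc;
    }
}
{code}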
Re: IndexWriter commit user data takes a map
Hi Steve, Thanks for pointing it out. I was actually looking at SOLR-2701 when I wondered why it takes a Map instead of a string identifier. So I'm guessing this should be left untouched? On Wed, Jun 19, 2013 at 7:55 PM, Steve Rowe sar...@gmail.com wrote: Hi Varun, LUCENE-4575 did not change IW's user data to a Map. That was done in LUCENE-1654. Steve On Jun 19, 2013, at 6:57 AM, Varun Thacker varunthacker1...@gmail.com wrote: I was just curious as to why IW.setCommitData uses a map? Looking back at LUCENE-1382, when committing user data was introduced it took a string. In LUCENE-4575 it was refactored and changed to a Map. From the comments I couldn't really figure out why it was changed. -- Regards, Varun Thacker http://www.vthacker.in/ - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Regards, Varun Thacker http://www.vthacker.in/
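For context, the Map-based API lets an application attach several independent keys to a commit instead of packing everything into one string. A minimal sketch against the 4.x API; the keys, values, and surrounding setup are invented for illustration:
{code:java}
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;

public class CommitDataSketch {
    public static void tagCommit(IndexWriter writer, Directory dir) throws Exception {
        Map<String, String> commitData = new HashMap<String, String>();
        commitData.put("sourceVersion", "42");  // arbitrary application-level keys
        commitData.put("indexedAt", Long.toString(System.currentTimeMillis()));
        writer.setCommitData(commitData);
        writer.commit();

        // Read the data back from the commit point:
        DirectoryReader reader = DirectoryReader.open(dir);
        Map<String, String> readBack = reader.getIndexCommit().getUserData();
        System.out.println(readBack.get("sourceVersion"));
        reader.close();
    }
}
{code}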
[jira] [Commented] (SOLR-1301) Solr + Hadoop
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688330#comment-13688330 ] Alexander Kanarsky commented on SOLR-1301: -- [~otis], do you mean to use the Solr query result as a MapReduce job input? Solr + Hadoop - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Andrzej Bialecki Fix For: 4.4 Attachments: commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, log4j-1.2.15.jar, README.txt, SOLR-1301-hadoop-0-20.patch, SOLR-1301-hadoop-0-20.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network. Design -- Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning Hadoop (key, value) into a SolrInputDocument. This data is then added to a batch, which is periodically submitted to EmbeddedSolrServer. When the reduce task completes, and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer. The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken. This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-N directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard. An example application is provided that processes large CSV files and uses this API. It uses custom CSV processing to avoid (de)serialization overhead. This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue, you should put it in contrib/hadoop/lib. Note: the development of this patch was sponsored by an anonymous contributor and approved for release under Apache License. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
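To make the design above concrete, here is a heavily simplified sketch of the RecordWriter role the patch describes. All names here are hypothetical, batching and the solr.home handling are omitted, and it targets the newer org.apache.hadoop.mapreduce API for brevity (the patch itself predates it); the attached SolrRecordWriter.java is the authoritative version:
{code:java}
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SketchSolrRecordWriter extends RecordWriter<Text, Text> {
    private final EmbeddedSolrServer solr; // one embedded core per reduce task

    public SketchSolrRecordWriter(EmbeddedSolrServer solr) {
        this.solr = solr;
    }

    @Override
    public void write(Text key, Text value) throws IOException {
        // The real patch delegates this conversion to a pluggable SolrDocumentConverter.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", key.toString());
        doc.addField("text", value.toString());
        try {
            solr.add(doc); // the patch buffers docs and submits them in periodic batches
        } catch (Exception e) {
            throw new IOException(e);
        }
    }

    @Override
    public void close(TaskAttemptContext context) throws IOException {
        try {
            solr.commit(); // the patch also calls optimize() when the output is closed
        } catch (Exception e) {
            throw new IOException(e);
        }
    }
}
{code}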
[jira] [Comment Edited] (SOLR-1301) Solr + Hadoop
[ https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688330#comment-13688330 ] Alexander Kanarsky edited comment on SOLR-1301 at 6/19/13 7:17 PM: --- [~otis], do you mean to use the Solr query result as a MapReduce job input? Also, regarding SOLR-1045, it is a different approach (in the Map phase vs. the Reduce phase; a great explanation by Ted is here: https://issues.apache.org/jira/browse/SOLR-1301#comment-12828961) was (Author: kanarsky): [~otis], do you mean to use the Solr query result as a MapReduce job input? Solr + Hadoop - Key: SOLR-1301 URL: https://issues.apache.org/jira/browse/SOLR-1301 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Andrzej Bialecki Fix For: 4.4 Attachments: commons-logging-1.0.4.jar, commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop-0.20.1-core.jar, hadoop-core-0.20.2-cdh3u3.jar, hadoop.patch, log4j-1.2.15.jar, README.txt, SOLR-1301-hadoop-0-20.patch, SOLR-1301-hadoop-0-20.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SolrRecordWriter.java This patch contains a contrib module that provides distributed indexing (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is twofold: * provide an API that is familiar to Hadoop developers, i.e. that of OutputFormat * avoid unnecessary export and (de)serialization of data maintained on HDFS. SolrOutputFormat consumes data produced by reduce tasks directly, without storing it in intermediate files. Furthermore, by using an EmbeddedSolrServer, the indexing task is split into as many parts as there are reducers, and the data to be indexed is not sent over the network. Design -- Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, which in turn uses SolrRecordWriter to write this data. SolrRecordWriter instantiates an EmbeddedSolrServer, and it also instantiates an implementation of SolrDocumentConverter, which is responsible for turning Hadoop (key, value) into a SolrInputDocument. This data is then added to a batch, which is periodically submitted to EmbeddedSolrServer. When the reduce task completes, and the OutputFormat is closed, SolrRecordWriter calls commit() and optimize() on the EmbeddedSolrServer. The API provides facilities to specify an arbitrary existing solr.home directory, from which the conf/ and lib/ files will be taken. This process results in the creation of as many partial Solr home directories as there were reduce tasks. The output shards are placed in the output directory on the default filesystem (e.g. HDFS). Such part-N directories can be used to run N shard servers. Additionally, users can specify the number of reduce tasks, in particular 1 reduce task, in which case the output will consist of a single shard. An example application is provided that processes large CSV files and uses this API. It uses custom CSV processing to avoid (de)serialization overhead. This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this issue, you should put it in contrib/hadoop/lib. Note: the development of this patch was sponsored by an anonymous contributor and approved for release under Apache License. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5069) Can/should we store NumericField's precisionStep in the index?
Michael McCandless created LUCENE-5069: -- Summary: Can/should we store NumericField's precisionStep in the index? Key: LUCENE-5069 URL: https://issues.apache.org/jira/browse/LUCENE-5069 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless I was just helping a user (buzzkills) on IRC on why NumericRangeQuery was failing to hit the expected docs ... and it was because s/he had indexed with precStep=4 but searched with precStep=1. Then we wondered if it'd be possible to somehow catch this, e.g. we could maybe store precStep in FieldInfo, and then fail at search time if you use a non-matching precStep? I think you can index fine and then search on a multiple of that? E.g., I can index with precStep=2 but search with precStep=8? But indexing with precStep=4 and searching precStep=1 won't work ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
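A minimal sketch of the pitfall using the 4.x numeric APIs; the field name and values are invented, and the comments restate the issue's hedged reasoning about multiples:
{code:java}
import org.apache.lucene.document.Document;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.IntField;
import org.apache.lucene.search.NumericRangeQuery;

public class PrecStepSketch {
    public static void main(String[] args) {
        // Index side: precisionStep=4.
        FieldType type = new FieldType(IntField.TYPE_NOT_STORED);
        type.setNumericPrecisionStep(4);
        type.freeze();
        Document doc = new Document();
        doc.add(new IntField("price", 42, type));

        // Search side: a matching step works; per the reasoning above, a
        // multiple of the indexed step should also work.
        NumericRangeQuery<Integer> ok =
            NumericRangeQuery.newIntRange("price", 4, 0, 100, true, true);
        NumericRangeQuery<Integer> maybeOk =
            NumericRangeQuery.newIntRange("price", 8, 0, 100, true, true);
        // A smaller, non-matching step silently misses documents.
        NumericRangeQuery<Integer> broken =
            NumericRangeQuery.newIntRange("price", 1, 0, 100, true, true);
        System.out.println(ok + " / " + maybeOk + " / " + broken);
    }
}
{code}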
[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
[ https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688339#comment-13688339 ] Mark Miller commented on SOLR-4926: --- bq. the use of CFS somehow causes replication to fail Yeah, this is what I'm seeing - I just caught a really good sample case with decent logging. The recovering replica commits on the leader and that leader then has 126 docs to replicate. 16 documents end up on the replica after the replication - 110 short. The leader is on gen 3, the replica on gen 1. Perhaps a red herring, but in the many cases of this I've looked at, oddly, no buffered docs are ever replayed after that - though I have seen buffered docs replayed in those same runs when the replication did not fail. Weird observation. Anyway, I need to turn on more replication level logging I think. I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk. - Key: SOLR-4926 URL: https://issues.apache.org/jira/browse/SOLR-4926 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Priority: Blocker Fix For: 5.0, 4.4 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
[ https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688339#comment-13688339 ] Mark Miller edited comment on SOLR-4926 at 6/19/13 7:23 PM: bq. the use of CFS somehow causes replication to fail Yeah, this is what I'm seeing - I just caught a really good sample case with decent logging. The recovering replica commits on the leader and that leader then has 126 docs to replicate. 16 documents end up on the replica after the replication - 110 short. Before the replication, the leader is on gen 3, the replica on gen 1. Perhaps a red herring, but in the many cases of this I've looked at, oddly, no buffered docs are ever replayed after that - though I have seen buffered docs replayed in those same runs when the replication did not fail. Weird observation. Anyway, I need to turn on more replication level logging I think. was (Author: markrmil...@gmail.com): bq. the use of CFS somehow causes replication to fail Yeah, this is what I'm seeing - I just caught a really good sample case with decent logging. The recovering replica commits on the leader and that leader then has 126 docs to replicate. 16 documents end up on the replica after the replication - 110 short. The leader is on gen 3, the replica on gen 1. Perhaps a red herring, but in the many cases of this I've looked at, oddly, no buffered docs are ever replayed after that - though I have seen buffered docs replayed in those same runs when the replication did not fail. Weird observation. Anyway, I need to turn on more replication level logging I think. I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk. - Key: SOLR-4926 URL: https://issues.apache.org/jira/browse/SOLR-4926 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Priority: Blocker Fix For: 5.0, 4.4 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5069) Can/should we store NumericField's precisionStep in the index?
[ https://issues.apache.org/jira/browse/LUCENE-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688346#comment-13688346 ] Uwe Schindler commented on LUCENE-5069: --- I think we can do this. I had the same in mind, but lots of people were against for schema reasons (you know, no schema info in index). If we save precision step we should also save type like we do for stored fields. The search works with multiple of original precision step is correct, btw While indexing, adding a new item with different step should also fail. The check on indexing show would be done in the TermsEnum initialization of mtq's getTermsEnum(). Can/should we store NumericField's precisionStep in the index? -- Key: LUCENE-5069 URL: https://issues.apache.org/jira/browse/LUCENE-5069 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless I was just helping a user (buzzkills) on IRC on why NumericRangeQuery was failing to hit the expected docs ... and it was because s/he had indexed with precStep=4 but searched with precStep=1. Then we wondered if it'd be possible to somehow catch this, e.g. we could maybe store precStep in FieldInfo, and then fail at search time if you use a non-matching precStep? I think you can index fine and then search on a multiple of that? E.g., I can index with precStep=2 but search with precStep=8? But indexing with precStep=4 and searching precStep=1 won't work ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-5069) Can/should we store NumericField's precisionStep in the index?
[ https://issues.apache.org/jira/browse/LUCENE-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688346#comment-13688346 ] Uwe Schindler edited comment on LUCENE-5069 at 6/19/13 7:30 PM: I think we can do this. I had the same in mind, but lots of people were against for schema reasons (you know, no schema info in index). If we save precision step we should also save type like we do for stored fields. The search works with multiple of original precision step is correct, btw While indexing, adding a new item with different step should also fail. The check on searching would be done in the TermsEnum initialization of mtq's getTermsEnum(). was (Author: thetaphi): I think we can do this. I had the same in mind, but lots of people were against for schema reasons (you know, no schema info in index). If we save precision step we should also save type like we do for stored fields. The search works with multiple of original precision step is correct, btw While indexing, adding a new item with different step should also fail. The check on indexing show would be done in the TermsEnum initialization of mtq's getTermsEnum(). Can/should we store NumericField's precisionStep in the index? -- Key: LUCENE-5069 URL: https://issues.apache.org/jira/browse/LUCENE-5069 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless I was just helping a user (buzzkills) on IRC on why NumericRangeQuery was failing to hit the expected docs ... and it was because s/he had indexed with precStep=4 but searched with precStep=1. Then we wondered if it'd be possible to somehow catch this, e.g. we could maybe store precStep in FieldInfo, and then fail at search time if you use a non-matching precStep? I think you can index fine and then search on a multiple of that? E.g., I can index with precStep=2 but search with precStep=8? But indexing with precStep=4 and searching precStep=1 won't work ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Estimating Solr memory requirements
Hi Erick, Is the typo in the title on purpose? On 19 June 2013 15:09, Erick Erickson erickerick...@gmail.com wrote: OK, I seem to have stalled on this. Over part of the winter, I put together a Swing-based program to help estimate Solr/Lucene memory requirements, with all the usual caveats; see: https://github.com/ErickErickson/SolrMemoryEsitmator. I have notes to myself that it's still deficient in several areas: FieldValueCache estimates; tlog requirements; memory required to re-open a searcher; position and term vector memory requirements; and whatever I haven't thought about yet. Of course it builds on Grant's spreadsheet (read: steals from it shamelessly!) I'm hoping to have a friendlier interface. And _of course_ I'd be willing to donate it to Solr as a util/contrib/whatever if it fits. So, what I'm after here is a few things: Anyone who wants to try it, feel free. The build instructions are at the above, but the short form is to clone it, run ant jar, and run java -jar dist/estimator.jar. Enter some field info and hit the Add/Save button, then hit the Dump calcs button to see what it does currently. It also saves the estimates away in a file and shows all the steps it goes through to perform the calculations. It'll also make rudimentary field definitions from the entered data. You can come back to it later and add to what you've already done. Make any improvements you see fit, in particular to flesh out the deficiencies listed above. Anyone who has, you know, graphic design/Swing skills, please feel free to make it better. I'm a newbie as far as using Swing is concerned, and the way I align buttons and checkboxes is pretty hacky. But it works. Any suggestions anyone wants to make. Suggestions in code are nicest of course, but algorithms for calculating, say, position and tv memory usage would be great as well! Isolated code snippets that I could incorporate would be great too. Any info where I've gotten the calculations wrong or don't show enough info to actually figure out whether they're correct or not. Note that the goal for this is to give a rough idea of memory requirements and be easy to use. The spreadsheet is a bit daunting to someone who knows nothing about Solr so this might be an easier way to get into it. Thanks, Erick - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5069) Can/should we store NumericField's precisionStep in the index?
[ https://issues.apache.org/jira/browse/LUCENE-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688355#comment-13688355 ] Adrien Grand commented on LUCENE-5069: -- bq. While indexing, adding a new item with different step should also fail. +1 This motivation is enough for me to store the precision step in the field info. Can/should we store NumericField's precisionStep in the index? -- Key: LUCENE-5069 URL: https://issues.apache.org/jira/browse/LUCENE-5069 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless I was just helping a user (buzzkills) on IRC on why NumericRangeQuery was failing to hit the expected docs ... and it was because s/he had indexed with precStep=4 but searched with precStep=1. Then we wondered if it'd be possible to somehow catch this, e.g. we could maybe store precStep in FieldInfo, and then fail at search time if you use a non-matching precStep? I think you can index fine and then search on a multiple of that? E.g., I can index with precStep=2 but search with precStep=8? But indexing with precStep=4 and searching precStep=1 won't work ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5069) Can/should we store NumericField's precisionStep in the index?
[ https://issues.apache.org/jira/browse/LUCENE-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688359#comment-13688359 ] Uwe Schindler commented on LUCENE-5069: --- With this info in FieldInfo we could automatically select the right precision step for each atomic reader processed while the query runs. Can/should we store NumericField's precisionStep in the index? -- Key: LUCENE-5069 URL: https://issues.apache.org/jira/browse/LUCENE-5069 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless I was just helping a user (buzzkills) on IRC on why NumericRangeQuery was failing to hit the expected docs ... and it was because s/he had indexed with precStep=4 but searched with precStep=1. Then we wondered if it'd be possible to somehow catch this, e.g. we could maybe store precStep in FieldInfo, and then fail at search time if you use a non-matching precStep? I think you can index fine and then search on a multiple of that? E.g., I can index with precStep=2 but search with precStep=8? But indexing with precStep=4 and searching precStep=1 won't work ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
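If the step were recorded per field, the per-reader lookup suggested above might look roughly like this. Purely hypothetical: no such attribute is written today, and FieldInfo's generic attribute map is used here only for illustration:
{code:java}
import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.FieldInfo;

public class PrecStepLookupSketch {
    // Hypothetical attribute key; nothing writes this today.
    private static final String PREC_STEP_KEY = "numericPrecisionStep";

    public static int precisionStepFor(AtomicReader reader, String field) {
        FieldInfo fi = reader.getFieldInfos().fieldInfo(field);
        String step = (fi == null) ? null : fi.getAttribute(PREC_STEP_KEY);
        // Fall back to "unknown" (-1): exactly the schema info the index lacks.
        return (step == null) ? -1 : Integer.parseInt(step);
    }
}
{code}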
[jira] [Updated] (SOLR-4941) useCompoundFile default has changed, simple config option no longer seems to work
[ https://issues.apache.org/jira/browse/SOLR-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-4941: --- Attachment: infostream.txt SOLR-4941.patch Patch that improves the tests and updates the logic added in SOLR-4934 so that if there is explicit useCompoundFile configuration as an init arg for a (known) MergePolicy we pass that to the IndexWriterConfig's setUseCompoundFile method and log a warning instead of just ignoring it. patch also removes the warnings about the simple legacy useCompoundFile syntax since that actually makes sense now that it's a setting on IWC. I've also updated the tests to inspect the useCompoundFile on the IWC as well as checking the results of adding some segments. there is still a failure in testTieredMergePolicyConfig where (as i understand it from talking to mike on IRC) the merged segment after the optimize command should *not* be in CFS format because of the noCFSRatio setting -- but the merged segment is still in CFS. i've attached the infostream log from running ant test -Dtestcase=TestMergePolicyConfig -Dtests.method=testTieredMergePolicyConfig to see if it helps illuminate the problem ... i suspect it's either a test bug because i still misunderstand something about how the MergePolicy settings come into play, or a genuine bug in the lower level TieredMP code -- i don't see how it could be specific to the solr config parsing logic since the IWC and TMP getters say they got the expected settings. (NOTE: the patch includes a nocommit in solrconfig-mergepolicy.xml to turn off the infostream before committing) useCompoundFile default has changed, simple config option no longer seems to work - Key: SOLR-4941 URL: https://issues.apache.org/jira/browse/SOLR-4941 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Attachments: infostream.txt, SOLR-4941.patch Spin off of SOLR-4934. We should update tests to ensure that the various ways of specifying useCompoundFile as well as the expected default are working properly after LUCENE-5038 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5069) Can/should we store NumericField's precisionStep in the index?
[ https://issues.apache.org/jira/browse/LUCENE-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688393#comment-13688393 ] Robert Muir commented on LUCENE-5069: - {quote} I had the same in mind, but lots of people were against for schema reasons (you know, no schema info in index). If we save precision step we should also save type like we do for stored fields. {quote} Count me as one of those: I'm worried how the issue has already jumped to this. Can/should we store NumericField's precisionStep in the index? -- Key: LUCENE-5069 URL: https://issues.apache.org/jira/browse/LUCENE-5069 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless I was just helping a user (buzzkills) on IRC on why NumericRangeQuery was failing to hit the expected docs ... and it was because s/he had indexed with precStep=4 but searched with precStep=1. Then we wondered if it'd be possible to somehow catch this, e.g. we could maybe store precStep in FieldInfo, and then fail at search time if you use a non-matching precStep? I think you can index fine and then search on a multiple of that? E.g., I can index with precStep=2 but search with precStep=8? But indexing with precStep=4 and searching precStep=1 won't work ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3838) Admin UI - Multiple filter queries are not supported in Query UI
[ https://issues.apache.org/jira/browse/SOLR-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) updated SOLR-3838: Attachment: SOLR-3838.patch Updated patch; includes the "focus on last possible row after deletion" change. Will commit that shortly Admin UI - Multiple filter queries are not supported in Query UI Key: SOLR-3838 URL: https://issues.apache.org/jira/browse/SOLR-3838 Project: Solr Issue Type: Improvement Components: web gui Affects Versions: 4.0-BETA Reporter: Jack Krupansky Assignee: Stefan Matheis (steffkes) Fix For: 5.0, 4.4 Attachments: screenshot-1.jpg, SOLR-3838.patch, SOLR-3838.patch, SOLR-3838.patch, SOLR-3838.patch The Solr Admin Query UI has only a single fq input field, which does not permit the user to enter multiple filter query parameters. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5069) Can/should we store NumericField's precisionStep in the index?
[ https://issues.apache.org/jira/browse/LUCENE-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688405#comment-13688405 ] Robert Muir commented on LUCENE-5069: - {quote} With this info in FieldInfo we could automatically select the right precision step for each atomic reader processed while the query runs. {quote} The problem is it's too late: QueryParser/Query are independent of readers: so they don't know to generate the correct query (e.g. NumericRangeQuery instead of TermRangeQuery) in the first place! So this issue misses the forest for the trees, sorry, -1 to a half-assed schema that brings all of the problems of a schema and none of the benefits! Can/should we store NumericField's precisionStep in the index? -- Key: LUCENE-5069 URL: https://issues.apache.org/jira/browse/LUCENE-5069 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless I was just helping a user (buzzkills) on IRC on why NumericRangeQuery was failing to hit the expected docs ... and it was because s/he had indexed with precStep=4 but searched with precStep=1. Then we wondered if it'd be possible to somehow catch this, e.g. we could maybe store precStep in FieldInfo, and then fail at search time if you use a non-matching precStep? I think you can index fine and then search on a multiple of that? E.g., I can index with precStep=2 but search with precStep=8? But indexing with precStep=4 and searching precStep=1 won't work ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-3838) Admin UI - Multiple filter queries are not supported in Query UI
[ https://issues.apache.org/jira/browse/SOLR-3838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) resolved SOLR-3838. - Resolution: Implemented Committed in .. trunk: r1494762 branch_4x: r1494763 Admin UI - Multiple filter queries are not supported in Query UI Key: SOLR-3838 URL: https://issues.apache.org/jira/browse/SOLR-3838 Project: Solr Issue Type: Improvement Components: web gui Affects Versions: 4.0-BETA Reporter: Jack Krupansky Assignee: Stefan Matheis (steffkes) Fix For: 5.0, 4.4 Attachments: screenshot-1.jpg, SOLR-3838.patch, SOLR-3838.patch, SOLR-3838.patch, SOLR-3838.patch The Solr Admin Query UI has only a single fq input field, which does not permit the user to enter multiple filter query parameters. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-4456) Admin UI: Displays dashboard even if Solr is down
[ https://issues.apache.org/jira/browse/SOLR-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) resolved SOLR-4456. - Resolution: Fixed Fix Version/s: 5.0 committed the current state in trunk r1494765 branch_4x r1494768 if there are suggestions for tweaking it, please open a new ticket for that Admin UI: Displays dashboard even if Solr is down - Key: SOLR-4456 URL: https://issues.apache.org/jira/browse/SOLR-4456 Project: Solr Issue Type: Bug Components: web gui Affects Versions: 4.1 Reporter: Jan Høydahl Assignee: Stefan Matheis (steffkes) Fix For: 5.0, 4.4 Attachments: SOLR-4456.patch, SOLR-4456.patch, SOLR-4456.patch 1. Run Solr and bring up the Admin dashboard 2. Stop Solr 3. Click around the Admin GUI. It apparently works, but displays a spinning wheel for most panels 4. Click on Dashboard. An old cached dashboard is displayed What should happen is that once the connection to Solr is lost, the whole Admin UI displays a large red box CONNECTION LOST or something :) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Reopened] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k
[ https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reopened LUCENE-4583: StraightBytesDocValuesField fails if bytes > 32k Key: LUCENE-4583 URL: https://issues.apache.org/jira/browse/LUCENE-4583 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.0, 4.1, 5.0 Reporter: David Smiley Priority: Critical Fix For: 4.4 Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch I didn't observe any limitations on the size of a bytes based DocValues field value in the docs. It appears that the limit is 32k, although I didn't get any friendly error telling me that was the limit. 32k is kind of small IMO; I suspect this limit is unintended and as such is a bug. The following test fails:
{code:java}
public void testBigDocValue() throws IOException {
  Directory dir = newDirectory();
  IndexWriter writer = new IndexWriter(dir, writerConfig(false));
  Document doc = new Document();
  BytesRef bytes = new BytesRef((4+4)*4097); // 4096 works
  bytes.length = bytes.bytes.length; // byte data doesn't matter
  doc.add(new StraightBytesDocValuesField("dvField", bytes));
  writer.addDocument(doc);
  writer.commit();
  writer.close();
  DirectoryReader reader = DirectoryReader.open(dir);
  DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
  // FAILS IF BYTES IS BIG!
  docValues.getSource().getBytes(0, bytes);
  reader.close();
  dir.close();
}
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
[ https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688440#comment-13688440 ] Mark Miller commented on SOLR-4926: --- Reviewing some more sample fails of RecoveryZkTest: It actually looks like after the replication we end up with one commit point back - e.g. we are trying to replicate gen 3 and the replica moves from gen 1 to gen 2. - Mark I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk. - Key: SOLR-4926 URL: https://issues.apache.org/jira/browse/SOLR-4926 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Priority: Blocker Fix For: 5.0, 4.4 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4941) useCompoundFile default has changed, simple config option no longer seems to work
[ https://issues.apache.org/jira/browse/SOLR-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688444#comment-13688444 ] Michael McCandless commented on SOLR-4941: -- Indeed I can see that TMP has noCFSRatio=0.6, and two segments are flushed and turned into CFS, then those two segments are merged, and then the merged segment is turned into a CFS. I think this means that the merged segment's files (pre-CFS) are < 0.6 the size of the two flushed CFS segments ... e.g. maybe the CFS headers of the first 2 segments are tipping the scale? Try indexing more docs for each segment maybe? useCompoundFile default has changed, simple config option no longer seems to work - Key: SOLR-4941 URL: https://issues.apache.org/jira/browse/SOLR-4941 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Attachments: infostream.txt, SOLR-4941.patch Spin off of SOLR-4934. We should update tests to ensure that the various ways of specifying useCompoundFile as well as the expected default are working properly after LUCENE-5038 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
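For readers following along: as described in the comment, a merged segment is written as CFS only while its pre-CFS size stays at or below noCFSRatio times the total index size, so a merge of a few tiny test segments can still pass a 0.6 cutoff. A sketch of the setting in question, with the 0.6 value taken from the test config discussed above:
{code:java}
import org.apache.lucene.index.TieredMergePolicy;

public class NoCFSRatioSketch {
    public static TieredMergePolicy configure() {
        TieredMergePolicy tmp = new TieredMergePolicy();
        // CFS is used for a merged segment only if its (pre-CFS) size is
        // <= noCFSRatio * total index size; with only two tiny flushed
        // segments, the merge of both can still fall under 0.6.
        tmp.setNoCFSRatio(0.6);
        return tmp;
    }
}
{code}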
[jira] [Resolved] (SOLR-4719) Admin UI - Default to wt=json on Query-Screen
[ https://issues.apache.org/jira/browse/SOLR-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) resolved SOLR-4719. - Resolution: Implemented Fix Version/s: 5.0 committed in trunk r1494772 branch_4x r1494774 Admin UI - Default to wt=json on Query-Screen - Key: SOLR-4719 URL: https://issues.apache.org/jira/browse/SOLR-4719 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Stefan Matheis (steffkes) Priority: Minor Fix For: 5.0, 4.4 I didn't really notice that we're still using {{wt=xml}} as default on the Query-Screen .. i suggest we change that to {{wt=json}} .. it's 2013 =) Syntax-Highlight would still work, even if one tries the example-configuration where the content-type is overwritten with text/plain, since it's based on the selection on the left side :) Any objections? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-3546) Add index page to Admin UI
[ https://issues.apache.org/jira/browse/SOLR-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) resolved SOLR-3546. - Resolution: Duplicate Assignee: Stefan Matheis (steffkes) Add index page to Admin UI -- Key: SOLR-3546 URL: https://issues.apache.org/jira/browse/SOLR-3546 Project: Solr Issue Type: New Feature Components: web gui Reporter: Lance Norskog Assignee: Stefan Matheis (steffkes) Priority: Minor It would be great to index a file by uploading it. In designing schemas and testing features I often make one or two test documents. It would be great to upload these directly from the UI. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2440) Schema Browser more user friendly
[ https://issues.apache.org/jira/browse/SOLR-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688456#comment-13688456 ] Stefan Matheis (steffkes) commented on SOLR-2440: - [~jcodina] WDYT? If it's covered i'm going to close this one Schema Browser more user friendly - Key: SOLR-2440 URL: https://issues.apache.org/jira/browse/SOLR-2440 Project: Solr Issue Type: New Feature Components: web gui Affects Versions: 1.4.1 Environment: The schema browser of the admin web application Reporter: Joan Codina Priority: Minor Labels: browser, schema Fix For: 4.4 Attachments: LUCENE_4_schema_jsp.patch, LUCENE_4_screen_css.patch, schema_jsp.patch Original Estimate: 1h Remaining Estimate: 1h The schema browser has some drawbacks * Does not sort the fields (the actual sorting seems arbitrary) * Capitalises all field names, making them difficult to match * Does not allow a drill down This small patch solves the three issues: # Changes the CSS so it does not capitalise the links # Sorts the field names # Replaces the tokens with links to a search query with that token That's all -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Estimating Solr memory requirements
Nope, never even noticed it until now. That's the right URL though, typo and all. Someday I may even fix it <G>... Thanks, Erick On Wed, Jun 19, 2013 at 3:35 PM, Dmitry Kan dmitry.luc...@gmail.com wrote: Hi Erick, Is the typo in the title on purpose? On 19 June 2013 15:09, Erick Erickson erickerick...@gmail.com wrote: OK, I seem to have stalled on this. Over part of the winter, I put together a Swing-based program to help estimate Solr/Lucene memory requirements, with all the usual caveats; see: https://github.com/ErickErickson/SolrMemoryEsitmator. I have notes to myself that it's still deficient in several areas: FieldValueCache estimates; tlog requirements; memory required to re-open a searcher; position and term vector memory requirements; and whatever I haven't thought about yet. Of course it builds on Grant's spreadsheet (read: steals from it shamelessly!) I'm hoping to have a friendlier interface. And _of course_ I'd be willing to donate it to Solr as a util/contrib/whatever if it fits. So, what I'm after here is a few things: Anyone who wants to try it, feel free. The build instructions are at the above, but the short form is to clone it, run ant jar, and run java -jar dist/estimator.jar. Enter some field info and hit the Add/Save button, then hit the Dump calcs button to see what it does currently. It also saves the estimates away in a file and shows all the steps it goes through to perform the calculations. It'll also make rudimentary field definitions from the entered data. You can come back to it later and add to what you've already done. Make any improvements you see fit, in particular to flesh out the deficiencies listed above. Anyone who has, you know, graphic design/Swing skills, please feel free to make it better. I'm a newbie as far as using Swing is concerned, and the way I align buttons and checkboxes is pretty hacky. But it works. Any suggestions anyone wants to make. Suggestions in code are nicest of course, but algorithms for calculating, say, position and tv memory usage would be great as well! Isolated code snippets that I could incorporate would be great too. Any info where I've gotten the calculations wrong or don't show enough info to actually figure out whether they're correct or not. Note that the goal for this is to give a rough idea of memory requirements and be easy to use. The spreadsheet is a bit daunting to someone who knows nothing about Solr so this might be an easier way to get into it. Thanks, Erick - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
[ https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688522#comment-13688522 ] Mark Miller commented on SOLR-4926: --- In the case where the slave is on gen 2, it did just download the files for gen 3 - so it seems we are not picking up the latest commit point somehow.. I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk. - Key: SOLR-4926 URL: https://issues.apache.org/jira/browse/SOLR-4926 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Priority: Blocker Fix For: 5.0, 4.4 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5066) TestFieldsReader fails in 4.x with OOM
[ https://issues.apache.org/jira/browse/LUCENE-5066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688532#comment-13688532 ] Robert Muir commented on LUCENE-5066: - I mentioned this in the email: should we do it here under this issue? re above: I think we should spin off an issue to improve the codec checks (so we get assert fails at least, rather than OOM), i imagine this would be part of that issue, but can do it here too. TestFieldsReader fails in 4.x with OOM -- Key: LUCENE-5066 URL: https://issues.apache.org/jira/browse/LUCENE-5066 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-5066.patch Its FaultyIndexInput is broken (doesn't implement seek/clone correctly). This causes it to read bogus data and try to allocate an enormous byte[] for a term. The bug was previously hidden: FaultyDirectory doesn't override openSlice, so CFS must not be used at flush if you want to trigger the bug. FaultyIndexInput's clone is broken, it uses new but doesn't seek the clone to the right place. This causes a disaster with BufferedIndexInput (which it extends), because BufferedIndexInput (not just the delegate) must know its position since it has seek-within-block etc code... It seems with this test (very simple one), that only the 3.x codec triggers it because its term dict relies upon clone()'s being seek'd to the right place. I'm not sure what other codecs rely upon this, but imo we should also add a low-level test for directories that does something like this to ensure it's really tested:
{code}
dir.createOutput(x);
dir.openInput(x);
input.seek(somewhere);
clone = input.clone();
assertEquals(somewhere, clone.getFilePointer());
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4941) useCompoundFile default has changed, simple config option no longer seems to work
[ https://issues.apache.org/jira/browse/SOLR-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-4941: --- Attachment: SOLR-4941.patch bq. maybe the CFS headers of the first 2 segments are tipping the scale? Try indexing more docs for each segment maybe? yeah .. i guess i was just naive in considering 0.6 a low enough threshold. i increased the size of the docs and the number of docs per segment -- and when that still didn't work i also decreased the ratio to 0.1 and that seemed to do the trick. updated patch fixes the test, removes the nocommit, and updates the upgrading instructions in CHANGES.txt (still need an explicit Bug Fix entry though) still running more test iters, but i think this is pretty good. useCompoundFile default has changed, simple config option no longer seems to work - Key: SOLR-4941 URL: https://issues.apache.org/jira/browse/SOLR-4941 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Attachments: infostream.txt, SOLR-4941.patch, SOLR-4941.patch Spin off of SOLR-4934. We should update tests to ensure that the various ways of specifying useCompoundFile as well as the expected default are working properly after LUCENE-5038 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4941) useCompoundFile default has changed, simple config option no longer seems to work
[ https://issues.apache.org/jira/browse/SOLR-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-4941: --- Fix Version/s: 4.4 5.0 useCompoundFile default has changed, simple config option no longer seems to work - Key: SOLR-4941 URL: https://issues.apache.org/jira/browse/SOLR-4941 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Fix For: 5.0, 4.4 Attachments: infostream.txt, SOLR-4941.patch, SOLR-4941.patch Spin off of SOLR-4934. We should update tests to ensure that the various ways of specifying useCompoundFile as well as the expected default are working properly after LUCENE-5038 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5069) Can/should we store NumericField's precisionStep in the index?
[ https://issues.apache.org/jira/browse/LUCENE-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688589#comment-13688589 ] Adriano Crestani commented on LUCENE-5069: -- Couldn't the standard flexible query parser be used for that? I know you can configure numeric fields in it before parsing a query. I think there is a wiki about it, just can't find it, maybe Uwe remembers where it is. For now you can take a look at TestNumericQueryParser. Can/should we store NumericField's precisionStep in the index? -- Key: LUCENE-5069 URL: https://issues.apache.org/jira/browse/LUCENE-5069 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless I was just helping a user (buzzkills) on IRC on why NumericRangeQuery was failing to hit the expected docs ... and it was because s/he had indexed with precStep=4 but searched with precStep=1. Then we wondered if it'd be possible to somehow catch this, e.g. we could maybe store precStep in FieldInfo, and then fail at search time if you use a non-matching precStep? I think you can index fine and then search on a multiple of that? E.g., I can index with precStep=2 but search with precStep=8? But indexing with precStep=4 and searching precStep=1 won't work ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
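For reference, the flexible parser's numeric configuration looks roughly like this in 4.x. A sketch only, with an invented field name; note the parser-side precisionStep still has to agree with the index, which is exactly the coordination problem this issue is about:
{code:java}
import java.text.NumberFormat;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.FieldType.NumericType;
import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;
import org.apache.lucene.queryparser.flexible.standard.config.NumericConfig;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class NumericQpSketch {
    public static Query parse(String userQuery) throws Exception {
        StandardQueryParser parser =
            new StandardQueryParser(new StandardAnalyzer(Version.LUCENE_44));
        Map<String, NumericConfig> numericConfig = new HashMap<String, NumericConfig>();
        // precisionStep=4 must match whatever was used at index time.
        numericConfig.put("price",
            new NumericConfig(4, NumberFormat.getNumberInstance(Locale.ROOT), NumericType.INT));
        parser.setNumericConfigMap(numericConfig);
        return parser.parse(userQuery, "defaultField"); // e.g. "price:[10 TO 20]"
    }
}
{code}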
[jira] [Commented] (SOLR-4618) Integrate LucidWorks' Solr Reference Guide with Solr documentation
[ https://issues.apache.org/jira/browse/SOLR-4618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688623#comment-13688623 ] Hoss Man commented on SOLR-4618: FYI: Things have kind of been in a holding pattern for a while now ... first i was waiting for some confirmation from Infra to proceed, then Gavin in Infra said he wanted to do a full backup first and be online during the import, then after playing JIRA/IRC message tag for a bit (Gavin and i are in diametrically opposed timezones) Infra announced that they are upgrading CWIKI to Confluence 5.x. I _think_ the current plan is to import the data into the current wiki sometime in the next day or so before the upgrade, but it may happen as part of the upgrade, or perhaps after the upgrade ... i really don't know. Integrate LucidWorks' Solr Reference Guide with Solr documentation -- Key: SOLR-4618 URL: https://issues.apache.org/jira/browse/SOLR-4618 Project: Solr Issue Type: Improvement Components: documentation Affects Versions: 4.1 Reporter: Cassandra Targett Assignee: Hoss Man Attachments: NewSolrStyle.css, SolrRefGuide4.1-ASF.zip, SolrRefGuide.4.3.zip LucidWorks would like to donate the Apache Solr Reference Guide, maintained by LucidWorks tech writers, to the Solr community. It was first produced in 2009 as a download-only PDF for Solr 1.4, but since 2011 it has been online at http://docs.lucidworks.com/display/solr/ and updated for Solr 3.x releases and for Solr 4.0 and 4.1. I've prepared an XML export from our Confluence installation, which can be easily imported into the Apache Confluence installation by someone with system admin rights. The doc has not yet been updated for 4.2, so it covers Solr 4.1 so far. I'll add some additional technical notes about the export itself in a comment. Since we use Confluence at LucidWorks, I can also offer assistance getting Confluence set up, importing this package into it, or any other help needed for the community to start using this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5069) Can/should we store NumericField's precisionStep in the index?
[ https://issues.apache.org/jira/browse/LUCENE-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688631#comment-13688631 ] Robert Muir commented on LUCENE-5069: - Sure but then you basically have 2 schemas :) Alternatively we could argue NumericRangeQuery is something that a QP should never generate anyway: instead maybe QPs should only worry about user intent and generate RangeQuery, which rewrite()s to the correct type... My point is we should just think these things through without introducing additional schema-like things into lucene, since we already have enough of them (Analyzer configuration, for example, is a form of schema maintained by the user). Can/should we store NumericField's precisionStep in the index? -- Key: LUCENE-5069 URL: https://issues.apache.org/jira/browse/LUCENE-5069 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless I was just helping a user (buzzkills) on IRC on why NumericRangeQuery was failing to hit the expected docs ... and it was because s/he had indexed with precStep=4 but searched with precStep=1. Then we wondered if it'd be possible to somehow catch this, e.g. we could maybe store precStep in FieldInfo, and then fail at search time if you use a non-matching precStep? I think you can index fine and then search on a multiple of that? E.g., I can index with precStep=2 but search with precStep=8? But indexing with precStep=4 and searching precStep=1 won't work ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
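[Editor's sketch] The failure mode in the issue description can be reproduced in a few lines; this is a hedged sketch against the Lucene 4.x numeric APIs (field name and values are illustrative). The rule it demonstrates: the search-time precisionStep must equal, or be a multiple of, the index-time step, because only those term prefixes exist in the index.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.FieldType;
    import org.apache.lucene.document.IntField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.NumericRangeQuery;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    public class PrecStepMismatch {
      public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(
            Version.LUCENE_44, new StandardAnalyzer(Version.LUCENE_44)));
        FieldType ft = new FieldType(IntField.TYPE_NOT_STORED);
        ft.setNumericPrecisionStep(4); // index-time precisionStep
        ft.freeze();
        Document doc = new Document();
        doc.add(new IntField("n", 42, ft));
        w.addDocument(doc);
        w.close();
        IndexSearcher s = new IndexSearcher(DirectoryReader.open(dir));
        // Searching with a finer step (1) than was indexed misses the doc,
        // because the shift-level terms it enumerates were never written...
        System.out.println(s.search(
            NumericRangeQuery.newIntRange("n", 1, 0, 100, true, true), 10).totalHits);
        // ...while the indexed step (or a multiple of it) finds it.
        System.out.println(s.search(
            NumericRangeQuery.newIntRange("n", 4, 0, 100, true, true), 10).totalHits);
      }
    }

Storing precStep in FieldInfo would let the second call be validated and the first call fail fast instead of silently returning nothing.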
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #885: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/885/ 1 tests failed. REGRESSION: org.apache.solr.cloud.SyncSliceTest.testDistribSearch Error Message: shard1 is not consistent. Got 305 from http://127.0.0.1:64102/g_d/x/collection1lastClient and got 253 from http://127.0.0.1:63228/g_d/x/collection1 Stack Trace: java.lang.AssertionError: shard1 is not consistent. Got 305 from http://127.0.0.1:64102/g_d/x/collection1lastClient and got 253 from http://127.0.0.1:63228/g_d/x/collection1 at __randomizedtesting.SeedInfo.seed([201755EC8EA7E3B9:A1F1DBF4F9F88385]:0) at org.junit.Assert.fail(Assert.java:93) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.checkShardConsistency(AbstractFullDistribZkTestBase.java:1018) at org.apache.solr.cloud.SyncSliceTest.doTest(SyncSliceTest.java:238) Build Log: [...truncated 23632 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-4941) useCompoundFile default has changed, simple config option no longer seems to work
[ https://issues.apache.org/jira/browse/SOLR-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-4941. Resolution: Fixed Committed revision 1494837. Committed revision 1494839. useCompoundFile default has changed, simple config option no longer seems to work - Key: SOLR-4941 URL: https://issues.apache.org/jira/browse/SOLR-4941 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Fix For: 5.0, 4.4 Attachments: infostream.txt, SOLR-4941.patch, SOLR-4941.patch Spin-off of SOLR-4934. We should update tests to ensure that the various ways of specifying useCompoundFile, as well as the expected default, are working properly after LUCENE-5038 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-4942) Add more randomized testing of compound file format and random merge policies
Hoss Man created SOLR-4942: -- Summary: Add more randomized testing of compound file format and random merge policies Key: SOLR-4942 URL: https://issues.apache.org/jira/browse/SOLR-4942 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man SOLR-4926 seems to have uncovered some sporadic cloud/replication bugs related to using compound files. We should update SolrTestCaseJ4 and the majority of our test configs to better randomize the usage of compound files and merge policies. Step #1 (see the sketch after this message)... * update test configs to use {{<useCompoundFile>${useCompoundFile:false}</useCompoundFile>}} * update SolrTestCaseJ4 to toggle that sys property randomly Step #2... * add a new RandomMergePolicy that implements MergePolicy by proxying to another instance selected at creation using one of the LuceneTestCase.new...MergePolicy methods * update test configs to refer to this new MergePolicy * borrow the tests.shardhandler.randomSeed logic in SolrTestCaseJ4 to give our RandomMergePolicy a consistent seed at runtime. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
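[Editor's sketch] A rough illustration of Step #1's test side. The class and method names below are hypothetical, not the actual SolrTestCaseJ4 code: a randomly chosen value is published through the system property that the test configs dereference via ${useCompoundFile:false}.

    import java.util.Random;

    public class CompoundFileRandomizer {
      // Hypothetical helper: called once per test class, it picks a random
      // useCompoundFile value and exposes it via the system property that
      // solrconfig.xml reads as <useCompoundFile>${useCompoundFile:false}</useCompoundFile>.
      public static void randomizeUseCompoundFile(Random random) {
        System.setProperty("useCompoundFile", Boolean.toString(random.nextBoolean()));
      }
    }

The ":false" in the property syntax is the fallback, so configs keep working unchanged when no test harness sets the property.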
[jira] [Commented] (SOLR-4926) I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk.
[ https://issues.apache.org/jira/browse/SOLR-4926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688807#comment-13688807 ] Mark Miller commented on SOLR-4926: --- I've been focusing on the RecoveryZkTest case. Every fail I've looked at has used the RAM dir. Odd, because the safe leader test that fails is hard-coded not to use the RAM dir, I think. RecoveryZkTest also uses a mock dir, but I don't think the safe leader test does, because of the hard coding to the standard dir. Anyway, more on what I'm seeing from the RecoveryZkTest fails: we replicate the gen-3 files, we reopen the writer and then the searcher using that writer, and we get an index of gen 2: the files from the searcher's directory don't contain the newly replicated files, just the gen-2 index files. I am seeing RecoveryZkTest and ChaosMonkeySafeLeaderTest fail often on trunk. - Key: SOLR-4926 URL: https://issues.apache.org/jira/browse/SOLR-4926 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Priority: Blocker Fix For: 5.0, 4.4 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
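[Editor's sketch] A hedged diagnostic sketch of the symptom Mark describes, using only stock Lucene 4.x index APIs (this is not the actual test code, and the class name is illustrative): after replication and reopen, compare the commit generation the searcher's reader sees with the newest generation on disk.

    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.SegmentInfos;
    import org.apache.lucene.store.Directory;

    public class GenerationCheck {
      // Returns true when the searcher's reader lags behind the latest commit,
      // e.g. a reader still on gen 2 while gen-3 files were just replicated in.
      public static boolean searcherLags(Directory dir, DirectoryReader reader)
          throws Exception {
        long latestGen = SegmentInfos.getLastCommitGeneration(dir);
        long readerGen = reader.getIndexCommit().getGeneration();
        return readerGen < latestGen;
      }
    }

In the failing runs this predicate would hold after the reopen, which points at the writer/searcher reopen path rather than the replication step itself.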