[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.
[ https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028620#comment-13028620 ] Sami Siren commented on SOLR-2493: -- The trunk version of Solr has this same problem; I just measured a comparable difference in req/sec when caching the Version vs. the current implementation.
SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit. - Key: SOLR-2493 URL: https://issues.apache.org/jira/browse/SOLR-2493 Project: Solr Issue Type: Bug Components: search Affects Versions: 3.1 Reporter: Stephane Bailliez Priority: Blocker Labels: core, parser, performance, request, solr
I'm putting this as a blocker as I think this is a serious issue that should be addressed ASAP with a release. With the current code this is nowhere near suitable for production use. For each instance created, SolrQueryParser calls getSchema().getSolrConfig().getLuceneVersion(luceneMatchVersion, Version.LUCENE_24) instead of using getSchema().getSolrConfig().luceneMatchVersion. This creates a massive performance hit. For each request there are generally three query parsers created, and each of them will parse the XML node in the config, which involves creating an instance of XPath; behind the scenes the usual factory-finder pattern kicks in within the XML parser and does a loadClass.
The stack is typically:
at org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:363)
at com.sun.org.apache.xml.internal.dtm.ObjectFactory.findProviderClass(ObjectFactory.java:506)
at com.sun.org.apache.xml.internal.dtm.ObjectFactory.lookUpFactoryClass(ObjectFactory.java:217)
at com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:131)
at com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:101)
at com.sun.org.apache.xml.internal.dtm.DTMManager.newInstance(DTMManager.java:135)
at com.sun.org.apache.xpath.internal.XPathContext.init(XPathContext.java:100)
at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:201)
at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
at org.apache.solr.core.Config.getNode(Config.java:230)
at org.apache.solr.core.Config.getVal(Config.java:256)
at org.apache.solr.core.Config.getLuceneVersion(Config.java:325)
at org.apache.solr.search.SolrQueryParser.init(SolrQueryParser.java:76)
at org.apache.solr.schema.IndexSchema.getSolrQueryParser(IndexSchema.java:277)
With the current 3.1 code I get barely 250 qps with 16 concurrent users on a near-empty index. Switching SolrQueryParser to use getSchema().getSolrConfig().luceneMatchVersion and doing a quick bench test, performance becomes reasonable, beyond 2000 qps.
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
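The fix pattern the reporter describes (evaluate the XPath once at config-load time, then have every SolrQueryParser read the cached field) can be sketched as follows. This is a minimal, self-contained illustration: `MiniConfig` and its instrumentation counter are hypothetical names for the demo, not Solr's actual classes.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for org.apache.solr.core.Config: the expensive XPath
// evaluation (which triggers the factory-finder/loadClass machinery seen in
// the stack trace) runs once in the constructor; every subsequent caller
// reads the cached field instead.
public class MiniConfig {
    static int xpathEvaluations = 0; // instrumentation for the demo only

    private final Document dom;
    private final String luceneMatchVersion; // cached once, reused per request

    public MiniConfig(String xml) throws Exception {
        dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // Expensive path: XPathFactory setup plus DTM/factory-finder classloading.
        luceneMatchVersion = evaluateXPath("/config/luceneMatchVersion");
    }

    private String evaluateXPath(String expr) throws Exception {
        xpathEvaluations++;
        XPath xp = XPathFactory.newInstance().newXPath();
        return xp.evaluate(expr, dom);
    }

    // Cheap accessor: analogous to reading SolrConfig.luceneMatchVersion directly.
    public String getLuceneMatchVersion() {
        return luceneMatchVersion;
    }

    public static void main(String[] args) throws Exception {
        MiniConfig cfg = new MiniConfig(
            "<config><luceneMatchVersion>LUCENE_31</luceneMatchVersion></config>");
        // Simulate the ~3 query-parser instantiations per request:
        for (int i = 0; i < 3; i++) {
            if (!"LUCENE_31".equals(cfg.getLuceneMatchVersion())) throw new AssertionError();
        }
        // The XPath ran exactly once, at load time, not once per parser.
        System.out.println(xpathEvaluations); // prints 1
    }
}
```

With the buggy pattern, `evaluateXPath` would instead run inside `getLuceneMatchVersion()`, so the counter would climb with every parser instantiation.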
[jira] [Updated] (SOLR-2481) Add support for commitWithin in DataImportHandler
[ https://issues.apache.org/jira/browse/SOLR-2481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren updated SOLR-2481: - Component/s: contrib - DataImportHandler
Add support for commitWithin in DataImportHandler - Key: SOLR-2481 URL: https://issues.apache.org/jira/browse/SOLR-2481 Project: Solr Issue Type: Improvement Components: contrib - DataImportHandler Reporter: Sami Siren Priority: Trivial Attachments: SOLR-2481.patch
It looks like DataImportHandler does not support commitWithin. It would be nice if it did.
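For context, commitWithin is already supported by Solr's standard XML update messages, where it bounds the time (in milliseconds) within which added documents must be committed; the request here is for DataImportHandler to honor the same semantics. A standard update message using it looks like this (the field values are illustrative):

```xml
<!-- Ask Solr to commit the added documents within at most 10 seconds. -->
<add commitWithin="10000">
  <doc>
    <field name="id">example-1</field>
  </doc>
</add>
```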
[jira] [Assigned] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.
[ https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reassigned SOLR-2493: --- Assignee: Uwe Schindler
[jira] [Updated] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.
[ https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-2493: Attachment: SOLR-2493.patch
Patch for trunk, 3.x/3.1 is similar, will attach after merge.
[jira] [Updated] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.
[ https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-2493: Fix Version/s: 4.0, 3.2, 3.1.1
[jira] [Updated] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.
[ https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-2493: Attachment: SOLR-2493-3.x.patch
Patch for 3.x and 3.1 branch.
[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.
[ https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028642#comment-13028642 ] Uwe Schindler commented on SOLR-2493: - I also reviewed the other places where luceneMatchVersion is used; all of them are correct (SpellChecker...).
[jira] [Updated] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.
[ https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-2493: Attachment: SOLR-2493-3.x.patch
Here is the final 3.x patch (the previous one was incomplete).
[jira] [Resolved] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.
[ https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved SOLR-2493. - Resolution: Fixed
Committed trunk revision: 1099340
Merged 3.x revision: 1099347
Merged 3.1 branch revision: 1099349
You can fix this in your local installation by using the latest 3.1 stable branch, if you can't wait for 3.1.1 :-)
[jira] [Updated] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.
[ https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-2493: Attachment: (was: SOLR-2493-3.x.patch)
[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028649#comment-13028649 ] Simon Willnauer commented on LUCENE-3018: - Hey Varun, sorry for the long delay! I have a couple of comments for you:
* I think we should default the compiler to whatever ant-contrib uses as the default, so we can remove -Dcompilername.
* -Dbuild64 is an option that only works on x86-64 architectures, so I think we can remove that too entirely.
* We are going to commit the cpptasks jar file into the ant_lib directory so it comes with the checkout, meaning you can remove the line in the overview.html file saying that you need to place the jar there.
* The overview should say cd lucene/contrib/misc/ instead of cd lucene/dev/trunk/lucene/contrib/misc/; the same is true for ... will be located in the lucene/dev/trunk/lucene/build/native/
simon
Lucene Native Directory implementation need automated build --- Key: LUCENE-3018 URL: https://issues.apache.org/jira/browse/LUCENE-3018 Project: Lucene - Java Issue Type: Wish Components: Build Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Varun Thacker Priority: Minor Fix For: 4.0 Attachments: LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, cpptasks-1.0b5.jar, cpptasks-LICENSE-ASL.txt, cpptasks.jar, cpptasks.jar
Currently the native directory impl in contrib/misc requires manual action to compile the C code, (partially) documented in https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/misc/src/java/overview.html, yet it would be nice if we had an ant task and documentation for all platforms on how to compile it and set up the prerequisites.
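For reference, cpptasks-based ant wiring of the kind discussed above could look roughly like this. This is an illustrative sketch under stated assumptions: the target name, property names, and source paths are hypothetical, not taken from the actual patch; only the `taskdef` resource and the `cc` task come from cpptasks itself.

```xml
<!-- Load cpptasks from the jar checked into ant_lib (path is illustrative). -->
<taskdef resource="cpptasks.tasks" classpath="ant_lib/cpptasks-1.0b5.jar"/>

<target name="build-native">
  <mkdir dir="${build.dir}/native/objects"/>
  <!-- Compile the native directory sources into a shared library; cpptasks
       picks a platform-appropriate compiler by default, which is why the
       -Dcompilername override can be dropped. -->
  <cc outtype="shared"
      objdir="${build.dir}/native/objects"
      outfile="${build.dir}/native/NativePosixUtil">
    <fileset dir="src/java/org/apache/lucene/store" includes="*.cpp"/>
  </cc>
</target>
```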
33 Days left to Berlin Buzzwords 2011
hey folks,
BerlinBuzzwords 2011 is close: only 33 days left until the big Search, Store and Scale open-source crowd gathers in Berlin on June 6th/7th, 2011. The conference again focuses on the topics search, data analysis and NoSQL.
We are looking forward to two awesome keynote speakers who shaped the world of open-source data analysis: Doug Cutting (founder of Apache Lucene and Hadoop) as well as Ted Dunning (Chief Application Architect at MapR Technologies and active developer at Apache Hadoop and Mahout).
We are amazed by the amount and quality of the talk submissions we got. As a result, this year we have added one more track to the main conference. If you haven't done so already, make sure to book your ticket now - early-bird tickets have been sold out since April 7th and there might not be many tickets left.
As we would like to give visitors of our main conference a reason to stay in town for the whole week, we have been talking to local co-working spaces and companies, asking them for free space and WiFi to host Hackathons right after the main conference - that is, on June 8th through 10th. If you would like to gather with fellow developers and users of your project, fix bugs together, hack on new features or give users a hands-on introduction to your tools, please submit your workshop proposal to our wiki: http://berlinbuzzwords.de/node/428
Please note that slots are assigned on a first-come, first-served basis. We are doing our best to get you connected; however, space is limited. The deal is simple: we get you in touch with a conference room provider, and your event gets promoted in our schedule.
Coordination, however, is completely up to you: make sure to provide an interesting abstract and a Hackathon registration area - see the Barcamp page for a good example: http://berlinbuzzwords.de/wiki/barcamp Attending Hackathons requires a Berlin Buzzwords ticket and (then free) registration at the Hackathon in question. Hope I see you all around in Berlin, Simon
RE: I was accepted in GSoC!!!
Hi Vinicius, Submitting patches via JIRA is fine! We were just thinking about possibly providing some SVN to work with (as additional training), but came to the conclusion, that all students should go the standard Apache Lucene way of submitting patches to JIRA issues. You can of course still use SVN / GIT locally to organize your code. At the end we just need a patch to be committed by one of the core committers. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de/ http://www.thetaphi.de eMail: u...@thetaphi.de From: Vinicius Barros [mailto:viniciusbarros.g...@yahoo.com.br] Sent: Wednesday, May 04, 2011 6:23 AM To: dev@lucene.apache.org Subject: RE: I was accepted in GSoC!!! Hi Uwe, Sorry, I only saw your email today, busy lately with college homeworks. I was planning to submit patches to Lucene (through JIRA/email?). Do you have something else in mind? Regards, Vinicius Barros --- Em dom, 1/5/11, Uwe Schindler u...@thetaphi.de escreveu: De: Uwe Schindler u...@thetaphi.de Assunto: RE: I was accepted in GSoC!!! Para: dev@lucene.apache.org Data: Domingo, 1 de Maio de 2011, 7:36 Welcome Vinicius, I am glad to hear that you (my mentee) are one of the 5 students that are working for Apache Lucene/Solr this year. Until the coding officially starts, we should also sort out the infrastructure things like where to put the code and make a plan how to start. We should keep in close contact. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de http://www.thetaphi.de/ eMail: u...@thetaphi.de From: Vinicius Barros [mailto:viniciusbarros.g...@yahoo.com.br] Sent: Sunday, May 01, 2011 3:18 AM To: dev@lucene.apache.org; uschind...@apache.org Subject: I was accepted in GSoC!!! Hi, That's great, I am waiting next instructions from google, it seems there is some paperwork to do. 
Regards, Vinicius Barros --- Em seg, 25/4/11, no-re...@socghop.appspotmail.com no-re...@socghop.appspotmail.com escreveu: De: no-re...@socghop.appspotmail.com no-re...@socghop.appspotmail.com Assunto: Congratulations! Para: viniciusbarros.g...@yahoo.com.br Data: Segunda-feira, 25 de Abril de 2011, 15:48 Dear Vinicius, Congratulations! Your proposal LUCENE-1768: NumericRange support for new query parser as submitted to Apache Software Foundation has been accepted for Google Summer of Code 2011. Over the next few days, we will add you to the private Google Summer of Code Student Discussion List. Over the next few weeks, we will send instructions to this list regarding turn in proof of enrollment, tax forms, etc. Now that you've been accepted, please take the opportunity to speak with your mentors about plans for the Community Bonding Period: what documentation should you be reading, what version control system will you need to set up, etc., before start of coding begins on May 23rd. Welcome to Google Summer of Code 2011! We look forward to having you with us. With best regards, The Google Summer of Code Program Administration Team
[jira] [Updated] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Thacker updated LUCENE-3018: -- Attachment: LUCENE-3018.patch I have made the changes mentioned above.
[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028677#comment-13028677 ] Uwe Schindler commented on LUCENE-3065: --- Mike: I reviewed the patch again: you are currently using 3 bits already. 1 bit is solely for detecting numerics, the other two are the type. In my opinion, to check if it's a numeric field, use a MASK of 3 bits and check for != 0. As soon as any bit in this mask is set, it's numeric. The actual numeric fields have values != 0:
{code}
private static final int _NUMERIC_BIT_SHIFT = 3;
static final byte FIELD_IS_NUMERIC_MASK   = 0x07 << _NUMERIC_BIT_SHIFT;
static final byte FIELD_IS_NUMERIC_INT    = 1 << _NUMERIC_BIT_SHIFT;
static final byte FIELD_IS_NUMERIC_LONG   = 2 << _NUMERIC_BIT_SHIFT;
static final byte FIELD_IS_NUMERIC_FLOAT  = 3 << _NUMERIC_BIT_SHIFT;
static final byte FIELD_IS_NUMERIC_DOUBLE = 4 << _NUMERIC_BIT_SHIFT;
// unused: static final byte FIELD_IS_NUMERIC_SHORT = 5 << _NUMERIC_BIT_SHIFT;
// unused: static final byte FIELD_IS_NUMERIC_BYTE = 6 << _NUMERIC_BIT_SHIFT;
// and we still have one more left over :-) 7 << _NUMERIC_BIT_SHIFT

// check if field is numeric:
if ((bits & FIELD_IS_NUMERIC_MASK) != 0) {...}

// parse type:
switch (bits & FIELD_IS_NUMERIC_MASK) {
  case FIELD_IS_NUMERIC_INT: ...
}
{code}
NumericField should be stored in binary format in index (matching Solr's format) Key: LUCENE-3065 URL: https://issues.apache.org/jira/browse/LUCENE-3065 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Michael McCandless Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3065.patch (Spinoff of LUCENE-3001) Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an ordinary Field and your number has turned into a string.
See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972 We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format. A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.
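For illustration, here is a self-contained sketch of the flag layout Uwe proposes above. The constant names come from his snippet; the class name, the helper methods, and the low flag bit used in the example are hypothetical, and the real FieldsWriter/FieldsReader context is omitted:

```java
// Sketch of the proposed numeric-type bit scheme for stored-field flags.
// Three bits above _NUMERIC_BIT_SHIFT encode the numeric type; a value of
// zero in the masked region means "not numeric", so the mask doubles as
// the numeric-detection check.
public class NumericBits {
    private static final int _NUMERIC_BIT_SHIFT = 3;
    public static final int FIELD_IS_NUMERIC_MASK   = 0x07 << _NUMERIC_BIT_SHIFT;
    public static final int FIELD_IS_NUMERIC_INT    = 1 << _NUMERIC_BIT_SHIFT;
    public static final int FIELD_IS_NUMERIC_LONG   = 2 << _NUMERIC_BIT_SHIFT;
    public static final int FIELD_IS_NUMERIC_FLOAT  = 3 << _NUMERIC_BIT_SHIFT;
    public static final int FIELD_IS_NUMERIC_DOUBLE = 4 << _NUMERIC_BIT_SHIFT;

    public static boolean isNumeric(int bits) {
        // any bit set inside the mask means the field is numeric
        return (bits & FIELD_IS_NUMERIC_MASK) != 0;
    }

    public static String numericType(int bits) {
        switch (bits & FIELD_IS_NUMERIC_MASK) {
            case FIELD_IS_NUMERIC_INT:    return "int";
            case FIELD_IS_NUMERIC_LONG:   return "long";
            case FIELD_IS_NUMERIC_FLOAT:  return "float";
            case FIELD_IS_NUMERIC_DOUBLE: return "double";
            default:                      return "not numeric";
        }
    }

    public static void main(String[] args) {
        // 0x01 stands in for one of the existing (non-numeric) flag bits,
        // which remain untouched below the shift.
        int bits = 0x01 | FIELD_IS_NUMERIC_LONG;
        System.out.println(NumericBits.isNumeric(bits));    // true
        System.out.println(NumericBits.numericType(bits));  // long
        System.out.println(NumericBits.isNumeric(0x01));    // false
    }
}
```

Note how the type codes start at 1, not 0: that is what lets the single mask test replace a dedicated "is numeric" bit.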
[Lucene.Net] [jira] [Updated] (LUCENENET-413) Medium trust security issue
[ https://issues.apache.org/jira/browse/LUCENENET-413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy updated LUCENENET-413: --- Attachment: MediumTrust.2.9.4.patch constants.cs fix added into patch Medium trust security issue - Key: LUCENENET-413 URL: https://issues.apache.org/jira/browse/LUCENENET-413 Project: Lucene.Net Issue Type: Improvement Affects Versions: Lucene.Net 2.9.4 Environment: Lucene.Net 2.9.4, Lucene.Net 2.9.4g , .Net 4.0 Reporter: Digy Priority: Minor Fix For: Lucene.Net 2.9.4 Attachments: MediumTrust.2.9.4.patch, MediumTrust.2.9.4.patch, MediumTrust.2.9.4g.patch On behalf of Richard Wilde: Exceptions in Medium Trust(.NET 4.0) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028680#comment-13028680 ] Uwe Schindler commented on LUCENE-3065: --- This gives us more freedom in the future: we are limited to 8 bits in total, 3 of which are already used, and this scheme only adds 3 more, not 4. By the way, for performance reasons all constants should be declared as int, not byte, as the byte read from the index is already an int.
[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028689#comment-13028689 ] Michael McCandless commented on LUCENE-3018: Patch works well for me -- I installed cpptasks-1.0b5.jar under lucene/contrib/misc/ant_lib, and was then able to simply ant build-native-unix, which produced the .so under lucene/build/native. I then added lucene/build/native to my LD_LIBRARY_PATH, and ran: {noformat} ant test -lib lucene/build/contrib/misc/lucene-misc-4.0-SNAPSHOT.jar -Dtests.directory=org.apache.lucene.store.DirectIOLinuxDirectory {noformat} at the top of the source tree, ie, it runs all unit tests, forcing the dir impl to be DirectIOLinuxDirectory. All tests passed! For grins I tried the first step on OpenSolaris too, and it generated a large number of compilation errors, which seems strange. EG it could not find jni.h on this platform. (I expect a few compilation errors because we are using Linux-only flags, but not that it could not find jni.h)... any ideas?
[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3065: -- Attachment: LUCENE-3065.patch Here is the patch with my changes.
[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028693#comment-13028693 ] Michael McCandless commented on LUCENE-3065: Patch looks great Uwe! Except we need to resolve this Field/Fieldable/AbstractField issue. Probably we should go and finish LUCENE-2310...
[jira] [Created] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position
The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position -- Key: LUCENE-3068 URL: https://issues.apache.org/jira/browse/LUCENE-3068 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 3.1, 3.0.3, 4.0 Reporter: Michael McCandless Priority: Minor Fix For: 3.2, 4.0 In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was matching docs that it shouldn't; but I think those changes caused it to fail to match docs that it should, specifically when the doc itself has tokens at the same position.
[jira] [Updated] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position
[ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3068: --- Attachment: LUCENE-3068.patch Patch w/ test case showing the problem. If you set slop to 0 for the PhraseQuery, the test passes. The MultiPhraseQuery passes with slop or no slop because it handles the same-position case itself (Union*Enum). That got me thinking... maybe any time a *PhraseQuery has overlapping positions, we should rewrite to a MultiPhraseQuery and let it handle the same positions...? Is there any downside to that?
[jira] [Assigned] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position
[ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen reassigned LUCENE-3068: --- Assignee: Doron Cohen
Re: Re: Solr Config XML DTD's
Hi Michael, This looks compelling! I'm also not sure what, specifically, we can validate in Solr's configuration... and I also don't know how much validation we do today. What hard errors does Solr produce on startup when configuration is wrong? I know one challenge is the fact that plugins can reach in and claim attrs/elements, which makes validation more interesting. But we could do something like this: when a plugin claims a certain attr/element, this is recorded. If, at the end of loading the config, there are unclaimed attrs/elements, then that's an error. More generally, before we hash out an approach here, I'd like to know if anyone disagrees that we should move Solr to stricter error checking of its configuration on startup. I think being silent on configuration errors is the wrong choice... and I think that's generally Solr's approach today (I think? Or do we catch configuration errors with a hard error and a clear message?). Mike http://blog.mikemccandless.com On Sun, May 1, 2011 at 7:34 PM, Michael Sokolov soko...@ifactory.com wrote: My first post too - but if I can offer a suggestion - there are more modern XML validation technologies available than DTD. I would heartily recommend RelaxNG/Compact notation (see http://relaxng.org/compact-tutorial-20030326.html) - you can generate Relax from a DTD, but it is more expressive, while still being easy on the eyes (it uses curly-brace syntax), and much simpler than XML Schema. In particular it lets you express wildcard constraints like: start = anyElement anyElement = element * { (attribute * { text } | text | anyElement)* } which matches absolutely anything. I'm not sure what kinds of constraints can actually be applied to Solr's configuration in practice? But using a formal constraint language will give decent error reporting out of the box.
Java-based tools for Relax validation and conversion are available here: http://code.google.com/p/jing-trang/ -Mike S On 2:59 PM, Michael McCandless wrote: If not a DTD, can we put some more customized form of validation for Solr's configuration? In general, I think servers should be anal on startup, refusing to start if there's anything off in their configuration. (Of course, along with this, the error messaging has to be *excellent* so you know precisely where the problem is, what's wrong, and how to fix it.) If you take the lenient/forgiving approach then you wind up with Solr instances in unknown states -- the app developer thinks they turned X on, everything starts fine, but then, silently, inexplicably, it's not working. This then leads to frustration, thinking Solr is buggy, not using this feature, blogging about problems, etc. Mike http://blog.mikemccandless.com On Tue, Mar 29, 2011 at 7:15 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Hi, this is my first post to the mailing list. I'm working on a commercial Welcome! : My DTD works for our internal version of queryElevation.xml, but since the : ATTRIB name of the <doc/> tag could be anything, I'm not sure how to write a : DTD that would validate any valid query elevation file. right .. this is one of the reasons why we've never tried to publish a DTD for the solrconfig.xml or schema.xml files either. there are lots of cases where plugins can define arbitrary attributes on the XML nodes. If i had the chance to do it all over again, and i better understood xml back when yonik first showed me what the configs would look like, i would have suggested using xml namespaces .. but that ship kind of sailed a while ago. we're getting a little better -- moving towards using the same type of NamedList backed XML for the initialization anytime new plugins are added, but i don't see it being feasible to have a config DTD anytime soon.
-Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
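As a concrete illustration of Sokolov's suggestion in the thread above, here is a hedged sketch of what a RelaxNG Compact schema for a configuration file like Solr's could look like: a few known elements validated strictly, with plugin-owned subtrees falling back to the "matches anything" wildcard pattern from his mail. The element and attribute names here are illustrative, not a proposal for the real solrconfig.xml grammar:

```rnc
# Hypothetical sketch: constrain a couple of known elements, let
# plugin-defined content through via the wildcard pattern.
start = element config {
  element luceneMatchVersion { text }?,   # known element, strictly typed
  element queryParser {                   # plugin element: name required,
    attribute name { text },              # body left open for the plugin
    anyContent
  }*,
  anyElement*                             # everything else is unclaimed
}
anyElement = element * { anyContent }
anyContent = (attribute * { text } | text | anyElement)*
```

A schema in this style could be tightened incrementally: each time an element's semantics are nailed down, its wildcard is replaced by a real content model, which is roughly the "record claimed attrs/elements" idea expressed declaratively.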
[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3065: -- Attachment: LUCENE-3065.patch More refactoring: - Now NumericFields also reproduce the indexed/omitNorms/omitTF settings - only precStep cannot be reproduced - Cut over to int instead of byte, this removes lots of casting in FieldsReader
[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3065: -- Attachment: LUCENE-3065.patch New patch; the previous one had a leftover unused constant from Mike's patch.
[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3065: -- Attachment: (was: LUCENE-3065.patch)
Re: I was accepted in GSoC!!!
Hi Uwe, do you mean one issue per GSoC proposal, or one for every logical unit in the project? If the second: Robert told me to use the flexscoring branch as a base for my project, since preliminary work has already been done in that branch. Should I open JIRA issues nevertheless? Thanks, David On 2011 May 04, Wednesday 09:56:02 Uwe Schindler wrote: Hi Vinicius, Submitting patches via JIRA is fine! We were just thinking about possibly providing some SVN to work with (as additional training), but came to the conclusion, that all students should go the standard Apache Lucene way of submitting patches to JIRA issues. You can of course still use SVN / GIT locally to organize your code. At the end we just need a patch to be committed by one of the core committers. Uwe
[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3065: -- Attachment: LUCENE-3065.patch This patch adds some refactoring because FieldSelectorResult is an enum since 3.0, so the (slow) queue of id-statements can be replaced by a fast switch. Also some minor comments and a missing 0xFF when casting byte to int.
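To see why the missing 0xFF mask mentioned above matters: Java bytes are signed, so a flags byte with its high bit set turns negative when widened to int unless it is masked. A minimal standalone example (the value 0xC8 and the helper names are illustrative, not from the patch):

```java
// Demonstrates sign extension of a flags byte read from disk:
// widening byte -> int without & 0xFF produces a negative value,
// which would make bit tests against int constants misbehave.
public class ByteMask {
    public static int unmasked(byte b) {
        return b;           // sign-extends: high bit becomes the sign
    }

    public static int masked(byte b) {
        return b & 0xFF;    // yields the intended unsigned 0..255 value
    }

    public static void main(String[] args) {
        byte stored = (byte) 0xC8;  // pretend this was read from the index
        System.out.println(unmasked(stored)); // -56
        System.out.println(masked(stored));   // 200
    }
}
```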
[jira] [Issue Comment Edited] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028707#comment-13028707 ] Uwe Schindler edited comment on LUCENE-3065 at 5/4/11 11:06 AM: This patch adds some refactoring because FieldSelectorResult is an enum since 3.0, so the (slow) queue of if-statements can be replaced by a fast switch. Also some minor comments and a missing 0xFF when casting byte to int. was (Author: thetaphi): This patch adds some refactoring because FieldSelectorResult is an enum since 3.0, so the (slow) queue of id-statements can be replaced by a fast switch. Also some minor comments and a missing 0xFF when casting byte to int.
Bug in boilerpipe 1.1.0 referenced from solr-cell
Solr-cell references boilerpipe 1.1.0 which contains a modified version of nekohtml 1.9.9. It seems that this version of nekohtml is broken in that it references the class LostText without including it. The unmodified release of nekohtml 1.9.9 does not reference or include this class, and the latest release, 1.9.14, both references and includes it. As a result, our application has been broken because it independently uses nekohtml and is now finding a broken version of the jar. How should I report this issue, as it is not directly a bug in solr?

Andrew Le Couteur Bisson, Senior Software Engineer, GOSS Interactive
[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3065: Attachment: LUCENE-3065.patch

Next iteration: reverted the changes in Solr (they should come later); Lucene instead natively uses IndexInput and IndexOutput to write/read ints and longs. Solr's changes are completely unrelated.
[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028718#comment-13028718 ] Uwe Schindler commented on LUCENE-3065:

Just to note: We also need to change the Forrest index format documentation!
[jira] [Assigned] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reassigned LUCENE-3065: Assignee: Uwe Schindler
[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3065: Attachment: LUCENE-3065.patch

Moved test to TestFieldsReader.
[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.
[ https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028727#comment-13028727 ] Robert Muir commented on SOLR-2493:

This wasn't broken by the Lucene commit; it is Solr's fault for having a getter that does heavy-duty XML work. I don't think the issue is fixed until these getters that parse XML are removed!

Key: SOLR-2493 URL: https://issues.apache.org/jira/browse/SOLR-2493 Project: Solr Issue Type: Bug Components: search Affects Versions: 3.1 Reporter: Stephane Bailliez Assignee: Uwe Schindler Priority: Blocker Labels: core, parser, performance, request, solr Fix For: 3.1.1, 3.2, 4.0 Attachments: SOLR-2493-3.x.patch, SOLR-2493.patch
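The fix the issue asks for amounts to evaluating the XPath-backed luceneMatchVersion once at config-load time and caching it in a field, instead of re-running the whole XPath/DTM machinery in every SolrQueryParser constructor. A hedged sketch with stand-in names (not the actual Solr Config/SolrQueryParser classes):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: parse the expensive XPath-backed value once at construction
// time and hand out a cached field, instead of re-running XPath on every
// getter call (the costly path shown in the stack trace above).
public class CachedConfig {
    static final AtomicInteger XPATH_EVALS = new AtomicInteger();

    // Stand-in for Config.getLuceneVersion(): pretend each call does a
    // full XPath evaluation.
    private static String evaluateXPath() {
        XPATH_EVALS.incrementAndGet();
        return "LUCENE_31";
    }

    // Parsed once, then reused by every query parser instance.
    public final String luceneMatchVersion = evaluateXPath();

    // The slow pattern the bug report describes: re-parse per call.
    public String getLuceneVersionSlow() { return evaluateXPath(); }

    public static void main(String[] args) {
        CachedConfig cfg = new CachedConfig();
        for (int i = 0; i < 1000; i++) {
            String v = cfg.luceneMatchVersion; // no re-parse, no XPath
        }
        System.out.println(XPATH_EVALS.get()); // stayed at 1
    }
}
```

With three query parsers built per request, the slow getter multiplies XPath evaluations (and classloader lookups) by request volume, which matches the reported 250 qps vs 2000+ qps difference.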
Re: modularization discussion
Mark,

Can you give some more details on your disagreement here...? Are there certain modules from my list that you don't think should be modules? The timeframe (1-2 years) is too optimistic/aggressive? Or you disagree that we should poach from outside projects too...? Or, more generally, you don't think Solr benefits from being opened up / modularized?

Mike http://blog.mikemccandless.com

On Tue, May 3, 2011 at 1:11 PM, Mark Miller markrmil...@gmail.com wrote: On May 3, 2011, at 12:49 PM, Michael McCandless wrote: Isn't this the future we are working towards?

No, not really. Others perhaps, but not me. I'm on board with some modules. I do think there are tradeoffs when considering them and considering Lucene and Solr. I'm happy to take everything one issue at a time. When I voted to merge, no, I certainly was not thinking, I hope in a year or two we have taken everything from Solr and made it a module. I did it for a few specific things to start - analyzers for sure, perhaps some other things as people did something that made sense. I did it so we could share some code more easily - not all code. Others did it for their own reasons I assume. But no - I'm not sure I have ever fully subscribed to what you are saying.

- Mark Miller lucidimagination.com Lucene/Solr User Conference May 25-26, San Francisco www.lucenerevolution.org
[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.
[ https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028728#comment-13028728 ] Uwe Schindler commented on SOLR-2493:

...as I said before :-)
[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.
[ https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028729#comment-13028729 ] Uwe Schindler commented on SOLR-2493:

In my opinion, the correct way to solve this is to make all methods in o.a.solr.core.Config *protected* as they should only be called by subclasses doing the actual parsing. Uwe
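Uwe's suggestion can be sketched roughly like this (hypothetical class shapes, not the real o.a.solr.core.Config): the raw parsing getters become protected so only the config subclass can trigger XPath work, and the subclass parses once during construction, exposing a pre-parsed final field.

```java
// Hypothetical sketch of the proposed visibility change.
class Config {
    // protected: arbitrary components can no longer call the XPath-backed
    // getter on every request; only parsing subclasses may.
    protected String getVal(String xpath) {
        return "LUCENE_31"; // stand-in for the real XPath lookup
    }
}

class SolrConfig extends Config {
    // Parsed exactly once, at startup, then read as a plain field.
    public final String luceneMatchVersion;

    SolrConfig() {
        this.luceneMatchVersion = getVal("luceneMatchVersion");
    }
}

public class ProtectedGetters {
    public static void main(String[] args) {
        SolrConfig cfg = new SolrConfig();
        System.out.println(cfg.luceneMatchVersion);
    }
}
```

The compiler then enforces what the comment asks for: code outside the config hierarchy can only see the cheap, already-parsed values.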
Re: Re: Solr Config XML DTD's
if anyone disagrees that we should move Solr to more strict error checking of its configuration on startup. I think being silent on configuration errors is the wrong choice... and I think that's +1 for validation/warning/error messages from config files. Excellent link, Michael (Sokolov), I didn't know about this at all. Dawid
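The stricter startup checking being argued for could look roughly like the following: a generic JAXP sketch (not Solr's actual config loader) that promotes every parser warning and error to a hard failure instead of continuing silently.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import java.io.StringReader;

// Sketch: fail-fast config loading. Any warning/error/fatalError from the
// XML parser aborts the load rather than letting a half-parsed config through.
public class StrictConfigLoad {
    static boolean loadStrict(String xml) {
        try {
            DocumentBuilder b = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            b.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) throws SAXException { throw e; }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            b.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false; // a real loader would refuse to start, reporting e
        }
    }

    public static void main(String[] args) {
        System.out.println(loadStrict("<config><luceneMatchVersion>LUCENE_31</luceneMatchVersion></config>"));
        System.out.println(loadStrict("<config><unclosed></config>")); // malformed: refuse
    }
}
```

Full DTD/schema validation would additionally require setting the factory validating or attaching a Schema, but even this fail-fast handler turns silent config breakage into an explicit startup error.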
[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028733#comment-13028733 ] Michael McCandless commented on LUCENE-3065:

Patch looks great Uwe! I think we should deprecate Document.getField? And advertise in CHANGES that this is an [intentional] BW break, ie, you can no longer .getField if it's a NumericField (you'll hit CCE, just like you already do for lazy fields)? I think that's the lesser evil here?
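The back-compat break Michael describes, getField() hitting a ClassCastException once stored numerics come back as NumericField, can be illustrated with stand-in classes (not the real Lucene document API):

```java
// Stand-ins for the Lucene hierarchy, for illustration only.
abstract class Fieldable {}              // ~ o.a.l.document.Fieldable
class Field extends Fieldable {}         // ~ o.a.l.document.Field
class NumericField extends Fieldable {}  // ~ o.a.l.document.NumericField

public class GetFieldCce {
    // What the index hands back after the change: sometimes a NumericField.
    static Fieldable getFieldable(boolean numeric) {
        return numeric ? new NumericField() : new Field();
    }

    // Mirrors the convenience getter slated for deprecation: it casts the
    // result to the concrete Field subclass.
    static Field getField(boolean numeric) {
        return (Field) getFieldable(numeric);
    }

    public static void main(String[] args) {
        System.out.println(getField(false) != null); // fine: a plain Field
        try {
            getField(true);                          // numeric field -> CCE
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as warned in CHANGES");
        }
    }
}
```

Callers that migrate to the Fieldable-returning getter (and check the runtime type) avoid the cast entirely, which is why deprecating the casting getter is the "lesser evil".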
[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.
[ https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028734#comment-13028734 ] Robert Muir commented on SOLR-2493:

{quote} In my opinion, the correct way to solve this is to make all methods in o.a.solr.core.Config protected as they should only be called by subclasses doing the actual parsing {quote}

+1
[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.
[ https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028736#comment-13028736 ] Chris Male commented on SOLR-2493:

bq. In my opinion, the correct way to solve this is to make all methods in o.a.solr.core.Config protected as they should only be called by subclasses doing the actual parsing.

+1 We don't need getters doing parsing available to every component.
[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028738#comment-13028738 ] Michael McCandless commented on LUCENE-3018:

OK, if I change the /linux to /solaris (in the build.xml), then on OpenSolaris I get the expected compilation errors (using the wrong IO flags). Can this somehow be done automagically...?

Lucene Native Directory implementation need automated build
Key: LUCENE-3018 URL: https://issues.apache.org/jira/browse/LUCENE-3018 Project: Lucene - Java Issue Type: Wish Components: Build Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Varun Thacker Priority: Minor Fix For: 4.0 Attachments: LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, cpptasks-1.0b5.jar, cpptasks-LICENSE-ASL.txt, cpptasks.jar, cpptasks.jar

Currently the native directory impl in contrib/misc requires manual action to compile the C code, (partially) documented in https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/misc/src/java/overview.html, yet it would be nice if we had an ant task and documentation for all platforms on how to compile it and set up the prerequisites.
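One way the platform pick could be made "automagic" is to derive the native source directory from the os.name system property rather than hardcoding /linux in build.xml. A hedged Java sketch (the directory names here are assumptions, not the actual contrib/misc layout):

```java
// Sketch: map the JVM's os.name to a per-platform native source directory,
// so the build selects the right C sources without manual edits.
public class PlatformDir {
    static String nativeSrcDir(String osName) {
        String os = osName.toLowerCase();
        if (os.contains("linux"))
            return "src/native/linux";    // hypothetical layout
        if (os.contains("sunos") || os.contains("solaris"))
            return "src/native/solaris";  // hypothetical layout
        if (os.contains("mac"))
            return "src/native/darwin";   // hypothetical layout
        throw new UnsupportedOperationException("no native impl for " + osName);
    }

    public static void main(String[] args) {
        // In the build this would be System.getProperty("os.name").
        System.out.println(nativeSrcDir("Linux"));
        System.out.println(nativeSrcDir("SunOS"));
    }
}
```

In Ant itself the same dispatch is usually done with <condition property="..."> plus <os family="..."/> checks, which cpptasks targets can then key off.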
Re: modularization discussion
On May 4, 2011, at 8:25 AM, Michael McCandless wrote: Mark, Can you give some more details on your disagreement here...? Are there certain modules from my list that you don't think should be modules? The timeframe (1-2 years) is too optimistic/aggressive? Or you disagree that we should poach from outside projects too...? I don't necessarily disagree with your goals - I'm just saying those are not my goals. I think just like minix vs linux (should I mention hurd for stallman?), there are tradeoffs when trying to tackle some of these things modules style vs monolithic style. Yes, an OS is not Lucene/Solr, I'm going for more connotation than anything here. Now, if some people came in and just did things module style in a way that matches the monolithic style (quality, feature wise), and they do that module after module, that is one thing. But I think that is indeed a daunting task, and I think there are a lot of other things to focus on. The end result is not even any guarantee - we seem just as likely to end up with a mess of modules with all kinds of crazy interdependencies. It's really easy to say, yeah, everything should be a module, sounds great, but there are large practical issues there. And from an open source project perspective, it's all even harder to plan. That's why I'm so about case by case. I think poaching compatible license open source code is always okay. Or, more generally, you don't think Solr benefits from being opened up / modularized? I think there would be benefits for many types of modules. And perhaps some downsides for some depending on the developers involved and how long they stay involved, and some of the interdependency issues that seem likely. Overall, I'm not terribly concerned about modules - they are not on my short term priority list (Analyzers would be for sure though, thanks Robert!). 
On the one hand, you might think, well other Lucene users could take advantage of more of this stuff - and I see that as something kind of nice myself - but they already can use this stuff too - use Solr. So it's just not on the tip of my priority poll. I happily accept others are more concerned about it. To wrap up, like I've said a million times, I'm not against modules. I also just don't share that same long term vision right now I guess. Side note (plug): I have been playing with the benchmark module (who did that module? I had missed it), and I've got some cool stuff to show at Berlin Buzzwords this year for my solr performance talk! Mike http://blog.mikemccandless.com On Tue, May 3, 2011 at 1:11 PM, Mark Miller markrmil...@gmail.com wrote: On May 3, 2011, at 12:49 PM, Michael McCandless wrote: Isn't this the future we are working towards? No, not really. Others perhaps, but not me. I'm on board with some modules. I do think there are tradeoffs when considering them and considering Lucene and Solr. I'm happy to take everything one issue at a time. When I voted to merge, no, I certainly was not thinking, I hope in a year or two we have taken everything from Solr and made it a module. I did it for a few specific things to start - analyzers for sure, perhaps some other things as people did something that made sense. I did it so we could share some code more easily - not all code. Others did it for their own reasons I assume. But no - I'm not sure I have ever fully subscribed to what you are saying. 
- Mark Miller lucidimagination.com Lucene/Solr User Conference May 25-26, San Francisco www.lucenerevolution.org
Re: modularization discussion
On Wed, May 4, 2011 at 9:11 AM, Mark Miller markrmil...@gmail.com wrote: Side note (plug): I have been playing with the benchmark module (who did that module? I had missed it), and I've got some cool stuff to show at Berlin Buzzwords this year for my solr performance talk! we svn move'd it here: https://issues.apache.org/jira/browse/LUCENE-2845 We should feel free to make this depend upon solr now (I know we probably have to change some things about the build for that to totally work, but that's the idea).
RE: modularization discussion
From: Robert Muir [mailto:rcm...@gmail.com] Sent: Wednesday, May 04, 2011 3:30 PM To: dev@lucene.apache.org Subject: Re: modularization discussion On Wed, May 4, 2011 at 9:11 AM, Mark Miller markrmil...@gmail.com wrote: Side note (plug): I have been playing with the benchmark module (who did that module? I had missed it), and I've got some cool stuff to show at Berlin Buzzwords this year for my solr performance talk! we svn move'd it here: https://issues.apache.org/jira/browse/LUCENE-2845 We should feel free to make this depend upon solr now (I know we probably have to change some things about the build for that to totally work, but thats the idea). Hihi, Solr has no performance testing framework, see the issue from today (SOLR-2493). Uwe
Re: modularization discussion
On May 4, 2011, at 9:42 AM, Uwe Schindler wrote: Solr has no performance testing framework, see the issue from today (SOLR-2493). Come to Berlin Buzzwords! (I know you already are :) ) - Mark Miller lucidimagination.com Lucene/Solr User Conference May 25-26, San Francisco www.lucenerevolution.org
Re: modularization discussion
On Wed, May 4, 2011 at 3:49 PM, Mark Miller markrmil...@gmail.com wrote: On May 4, 2011, at 9:42 AM, Uwe Schindler wrote: Solr has no performance testing framework, see the issue from today (SOLR-2493). Come to Berlin Buzzwords! I think I will come :) simon (I know you already are :) ) - Mark Miller lucidimagination.com Lucene/Solr User Conference May 25-26, San Francisco www.lucenerevolution.org
[jira] [Created] (LUCENE-3069) Lucene should be able to have a entirely memory resident term dictionary
Lucene should be able to have a entirely memory resident term dictionary Key: LUCENE-3069 URL: https://issues.apache.org/jira/browse/LUCENE-3069 Project: Lucene - Java Issue Type: Improvement Components: Index, Search Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 FST based TermDictionary has been a great improvement yet it still uses a delta codec file for scanning to terms. Some environments have enough memory available to keep the entire FST based term dict in memory. We should add a TermDictionary implementation that encodes all needed information for each term into the FST (custom fst.Output) and builds a FST from the entire term not just the delta.
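The idea in the issue - key the dictionary by the whole term and encode everything a lookup needs into the associated output, so no delta-coded block on disk has to be scanned - can be illustrated with a toy sketch. This is plain Java, not the actual Lucene FST API; the class, the bit layout, and all names are hypothetical:

```java
import java.util.TreeMap;

// Toy memory-resident term dictionary: the full term is the key, and
// docFreq plus the postings-file offset are packed into a single long
// value, so a lookup is a pure in-memory map access.
class MemoryTermDict {
    // Pack docFreq into the high 24 bits, postings offset into the low 40.
    static long pack(int docFreq, long postingsOffset) {
        return ((long) docFreq << 40) | postingsOffset;
    }

    static int docFreq(long packed) {
        return (int) (packed >>> 40);
    }

    static long postingsOffset(long packed) {
        return packed & ((1L << 40) - 1);
    }

    // TreeMap stands in for the FST: sorted full-term keys -> packed output.
    private final TreeMap<String, Long> terms = new TreeMap<>();

    void add(String term, int docFreq, long postingsOffset) {
        terms.put(term, pack(docFreq, postingsOffset));
    }

    Long lookup(String term) {
        return terms.get(term);
    }
}
```

An FST would share prefixes and suffixes of the keys, so the memory cost is far lower than this map suggests; the sketch only shows the lookup contract.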
[jira] [Updated] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3069: Summary: Lucene should have an entirely memory resident term dictionary (was: Lucene should be able to have a entirely memory resident term dictionary)
[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028757#comment-13028757 ] Simon Willnauer commented on LUCENE-3018: - bq. OK, if I change the /linux to /solaris (in the build.xml), then on OpenSolaris I get the expected compilation errors (using the wrong IO flags). Can this somehow be done automagically...? kind of quick and dirty but we could simply include {noformat} pathelement location=${java.home}/../include/solaris/ {noformat} so we automatically build on solaris too? simon
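Simon's suggestion could be generalized so the JNI header directory is picked per platform instead of hardcoding /linux. A rough, untested build.xml sketch - the property name is illustrative and not from any attached patch:

```xml
<!-- Select the platform-specific JNI include directory automatically. -->
<condition property="jni.os.include" value="${java.home}/../include/solaris">
  <os name="SunOS"/>
</condition>
<condition property="jni.os.include" value="${java.home}/../include/linux">
  <os name="Linux"/>
</condition>

<!-- Then reference both dirs from the cpptasks <cc> includepath. -->
<includepath>
  <pathelement location="${java.home}/../include"/>
  <pathelement location="${jni.os.include}"/>
</includepath>
```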
[jira] [Created] (LUCENE-3070) Enable DocValues by default for every Codec
Enable DocValues by default for every Codec --- Key: LUCENE-3070 URL: https://issues.apache.org/jira/browse/LUCENE-3070 Project: Lucene - Java Issue Type: Task Components: Index Affects Versions: CSF branch Reporter: Simon Willnauer Fix For: CSF branch Currently DocValues are enabled with a wrapper Codec, so each codec which needs DocValues must be wrapped by DocValuesCodec. The DocValues writer and reader should be moved to Codec to be enabled by default.
[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3065: -- Attachment: LUCENE-3065.patch I added some javadocs to Document class: - getField() / getFields() is deprecated [we may change this in ] Some thoughts: - maybe we should make getField()/getFields() simply return null, or not include the Field in the returned array, if it is not an instanceof Field? We can note in the documentation that lazy-loaded and numerical fields are not returned. - I would also like to add a method Document.getNumericValue(s) that returns Number[] or Number, like the NumericField one. Like getField() above, it can return null/an empty array if the field name has no numeric Fields? The CHANGES entry may also be extended; currently it is under bugs - we should move it. NumericField should be stored in binary format in index (matching Solr's format) Key: LUCENE-3065 URL: https://issues.apache.org/jira/browse/LUCENE-3065 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Michael McCandless Assignee: Uwe Schindler Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch (Spinoff of LUCENE-3001) Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an ordinary Field and your number has turned into a string. See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972 We have spare bits already in stored fields, so, we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format. A nice side-effect is we fix the long standing issue that you don't get a NumericField back when loading your document.
[jira] [Issue Comment Edited] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028764#comment-13028764 ] Uwe Schindler edited comment on LUCENE-3065 at 5/4/11 2:44 PM: --- I added some javadocs to Document class: - getField() / getFields() is deprecated [we may change this in LUCENE-2310] Some thoughts: - maybe we should make getField()/getFields() simply return null, or not include the Field in the returned array, if it is not an instanceof Field? We can note in the documentation that lazy-loaded and numerical fields are not returned. - I would also like to add a method Document.getNumericValue(s) that returns Number[] or Number, like the NumericField one. Like getField() above, it can return null/an empty array if the field name has no numeric Fields? The CHANGES entry may also be extended; currently it is under bugs - we should move it. was (Author: thetaphi): I added some javadocs to Document class: - getField() / getFields() is deprecated [we may change this in ] Some thoughts: - maybe we should make getField()/getFields() simply return null or does not include the Field into the returned array, if its not instanceof Field? We can add that to documentation, that lazy loaded and numerical fields are not returned. - I would also like to add a method Document.getNumericValue(s), that returns Number[] or Number like the NumericField one. Like above getField() it can return null/empty array if the field name has no numeric Fields? The CHANGES entry may also be extended, currently it under bugs - we shold move.
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 7718 - Failure
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7718/ No tests ran. Build Log (for compile errors): [...truncated 81 lines...]
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 7717 - Failure
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/7717/ No tests ran. Build Log (for compile errors): [...truncated 19 lines...]
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 7718 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/7718/ No tests ran. Build Log (for compile errors): [...truncated 46 lines...]
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 7719 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7719/ No tests ran. Build Log (for compile errors): [...truncated 35 lines...]
[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028798#comment-13028798 ] Michael McCandless commented on LUCENE-3018: That sounds good for starters? Just stick /solaris (and others...?) in?
[jira] [Created] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed
PathHierarchyTokenizer adaptation for urls: splits reversed --- Key: LUCENE-3071 URL: https://issues.apache.org/jira/browse/LUCENE-3071 Project: Lucene - Java Issue Type: New Feature Components: contrib/analyzers Reporter: Olivier Favre Priority: Minor {{PathHierarchyTokenizer}} should be usable to split URLs in a reversed way (useful for faceted search against URLs): {{www.site.com}} - {{www.site.com, site.com, com}} Moreover, it should be able to skip a given number of first (or last, if reversed) tokens: {{/usr/share/doc/somesoftware/INTERESTING/PART}} should give, with 4 tokens skipped: {{INTERESTING}} {{INTERESTING/PART}}
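The reversed URL case described above can be sketched in a few lines of plain Java. This is not the PathHierarchyTokenizer implementation, and the skip semantics here (dropping leading elements) are only one plausible reading of the request; names are illustrative:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Emit one token per suffix of the delimited input, longest first:
// "www.site.com" with '.' yields [www.site.com, site.com, com].
class ReverseHierarchy {
    static List<String> tokens(String input, char delimiter, int skip) {
        String d = String.valueOf(delimiter);
        String[] parts = input.split(Pattern.quote(d));
        List<String> out = new ArrayList<>();
        // Each token starts at element i and runs to the end of the path;
        // `skip` drops that many leading elements before emitting.
        for (int i = skip; i < parts.length; i++) {
            out.add(String.join(d, Arrays.copyOfRange(parts, i, parts.length)));
        }
        return out;
    }
}
```

With faceting, these suffix tokens let a single indexed URL match at every level of its domain hierarchy.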
[jira] [Updated] (LUCENE-3072) 3.1 fileformats out of date
[ https://issues.apache.org/jira/browse/LUCENE-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3072: Attachment: LUCENE-3072.patch 3.1 fileformats out of date --- Key: LUCENE-3072 URL: https://issues.apache.org/jira/browse/LUCENE-3072 Project: Lucene - Java Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-3072.patch The 3.1 fileformats is missing the change from LUCENE-2811
[jira] [Created] (LUCENE-3072) 3.1 fileformats out of date
3.1 fileformats out of date --- Key: LUCENE-3072 URL: https://issues.apache.org/jira/browse/LUCENE-3072 Project: Lucene - Java Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-3072.patch The 3.1 fileformats is missing the change from LUCENE-2811
[jira] [Commented] (LUCENE-3072) 3.1 fileformats out of date
[ https://issues.apache.org/jira/browse/LUCENE-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028822#comment-13028822 ] Michael McCandless commented on LUCENE-3072: Looks good Robert, nice catch!
[jira] [Updated] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage
[ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Heisey updated SOLR-2462: --- Attachment: SOLR-2462_3_1.patch The original patch would not apply cleanly for me against 3.1 without fuzz and whitespace options, and when those are used, it applies incorrectly. Here's a new patch specific to 3.1. Before creating this, I checked 3.1 out from SVN and then applied the patch for SOLR-2469, which should not interfere in any way. Hopefully the patch is suitable. I am only putting it up here for convenience, in case anyone else runs into this. Using spellcheck.collate can result in extremely high memory usage -- Key: SOLR-2462 URL: https://issues.apache.org/jira/browse/SOLR-2462 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 3.1 Reporter: James Dyer Priority: Critical Fix For: 3.1.1, 4.0 Attachments: SOLR-2462.patch, SOLR-2462_3_1.patch When using spellcheck.collate, class SpellPossibilityIterator creates a ranked list of *every* possible correction combination. But if returning several corrections per term, and if several words are misspelled, the existing algorithm uses a huge amount of memory. This bug was introduced with SOLR-2010. However, it is triggered anytime spellcheck.collate is used. It is not necessary to use any features that were added with SOLR-2010. We were in Production with Solr for 1 1/2 days and this bug started taking our Solr servers down with infinite GC loops. It was pretty easy for this to happen as occasionally a user will accidentally paste the URL into the Search box on our app. This URL results in a search with ~12 misspelled words. We have spellcheck.count set to 15.
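The scale of the blow-up described in the issue is easy to check: with spellcheck.count=15 suggestions per term and ~12 misspelled terms, an exhaustive ranking of every collation must consider 15^12 (about 1.3 * 10^14) candidates. A back-of-the-envelope sketch (illustrative only, not Solr code):

```java
import java.math.BigInteger;

// Count the correction combinations SpellPossibilityIterator would have
// to rank if it enumerated every collation exhaustively: the product of
// suggestions-per-term over all misspelled terms.
class CollationBlowup {
    static BigInteger candidateCount(int suggestionsPerTerm, int misspelledTerms) {
        return BigInteger.valueOf(suggestionsPerTerm).pow(misspelledTerms);
    }
}
```

This is why the fix has to bound the search (cap candidates, or rank lazily) rather than materialize the full ranked list.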
[jira] [Resolved] (LUCENE-3072) 3.1 fileformats out of date
[ https://issues.apache.org/jira/browse/LUCENE-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-3072. - Resolution: Fixed Fix Version/s: 4.0 3.2 Committed revision 1099529, 1099534 (branch-3x)
[jira] [Created] (SOLR-2494) Error in Context.setSessionAttribute implementation (ContextImpl.putVal)
Error in Context.setSessionAttribute implementation (ContextImpl.putVal) Key: SOLR-2494 URL: https://issues.apache.org/jira/browse/SOLR-2494 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 3.1 Reporter: Tom Klonikowski Session attributes are set to SCOPE_ENTITY, if SCOPE_GLOBAL or SCOPE_SOLR_CORE is given, due to an error in org.apache.solr.handler.dataimport.ContextImpl.putVal in line 159 (entitySession.put instead of map.put).
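The bug report boils down to a one-line mistake: the method selected the right map for the scope but then wrote to the entity session unconditionally. A minimal sketch of the corrected dispatch - the class and field names here are illustrative, not the actual DataImportHandler code:

```java
import java.util.HashMap;
import java.util.Map;

// Session attributes must land in the map that matches the requested
// scope; the reported bug wrote to entitySession regardless of scope.
class ScopedSession {
    static final String SCOPE_ENTITY = "entity";
    static final String SCOPE_GLOBAL = "global";
    static final String SCOPE_SOLR_CORE = "solrcore";

    private final Map<String, Object> entitySession = new HashMap<>();
    private final Map<String, Object> globalSession = new HashMap<>();
    private final Map<String, Object> coreSession = new HashMap<>();

    void putVal(String scope, String name, Object val) {
        Map<String, Object> map =
              SCOPE_GLOBAL.equals(scope)    ? globalSession
            : SCOPE_SOLR_CORE.equals(scope) ? coreSession
            : entitySession;
        map.put(name, val); // the bug was effectively entitySession.put(name, val)
    }

    Object getVal(String scope, String name) {
        if (SCOPE_GLOBAL.equals(scope)) return globalSession.get(name);
        if (SCOPE_SOLR_CORE.equals(scope)) return coreSession.get(name);
        return entitySession.get(name);
    }
}
```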
[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position
[ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028895#comment-13028895 ] Doron Cohen commented on LUCENE-3068: - bq. specifically when the doc itself has tokens at the same position. I am not convinced yet that there is a bug here - I think the code does allow this? There is another assumption in the code, that any two different PPs are in different TPs - which underlines the assumption that originally each PP differs in position. This seems a valid assumption, because the QP will create an MFQ if there are two terms in the (phrase) query with the same position. bq. maybe any time a *PhraseQuery has overlapping positions, we should rewrite to a MultiPhraseQuery and let it handle the same positions...? Is there any downside to that? I think this is the correct behavior - in particular this is the query that a QP will create. The only way to create a PQ (not MPQ) for PPs in same positions is to create it manually. But why would anyone do that? And if they did, wouldn't such a rewrite be a surprise to them? A patch to follow with a revised version of this test - one that uses the QP. In this patch the QP indeed creates an MFQ, and I am yet unable to make it fail. Still trying. The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position -- Key: LUCENE-3068 URL: https://issues.apache.org/jira/browse/LUCENE-3068 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 3.0.3, 3.1, 4.0 Reporter: Michael McCandless Assignee: Doron Cohen Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3068.patch In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was matching docs that it shouldn't; but I think those changes caused it to fail to match docs that it should, specifically when the doc itself has tokens at the same position.
[jira] [Updated] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position
[ https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-3068: Attachment: LUCENE-3068.patch Attached modified version of the test - one that invokes the query parser to create an MFQ. The test passes.
[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3065: -- Attachment: LUCENE-3065-solr-only.patch Here a first step in the cutover of Solr to NumericField. Most tests work, except: - TestDistributedSearch, fails with a strange date problem - I have no idea what goes wrong - TestMoreLikeThis: fails because the returned documents are different than expected. The reason for this is simple: As TrieField's underlying Lucene fields now are NumericField, stringValue() returns something (in contrast, solr's old fields returned null because they were binary). This maybe confuses MoreLikeThis (maybe needs to be fixed in Lucene, I haven't looked into the code). Maybe we should simply exclude those fields or fix the test (I prefer the latter, because the numerics should also be taken into account). The following changes had to be made: - Cut over all places in Solr that used Field instead of the abstract Fieldable. This affects some leftover parts in various components (calling Document.getField instead of Document.getFieldable), but mainly SchemaField/FieldType: createField() now returns Fieldable - TrieDateField code duplication was removed, all methods delegate to a wrapped TrieField. There was also an inconsistency between TrieField and TrieDateField's toExternal(). This was fixed to work correctly (the date format was wrong, now it uses dateField.toExternal()) If somebody could help with the rest of the solr stuff and maybe test test test! Yonik? Ryan? There may be some itches not covered by tests. Thanks for help from Solr specialists (I am definitely not one, I am more afraid of the code than I can help)!!!
[jira] [Created] (SOLR-2495) update noggit json parser
update noggit json parser - Key: SOLR-2495 URL: https://issues.apache.org/jira/browse/SOLR-2495 Project: Solr Issue Type: Bug Reporter: Yonik Seeley Assignee: Yonik Seeley Fix For: 3.1.1 The latest version of noggit has fixes for long overflow detection (only important for numbers that don't fit in a long), and for a bug where corrupted JSON input could lead to an infinite loop.
Re: jira issues falling off the radar -- Next JIRA version
On May 2, 2011, at 7:54 PM, Chris Hostetter wrote: We should definitely kill off Next ... i would suggest just removing it, and not bulk applying a new version (there is no requirement that issues have a version) Chris, in JIRA, Next has this description: Placeholder for committers to track issues that are not ready to commit, but seem close enough to being ready to warrant focus before the next feature release Based on that, I think it would be irresponsible to just delete Next, because any issues assigned to this version on the basis of that description (like SOLR-2191) are going to be dropped on the floor. ~ David Smiley
[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.
[ https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028943#comment-13028943 ] Hoss Man commented on SOLR-2493: bq. this is solr's fault by having a getter that does some heavy duty xml shit. that sounds like some serious buck passing. All of the get methods on the Config class take in xpath expressions -- it should be obvious to anyone who uses them that they are going to do xpath parsing. By the looks of it, the SolrConfig constructor was already creating a public final luceneMatchVersion variable (using the xml-parsing-based Config method); it just wasn't getting used by the query parser. bq. In my opinion, the correct way to solve this is to make all methods in o.a.solr.core.Config protected as they should only be called by subclasses doing the actual parsing. I don't see how that would inherently protect us from this kind of mistake. The cause of the problem came from needing public access to a getLuceneVersion-type method on SolrConfig (which is a subclass of Config); even if all the methods in Config were protected, that could have very easily wound up being implemented like so ... {code} public Version getLuceneVersion() { return super.inefficientProtectedMethod(...) } {code} ...and we would have had the same problem. Bottom line: we just need to be careful about how/when the Config XML parsing methods are used (protected or otherwise) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit. 
- Key: SOLR-2493 URL: https://issues.apache.org/jira/browse/SOLR-2493 Project: Solr Issue Type: Bug Components: search Affects Versions: 3.1 Reporter: Stephane Bailliez Assignee: Uwe Schindler Priority: Blocker Labels: core, parser, performance, request, solr Fix For: 3.1.1, 3.2, 4.0 Attachments: SOLR-2493-3.x.patch, SOLR-2493.patch I'm putting this as blocker as I think this is a serious issue that should be addressed asap with a release. With the current code this is nowhere near suitable for production use. For each instance created, SolrQueryParser calls getSchema().getSolrConfig().getLuceneVersion(luceneMatchVersion, Version.LUCENE_24) instead of using getSchema().getSolrConfig().luceneMatchVersion This creates a massive performance hit. For each request, there are generally 3 query parsers created, and each of them will parse the xml node in config, which involves creating an instance of XPath; behind the scenes the usual factory finder pattern kicks in within the xml parser and does a loadClass. 
The stack is typically: at org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:363) at com.sun.org.apache.xml.internal.dtm.ObjectFactory.findProviderClass(ObjectFactory.java:506) at com.sun.org.apache.xml.internal.dtm.ObjectFactory.lookUpFactoryClass(ObjectFactory.java:217) at com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:131) at com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:101) at com.sun.org.apache.xml.internal.dtm.DTMManager.newInstance(DTMManager.java:135) at com.sun.org.apache.xpath.internal.XPathContext.init(XPathContext.java:100) at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:201) at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275) at org.apache.solr.core.Config.getNode(Config.java:230) at org.apache.solr.core.Config.getVal(Config.java:256) at org.apache.solr.core.Config.getLuceneVersion(Config.java:325) at org.apache.solr.search.SolrQueryParser.init(SolrQueryParser.java:76) at org.apache.solr.schema.IndexSchema.getSolrQueryParser(IndexSchema.java:277) With the current 3.1 code, I do barely 250 qps with 16 concurrent users with a near-empty index. Switching SolrQueryParser to use getSchema().getSolrConfig().luceneMatchVersion and doing a quick bench test, performance becomes reasonable, beyond 2000 qps.
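The fix described above can be sketched roughly as follows. This is a toy stand-in, not actual Solr code: a counter replaces the real XPath/DTM machinery, and the class names merely mirror those in the issue.

```java
// Toy sketch (not actual Solr code) of the caching fix: evaluate the
// version once, in the constructor, instead of on every SolrQueryParser
// instantiation.
public class ConfigCacheSketch {
    static int xpathEvaluations = 0; // stands in for the expensive XPath + loadClass work

    static class Config {
        // Simulates Config.getLuceneVersion(path, def): every call would
        // evaluate an XPath expression against the solrconfig.xml DOM.
        String getLuceneVersion(String path, String def) {
            xpathEvaluations++;
            return "LUCENE_31";
        }
    }

    static class SolrConfig extends Config {
        // The fix: evaluate once at startup and cache the result.
        final String luceneMatchVersion =
                getLuceneVersion("luceneMatchVersion", "LUCENE_24");
    }

    public static void main(String[] args) {
        SolrConfig config = new SolrConfig();
        // Roughly 3 query parsers per request; simulate many requests.
        String v = null;
        for (int request = 0; request < 1000; request++) {
            for (int qp = 0; qp < 3; qp++) {
                v = config.luceneMatchVersion; // cached field: no re-parse
            }
        }
        System.out.println(v + " parsed " + xpathEvaluations + " time(s)");
        // prints: LUCENE_31 parsed 1 time(s)
    }
}
```

Reading a final field is essentially free, which is consistent with the jump from ~250 qps to beyond 2000 qps reported above once the per-instance XPath evaluation was removed.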
[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.
[ https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028954#comment-13028954 ] Robert Muir commented on SOLR-2493: --- {quote} All of the get methods on the Config class take in xpath expressions -- it should be obvious to anyone who uses them that they are going to do xpath parsing. {quote} How is that obvious? There's definitely no javadoc saying this. In general, if you have an api that contains XYZ and you add a getXYZ() with absolutely no javadocs that behaves as more than a getter, that's a trap. So I still agree with Uwe: it should be protected to prevent problems, and it would also be nice if these methods were called *parse*XYZ() instead of *get*XYZ(). Otherwise this is going to continue to happen!
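Robert's suggestion might look something like the sketch below. The names are hypothetical, not actual Solr code: the expensive lookup becomes protected and is renamed parse*, so the cost is visible at the call site and only a subclass doing the one-time parse can reach it.

```java
// Hypothetical sketch of the "protected + parse* rename" idea; not Solr code.
public class NamingSketch {
    static class Config {
        // Was effectively: public double getDouble(String xpath, double def)
        // Protected and renamed so the XPath cost is obvious, and callers
        // outside the parsing subclass can't invoke it per-request.
        protected double parseDouble(String xpath, double def) {
            // stand-in for an XPath evaluation against the config DOM
            return "luceneMatchVersion".equals(xpath) ? 3.1 : def;
        }
    }

    static class SolrConfig extends Config {
        // The subclass parses once; everyone else reads a plain field.
        final double luceneMatchVersion = parseDouble("luceneMatchVersion", 2.4);
    }

    public static void main(String[] args) {
        System.out.println(new SolrConfig().luceneMatchVersion); // prints 3.1
    }
}
```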
[jira] [Updated] (LUCENE-3073) make compoundfilewriter public
[ https://issues.apache.org/jira/browse/LUCENE-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3073: Attachment: LUCENE-3073.patch make compoundfilewriter public -- Key: LUCENE-3073 URL: https://issues.apache.org/jira/browse/LUCENE-3073 Project: Lucene - Java Issue Type: Improvement Reporter: Robert Muir Priority: Minor Attachments: LUCENE-3073.patch CompoundFileReader is public, but CompoundFileWriter is not. I propose we make it public + @lucene.internal instead (just in case someone else finds themselves wanting to manipulate cfs files)
[jira] [Created] (LUCENE-3073) make compoundfilewriter public
make compoundfilewriter public -- Key: LUCENE-3073 URL: https://issues.apache.org/jira/browse/LUCENE-3073 Project: Lucene - Java Issue Type: Improvement Reporter: Robert Muir Priority: Minor
[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.
[ https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028974#comment-13028974 ] Stephane Bailliez commented on SOLR-2493: - The problem is hardly about naming here; it is about correctly using classes when offered the choice. A mistake was made. That's it. We expect committers to be sufficiently knowledgeable about the codebase when committing code. That's true anywhere. You can hardly expect a service ItemService to have methods such as getItemFromDatabase() or getItemFromServerOnTheOtherSideOfThePlanet() or getItemFromFile() or getItemFromMemory() if there are 4 different implementations of it; you have getItem(), and the 4 different implementations do something different internally. I rather actually wonder why the config is not parsed entirely at startup rather than having nodes lying around and cherry-picked.
[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.
[ https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028975#comment-13028975 ] Uwe Schindler commented on SOLR-2493: - bq. The cause of the problem came from needing public access to a getLuceneVersion type method on SolrConfig (which is a subclass of Config) This is not true. getLuceneVersion is in Config, not SolrConfig, and it's public like all the other getXxx() methods. Version is just a datatype like int/float/String. That's all. In general, the bad thing about the whole config stuff in solr is mixing parsing and value holder. These should theoretically be separate classes. So SolrConfig would have no parse methods at all. In its ctor it would simply instantiate the ConfigParser (name the class like that) and use it to set the values in SolrConfig. That would be correct design. The good thing with this design: one could instantiate a SolrConfig and populate it programmatically, or via a JSON parser, or whatever.
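Uwe's proposed separation could be sketched like this. ConfigParser is his suggested (hypothetical) name, and a Map stands in for the parsed XML DOM; none of this is actual Solr code.

```java
import java.util.Map;

// Illustrative sketch of separating the value holder from the parser.
public class SeparationSketch {

    // Pure value holder: no XML/XPath machinery is referenced after startup.
    static class SolrConfig {
        final String luceneMatchVersion;
        SolrConfig(String luceneMatchVersion) {
            this.luceneMatchVersion = luceneMatchVersion;
        }
    }

    // All parsing lives here; the instance is only needed during construction.
    static class ConfigParser {
        private final Map<String, String> dom; // stand-in for the parsed XML

        ConfigParser(Map<String, String> dom) { this.dom = dom; }

        SolrConfig parse() {
            // Every lookup happens exactly once, at startup.
            return new SolrConfig(dom.getOrDefault("luceneMatchVersion", "LUCENE_24"));
        }
    }

    public static void main(String[] args) {
        SolrConfig config =
                new ConfigParser(Map.of("luceneMatchVersion", "LUCENE_31")).parse();
        // The parser (and its DOM) is now unreachable and can be garbage
        // collected; SolrConfig could equally be populated programmatically
        // or from JSON, as the comment notes.
        System.out.println(config.luceneMatchVersion); // prints LUCENE_31
    }
}
```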
[jira] [Issue Comment Edited] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.
[ https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028975#comment-13028975 ] Uwe Schindler edited comment on SOLR-2493 at 5/4/11 9:32 PM. The edit adds one sentence to the previous comment: It does not need to be public (like all other getters in Config class).
[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.
[ https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028980#comment-13028980 ] Uwe Schindler commented on SOLR-2493: - bq. I rather actually wonder why the config is not parsed entirely at startup rather than have nodes lying around and cherry-picked. It mostly is, and should be. The problem here is as noted before: SolrConfig subclasses Config, which is only for parsing. SolrConfig should simply subclass Object and instantiate a parser in its ctor to parse and store all parsed content in itself. After that the parser is useless and can be freed. This would even free the DTM/DOM that otherwise stays alive until Solr shuts down.
[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.
[ https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028986#comment-13028986 ] Yonik Seeley commented on SOLR-2493: bq. How is that obvious? The signature of these methods might be a tip-off: public double getDouble(String path, double def) One can't be passing a String and have no idea what the string is used for ;-)
Improvements to the maven build
Steve Rowe, I thought I'd put together a list of interesting differences between the ant build output and the maven build output. Before each build I did a full clean, and then after the build I saved a file listing to a text file so that I could diff it. I'm using svn revision 1087373 (March 31st).
1. The ant build invokes a JSP compiler to validate the JSPs. The maven build does not.
2. Maven seems(?) to compile more modules' tests than the ant build does.
3. The ant build builds the tools module. The maven build does not. Probably fine if it stays this way?
4. Ant doesn't build the benchmark module; maven will by default. A problem for the ant build?
5. The ant build artifacts tend to have a leading apache- in front of them, but the maven artifactId does not have this, so the artifact file names are different -- trivially so, anyway.
6. The ant solr build puts all its final artifacts into the solr/dist directory; the maven build does not -- it leaves all of them in their build directory. Not a big deal, but maybe there's a way to have the output file go someplace else? Not sure.
There were two issues that seemed like clear bugs to me, which I fixed with the attached patch.
1. solrj's build directory and compile output directory were the same directory, but that's problematic since building the output jar will result in an error if it sees its own jar file as an input file to its output jar. So I added a classes directory. This will result in a different directory than where ant builds, though.
2. The dataimporthandler-extras output location was specified such that there was a redundant path (/extras/extras/), so I fixed this.
By the way, I think it would be really nice to have a maven build instructions file that is put into place when the get-maven-poms task is run. 
The file would have the essential instructions, it would explain some of the relevant differences from the ant build (notably output file placement and file name differences), and it would include tips such as how to install the sources and javadoc jar files into your local repo. ~ David Smiley mvnfix.patch Description: mvnfix.patch
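The solrj fix described above might look roughly like this in the module's pom. This is a hypothetical fragment, not the actual patch: the idea is just that compilation gets its own classes/ directory, separate from the directory where the jar is written.

```xml
<!-- Hypothetical pom fragment: compile into a dedicated classes/ directory
     so the jar plugin never sees the module's previously built jar as an
     input file to the new jar. -->
<build>
  <outputDirectory>${project.build.directory}/classes</outputDirectory>
</build>
```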
[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.
[ https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028989#comment-13028989 ] Uwe Schindler commented on SOLR-2493: - We should not again fight here against each other. The problem is fixed, we could release 3.1.1 if we fixed the last slowdown in MultiPhraseQuery. The discussion here is just about how to prevent this. For me as a non-Solr comitter, when I did this code with Robert last year, I was also really confused about the design of Config (and in my opinion this is a wrong design). We should maybe open another issue and separate parsing and value-holding in two spearate classes (SolrConfig and ConfigParser). If we would do this all is solved (see above). SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit. - Key: SOLR-2493 URL: https://issues.apache.org/jira/browse/SOLR-2493 Project: Solr Issue Type: Bug Components: search Affects Versions: 3.1 Reporter: Stephane Bailliez Assignee: Uwe Schindler Priority: Blocker Labels: core, parser, performance, request, solr Fix For: 3.1.1, 3.2, 4.0 Attachments: SOLR-2493-3.x.patch, SOLR-2493.patch I' m putting this as blocker as I think this is a serious issue that should be adressed asap with a release. With the current code this is no way near suitable for production use. For each instance created SolrQueryParser calls getSchema().getSolrConfig().getLuceneVersion(luceneMatchVersion, Version.LUCENE_24) instead of using getSchema().getSolrConfig().luceneMatchVersion This creates a massive performance hit. For each request, there is generally 3 query parsers created and each of them will parse the xml node in config which involve creating an instance of XPath and behind the scene the usual factory finder pattern quicks in within the xml parser and does a loadClass. 
The stack is typically:
  at org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:363)
  at com.sun.org.apache.xml.internal.dtm.ObjectFactory.findProviderClass(ObjectFactory.java:506)
  at com.sun.org.apache.xml.internal.dtm.ObjectFactory.lookUpFactoryClass(ObjectFactory.java:217)
  at com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:131)
  at com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:101)
  at com.sun.org.apache.xml.internal.dtm.DTMManager.newInstance(DTMManager.java:135)
  at com.sun.org.apache.xpath.internal.XPathContext.init(XPathContext.java:100)
  at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:201)
  at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
  at org.apache.solr.core.Config.getNode(Config.java:230)
  at org.apache.solr.core.Config.getVal(Config.java:256)
  at org.apache.solr.core.Config.getLuceneVersion(Config.java:325)
  at org.apache.solr.search.SolrQueryParser.init(SolrQueryParser.java:76)
  at org.apache.solr.schema.IndexSchema.getSolrQueryParser(IndexSchema.java:277)
With the current 3.1 code, I do barely 250 qps with 16 concurrent users and a near-empty index. Switching SolrQueryParser to use getSchema().getSolrConfig().luceneMatchVersion and doing a quick bench test, performance becomes reasonable, beyond 2000 qps. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
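The fix amounts to reading the already-parsed luceneMatchVersion value instead of re-running the XPath lookup for every parser instance. A minimal sketch of that caching shape (the class and method names below are invented for illustration, not Solr's actual source):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ConfigCacheSketch {
    // Counts how often the expensive XML/XPath path actually runs.
    static final AtomicInteger xpathEvaluations = new AtomicInteger();

    // Stand-in for the expensive path: XPath evaluation plus the
    // factory-finder loadClass dance described in the stack trace.
    static String parseVersionFromXml() {
        xpathEvaluations.incrementAndGet();
        return "LUCENE_31";
    }

    // Parsed exactly once, at "config load" time.
    static final String luceneMatchVersion = parseVersionFromXml();

    // Cheap accessor used per query-parser instance.
    static String getLuceneMatchVersion() {
        return luceneMatchVersion;
    }

    public static void main(String[] args) {
        // Simulate 3 query parsers per request over 1000 requests.
        for (int i = 0; i < 3000; i++) {
            getLuceneMatchVersion();
        }
        // The XML was parsed once, not 3000 times.
        System.out.println(xpathEvaluations.get()); // 1
    }
}
```

With the per-instance getLuceneVersion(...) call, the counter would instead track the number of parser instantiations, which is the reported 250 qps vs 2000+ qps difference.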
[jira] [Resolved] (SOLR-2495) update noggit json parser
[ https://issues.apache.org/jira/browse/SOLR-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-2495. Resolution: Fixed update noggit json parser - Key: SOLR-2495 URL: https://issues.apache.org/jira/browse/SOLR-2495 Project: Solr Issue Type: Bug Reporter: Yonik Seeley Assignee: Yonik Seeley Fix For: 3.1.1 The latest version of noggit has fixes for long overflow detection (only important for numbers that don't fit in a long), and for a bug where corrupted JSON input could lead to an infinite loop.
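Long overflow detection of this kind typically means checking, before each digit is folded into the accumulator, whether the result would exceed Long.MAX_VALUE. A hedged sketch of the idea (this is not noggit's actual code, just the class of check a JSON number parser needs):

```java
public class LongOverflowSketch {
    // Returns true if the non-negative decimal digit string fits in a
    // signed 64-bit long. Checks before each multiply-and-add step so
    // the accumulator itself can never silently wrap around.
    static boolean fitsInPositiveLong(String digits) {
        long acc = 0;
        for (int i = 0; i < digits.length(); i++) {
            int d = digits.charAt(i) - '0';
            // acc * 10 + d would exceed Long.MAX_VALUE
            if (acc > (Long.MAX_VALUE - d) / 10) return false;
            acc = acc * 10 + d;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(fitsInPositiveLong("9223372036854775807")); // Long.MAX_VALUE: true
        System.out.println(fitsInPositiveLong("9223372036854775808")); // one past it: false
    }
}
```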
[jira] [Created] (SOLR-2496) JSON Update Handler doesn't handle multiple docs properly
JSON Update Handler doesn't handle multiple docs properly - Key: SOLR-2496 URL: https://issues.apache.org/jira/browse/SOLR-2496 Project: Solr Issue Type: Bug Components: update Affects Versions: 3.1 Reporter: Neil Hooey The following is the current Solr 3.1 format for sending multiple documents by JSON. It's not analogous to the XML method, and isn't easily generated and serialized from a hash in Perl, Python, Ruby, et al. to JSON, because it has duplicate keys for "add". It's cited at this page: http://wiki.apache.org/solr/UpdateJSON Near the text: Here's a simple example of adding more than one document at once:

{
  "add": {"doc": {"id": "TestDoc1", "title": "test1"}},
  "add": {"doc": {"id": "TestDoc2", "title": "another test"}}
}

Here's a better format that's analogous to the XML method of submission, and is easily serialized from a hash to JSON:

{
  "add": {
    "doc": [
      {"id": "TestDoc1", "title": "test1"},
      {"id": "TestDoc2", "title": "another test"}
    ]
  }
}

The original XML method:

<add>
  <doc>
    <field name="id">TestDoc1</field>
    <field name="title">test1</field>
  </doc>
  <doc>
    <field name="id">TestDoc2</field>
    <field name="title">test2</field>
  </doc>
</add>
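The duplicate-key complaint can be made concrete: in essentially every language's standard map type, a second put of the same key replaces the first, so the current format cannot round-trip through the natural in-memory representation, while the proposed list-valued format can. A small illustrative sketch (plain collections, not SolrJ or any real client API):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JsonKeySketch {
    // Building the current wire format from a map: the duplicate "add"
    // key collapses, silently losing the first document.
    static int currentFormatAddCount() {
        Map<String, Object> current = new LinkedHashMap<>();
        current.put("add", Map.of("doc", Map.of("id", "TestDoc1", "title", "test1")));
        current.put("add", Map.of("doc", Map.of("id", "TestDoc2", "title", "another test")));
        return current.size(); // only one "add" entry survives
    }

    // The proposed format: one "add" key holding a list of docs, which
    // maps cleanly onto hashes/lists in Perl, Python, Ruby, Java, ...
    static int proposedFormatDocCount() {
        Map<String, Object> proposed = Map.of("add", Map.of("doc", List.of(
                Map.of("id", "TestDoc1", "title", "test1"),
                Map.of("id", "TestDoc2", "title", "another test"))));
        List<?> docs = (List<?>) ((Map<?, ?>) proposed.get("add")).get("doc");
        return docs.size(); // both docs survive
    }

    public static void main(String[] args) {
        System.out.println(currentFormatAddCount());  // 1
        System.out.println(proposedFormatDocCount()); // 2
    }
}
```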
[jira] [Updated] (SOLR-2496) JSON Update Handler doesn't handle multiple docs properly
[ https://issues.apache.org/jira/browse/SOLR-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Hooey updated SOLR-2496: - Description: (the JSON and XML examples were wrapped in {code} blocks; content otherwise as in the original report) Issue Type: Improvement (was: Bug) Labels: json, update Original Estimate: 4h Remaining Estimate: 4h
[jira] [Updated] (SOLR-2496) JSON Update Handler doesn't handle multiple docs properly
[ https://issues.apache.org/jira/browse/SOLR-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Hooey updated SOLR-2496: - Remaining Estimate: (was: 4h) Original Estimate: (was: 4h)
[jira] [Updated] (SOLR-2496) JSON Update Handler doesn't handle multiple docs properly
[ https://issues.apache.org/jira/browse/SOLR-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neil Hooey updated SOLR-2496: - Description: (edited; the JSON and XML examples are unchanged from the original report)
[jira] [Commented] (SOLR-2496) JSON Update Handler doesn't handle multiple docs properly
[ https://issues.apache.org/jira/browse/SOLR-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028999#comment-13028999 ] Yonik Seeley commented on SOLR-2496: Yeah, I agree we should be able to add multiple docs w/o having to repeat tags in the same hash/object. I proposed something like what you have, and the original thinking behind the current format is in this issue: SOLR-945
[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.
[ https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029003#comment-13029003 ] Ryan McKinley commented on SOLR-2493: - bq. I was also really confused about the design of Config (and in my opinion this is a wrong design) Like many things in solr/lucene, the current design is the product of many incremental back-compatible changes -- not a top-down view of what it should be. I would love to use 4.0 as a chance to revisit configs and their relationship to xml/validation etc, but that is a load of work with very little glory...
Re: Improvements to the maven build
Do you want to make a JIRA issue with a patch? This is a good example of a patch that is easy to get committed quickly because it is simple, clear, and understandable. ryan

On Wed, May 4, 2011 at 5:41 PM, Smiley, David W. dsmi...@mitre.org wrote:

Steve Rowe, I thought I'd put together a list of interesting differences between the ant build output and the maven build output. Before each build I did a full clean, and after the build I saved a file listing to a text file so that I could diff it. I'm using svn revision 1087373 (March 31st).
1. The ant build invokes a JSP compiler to validate the JSPs. The maven build does not.
2. Maven seems(?) to compile more modules' tests than the ant build does.
3. The ant build builds the tools module. The maven build does not. Probably fine if it stays this way?
4. Ant doesn't build the benchmark module; maven will by default. A problem for the ant build?
5. The ant build artifacts tend to have a leading "apache-" in front of them, but the maven artifactId does not, so the artifact file names differ, though only trivially.
6. The ant solr build puts all its final artifacts into the solr/dist directory; the maven build does not -- it leaves them in their build directories. Not a big deal, but maybe there's a way to have the output go someplace else? Not sure.
There were two issues that seemed like clear bugs to me, which I fixed with the attached patch.
1. solrj's build directory and compile output directory were the same directory, which is problematic since building the output jar will result in an error if it sees its own jar file as an input to its output jar. So I added a classes directory. This will result in a different directory than where ant builds, though.
2. The dataimporthandler-extras output location was specified such that there was a redundant path (/extras/extras/), so I fixed this.
By the way, I think it would be really nice to have a maven build instructions file that is put into place when the get-maven-poms task is run. The file would have the essential instructions, explain some of the relevant differences from the ant build (notably output file placement and file name differences), and include tips such as how to install the sources and javadoc jar files into your local repo. ~ David Smiley
[jira] [Created] (SOLR-2497) Move Solr to new NumericField stored field impl of LUCENE-3065
Move Solr to new NumericField stored field impl of LUCENE-3065 -- Key: SOLR-2497 URL: https://issues.apache.org/jira/browse/SOLR-2497 Project: Solr Issue Type: Improvement Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.2, 4.0 This implements the changes to NumericField (LUCENE-3065) in Solr. TrieField & Co would use NumericField for indexing and reading stored fields. To enable this, some missing changes in Solr's internals (Field -> Fieldable) need to be done. Also, some backwards-compatible stored-field parsing is needed to read pre-3.2 indexes without reindexing (as the format changed a little bit and Document.getFieldable returns NumericField instances now).
[jira] [Updated] (SOLR-2497) Move Solr to new NumericField stored field impl of LUCENE-3065
[ https://issues.apache.org/jira/browse/SOLR-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-2497: Attachment: SOLR-2497.patch

Patch applies to the 3.2 branch only and needs the patch from LUCENE-3065 applied first. Here is a first step in the cutover of Solr to NumericField. Most tests work, except:
- TestDistributedSearch fails with a strange date problem - I have no idea what goes wrong.
- TestMoreLikeThis fails because the returned documents differ from those expected. The reason is simple: as TrieField's underlying Lucene fields are now NumericField, stringValue() returns something (in contrast, Solr's old fields returned null because they were binary). This maybe confuses MoreLikeThis (maybe it needs to be fixed in Lucene; I haven't looked into the code). Maybe we should simply exclude those fields or fix the test (I prefer the latter, because the numerics should also be taken into account).
The following changes had to be made:
- Cut over all places in Solr that use Field instead of the abstract Fieldable. This affects some leftover parts in various components (calling Document.getField instead of Document.getFieldable), but mainly SchemaField/FieldType: createField() now returns Fieldable.
- TrieDateField code duplication was removed; all methods delegate to a wrapped TrieField. There was also an inconsistency between TrieField's and TrieDateField's toExternal(). This was fixed to work correctly (the date format was wrong; now it uses dateField.toExternal()).
If somebody could help with the rest of the Solr stuff and maybe test, test, test! Yonik? Ryan? There may be some itches not covered by tests. Thanks for help from Solr specialists (I am definitely not one; I am more afraid of the code than I can help)!!!
Move Solr to new NumericField stored field impl of LUCENE-3065 -- Key: SOLR-2497 URL: https://issues.apache.org/jira/browse/SOLR-2497 Attachments: SOLR-2497.patch
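The delegation refactor described in the patch comment above (TrieDateField forwarding to a wrapped TrieField, converting dates only at the boundary) can be sketched like this; the class names are stand-ins, not the real Solr types:

```java
import java.util.Date;

public class DelegationSketch {
    // Stand-in for TrieField: owns the numeric encoding (positive
    // values only in this sketch, since toHexString of a negative
    // long would not round-trip through parseLong).
    static class TrieLikeField {
        String toInternal(long value) { return Long.toHexString(value); }
        long fromInternal(String internal) { return Long.parseLong(internal, 16); }
    }

    // Stand-in for TrieDateField: no duplicated encoding logic.
    // All storage work delegates to the wrapped field; Date <-> long
    // (epoch millis) conversion happens only at the edges.
    static class TrieDateLikeField {
        private final TrieLikeField wrapped = new TrieLikeField();
        String toInternal(Date d) { return wrapped.toInternal(d.getTime()); }
        Date fromInternal(String internal) { return new Date(wrapped.fromInternal(internal)); }
    }

    public static void main(String[] args) {
        TrieDateLikeField field = new TrieDateLikeField();
        Date d = new Date(1_000_000_000_000L);
        // Round-trips through the wrapped numeric encoding unchanged.
        System.out.println(field.fromInternal(field.toInternal(d)).getTime()); // 1000000000000
    }
}
```

The design benefit is the one the comment names: a fix to the numeric encoding (or to toExternal formatting) lands in exactly one class instead of two diverging copies.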
[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3065: -- Comment: was deleted (was: the SOLR-2497 patch comment quoted in full above)
NumericField should be stored in binary format in index (matching Solr's format) Key: LUCENE-3065 URL: https://issues.apache.org/jira/browse/LUCENE-3065 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Michael McCandless Assignee: Uwe Schindler Priority: Minor Fix For: 3.2, 4.0 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch (Spinoff of LUCENE-3001) Today when writing stored fields we don't record that the field was a NumericField, and so at IndexReader time you get back an ordinary Field and your number has turned into a string. See https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972 We have spare bits already in stored fields, so we should use one to record that the field is numeric, and then encode the numeric field in Solr's more-compact binary format. A nice side effect is that we fix the long-standing issue that you don't get a NumericField back when loading your document.
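The spare-bit idea can be sketched: reserve one bit in the stored-field flags byte to mark the value as numeric, write the long in fixed binary instead of as text, and have the reader hand back a number rather than a string. The bit position and layout below are invented for illustration; the real Lucene index format differs:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class NumericFlagSketch {
    // Illustrative spare bit in the per-field flags byte.
    static final int FLAG_NUMERIC_LONG = 1 << 2;

    static byte[] writeStored(long value) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeByte(FLAG_NUMERIC_LONG); // field flags, numeric bit set
        out.writeLong(value);             // fixed 8 bytes, vs up to 19 chars as text
        return bos.toByteArray();
    }

    static Object readStored(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        int flags = in.readByte();
        if ((flags & FLAG_NUMERIC_LONG) != 0) {
            return in.readLong(); // comes back as a Long, not a String
        }
        return in.readUTF();      // ordinary string-valued stored field
    }

    public static void main(String[] args) throws IOException {
        Object roundTripped = readStored(writeStored(123456789L));
        // The reader recovers a number; without the flag it would have
        // to return the value as text, which is the bug described above.
        System.out.println(roundTripped); // 123456789
    }
}
```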
[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029011#comment-13029011 ] Uwe Schindler commented on LUCENE-3065: --- I started a new issue in Solr for the changes there: SOLR-2497
[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3065: -- Attachment: (was: LUCENE-3065-solr-only.patch)
[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)
[ https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3065: -- Issue Type: Improvement (was: Bug)
[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parses luceneMatchVersion in solrconfig. Large performance hit.
[ https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029018#comment-13029018 ] Uwe Schindler commented on SOLR-2493: - Ryan: I agree, this is why I always bring this up. With 4.0 we can reimplement APIs. On the other hand: I thought Solr's backwards-compatibility policy is about the public HTTP/REST APIs, so why care about the implementation details behind them, and why do we need to keep them backwards compatible? This is just a dumb question I've never understood. As long as Solr behaves identically to the outside, who cares if we change method signatures/class names? SolrQueryParser constantly parses luceneMatchVersion in solrconfig. Large performance hit. - Key: SOLR-2493 URL: https://issues.apache.org/jira/browse/SOLR-2493 Project: Solr Issue Type: Bug Components: search Affects Versions: 3.1 Reporter: Stephane Bailliez Assignee: Uwe Schindler Priority: Blocker Labels: core, parser, performance, request, solr Fix For: 3.1.1, 3.2, 4.0 Attachments: SOLR-2493-3.x.patch, SOLR-2493.patch I'm putting this as a blocker as I think this is a serious issue that should be addressed asap with a release. With the current code this is nowhere near suitable for production use. For each instance created, SolrQueryParser calls getSchema().getSolrConfig().getLuceneVersion(luceneMatchVersion, Version.LUCENE_24) instead of using getSchema().getSolrConfig().luceneMatchVersion. This creates a massive performance hit. For each request there are generally three query parsers created, and each of them will parse the XML node in the config, which involves creating an XPath instance; behind the scenes the usual factory-finder pattern kicks in within the XML parser and does a loadClass.
The stack is typically:

at org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:363)
at com.sun.org.apache.xml.internal.dtm.ObjectFactory.findProviderClass(ObjectFactory.java:506)
at com.sun.org.apache.xml.internal.dtm.ObjectFactory.lookUpFactoryClass(ObjectFactory.java:217)
at com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:131)
at com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:101)
at com.sun.org.apache.xml.internal.dtm.DTMManager.newInstance(DTMManager.java:135)
at com.sun.org.apache.xpath.internal.XPathContext.<init>(XPathContext.java:100)
at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:201)
at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
at org.apache.solr.core.Config.getNode(Config.java:230)
at org.apache.solr.core.Config.getVal(Config.java:256)
at org.apache.solr.core.Config.getLuceneVersion(Config.java:325)
at org.apache.solr.search.SolrQueryParser.<init>(SolrQueryParser.java:76)
at org.apache.solr.schema.IndexSchema.getSolrQueryParser(IndexSchema.java:277)

With the current 3.1 code I do barely 250 qps with 16 concurrent users and a near-empty index. Switching SolrQueryParser to use getSchema().getSolrConfig().luceneMatchVersion and doing a quick bench test, performance becomes reasonable, beyond 2000 qps.
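The fix described above amounts to parsing the version once, when the config is loaded, and having every query parser read the cached value. A minimal standalone sketch of that pattern (the class names here are illustrative stand-ins, not the actual Solr classes):

```java
// Hypothetical sketch of the SOLR-2493 fix: do the expensive XPath-backed
// config lookup exactly once at load time, then serve a cached field on the
// hot path instead of re-evaluating XPath for every query parser.
public class CachedVersionSketch {

    // Stands in for SolrConfig: the costly parse happens once, in the
    // constructor, and the result lives in a final field
    // (like SolrConfig.luceneMatchVersion).
    static class Config {
        final String luceneMatchVersion; // cached at load time
        int xpathEvaluations = 0;        // counts expensive parses, for illustration

        Config(String rawConfigValue) {
            this.luceneMatchVersion = parseWithXPath(rawConfigValue);
        }

        // Stand-in for Config.getLuceneVersion(), which evaluates XPath and
        // triggers factory-finder classloading on every call.
        String parseWithXPath(String raw) {
            xpathEvaluations++;
            return raw.trim();
        }
    }

    public static void main(String[] args) {
        Config config = new Config(" LUCENE_31 ");
        // Simulate many requests, each creating ~3 query parsers:
        for (int request = 0; request < 1000; request++) {
            for (int parser = 0; parser < 3; parser++) {
                String v = config.luceneMatchVersion; // cheap field read, no XPath
            }
        }
        // The expensive parse ran once, not 3000 times.
        System.out.println("xpath evaluations: " + config.xpathEvaluations);
    }
}
```

The reported jump from ~250 qps to beyond 2000 qps is plausible given that the cached read removes an XPath evaluation plus a factory-finder loadClass from every parser construction.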
RE: Improvements to the maven build
Hi David,

I thought I'd put together a list of interesting differences between the ant build output and the maven build output. Before each build I did a full clean and then after the build I saved a file listing to a text file so that I could diff it.

Cool! Thanks for the effort.

1. The ant build invokes a JSP compiler to validate them. The maven build does not.

My take on the Maven build is that it should ensure the POMs are in sync with the artifacts produced. I don't think the Maven build needs to perform all of the validity checks that the Ant build performs (e.g. also javadocs). I'd like to keep the build simple, so that maintenance is easier and keeping the POMs in sync is thereby more likely.

2. Maven seems(?) to compile more modules' tests than the ant build does.

Not sure why this would be, but I don't think it's necessarily a problem (for the Maven build).

3. The ant build builds the tools module. The maven build does not. Probably fine if it stays this way?

In general, code under tools is used to generate other code, the results of which are kept checked in. So I agree, the Maven build doesn't need to support this, KISS principlishly.

4. Ant doesn't build the benchmark module; maven will by default. A problem for the ant build?

I don't think it's necessary to enable this by default in Ant. For the Maven build, it's just simpler to include every module - that way, every module's POM is checked.

5. The ant build artifacts tend to have a leading apache- in front of them. But the maven artifactId does not have this, so the artifact file names are different, trivially so anyway.

This different naming of artifacts has always been so, as far as I can tell, for as long as Solr has released Maven artifacts. The conventional Maven artifact name template is artifactId-version[-classifier].jar, where -classifier is optional; artifactId would have to be apache-solr in order to make this change.
The full artifact name would be org.apache.solr:apache-solr:version, which seems weird to me in the double inclusion of apache. I'm guessing this also seemed weird to whoever first put the Maven artifact naming scheme in place. Specifying finalName in the maven-jar-plugin might alternatively do the trick? If this were done, though, the Maven naming convention would not be followed, and that's fairly predictably an omen of Bad Things To Come. (BTW, Lucene's Maven artifacts are the same as the regular ones - this is a Solr-only issue.)

6. The ant solr build puts all its final artifacts into the solr/dist directory; the maven build does not--it leaves all of them in their build directories. Not a big deal, but maybe there's a way to have the output file go someplace else? Not sure.

I meant to keep the Maven build output location the same as the Ant build output location. I think the Solr modules' POMs can and should be changed to eliminate this difference.

There were two issues that seemed like clear bugs to me that I fixed with an attached patch.

1. solrj's build directory and compile output directory were the same directory, but that's problematic, since building the output jar will result in an error if it sees its own jar file as an input file to its output jar. So I added a classes directory. This will result in a different directory than where the ant build puts it, though.

2. The dataimporthandler-extras output location was specified such that there was a redundant path: /extras/extras/, so I fixed this.

Thanks, I agree with these changes. I committed your patch.

By the way, I think it would be really nice to have a maven build instructions file that is put into place when the get-maven-poms task is run.
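For reference, the finalName idea floated above would roughly look like the following in a Solr module POM. This is an untested sketch, not something from the actual Solr POMs; it only shows the mechanism being discussed, and as noted, it would depart from the artifactId-version naming convention:

```xml
<!-- Hedged sketch: setting the build's finalName so the Maven-produced jar
     carries the same leading "apache-" prefix as the Ant artifact, without
     changing the artifactId. -->
<build>
  <finalName>apache-solr-${project.version}</finalName>
</build>
```

Note that finalName only affects the file name in the local build directory; the artifact is still installed and deployed into repositories under its artifactId, which is part of why this is an incomplete workaround.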
The file would have the essential instructions; it would explain some of the relevant differences from the ant build (notably output file placement and file name differences), and it would include tips such as how to install the sources and javadoc jar files into your local repo.

+1. I've hesitated to include these instructions with the other build instructions, since it might confuse users into thinking that the Maven build is officially supported. (It's not. Ant is the only official build.)

Steve
RE: Improvements to the maven build
Hi Ryan,

Do you want to make a JIRA issue with a patch? This is a good example of a patch that is easy to get committed quickly because it is simple, clear, and understandable.

Earlier today on #lucene IRC, David described the changes he had in mind and asked me where to put the patch, and I told him that if the patch was small, the mailing list might make sense. I also told him that JIRA issues are generally a good idea, but that for the officially-non-official Maven stuff, I haven't been using JIRA, since it seemed to me like the attendant noise-to-signal ratio would be too high. (Maven appears to be appreciated by Lucene/Solr users way more than by the devs, and the users don't generally follow JIRA.) That said, I'm open to being convinced otherwise.

Steve

On Wed, May 4, 2011 at 5:41 PM, Smiley, David W. dsmi...@mitre.org wrote:

Steve Rowe,

I thought I'd put together a list of interesting differences between the ant build output and the maven build output. Before each build I did a full clean and then after the build I saved a file listing to a text file so that I could diff it. I'm using svn revision 1087373 (March 31st).

1. The ant build invokes a JSP compiler to validate them. The maven build does not.

2. Maven seems(?) to compile more modules' tests than the ant build does.

3. The ant build builds the tools module. The maven build does not. Probably fine if it stays this way?

4. Ant doesn't build the benchmark module; maven will by default. A problem for the ant build?

5. The ant build artifacts tend to have a leading apache- in front of them. But the maven artifactId does not have this, so the artifact file names are different, trivially so anyway.

6. The ant solr build puts all its final artifacts into the solr/dist directory; the maven build does not--it leaves all of them in their build directories. Not a big deal, but maybe there's a way to have the output file go someplace else? Not sure.
There were two issues that seemed like clear bugs to me, which I fixed with an attached patch.

1. solrj's build directory and compile output directory were the same directory, but that's problematic, since building the output jar will result in an error if it sees its own jar file as an input file to its output jar. So I added a classes directory. This will result in a different directory than where the ant build puts it, though.

2. The dataimporthandler-extras output location was specified such that there was a redundant path: /extras/extras/, so I fixed this.

By the way, I think it would be really nice to have a maven build instructions file that is put into place when the get-maven-poms task is run. The file would have the essential instructions; it would explain some of the relevant differences from the ant build (notably output file placement and file name differences), and it would include tips such as how to install the sources and javadoc jar files into your local repo.

~ David Smiley