[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.

2011-05-04 Thread Sami Siren (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028620#comment-13028620
 ] 

Sami Siren commented on SOLR-2493:
--

The trunk version of Solr has this same problem too; I just timed a comparable 
difference in req/sec when caching the Version vs. the current implementation.

 SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large 
 performance hit.
 -

 Key: SOLR-2493
 URL: https://issues.apache.org/jira/browse/SOLR-2493
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.1
Reporter: Stephane Bailliez
Priority: Blocker
  Labels: core, parser, performance, request, solr

 I'm putting this as a blocker as I think this is a serious issue that should 
 be addressed ASAP with a release. With the current code this is nowhere near 
 suitable for production use.
 For each instance created, SolrQueryParser calls
  
 getSchema().getSolrConfig().getLuceneVersion(luceneMatchVersion, 
 Version.LUCENE_24)
 instead of using
 getSchema().getSolrConfig().luceneMatchVersion
 This creates a massive performance hit. For each request, there are generally 
 3 query parsers created, and each of them will parse the XML node in the 
 config, which involves creating an instance of XPath; behind the scenes the 
 usual factory finder pattern kicks in within the XML parser and does a loadClass.
 The stack is typically:
 at org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:363)
 at com.sun.org.apache.xml.internal.dtm.ObjectFactory.findProviderClass(ObjectFactory.java:506)
 at com.sun.org.apache.xml.internal.dtm.ObjectFactory.lookUpFactoryClass(ObjectFactory.java:217)
 at com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:131)
 at com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:101)
 at com.sun.org.apache.xml.internal.dtm.DTMManager.newInstance(DTMManager.java:135)
 at com.sun.org.apache.xpath.internal.XPathContext.init(XPathContext.java:100)
 at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:201)
 at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
 at org.apache.solr.core.Config.getNode(Config.java:230)
 at org.apache.solr.core.Config.getVal(Config.java:256)
 at org.apache.solr.core.Config.getLuceneVersion(Config.java:325)
 at org.apache.solr.search.SolrQueryParser.init(SolrQueryParser.java:76)
 at org.apache.solr.schema.IndexSchema.getSolrQueryParser(IndexSchema.java:277)
 With the current 3.1 code, I barely do 250 qps with 16 concurrent users and 
 a near-empty index.
 Switching SolrQueryParser to use 
 getSchema().getSolrConfig().luceneMatchVersion and doing a quick bench test, 
 performance becomes reasonable, beyond 2000 qps.
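
 A minimal, self-contained sketch of the caching pattern this report asks for 
 (these are not Solr's actual classes; the class name and XPath expression are 
 illustrative only): evaluate the luceneMatchVersion node once when the config 
 is loaded and reuse the result per request, instead of re-running XPath (and 
 the JAXP factory lookup behind it) for every query parser instance.

{code}
import java.io.ByteArrayInputStream;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;

public class CachedMatchVersionSketch {

  // Evaluated once at config-load time; per-request code only reads this field.
  private final String luceneMatchVersion;

  public CachedMatchVersionSketch(Document solrConfig) throws Exception {
    XPath xpath = XPathFactory.newInstance().newXPath();
    this.luceneMatchVersion = xpath.evaluate("/config/luceneMatchVersion", solrConfig);
  }

  // No XPath evaluation, no factory lookup, no loadClass on the hot path.
  public String getLuceneMatchVersion() {
    return luceneMatchVersion;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<config><luceneMatchVersion>LUCENE_31</luceneMatchVersion></config>";
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
    System.out.println(new CachedMatchVersionSketch(doc).getLuceneMatchVersion()); // LUCENE_31
  }
}
{code}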

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2481) Add support for commitWithin in DataImportHandler

2011-05-04 Thread Sami Siren (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sami Siren updated SOLR-2481:
-

Component/s: contrib - DataImportHandler

 Add support for commitWithin in DataImportHandler
 -

 Key: SOLR-2481
 URL: https://issues.apache.org/jira/browse/SOLR-2481
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Reporter: Sami Siren
Priority: Trivial
 Attachments: SOLR-2481.patch


 It looks like DataImportHandler does not support commitWithin. It would be 
 nice if it did.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.

2011-05-04 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned SOLR-2493:
---

Assignee: Uwe Schindler

 SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large 
 performance hit.
 -

 Key: SOLR-2493
 URL: https://issues.apache.org/jira/browse/SOLR-2493
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.1
Reporter: Stephane Bailliez
Assignee: Uwe Schindler
Priority: Blocker
  Labels: core, parser, performance, request, solr


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.

2011-05-04 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated SOLR-2493:


Attachment: SOLR-2493.patch

Patch for trunk, 3.x/3.1 is similar, will attach after merge.

 SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large 
 performance hit.
 -

 Key: SOLR-2493
 URL: https://issues.apache.org/jira/browse/SOLR-2493
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.1
Reporter: Stephane Bailliez
Assignee: Uwe Schindler
Priority: Blocker
  Labels: core, parser, performance, request, solr
 Fix For: 3.1.1, 3.2, 4.0

 Attachments: SOLR-2493.patch



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.

2011-05-04 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated SOLR-2493:


Fix Version/s: 4.0
   3.2
   3.1.1

 SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large 
 performance hit.
 -

 Key: SOLR-2493
 URL: https://issues.apache.org/jira/browse/SOLR-2493
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.1
Reporter: Stephane Bailliez
Assignee: Uwe Schindler
Priority: Blocker
  Labels: core, parser, performance, request, solr
 Fix For: 3.1.1, 3.2, 4.0

 Attachments: SOLR-2493.patch



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.

2011-05-04 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated SOLR-2493:


Attachment: SOLR-2493-3.x.patch

Patch for 3.x and 3.1 branch.

 SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large 
 performance hit.
 -

 Key: SOLR-2493
 URL: https://issues.apache.org/jira/browse/SOLR-2493
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.1
Reporter: Stephane Bailliez
Assignee: Uwe Schindler
Priority: Blocker
  Labels: core, parser, performance, request, solr
 Fix For: 3.1.1, 3.2, 4.0

 Attachments: SOLR-2493-3.x.patch, SOLR-2493.patch



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.

2011-05-04 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028642#comment-13028642
 ] 

Uwe Schindler commented on SOLR-2493:
-

I also reviewed other places where luceneMatchVersion is used; all other places 
are correct (SpellChecker, ...).

 SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large 
 performance hit.
 -

 Key: SOLR-2493
 URL: https://issues.apache.org/jira/browse/SOLR-2493
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.1
Reporter: Stephane Bailliez
Assignee: Uwe Schindler
Priority: Blocker
  Labels: core, parser, performance, request, solr
 Fix For: 3.1.1, 3.2, 4.0

 Attachments: SOLR-2493-3.x.patch, SOLR-2493.patch



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.

2011-05-04 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated SOLR-2493:


Attachment: SOLR-2493-3.x.patch

Here is the final 3.x patch (the previous one was incomplete).

 SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large 
 performance hit.
 -

 Key: SOLR-2493
 URL: https://issues.apache.org/jira/browse/SOLR-2493
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.1
Reporter: Stephane Bailliez
Assignee: Uwe Schindler
Priority: Blocker
  Labels: core, parser, performance, request, solr
 Fix For: 3.1.1, 3.2, 4.0

 Attachments: SOLR-2493-3.x.patch, SOLR-2493-3.x.patch, SOLR-2493.patch



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.

2011-05-04 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler resolved SOLR-2493.
-

Resolution: Fixed

Committed trunk revision: 1099340
Merged 3.x revision: 1099347
Merged 3.1 branch revision: 1099349

You can fix this in your local installation by using the latest 3.1 stable 
branch, if you can't wait for 3.1.1 :-)

 SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large 
 performance hit.
 -

 Key: SOLR-2493
 URL: https://issues.apache.org/jira/browse/SOLR-2493
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.1
Reporter: Stephane Bailliez
Assignee: Uwe Schindler
Priority: Blocker
  Labels: core, parser, performance, request, solr
 Fix For: 3.1.1, 3.2, 4.0

 Attachments: SOLR-2493-3.x.patch, SOLR-2493-3.x.patch, SOLR-2493.patch



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.

2011-05-04 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated SOLR-2493:


Attachment: (was: SOLR-2493-3.x.patch)

 SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large 
 performance hit.
 -

 Key: SOLR-2493
 URL: https://issues.apache.org/jira/browse/SOLR-2493
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.1
Reporter: Stephane Bailliez
Assignee: Uwe Schindler
Priority: Blocker
  Labels: core, parser, performance, request, solr
 Fix For: 3.1.1, 3.2, 4.0

 Attachments: SOLR-2493-3.x.patch, SOLR-2493.patch



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build

2011-05-04 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028649#comment-13028649
 ] 

Simon Willnauer commented on LUCENE-3018:
-

hey varun

sorry for the long delay! I have a couple of comments for you:

* I think we should default the compiler to whatever ant-contrib uses as the 
default so we can remove the -Dcompilername option.
* -Dbuild64 is an option that only works on x86-64 architectures, so I think we 
can remove that too entirely.
* we are going to commit the cpptasks jar file into the ant_lib directory so it 
comes with the checkout, meaning you can remove the line in the overview.html 
file saying that you need to place the jar there.
* the overview should say cd lucene/contrib/misc/ instead of cd 
lucene/dev/trunk/lucene/contrib/misc/; the same is true for ... will be located 
in the lucene/dev/trunk/lucene/build/native/

simon

 Lucene Native Directory implementation need automated build
 ---

 Key: LUCENE-3018
 URL: https://issues.apache.org/jira/browse/LUCENE-3018
 Project: Lucene - Java
  Issue Type: Wish
  Components: Build
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Varun Thacker
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, 
 LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, 
 LUCENE-3018.patch, LUCENE-3018.patch, cpptasks-1.0b5.jar, 
 cpptasks-LICENSE-ASL.txt, cpptasks.jar, cpptasks.jar


 Currently the native directory impl in contrib/misc requires manual action to 
 compile the C code, (partially) documented in 
  
 https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/misc/src/java/overview.html
 yet it would be nice if we had an ant task and documentation for all 
 platforms on how to compile it and set up the prerequisites.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



33 Days left to Berlin Buzzwords 2011

2011-05-04 Thread Simon Willnauer
hey folks,

Berlin Buzzwords 2011 is close: only 33 days are left until the big Search,
Store and Scale open source crowd gathers in Berlin on June 6th/7th.

The conference again focuses on the topics of search, data analysis and NoSQL,
and takes place on June 6th/7th 2011 in Berlin.

We are looking forward to two awesome keynote speakers who shaped the world of
open source data analysis: Doug Cutting (founder of Apache Lucene and Hadoop)
as well as Ted Dunning (Chief Application Architect at MapR Technologies and an
active developer at Apache Hadoop and Mahout).

We are amazed by the amount and quality of the talk submissions we got. As a
result, this year we have added one more track to the main conference. If you
haven't done so already, make sure to book your ticket now - early bird tickets
have been sold out since April 7th and there might not be many tickets left.

As we would like to give visitors of our main conference a reason to stay in
town for the whole week, we have been talking to local co-working spaces and
companies asking them for free space and WiFi to host Hackathons right after the
main conference - that is on June 8th through 10th.

If you would like to gather with fellow developers and users of your project,
fix bugs together, hack on new features or give users a hands-on introduction to
your tools, please submit your workshop proposal to our wiki:

http://berlinbuzzwords.de/node/428

Please note that slots are assigned on a first come, first served basis. We are
doing our best to get you connected; however, space is limited.

The deal is simple: we get you in touch with a conference room provider. Your
event gets promoted in our schedule. Coordination, however, is completely up to
you: Make sure to provide an interesting abstract, provide a Hackathon
registration area - see the Barcamp page for a good example:

http://berlinbuzzwords.de/wiki/barcamp

Attending Hackathons requires a Berlin Buzzwords ticket and (then free)
registration at the Hackathon in question.

Hope I see you all around in Berlin,

Simon

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: I was accepted in GSoC!!!

2011-05-04 Thread Uwe Schindler
Hi Vinicius,

 

Submitting patches via JIRA is fine! We were just thinking about possibly
providing some SVN to work with (as additional training), but came to the
conclusion that all students should go the standard Apache Lucene way of
submitting patches to JIRA issues. You can of course still use SVN / Git
locally to organize your code. In the end we just need a patch to be
committed by one of the core committers.

 

Uwe

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

 http://www.thetaphi.de/

eMail: u...@thetaphi.de

 

From: Vinicius Barros [mailto:viniciusbarros.g...@yahoo.com.br] 
Sent: Wednesday, May 04, 2011 6:23 AM
To: dev@lucene.apache.org
Subject: RE: I was accepted in GSoC!!!

 


Hi Uwe,

Sorry, I only saw your email today; I have been busy lately with college homework.

I was planning to submit patches to Lucene (through JIRA/email?). Do you
have something else in mind?

Regards,
Vinicius Barros

--- On Sun, 5/1/11, Uwe Schindler u...@thetaphi.de wrote:


From: Uwe Schindler u...@thetaphi.de
Subject: RE: I was accepted in GSoC!!!
To: dev@lucene.apache.org
Date: Sunday, May 1, 2011, 7:36

Welcome Vinicius,

 

I am glad to hear that you (my mentee) are one of the 5 students that are
working for Apache Lucene/Solr this year. Until the coding officially
starts, we should also sort out the infrastructure things like where to put
the code and make a plan how to start. We should keep in close contact.

 

Uwe

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de/

eMail: u...@thetaphi.de

 

From: Vinicius Barros [mailto:viniciusbarros.g...@yahoo.com.br] 
Sent: Sunday, May 01, 2011 3:18 AM
To: dev@lucene.apache.org; uschind...@apache.org
Subject: I was accepted in GSoC!!!

 



Hi,

 

That's great. I am waiting for the next instructions from Google; it seems there is
some paperwork to do.

 

Regards,
Vinicius Barros

--- On Mon, 4/25/11, no-re...@socghop.appspotmail.com wrote:


From: no-re...@socghop.appspotmail.com
Subject: Congratulations!
To: viniciusbarros.g...@yahoo.com.br
Date: Monday, April 25, 2011, 15:48

Dear Vinicius, 

Congratulations! Your proposal LUCENE-1768: NumericRange support for new
query parser as submitted to Apache Software Foundation has been accepted
for Google Summer of Code 2011. Over the next few days, we will add you to
the private Google Summer of Code Student Discussion List. Over the next few
weeks, we will send instructions to this list regarding turning in proof of
enrollment, tax forms, etc. 

Now that you've been accepted, please take the opportunity to speak with
your mentors about plans for the Community Bonding Period: what
documentation should you be reading, what version control system will you
need to set up, etc., before start of coding begins on May 23rd. 

Welcome to Google Summer of Code 2011! We look forward to having you with
us. 

With best regards,
The Google Summer of Code Program Administration Team 

 

 



[jira] [Updated] (LUCENE-3018) Lucene Native Directory implementation need automated build

2011-05-04 Thread Varun Thacker (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Thacker updated LUCENE-3018:
--

Attachment: LUCENE-3018.patch

I have made the changes mentioned above. 

 Lucene Native Directory implementation need automated build
 ---

 Key: LUCENE-3018
 URL: https://issues.apache.org/jira/browse/LUCENE-3018
 Project: Lucene - Java
  Issue Type: Wish
  Components: Build
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Varun Thacker
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, 
 LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, 
 LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, cpptasks-1.0b5.jar, 
 cpptasks-LICENSE-ASL.txt, cpptasks.jar, cpptasks.jar



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

2011-05-04 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028677#comment-13028677
 ] 

Uwe Schindler commented on LUCENE-3065:
---

Mike:
I reviewed the patch again: You are currently using 3 bits already. 1 bit is 
solely for detecting numerics, the other two are the type.

In my opinion, to check if it's a numeric field, use a MASK of 3 bits and check 
for != 0. As soon as any bit in this mask is set, it's numeric. The actual 
numeric fields have values != 0:

{code}
private static final int _NUMERIC_BIT_SHIFT = 3;
static final byte FIELD_IS_NUMERIC_MASK = 0x07 << _NUMERIC_BIT_SHIFT;

static final byte FIELD_IS_NUMERIC_INT = 1 << _NUMERIC_BIT_SHIFT;
static final byte FIELD_IS_NUMERIC_LONG = 2 << _NUMERIC_BIT_SHIFT;
static final byte FIELD_IS_NUMERIC_FLOAT = 3 << _NUMERIC_BIT_SHIFT;
static final byte FIELD_IS_NUMERIC_DOUBLE = 4 << _NUMERIC_BIT_SHIFT;
// unused: static final byte FIELD_IS_NUMERIC_SHORT = 5 << _NUMERIC_BIT_SHIFT;
// unused: static final byte FIELD_IS_NUMERIC_BYTE = 6 << _NUMERIC_BIT_SHIFT;
// and we have still one more over :-)  7 << _NUMERIC_BIT_SHIFT

// check if field is numeric:
if ((bits & FIELD_IS_NUMERIC_MASK) != 0) {}

// parse type:
switch (bits & FIELD_IS_NUMERIC_MASK) {
  case FIELD_IS_NUMERIC_INT: ...
}
{code}

 NumericField should be stored in binary format in index (matching Solr's 
 format)
 

 Key: LUCENE-3065
 URL: https://issues.apache.org/jira/browse/LUCENE-3065
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3065.patch


 (Spinoff of LUCENE-3001)
 Today when writing stored fields we don't record that the field was a 
 NumericField, and so at IndexReader time you get back an ordinary Field and 
 your number has turned into a string.  See 
 https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
 We have spare bits already in stored fields, so, we should use one to record 
 that the field is numeric, and then encode the numeric field in Solr's 
 more-compact binary format.
 A nice side-effect is we fix the long standing issue that you don't get a 
 NumericField back when loading your document.
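
 As a rough illustration of the space argument (this is not Solr's actual 
 binary format, just a size comparison): an int stored as its decimal string 
 costs up to 10-11 bytes plus length bookkeeping, while a fixed-width binary 
 encoding always costs 4 bytes.

{code}
import java.nio.ByteBuffer;

// Rough illustration only; not Solr's actual wire format.
public class NumericEncodingSizeExample {
  public static void main(String[] args) {
    int value = 2011050400;
    byte[] asString = Integer.toString(value).getBytes();           // 10 bytes for this value
    byte[] asBinary = ByteBuffer.allocate(4).putInt(value).array(); // always 4 bytes
    System.out.println("string encoding: " + asString.length + " bytes");
    System.out.println("binary encoding: " + asBinary.length + " bytes");
  }
}
{code}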

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[Lucene.Net] [jira] [Updated] (LUCENENET-413) Medium trust security issue

2011-05-04 Thread Digy (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENENET-413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Digy updated LUCENENET-413:
---

Attachment: MediumTrust.2.9.4.patch

constants.cs fix added into patch

  Medium trust security issue 
 -

 Key: LUCENENET-413
 URL: https://issues.apache.org/jira/browse/LUCENENET-413
 Project: Lucene.Net
  Issue Type: Improvement
Affects Versions: Lucene.Net 2.9.4
 Environment: Lucene.Net 2.9.4, Lucene.Net 2.9.4g , .Net 4.0
Reporter: Digy
Priority: Minor
 Fix For: Lucene.Net 2.9.4

 Attachments: MediumTrust.2.9.4.patch, MediumTrust.2.9.4.patch, 
 MediumTrust.2.9.4g.patch


 On behalf of Richard Wilde:
 Exceptions in Medium Trust(.NET 4.0)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

2011-05-04 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028680#comment-13028680
 ] 

Uwe Schindler commented on LUCENE-3065:
---

This gives us more freedom in the future, as we are limited to 8 bits in total; 
3 are already used - this only adds 3 more, not 4.

By the way, for performance reasons all constants should be declared as int, not 
byte, as the byte read from the index is already an int.
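
A minimal, runnable sketch of the int-typed variant (the class and helper method 
are invented for illustration; the constant names and values mirror the snippet 
quoted in the previous comment):

{code}
// int constants avoid narrowing casts, since the flag byte read from the
// index is widened to int as soon as it is read.
public class NumericFlagBits {
  private static final int _NUMERIC_BIT_SHIFT = 3;
  static final int FIELD_IS_NUMERIC_MASK = 0x07 << _NUMERIC_BIT_SHIFT;
  static final int FIELD_IS_NUMERIC_INT = 1 << _NUMERIC_BIT_SHIFT;
  static final int FIELD_IS_NUMERIC_LONG = 2 << _NUMERIC_BIT_SHIFT;
  static final int FIELD_IS_NUMERIC_FLOAT = 3 << _NUMERIC_BIT_SHIFT;
  static final int FIELD_IS_NUMERIC_DOUBLE = 4 << _NUMERIC_BIT_SHIFT;

  static boolean isNumeric(int bits) {
    return (bits & FIELD_IS_NUMERIC_MASK) != 0;
  }

  public static void main(String[] args) {
    System.out.println(isNumeric(FIELD_IS_NUMERIC_LONG)); // true
    System.out.println(isNumeric(0x01));                  // false: a non-numeric flag bit
  }
}
{code}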

 NumericField should be stored in binary format in index (matching Solr's 
 format)
 

 Key: LUCENE-3065
 URL: https://issues.apache.org/jira/browse/LUCENE-3065
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3065.patch



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build

2011-05-04 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028689#comment-13028689
 ] 

Michael McCandless commented on LUCENE-3018:


Patch works well for me -- I installed cpptasks-1.0b5.jar under 
lucene/contrib/misc/ant_lib, and was then able to simply run ant 
build-native-unix, which produced the .so under lucene/build/native.

I then added lucene/build/native to my LD_LIBRARY_PATH, and ran:
{noformat}
ant test -lib lucene/build/contrib/misc/lucene-misc-4.0-SNAPSHOT.jar 
-Dtests.directory=org.apache.lucene.store.DirectIOLinuxDirectory
{noformat}

at the top of the source tree, ie, it runs all unit tests, forcing the dir impl 
to be DirectIOLinuxDirectory.  All tests passed!

For grins I tried the first step on OpenSolaris too, and it generated a large 
number of compilation errors, which seems strange. E.g. it could not find jni.h 
on this platform. (I expect a few compilation errors because we are using 
Linux-only flags, but not that it could not find jni.h)... any ideas?

 Lucene Native Directory implementation need automated build
 ---

 Key: LUCENE-3018
 URL: https://issues.apache.org/jira/browse/LUCENE-3018
 Project: Lucene - Java
  Issue Type: Wish
  Components: Build
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Varun Thacker
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, 
 LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, 
 LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, cpptasks-1.0b5.jar, 
 cpptasks-LICENSE-ASL.txt, cpptasks.jar, cpptasks.jar



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

2011-05-04 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3065:
--

Attachment: LUCENE-3065.patch

Here is the patch with my changes.

 NumericField should be stored in binary format in index (matching Solr's 
 format)
 

 Key: LUCENE-3065
 URL: https://issues.apache.org/jira/browse/LUCENE-3065
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3065.patch, LUCENE-3065.patch



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

2011-05-04 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028693#comment-13028693
 ] 

Michael McCandless commented on LUCENE-3065:


Patch looks great Uwe!  Except we need to resolve this 
Field/Fieldable/AbstractField.  Probably we should go and finish LUCENE-2310...

 NumericField should be stored in binary format in index (matching Solr's 
 format)
 

 Key: LUCENE-3065
 URL: https://issues.apache.org/jira/browse/LUCENE-3065
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3065.patch, LUCENE-3065.patch



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-04 Thread Michael McCandless (JIRA)
The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
same position
--

 Key: LUCENE-3068
 URL: https://issues.apache.org/jira/browse/LUCENE-3068
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 3.1, 3.0.3, 4.0
Reporter: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0


In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
matching docs that it shouldn't; but I think those changes caused it
to fail to match docs that it should, specifically when the doc itself
has tokens at the same position.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-04 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3068:
---

Attachment: LUCENE-3068.patch

Patch w/ test case showing the problem.

If you set slop to 0 for the PhraseQuery, the test passes.  The 
MultiPhraseQuery passes with slop or no slop because it handles the 
same-position case itself (Union*Enum).

That got me thinking... maybe any time a *PhraseQuery has overlapping 
positions, we should rewrite to a MultiPhraseQuery and let it handle the same 
positions...?  Is there any downside to that?
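
For illustration, here is a minimal sketch of the two query shapes being 
compared, assuming the query API of that time; the field name and terms are 
invented, and the document is assumed to contain "big" and "large" at the same 
position (position increment 0) followed by "apple":

{code}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.MultiPhraseQuery;
import org.apache.lucene.search.PhraseQuery;

public class SamePositionQueriesSketch {
  public static void main(String[] args) {
    // Sloppy PhraseQuery: terms added at explicit positions; with slop > 0 the
    // repeats handling in SloppyPhraseScorer is what this issue is about.
    PhraseQuery pq = new PhraseQuery();
    pq.add(new Term("body", "big"), 0);
    pq.add(new Term("body", "apple"), 1);
    pq.setSlop(2);

    // MultiPhraseQuery: terms sharing a position are added together, so the
    // same-position case is handled by its own Union*Enum machinery.
    MultiPhraseQuery mpq = new MultiPhraseQuery();
    mpq.add(new Term[] { new Term("body", "big"), new Term("body", "large") });
    mpq.add(new Term("body", "apple"));
    mpq.setSlop(2);

    System.out.println(pq);
    System.out.println(mpq);
  }
}
{code}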

 The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
 same position
 --

 Key: LUCENE-3068
 URL: https://issues.apache.org/jira/browse/LUCENE-3068
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3068.patch



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-04 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen reassigned LUCENE-3068:
---

Assignee: Doron Cohen

 The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
 same position
 --

 Key: LUCENE-3068
 URL: https://issues.apache.org/jira/browse/LUCENE-3068
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Michael McCandless
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3068.patch



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Re: Solr Config XML DTD's

2011-05-04 Thread Michael McCandless
Hi Michael,

This looks compelling!  I'm also not sure what, specifically, we can
validate in Solr's configuration... and I also don't know how much
validation we do today.  What hard errors does Solr produce on startup
when configuration is wrong?

I know one challenge is the fact that plugins can reach in and claim
attrs/elements, which makes validation more interesting.  But we could
do something like this: when a plugin claims a certain attr/element,
this is recorded.  If at the end of loading the config, there are
unclaimed attrs/elements, then that's an error.
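
A rough sketch of that bookkeeping idea (all class and method names here are
invented for illustration, not an existing Solr API):

{code}
import java.util.HashSet;
import java.util.Set;

public class ClaimedConfigTracker {
  private final Set<String> allPaths = new HashSet<String>();
  private final Set<String> claimed = new HashSet<String>();

  public ClaimedConfigTracker(Set<String> pathsFoundInConfig) {
    allPaths.addAll(pathsFoundInConfig);
  }

  // Called whenever core code or a plugin reads a config attr/element.
  public void claim(String path) {
    claimed.add(path);
  }

  // Called once after all plugins are loaded: anything unclaimed is an error.
  public void assertAllClaimed() {
    Set<String> unclaimed = new HashSet<String>(allPaths);
    unclaimed.removeAll(claimed);
    if (!unclaimed.isEmpty()) {
      throw new IllegalStateException("Unknown configuration entries: " + unclaimed);
    }
  }
}
{code}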

More generally, before we hash out an approach here, I'd like to know
if anyone disagrees that we should move Solr to stricter error
checking of its configuration on startup.  I think being silent on
configuration errors is the wrong choice... and I think that's
generally Solr's approach today (I think?  Or do we catch
configuration errors w/ a hard error and clear message?).

Mike

http://blog.mikemccandless.com

On Sun, May 1, 2011 at 7:34 PM, Michael Sokolov soko...@ifactory.com wrote:
 My first post too - but if I can offer a suggestion - there are more modern
 XML validation technologies available than DTD.  I would heartily recommend
 RelaxNG/Compact notation (see
 http://relaxng.org/compact-tutorial-20030326.html) - you can generate Relax
 from a DTD, but it is more expressive, while still being easy on the eyes
 (uses curly-brace syntax), and much simpler than XML schema.

 In particular it lets you express wildcard constraints like:

 start = anyElement
 anyElement =
  element * {
    (attribute * { text }
     | text
     | anyElement)*
  }

 which matches absolutely anything.

 I'm not sure what kinds of constraints can actually be applied to solr's
 configuration in practice?

 But using a formal constraint language will give decent error reporting out
 of the box.

 Java-based tools for Relax validation and conversion are available here:
 http://code.google.com/p/jing-trang/

 -Mike S

 On 2:59 PM, Michael McCandless wrote:

 If not a DTD, can we put some more customized form of validation for
 Solr's configuration?

 In general, I think servers should be anal on startup, refusing to
 start if there's anything off in their configuration.

 (Of course, along with this, the error messaging has to be *excellent*
 so you know precisely where the problem is, what's wrong, how to fix
 it).

 If you take the lenient/forgiving approach then you wind up with Solr
 instances in unknown states -- the app developer thinks they turned X
 on, everything starts fine, but then, silently, inexplicably, it's not
 working.  This then leads to frustration, thinking Solr is buggy, not
 using this feature, blogging about problems, etc.

 Mike

 http://blog.mikemccandless.com

 On Tue, Mar 29, 2011 at 7:15 PM, Chris Hostetter
 hossman_luc...@fucit.org  wrote:

 : Hi, this is my first post to the mailing list.  I'm working on a
 commercial

 Welcome!

 : My DTD works for our internal version of queryElevation.xml, but since the
 : ATTRIB name of the <doc/> tag could be anything, I'm not sure how to write a
 : DTD that would validate any valid query elevation file.

 right .. this is one of the reasons why we've never tried to publish a
 DTD
 for the solrconfig.xml or schema.xml files either.  there are lots of
 cases where plugins can define arbitrary attributes on the XML nodes.

 If i had the chance to do it all over again, and i better understood xml
 back when yonik first showed me what the configs would look like, i would
 have suggested using xml namespaces .. but that ship kind of sailed a
 while ago.

 we're getting a little better -- moving towards using the same type of
 NamedList backed XML for the initialization anytime new plugins are
 added, but i don't see it being feasible to have a config DTD anytime
 soon.

 -Hoss

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

2011-05-04 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3065:
--

Attachment: LUCENE-3065.patch

More refactoring:
- Now NumericFields also reproduce the indexed/omitNorms/omitTF settings - only 
precStep cannot be reproduced
- Cut over to int instead of byte; this removes lots of casting in FieldsReader

 NumericField should be stored in binary format in index (matching Solr's 
 format)
 

 Key: LUCENE-3065
 URL: https://issues.apache.org/jira/browse/LUCENE-3065
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch


 (Spinoff of LUCENE-3001)
 Today when writing stored fields we don't record that the field was a 
 NumericField, and so at IndexReader time you get back an ordinary Field and 
 your number has turned into a string.  See 
 https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
 We have spare bits already in stored fields, so, we should use one to record 
 that the field is numeric, and then encode the numeric field in Solr's 
 more-compact binary format.
 A nice side-effect is we fix the long standing issue that you don't get a 
 NumericField back when loading your document.
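
As a hedged back-of-the-envelope illustration of the space/typing argument (this
is plain JDK code, not Lucene's actual stored-fields writer, and the flag value
below is invented): a spare bit in the per-field flags can mark the value as a
binary int, after which it always costs 4 bytes instead of its decimal string.

{noformat}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class NumericStoredFieldSketch {
  static final int FLAG_NUMERIC_INT = 0x04;          // invented flag bit, for the sketch only

  static byte[] writeIntField(int value) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeByte(FLAG_NUMERIC_INT);                 // "this stored value is a binary int"
    out.writeInt(value);                             // 4 bytes, whatever the magnitude
    out.close();
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    int price = 1234567890;
    System.out.println("binary: " + writeIntField(price).length + " bytes");
    System.out.println("string: " + Integer.toString(price).getBytes("UTF-8").length + " bytes");
  }
}
{noformat}

A reader that sees the flag bit can also hand back a properly typed value instead
of a string, which is the second half of the issue.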

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

2011-05-04 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3065:
--

Attachment: LUCENE-3065.patch

New patch; the previous one had a leftover unused constant from Mike's patch.

 NumericField should be stored in binary format in index (matching Solr's 
 format)
 

 Key: LUCENE-3065
 URL: https://issues.apache.org/jira/browse/LUCENE-3065
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch


 (Spinoff of LUCENE-3001)
 Today when writing stored fields we don't record that the field was a 
 NumericField, and so at IndexReader time you get back an ordinary Field and 
 your number has turned into a string.  See 
 https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
 We have spare bits already in stored fields, so, we should use one to record 
 that the field is numeric, and then encode the numeric field in Solr's 
 more-compact binary format.
 A nice side-effect is we fix the long standing issue that you don't get a 
 NumericField back when loading your document.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

2011-05-04 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3065:
--

Attachment: (was: LUCENE-3065.patch)

 NumericField should be stored in binary format in index (matching Solr's 
 format)
 

 Key: LUCENE-3065
 URL: https://issues.apache.org/jira/browse/LUCENE-3065
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch


 (Spinoff of LUCENE-3001)
 Today when writing stored fields we don't record that the field was a 
 NumericField, and so at IndexReader time you get back an ordinary Field and 
 your number has turned into a string.  See 
 https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
 We have spare bits already in stored fields, so, we should use one to record 
 that the field is numeric, and then encode the numeric field in Solr's 
 more-compact binary format.
 A nice side-effect is we fix the long standing issue that you don't get a 
 NumericField back when loading your document.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: I was accepted in GSoC!!!

2011-05-04 Thread David Nemeskey
Hi Uwe,

do you mean one issue per GSoC proposal, or one for every logical unit in 
the project?

If the second: Robert told me to use the flexscoring branch as a base for my 
project, since preliminary work has already been done in that branch. Should I 
open JIRA issues nevertheless?

Thanks,
David

On 2011 May 04, Wednesday 09:56:02 Uwe Schindler wrote:
 Hi Vinicius,
 
 Submitting patches via JIRA is fine! We were just thinking about possibly
 providing some SVN to work with (as additional training), but came to the
 conclusion that all students should go the standard Apache Lucene way of
 submitting patches to JIRA issues. You can of course still use SVN / GIT
 locally to organize your code. At the end we just need a patch to be
 committed by one of the core committers.

Uwe

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

2011-05-04 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3065:
--

Attachment: LUCENE-3065.patch

This patch adds some refactoring because FieldSelectorResult is an enum since 
3.0, so the (slow) queue of if-statements can be replaced by a fast switch.

Also some minor comments and a missing & 0xFF when casting byte to int.
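
For readers not following the patch, a tiny sketch of the two points above, with
invented enum constants and method names (this is not the FieldsReader code):
dispatch on the enum with a switch instead of a chain of ifs, and mask with
& 0xFF so a byte read from the index is treated as 0..255 rather than being
sign-extended.

{noformat}
enum FieldSelectorResultSketch { LOAD, LAZY_LOAD, NO_LOAD }   // stand-in names

class SwitchAndMaskSketch {
  static void handle(FieldSelectorResultSketch result) {
    switch (result) {               // one jump on the enum instead of a chain of ifs
      case LOAD:      /* read the field now */          break;
      case LAZY_LOAD: /* remember offset, read later */ break;
      case NO_LOAD:   /* skip over the bytes */         break;
    }
  }

  static int flagBits(byte b) {
    return b & 0xFF;                // without the mask, (byte) 0x84 sign-extends to -124
  }
}
{noformat}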

 NumericField should be stored in binary format in index (matching Solr's 
 format)
 

 Key: LUCENE-3065
 URL: https://issues.apache.org/jira/browse/LUCENE-3065
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, 
 LUCENE-3065.patch


 (Spinoff of LUCENE-3001)
 Today when writing stored fields we don't record that the field was a 
 NumericField, and so at IndexReader time you get back an ordinary Field and 
 your number has turned into a string.  See 
 https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
 We have spare bits already in stored fields, so, we should use one to record 
 that the field is numeric, and then encode the numeric field in Solr's 
 more-compact binary format.
 A nice side-effect is we fix the long standing issue that you don't get a 
 NumericField back when loading your document.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Edited] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

2011-05-04 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028707#comment-13028707
 ] 

Uwe Schindler edited comment on LUCENE-3065 at 5/4/11 11:06 AM:


This patch adds some refactoring because FieldSelectorResult is an enum since 
3.0, so the (slow) queue of if-statements can be replaced by a fast switch.

Also some minor comments and a missing & 0xFF when casting byte to int.

  was (Author: thetaphi):
This patch adds some refactoring because FieldSelectorResult is an enum 
since 3.0, so the (slow) queue of id-statements can be replaced by a fast 
switch.

Also some minor comments and a missing & 0xFF when casting byte to int.
  
 NumericField should be stored in binary format in index (matching Solr's 
 format)
 

 Key: LUCENE-3065
 URL: https://issues.apache.org/jira/browse/LUCENE-3065
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, 
 LUCENE-3065.patch


 (Spinoff of LUCENE-3001)
 Today when writing stored fields we don't record that the field was a 
 NumericField, and so at IndexReader time you get back an ordinary Field and 
 your number has turned into a string.  See 
 https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
 We have spare bits already in stored fields, so, we should use one to record 
 that the field is numeric, and then encode the numeric field in Solr's 
 more-compact binary format.
 A nice side-effect is we fix the long standing issue that you don't get a 
 NumericField back when loading your document.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Bug in boilerpipe 1.1.0 referenced from solr-cell

2011-05-04 Thread Andrew Bisson
Solr-cell references boilerpipe 1.1.0 which contains a modified version
of nekohtml 1.9.9.

It seems that this version of nekohtml is broken in that it references
the class LostText without including it.



The unmodified release of nekohtml 1.9.9 does not reference or include
this class and the latest release, 1.9.14, both references and includes
it.



As a result, our application has been broken because it independently
uses nekohtml and is now finding a broken version of the jar.



How should I report this issue, as it is not directly a bug in Solr?

Andrew Le Couteur Bisson

Senior Software Engineer

GOSS Interactive


t:  0844 880 3637

f:  0844 880 3638
e:  andrew.bis...@gossinteractive.com
w: www.gossinteractive.com




[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

2011-05-04 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3065:
--

Attachment: LUCENE-3065.patch

Next iteration:

Reverted the changes in Solr (they should come later); Lucene instead natively 
uses IndexInput and IndexOutput to write/read ints and longs.

The Solr changes are completely unrelated.

 NumericField should be stored in binary format in index (matching Solr's 
 format)
 

 Key: LUCENE-3065
 URL: https://issues.apache.org/jira/browse/LUCENE-3065
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, 
 LUCENE-3065.patch, LUCENE-3065.patch


 (Spinoff of LUCENE-3001)
 Today when writing stored fields we don't record that the field was a 
 NumericField, and so at IndexReader time you get back an ordinary Field and 
 your number has turned into a string.  See 
 https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
 We have spare bits already in stored fields, so, we should use one to record 
 that the field is numeric, and then encode the numeric field in Solr's 
 more-compact binary format.
 A nice side-effect is we fix the long standing issue that you don't get a 
 NumericField back when loading your document.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

2011-05-04 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028718#comment-13028718
 ] 

Uwe Schindler commented on LUCENE-3065:
---

Just to note: We also need to change the Forrest index format documentation!

 NumericField should be stored in binary format in index (matching Solr's 
 format)
 

 Key: LUCENE-3065
 URL: https://issues.apache.org/jira/browse/LUCENE-3065
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, 
 LUCENE-3065.patch, LUCENE-3065.patch


 (Spinoff of LUCENE-3001)
 Today when writing stored fields we don't record that the field was a 
 NumericField, and so at IndexReader time you get back an ordinary Field and 
 your number has turned into a string.  See 
 https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
 We have spare bits already in stored fields, so, we should use one to record 
 that the field is numeric, and then encode the numeric field in Solr's 
 more-compact binary format.
 A nice side-effect is we fix the long standing issue that you don't get a 
 NumericField back when loading your document.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

2011-05-04 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned LUCENE-3065:
-

Assignee: Uwe Schindler

 NumericField should be stored in binary format in index (matching Solr's 
 format)
 

 Key: LUCENE-3065
 URL: https://issues.apache.org/jira/browse/LUCENE-3065
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, 
 LUCENE-3065.patch, LUCENE-3065.patch


 (Spinoff of LUCENE-3001)
 Today when writing stored fields we don't record that the field was a 
 NumericField, and so at IndexReader time you get back an ordinary Field and 
 your number has turned into a string.  See 
 https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
 We have spare bits already in stored fields, so, we should use one to record 
 that the field is numeric, and then encode the numeric field in Solr's 
 more-compact binary format.
 A nice side-effect is we fix the long standing issue that you don't get a 
 NumericField back when loading your document.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

2011-05-04 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3065:
--

Attachment: LUCENE-3065.patch

Moved test to TestFieldsReader

 NumericField should be stored in binary format in index (matching Solr's 
 format)
 

 Key: LUCENE-3065
 URL: https://issues.apache.org/jira/browse/LUCENE-3065
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, 
 LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch


 (Spinoff of LUCENE-3001)
 Today when writing stored fields we don't record that the field was a 
 NumericField, and so at IndexReader time you get back an ordinary Field and 
 your number has turned into a string.  See 
 https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
 We have spare bits already in stored fields, so, we should use one to record 
 that the field is numeric, and then encode the numeric field in Solr's 
 more-compact binary format.
 A nice side-effect is we fix the long standing issue that you don't get a 
 NumericField back when loading your document.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.

2011-05-04 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028727#comment-13028727
 ] 

Robert Muir commented on SOLR-2493:
---

this wasn't broken by the lucene commit.

this is solr's fault for having a getter that does some heavy duty xml shit. 
I don't think the issue is fixed until these getters that parse xml are 
removed!
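
To spell the pattern out, a small sketch in plain JDK terms of the alternative
(class, field and XPath names are invented here, this is not Solr's Config
class): evaluate the XPath once while the config is loaded and keep the result
in a final field, so request-time code never touches the XML machinery.

{noformat}
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

/** Invented stand-in for a config holder: parse once at startup, cache the value. */
final class CachedConfigSketch {
  private final String luceneMatchVersion;            // evaluated exactly once

  CachedConfigSketch(File configFile) throws Exception {
    Document dom = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder().parse(configFile);
    // XPath evaluation (and the factory lookups behind it) happens here, at load time...
    luceneMatchVersion = (String) XPathFactory.newInstance().newXPath()
        .evaluate("/config/luceneMatchVersion", dom, XPathConstants.STRING);
  }

  /** ...so the per-request getter is just a field read. */
  String getLuceneMatchVersion() {
    return luceneMatchVersion;
  }
}
{noformat}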


 SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large 
 performance hit.
 -

 Key: SOLR-2493
 URL: https://issues.apache.org/jira/browse/SOLR-2493
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.1
Reporter: Stephane Bailliez
Assignee: Uwe Schindler
Priority: Blocker
  Labels: core, parser, performance, request, solr
 Fix For: 3.1.1, 3.2, 4.0

 Attachments: SOLR-2493-3.x.patch, SOLR-2493.patch


 I' m putting this as blocker as I think this is a serious issue that should 
 be adressed asap with a release. With the current code this is no way near 
 suitable for production use.
 For each instance created SolrQueryParser calls
  
 getSchema().getSolrConfig().getLuceneVersion(luceneMatchVersion, 
 Version.LUCENE_24)
 instead of using
 getSchema().getSolrConfig().luceneMatchVersion
 This creates a massive performance hit. For each request, there is generally 
 3 query parsers created and each of them will parse the xml node in config 
 which involve creating an instance of XPath and behind the scene the usual 
 factory finder pattern quicks in within the xml parser and does a loadClass.
 The stack is typically:
at 
 org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:363)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.findProviderClass(ObjectFactory.java:506)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.lookUpFactoryClass(ObjectFactory.java:217)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:131)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:101)
at 
 com.sun.org.apache.xml.internal.dtm.DTMManager.newInstance(DTMManager.java:135)
at 
 com.sun.org.apache.xpath.internal.XPathContext.init(XPathContext.java:100)
at 
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:201)
at 
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
at org.apache.solr.core.Config.getNode(Config.java:230)
at org.apache.solr.core.Config.getVal(Config.java:256)
at org.apache.solr.core.Config.getLuceneVersion(Config.java:325)
at 
 org.apache.solr.search.SolrQueryParser.init(SolrQueryParser.java:76)
at 
 org.apache.solr.schema.IndexSchema.getSolrQueryParser(IndexSchema.java:277)
 With the current 3.1 code, I do barely 250 qps with 16 concurrent users with 
 a near empty index.
 Switching SolrQueryParser to use 
 getSchema().getSolrConfig().luceneMatchVersion and doing a quick bench test, 
 performance become reasonable beyond 2000 qps.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: modularization discussion

2011-05-04 Thread Michael McCandless
Mark,

Can you give some more details on your disagreement here...?

Are there certain modules from my list that you don't think should be
modules?  The timeframe (1-2 years) is too optimistic/aggressive?  Or
you disagree that we should poach from outside projects too...?

Or, more generally, you don't think Solr benefits from being opened up
/ modularized?

Mike

http://blog.mikemccandless.com

On Tue, May 3, 2011 at 1:11 PM, Mark Miller markrmil...@gmail.com wrote:

 On May 3, 2011, at 12:49 PM, Michael McCandless wrote:

 Isn't this the future we are working towards?

 No, not really. Others perhaps, but not me. I'm on board with some modules. I 
 do think there are tradeoffs when considering them and considering Lucene and 
 Solr. I'm happy to take everything one issue at a time.

 When I voted to merge, no, I certainly was not thinking, I hope in a year or 
 two we have taken everything from Solr and made it a module. I did it for a 
 few specific things to start - analyzers for sure, perhaps some other things 
 as people did something that made sense. I did it so we could share some code 
 more easily - not all code.

 Others did it for their own reasons I assume.

 But no - I'm not sure I have ever fully subscribed to what you are saying.

 - Mark Miller
 lucidimagination.com

 Lucene/Solr User Conference
 May 25-26, San Francisco
 www.lucenerevolution.org






 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.

2011-05-04 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028728#comment-13028728
 ] 

Uwe Schindler commented on SOLR-2493:
-

...as I said before :-)

 SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large 
 performance hit.
 -

 Key: SOLR-2493
 URL: https://issues.apache.org/jira/browse/SOLR-2493
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.1
Reporter: Stephane Bailliez
Assignee: Uwe Schindler
Priority: Blocker
  Labels: core, parser, performance, request, solr
 Fix For: 3.1.1, 3.2, 4.0

 Attachments: SOLR-2493-3.x.patch, SOLR-2493.patch


 I' m putting this as blocker as I think this is a serious issue that should 
 be adressed asap with a release. With the current code this is no way near 
 suitable for production use.
 For each instance created SolrQueryParser calls
  
 getSchema().getSolrConfig().getLuceneVersion(luceneMatchVersion, 
 Version.LUCENE_24)
 instead of using
 getSchema().getSolrConfig().luceneMatchVersion
 This creates a massive performance hit. For each request, there is generally 
 3 query parsers created and each of them will parse the xml node in config 
 which involve creating an instance of XPath and behind the scene the usual 
 factory finder pattern quicks in within the xml parser and does a loadClass.
 The stack is typically:
at 
 org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:363)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.findProviderClass(ObjectFactory.java:506)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.lookUpFactoryClass(ObjectFactory.java:217)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:131)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:101)
at 
 com.sun.org.apache.xml.internal.dtm.DTMManager.newInstance(DTMManager.java:135)
at 
 com.sun.org.apache.xpath.internal.XPathContext.init(XPathContext.java:100)
at 
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:201)
at 
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
at org.apache.solr.core.Config.getNode(Config.java:230)
at org.apache.solr.core.Config.getVal(Config.java:256)
at org.apache.solr.core.Config.getLuceneVersion(Config.java:325)
at 
 org.apache.solr.search.SolrQueryParser.init(SolrQueryParser.java:76)
at 
 org.apache.solr.schema.IndexSchema.getSolrQueryParser(IndexSchema.java:277)
 With the current 3.1 code, I do barely 250 qps with 16 concurrent users with 
 a near empty index.
 Switching SolrQueryParser to use 
 getSchema().getSolrConfig().luceneMatchVersion and doing a quick bench test, 
 performance become reasonable beyond 2000 qps.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.

2011-05-04 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028729#comment-13028729
 ] 

Uwe Schindler commented on SOLR-2493:
-

In my opinion, the correct way to solve this is to make all methods in 
o.a.solr.core.Config *protected* as they should only be called by subclasses 
doing the actual parsing.

Uwe

 SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large 
 performance hit.
 -

 Key: SOLR-2493
 URL: https://issues.apache.org/jira/browse/SOLR-2493
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.1
Reporter: Stephane Bailliez
Assignee: Uwe Schindler
Priority: Blocker
  Labels: core, parser, performance, request, solr
 Fix For: 3.1.1, 3.2, 4.0

 Attachments: SOLR-2493-3.x.patch, SOLR-2493.patch


 I' m putting this as blocker as I think this is a serious issue that should 
 be adressed asap with a release. With the current code this is no way near 
 suitable for production use.
 For each instance created SolrQueryParser calls
  
 getSchema().getSolrConfig().getLuceneVersion(luceneMatchVersion, 
 Version.LUCENE_24)
 instead of using
 getSchema().getSolrConfig().luceneMatchVersion
 This creates a massive performance hit. For each request, there is generally 
 3 query parsers created and each of them will parse the xml node in config 
 which involve creating an instance of XPath and behind the scene the usual 
 factory finder pattern quicks in within the xml parser and does a loadClass.
 The stack is typically:
at 
 org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:363)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.findProviderClass(ObjectFactory.java:506)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.lookUpFactoryClass(ObjectFactory.java:217)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:131)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:101)
at 
 com.sun.org.apache.xml.internal.dtm.DTMManager.newInstance(DTMManager.java:135)
at 
 com.sun.org.apache.xpath.internal.XPathContext.init(XPathContext.java:100)
at 
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:201)
at 
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
at org.apache.solr.core.Config.getNode(Config.java:230)
at org.apache.solr.core.Config.getVal(Config.java:256)
at org.apache.solr.core.Config.getLuceneVersion(Config.java:325)
at 
 org.apache.solr.search.SolrQueryParser.init(SolrQueryParser.java:76)
at 
 org.apache.solr.schema.IndexSchema.getSolrQueryParser(IndexSchema.java:277)
 With the current 3.1 code, I do barely 250 qps with 16 concurrent users with 
 a near empty index.
 Switching SolrQueryParser to use 
 getSchema().getSolrConfig().luceneMatchVersion and doing a quick bench test, 
 performance become reasonable beyond 2000 qps.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Re: Solr Config XML DTD's

2011-05-04 Thread Dawid Weiss

 if anyone disagrees that we should move Solr to more strict error
 checking of its configuration on startup.  I think being silent on
 configuration errors is the wrong choice... and I think that's


+1 for validation / warning / error messages from config files. Excellent
link, Michael (Sokolov); I didn't know about this at all.

Dawid


[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

2011-05-04 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028733#comment-13028733
 ] 

Michael McCandless commented on LUCENE-3065:


Patch looks great Uwe!

I think we should deprecate Document.getField?  And advertise in CHANGES that 
this is an [intentional] BW break, ie, you can no longer .getField if it's a 
NumericField (you'll hit CCE, just like you already do for lazy fields)?  I 
think that's the lesser evil here?
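
For application code the break would look roughly like this sketch (field name
and fallback are invented; it assumes the Lucene 3.x Fieldable API and the
post-patch behaviour where a loaded stored numeric field comes back as a
NumericField):

{noformat}
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Fieldable;
import org.apache.lucene.document.NumericField;

class LoadNumericFieldSketch {
  static Number readPrice(Document loadedDoc) {
    // loadedDoc.getField("price") could now throw ClassCastException, because the
    // stored value comes back as a NumericField rather than a plain Field.
    Fieldable f = loadedDoc.getFieldable("price");
    if (f instanceof NumericField) {
      return ((NumericField) f).getNumericValue();    // Integer/Long/Float/Double
    }
    return f == null ? null : Double.valueOf(f.stringValue());   // old string-stored value
  }
}
{noformat}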

 NumericField should be stored in binary format in index (matching Solr's 
 format)
 

 Key: LUCENE-3065
 URL: https://issues.apache.org/jira/browse/LUCENE-3065
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, 
 LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch


 (Spinoff of LUCENE-3001)
 Today when writing stored fields we don't record that the field was a 
 NumericField, and so at IndexReader time you get back an ordinary Field and 
 your number has turned into a string.  See 
 https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
 We have spare bits already in stored fields, so, we should use one to record 
 that the field is numeric, and then encode the numeric field in Solr's 
 more-compact binary format.
 A nice side-effect is we fix the long standing issue that you don't get a 
 NumericField back when loading your document.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.

2011-05-04 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028734#comment-13028734
 ] 

Robert Muir commented on SOLR-2493:
---

{quote}
In my opinion, the correct way to solve this is to make all methods in 
o.a.solr.core.Config protected as they should only be called by subclasses 
doing the actual parsing
{quote}

+1

 SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large 
 performance hit.
 -

 Key: SOLR-2493
 URL: https://issues.apache.org/jira/browse/SOLR-2493
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.1
Reporter: Stephane Bailliez
Assignee: Uwe Schindler
Priority: Blocker
  Labels: core, parser, performance, request, solr
 Fix For: 3.1.1, 3.2, 4.0

 Attachments: SOLR-2493-3.x.patch, SOLR-2493.patch


 I' m putting this as blocker as I think this is a serious issue that should 
 be adressed asap with a release. With the current code this is no way near 
 suitable for production use.
 For each instance created SolrQueryParser calls
  
 getSchema().getSolrConfig().getLuceneVersion(luceneMatchVersion, 
 Version.LUCENE_24)
 instead of using
 getSchema().getSolrConfig().luceneMatchVersion
 This creates a massive performance hit. For each request, there is generally 
 3 query parsers created and each of them will parse the xml node in config 
 which involve creating an instance of XPath and behind the scene the usual 
 factory finder pattern quicks in within the xml parser and does a loadClass.
 The stack is typically:
at 
 org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:363)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.findProviderClass(ObjectFactory.java:506)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.lookUpFactoryClass(ObjectFactory.java:217)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:131)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:101)
at 
 com.sun.org.apache.xml.internal.dtm.DTMManager.newInstance(DTMManager.java:135)
at 
 com.sun.org.apache.xpath.internal.XPathContext.init(XPathContext.java:100)
at 
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:201)
at 
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
at org.apache.solr.core.Config.getNode(Config.java:230)
at org.apache.solr.core.Config.getVal(Config.java:256)
at org.apache.solr.core.Config.getLuceneVersion(Config.java:325)
at 
 org.apache.solr.search.SolrQueryParser.init(SolrQueryParser.java:76)
at 
 org.apache.solr.schema.IndexSchema.getSolrQueryParser(IndexSchema.java:277)
 With the current 3.1 code, I do barely 250 qps with 16 concurrent users with 
 a near empty index.
 Switching SolrQueryParser to use 
 getSchema().getSolrConfig().luceneMatchVersion and doing a quick bench test, 
 performance become reasonable beyond 2000 qps.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.

2011-05-04 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028736#comment-13028736
 ] 

Chris Male commented on SOLR-2493:
--

bq. In my opinion, the correct way to solve this is to make all methods in 
o.a.solr.core.Config protected as they should only be called by subclasses 
doing the actual parsing.

+1

We don't need getters that do parsing available to every component.

 SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large 
 performance hit.
 -

 Key: SOLR-2493
 URL: https://issues.apache.org/jira/browse/SOLR-2493
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.1
Reporter: Stephane Bailliez
Assignee: Uwe Schindler
Priority: Blocker
  Labels: core, parser, performance, request, solr
 Fix For: 3.1.1, 3.2, 4.0

 Attachments: SOLR-2493-3.x.patch, SOLR-2493.patch


 I' m putting this as blocker as I think this is a serious issue that should 
 be adressed asap with a release. With the current code this is no way near 
 suitable for production use.
 For each instance created SolrQueryParser calls
  
 getSchema().getSolrConfig().getLuceneVersion(luceneMatchVersion, 
 Version.LUCENE_24)
 instead of using
 getSchema().getSolrConfig().luceneMatchVersion
 This creates a massive performance hit. For each request, there is generally 
 3 query parsers created and each of them will parse the xml node in config 
 which involve creating an instance of XPath and behind the scene the usual 
 factory finder pattern quicks in within the xml parser and does a loadClass.
 The stack is typically:
at 
 org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:363)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.findProviderClass(ObjectFactory.java:506)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.lookUpFactoryClass(ObjectFactory.java:217)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:131)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:101)
at 
 com.sun.org.apache.xml.internal.dtm.DTMManager.newInstance(DTMManager.java:135)
at 
 com.sun.org.apache.xpath.internal.XPathContext.init(XPathContext.java:100)
at 
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:201)
at 
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
at org.apache.solr.core.Config.getNode(Config.java:230)
at org.apache.solr.core.Config.getVal(Config.java:256)
at org.apache.solr.core.Config.getLuceneVersion(Config.java:325)
at 
 org.apache.solr.search.SolrQueryParser.init(SolrQueryParser.java:76)
at 
 org.apache.solr.schema.IndexSchema.getSolrQueryParser(IndexSchema.java:277)
 With the current 3.1 code, I do barely 250 qps with 16 concurrent users with 
 a near empty index.
 Switching SolrQueryParser to use 
 getSchema().getSolrConfig().luceneMatchVersion and doing a quick bench test, 
 performance become reasonable beyond 2000 qps.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build

2011-05-04 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028738#comment-13028738
 ] 

Michael McCandless commented on LUCENE-3018:


OK, if I change the /linux to /solaris (in the build.xml), then on OpenSolaris 
I get the expected compilation errors (using the wrong IO flags).  Can this 
somehow be done automagically...?

 Lucene Native Directory implementation need automated build
 ---

 Key: LUCENE-3018
 URL: https://issues.apache.org/jira/browse/LUCENE-3018
 Project: Lucene - Java
  Issue Type: Wish
  Components: Build
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Varun Thacker
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, 
 LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, 
 LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, cpptasks-1.0b5.jar, 
 cpptasks-LICENSE-ASL.txt, cpptasks.jar, cpptasks.jar


 Currently the native directory impl in contrib/misc require manual action to 
 compile the c code (partially) documented in 
  
 https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/misc/src/java/overview.html
 yet it would be nice if we had an ant task and documentation for all 
 platforms how to compile them and set up the prerequisites.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: modularization discussion

2011-05-04 Thread Mark Miller

On May 4, 2011, at 8:25 AM, Michael McCandless wrote:

 Mark,
 
 Can you give some more details on your disagreement here...?
 
 Are there certain modules from my list that you don't think should be
 modules?  The timeframe (1-2 years) is too optimistic/aggressive?  Or
 you disagree that we should poach from outside projects too...?

I don't necessarily disagree with your goals - I'm just saying those are not my 
goals. 

I think just like minix vs linux (should I mention hurd for stallman?), there 
are tradeoffs when trying to tackle some of these things modules style vs 
monolithic style. Yes, an OS is not Lucene/Solr, I'm going for more connotation 
than anything here.

Now, if some people came in and just did things module style in a way that 
matches the monolithic style (quality, feature wise), and they do that module 
after module, that is one thing. But I think that is indeed a daunting task, 
and I think there are a lot of other things to focus on. The end result is not 
even any guarantee - we seem just as likely to end up with a mess of modules 
with all kinds of crazy interdependencies. It's really easy to say, yeah, 
everything should be a module, sounds great, but there are large practical 
issues there. And from an open source project perspective, it's all even harder 
to plan. That's why I'm so about case by case.

I think poaching compatible license open source code is always okay.

 
 Or, more generally, you don't think Solr benefits from being opened up
 / modularized?

I think there would be benefits for many types of modules. And perhaps some 
downsides for some depending on the developers involved and how long they stay 
involved, and some of the interdependency issues that seem likely. Overall, I'm 
not terribly concerned about modules - they are not on my short term priority 
list (Analyzers would be for sure though, thanks Robert!).

On the one hand, you might think, well other Lucene users could take advantage 
of more of this stuff - and I see that as something kind of nice myself - but 
they already can use this stuff too - use Solr. So it's just not on the tip of 
my priority poll. I happily accept others are more concerned about it.

To wrap up, like I've said a million times, I'm not against modules. I also 
just don't share that same long term vision right now I guess.

Side note (plug): I have been playing with the benchmark module (who did that 
module? I had missed it), and I've got some cool stuff to show at Berlin 
Buzzwords this year for my solr performance talk!



 
 Mike
 
 http://blog.mikemccandless.com
 
 On Tue, May 3, 2011 at 1:11 PM, Mark Miller markrmil...@gmail.com wrote:
 
 On May 3, 2011, at 12:49 PM, Michael McCandless wrote:
 
 Isn't this the future we are working towards?
 
 No, not really. Others perhaps, but not me. I'm on board with some modules. 
 I do think there are tradeoffs when considering them and considering Lucene 
 and Solr. I'm happy to take everything one issue at a time.
 
 When I voted to merge, no, I certainly was not thinking, I hope in a year or 
 two we have taken everything from Solr and made it a module. I did it for a 
 few specific things to start - analyzers for sure, perhaps some other things 
 as people did something that made sense. I did it so we could share some 
 code more easily - not all code.
 
 Others did it for their own reasons I assume.
 
 But no - I'm not sure I have ever fully subscribed to what you are saying.
 
 - Mark Miller
 lucidimagination.com
 
 Lucene/Solr User Conference
 May 25-26, San Francisco
 www.lucenerevolution.org
 
 
 
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 

- Mark Miller
lucidimagination.com

Lucene/Solr User Conference
May 25-26, San Francisco
www.lucenerevolution.org






-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: modularization discussion

2011-05-04 Thread Robert Muir
On Wed, May 4, 2011 at 9:11 AM, Mark Miller markrmil...@gmail.com wrote:
 Side note (plug): I have been playing with the benchmark module (who did that 
 module? I had missed it), and I've got some cool stuff to show at Berlin 
 Buzzwords this year for my solr performance talk!


we svn move'd it here: https://issues.apache.org/jira/browse/LUCENE-2845

We should feel free to make this depend upon solr now (I know we
probably have to change some things about the build for that to
totally work, but that's the idea).

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: modularization discussion

2011-05-04 Thread Uwe Schindler
 From: Robert Muir [mailto:rcm...@gmail.com]
 Sent: Wednesday, May 04, 2011 3:30 PM
 To: dev@lucene.apache.org
 Subject: Re: modularization discussion
 
 On Wed, May 4, 2011 at 9:11 AM, Mark Miller markrmil...@gmail.com
 wrote:
  Side note (plug): I have been playing with the benchmark module (who did
 that module? I had missed it), and I've got some cool stuff to show at Berlin
 Buzzwords this year for my solr performance talk!
 
 
 we svn move'd it here: https://issues.apache.org/jira/browse/LUCENE-2845
 
 We should feel free to make this depend upon solr now (I know we probably
  have to change some things about the build for that to totally work, but that's
 the idea).

Hihi,

Solr has no performance testing framework; see the issue from today (SOLR-2493).

Uwe


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: modularization discussion

2011-05-04 Thread Mark Miller

On May 4, 2011, at 9:42 AM, Uwe Schindler wrote:

 Solr has no performance testing framework, see the issue from today 
 (SOLR-2493).

Come to Berlin Buzzwords!

(I know you already are :) )

- Mark Miller
lucidimagination.com

Lucene/Solr User Conference
May 25-26, San Francisco
www.lucenerevolution.org






-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: modularization discussion

2011-05-04 Thread Simon Willnauer
On Wed, May 4, 2011 at 3:49 PM, Mark Miller markrmil...@gmail.com wrote:

 On May 4, 2011, at 9:42 AM, Uwe Schindler wrote:

 Solr has no performance testing framework, see the issue from today 
 (SOLR-2493).

 Come to Berlin Buzzwords!
I think I will come :)
simon

 (I know you already are :) )

 - Mark Miller
 lucidimagination.com

 Lucene/Solr User Conference
 May 25-26, San Francisco
 www.lucenerevolution.org






 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3069) Lucene should be able to have an entirely memory resident term dictionary

2011-05-04 Thread Simon Willnauer (JIRA)
Lucene should be able to have an entirely memory resident term dictionary


 Key: LUCENE-3069
 URL: https://issues.apache.org/jira/browse/LUCENE-3069
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index, Search
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0


The FST based TermDictionary has been a great improvement, yet it still uses a 
delta codec file for scanning to terms. Some environments have enough memory 
available to keep the entire FST based term dict in memory. We should add a 
TermDictionary implementation that encodes all needed information for each term 
into the FST (custom fst.Output) and builds an FST from the entire term, not just 
the delta.
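
Purely to illustrate the "everything in RAM" end of the trade-off (this is not
the proposed FST encoding, which would be far more compact; all names are
invented): a dictionary that keeps every term and its metadata in a sorted
in-memory map, so an exact term lookup never touches the terms dict file.

{noformat}
import java.util.SortedMap;
import java.util.TreeMap;

/** Invented sketch of a fully memory resident term dictionary (not the FST design). */
final class RamTermDictSketch {
  /** Minimal per-term metadata a postings reader would need. */
  static final class TermMeta {
    final int docFreq;
    final long postingsFilePointer;
    TermMeta(int docFreq, long postingsFilePointer) {
      this.docFreq = docFreq;
      this.postingsFilePointer = postingsFilePointer;
    }
  }

  private final SortedMap<String, TermMeta> terms = new TreeMap<String, TermMeta>();

  void add(String term, int docFreq, long filePointer) {
    terms.put(term, new TermMeta(docFreq, filePointer));
  }

  /** Exact lookup with no file access; null if the term does not exist. */
  TermMeta lookup(String term) {
    return terms.get(term);
  }
}
{noformat}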

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary

2011-05-04 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-3069:


Summary: Lucene should have an entirely memory resident term dictionary  
(was: Lucene should be able to have a entirely memory resident term dictionary)

 Lucene should have an entirely memory resident term dictionary
 --

 Key: LUCENE-3069
 URL: https://issues.apache.org/jira/browse/LUCENE-3069
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index, Search
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0


 FST based TermDictionary has been a great improvement yet it still uses a 
 delta codec file for scanning to terms. Some environments have enough memory 
 available to keep the entire FST based term dict in memory. We should add a 
 TermDictionary implementation that encodes all needed information for each 
 term into the FST (custom fst.Output) and builds a FST from the entire term 
 not just the delta.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build

2011-05-04 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028757#comment-13028757
 ] 

Simon Willnauer commented on LUCENE-3018:
-

bq. OK, if I change the /linux to /solaris (in the build.xml), then on 
OpenSolaris I get the expected compilation errors (using the wrong IO flags). 
Can this somehow be done automagically...?

kind of quick and dirty but we could simply include 
{noformat}
<pathelement location="${java.home}/../include/solaris/"/>
{noformat}

so we automatically build on solaris too?

simon
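
If "automagically" just means picking the platform include directory from the
running JVM, the selection logic is tiny; a hedged sketch (directory names
assume the usual Sun JDK layout of jni_md.h under include/linux, include/solaris
and include/win32; nothing here comes from the actual build.xml):

{noformat}
import java.io.File;

class JniIncludeDirSketch {
  /** Guess the platform-specific JNI include dir under ${java.home}/../include. */
  static File platformIncludeDir() {
    String os = System.getProperty("os.name").toLowerCase();
    String sub;
    if (os.contains("linux")) {
      sub = "linux";
    } else if (os.contains("sunos") || os.contains("solaris")) {
      sub = "solaris";
    } else if (os.contains("windows")) {
      sub = "win32";
    } else {
      throw new IllegalStateException("unsupported platform: " + os);
    }
    return new File(System.getProperty("java.home"), "../include/" + sub);
  }

  public static void main(String[] args) {
    System.out.println(platformIncludeDir());
  }
}
{noformat}

The same os.name style check could of course live in the build file itself,
which may be the cleaner place for it.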

 Lucene Native Directory implementation need automated build
 ---

 Key: LUCENE-3018
 URL: https://issues.apache.org/jira/browse/LUCENE-3018
 Project: Lucene - Java
  Issue Type: Wish
  Components: Build
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Varun Thacker
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, 
 LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, 
 LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, cpptasks-1.0b5.jar, 
 cpptasks-LICENSE-ASL.txt, cpptasks.jar, cpptasks.jar


 Currently the native directory impl in contrib/misc require manual action to 
 compile the c code (partially) documented in 
  
 https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/misc/src/java/overview.html
 yet it would be nice if we had an ant task and documentation for all 
 platforms how to compile them and set up the prerequisites.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3070) Enable DocValues by default for every Codec

2011-05-04 Thread Simon Willnauer (JIRA)
Enable DocValues by default for every Codec
---

 Key: LUCENE-3070
 URL: https://issues.apache.org/jira/browse/LUCENE-3070
 Project: Lucene - Java
  Issue Type: Task
  Components: Index
Affects Versions: CSF branch
Reporter: Simon Willnauer
 Fix For: CSF branch


Currently DocValues are enabled with a wrapper Codec, so each codec that needs 
DocValues must be wrapped by DocValuesCodec. The DocValues writer and reader 
should be moved to Codec to be enabled by default.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

2011-05-04 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3065:
--

Attachment: LUCENE-3065.patch

I added some javadocs to the Document class:
- getField() / getFields() is deprecated [we may change this in ]

Some thoughts:
- Maybe we should make getField()/getFields() simply return null, or not 
include the Field in the returned array, if it is not an instanceof Field? We could 
add to the documentation that lazy-loaded and numerical fields are not 
returned.
- I would also like to add a method Document.getNumericValue(s) that returns 
Number[] or Number, like the NumericField one. Like getField() above, it could 
return null/an empty array if the field name has no numeric Fields?

The CHANGES entry may also be extended; it is currently under bugs - we should 
move it.
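
For illustration, a minimal sketch of how such a method could look inside Document 
(assuming the Lucene 3.x Fieldable/NumericField API; getNumericValue(String) itself 
is the proposed method, not an existing one):

{code}
// Hypothetical helper as proposed above: return the numeric value of the first
// NumericField with the given name, or null if there is none.
public Number getNumericValue(String name) {
  for (Fieldable f : getFields()) {   // Document.getFields() in 3.x returns List<Fieldable>
    if (name.equals(f.name()) && f instanceof NumericField) {
      return ((NumericField) f).getNumericValue();
    }
  }
  return null;
}
{code}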

 NumericField should be stored in binary format in index (matching Solr's 
 format)
 

 Key: LUCENE-3065
 URL: https://issues.apache.org/jira/browse/LUCENE-3065
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, 
 LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch


 (Spinoff of LUCENE-3001)
 Today when writing stored fields we don't record that the field was a 
 NumericField, and so at IndexReader time you get back an ordinary Field and 
 your number has turned into a string.  See 
 https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
 We have spare bits already in stored fields, so, we should use one to record 
 that the field is numeric, and then encode the numeric field in Solr's 
 more-compact binary format.
 A nice side-effect is we fix the long standing issue that you don't get a 
 NumericField back when loading your document.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Edited] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

2011-05-04 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028764#comment-13028764
 ] 

Uwe Schindler edited comment on LUCENE-3065 at 5/4/11 2:44 PM:
---

I added some javadocs to Document class:
- getField() / getFields() is deprecated [we may change this in LUCENE-2310]

Some thoughts:
- maybe we should make getField()/getFields() simply return null or does not 
include the Field into the returned array, if its not instanceof Field? We can 
add that to documentation, that lazy loaded and numerical fields are not 
returned.
- I would also like to add a method Document.getNumericValue(s), that returns 
Number[] or Number like the NumericField one. Like above getField() it can 
return null/empty array if the field name has no numeric Fields?

The CHANGES entry may also be extended, currently it under bugs - we shold 
move.

  was (Author: thetaphi):
I added some javadocs to Document class:
- getField() / getFields() is deprecated [we may change this in ]

Some thoughts:
- maybe we should make getField()/getFields() simply return null or does not 
include the Field into the returned array, if its not instanceof Field? We can 
add that to documentation, that lazy loaded and numerical fields are not 
returned.
- I would also like to add a method Document.getNumericValue(s), that returns 
Number[] or Number like the NumericField one. Like above getField() it can 
return null/empty array if the field name has no numeric Fields?

The CHANGES entry may also be extended, currently it under bugs - we shold 
move.
  
 NumericField should be stored in binary format in index (matching Solr's 
 format)
 

 Key: LUCENE-3065
 URL: https://issues.apache.org/jira/browse/LUCENE-3065
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, 
 LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch


 (Spinoff of LUCENE-3001)
 Today when writing stored fields we don't record that the field was a 
 NumericField, and so at IndexReader time you get back an ordinary Field and 
 your number has turned into a string.  See 
 https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
 We have spare bits already in stored fields, so, we should use one to record 
 that the field is numeric, and then encode the numeric field in Solr's 
 more-compact binary format.
 A nice side-effect is we fix the long standing issue that you don't get a 
 NumericField back when loading your document.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-3.x - Build # 7718 - Failure

2011-05-04 Thread Apache Jenkins Server
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7718/

No tests ran.

Build Log (for compile errors):
[...truncated 81 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-trunk - Build # 7717 - Failure

2011-05-04 Thread Apache Jenkins Server
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/7717/

No tests ran.

Build Log (for compile errors):
[...truncated 19 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-trunk - Build # 7718 - Still Failing

2011-05-04 Thread Apache Jenkins Server
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/7718/

No tests ran.

Build Log (for compile errors):
[...truncated 46 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-3.x - Build # 7719 - Still Failing

2011-05-04 Thread Apache Jenkins Server
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7719/

No tests ran.

Build Log (for compile errors):
[...truncated 35 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build

2011-05-04 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028798#comment-13028798
 ] 

Michael McCandless commented on LUCENE-3018:


That sounds good for starters?  Just stick /solaris (and others...?) in?

 Lucene Native Directory implementation need automated build
 ---

 Key: LUCENE-3018
 URL: https://issues.apache.org/jira/browse/LUCENE-3018
 Project: Lucene - Java
  Issue Type: Wish
  Components: Build
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Varun Thacker
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, 
 LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, 
 LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, cpptasks-1.0b5.jar, 
 cpptasks-LICENSE-ASL.txt, cpptasks.jar, cpptasks.jar


 Currently the native directory impl in contrib/misc require manual action to 
 compile the c code (partially) documented in 
  
 https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/misc/src/java/overview.html
 yet it would be nice if we had an ant task and documentation for all 
 platforms how to compile them and set up the prerequisites.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3071) PathHierarchyTokenizer adaptation for urls: splits reversed

2011-05-04 Thread Olivier Favre (JIRA)
PathHierarchyTokenizer adaptation for urls: splits reversed
---

 Key: LUCENE-3071
 URL: https://issues.apache.org/jira/browse/LUCENE-3071
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Reporter: Olivier Favre
Priority: Minor


{{PathHierarchyTokenizer}} should be usable to split URLs in a reversed way 
(useful for faceted search against URLs):
{{www.site.com}} -> {{www.site.com, site.com, com}}

Moreover, it should be able to skip a given number of first (or last, if 
reversed) tokens:
{{/usr/share/doc/somesoftware/INTERESTING/PART}}
should give, with 4 tokens skipped:
{{INTERESTING}}
{{INTERESTING/PART}}
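
For reference, a minimal plain-Java sketch of the reversed splitting described above 
(this only illustrates the expected output, not the tokenizer implementation; the 
class and method names are made up):

{code}
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: emit the reversed hierarchy of a dotted host name,
// e.g. "www.site.com" -> [www.site.com, site.com, com].
public class ReversedHierarchyDemo {
  static List<String> reversedHierarchy(String host) {
    List<String> tokens = new ArrayList<String>();
    tokens.add(host);
    int dot = host.indexOf('.');
    while (dot >= 0) {
      tokens.add(host.substring(dot + 1));   // drop one leading level per step
      dot = host.indexOf('.', dot + 1);
    }
    return tokens;
  }

  public static void main(String[] args) {
    System.out.println(reversedHierarchy("www.site.com"));
  }
}
{code}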

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3072) 3.1 fileformats out of date

2011-05-04 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3072:


Attachment: LUCENE-3072.patch

 3.1 fileformats out of date
 ---

 Key: LUCENE-3072
 URL: https://issues.apache.org/jira/browse/LUCENE-3072
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-3072.patch


 The 3.1 fileformats is missing the change from LUCENE-2811

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3072) 3.1 fileformats out of date

2011-05-04 Thread Robert Muir (JIRA)
3.1 fileformats out of date
---

 Key: LUCENE-3072
 URL: https://issues.apache.org/jira/browse/LUCENE-3072
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-3072.patch

The 3.1 fileformats documentation is missing the change from LUCENE-2811.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3072) 3.1 fileformats out of date

2011-05-04 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028822#comment-13028822
 ] 

Michael McCandless commented on LUCENE-3072:


Looks good Robert, nice catch!

 3.1 fileformats out of date
 ---

 Key: LUCENE-3072
 URL: https://issues.apache.org/jira/browse/LUCENE-3072
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-3072.patch


 The 3.1 fileformats is missing the change from LUCENE-2811

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

2011-05-04 Thread Shawn Heisey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey updated SOLR-2462:
---

Attachment: SOLR-2462_3_1.patch

The original patch would not apply cleanly for me against 3.1 without fuzz and 
whitespace options, and when those are used, it applies incorrectly.  Here's a 
new patch specific to 3.1.  Before creating this, I checked 3.1 out from SVN 
and then applied the patch for SOLR-2469, which should not interfere in any way.

Hopefully the patch is suitable.  I am only putting it up here for convenience, 
in case anyone else runs into this.

 Using spellcheck.collate can result in extremely high memory usage
 --

 Key: SOLR-2462
 URL: https://issues.apache.org/jira/browse/SOLR-2462
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 3.1
Reporter: James Dyer
Priority: Critical
 Fix For: 3.1.1, 4.0

 Attachments: SOLR-2462.patch, SOLR-2462_3_1.patch


 When using spellcheck.collate, class SpellPossibilityIterator creates a 
 ranked list of *every* possible correction combination.  But if returning 
 several corrections per term, and if several words are misspelled, the 
 existing algorithm uses a huge amount of memory.
 This bug was introduced with SOLR-2010.  However, it is triggered anytime 
 spellcheck.collate is used.  It is not necessary to use any features that 
 were added with SOLR-2010.
 We were in Production with Solr for 1 1/2 days and this bug started taking 
 our Solr servers down with infinite GC loops.  It was pretty easy for this 
 to happen as occasionally a user will accidently paste the URL into the 
 Search box on our app.  This URL results in a search with ~12 misspelled 
 words.  We have spellcheck.count set to 15. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3072) 3.1 fileformats out of date

2011-05-04 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-3072.
-

   Resolution: Fixed
Fix Version/s: 4.0
   3.2

Committed revision 1099529, 1099534 (branch-3x)

 3.1 fileformats out of date
 ---

 Key: LUCENE-3072
 URL: https://issues.apache.org/jira/browse/LUCENE-3072
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3072.patch


 The 3.1 fileformats is missing the change from LUCENE-2811

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-2494) Error in Context.setSessionAttribute implementation (ContextImpl.putVal)

2011-05-04 Thread Tom Klonikowski (JIRA)
Error in Context.setSessionAttribute implementation (ContextImpl.putVal)


 Key: SOLR-2494
 URL: https://issues.apache.org/jira/browse/SOLR-2494
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 3.1
Reporter: Tom Klonikowski


Session attributes are stored with SCOPE_ENTITY even if SCOPE_GLOBAL or SCOPE_SOLR_CORE 
is given, due to an error in 
org.apache.solr.handler.dataimport.ContextImpl.putVal at line 159 
(entitySession.put is called instead of map.put).
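
A minimal sketch of the intended behavior (the scope names and the surrounding class 
below are illustrative assumptions, not the actual ContextImpl code):

{code}
import java.util.HashMap;
import java.util.Map;

// Sketch only: write the attribute into the map that matches the requested scope,
// instead of always writing into the entity-scoped session map.
class ScopedSessionSketch {
  private final Map<String, Object> entitySession = new HashMap<String, Object>();
  private final Map<String, Object> globalSession = new HashMap<String, Object>();
  private final Map<String, Object> coreSession = new HashMap<String, Object>();

  void putVal(String name, Object val, String scope) {
    Map<String, Object> map =
        "global".equals(scope) ? globalSession
            : "solrcore".equals(scope) ? coreSession
            : entitySession;
    map.put(name, val);   // the reported bug always wrote to entitySession here
  }
}
{code}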

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-04 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028895#comment-13028895
 ] 

Doron Cohen commented on LUCENE-3068:
-

bq. specifically when the doc itself has tokens at the same position.

I am not convinced yet that there is a bug here - I think the code does allow 
this?

There is another assumption in the code, that any two different PPs are in 
different TPs - which underlies the assumption that originally each PP differs 
in position. This seems a valid assumption, because the QP will create an MPQ 
(MultiPhraseQuery) if there are two terms in the (phrase) query with the same 
position.

bq. maybe any time a *PhraseQuery has overlapping positions, we should rewrite 
to a MultiPhraseQuery and let it handle the same positions...? Is there any 
downside to that?

I think this is the correct behavior - in particular, this is the query that the 
QP will create. The only way to create a PQ (not an MPQ) for PPs at the same 
positions is to create it manually. But why would anyone do that? And if they 
did, wouldn't such a rewrite be a surprise to them?

A patch will follow with a revised version of this test - one that uses the QP. 
In this patch the QP indeed creates an MPQ, and I am as yet unable to make it 
fail. Still trying.
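
For reference, a minimal sketch of the two ways terms can end up at the same position 
(against the Lucene 3.x query API; the field and terms are made up):

{code}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.MultiPhraseQuery;
import org.apache.lucene.search.PhraseQuery;

// Sketch: the QP would normally build the MultiPhraseQuery form; the PhraseQuery
// form with two terms at the same position can only be built by hand.
class SamePositionQueries {
  static PhraseQuery manualSamePositionPhraseQuery() {
    PhraseQuery pq = new PhraseQuery();
    pq.add(new Term("f", "quick"), 0);
    pq.add(new Term("f", "fast"), 0);    // same position, added manually
    pq.add(new Term("f", "fox"), 1);
    return pq;
  }

  static MultiPhraseQuery equivalentMultiPhraseQuery() {
    MultiPhraseQuery mpq = new MultiPhraseQuery();
    mpq.add(new Term[] { new Term("f", "quick"), new Term("f", "fast") });  // one position, two terms
    mpq.add(new Term("f", "fox"));
    return mpq;
  }
}
{code}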

 The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
 same position
 --

 Key: LUCENE-3068
 URL: https://issues.apache.org/jira/browse/LUCENE-3068
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Michael McCandless
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3068.patch


 In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
 matching docs that it shouldn't; but I think those changes caused it
 to fail to match docs that it should, specifically when the doc itself
 has tokens at the same position.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3068) The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at same position

2011-05-04 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3068:


Attachment: LUCENE-3068.patch

Attached is a modified version of the test - one that invokes the query parser to 
create an MPQ. The test passes.

 The repeats mechanism in SloppyPhraseScorer is broken when doc has tokens at 
 same position
 --

 Key: LUCENE-3068
 URL: https://issues.apache.org/jira/browse/LUCENE-3068
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 3.0.3, 3.1, 4.0
Reporter: Michael McCandless
Assignee: Doron Cohen
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3068.patch, LUCENE-3068.patch


 In LUCENE-736 we made fixes to SloppyPhraseScorer, because it was
 matching docs that it shouldn't; but I think those changes caused it
 to fail to match docs that it should, specifically when the doc itself
 has tokens at the same position.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

2011-05-04 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3065:
--

Attachment: LUCENE-3065-solr-only.patch

Here is a first step in the cutover of Solr to NumericField. Most tests work, except:
- TestDistributedSearch fails with a strange date problem - I have no idea 
what goes wrong.
- TestMoreLikeThis fails because the returned documents are different than 
expected. The reason for this is simple: as TrieField's underlying Lucene 
fields are now NumericField, stringValue() returns something (in contrast, 
Solr's old fields returned null because they were binary). This maybe confuses 
MoreLikeThis (it maybe needs to be fixed in Lucene, I haven't looked into the 
code). Maybe we should simply exclude those fields or fix the test (I prefer 
the latter, because the numerics should also be taken into account).

The following changes had to be made:
- Cut over all places in Solr that used Field instead of the abstract Fieldable. 
This affects some leftover parts in various components (calling 
Document.getField instead of Document.getFieldable), but mainly 
SchemaField/FieldType: createField() now returns Fieldable.
- TrieDateField code duplication was removed; all methods delegate to a wrapped 
TrieField. There was also an inconsistency between TrieField's and 
TrieDateField's toExternal(). This was fixed to work correctly (the date format 
was wrong; now it uses dateField.toExternal()).

If somebody could help with the rest of the Solr stuff and maybe test, test, 
test! Yonik? Ryan? There may be some itches not covered by tests.

Thanks for any help from Solr specialists (I am definitely not one; I am more 
afraid of the code than I can help)!!!
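
As a small illustration of the Field -> Fieldable cutover mentioned above, reading a 
stored value back could look roughly like this (a sketch against the Lucene 3.x 
Document API; the helper class and field names are made up):

{code}
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Fieldable;
import org.apache.lucene.document.NumericField;

// Sketch: go through the abstract Fieldable API so NumericField instances
// (now returned for Trie-based fields) are handled as well as plain Fields.
class StoredValueSketch {
  static Object storedValue(Document doc, String name) {
    Fieldable f = doc.getFieldable(name);   // getField(name) assumes a plain Field
    if (f == null) {
      return null;
    }
    if (f instanceof NumericField) {
      return ((NumericField) f).getNumericValue();
    }
    return f.stringValue();
  }
}
{code}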

 NumericField should be stored in binary format in index (matching Solr's 
 format)
 

 Key: LUCENE-3065
 URL: https://issues.apache.org/jira/browse/LUCENE-3065
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3065-solr-only.patch, LUCENE-3065.patch, 
 LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, 
 LUCENE-3065.patch, LUCENE-3065.patch


 (Spinoff of LUCENE-3001)
 Today when writing stored fields we don't record that the field was a 
 NumericField, and so at IndexReader time you get back an ordinary Field and 
 your number has turned into a string.  See 
 https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
 We have spare bits already in stored fields, so, we should use one to record 
 that the field is numeric, and then encode the numeric field in Solr's 
 more-compact binary format.
 A nice side-effect is we fix the long standing issue that you don't get a 
 NumericField back when loading your document.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-2495) update noggit json parser

2011-05-04 Thread Yonik Seeley (JIRA)
update noggit json parser
-

 Key: SOLR-2495
 URL: https://issues.apache.org/jira/browse/SOLR-2495
 Project: Solr
  Issue Type: Bug
Reporter: Yonik Seeley
Assignee: Yonik Seeley
 Fix For: 3.1.1


The latest version of noggit has fixes for long overflow detection (only 
important for numbers that don't fit in a long), and for a bug where corrupted 
JSON input could lead to an infinite loop.
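
For context, a minimal sketch of the kind of overflow check involved when accumulating 
decimal digits into a long (illustrative only; this is not noggit's actual code):

{code}
// Sketch: accumulate digits into a long and report when the value no longer fits,
// so the caller can fall back to an arbitrary-precision representation.
public class LongOverflowSketch {
  static Long parseLongOrNull(String digits) {
    long value = 0;
    for (int i = 0; i < digits.length(); i++) {
      int d = digits.charAt(i) - '0';
      if (value > (Long.MAX_VALUE - d) / 10) {
        return null;              // the next step would overflow a long
      }
      value = value * 10 + d;
    }
    return value;
  }

  public static void main(String[] args) {
    System.out.println(parseLongOrNull("9223372036854775807"));  // fits in a long
    System.out.println(parseLongOrNull("9223372036854775808"));  // null: does not fit
  }
}
{code}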

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: jira issues falling off the radar -- Next JIRA version

2011-05-04 Thread Smiley, David W.

On May 2, 2011, at 7:54 PM, Chris Hostetter wrote:

 We should definitely kill of Next ... i would suggest just removing it, 
 and not bulk applying a new version (there is no requirement that issues 
 have a version)

Chris, in JIRA, Next has this description:

 Placeholder for commiters to track issues that are not ready to commit, but 
 seem close enough to being ready to warrant focus before the next feature 
 release

Based on that, I think it would be irresponsible to just delete Next because 
any issues assigned to this version on the basis of that description (like 
SOLR-2191) is going to be dropped on the floor.

~ David Smiley
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.

2011-05-04 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028943#comment-13028943
 ] 

Hoss Man commented on SOLR-2493:


bq. this is solr's fault by having a getter that does some heavy duty xml shit.

That sounds like some serious buck passing.

All of the get methods on the Config class take in XPath expressions -- it 
should be obvious to anyone who uses them that they are going to do XPath 
parsing.

By the looks of it, the SolrConfig constructor was already creating a public 
final luceneMatchVersion variable (using the XML-parsing-based Config method); 
it just wasn't getting used by the query parser.

bq. In my opinion, the correct way to solve this is to make all methods in 
o.a.solr.core.Config protected as they should only be called by subclasses 
doing the actual parsing.

I don't see how that would inherently protect us from this kind of mistake.

The cause of the problem came from needing public access to a 
getLuceneVersion-type method on SolrConfig (which is a subclass of Config).

Even if all the methods in Config were protected, that could have very easily 
wound up being implemented like so ...

{code}
  public Version getLuceneVersion() { return super.inefficientProtectedMethod(...); }
{code}

...and we would have had the same problem.

Bottom line: we just need to be careful about how/when the Config XML parsing 
methods are used (protected or otherwise).
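
For example, the cheap pattern is the one SolrConfig already uses for 
luceneMatchVersion: do the XPath lookup once at startup and hand out the cached 
value afterwards. A rough sketch (the holder class here is illustrative, not 
existing code):

{code}
import org.apache.lucene.util.Version;
import org.apache.solr.core.Config;

// Sketch: run the expensive XPath-based lookup exactly once, then serve the
// cached value to per-request callers such as the query parser.
public class VersionHolder {
  private final Version luceneMatchVersion;

  public VersionHolder(Config config) {
    this.luceneMatchVersion =
        config.getLuceneVersion("luceneMatchVersion", Version.LUCENE_24);  // parsed once
  }

  public Version getLuceneMatchVersion() {
    return luceneMatchVersion;   // cheap getter, no XML work per request
  }
}
{code}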

 SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large 
 performance hit.
 -

 Key: SOLR-2493
 URL: https://issues.apache.org/jira/browse/SOLR-2493
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.1
Reporter: Stephane Bailliez
Assignee: Uwe Schindler
Priority: Blocker
  Labels: core, parser, performance, request, solr
 Fix For: 3.1.1, 3.2, 4.0

 Attachments: SOLR-2493-3.x.patch, SOLR-2493.patch


 I' m putting this as blocker as I think this is a serious issue that should 
 be adressed asap with a release. With the current code this is no way near 
 suitable for production use.
 For each instance created SolrQueryParser calls
  
 getSchema().getSolrConfig().getLuceneVersion(luceneMatchVersion, 
 Version.LUCENE_24)
 instead of using
 getSchema().getSolrConfig().luceneMatchVersion
 This creates a massive performance hit. For each request, there is generally 
 3 query parsers created and each of them will parse the xml node in config 
 which involve creating an instance of XPath and behind the scene the usual 
 factory finder pattern quicks in within the xml parser and does a loadClass.
 The stack is typically:
at 
 org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:363)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.findProviderClass(ObjectFactory.java:506)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.lookUpFactoryClass(ObjectFactory.java:217)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:131)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:101)
at 
 com.sun.org.apache.xml.internal.dtm.DTMManager.newInstance(DTMManager.java:135)
at 
 com.sun.org.apache.xpath.internal.XPathContext.init(XPathContext.java:100)
at 
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:201)
at 
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
at org.apache.solr.core.Config.getNode(Config.java:230)
at org.apache.solr.core.Config.getVal(Config.java:256)
at org.apache.solr.core.Config.getLuceneVersion(Config.java:325)
at 
 org.apache.solr.search.SolrQueryParser.init(SolrQueryParser.java:76)
at 
 org.apache.solr.schema.IndexSchema.getSolrQueryParser(IndexSchema.java:277)
 With the current 3.1 code, I do barely 250 qps with 16 concurrent users with 
 a near empty index.
 Switching SolrQueryParser to use 
 getSchema().getSolrConfig().luceneMatchVersion and doing a quick bench test, 
 performance become reasonable beyond 2000 qps.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.

2011-05-04 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028954#comment-13028954
 ] 

Robert Muir commented on SOLR-2493:
---

{quote}
All of the get methods on the Config class take in xpath expressions – it 
should be obvious to any one who uses them that they are going to do xpath 
parsing.
{quote}

How is that obvious? There's definitely no javadoc saying this. In general, if 
you have an API that contains XYZ and you add a getXYZ() with absolutely no 
javadocs that behaves as more than a getter, that's a trap.

So I still agree with Uwe: it should be protected to prevent problems. Also, it 
would be nice if these methods were called *parse*XYZ() instead of *get*XYZ().

Otherwise this is going to continue to happen!

 SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large 
 performance hit.
 -

 Key: SOLR-2493
 URL: https://issues.apache.org/jira/browse/SOLR-2493
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.1
Reporter: Stephane Bailliez
Assignee: Uwe Schindler
Priority: Blocker
  Labels: core, parser, performance, request, solr
 Fix For: 3.1.1, 3.2, 4.0

 Attachments: SOLR-2493-3.x.patch, SOLR-2493.patch


 I' m putting this as blocker as I think this is a serious issue that should 
 be adressed asap with a release. With the current code this is no way near 
 suitable for production use.
 For each instance created SolrQueryParser calls
  
 getSchema().getSolrConfig().getLuceneVersion(luceneMatchVersion, 
 Version.LUCENE_24)
 instead of using
 getSchema().getSolrConfig().luceneMatchVersion
 This creates a massive performance hit. For each request, there is generally 
 3 query parsers created and each of them will parse the xml node in config 
 which involve creating an instance of XPath and behind the scene the usual 
 factory finder pattern quicks in within the xml parser and does a loadClass.
 The stack is typically:
at 
 org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:363)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.findProviderClass(ObjectFactory.java:506)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.lookUpFactoryClass(ObjectFactory.java:217)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:131)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:101)
at 
 com.sun.org.apache.xml.internal.dtm.DTMManager.newInstance(DTMManager.java:135)
at 
 com.sun.org.apache.xpath.internal.XPathContext.init(XPathContext.java:100)
at 
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:201)
at 
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
at org.apache.solr.core.Config.getNode(Config.java:230)
at org.apache.solr.core.Config.getVal(Config.java:256)
at org.apache.solr.core.Config.getLuceneVersion(Config.java:325)
at 
 org.apache.solr.search.SolrQueryParser.init(SolrQueryParser.java:76)
at 
 org.apache.solr.schema.IndexSchema.getSolrQueryParser(IndexSchema.java:277)
 With the current 3.1 code, I do barely 250 qps with 16 concurrent users with 
 a near empty index.
 Switching SolrQueryParser to use 
 getSchema().getSolrConfig().luceneMatchVersion and doing a quick bench test, 
 performance become reasonable beyond 2000 qps.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3073) make compoundfilewriter public

2011-05-04 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3073:


Attachment: LUCENE-3073.patch

 make compoundfilewriter public
 --

 Key: LUCENE-3073
 URL: https://issues.apache.org/jira/browse/LUCENE-3073
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
Priority: Minor
 Attachments: LUCENE-3073.patch


 CompoundFileReader is public, but CompoundFileWriter is not.
 I propose we make it public + @lucene.internal instead (just in case someone 
 else finds themselves wanting to manipulate cfs files)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3073) make compoundfilewriter public

2011-05-04 Thread Robert Muir (JIRA)
make compoundfilewriter public
--

 Key: LUCENE-3073
 URL: https://issues.apache.org/jira/browse/LUCENE-3073
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
Priority: Minor
 Attachments: LUCENE-3073.patch

CompoundFileReader is public, but CompoundFileWriter is not.

I propose we make it public + @lucene.internal instead (just in case someone 
else finds themselves wanting to manipulate cfs files)


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.

2011-05-04 Thread Stephane Bailliez (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028974#comment-13028974
 ] 

Stephane Bailliez commented on SOLR-2493:
-

The problem is hardly about naming here; it is about correctly using classes 
when offered the choice. A mistake was made. That's it. We expect committers to 
be sufficiently knowledgeable about the codebase when committing code. That's 
true anywhere.

You can hardly expect a service ItemService to have methods such as:

getItemFromDatabase() or getItemFromServerOnTheOtherSideOfThePlanet() or 
getItemFromFile() or getItemFromMemory() if there are 4 different 
implementations of it; you have getItem(), and the 4 different implementations 
do something different internally.

I actually rather wonder why the config is not parsed entirely at startup, 
rather than having nodes lying around to be cherry-picked.

 SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large 
 performance hit.
 -

 Key: SOLR-2493
 URL: https://issues.apache.org/jira/browse/SOLR-2493
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.1
Reporter: Stephane Bailliez
Assignee: Uwe Schindler
Priority: Blocker
  Labels: core, parser, performance, request, solr
 Fix For: 3.1.1, 3.2, 4.0

 Attachments: SOLR-2493-3.x.patch, SOLR-2493.patch


 I' m putting this as blocker as I think this is a serious issue that should 
 be adressed asap with a release. With the current code this is no way near 
 suitable for production use.
 For each instance created SolrQueryParser calls
  
 getSchema().getSolrConfig().getLuceneVersion(luceneMatchVersion, 
 Version.LUCENE_24)
 instead of using
 getSchema().getSolrConfig().luceneMatchVersion
 This creates a massive performance hit. For each request, there is generally 
 3 query parsers created and each of them will parse the xml node in config 
 which involve creating an instance of XPath and behind the scene the usual 
 factory finder pattern quicks in within the xml parser and does a loadClass.
 The stack is typically:
at 
 org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:363)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.findProviderClass(ObjectFactory.java:506)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.lookUpFactoryClass(ObjectFactory.java:217)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:131)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:101)
at 
 com.sun.org.apache.xml.internal.dtm.DTMManager.newInstance(DTMManager.java:135)
at 
 com.sun.org.apache.xpath.internal.XPathContext.init(XPathContext.java:100)
at 
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:201)
at 
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
at org.apache.solr.core.Config.getNode(Config.java:230)
at org.apache.solr.core.Config.getVal(Config.java:256)
at org.apache.solr.core.Config.getLuceneVersion(Config.java:325)
at 
 org.apache.solr.search.SolrQueryParser.init(SolrQueryParser.java:76)
at 
 org.apache.solr.schema.IndexSchema.getSolrQueryParser(IndexSchema.java:277)
 With the current 3.1 code, I do barely 250 qps with 16 concurrent users with 
 a near empty index.
 Switching SolrQueryParser to use 
 getSchema().getSolrConfig().luceneMatchVersion and doing a quick bench test, 
 performance become reasonable beyond 2000 qps.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.

2011-05-04 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028975#comment-13028975
 ] 

Uwe Schindler commented on SOLR-2493:
-

bq. The cause of the problem came from needing public access to a 
getLuceneVersion type method on SolrConfig (which is a subclass of Config)

This is not true. getLuceneVersion is in Config, not SolrConfig, and it is 
public like all the other getXxx() methods. Version is just a datatype like 
int/float/String. That's all.

In general, the bad thing about the whole config stuff in Solr is mixing 
parsing and value holding. These should theoretically be separate classes, so 
that SolrConfig has no parse methods at all. In its ctor it would simply 
instantiate the ConfigParser (name the class like that) and use it to set the 
values in SolrConfig. That would be correct design.

The good thing with this design: one could instantiate a SolrConfig and 
populate it programmatically, or via a JSON parser, or whatever.
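
A rough sketch of the proposed split (the class and member names below are 
illustrative, not existing Solr classes):

{code}
import java.io.InputStream;
import org.apache.lucene.util.Version;

// Sketch: a plain value holder plus a parser that populates it once at startup
// and can then be discarded, so no XPath/DOM machinery is needed afterwards.
class SolrConfigValues {
  final Version luceneMatchVersion;

  SolrConfigValues(Version luceneMatchVersion) {
    this.luceneMatchVersion = luceneMatchVersion;
  }
}

class ConfigParser {
  SolrConfigValues parse(InputStream solrconfigXml) {
    // all XPath/DOM work would happen only in here, exactly once
    Version parsed = Version.LUCENE_24;   // placeholder for the parsed value
    return new SolrConfigValues(parsed);
  }
}
{code}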

 SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large 
 performance hit.
 -

 Key: SOLR-2493
 URL: https://issues.apache.org/jira/browse/SOLR-2493
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.1
Reporter: Stephane Bailliez
Assignee: Uwe Schindler
Priority: Blocker
  Labels: core, parser, performance, request, solr
 Fix For: 3.1.1, 3.2, 4.0

 Attachments: SOLR-2493-3.x.patch, SOLR-2493.patch


 I' m putting this as blocker as I think this is a serious issue that should 
 be adressed asap with a release. With the current code this is no way near 
 suitable for production use.
 For each instance created SolrQueryParser calls
  
 getSchema().getSolrConfig().getLuceneVersion(luceneMatchVersion, 
 Version.LUCENE_24)
 instead of using
 getSchema().getSolrConfig().luceneMatchVersion
 This creates a massive performance hit. For each request, there is generally 
 3 query parsers created and each of them will parse the xml node in config 
 which involve creating an instance of XPath and behind the scene the usual 
 factory finder pattern quicks in within the xml parser and does a loadClass.
 The stack is typically:
at 
 org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:363)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.findProviderClass(ObjectFactory.java:506)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.lookUpFactoryClass(ObjectFactory.java:217)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:131)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:101)
at 
 com.sun.org.apache.xml.internal.dtm.DTMManager.newInstance(DTMManager.java:135)
at 
 com.sun.org.apache.xpath.internal.XPathContext.init(XPathContext.java:100)
at 
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:201)
at 
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
at org.apache.solr.core.Config.getNode(Config.java:230)
at org.apache.solr.core.Config.getVal(Config.java:256)
at org.apache.solr.core.Config.getLuceneVersion(Config.java:325)
at 
 org.apache.solr.search.SolrQueryParser.init(SolrQueryParser.java:76)
at 
 org.apache.solr.schema.IndexSchema.getSolrQueryParser(IndexSchema.java:277)
 With the current 3.1 code, I do barely 250 qps with 16 concurrent users with 
 a near empty index.
 Switching SolrQueryParser to use 
 getSchema().getSolrConfig().luceneMatchVersion and doing a quick bench test, 
 performance become reasonable beyond 2000 qps.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Edited] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.

2011-05-04 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028975#comment-13028975
 ] 

Uwe Schindler edited comment on SOLR-2493 at 5/4/11 9:32 PM:
-

bq. The cause of the problem came from needing public access to a 
getLuceneVersion type method on SolrConfig (which is a subclass of Config)

This is not true. getLuceneVersion is in Config not SolrConfig and its public 
like all the other getXxx() methods. Version is just a datatype like 
int/float/String. Thats all. It does not need to be public (like all other 
getters in Config class).

In general the bad thing about the whole config stuff in solr is mixing parsing 
and value holder. This should theoretically separate classes. So SolrConfig has 
no parse methods at all. In its ctor it would simply instantiate the 
ConfigParser (name the class like that) and use it to set the values in 
SolrConfig. That would be cotrrect design.

The good thing with this design: One could instantiate a SolrConfig and 
populate it programmatically or via a JSON parser or whatever.

  was (Author: thetaphi):
bq. The cause of the problem came from needing public access to a 
getLuceneVersion type method on SolrConfig (which is a subclass of Config)

This is not true. getLuceneVersion is in Config not SolrConfig and its public 
like all the other getXxx() methods. Version is just a datatype like 
int/float/String. Thats all.

In general the bad thing about the whole config stuff in solr is mixing parsing 
and value holder. This should theoretically separate classes. So SolrConfig has 
no parse methods at all. In its ctor it would simply instantiate the 
ConfigParser (name the class like that) and use it to set the values in 
SolrConfig. That would be cotrrect design.

The good thing with this design: One could instantiate a SolrConfig and 
populate it programmatically or via a JSON parser or whatever.
  
 SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large 
 performance hit.
 -

 Key: SOLR-2493
 URL: https://issues.apache.org/jira/browse/SOLR-2493
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.1
Reporter: Stephane Bailliez
Assignee: Uwe Schindler
Priority: Blocker
  Labels: core, parser, performance, request, solr
 Fix For: 3.1.1, 3.2, 4.0

 Attachments: SOLR-2493-3.x.patch, SOLR-2493.patch


 I' m putting this as blocker as I think this is a serious issue that should 
 be adressed asap with a release. With the current code this is no way near 
 suitable for production use.
 For each instance created SolrQueryParser calls
  
 getSchema().getSolrConfig().getLuceneVersion(luceneMatchVersion, 
 Version.LUCENE_24)
 instead of using
 getSchema().getSolrConfig().luceneMatchVersion
 This creates a massive performance hit. For each request, there is generally 
 3 query parsers created and each of them will parse the xml node in config 
 which involve creating an instance of XPath and behind the scene the usual 
 factory finder pattern quicks in within the xml parser and does a loadClass.
 The stack is typically:
at 
 org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:363)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.findProviderClass(ObjectFactory.java:506)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.lookUpFactoryClass(ObjectFactory.java:217)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:131)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:101)
at 
 com.sun.org.apache.xml.internal.dtm.DTMManager.newInstance(DTMManager.java:135)
at 
 com.sun.org.apache.xpath.internal.XPathContext.init(XPathContext.java:100)
at 
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:201)
at 
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
at org.apache.solr.core.Config.getNode(Config.java:230)
at org.apache.solr.core.Config.getVal(Config.java:256)
at org.apache.solr.core.Config.getLuceneVersion(Config.java:325)
at 
 org.apache.solr.search.SolrQueryParser.init(SolrQueryParser.java:76)
at 
 org.apache.solr.schema.IndexSchema.getSolrQueryParser(IndexSchema.java:277)
 With the current 3.1 code, I do barely 250 qps with 16 concurrent users with 
 a near empty index.
 Switching SolrQueryParser to use 
 getSchema().getSolrConfig().luceneMatchVersion and doing a quick bench test, 
 performance become reasonable beyond 2000 qps.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: 

[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.

2011-05-04 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028980#comment-13028980
 ] 

Uwe Schindler commented on SOLR-2493:
-

bq. I actually rather wonder why the config is not parsed entirely at startup, 
rather than having nodes lying around to be cherry-picked.

It mostly is, and it should be. The problem here is as noted before: SolrConfig 
subclasses Config, which is only for parsing. SolrConfig should simply subclass 
Object and instantiate a parser in its ctor to parse and store all parsed 
content in itself. After that the parser is useless and can be freed. This 
would even free the DTM/DOM that otherwise stays alive until Solr shuts down.

 SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large 
 performance hit.
 -

 Key: SOLR-2493
 URL: https://issues.apache.org/jira/browse/SOLR-2493
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.1
Reporter: Stephane Bailliez
Assignee: Uwe Schindler
Priority: Blocker
  Labels: core, parser, performance, request, solr
 Fix For: 3.1.1, 3.2, 4.0

 Attachments: SOLR-2493-3.x.patch, SOLR-2493.patch


 I' m putting this as blocker as I think this is a serious issue that should 
 be adressed asap with a release. With the current code this is no way near 
 suitable for production use.
 For each instance created SolrQueryParser calls
  
 getSchema().getSolrConfig().getLuceneVersion(luceneMatchVersion, 
 Version.LUCENE_24)
 instead of using
 getSchema().getSolrConfig().luceneMatchVersion
 This creates a massive performance hit. For each request, there is generally 
 3 query parsers created and each of them will parse the xml node in config 
 which involve creating an instance of XPath and behind the scene the usual 
 factory finder pattern quicks in within the xml parser and does a loadClass.
 The stack is typically:
at 
 org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:363)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.findProviderClass(ObjectFactory.java:506)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.lookUpFactoryClass(ObjectFactory.java:217)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:131)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:101)
at 
 com.sun.org.apache.xml.internal.dtm.DTMManager.newInstance(DTMManager.java:135)
at 
 com.sun.org.apache.xpath.internal.XPathContext.init(XPathContext.java:100)
at 
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:201)
at 
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
at org.apache.solr.core.Config.getNode(Config.java:230)
at org.apache.solr.core.Config.getVal(Config.java:256)
at org.apache.solr.core.Config.getLuceneVersion(Config.java:325)
at 
 org.apache.solr.search.SolrQueryParser.init(SolrQueryParser.java:76)
at 
 org.apache.solr.schema.IndexSchema.getSolrQueryParser(IndexSchema.java:277)
 With the current 3.1 code, I do barely 250 qps with 16 concurrent users with 
 a near empty index.
 Switching SolrQueryParser to use 
 getSchema().getSolrConfig().luceneMatchVersion and doing a quick bench test, 
 performance become reasonable beyond 2000 qps.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.

2011-05-04 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028986#comment-13028986
 ] 

Yonik Seeley commented on SOLR-2493:


bq. How is that obvious?

The signature of these methods might be a tip-off:
 public double getDouble(String path, double def)

One can't be passing a String and have no idea what the string is used for ;-)


 SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large 
 performance hit.
 -

 Key: SOLR-2493
 URL: https://issues.apache.org/jira/browse/SOLR-2493
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.1
Reporter: Stephane Bailliez
Assignee: Uwe Schindler
Priority: Blocker
  Labels: core, parser, performance, request, solr
 Fix For: 3.1.1, 3.2, 4.0

 Attachments: SOLR-2493-3.x.patch, SOLR-2493.patch


 I' m putting this as blocker as I think this is a serious issue that should 
 be adressed asap with a release. With the current code this is no way near 
 suitable for production use.
 For each instance created SolrQueryParser calls
  
 getSchema().getSolrConfig().getLuceneVersion(luceneMatchVersion, 
 Version.LUCENE_24)
 instead of using
 getSchema().getSolrConfig().luceneMatchVersion
 This creates a massive performance hit. For each request, there is generally 
 3 query parsers created and each of them will parse the xml node in config 
 which involve creating an instance of XPath and behind the scene the usual 
 factory finder pattern quicks in within the xml parser and does a loadClass.
 The stack is typically:
at 
 org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:363)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.findProviderClass(ObjectFactory.java:506)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.lookUpFactoryClass(ObjectFactory.java:217)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:131)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:101)
at 
 com.sun.org.apache.xml.internal.dtm.DTMManager.newInstance(DTMManager.java:135)
at 
 com.sun.org.apache.xpath.internal.XPathContext.init(XPathContext.java:100)
at 
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:201)
at 
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
at org.apache.solr.core.Config.getNode(Config.java:230)
at org.apache.solr.core.Config.getVal(Config.java:256)
at org.apache.solr.core.Config.getLuceneVersion(Config.java:325)
at 
 org.apache.solr.search.SolrQueryParser.init(SolrQueryParser.java:76)
at 
 org.apache.solr.schema.IndexSchema.getSolrQueryParser(IndexSchema.java:277)
 With the current 3.1 code, I do barely 250 qps with 16 concurrent users with 
 a near empty index.
 Switching SolrQueryParser to use 
 getSchema().getSolrConfig().luceneMatchVersion and doing a quick bench test, 
 performance become reasonable beyond 2000 qps.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Improvements to the maven build

2011-05-04 Thread Smiley, David W.
Steve Rowe,

I thought I'd put together a list of interesting differences between the ant 
build output and the maven build output.  Before each build I did a full clean 
and then after the build I saved a file listing to a text file so that I could 
diff it.

I'm using svn revision 1087373 (March 31st).

1. The ant build invokes a JSP compiler to validate the JSPs.  The maven build 
does not.
2. Maven seems(?) to compile more modules' tests than the ant build does.
3. The ant build builds the tools module.  The maven build does not.  Probably 
fine if it stays this way?
4. Ant doesn't build the benchmark module; maven will by default.  A problem 
for the ant build?
5. The ant build artifacts tend to have a leading apache- in front of them, 
but the maven artifactId does not have this, so the artifact file names are 
different - trivially so, anyway.
6. The ant solr build puts all its final artifacts into the solr/dist 
directory; the maven build does not -- it leaves them in their build 
directories. Not a big deal, but maybe there's a way to have the output files 
go someplace else?  Not sure.

There were two issues that seemed like clear bugs to me that I fixed with an 
attached patch.
1. solrj's build directory and compile output directory were the same 
directory, but that's problematic since building the output jar will result in 
an error if it sees its own jar file as an input file to its output jar.  So I 
added a classes directory.  This will result in a different directory than 
the one the ant build uses, though.
2. The dataimporthandler-extras output location was specified such that there 
was a redundant path: /extras/extras/, so I fixed this.

By the way, I think it would be really nice to have a maven build instructions 
file that is put into place when the get-maven-poms task is run.  The file 
would have the essential instructions, explain some of the relevant 
differences from the ant build (notably output file placement and file name 
differences), and include tips such as how to install the sources and 
javadoc jar files into your local repo.  

~ David Smiley


mvnfix.patch
Description: mvnfix.patch

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.

2011-05-04 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028989#comment-13028989
 ] 

Uwe Schindler commented on SOLR-2493:
-

We should not fight against each other here again. The problem is fixed; we 
could release 3.1.1 once we fix the last slowdown in MultiPhraseQuery.

The discussion here is just about how to prevent this. For me as a non-Solr 
committer, when I did this code with Robert last year, I was also really 
confused about the design of Config (and in my opinion it is a wrong design). 
We should maybe open another issue and separate parsing and value holding into 
two separate classes (SolrConfig and ConfigParser). If we did this, all of it 
would be solved (see above).

 SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large 
 performance hit.
 -

 Key: SOLR-2493
 URL: https://issues.apache.org/jira/browse/SOLR-2493
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.1
Reporter: Stephane Bailliez
Assignee: Uwe Schindler
Priority: Blocker
  Labels: core, parser, performance, request, solr
 Fix For: 3.1.1, 3.2, 4.0

 Attachments: SOLR-2493-3.x.patch, SOLR-2493.patch


 I'm putting this as a blocker because I think this is a serious issue that 
 should be addressed asap with a release. With the current code this is nowhere 
 near suitable for production use.
 For each instance created, SolrQueryParser calls
 
 getSchema().getSolrConfig().getLuceneVersion("luceneMatchVersion", 
 Version.LUCENE_24)
 instead of using
 getSchema().getSolrConfig().luceneMatchVersion
 This creates a massive performance hit. For each request there are generally 
 3 query parsers created, and each of them re-parses the XML node in the config, 
 which involves creating an instance of XPath; behind the scenes the usual 
 factory-finder pattern kicks in within the XML parser and does a loadClass.
 The stack is typically:
at 
 org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:363)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.findProviderClass(ObjectFactory.java:506)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.lookUpFactoryClass(ObjectFactory.java:217)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:131)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:101)
at 
 com.sun.org.apache.xml.internal.dtm.DTMManager.newInstance(DTMManager.java:135)
at 
 com.sun.org.apache.xpath.internal.XPathContext.init(XPathContext.java:100)
at 
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:201)
at 
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
at org.apache.solr.core.Config.getNode(Config.java:230)
at org.apache.solr.core.Config.getVal(Config.java:256)
at org.apache.solr.core.Config.getLuceneVersion(Config.java:325)
at 
 org.apache.solr.search.SolrQueryParser.init(SolrQueryParser.java:76)
at 
 org.apache.solr.schema.IndexSchema.getSolrQueryParser(IndexSchema.java:277)
 With the current 3.1 code, I do barely 250 qps with 16 concurrent users with 
 a near-empty index.
 Switching SolrQueryParser to use 
 getSchema().getSolrConfig().luceneMatchVersion and doing a quick bench test, 
 performance becomes reasonable, at over 2000 qps.
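The change the report asks for, as a hedged sketch (it assumes the accessors 
quoted above and is not the committed patch):
{code}
import org.apache.lucene.util.Version;
import org.apache.solr.schema.IndexSchema;

class LuceneVersionLookup {
  // Old per-instantiation behaviour (re-parses solrconfig.xml via XPath, see the stack trace):
  //   schema.getSolrConfig().getLuceneVersion("luceneMatchVersion", Version.LUCENE_24)
  // New behaviour: read the value SolrConfig already parsed once when the core loaded.
  static Version luceneMatchVersion(IndexSchema schema) {
    return schema.getSolrConfig().luceneMatchVersion;
  }
}
{code}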

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-2495) update noggit json parser

2011-05-04 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-2495.


Resolution: Fixed

 update noggit json parser
 -

 Key: SOLR-2495
 URL: https://issues.apache.org/jira/browse/SOLR-2495
 Project: Solr
  Issue Type: Bug
Reporter: Yonik Seeley
Assignee: Yonik Seeley
 Fix For: 3.1.1


 The latest version of noggit has fixes for long overflow detection (only 
 important for numbers that don't fit in a long), and for a bug where 
 corrupted JSON input could lead to an infinite loop.
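The overflow check involved is the standard guard when accumulating decimal 
digits into a long; the sketch below is generic and is not noggit's actual code.
{code}
// Generic digit accumulation with overflow detection (not noggit's implementation).
// Handles non-negative input; a real parser also tracks the sign and Long.MIN_VALUE.
final class StrictLongParser {
  static long parse(String digits) {
    long result = 0;
    for (int i = 0; i < digits.length(); i++) {
      int d = digits.charAt(i) - '0';
      if (d < 0 || d > 9) throw new NumberFormatException("not a digit: " + digits.charAt(i));
      // result * 10 + d must not exceed Long.MAX_VALUE, checked without overflowing.
      if (result > (Long.MAX_VALUE - d) / 10) {
        throw new NumberFormatException("does not fit in a long: " + digits);
      }
      result = result * 10 + d;
    }
    return result;
  }
}
{code}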

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-2496) JSON Update Handler doesn't handle multiple docs properly

2011-05-04 Thread Neil Hooey (JIRA)
JSON Update Handler doesn't handle multiple docs properly
-

 Key: SOLR-2496
 URL: https://issues.apache.org/jira/browse/SOLR-2496
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 3.1
Reporter: Neil Hooey


The following is the current Solr 3.1 format for sending multiple
documents by JSON. It's not analogous to the XML method, and
isn't easily generated and serialized from a hash in Perl,
Python, Ruby, et al to JSON, because it has duplicate keys for add.

It's cited at this page: http://wiki.apache.org/solr/UpdateJSON
Near the text: Here's a simple example of adding more than one document at 
once:
{
"add": {"doc": {"id" : "TestDoc1", "title" : "test1"} },
"add": {"doc": {"id" : "TestDoc2", "title" : "another test"} }
}

Here's a better format that's analogous to the XML method of submission, and is 
easily serialized from a hash to JSON:
{
"add": {
"doc": [
{"id" : "TestDoc1", "title" : "test1"},
{"id" : "TestDoc2", "title" : "another test"},
],
},
}

The original XML method:
<add>
<doc>
   <field name="id">TestDoc1</field><field name="title">test1</field>
</doc>
<doc>
   <field name="id">TestDoc2</field><field name="title">test2</field>
</doc>
</add>
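The duplicate-key problem is easy to see from the client side; here is a quick 
illustration using plain JDK collections (modern Java, purely for demonstration):
{code}
import java.util.LinkedHashMap;
import java.util.Map;

public class DuplicateKeyDemo {
  public static void main(String[] args) {
    Map<String, Object> update = new LinkedHashMap<>();
    update.put("add", Map.of("doc", Map.of("id", "TestDoc1", "title", "test1")));
    // A hash cannot hold two "add" entries: the second put() silently replaces
    // the first, so the current wire format cannot be built from a single map.
    update.put("add", Map.of("doc", Map.of("id", "TestDoc2", "title", "another test")));
    System.out.println(update.size()); // prints 1 -- one document was lost
  }
}
{code}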


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2496) JSON Update Handler doesn't handle multiple docs properly

2011-05-04 Thread Neil Hooey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Hooey updated SOLR-2496:
-

Description: 
The following is the current Solr 3.1 format for sending multiple
documents by JSON. It's not analogous to the XML method, and
isn't easily generated and serialized from a hash in Perl,
Python, Ruby, et al to JSON, because it has duplicate keys for add.

It's cited at this page: http://wiki.apache.org/solr/UpdateJSON
Near the text: Here's a simple example of adding more than one document at 
once:
{code}
{
"add": {"doc": {"id" : "TestDoc1", "title" : "test1"} },
"add": {"doc": {"id" : "TestDoc2", "title" : "another test"} }
}
{code}

Here's a better format that's analogous to the XML method of submission, and is 
easily serialized from a hash to JSON:
{code}
{
"add": {
"doc": [
{"id" : "TestDoc1", "title" : "test1"},
{"id" : "TestDoc2", "title" : "another test"},
],
},
}
{code}

The original XML method:
{code}
<add>
<doc>
   <field name="id">TestDoc1</field><field name="title">test1</field>
</doc>
<doc>
   <field name="id">TestDoc2</field><field name="title">test2</field>
</doc>
</add>
{code}

  was:
The following is the current Solr 3.1 format for sending multiple
documents by JSON. It's not analogous to the XML method, and
isn't easily generated and serialized from a hash in Perl,
Python, Ruby, et al to JSON, because it has duplicate keys for add.

It's cited at this page: http://wiki.apache.org/solr/UpdateJSON
Near the text: Here's a simple example of adding more than one document at 
once:
{
"add": {"doc": {"id" : "TestDoc1", "title" : "test1"} },
"add": {"doc": {"id" : "TestDoc2", "title" : "another test"} }
}

Here's a better format that's analogous to the XML method of submission, and is 
easily serialized from a hash to JSON:
{
"add": {
"doc": [
{"id" : "TestDoc1", "title" : "test1"},
{"id" : "TestDoc2", "title" : "another test"},
],
},
}

The original XML method:
<add>
<doc>
   <field name="id">TestDoc1</field><field name="title">test1</field>
</doc>
<doc>
   <field name="id">TestDoc2</field><field name="title">test2</field>
</doc>
</add>


 Issue Type: Improvement  (was: Bug)

 JSON Update Handler doesn't handle multiple docs properly
 -

 Key: SOLR-2496
 URL: https://issues.apache.org/jira/browse/SOLR-2496
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 3.1
Reporter: Neil Hooey
  Labels: json, update
   Original Estimate: 4h
  Remaining Estimate: 4h

 The following is the current Solr 3.1 format for sending multiple
 documents by JSON. It's not analogous to the XML method, and
 isn't easily generated and serialized from a hash in Perl,
 Python, Ruby, et al to JSON, because it has duplicate keys for add.
 It's cited at this page: http://wiki.apache.org/solr/UpdateJSON
 Near the text: Here's a simple example of adding more than one document at 
 once:
 {code}
 {
 "add": {"doc": {"id" : "TestDoc1", "title" : "test1"} },
 "add": {"doc": {"id" : "TestDoc2", "title" : "another test"} }
 }
 {code}
 Here's a better format that's analogous to the XML method of submission, and 
 is easily serialized from a hash to JSON:
 {code}
 {
 "add": {
 "doc": [
 {"id" : "TestDoc1", "title" : "test1"},
 {"id" : "TestDoc2", "title" : "another test"},
 ],
 },
 }
 {code}
 The original XML method:
 {code}
 <add>
 <doc>
    <field name="id">TestDoc1</field><field name="title">test1</field>
 </doc>
 <doc>
    <field name="id">TestDoc2</field><field name="title">test2</field>
 </doc>
 </add>
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2496) JSON Update Handler doesn't handle multiple docs properly

2011-05-04 Thread Neil Hooey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Hooey updated SOLR-2496:
-

Remaining Estimate: (was: 4h)
 Original Estimate: (was: 4h)

 JSON Update Handler doesn't handle multiple docs properly
 -

 Key: SOLR-2496
 URL: https://issues.apache.org/jira/browse/SOLR-2496
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 3.1
Reporter: Neil Hooey
  Labels: json, update

 The following is the current Solr 3.1 format for sending multiple
 documents by JSON. It's not analogous to the XML method, and
 isn't easily generated and serialized from a hash in Perl,
 Python, Ruby, et al to JSON, because it has duplicate keys for add.
 It's cited at this page: http://wiki.apache.org/solr/UpdateJSON
 Near the text: Here's a simple example of adding more than one document at 
 once:
 {code}
 {
 "add": {"doc": {"id" : "TestDoc1", "title" : "test1"} },
 "add": {"doc": {"id" : "TestDoc2", "title" : "another test"} }
 }
 {code}
 Here's a better format that's analogous to the XML method of submission, and 
 is easily serialized from a hash to JSON:
 {code}
 {
 "add": {
 "doc": [
 {"id" : "TestDoc1", "title" : "test1"},
 {"id" : "TestDoc2", "title" : "another test"},
 ],
 },
 }
 {code}
 The original XML method:
 {code}
 <add>
 <doc>
    <field name="id">TestDoc1</field><field name="title">test1</field>
 </doc>
 <doc>
    <field name="id">TestDoc2</field><field name="title">test2</field>
 </doc>
 </add>
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2496) JSON Update Handler doesn't handle multiple docs properly

2011-05-04 Thread Neil Hooey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neil Hooey updated SOLR-2496:
-

Description: 
The following is the current Solr 3.1 format for sending multiple documents by 
JSON. It's not analogous to the XML method, and isn't easily generated and 
serialized from a hash in Perl, Python, Ruby, et al to JSON, because it has 
duplicate keys for add.

It's cited at this page: http://wiki.apache.org/solr/UpdateJSON
Near the text: Here's a simple example of adding more than one document at 
once:
{code}
{
"add": {"doc": {"id" : "TestDoc1", "title" : "test1"} },
"add": {"doc": {"id" : "TestDoc2", "title" : "another test"} }
}
{code}

Here's a better format that's analogous to the XML method of submission, and is 
easily serialized from a hash to JSON:
{code}
{
"add": {
"doc": [
{"id" : "TestDoc1", "title" : "test1"},
{"id" : "TestDoc2", "title" : "another test"},
],
},
}
{code}

The original XML method:
{code}
<add>
<doc>
   <field name="id">TestDoc1</field><field name="title">test1</field>
</doc>
<doc>
   <field name="id">TestDoc2</field><field name="title">test2</field>
</doc>
</add>
{code}

  was:
The following is the current Solr 3.1 format for sending multiple
documents by JSON. It's not analogous to the XML method, and
isn't easily generated and serialized from a hash in Perl,
Python, Ruby, et al to JSON, because it has duplicate keys for add.

It's cited at this page: http://wiki.apache.org/solr/UpdateJSON
Near the text: Here's a simple example of adding more than one document at 
once:
{code}
{
"add": {"doc": {"id" : "TestDoc1", "title" : "test1"} },
"add": {"doc": {"id" : "TestDoc2", "title" : "another test"} }
}
{code}

Here's a better format that's analogous to the XML method of submission, and is 
easily serialized from a hash to JSON:
{code}
{
"add": {
"doc": [
{"id" : "TestDoc1", "title" : "test1"},
{"id" : "TestDoc2", "title" : "another test"},
],
},
}
{code}

The original XML method:
{code}
<add>
<doc>
   <field name="id">TestDoc1</field><field name="title">test1</field>
</doc>
<doc>
   <field name="id">TestDoc2</field><field name="title">test2</field>
</doc>
</add>
{code}


 JSON Update Handler doesn't handle multiple docs properly
 -

 Key: SOLR-2496
 URL: https://issues.apache.org/jira/browse/SOLR-2496
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 3.1
Reporter: Neil Hooey
  Labels: json, update

 The following is the current Solr 3.1 format for sending multiple documents 
 by JSON. It's not analogous to the XML method, and isn't easily generated and 
 serialized from a hash in Perl, Python, Ruby, et al to JSON, because it has 
 duplicate keys for add.
 It's cited at this page: http://wiki.apache.org/solr/UpdateJSON
 Near the text: Here's a simple example of adding more than one document at 
 once:
 {code}
 {
 "add": {"doc": {"id" : "TestDoc1", "title" : "test1"} },
 "add": {"doc": {"id" : "TestDoc2", "title" : "another test"} }
 }
 {code}
 Here's a better format that's analogous to the XML method of submission, and 
 is easily serialized from a hash to JSON:
 {code}
 {
 "add": {
 "doc": [
 {"id" : "TestDoc1", "title" : "test1"},
 {"id" : "TestDoc2", "title" : "another test"},
 ],
 },
 }
 {code}
 The original XML method:
 {code}
 <add>
 <doc>
    <field name="id">TestDoc1</field><field name="title">test1</field>
 </doc>
 <doc>
    <field name="id">TestDoc2</field><field name="title">test2</field>
 </doc>
 </add>
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2496) JSON Update Handler doesn't handle multiple docs properly

2011-05-04 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13028999#comment-13028999
 ] 

Yonik Seeley commented on SOLR-2496:


Yeah, I agree we should be able to add multiple docs w/o having to repeat tags 
in the same hash/object.
I proposed something like what you have, and the original thinking behind the 
current format is in this issue: SOLR-945
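For comparison, the array-based shape serializes directly from ordinary maps and 
lists; the sketch below happens to use Jackson's ObjectMapper as the serializer, 
which is just an assumption for illustration (any JSON library would do):
{code}
import java.util.List;
import java.util.Map;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ProposedAddFormat {
  public static void main(String[] args) throws Exception {
    // A single "add" key holding a list of docs -- trivially representable as a hash.
    Map<String, Object> update = Map.of(
        "add", Map.of(
            "doc", List.of(
                Map.of("id", "TestDoc1", "title", "test1"),
                Map.of("id", "TestDoc2", "title", "another test"))));
    // Produces e.g. {"add":{"doc":[{"id":"TestDoc1","title":"test1"}, ...]}} (key order may vary).
    System.out.println(new ObjectMapper().writeValueAsString(update));
  }
}
{code}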


 JSON Update Handler doesn't handle multiple docs properly
 -

 Key: SOLR-2496
 URL: https://issues.apache.org/jira/browse/SOLR-2496
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 3.1
Reporter: Neil Hooey
  Labels: json, update

 The following is the current Solr 3.1 format for sending multiple documents 
 by JSON. It's not analogous to the XML method, and isn't easily generated and 
 serialized from a hash in Perl, Python, Ruby, et al to JSON, because it has 
 duplicate keys for add.
 It's cited at this page: http://wiki.apache.org/solr/UpdateJSON
 Near the text: Here's a simple example of adding more than one document at 
 once:
 {code}
 {
 "add": {"doc": {"id" : "TestDoc1", "title" : "test1"} },
 "add": {"doc": {"id" : "TestDoc2", "title" : "another test"} }
 }
 {code}
 Here's a better format that's analogous to the XML method of submission, and 
 is easily serialized from a hash to JSON:
 {code}
 {
 "add": {
 "doc": [
 {"id" : "TestDoc1", "title" : "test1"},
 {"id" : "TestDoc2", "title" : "another test"},
 ],
 },
 }
 {code}
 The original XML method:
 {code}
 <add>
 <doc>
    <field name="id">TestDoc1</field><field name="title">test1</field>
 </doc>
 <doc>
    <field name="id">TestDoc2</field><field name="title">test2</field>
 </doc>
 </add>
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.

2011-05-04 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029003#comment-13029003
 ] 

Ryan McKinley commented on SOLR-2493:
-

bq. I was also really confused about the design of Config (and in my opinion 
this is a wrong design)

Like many things in solr/lucene, the current design is the product of many 
incremental back-compatible changes -- not a top-down view of what it should 
be.  I would love to use 4.0 as a chance to revisit configs and their 
relationship to xml/validation etc, but that is a load of work with very little 
glory...






 SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large 
 performance hit.
 -

 Key: SOLR-2493
 URL: https://issues.apache.org/jira/browse/SOLR-2493
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.1
Reporter: Stephane Bailliez
Assignee: Uwe Schindler
Priority: Blocker
  Labels: core, parser, performance, request, solr
 Fix For: 3.1.1, 3.2, 4.0

 Attachments: SOLR-2493-3.x.patch, SOLR-2493.patch


 I'm putting this as a blocker because I think this is a serious issue that 
 should be addressed asap with a release. With the current code this is nowhere 
 near suitable for production use.
 For each instance created, SolrQueryParser calls
 
 getSchema().getSolrConfig().getLuceneVersion("luceneMatchVersion", 
 Version.LUCENE_24)
 instead of using
 getSchema().getSolrConfig().luceneMatchVersion
 This creates a massive performance hit. For each request there are generally 
 3 query parsers created, and each of them re-parses the XML node in the config, 
 which involves creating an instance of XPath; behind the scenes the usual 
 factory-finder pattern kicks in within the XML parser and does a loadClass.
 The stack is typically:
at 
 org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:363)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.findProviderClass(ObjectFactory.java:506)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.lookUpFactoryClass(ObjectFactory.java:217)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:131)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:101)
at 
 com.sun.org.apache.xml.internal.dtm.DTMManager.newInstance(DTMManager.java:135)
at 
 com.sun.org.apache.xpath.internal.XPathContext.init(XPathContext.java:100)
at 
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:201)
at 
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
at org.apache.solr.core.Config.getNode(Config.java:230)
at org.apache.solr.core.Config.getVal(Config.java:256)
at org.apache.solr.core.Config.getLuceneVersion(Config.java:325)
at 
 org.apache.solr.search.SolrQueryParser.init(SolrQueryParser.java:76)
at 
 org.apache.solr.schema.IndexSchema.getSolrQueryParser(IndexSchema.java:277)
 With the current 3.1 code, I do barely 250 qps with 16 concurrent users with 
 a near-empty index.
 Switching SolrQueryParser to use 
 getSchema().getSolrConfig().luceneMatchVersion and doing a quick bench test, 
 performance becomes reasonable, at over 2000 qps.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Improvements to the maven build

2011-05-04 Thread Ryan McKinley
Do you want to make a JIRA issue with a patch?

This is a good example of a patch that is easy to get committed
quickly because it is simple, clear, and understandable.

ryan


On Wed, May 4, 2011 at 5:41 PM, Smiley, David W. dsmi...@mitre.org wrote:
 Steve Row,

 I thought I'd put together a list of interesting differences between the ant 
 build output and the maven build output.  Before each build I did a full 
 clean and then after the build I saved a file listing to a text file so that 
 I could diff it.

 I'm using svn revision 1087373 (March 31st).

 1. The ant build invokes a JSP compiler to validate them.  The maven build 
 does not.
 2. Maven seems(?) to compile more modules' tests than the ant build does.
 3. The ant build builds the tools module.  The maven build does not.  
 Probably fine it stays this way?
 4. Ant doesn't build the benchmark module; maven will by default.  A problem 
 for the ant build?
 5. The ant build artifacts tend to have a leading apache- in front of them. 
 But the maven artifactId does not have this so the artifacts file names are 
 different, trivially so any way.
 6. The ant solr build puts all its final artifacts into the solr/dist 
 directory, the maven build does not--it leaves all of them in their build 
 directory. Not a big deal but maybe there's a way to have the output file go 
 someplace else?  Not sure.

 There were two issues that seemed like clear bugs to me that I fixed with an 
 attached patch.
 1. solrj's build directory and compile output directory were the same 
 directory, but that's problematic since building the output jar will result 
 in an error if it sees its own jar file as an input file to its output jar.  
 So I added a classes directory.  This will result in a different directory 
 than where the ant builds, though.
 2. The dataimporthandler-extras output location was specified such that there 
 was a redundant path: /extras/extras/, so I fixed this.

 By the way, I think it would be really nice to have a maven build 
 instructions file that is put into place when the get-maven-poms task is run. 
  The file would have the essential instructions, it would explain some of the 
 relevant differences to the ant build (notably output file placement, file 
 name differences), and it would include tips such as how to install the 
 sources and javadoc jar files into your local repo.

 ~ David Smiley


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-2497) Move Solr to new NumericField stored field impl of LUCENE-3065

2011-05-04 Thread Uwe Schindler (JIRA)
Move Solr to new NumericField stored field impl of LUCENE-3065
--

 Key: SOLR-2497
 URL: https://issues.apache.org/jira/browse/SOLR-2497
 Project: Solr
  Issue Type: Improvement
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.2, 4.0


This implements the changes to NumericField (LUCENE-3065) in Solr. TrieField & 
Co would use NumericField for indexing and reading stored fields. To enable 
this, some missing changes in Solr's internals (Field -> Fieldable) need to be 
done. Also some backwards-compatible stored-field parsing is needed to read 
pre-3.2 indexes without reindexing (as the format changed a little bit and 
Document.getFieldable returns NumericField instances now).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2497) Move Solr to new NumericField stored field impl of LUCENE-3065

2011-05-04 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated SOLR-2497:


Attachment: SOLR-2497.patch

The patch applies to the 3.2 branch only and needs the patch from LUCENE-3065 
applied first:

Here is a first step in the cutover of Solr to NumericField. Most tests work, 
except:
- TestDistributedSearch fails with a strange date problem - I have no idea 
what goes wrong
- TestMoreLikeThis fails because the returned documents are different than 
expected. The reason for this is simple: as TrieField's underlying Lucene 
fields are now NumericField, stringValue() returns something (in contrast, 
Solr's old fields returned null because they were binary). This maybe confuses 
MoreLikeThis (it maybe needs to be fixed in Lucene, I haven't looked into the 
code). Maybe we should simply exclude those fields or fix the test (I prefer 
the latter, because the numerics should also be taken into account).

The following changes had to be made:
- Cut over all places in Solr where Field is used instead of the abstract 
Fieldable. This affects some leftover parts in various components (calling 
Document.getField instead of Document.getFieldable), but mainly 
SchemaField/FieldType: createField() now returns Fieldable
- TrieDateField code duplication was removed; all methods now delegate to a 
wrapped TrieField. There was also an inconsistency between TrieField's and 
TrieDateField's toExternal(). This was fixed to work correctly (the date format 
was wrong, now it uses dateField.toExternal())

If somebody could help with the rest of the Solr stuff and maybe test, test, 
test! Yonik? Ryan? There may be some itches not covered by tests.

Thanks for help from the Solr specialists (I am definitely not one, I am more 
afraid of the code than I can help)!!!
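The delegation described above, reduced to a sketch of the pattern (TrieField 
and TrieDateField are real Solr classes, but the wrapper and methods shown here 
are illustrative, not the actual SOLR-2497 patch):
{code}
import java.text.SimpleDateFormat;
import java.util.Date;

// Sketch only: a date field type wraps a numeric trie field type and forwards the
// shared work to it, keeping just the date-specific external formatting itself.
final class TrieLongSketch {
  long toInternal(long value) { return value; }            // stand-in for the trie encoding
  long toExternalLong(long internal) { return internal; }  // stand-in for the trie decoding
}

final class TrieDateSketch {
  private final TrieLongSketch wrapped = new TrieLongSketch();

  long toInternal(Date value) {
    return wrapped.toInternal(value.getTime());            // delegate the shared part
  }

  String toExternal(long internal) {
    // Only the date-specific part lives here: render the millis as an ISO-8601 string.
    return new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'")
        .format(new Date(wrapped.toExternalLong(internal)));
  }
}
{code}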

 Move Solr to new NumericField stored field impl of LUCENE-3065
 --

 Key: SOLR-2497
 URL: https://issues.apache.org/jira/browse/SOLR-2497
 Project: Solr
  Issue Type: Improvement
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.2, 4.0

 Attachments: SOLR-2497.patch


 This implements the changes to NumericField (LUCENE-3065) in Solr. TrieField & 
 Co would use NumericField for indexing and reading stored fields. To enable 
 this, some missing changes in Solr's internals (Field -> Fieldable) need to be 
 done. Also some backwards-compatible stored-field parsing is needed to read 
 pre-3.2 indexes without reindexing (as the format changed a little bit and 
 Document.getFieldable returns NumericField instances now).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

2011-05-04 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3065:
--

Comment: was deleted

(was: Here is a first step in the cutover of Solr to NumericField. Most tests 
work, except:
- TestDistributedSearch fails with a strange date problem - I have no idea 
what goes wrong
- TestMoreLikeThis fails because the returned documents are different than 
expected. The reason for this is simple: as TrieField's underlying Lucene 
fields are now NumericField, stringValue() returns something (in contrast, 
Solr's old fields returned null because they were binary). This maybe confuses 
MoreLikeThis (it maybe needs to be fixed in Lucene, I haven't looked into the 
code). Maybe we should simply exclude those fields or fix the test (I prefer 
the latter, because the numerics should also be taken into account).

The following changes had to be made:
- Cut over all places in Solr where Field is used instead of the abstract 
Fieldable. This affects some leftover parts in various components (calling 
Document.getField instead of Document.getFieldable), but mainly 
SchemaField/FieldType: createField() now returns Fieldable
- TrieDateField code duplication was removed; all methods now delegate to a 
wrapped TrieField. There was also an inconsistency between TrieField's and 
TrieDateField's toExternal(). This was fixed to work correctly (the date format 
was wrong, now it uses dateField.toExternal())

If somebody could help with the rest of the Solr stuff and maybe test, test, 
test! Yonik? Ryan? There may be some itches not covered by tests.

Thanks for help from the Solr specialists (I am definitely not one, I am more 
afraid of the code than I can help)!!!)

 NumericField should be stored in binary format in index (matching Solr's 
 format)
 

 Key: LUCENE-3065
 URL: https://issues.apache.org/jira/browse/LUCENE-3065
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, 
 LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch


 (Spinoff of LUCENE-3001)
 Today when writing stored fields we don't record that the field was a 
 NumericField, and so at IndexReader time you get back an ordinary Field and 
 your number has turned into a string.  See 
 https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
 We have spare bits already in stored fields, so, we should use one to record 
 that the field is numeric, and then encode the numeric field in Solr's 
 more-compact binary format.
 A nice side-effect is we fix the long standing issue that you don't get a 
 NumericField back when loading your document.
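A hedged sketch of the idea: spend one spare flag bit to mark a stored field as 
numeric and then write the value in compact binary form. The flag layout and 
streams below are illustrative, not Lucene's actual stored-fields code.
{code}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class StoredNumericSketch {
  // Illustrative flag bits -- not Lucene's real stored-fields encoding.
  static final int FLAG_BINARY  = 1 << 0;
  static final int FLAG_NUMERIC = 1 << 1;   // the proposed "spare bit"

  static byte[] writeStoredLong(long value) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeByte(FLAG_NUMERIC);  // lets the reader rebuild a NumericField instead of a String
    out.writeLong(value);         // 8 fixed bytes instead of a decimal string
    out.flush();
    return bytes.toByteArray();
  }
}
{code}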

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

2011-05-04 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029011#comment-13029011
 ] 

Uwe Schindler commented on LUCENE-3065:
---

I started a new issue in Solr for the changes there: SOLR-2497

 NumericField should be stored in binary format in index (matching Solr's 
 format)
 

 Key: LUCENE-3065
 URL: https://issues.apache.org/jira/browse/LUCENE-3065
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, 
 LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch


 (Spinoff of LUCENE-3001)
 Today when writing stored fields we don't record that the field was a 
 NumericField, and so at IndexReader time you get back an ordinary Field and 
 your number has turned into a string.  See 
 https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
 We have spare bits already in stored fields, so, we should use one to record 
 that the field is numeric, and then encode the numeric field in Solr's 
 more-compact binary format.
 A nice side-effect is we fix the long standing issue that you don't get a 
 NumericField back when loading your document.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

2011-05-04 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3065:
--

Attachment: (was: LUCENE-3065-solr-only.patch)

 NumericField should be stored in binary format in index (matching Solr's 
 format)
 

 Key: LUCENE-3065
 URL: https://issues.apache.org/jira/browse/LUCENE-3065
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, 
 LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch


 (Spinoff of LUCENE-3001)
 Today when writing stored fields we don't record that the field was a 
 NumericField, and so at IndexReader time you get back an ordinary Field and 
 your number has turned into a string.  See 
 https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
 We have spare bits already in stored fields, so, we should use one to record 
 that the field is numeric, and then encode the numeric field in Solr's 
 more-compact binary format.
 A nice side-effect is we fix the long standing issue that you don't get a 
 NumericField back when loading your document.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

2011-05-04 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3065:
--

Issue Type: Improvement  (was: Bug)

 NumericField should be stored in binary format in index (matching Solr's 
 format)
 

 Key: LUCENE-3065
 URL: https://issues.apache.org/jira/browse/LUCENE-3065
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, 
 LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch


 (Spinoff of LUCENE-3001)
 Today when writing stored fields we don't record that the field was a 
 NumericField, and so at IndexReader time you get back an ordinary Field and 
 your number has turned into a string.  See 
 https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
 We have spare bits already in stored fields, so, we should use one to record 
 that the field is numeric, and then encode the numeric field in Solr's 
 more-compact binary format.
 A nice side-effect is we fix the long standing issue that you don't get a 
 NumericField back when loading your document.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2493) SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large performance hit.

2011-05-04 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029018#comment-13029018
 ] 

Uwe Schindler commented on SOLR-2493:
-

Ryan: I agree, this is why I always bring this up. With 4.0 we can reimplement 
APIs.

On the other hand: I thought Solr's backwards-compatibility policy is about the 
public HTTP/REST APIs, so why care about the implementation details behind 
them, and why do we need to keep those backwards compatible? This is just a 
dumb question I have never understood. As long as Solr behaves identically to 
the outside, who cares if we change method signatures or class names?

 SolrQueryParser constantly parse luceneMatchVersion in solrconfig. Large 
 performance hit.
 -

 Key: SOLR-2493
 URL: https://issues.apache.org/jira/browse/SOLR-2493
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 3.1
Reporter: Stephane Bailliez
Assignee: Uwe Schindler
Priority: Blocker
  Labels: core, parser, performance, request, solr
 Fix For: 3.1.1, 3.2, 4.0

 Attachments: SOLR-2493-3.x.patch, SOLR-2493.patch


 I'm putting this as a blocker because I think this is a serious issue that 
 should be addressed asap with a release. With the current code this is nowhere 
 near suitable for production use.
 For each instance created, SolrQueryParser calls
 
 getSchema().getSolrConfig().getLuceneVersion("luceneMatchVersion", 
 Version.LUCENE_24)
 instead of using
 getSchema().getSolrConfig().luceneMatchVersion
 This creates a massive performance hit. For each request there are generally 
 3 query parsers created, and each of them re-parses the XML node in the config, 
 which involves creating an instance of XPath; behind the scenes the usual 
 factory-finder pattern kicks in within the XML parser and does a loadClass.
 The stack is typically:
at 
 org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:363)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.findProviderClass(ObjectFactory.java:506)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.lookUpFactoryClass(ObjectFactory.java:217)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:131)
at 
 com.sun.org.apache.xml.internal.dtm.ObjectFactory.createObject(ObjectFactory.java:101)
at 
 com.sun.org.apache.xml.internal.dtm.DTMManager.newInstance(DTMManager.java:135)
at 
 com.sun.org.apache.xpath.internal.XPathContext.init(XPathContext.java:100)
at 
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:201)
at 
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
at org.apache.solr.core.Config.getNode(Config.java:230)
at org.apache.solr.core.Config.getVal(Config.java:256)
at org.apache.solr.core.Config.getLuceneVersion(Config.java:325)
at 
 org.apache.solr.search.SolrQueryParser.init(SolrQueryParser.java:76)
at 
 org.apache.solr.schema.IndexSchema.getSolrQueryParser(IndexSchema.java:277)
 With the current 3.1 code, I do barely 250 qps with 16 concurrent users with 
 a near-empty index.
 Switching SolrQueryParser to use 
 getSchema().getSolrConfig().luceneMatchVersion and doing a quick bench test, 
 performance becomes reasonable, at over 2000 qps.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: Improvements to the maven build

2011-05-04 Thread Steven A Rowe
Hi David,

 I thought I'd put together a list of interesting differences between the
 ant build output and the maven build output.  Before each build I did a
 full clean and then after the build I saved a file listing to a text file
 so that I could diff it.

Cool!  Thanks for the effort.

 1. The ant build invokes a JSP compiler to validate them.  The maven
 build does not.

My take on the Maven build is that it should ensure the POMs are in sync with 
the artifacts produced.  I don't think the Maven build needs to perform all of 
the validity checks that the Ant build performs (e.g. also javadocs).  I'd like 
to keep the build simple, so that maintenance is easier and keeping the POMs in 
sync is thereby more likely.

 2. Maven seems(?) to compile more modules' tests than the ant build does.

Not sure why this would be, but I don't think it's necessarily a problem (for 
the Maven build).

 3. The ant build builds the tools module.  The maven build does not.
 Probably fine it stays this way?

In general, code under tools is used to generate other code, the results of 
which are kept checked in.  So I agree, the Maven build doesn't need to support 
this, KISS principlishly.

 4. Ant doesn't build the benchmark module; maven will by default.  A
 problem for the ant build?

I don't think it's necessary to enable this by default in Ant.  For the Maven 
build, it's just simpler to include every module - that way, every module's POM 
is checked.

 5. The ant build artifacts tend to have a leading apache- in front of
 them. But the maven artifactId does not have this so the artifacts file
 names are different, trivially so any way.

This different naming of artifacts has always been so, as far as I can tell, as 
long as Solr has released Maven artifacts.  

The conventional Maven artifact name template is 
artifactId-version[-classifier].jar, where -classifier is optional; 
artifactId would have to be "apache-solr" in order to make this change.  The 
full artifact name would be org.apache.solr:apache-solr:version, which seems 
weird to me in the double inclusion of "apache". I'm guessing this also seemed 
weird to whoever first put the Maven artifact naming scheme in place.  

Specifying finalName in the maven-jar-plugin might alternatively do the 
trick?  If this were done, though, the Maven naming convention would not be 
followed, and that's fairly predictably an omen of Bad Things To Come.

(BTW, Lucene's Maven artifacts are the same as the regular ones - this is a 
Solr-only issue.)

 6. The ant solr build puts all its final artifacts into the solr/dist
 directory, the maven build does not--it leaves all of them in their build
 directory. Not a big deal but maybe there's a way to have the output file
 go someplace else?  Not sure.

I meant to keep the Maven build output location the same as the Ant build 
output location.  I think the Solr modules' POMs can and should be changed to 
eliminate this difference.

 There were two issues that seemed like clear bugs to me that I fixed with
 an attached patch.
 1. solrj's build directory and compile output directory were the same
 directory, but that's problematic since building the output jar will
 result in an error if it sees its own jar file as an input file to its
 output jar.  So I added a classes directory.  This will result in a
 different directory than where the ant builds, though.
 2. The dataimporthandler-extras output location was specified such that
 there was a redundant path: /extras/extras/, so I fixed this.

Thanks, I agree with these changes.  I committed your patch.

 By the way, I think it would be really nice to have a maven build
 instructions file that is put into place when the get-maven-poms task is
 run.  The file would have the essential instructions, it would explain
 some of the relevant differences to the ant build (notably output file
 placement, file name differences), and it would include tips such as how
 to install the sources and javadoc jar files into your local repo.

+1.  I've hesitated to include these instructions with other build 
instructions, since it might confuse users into thinking that the Maven build 
is officially supported.  (It's not.  Ant is the only official build.)

Steve



RE: Improvements to the maven build

2011-05-04 Thread Steven A Rowe
Hi Ryan,

 Do you want to make a JIRA issue with a patch?
 
 This is a good example of a patch that is easy to get committed
 quickly because it is simple, clear, and understandable.

Earlier today on #lucene IRC, David described the changes he had in mind, and 
asked me where to put the patch, and I told him that if the patch was small, 
the mailing list might make sense.

I also told him that JIRA issues are generally a good idea, but that for the 
officially-non-official Maven stuff, I haven't been using JIRA, since it seemed 
to me like the attendant noise to signal ratio would be too high.  (Maven 
appears to be appreciated by Lucene/Solr users way more than the devs, and the 
users don't generally follow JIRA.)

That said, I'm open to being convinced otherwise.

Steve

 On Wed, May 4, 2011 at 5:41 PM, Smiley, David W. dsmi...@mitre.org
 wrote:
  Steve Row,
 
  I thought I'd put together a list of interesting differences between
 the ant build output and the maven build output.  Before each build I did
 a full clean and then after the build I saved a file listing to a text
 file so that I could diff it.
 
  I'm using svn revision 1087373 (March 31st).
 
  1. The ant build invokes a JSP compiler to validate them.  The maven
 build does not.
  2. Maven seems(?) to compile more modules' tests than the ant build
 does.
  3. The ant build builds the tools module.  The maven build does
 not.  Probably fine it stays this way?
  4. Ant doesn't build the benchmark module; maven will by default.  A
 problem for the ant build?
  5. The ant build artifacts tend to have a leading apache- in front of
 them. But the maven artifactId does not have this so the artifacts file
 names are different, trivially so any way.
  6. The ant solr build puts all its final artifacts into the solr/dist
 directory, the maven build does not--it leaves all of them in their build
 directory. Not a big deal but maybe there's a way to have the output file
 go someplace else?  Not sure.
 
  There were two issues that seemed like clear bugs to me that I fixed
 with an attached patch.
  1. solrj's build directory and compile output directory were the same
 directory, but that's problematic since building the output jar will
 result in an error if it sees its own jar file as an input file to its
 output jar.  So I added a classes directory.  This will result in a
 different directory than where the ant builds, though.
  2. The dataimporthandler-extras output location was specified such that
 there was a redundant path: /extras/extras/, so I fixed this.
 
  By the way, I think it would be really nice to have a maven build
 instructions file that is put into place when the get-maven-poms task is
 run.  The file would have the essential instructions, it would explain
 some of the relevant differences to the ant build (notably output file
 placement, file name differences), and it would include tips such as how
 to install the sources and javadoc jar files into your local repo.
 
  ~ David Smiley


