[jira] Resolved: (SOLR-1697) PluginInfo should load plugins w/o class attribute also
[ https://issues.apache.org/jira/browse/SOLR-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul resolved SOLR-1697.
------------------------------
    Resolution: Fixed

resolved: Revision 895909

> PluginInfo should load plugins w/o class attribute also
> -------------------------------------------------------
>
>                 Key: SOLR-1697
>                 URL: https://issues.apache.org/jira/browse/SOLR-1697
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>            Priority: Minor
>             Fix For: 1.5
>         Attachments: SOLR-1697.patch
>
> This should enable components to load plugins w/ a default classname too

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
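[For context, the behavior SOLR-1697 describes — a component loading a plugin whose config node omits the class attribute by falling back to a default class name — can be sketched roughly as below. This is a hedged, self-contained illustration: `PluginInfo` here is a simplified stand-in and `resolveClassName` is an invented name, not Solr's actual API.]

```java
import java.util.Map;

// Hypothetical, simplified stand-in for Solr's PluginInfo: it holds the
// attributes parsed from a plugin's node in solrconfig.xml.
class PluginInfo {
    final Map<String, String> attributes;

    PluginInfo(Map<String, String> attributes) {
        this.attributes = attributes;
    }

    // If the node carries no class attribute, fall back to the default
    // class name supplied by the component doing the loading.
    String resolveClassName(String defaultClassName) {
        String explicit = attributes.get("class");
        return explicit != null ? explicit : defaultClassName;
    }
}

public class PluginInfoSketch {
    public static void main(String[] args) {
        PluginInfo withClass = new PluginInfo(Map.of("class", "my.custom.Fragmenter"));
        PluginInfo withoutClass = new PluginInfo(Map.of("name", "gap"));

        // Explicit class wins; otherwise the caller's default is used.
        System.out.println(withClass.resolveClassName("org.apache.solr.highlight.GapFragmenter"));
        System.out.println(withoutClass.resolveClassName("org.apache.solr.highlight.GapFragmenter"));
    }
}
```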
[jira] Issue Comment Edited: (SOLR-1696) Deprecate old highlighting syntax and move configuration to HighlightComponent
[ https://issues.apache.org/jira/browse/SOLR-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796167#action_12796167 ]

Noble Paul edited comment on SOLR-1696 at 1/5/10 8:40 AM:
----------------------------------------------------------

The new syntax can be as follows

{code:xml}
<searchComponent class="solr.HighLightComponent" name="highlight">
  <highlighting class="DefaultSolrHighlighter">
    <!-- Configure the standard fragmenter -->
    <!-- This could most likely be commented out in the default case -->
    <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter" default="true">
      <lst name="defaults">
        <int name="hl.fragsize">100</int>
      </lst>
    </fragmenter>

    <!-- A regular-expression-based fragmenter (f.i., for sentence extraction) -->
    <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
      <lst name="defaults">
        <!-- slightly smaller fragsizes work better because of slop -->
        <int name="hl.fragsize">70</int>
        <!-- allow 50% slop on fragment sizes -->
        <float name="hl.regex.slop">0.5</float>
        <!-- a basic sentence pattern -->
        <str name="hl.regex.pattern">[-\w ,/\n\']{20,200}</str>
      </lst>
    </fragmenter>

    <!-- Configure the standard formatter -->
    <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter" default="true">
      <lst name="defaults">
        <str name="hl.simple.pre"><![CDATA[<em>]]></str>
        <str name="hl.simple.post"><![CDATA[</em>]]></str>
      </lst>
    </formatter>
  </highlighting>
</searchComponent>
{code}

This way SolrCore can be totally agnostic of the highlighter.

was (Author: noble.paul): the same comment with the same configuration, except that the searchComponent element carried no name="highlight" attribute.

> Deprecate old highlighting syntax and move configuration to HighlightComponent
> ------------------------------------------------------------------------------
>
>                 Key: SOLR-1696
>                 URL: https://issues.apache.org/jira/browse/SOLR-1696
>             Project: Solr
>          Issue Type: Improvement
>          Components: highlighter
>            Reporter: Noble Paul
>             Fix For: 1.5
>
> There is no reason why we should have a custom syntax for highlighter configuration. It can be treated like any other SearchComponent and all the configuration can go in there.
[jira] Updated: (SOLR-1696) Deprecate old highlighting syntax and move configuration to HighlightComponent
[ https://issues.apache.org/jira/browse/SOLR-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-1696:
-----------------------------
    Attachment: SOLR-1696.patch

The old syntax is deprecated and all the code moves into HighlightComponent. SolrCore is agnostic of loading and managing HighlightComponent.

> Deprecate old highlighting syntax and move configuration to HighlightComponent
>                 Key: SOLR-1696
>                 URL: https://issues.apache.org/jira/browse/SOLR-1696
>             Project: Solr
>          Issue Type: Improvement
>          Components: highlighter
>            Reporter: Noble Paul
>             Fix For: 1.5
>         Attachments: SOLR-1696.patch
[jira] Created: (SOLR-1699) deprecate the updateHandler configuration syntax
deprecate the updateHandler configuration syntax
------------------------------------------------

                Key: SOLR-1699
                URL: https://issues.apache.org/jira/browse/SOLR-1699
            Project: Solr
         Issue Type: Improvement
           Reporter: Noble Paul
            Fix For: 1.5

For all practical purposes, an updateHandler is a requestHandler. We can do away with a custom syntax for updateHandler. Example:

{code:xml}
<requestHandler class="solr.DirectUpdateHandler2">
  <lst name="autoCommit">
    <int name="maxDocs">1</int>
    <int name="maxTime">360</int>
  </lst>
  <!-- represents a lower bound on the frequency that commits may occur (in seconds).
       NOTE: not yet implemented
  <int name="commitIntervalLowerBound">0</int>
  -->
</requestHandler>
{code}
[jira] Commented: (SOLR-1696) Deprecate old highlighting syntax and move configuration to HighlightComponent
[ https://issues.apache.org/jira/browse/SOLR-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796584#action_12796584 ]

Chris Male commented on SOLR-1696:
----------------------------------

Are you planning on logging a warning if they continue to use the deprecated syntax?

> Deprecate old highlighting syntax and move configuration to HighlightComponent
>                 Key: SOLR-1696
>                 URL: https://issues.apache.org/jira/browse/SOLR-1696
>             Project: Solr
>          Issue Type: Improvement
>          Components: highlighter
>            Reporter: Noble Paul
>             Fix For: 1.5
>         Attachments: SOLR-1696.patch
[jira] Commented: (SOLR-1698) load balanced distributed search
[ https://issues.apache.org/jira/browse/SOLR-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796585#action_12796585 ]

Noble Paul commented on SOLR-1698:
----------------------------------

Is this related to SOLR-1431? I thought we could have custom ShardComponents for these things.

> load balanced distributed search
>                 Key: SOLR-1698
>                 URL: https://issues.apache.org/jira/browse/SOLR-1698
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Yonik Seeley
>
> Provide syntax and implementation of load-balancing across shard replicas.
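[At its simplest, the load balancing SOLR-1698 asks for could amount to round-robin selection among a shard's replicas. A minimal, self-contained sketch under that assumption — the class and method names below are invented for illustration and are not Solr's implementation:]

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin balancer over the replica URLs of one shard.
class ReplicaBalancer {
    private final List<String> replicaUrls;
    private final AtomicInteger counter = new AtomicInteger();

    ReplicaBalancer(List<String> replicaUrls) {
        this.replicaUrls = replicaUrls;
    }

    // Pick the next replica; wraps around when the list is exhausted.
    // AtomicInteger keeps this safe under concurrent request threads.
    String nextReplica() {
        int i = Math.floorMod(counter.getAndIncrement(), replicaUrls.size());
        return replicaUrls.get(i);
    }
}

public class ReplicaBalancerSketch {
    public static void main(String[] args) {
        ReplicaBalancer lb = new ReplicaBalancer(
            List.of("http://host1/solr", "http://host2/solr"));
        System.out.println(lb.nextReplica()); // http://host1/solr
        System.out.println(lb.nextReplica()); // http://host2/solr
        System.out.println(lb.nextReplica()); // http://host1/solr again
    }
}
```

A real implementation would additionally have to handle failed replicas (skip and retry on the next one), which is what makes the syntax/implementation question non-trivial.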
[jira] Commented: (SOLR-1699) deprecate the updateHandler configuration syntax
[ https://issues.apache.org/jira/browse/SOLR-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796586#action_12796586 ]

Chris Male commented on SOLR-1699:
----------------------------------

Hi,

I like the idea of standardising the syntax in the solrconfig.xml, but I think this is actually not categorising the UpdateHandler correctly. It gives the impression that it can respond to requests, and is just an alternative to the other update request handlers (XmlUpdateRequestHandler & co), which it isn't.

> deprecate the updateHandler configuration syntax
>                 Key: SOLR-1699
>                 URL: https://issues.apache.org/jira/browse/SOLR-1699
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Noble Paul
>             Fix For: 1.5
[jira] Deleted: (SOLR-1699) deprecate the updateHandler configuration syntax
[ https://issues.apache.org/jira/browse/SOLR-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul deleted SOLR-1699:
-----------------------------

> deprecate the updateHandler configuration syntax
>                 Key: SOLR-1699
>                 URL: https://issues.apache.org/jira/browse/SOLR-1699
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Noble Paul
[jira] Commented: (SOLR-1699) deprecate the updateHandler configuration syntax
[ https://issues.apache.org/jira/browse/SOLR-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796589#action_12796589 ]

Noble Paul commented on SOLR-1699:
----------------------------------

Hi Chris, you are right. I opened this issue hastily; I'm going to remove it.

> deprecate the updateHandler configuration syntax
>                 Key: SOLR-1699
>                 URL: https://issues.apache.org/jira/browse/SOLR-1699
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Noble Paul
Solr nightly build failure
init-forrest-entities:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/web

compile-solrj:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/solrj
    [javac] Compiling 88 source files to /tmp/apache-solr-nightly/build/solrj
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

compile:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/solr
    [javac] Compiling 413 source files to /tmp/apache-solr-nightly/build/solr
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

compileTests:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/tests
    [javac] Compiling 208 source files to /tmp/apache-solr-nightly/build/tests
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

dist-contrib:

init:
    [mkdir] Created dir: /tmp/apache-solr-nightly/contrib/clustering/build/classes
    [mkdir] Created dir: /tmp/apache-solr-nightly/contrib/clustering/lib/downloads
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/docs/api

init-forrest-entities:

compile-solrj:

compile:
    [javac] Compiling 1 source file to /tmp/apache-solr-nightly/build/solr
    [javac] Note: /tmp/apache-solr-nightly/src/java/org/apache/solr/search/DocSetHitCollector.java uses or overrides a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.

make-manifest:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/META-INF

proxy.setup:

check-files:

get-colt:
    [get] Getting: http://repo1.maven.org/maven2/colt/colt/1.2.0/colt-1.2.0.jar
    [get] To: /tmp/apache-solr-nightly/contrib/clustering/lib/downloads/colt-1.2.0.jar

get-pcj:
    [get] Getting: http://repo1.maven.org/maven2/pcj/pcj/1.2/pcj-1.2.jar
    [get] To: /tmp/apache-solr-nightly/contrib/clustering/lib/downloads/pcj-1.2.jar

get-nni:
    [get] Getting: http://download.carrot2.org/maven2/org/carrot2/nni/1.0.0/nni-1.0.0.jar
    [get] To: /tmp/apache-solr-nightly/contrib/clustering/lib/downloads/nni-1.0.0.jar

get-simple-xml:
    [get] Getting: http://mirrors.ibiblio.org/pub/mirrors/maven2/org/simpleframework/simple-xml/1.7.3/simple-xml-1.7.3.jar
    [get] To: /tmp/apache-solr-nightly/contrib/clustering/lib/downloads/simple-xml-1.7.3.jar

get-libraries:

compile:
    [javac] Compiling 7 source files to /tmp/apache-solr-nightly/contrib/clustering/build/classes

build:
    [jar] Building jar: /tmp/apache-solr-nightly/contrib/clustering/build/apache-solr-clustering-1.5-dev.jar

dist:
    [copy] Copying 1 file to /tmp/apache-solr-nightly/dist

init:
    [mkdir] Created dir: /tmp/apache-solr-nightly/contrib/dataimporthandler/target/classes

init-forrest-entities:

compile-solrj:

compile:
    [javac] Compiling 1 source file to /tmp/apache-solr-nightly/build/solr
    [javac] Note: /tmp/apache-solr-nightly/src/java/org/apache/solr/search/DocSetHitCollector.java uses or overrides a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.

make-manifest:

compile:
    [javac] Compiling 46 source files to /tmp/apache-solr-nightly/contrib/dataimporthandler/target/classes
    [javac] Note: /tmp/apache-solr-nightly/contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport/DocBuilder.java uses or overrides a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

compileExtras:
    [mkdir] Created dir: /tmp/apache-solr-nightly/contrib/dataimporthandler/target/extras/classes
    [javac] Compiling 2 source files to /tmp/apache-solr-nightly/contrib/dataimporthandler/target/extras/classes
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

build:
    [jar] Building jar: /tmp/apache-solr-nightly/contrib/dataimporthandler/target/apache-solr-dataimporthandler-1.5-dev.jar
    [jar] Building jar: /tmp/apache-solr-nightly/contrib/dataimporthandler/target/apache-solr-dataimporthandler-extras-1.5-dev.jar

dist:
    [copy] Copying 2 files to /tmp/apache-solr-nightly/build/web
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/web/WEB-INF/lib
    [copy] Copying 1 file to /tmp/apache-solr-nightly/build/web/WEB-INF/lib
    [copy]
[jira] Created: (SOLR-1700) LBHttpSolrServer - Connections management
LBHttpSolrServer - Connections management
-----------------------------------------

                Key: SOLR-1700
                URL: https://issues.apache.org/jira/browse/SOLR-1700
            Project: Solr
         Issue Type: Improvement
         Components: clients - java
   Affects Versions: 1.4
           Reporter: Patrick Sauts
           Priority: Minor
            Fix For: 1.5

As a LBHttpSolrServer is a wrapper to CommonsHttpSolrServer:

{code}
CommonsHttpSolrServer search1 = new CommonsHttpSolrServer("http://mysearch1");
search1.setConnectionTimeout(CONNECTION_TIMEOUT);
search1.setSoTimeout(READ_TIMEOUT);
search1.setConnectionManagerTimeout(CONNECTION_MANAGER_TIMEOUT);
search1.setDefaultMaxConnectionsPerHost(MAX_CONNECTIONS_PER_HOST1);
search1.setMaxTotalConnections(MAX_TOTAL_CONNECTIONS1);

CommonsHttpSolrServer search2 = new CommonsHttpSolrServer("http://mysearch1");
search2.setConnectionTimeout(CONNECTION_TIMEOUT);
search2.setSoTimeout(READ_TIMEOUT);
search2.setConnectionManagerTimeout(CONNECTION_MANAGER_TIMEOUT);
search2.setDefaultMaxConnectionsPerHost(MAX_CONNECTIONS_PER_HOST2);
search2.setMaxTotalConnections(MAX_TOTAL_CONNECTIONS2);

LBHttpSolrServer solrServers = new LBHttpSolrServer(search1, search2);
{code}

So we can manage the parameters per server.
Build failed in Hudson: Solr-trunk #1023
See http://hudson.zones.apache.org/hudson/job/Solr-trunk/1023/changes

Changes:

[noble] SOLR-1697 PluginInfo should load plugins w/o class attribute also
[gsingers] some useful constants
[gsingers] SOLR-1302: some slight refactoring for more reusable distance calculations
[gsingers] javadoc
[gsingers] javadoc
[gsingers] SOLR-1692: fix produceSummary issue with Carrot2 clustering

--
[...truncated 4393 lines...]
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 17.263 sec
    [junit] Running org.apache.solr.client.solrj.embedded.LargeVolumeBinaryJettyTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 13.015 sec
    [junit] Running org.apache.solr.client.solrj.embedded.LargeVolumeEmbeddedTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 6.458 sec
    [junit] Running org.apache.solr.client.solrj.embedded.LargeVolumeJettyTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 15.351 sec
    [junit] Running org.apache.solr.client.solrj.embedded.MergeIndexesEmbeddedTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.275 sec
    [junit] Running org.apache.solr.client.solrj.embedded.MultiCoreEmbeddedTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.409 sec
    [junit] Running org.apache.solr.client.solrj.embedded.MultiCoreExampleJettyTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 7.304 sec
    [junit] Running org.apache.solr.client.solrj.embedded.SolrExampleEmbeddedTest
    [junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 25.041 sec
    [junit] Running org.apache.solr.client.solrj.embedded.SolrExampleJettyTest
    [junit] Tests run: 10, Failures: 0, Errors: 0, Time elapsed: 45.175 sec
    [junit] Running org.apache.solr.client.solrj.embedded.SolrExampleStreamingTest
    [junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 46.646 sec
    [junit] Running org.apache.solr.client.solrj.embedded.TestSolrProperties
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.386 sec
    [junit] Running org.apache.solr.client.solrj.request.TestUpdateRequestCodec
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.446 sec
    [junit] Running org.apache.solr.client.solrj.response.AnlysisResponseBaseTest
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.409 sec
    [junit] Running org.apache.solr.client.solrj.response.DocumentAnalysisResponseTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.431 sec
    [junit] Running org.apache.solr.client.solrj.response.FieldAnalysisResponseTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.449 sec
    [junit] Running org.apache.solr.client.solrj.response.QueryResponseTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.587 sec
    [junit] Running org.apache.solr.client.solrj.response.TermsResponseTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 7.285 sec
    [junit] Running org.apache.solr.client.solrj.response.TestSpellCheckResponse
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 10.837 sec
    [junit] Running org.apache.solr.client.solrj.util.ClientUtilsTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.491 sec
    [junit] Running org.apache.solr.common.SolrDocumentTest
    [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.563 sec
    [junit] Running org.apache.solr.common.params.ModifiableSolrParamsTest
    [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.465 sec
    [junit] Running org.apache.solr.common.params.SolrParamTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.443 sec
    [junit] Running org.apache.solr.common.util.ContentStreamTest
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.68 sec
    [junit] Running org.apache.solr.common.util.DOMUtilTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.499 sec
    [junit] Running org.apache.solr.common.util.FileUtilsTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.552 sec
    [junit] Running org.apache.solr.common.util.IteratorChainTest
    [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.439 sec
    [junit] Running org.apache.solr.common.util.NamedListTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.386 sec
    [junit] Running org.apache.solr.common.util.TestFastInputStream
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.593 sec
    [junit] Running org.apache.solr.common.util.TestHash
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.618 sec
    [junit] Running org.apache.solr.common.util.TestNamedListCodec
    [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 2.866 sec
    [junit] Running org.apache.solr.common.util.TestXMLEscaping
    [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.566 sec
Re: Why do we need SolrPluginUtils#optimizePreFetchDocs()
On Jan 5, 2010, at 1:56 AM, Noble Paul നോബിള് नोब्ळ् wrote:

> This looks like a hack. It currently only uses the highlighter for prefetching docs and fields. There is no standard way for other components to take part in this.

Possibly, but highlighting is one of the more expensive things to do and making sure the fields are there (and not lazily loaded) is important. Of course, it doesn't help if you want to use Term Vectors w/ highlighter.

> We should either remove this altogether

-1.

> or have a standard way for all components to take part in this.

Perhaps a component could register what fields it needs? However, do you have a use case in mind? What component would you like to have leverage this?

-Grant
Re: Why do we need SolrPluginUtils#optimizePreFetchDocs()
On Tue, Jan 5, 2010 at 4:52 PM, Grant Ingersoll <gsing...@apache.org> wrote:
> On Jan 5, 2010, at 1:56 AM, Noble Paul നോബിള് नोब्ळ् wrote:
>> This looks like a hack. It currently only uses the highlighter for prefetching docs and fields. There is no standard way for other components to take part in this.
>
> Possibly, but highlighting is one of the more expensive things to do and making sure the fields are there (and not lazily loaded) is important. Of course, it doesn't help if you want to use Term Vectors w/ highlighter.
>
>> We should either remove this altogether
>
> -1.
>
>> or have a standard way for all components to take part in this.
>
> Perhaps a component could register what fields it needs? However, do you have a use case in mind? What component would you like to have leverage this?

I don't know. But the point is: can we have an interface PrefetchAware (or anything nicer), so components can choose to return the list of fields they are interested in prefetching? I would like to remove the strong coupling of QueryComponent on highlighting.

> -Grant

--
-----------------------------------------------------
Noble Paul | Systems Architect | AOL | http://aol.com
Re: Why do we need SolrPluginUtils#optimizePreFetchDocs()
2010/1/5 Noble Paul നോബിള് नोब्ळ् <noble.p...@corp.aol.com>:
> On Tue, Jan 5, 2010 at 4:52 PM, Grant Ingersoll <gsing...@apache.org> wrote:
>> On Jan 5, 2010, at 1:56 AM, Noble Paul നോബിള് नोब्ळ् wrote:
>>> This looks like a hack. It currently only uses the highlighter for prefetching docs and fields. There is no standard way for other components to take part in this.
>>
>> Possibly, but highlighting is one of the more expensive things to do and making sure the fields are there (and not lazily loaded) is important. Of course, it doesn't help if you want to use Term Vectors w/ highlighter.
>>
>>> We should either remove this altogether
>>
>> -1.
>>
>>> or have a standard way for all components to take part in this.
>>
>> Perhaps a component could register what fields it needs? However, do you have a use case in mind? What component would you like to have leverage this?
>
> I don't know. But the point is: can we have an interface PrefetchAware (or anything nicer), so components can choose to return the list of fields they are interested in prefetching? I would like to remove the strong coupling of QueryComponent on highlighting.

Or we can add a method, ResponseBuilder.addPrefetchFields(String[] fieldNames), and SearchComponents can use this in prepare()/process() to express interest in prefetching.

>> -Grant

--
-----------------------------------------------------
Noble Paul | Systems Architect | AOL | http://aol.com
[jira] Created: (SOLR-1701) Off-by-one error in calculating numFound in Distributed Search
Off-by-one error in calculating numFound in Distributed Search
--------------------------------------------------------------

                Key: SOLR-1701
                URL: https://issues.apache.org/jira/browse/SOLR-1701
            Project: Solr
         Issue Type: Bug
         Components: search
           Reporter: Shalin Shekhar Mangar
            Fix For: 1.5

{code}
// This passes
query("q", "*:*", "sort", "id asc", "fl", "id,text");

// This also passes (notice the rows param)
query("q", "*:*", "sort", "id desc", "rows", 12, "fl", "id,text");

// But this fails
query("q", "*:*", "sort", "id desc", "fl", "id,text");
{code}
[jira] Updated: (SOLR-1701) Off-by-one error in calculating numFound in Distributed Search
[ https://issues.apache.org/jira/browse/SOLR-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-1701:
----------------------------------------
    Attachment: SOLR-1701.patch

Test to demonstrate the bug

> Off-by-one error in calculating numFound in Distributed Search
>                 Key: SOLR-1701
>                 URL: https://issues.apache.org/jira/browse/SOLR-1701
>             Project: Solr
>          Issue Type: Bug
>          Components: search
>            Reporter: Shalin Shekhar Mangar
>             Fix For: 1.5
>         Attachments: SOLR-1701.patch
Re: Why do we need SolrPluginUtils#optimizePreFetchDocs()
On Jan 5, 2010, at 6:52 AM, Noble Paul നോബിള് नोब्ळ् wrote:
> On Tue, Jan 5, 2010 at 4:52 PM, Grant Ingersoll <gsing...@apache.org> wrote:
>> On Jan 5, 2010, at 1:56 AM, Noble Paul നോബിള് नोब्ळ् wrote:
>>> This looks like a hack. It currently only uses the highlighter for prefetching docs and fields. There is no standard way for other components to take part in this.
>>
>> Possibly, but highlighting is one of the more expensive things to do and making sure the fields are there (and not lazily loaded) is important. Of course, it doesn't help if you want to use Term Vectors w/ highlighter.
>>
>>> We should either remove this altogether
>>
>> -1.
>>
>>> or have a standard way for all components to take part in this.
>>
>> Perhaps a component could register what fields it needs? However, do you have a use case in mind? What component would you like to have leverage this?
>
> I don't know. But the point is: can we have an interface PrefetchAware (or anything nicer), so components can choose to return the list of fields they are interested in prefetching? I would like to remove the strong coupling of QueryComponent on highlighting.

Sounds reasonable to me.
Re: Why do we need SolrPluginUtils#optimizePreFetchDocs()
OK, I have opened a new issue: https://issues.apache.org/jira/browse/SOLR-1702

On Tue, Jan 5, 2010 at 5:50 PM, Grant Ingersoll <gsing...@apache.org> wrote:
> On Jan 5, 2010, at 6:52 AM, Noble Paul നോബിള് नोब्ळ् wrote:
>> On Tue, Jan 5, 2010 at 4:52 PM, Grant Ingersoll <gsing...@apache.org> wrote:
>>> On Jan 5, 2010, at 1:56 AM, Noble Paul നോബിള് नोब्ळ् wrote:
>>>> This looks like a hack. It currently only uses the highlighter for prefetching docs and fields. There is no standard way for other components to take part in this.
>>>
>>> Possibly, but highlighting is one of the more expensive things to do and making sure the fields are there (and not lazily loaded) is important. Of course, it doesn't help if you want to use Term Vectors w/ highlighter.
>>>
>>>> We should either remove this altogether
>>>
>>> -1.
>>>
>>>> or have a standard way for all components to take part in this.
>>>
>>> Perhaps a component could register what fields it needs? However, do you have a use case in mind? What component would you like to have leverage this?
>>
>> I don't know. But the point is: can we have an interface PrefetchAware (or anything nicer), so components can choose to return the list of fields they are interested in prefetching? I would like to remove the strong coupling of QueryComponent on highlighting.
>
> Sounds reasonable to me.

--
-----------------------------------------------------
Noble Paul | Systems Architect | AOL | http://aol.com
[jira] Created: (SOLR-1702) Standardize mechanism for components to prefetch fields
Standardize mechanism for components to prefetch fields
-------------------------------------------------------

                Key: SOLR-1702
                URL: https://issues.apache.org/jira/browse/SOLR-1702
            Project: Solr
         Issue Type: Improvement
         Components: search
           Reporter: Noble Paul
           Priority: Minor
            Fix For: 1.5

The only component that is consulted now for prefetching fields is SolrHighlighter. This introduces tight coupling of QueryComponent w/ SolrHighlighter. We should standardize how this is done by all components so that there is no coupling. One way would be to register the prefetch fields with ResponseBuilder in the prepare phase, and QueryComponent can make use of that info to do prefetching.
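[The mechanism proposed in SOLR-1702 — components registering prefetch fields with ResponseBuilder during the prepare phase, so QueryComponent never has to know about the highlighter — might look roughly like this. A hedged, self-contained sketch: the `ResponseBuilder` and component classes below are simplified stand-ins, and `addPrefetchFields`/`getPrefetchFields` are proposed names from the thread, not an existing Solr API.]

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Simplified stand-in for Solr's ResponseBuilder: accumulates the fields
// that components have asked to be prefetched for the current request.
class ResponseBuilder {
    private final Set<String> prefetchFields = new LinkedHashSet<>();

    // A set deduplicates requests from multiple components.
    void addPrefetchFields(String... fieldNames) {
        for (String f : fieldNames) prefetchFields.add(f);
    }

    Set<String> getPrefetchFields() {
        return prefetchFields;
    }
}

// A component declares its interest during prepare(); QueryComponent can
// later consult rb.getPrefetchFields() instead of asking the highlighter
// specifically.
class HighlightComponentSketch {
    void prepare(ResponseBuilder rb) {
        rb.addPrefetchFields("title", "body");
    }
}

public class PrefetchSketch {
    public static void main(String[] args) {
        ResponseBuilder rb = new ResponseBuilder();
        new HighlightComponentSketch().prepare(rb);
        System.out.println(rb.getPrefetchFields()); // [title, body]
    }
}
```

The design point is the inversion: instead of QueryComponent pulling fields from one known component (SolrHighlighter), every component pushes its needs into shared per-request state.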
[jira] Commented: (SOLR-1702) Standardize mechanism for components to prefetch fields
[ https://issues.apache.org/jira/browse/SOLR-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796648#action_12796648 ]

Noble Paul commented on SOLR-1702:
----------------------------------

The mail thread: http://markmail.org/thread/5c2f2qofz6xpg42c

> Standardize mechanism for components to prefetch fields
>                 Key: SOLR-1702
>                 URL: https://issues.apache.org/jira/browse/SOLR-1702
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Noble Paul
>            Priority: Minor
>             Fix For: 1.5
[jira] Updated: (SOLR-1657) convert the rest of solr to use the new tokenstream API
[ https://issues.apache.org/jira/browse/SOLR-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated SOLR-1657:
------------------------------
    Description: (entries marked -like this- are JIRA strikethrough)

org.apache.solr.analysis:
  BufferedTokenStream
    - -CommonGramsFilter-
    - -CommonGramsQueryFilter-
    - -RemoveDuplicatesTokenFilter-
  -CapitalizationFilterFactory-
  -HyphenatedWordsFilter-
  -LengthFilter (deprecated, remove)-
  SynonymFilter
  SynonymFilterFactory
  WordDelimiterFilter

org.apache.solr.handler:
  AnalysisRequestHandler
  AnalysisRequestHandlerBase

org.apache.solr.handler.component:
  QueryElevationComponent
  SpellCheckComponent

org.apache.solr.highlight:
  DefaultSolrHighlighter

org.apache.solr.search:
  FieldQParserPlugin

org.apache.solr.spelling:
  SpellingQueryConverter

  was: the same list with no entries struck through, and additionally PatternTokenizerFactory (remove deprecated methods).

> convert the rest of solr to use the new tokenstream API
>                 Key: SOLR-1657
>                 URL: https://issues.apache.org/jira/browse/SOLR-1657
>             Project: Solr
>          Issue Type: Task
>            Reporter: Robert Muir
>         Attachments: SOLR-1657.patch, SOLR-1657.patch
Re: Solr Cell revamped as an UpdateProcessor?
Hi, I'm developing a directory monitor to add to a Solr implementation. Tell us if it could be interesting for you; we would be glad to share it with the community. Also, I would like your opinion about the proposal: if it looks OK to you, and if you would like to make any change or ask any question, it is very welcome. Regards Zacarias www.linebee.com 2009/12/8 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com I was referring to SOLR-1358. Anyway, SolrCell as an update processor is a good idea On Tue, Dec 8, 2009 at 4:47 PM, Grant Ingersoll gsing...@apache.org wrote: On Dec 8, 2009, at 12:22 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote: Integrating Extraction w/ DIH is a better option. DIH makes it easier to do the mapping of fields etc. Which comment is this directed at? I'm lacking context here. On Tue, Dec 8, 2009 at 4:59 AM, Grant Ingersoll gsing...@apache.org wrote: On Dec 7, 2009, at 3:51 PM, Chris Hostetter wrote: As someone with very little knowledge of Solr Cell and/or Tika, I find myself wondering if ExtractingRequestHandler would make more sense as an extractingUpdateProcessor -- where it could be configured to take either binary fields (or string fields containing URLs) out of the Documents, parse them with Tika, and add the various XPath-matching hunks of text back into the document as new fields. Then ExtractingRequestHandler just becomes a handler that slurps up its ContentStreams and adds them as binary data fields and adds the other literal params as fields. Wouldn't that make things like SOLR-1358, and using Tika with URLs/filepaths in XML and CSV based updates, fairly trivial? It probably could, but I am not sure how it works in a processor chain. However, I'm not sure I understand how they work all that much either. I also plan on adding, BTW, a SolrJ client for Tika that does the extraction on the client. 
In many cases, the ExtrReqHandler is really only designed for lighter weight extraction cases, as one would simply not want to send that much rich content over the wire. -- - Noble Paul | Systems Architect| AOL | http://aol.com -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Solr Cell revamped as an UpdateProcessor?
Here is my proposal. Regards

On Tue, Jan 5, 2010 at 12:48 PM, Zacarias zacar...@linebee.com wrote: Hi, I'm developing a directory monitor to add to a Solr implementation. ...
[jira] Issue Comment Edited: (SOLR-1698) load balanced distributed search
[ https://issues.apache.org/jira/browse/SOLR-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796760#action_12796760 ] Yonik Seeley edited comment on SOLR-1698 at 1/5/10 5:13 PM: Another big question is: can we use LBHttpSolrServer for this, or are the needs too different? Some of the issues: - need control over order (so same server will be used in a single request) - if we have a big cluster (100 shards), we don't want every node or every core to have 100 background threads checking liveness - one request may want to hit addresses [A,B] while another may want [A,B,C] - a single LBHttpSolrServer can't currently do both at once, and separate instances wouldn't share liveness info. One way: have many LBHttpSolrServer instances (one per shard group) but have them share certain things like the liveness of a shard and the background cleaning threads Another way: have a single static LBHttpSolrServer instance that's shared for all requests, with an extra method that allows passing of a list of addresses on a per-request basis. was (Author: ysee...@gmail.com): Another big question is: can we use LBHttpSolrServer for this, or are the needs too different? load balanced distributed search Key: SOLR-1698 URL: https://issues.apache.org/jira/browse/SOLR-1698 Project: Solr Issue Type: Improvement Reporter: Yonik Seeley Provide syntax and implementation of load-balancing across shard replicas. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
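For concreteness, the "single shared instance, per-request address list" option can be sketched as a toy model. This is a self-contained illustration with invented names (SharedLoadBalancer, pick, markDead), not LBHttpSolrServer's actual API:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of option #2: one shared balancer whose liveness map is global,
// while each request passes its own ordered address list. All names are
// invented for illustration; the real LBHttpSolrServer API differs.
class SharedLoadBalancer {
    // Shared liveness info: one entry per server address, across all requests.
    private final Map<String, Boolean> alive = new ConcurrentHashMap<>();

    // The caller supplies the addresses for *this* request, in preferred
    // order, so the same server is consistently tried first within a single
    // distributed request (one may pass [A,B], another [A,B,C]).
    String pick(List<String> addresses) {
        for (String addr : addresses) {
            if (alive.getOrDefault(addr, true)) {
                return addr; // the real server would forward the query here
            }
        }
        throw new IllegalStateException("no live server among " + addresses);
    }

    void markDead(String addr)  { alive.put(addr, Boolean.FALSE); }
    void markAlive(String addr) { alive.put(addr, Boolean.TRUE); }

    public static void main(String[] args) {
        SharedLoadBalancer lb = new SharedLoadBalancer();
        lb.markDead("http://shardA");
        System.out.println(lb.pick(List.of("http://shardA", "http://shardB")));
        // prints http://shardB
    }
}
```

Because liveness lives in one shared map, separate shard groups still benefit from each other's failure observations, which is exactly what separate LBHttpSolrServer instances would lose.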
Re: Solr Cell revamped as an UpdateProcessor?
I'd attached a file to the previous mail. Is there a filter for PDF files, or some other reason it didn't come through?

On Tue, Jan 5, 2010 at 12:49 PM, Zacarias zacar...@linebee.com wrote: Here is my proposal. Regards ...
Re: Solr Cell revamped as an UpdateProcessor?
On Jan 5, 2010, at 1:53 PM, Zacarias wrote: I'd attached a file to the previous mail. Is there a filter for PDF files, or some other reason?

The mailer strips attachments, although you might be able to get a zip through. Perhaps send a pointer to somewhere else or just describe it here.

-- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
Re: Solr Cell revamped as an UpdateProcessor?
: Subject: Re: Solr Cell revamped as an UpdateProcessor? : : Hi, I'm developing a directory monitor to add in a Sor implementation. Hmmm ... Is this really related to the Solr Cell thread you replied to? Please start a new thread if you want to discuss a new topic... http://people.apache.org/~hossman/#threadhijack -Hoss
[jira] Commented: (SOLR-1698) load balanced distributed search
[ https://issues.apache.org/jira/browse/SOLR-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796838#action_12796838 ] Yonik Seeley commented on SOLR-1698: Looking into LBHttpSolrServer more, it looks like we have some serious concurrency issues. When a request does fail, a global lock is acquired to move from alive to zombie - but this same lock is used while making requests to check if zombies have come back (i.e. actual requests to zombies are being made with the lock held!). For distributed search use (SearchHandler) I'm thinking of going with option #2 from my previous message (have a single static LBHttpSolrServer instance that's shared for all requests, with an extra method that allows passing of a list of addresses on a per-request basis). I'll address the concurrency issues at the same time. load balanced distributed search Key: SOLR-1698 URL: https://issues.apache.org/jira/browse/SOLR-1698 Project: Solr Issue Type: Improvement Reporter: Yonik Seeley Provide syntax and implementation of load-balancing across shard replicas. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
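The lock problem described above can be avoided by keeping zombie tracking in a concurrent set, so the (potentially slow) liveness probes run without any global lock held. A self-contained sketch with invented names, not the real LBHttpSolrServer internals:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Sketch of the concurrency fix discussed above: zombies live in a concurrent
// set, so the network probe never runs under a global lock. Names invented.
class ZombieList {
    private final Set<String> zombies = ConcurrentHashMap.newKeySet();

    void markZombie(String addr) { zombies.add(addr); }

    boolean isZombie(String addr) { return zombies.contains(addr); }

    // Called by the background checker. Iteration over the concurrent set is
    // weakly consistent, so the (slow) ping happens with no lock held and
    // request threads are never blocked behind it.
    void checkZombies(Predicate<String> ping) {
        for (String addr : zombies) {
            if (ping.test(addr)) {
                zombies.remove(addr); // server came back: promote to alive
            }
        }
    }
}
```

The key property is that only the cheap set mutations are synchronized (inside ConcurrentHashMap), never the probe itself.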
[jira] Commented: (SOLR-1277) Implement a Solr specific naming service (using Zookeeper)
[ https://issues.apache.org/jira/browse/SOLR-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796836#action_12796836 ] Patrick Hunt commented on SOLR-1277: Yonik, others, you might find this interesting: http://github.com/phunt/zookeeper_dashboard It's apache licensed, based on python/django. I had thoughts of having plugins for things like hbase, solr etc... based on their usage pattern (also for std recipes that might benefit from similar) Implement a Solr specific naming service (using Zookeeper) -- Key: SOLR-1277 URL: https://issues.apache.org/jira/browse/SOLR-1277 Project: Solr Issue Type: New Feature Affects Versions: 1.4 Reporter: Jason Rutherglen Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: log4j-1.2.15.jar, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, SOLR-1277.patch, zookeeper-3.2.1.jar Original Estimate: 672h Remaining Estimate: 672h The goal is to give Solr server clusters self-healing attributes where if a server fails, indexing and searching don't stop and all of the partitions remain searchable. For configuration, the ability to centrally deploy a new configuration without servers going offline. We can start with basic failover and start from there? Features: * Automatic failover (i.e. when a server fails, clients stop trying to index to or search it) * Centralized configuration management (i.e. new solrconfig.xml or schema.xml propagates to a live Solr cluster) * Optionally allow shards of a partition to be moved to another server (i.e. if a server gets hot, move the hot segments out to cooler servers). Ideally we'd have a way to detect hot segments and move them seamlessly. With NRT this becomes somewhat more difficult but not impossible? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796854#action_12796854 ] Hoss Man commented on SOLR-1677: bq. User Carl isn't helpful, user Carl is an idiot. Oh come on now ... that's not really a fair criticism of the example: there are plenty of legitimate ways to use (some) TokenFilters only at search time, and I specifically structured my example to point out potential problems in cases just like that -- Carl was very clear that if you used FooTokenFilterFactory in an index analyzer you'll need to reindex. But fine, I'll amend my example to do it your way... {panel} ... Bob asks his question (see previous example) User Carl is on vacation and never sees Bob's email User Dwight helpfully replies... bq. That was identified as a bug with FooTokenFilter that was fixed in Lucene 3.1, but the default behavior was left as is for back-compatibility. If you change your <luceneAnalyzerVersionDefault/> value to 3.1 (or 3.2) you'll get the newer/better behavior - but you _must_ reindex all of your data after you make this change. Bob makes the change to 3.2 that Dwight recommended, reindexes all of his data, and is happy to see that now his queries work and everything seems fine. What Bob doesn't realize (and what Dwight wasn't aware of) is that elsewhere in his schema.xml file, Bob is also using the YakTokenizerFactory on a different field (yakField), and the behavior of the YakTokenizer changed in Lucene 3.0. This change is generally considered better behavior than YakTokenizer had before, but in combination with another TokenFilter Bob is using on the yakField it causes behavior that is not what Bob wants. Now some types of queries that use the yakField are failing, and *failing silently*. {panel} You could now argue that User Dwight is an idiot because he didn't warn Bob that other Analyzers/Tokenizers/TokenFilters might be affected. 
But that just leads us to scenarios that reiterate my point that this type of global value is something that would be dangerous to ever change {panel} ... Bob asks his question (see previous examples) User Carl has unsubscribed from the solr-user list (because a Bill Murray look-a-like hurt his feelings) and never sees Bob's email. User Dwight is on vacation and never sees Bob's email. User Ernest helpfully replies... {quote} That was identified as a bug with FooTokenFilter that was fixed in Lucene 3.1, but the default behavior was left as is for back-compatibility. If you change your <luceneAnalyzerVersionDefault/> value to 3.1 (or 3.2) you'll get the newer/better behavior -- *But this is Very VERY Dangerous:* It could potentially affect the behavior of other analyzers you are using. You need to check the javadocs for each and every Analyzer, Tokenizer, and TokenFilter you use to see what their behavior is with various values of the Version property before you make a change like this. Personally I never change the value of <luceneAnalyzerVersionDefault/> once I have an existing schema.xml file. Instead I suggest you add {{luceneVersion="3.2"}} to your {{<filter class="solr.FooTokenFilterFactory" />}} declaration so that you know you are only changing the behavior you want to change. BTW: You _must_ reindex all of your data after doing either of these things in order for it to work. {quote} Bob follows Ernest's advice, and everything is fine ... but Bob is left wondering what the point is of a config option that's so dangerous to change, and wishes there was an easy way to know which of his Analyzers and Factories depend on that scary global value. {panel} At the end of the day it just seems like a bigger risk than a feature ... 
I feel like I must still be misunderstanding the motivation you guys have for adding it, because it really seems like it boils down to easier than having the property 2.9 set on every analyzer/factory. I guess I ultimately have no strong objection to a global schema.xml setting like this existing as an expert-level feature (for people who want really compact config files, I guess); I just don't want to see it used in the example schema.xml file(s), where it's likely to screw novice users over. Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory --- Key: SOLR-1677 URL: https://issues.apache.org/jira/browse/SOLR-1677 Project: Solr Issue Type: Sub-task Components: Schema and Analysis Reporter: Uwe Schindler Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards compatibility with old indexes created using older versions of Lucene. The most
[jira] Created: (SOLR-1703) Sorting by function problems on multicore (more than one core)
Sorting by function problems on multicore (more than one core) -- Key: SOLR-1703 URL: https://issues.apache.org/jira/browse/SOLR-1703 Project: Solr Issue Type: Bug Components: multicore, search Affects Versions: 1.5 Environment: Linux (debian, ubuntu), 64bits Reporter: Rafał Kuć When using sort by function (for example the dist function) on a multicore setup with more than one core (on a multicore setup with one core, i.e. the example deployment, the problem doesn't exist), the right schema is not used. I think the problem is in this portion of code in QueryParsing.java:

public static FunctionQuery parseFunction(String func, IndexSchema schema) throws ParseException {
  SolrCore core = SolrCore.getSolrCore();
  return (FunctionQuery) (QParser.getParser(func, "func", new LocalSolrQueryRequest(core, new HashMap())).parse());
  // return new FunctionQuery(parseValSource(new StrParser(func), schema));
}

The code above uses a deprecated method to get the core, sometimes getting the wrong core, which makes it impossible to find the right fields in the index. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796862#action_12796862 ] Robert Muir commented on SOLR-1677: --- bq. Oh come on now ... that's not really a fair criticism of the example: there are plenty of legitimate ways to use (some) TokenFilters only at search time and I specifically structured my example to point out potential problems in cases just like that - Carl was very clear that if you used FooTokenFilterFactory in an index analyzer you'll need to reindex. I disagree: Version applies to all of Lucene (even more than tokenstreams), so for Carl to imply that you don't need to reindex when bumping Version simply because you aren't using X or Y or Z, for that he should be renamed Oscar. bq. You could now argue that User Dwight is an idiot because he didn't warn Bob that other Analyzers/Tokenizers/TokenFilters might be affected. But that just leads us to scenarios that reiterate my point that this type of global value is something that would be dangerous to ever change Yeah, I guess I don't think he is an idiot. I just think he is a moron for suggesting such a thing without warning of the consequences. bq. Personally I never change the value of <luceneAnalyzerVersionDefault/> once I have an existing schema.xml file. Instead I suggest you add luceneVersion="3.2" to your <filter class="solr.FooTokenFilterFactory" /> declaration so that you know you are only changing the behavior you want to change. Good for Ernest; I guess he is probably still using Windows 3.1 too, because he doesn't want to upgrade ever. Unless Ernest also carefully reads Lucene CHANGES and reads all the Solr source code and knows which Solr features are tied to which Lucene features, because it's not obvious at all: i.e. Solr's snowball factory doesn't use Lucene's snowball, etc. bq. At the end of the day it just seems like a bigger risk than a feature ... 
I feel like I must still be misunderstanding the motivation you guys have for adding it, because it really seems like it boils down to easier than having the property 2.9 set on every analyzer/factory Yes, you are right; personally I don't want all users to be stuck with Version.LUCENE_24 forever. Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory --- Key: SOLR-1677 URL: https://issues.apache.org/jira/browse/SOLR-1677 Project: Solr Issue Type: Sub-task Components: Schema and Analysis Reporter: Uwe Schindler Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards compatibility with old indexes created using older versions of Lucene. The most important example is StandardTokenizer, which changed its behaviour with posIncr and incorrect host token types in 2.4 and also in 2.9. In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with much more Unicode support, almost every Tokenizer/TokenFilter needs this Version parameter. In 2.9, the deprecated old ctors without Version take LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer. This patch adds basic support for the Lucene Version property to the base factories. Subclasses then can use the luceneMatchVersion decoded enum (in 3.0) / Parameter (in 2.9) for constructing Tokenstreams. The code currently contains a helper map to decode the version strings, but in 3.0 it can be replaced by Version.valueOf(String), as Version is a subclass of Java5 enums. The default value is Version.LUCENE_24 (as this is the default for the no-version ctors in Lucene). This patch also removes unneeded conversions to CharArraySet from StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed to match Lucene 3.0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796872#action_12796872 ] Uwe Schindler commented on SOLR-1677: - In my opinion, it should be possible to set the default in solrconfig.xml, because there is currently no requirement to set a version for all TS components. In the shipped solrconfig.xml, this default is the version of the shipped Lucene, so new users can use the default config and extend it as taught in all the courses and books about Solr. They do not need to care about the version. If they upgrade their Lucene version, their config stays on the previous setting and they are fine. If they want to change some of the components (like query parser, index writer, index reader -- flex!!!), they can do it locally. So Bob could change it like Ernest proposed. If we do not have a default, all users will stay stuck with Lucene 2.4, because they do not care about the version (it is not required, because it defaults to 2.4 for backwards compatibility). So lots of configs will never use the new Unicode features of Lucene 3.1. And suddenly Lucene 4.0 comes out and all support for Lucene 3 is removed; then all users cry. With a default version set to 2.4, they will then get a runtime error in Lucene 4.0, saying that Version.LUCENE_24 is no longer available as an enum constant. If you really do not want a default version in the config (not the schema, because it applies to *all* Lucene components), then you should go the way of Lucene 3.0: require a matchVersion for all components. As there may be tokenstream components not from Lucene, make this attribute in the schema mandatory only for Lucene streams (this can be done with my initial patch, too: if the matchVersion property is missing, then matchVersion will be null and the factory should throw an IAE if it is required. In my original patch, only the parsing code should be moved out of the factory into a util class in Solr. Maybe it should also be possible to parse x.y-style versions). The problem here: users upgrading from Solr 1.4 will suddenly get errors, because their configs become invalid. Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory --- Key: SOLR-1677 URL: https://issues.apache.org/jira/browse/SOLR-1677 Project: Solr Issue Type: Sub-task Components: Schema and Analysis Reporter: Uwe Schindler Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards compatibility with old indexes created using older versions of Lucene. The most important example is StandardTokenizer, which changed its behaviour with posIncr and incorrect host token types in 2.4 and also in 2.9. In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with much more Unicode support, almost every Tokenizer/TokenFilter needs this Version parameter. In 2.9, the deprecated old ctors without Version take LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer. This patch adds basic support for the Lucene Version property to the base factories. Subclasses then can use the luceneMatchVersion decoded enum (in 3.0) / Parameter (in 2.9) for constructing Tokenstreams. The code currently contains a helper map to decode the version strings, but in 3.0 it can be replaced by Version.valueOf(String), as Version is a subclass of Java5 enums. The default value is Version.LUCENE_24 (as this is the default for the no-version ctors in Lucene). This patch also removes unneeded conversions to CharArraySet from StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed to match Lucene 3.0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
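For concreteness, the scheme described in this thread might look something like the following in the shipped config files. The element and attribute names (luceneAnalyzerVersionDefault, luceneVersion) are the ones used in this discussion, and FooTokenFilterFactory is the thread's hypothetical filter; the final syntax was still under discussion:

```xml
<!-- solrconfig.xml: a global default matching the shipped Lucene version
     (element name as used in this thread; the final name was still open) -->
<luceneAnalyzerVersionDefault>2.9</luceneAnalyzerVersionDefault>

<!-- schema.xml: a per-factory override, as in Ernest's advice; only this
     filter's behavior changes, and only its field needs reindexing checks -->
<filter class="solr.FooTokenFilterFactory" luceneVersion="3.2"/>
```

With the global default left at the shipped version and overrides applied per factory, a user who upgrades Lucene can opt in to new behavior one component at a time.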
[jira] Commented: (SOLR-1703) Sorting by function problems on multicore (more than one core)
[ https://issues.apache.org/jira/browse/SOLR-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796874#action_12796874 ] Yonik Seeley commented on SOLR-1703: Indeed - that's a bug. New solr code shouldn't be using parseFunction. Sorting by function problems on multicore (more than one core) -- Key: SOLR-1703 URL: https://issues.apache.org/jira/browse/SOLR-1703 Project: Solr Issue Type: Bug Components: multicore, search Affects Versions: 1.5 Environment: Linux (debian, ubuntu), 64bits Reporter: Rafał Kuć When using sort by function (for example dist function) with multicore with more than one core (on multicore with one core, ie. the example deployment the problem doesn`t exist) there is a problem with not using the right schema. I think there is a problem with this portion of code: QueryParsing.java: public static FunctionQuery parseFunction(String func, IndexSchema schema) throws ParseException { SolrCore core = SolrCore.getSolrCore(); return (FunctionQuery) (QParser.getParser(func, func, new LocalSolrQueryRequest(core, new HashMap())).parse()); // return new FunctionQuery(parseValSource(new StrParser(func), schema)); } Code above uses deprecated method to get the core sometimes getting the wrong core effecting in impossibility to find the right fields in index. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
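To make the failure mode concrete, here is a self-contained toy model (no Solr classes; all names invented) of why a static "current core" lookup, like the deprecated SolrCore.getSolrCore(), breaks multicore field resolution, while threading the core through the call resolves fields against the schema that actually owns the request:

```java
import java.util.Map;

// Toy model: a "core" owning a "schema". The static-singleton lookup can hand
// back the wrong core when more than one core exists; passing the core
// through always resolves against the right schema. All names are invented.
class CoreModel {
    static CoreModel lastCreated;      // models the deprecated static lookup
    final Map<String, String> schema;  // field name -> field type

    CoreModel(Map<String, String> schema) {
        this.schema = schema;
        lastCreated = this;            // singleton tracks the most recent core
    }

    // Buggy shape: ignores which core the request belongs to.
    static String resolveViaSingleton(String field) {
        return lastCreated.schema.get(field);
    }

    // Fixed shape: the caller's core (i.e. its request) is threaded through.
    static String resolve(CoreModel core, String field) {
        return core.schema.get(field);
    }
}
```

With two cores created, the singleton path resolves "dist" against whichever core happened to be registered last and fails, while the explicit path always succeeds.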
[jira] Issue Comment Edited: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796872#action_12796872 ] Uwe Schindler edited comment on SOLR-1677 at 1/5/10 10:29 PM: -- In my opinion, the default in solrconfig.xml should be possible to set, because there is currently no requirement to set a version for all TS components. This default is in the shipped solrconfig.xml the version of the shipped lucene version. so new users can use the default config and extend it like learned in all courses and books about solr. They do not need to care about the version. If they upgrade their lucene version, their config keeps stuck on the previous seeting and they are fine. If they want to change some of the components (like query parser, index writer, index reader -- flex!!!), they can do it locally. So Bob could change like Ernest proposed. If we do not have a default, all users will keep stuck with lucene 2.4, because they do not care about version (it is not required, because it defaults to 2.4 for BW compatibility). So lots of configs will never use the new unicode features of Lucene 3.1. And suddenly Lucene 4.0 comes out and all support for Lucene 3 is removed, then all users cry. With a default version set to 2.4, they will then get a runtime error in Lucene 4.0, saying that Version.LUCENE_24 is no longer available as enum constant. If you really do not want to have a default version in config (not schema, because it applies to *all* lucene components), then you should go the way like Lucene 3.0: Require a matchVersion for all components. As there may be tokenstream components not from lucene, make this attribute in the schema only mandatory for lucene-streams (this can be done by my initial patch, too: if the matchVersion property is missing then the matchVersion will get NULL and the factory should thow IAE if required. 
In my original patch, only the parsing code should be moved out of the factory into a util class in solr. Maybe also possible to parse x.y-style versions). The problem here: Users upgrading from solr 1.4 will suddenly get errors, because their configs become invalid. Ahh, and because they are stupid they add LUCENE_29 (from where should they know that Solr 1.4 used Lucene 2.4 compatibility?). And then the mailing list gets flooded with questions because suddenly the configs fail to produce results with old indexes.
Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory --- Key: SOLR-1677 URL: https://issues.apache.org/jira/browse/SOLR-1677 Project: Solr Issue Type: Sub-task Components: Schema and Analysis Reporter: Uwe Schindler Attachments: SOLR-1677.patch,
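The parsing util Uwe describes moving out of the factories might look roughly like this. This is a hypothetical sketch, not the actual patch: the class name, method name, and trimmed-down enum are all invented for illustration; it accepts both enum-style ("LUCENE_29") and x.y-style ("2.9") version strings, and returns null when no version is given so a factory that requires matchVersion can throw IAE itself.

```java
import java.util.Locale;

// Hypothetical sketch of a version-parsing util class (names invented,
// not the actual SOLR-1677 patch API).
public class VersionParser {
    public enum LuceneVersion { LUCENE_24, LUCENE_29, LUCENE_30 }

    // Returns null when no version string is given, so a factory that
    // requires a matchVersion can throw IllegalArgumentException itself.
    public static LuceneVersion parse(String s) {
        if (s == null || s.trim().isEmpty()) return null;
        String v = s.trim().toUpperCase(Locale.ROOT);
        if (v.startsWith("LUCENE_")) return LuceneVersion.valueOf(v);
        // accept "2.9"-style strings by rewriting them to the enum form
        return LuceneVersion.valueOf("LUCENE_" + v.replace(".", ""));
    }

    public static void main(String[] args) {
        assert parse("2.9") == LuceneVersion.LUCENE_29;
        assert parse("LUCENE_24") == LuceneVersion.LUCENE_24;
        assert parse(null) == null;
        System.out.println("ok");
    }
}
```

In Lucene 3.0 the helper map becomes unnecessary for the enum-style names, since Version.valueOf(String) works directly; only the x.y-style convenience would remain.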
Re: Why do we need SolrPluginUtils#optimizePreFetchDocs()
: This looks like a hack. It currently only uses highlighter for : prefetching docs and fields . There is no standard way of other : components to take part in this. It was an optimization that improved the common case when it was added, but it predates all of the search component stuff, so it's not much of a surprise that it's not easy for components to use. : Or we can add a method to ResponseBuilder.addPrefetchFields(String[] : fieldNames) and SearchComponents can use this in prepare()/process() : to express interest in prefetching. I would suggest using something like a FieldSelector instead of a String[], so that components wanting all fields that match a rule don't have to manifest a large array. I can't put my finger on it, but it feels like there is a larger issue here ... relating to SOLR-1298, and how/when a DocList might/should get manifested as DocumentList ... It would be fairly easy to modify optimizePreFetchDocs to check properties on the ResponseBuilder (via the SolrQueryRequest) to decide which fields to ask for, but ultimately all optimizePreFetchDocs does is ask the SolrIndexSearcher to load the doc and then throw it away (relying on the documentCache to have those fields handy for later use). as long as we're changing that, we might as well make it do ... something ... better. -Hoss
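The FieldSelector-style idea Hoss suggests can be sketched in a self-contained way: instead of each component materializing a String[] of field names, it contributes a rule (a predicate), and the prefetch code asks the combined rule per field. The interface below only mirrors the shape of Lucene 2.9's FieldSelector; the class and method names are invented for illustration and are not Solr's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: components register rules instead of field arrays.
public class PrefetchRules {
    interface FieldRule { boolean accept(String fieldName); }

    private final List<FieldRule> rules = new ArrayList<FieldRule>();

    // A component would call this in prepare()/process() to express
    // interest in prefetching, without enumerating every matching field.
    public void addRule(FieldRule rule) { rules.add(rule); }

    // The prefetch code asks per field whether any component wants it.
    public boolean shouldPrefetch(String fieldName) {
        for (FieldRule r : rules) {
            if (r.accept(fieldName)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        PrefetchRules prefetch = new PrefetchRules();
        // e.g. a highlighting component wants every "hl_"-prefixed field
        prefetch.addRule(new FieldRule() {
            public boolean accept(String f) { return f.startsWith("hl_"); }
        });
        assert prefetch.shouldPrefetch("hl_body");
        assert !prefetch.shouldPrefetch("price");
        System.out.println("ok");
    }
}
```

The point of the rule-based shape is exactly the one in the mail: a component that wants "all fields matching a pattern" never has to build (or even know) the full field list.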
[jira] Commented: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796937#action_12796937 ] Hoss Man commented on SOLR-1677: bq. Version applies to all of lucene (even more than tokenstreams), so for Carl to imply that you don't need to reindex by bumping Version simply because you aren't using X or Y or Z, for that he should be renamed Oscar. Ok, fair enough ... i was supposing in that example that since i called it {{luceneAnalyzerVersionDefault/}} it was clearly specific to analysis objects in schema.xml and didn't affect any of the other things Version is used for (which would be specified in solrconfig.xml) bq. i guess he is probably using Windows 3.1 still too because he doesn't want to upgrade ever. No, he uses an OS where he can upgrade individual things individually with clear implications -- he sets {{luceneMatchVersion=2.9}} on each and every {{analyzer/}}, {{tokenizer/}} and {{filter/}} that he declares in his schema so that he knows exactly what behavior is changing when he modifies any of them. bq. personally I don't want all users to be stuck with Version.LUCENE_24 forever. I still must be missing something? ... why would all users be stuck with Version.LUCENE_24 forever? I'm not advocating that we don't allow a way to specify Version, i'm saying that having a global value for it that affects things opaquely sounds dangerous -- we should certainly have a way for people to specify the Version they want on each of the objects that care, but it shouldn't be global. The luceneMatchVersion property that Uwe added to BaseTokenizerFactory and BaseTokenFilterFactory in his patch seems perfect to me, it's just the {{SolrCoreAware}} / {{core.getSolrConfig().luceneMatchVersion}} that i think is a bad idea. 
If we modify the analyzer/ initialization to allow constructor args as Erik suggested (I'm pretty sure there's already code in Solr to do this, we just aren't using it for Analyzers) then we should be good to go for everything in schema.xml. If anything declared in solrconfig.xml starts caring about Version (QParser, SolrIndexWriter, etc...) then likewise it should get a luceneMatchVersion init property as well. No one will ever be stuck with LUCENE_24, but they won't be surprised by behavior changes either. bq. If we do not have a default, all users will keep stuck with lucene 2.4, because they do not care about version (it is not required, because it defaults to 2.4 for BW compatibility). So lots of configs will never use the new unicode features of Lucene 3.1. I don't believe that. Almost every solr user on the planet starts with the example configs. if the example configs start specifying luceneMatchVersion=2.9 on every analyzer and factory then people will care about Version just as much as they care about the stopwords.txt file that ships with solr -- that may be not at all, or it may be a lot, but it will be up to them, and it will be obvious to them, because it's right there in the declaration where they can see it, and easy for them to reference and recognize that changing that value will affect things. bq. If you really do not want to have a default version in config (not schema, because it applies to all lucene components), then you should go the way like Lucene 3.0: Require a matchVersion for all components. I'm totally on board with that idea in the long run -- but there are ways to get there gradually that are back compatible with existing configs. 
Individual factories that care about luceneMatchVersion should absolutely start warning on startup that newer/better behavior may be available if luceneMatchVersion is unset (or doesn't match the current value of Version.LUCENE_CURRENT), and provide a URL for a wiki page somewhere where more detail is available. The Analyzer init code can do likewise if it sees an {{analyzer class=.../}} being inited w/ a constructor that takes in a Version which is using an old value. Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory --- Key: SOLR-1677 URL: https://issues.apache.org/jira/browse/SOLR-1677 Project: Solr Issue Type: Sub-task Components: Schema and Analysis Reporter: Uwe Schindler Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards compatibility with old indexes created using older versions of Lucene. The most important example is StandardTokenizer, which changed its behaviour with posIncr and incorrect host token types in 2.4 and also in 2.9. In Lucene 3.0 this matchVersion ctor parameter is mandatory and
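The per-component declarations Hoss describes might look like this in schema.xml. This is an illustrative sketch only: the luceneMatchVersion attribute on factories follows Uwe's patch, but the exact field type and file contents are invented for the example.

```xml
<fieldType name="text" class="solr.TextField">
  <analyzer>
    <!-- each component carries its own version, visible right in the declaration -->
    <tokenizer class="solr.StandardTokenizerFactory" luceneMatchVersion="LUCENE_29"/>
    <filter class="solr.StopFilterFactory" luceneMatchVersion="LUCENE_29" words="stopwords.txt"/>
  </analyzer>
</fieldType>
```

With this shape, changing any one luceneMatchVersion value is an explicit, local edit, so the user can see exactly which component's behavior changes.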
[jira] Commented: (SOLR-1212) TestNG Test Case
[ https://issues.apache.org/jira/browse/SOLR-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796953#action_12796953 ] Hoss Man commented on SOLR-1212: I'm on the fence ... i agree it's (probably) useful to TestNG users and i would like to do as much as possible to make it easy for people to use the TestHarness (regardless of how they write tests) but the idea of including it in the release does smell fishy if we're not actually using it anywhere in Solr -- it may not seem like much overhead to maintain it, but if it never gets used internally then it's not really clear if/when there are problems with it (even Test code needs to be tested to be sure that it's not broken). If it's not included in the Solr repository, then it may fall out of sync with Solr -- but that's true of any plugin someone writes and hosts on sourceforge, or github, or googlecode -- we can advertise that it works with Solr 1.4, and if something changes in Solr 1.5, or Solr 1.6 or Solr 9.7 that breaks it then interested parties are free to update it with a new version that does work. ...If i knew more about TestNG i might be able to form a stronger opinion like "this is awesome, it's super useful, we should include it" or "this doesn't really provide any value add to users" but I just don't know enough either way. TestNG Test Case - Key: SOLR-1212 URL: https://issues.apache.org/jira/browse/SOLR-1212 Project: Solr Issue Type: New Feature Components: clients - java Affects Versions: 1.4 Environment: Java 6 Reporter: Kay Kay Fix For: 1.5 Attachments: SOLR-1212.patch, testng-5.9-jdk15.jar Original Estimate: 1h Remaining Estimate: 1h TestNG equivalent of AbstractSolrTestCase , without using JUnit altogether . New Class created: AbstractSolrNGTest LICENSE.txt , NOTICE.txt modified as appropriate. ( TestNG under Apache License 2.0 ) TestNG 5.9-jdk15 added to lib. 
Justification: In some workplaces - people are moving towards TestNG and take out JUnit altogether from the classpath. Hence useful in those cases. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796965#action_12796965 ] Robert Muir commented on SOLR-1677: --- {quote} No, he uses an OS where he can upgrade individual things individually with clear implications - he sets luceneMatchVersion=2.9 on each and every analyzer/, tokenizer/ and filter/ that he declares in his schema so that he knows exactly what behavior is changing when he modifies any of them. {quote} Yeah, but this isn't how Version works in lucene either, please see below {quote} I'm not advocating that we don't allow a way to specify Version, i'm saying that having a global value for it that affects things opaquely sounds dangerous - we should certainly have a way for people to specify the Version they want on each of the objects that care, but it shouldn't be global. The luceneMatchVersion property that Uwe added to BaseTokenizerFactory and BaseTokenFilterFactory in his patch seems perfect to me, it's just the SolrCoreAware / core.getSolrConfig().luceneMatchVersion that i think is a bad idea. {quote} And I disagree, I think that the per-tokenfilter matchVersion should be the expert use, with the default global Version being the standard use. I don't think Version is intended so you can use X.Y on this part and Y.Z on this part and have any chance of anything working, for example it controls position increments on stopfilter but also in queryparser, if you use wacky combinations, things might not work. And I personally don't see anyone putting effort into supporting this either, because it's enough to supply the back compat for previous versions, but not some cross product of all possible versions. this is too much. sometimes things interact in ways we cannot detect automatically (such as the query parser phrasequery / stopfilter thing), it's my understanding that things like this are why Version was created in the first place. 
Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory --- Key: SOLR-1677 URL: https://issues.apache.org/jira/browse/SOLR-1677 Project: Solr Issue Type: Sub-task Components: Schema and Analysis Reporter: Uwe Schindler Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards compatibility with old indexes created using older versions of Lucene. The most important example is StandardTokenizer, which changed its behaviour with posIncr and incorrect host token types in 2.4 and also in 2.9. In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with much more Unicode support, almost every Tokenizer/TokenFilter needs this Version parameter. In 2.9, the deprecated old ctors without Version take LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer. This patch adds basic support for the Lucene Version property to the base factories. Subclasses then can use the luceneMatchVersion decoded enum (in 3.0) / Parameter (in 2.9) for constructing Tokenstreams. The code currently contains a helper map to decode the version strings, but in 3.0 it can be replaced by Version.valueOf(String), as Version is a subclass of Java5 enums. The default value is Version.LUCENE_24 (as this is the default for the no-version ctors in Lucene). This patch also removes unneeded conversions to CharArraySet from StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed to match Lucene 3.0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: SolrException.ErrorCode
: Any reason why ErrorCode.code isn't public final? It seems weird that we have no idea (to clarify, it was final and grant recently made it public final) : public void assertQEx(String message, SolrQueryRequest req, int code ) { : : instead of : public void assertQEx(String message, SolrQueryRequest req, SolrException.ErrorCode ) { I think because that assert code predates the ErrorCode class (once upon a time Solr was also using numeric codes like 0, -1 for some internal errors) and when the ErrorCode class got added, no one updated the AbstractTestCase to make it easy to use. : Also, if it is public final, there really isn't any harm in exposing it : elsewhere. However, it does seem weird that we have these codes, but : they aren't logged either, AFAICT. They do get logged, but indirectly -- since ErrorCode is an inner class of SolrException the SolrException class has always had access to it even though it wasn't public, so it inspects it when it's used to construct a SolrException, and then SolrException has a public int code() method for returning it. that method is used by the SolrDispatchFilter to set the response code, which the Servlet Container uses for logging. : Finally, do we have a plan for how they should be used? Can we add new : values to them? Do we document anywhere what they mean? The enum names seem fairly self documenting. The ErrorCode enum came about in SOLR-249, after there had been a push to move away from inconsistent values depending on where the SolrException was expected to wind up (0,-1 was getting used by some internal exceptions, and code that the UpdateServlet dealt with, while SolrExceptions that were getting thrown to the user used HTTP status codes). I think we should try to stick with using a subset of the HTTP codes so we can always be safe leaking them to outside clients (via the Servlet Containers default error page mechanism). 
If we feel like we need finer grained control than that in some cases we could consider adding a sub-code to the ErrorCode enum -- but that sounds like it would smell fishy. If we find ourselves wanting more detail like that, we should probably subclass SolrException instead of adding more codes (we should probably be subclassing SolrException a lot more anyway) -Hoss
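The pattern under discussion can be sketched in a few lines: the enum carries a subset of HTTP status codes, so an exception built from it can always be leaked safely to outside clients as a response code. The values below mirror the well-known HTTP codes, but this is an illustration of the pattern, not a copy of Solr's actual SolrException.ErrorCode source.

```java
// Illustrative sketch of the ErrorCode pattern (not Solr's actual source).
public class ErrorCodeSketch {
    public enum ErrorCode {
        BAD_REQUEST(400),
        UNAUTHORIZED(401),
        FORBIDDEN(403),
        NOT_FOUND(404),
        SERVER_ERROR(500),
        SERVICE_UNAVAILABLE(503);

        public final int code;   // public final, as discussed in the thread
        ErrorCode(int code) { this.code = code; }
    }

    public static void main(String[] args) {
        // the dispatch/servlet layer can use the int directly as the HTTP status
        assert ErrorCode.BAD_REQUEST.code == 400;
        assert ErrorCode.SERVER_ERROR.code == 500;
        System.out.println("ok");
    }
}
```

Keeping the values inside the HTTP range is what makes the "safe to leak to clients" property hold; a sub-code, if ever added, would have to live outside that mapping, which is part of why subclassing looks cleaner.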
Re: NPE in MoreLikeThis referenced doc not found and debugQuery=True
: When I do a specific MLT search on a document with debugQuery=True I am getting : a NullPointerException both on screen and in my catalina logs. The query is as : follows : : http://localhost:8080/solr2/select/?mlt.minwl=3mlt.fl=bodymlt.mintf=1mlt.maxwl=15mlt.maxqt=20version=1.2rows=5mlt.mindf=1fl=nid,title,path,url,digest,teaserstart=0q=nid:16036qt=mltdebugQuery=true : : Is this desired behavior? An NPE is never desired behavior ... can you elaborate on which version of Solr you are using? and to clarify: you are saying you only get this when debugQuery=true ... correct? : org.apache.solr.util.SolrPluginUtils.doStandardDebug(SolrPluginUtils.java:399) : at : org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLikeThisHandle : r.java:189) The MLT Handler (not to be confused with the MLT Component) does things a bit differently than most other handlers, so it's not surprising that something like this might have gotten overlooked. skimming the code i don't see any obvious reason why it should encounter an NPE however. can you reproduce this using the example configs/data, or is it something special about your data? -Hoss
[jira] Commented: (SOLR-534) Return all query results with parameter rows=-1
[ https://issues.apache.org/jira/browse/SOLR-534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796982#action_12796982 ] Lisa Carter commented on SOLR-534: -- I would argue that REALLY_BIG_NUMBER is actually significantly MORE dangerous than a crash. Here's why: A crash at least lets the programmer know something went wrong. Missing data is a silent failure. 1) If the result set is too large for the client, it will run out of memory and generate an exception. The programmer will immediately know they did something wrong. 2) If the result set is too large for the network (unlikely) this will disconnect and fail. The programmer will immediately know they did something wrong. 3) If the result set is too large for solr, solr should not crash but rather return a page with the standard error handler: result set too large/out of memory. The programmer will immediately know they did something wrong. Solr sure as heck better be checking this already--you never know when you'll run into bizarre low memory conditions; allocations should ALWAYS be checked for. But if you use the REALLY_BIG_NUMBER approach, the same bad programmer who never thought he would get back more than 1000 records will never check whether the result set contains more than 1000 records either. If the programmer was expecting the complete result set and the database now contains 1002 records instead of 999, they will not know there is a problem... the last records in the set are simply truncated. The programmer who wrote the code may not be the person maintaining the application, quite common in production environments. The maintenance person may not know for weeks or months that a problem even exists! The -1 approach ensures immediate, loud failure. The REALLY_BIG_NUMBER approach ensures only silent failure. While it's impossible to idiot-proof everything, loud failure is always preferable to silent failure. 
Barking loudly saves the poor soul who maintains the idiot's code a lot of heartache. Return all query results with parameter rows=-1 --- Key: SOLR-534 URL: https://issues.apache.org/jira/browse/SOLR-534 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Environment: Tomcat 5.5 Reporter: Lars Kotthoff Priority: Minor Attachments: solr-all-results.patch The searcher should return all results matching a query when the parameter rows=-1 is given. I know that it is a bad idea to do this in general, but as it explicitly requires a special parameter, people using this feature will be aware of what they are doing. The main use case for this feature is probably debugging, but in some cases one might actually need to retrieve all results because they e.g. are to be merged with results from different sources. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1212) TestNG Test Case
[ https://issues.apache.org/jira/browse/SOLR-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797004#action_12797004 ] Kay Kay commented on SOLR-1212: --- @Shalin , @HossMan - I understand the pain of maintaining this one separate from JUnit - but was concerned mostly about this being out of date with the tree. Regarding the comparison between TestNG and JUnit - one big advantage is the ability to categorize the tests as different groups in testng and run them separately (especially useful as the code base of solr gets bigger + contrib ). Which is one of the primary reasons (after evaluating both) testng was chosen. So - if you guys are not comfortable with the patch - then (as shalin noted) - just make an entry in the wiki and leave this one as such. The code is definitely not big enough to warrant a sf project / github / google code at this point. A better patch would be to refactor the existing JUnit test case so that the testng version minimizes duplication as much as possible. TestNG Test Case - Key: SOLR-1212 URL: https://issues.apache.org/jira/browse/SOLR-1212 Project: Solr Issue Type: New Feature Components: clients - java Affects Versions: 1.4 Environment: Java 6 Reporter: Kay Kay Fix For: 1.5 Attachments: SOLR-1212.patch, testng-5.9-jdk15.jar Original Estimate: 1h Remaining Estimate: 1h TestNG equivalent of AbstractSolrTestCase , without using JUnit altogether . New Class created: AbstractSolrNGTest LICENSE.txt , NOTICE.txt modified as appropriate. ( TestNG under Apache License 2.0 ) TestNG 5.9-jdk15 added to lib. Justification: In some workplaces - people are moving towards TestNG and take out JUnit altogether from the classpath. Hence useful in those cases. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1212) TestNG Test Case
[ https://issues.apache.org/jira/browse/SOLR-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797005#action_12797005 ] Kay Kay commented on SOLR-1212: --- @HossMan - Interesting thread from stackoverflow - http://stackoverflow.com/questions/153427/is-there-a-junit-testrunner-for-running-groups-of-tests . Not sure - how much of junit has kept up with TestNG recently but TestNG is definitely a notch up (IMHO, of course). TestNG Test Case - Key: SOLR-1212 URL: https://issues.apache.org/jira/browse/SOLR-1212 Project: Solr Issue Type: New Feature Components: clients - java Affects Versions: 1.4 Environment: Java 6 Reporter: Kay Kay Fix For: 1.5 Attachments: SOLR-1212.patch, testng-5.9-jdk15.jar Original Estimate: 1h Remaining Estimate: 1h TestNG equivalent of AbstractSolrTestCase , without using JUnit altogether . New Class created: AbstractSolrNGTest LICENSE.txt , NOTICE.txt modified as appropriate. ( TestNG under Apache License 2.0 ) TestNG 5.9-jdk15 added to lib. Justification: In some workplaces - people are moving towards TestNG and take out JUnit altogether from the classpath. Hence useful in those cases. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1698) load balanced distributed search
[ https://issues.apache.org/jira/browse/SOLR-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797011#action_12797011 ] Noble Paul commented on SOLR-1698: -- LBHttpSolrServer can have the concept of a sticky session and the session object can be used for all shard requests made in a single solr request. load balanced distributed search Key: SOLR-1698 URL: https://issues.apache.org/jira/browse/SOLR-1698 Project: Solr Issue Type: Improvement Reporter: Yonik Seeley Provide syntax and implementation of load-balancing across shard replicas. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
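The sticky-session idea Noble describes can be sketched as a small session object that all shard requests within a single Solr request share: the first request to a shard picks a replica, and every later request for that shard in the same session reuses it. All names below are invented for illustration; this is not LBHttpSolrServer's actual API, just a sketch of the proposed concept.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a per-request "sticky session" for shard requests.
public class StickyShardSession {
    private final Map<String, String> chosen = new HashMap<String, String>();
    private int next = 0; // simple round-robin counter for first-time picks

    // Pick a replica for a shard; repeated calls within the same session
    // return the same replica, so one Solr request sees a consistent view.
    public String replicaFor(String shard, List<String> replicas) {
        String r = chosen.get(shard);
        if (r == null) {
            r = replicas.get(next++ % replicas.size());
            chosen.put(shard, r);
        }
        return r;
    }

    public static void main(String[] args) {
        StickyShardSession session = new StickyShardSession();
        List<String> reps = java.util.Arrays.asList("host1:8983", "host2:8983");
        String first = session.replicaFor("shard1", reps);
        // sticky: the same shard within the same session yields the same replica
        assert first.equals(session.replicaFor("shard1", reps));
        System.out.println("ok");
    }
}
```

A fresh session per top-level request keeps the load balancing (different requests can land on different replicas) while avoiding mixed replica state within one request.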