Build failed in Hudson: Solr-trunk #1043
See http://hudson.zones.apache.org/hudson/job/Solr-trunk/1043/

[...truncated 2370 lines...]
[junit output truncated: thirty-odd test suites from org.apache.solr.client.solrj.*, org.apache.solr.common.*, and org.apache.solr.core.* each reported Failures: 0, Errors: 0; the excerpt breaks off mid-line at org.apache.solr.core.TestArbitraryIndexDir.]
configure FastVectorHighlighter in trunk
How do I activate FastVectorHighlighter in trunk? Which of those params sets it up?

<!-- Configure the standard fragListBuilder -->
<fragListBuilder name="simple" class="org.apache.solr.highlight.SimpleFragListBuilder" default="true"/>

<!-- Configure the standard fragmentsBuilder -->
<fragmentsBuilder name="colored" class="org.apache.solr.highlight.MultiColoredScoreOrderFragmentsBuilder" default="true"/>
<fragmentsBuilder name="scoreOrder" class="org.apache.solr.highlight.ScoreOrderFragmentsBuilder" default="true"/>

Thanks in advance.
--
View this message in context: http://old.nabble.com/configure-FastVectorHihglighter-in-trunk-tp27319976p27319976.html
Sent from the Solr - Dev mailing list archive at Nabble.com.
Need hardware recommendation
I am trying to do the following:

- Index 6 million database records (SQL Server 2008). Full index daily; differential every 15 minutes.
- Index 2 million rich documents. Full index weekly; differential every 15 minutes.
- Search queries: 1 per minute.
- 20 cores.

I am looking for hardware recommendations. Any advice/recommendation will be appreciated.

-Jayesh Wadhwani
Re: configure FastVectorHighlighter in trunk
Marc Sturlese wrote:
> How do I activate FastVectorHighlighter in trunk? Which of those params sets it up?
>
> <!-- Configure the standard fragListBuilder -->
> <fragListBuilder name="simple" class="org.apache.solr.highlight.SimpleFragListBuilder" default="true"/>
>
> <!-- Configure the standard fragmentsBuilder -->
> <fragmentsBuilder name="colored" class="org.apache.solr.highlight.MultiColoredScoreOrderFragmentsBuilder" default="true"/>
> <fragmentsBuilder name="scoreOrder" class="org.apache.solr.highlight.ScoreOrderFragmentsBuilder" default="true"/>
>
> Thanks in advance.

You do not need to activate it. DefaultSolrHighlighter, which is the default SolrHighlighter impl, automatically uses FVH when the field names you specify through the hl.fl parameter have termVectors, termPositions and termOffsets set to true. If you want to use the multi-colored tag feature, you need to specify MultiColored*FragmentsBuilder in solrconfig.xml.

Koji
--
http://www.rondhuit.com/en/
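For readers finding this thread later, the prerequisite Koji describes can be sketched in schema.xml roughly as follows (the field name here is an assumption for illustration, not something from this thread):

```xml
<!-- schema.xml sketch: a field eligible for FastVectorHighlighter must
     store term vectors together with positions and offsets -->
<field name="content" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

A query carrying hl=true and hl.fl=content would then, per the explanation above, be handled by the FVH path in DefaultSolrHighlighter.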
[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805051#action_12805051 ]

Jan Høydahl commented on SOLR-1725:
---

Lance, what do you mean by DIH language? In my example xml that you quoted, the first two processors, FileReaderProcessorFactory and TikaProcessorFactory, were supposed to be (imagined) ordinary Java processors, not scripting ones.

Uri, I'd prefer if the manner of configuration was as similar as possible, i.e. if we could get rid of the <lst name="params"> part, and instead pass all top-level params directly to the script (except the scripts param itself). Even better if the definition of a processor was in a separate xml section, referred to by name only in each chain, but that is a bigger change, outside the scope of this patch.

Script based UpdateRequestProcessorFactory
--
Key: SOLR-1725
URL: https://issues.apache.org/jira/browse/SOLR-1725
Project: Solr
Issue Type: New Feature
Components: update
Affects Versions: 1.4
Reporter: Uri Boness
Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch

A script based UpdateRequestProcessorFactory (uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code. The update request processor factory enables writing update processors in scripts located in the {{solr.solr.home}} directory. The factory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file names (so {{scriptA.js}} will be executed before {{scriptB.js}}).
The script language is resolved based on the script file extension (that is, a *.js file will be treated as a JavaScript script), therefore an extension is mandatory. Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those that are required by the processing logic. The following variables are defined as global variables for each script:

* {{req}} - The SolrQueryRequest
* {{rsp}} - The SolrQueryResponse
* {{logger}} - A logger that can be used for logging purposes in the script

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
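Based on the description above, a solrconfig.xml registration might look roughly like this (the factory class name, chain name, and script file names are placeholders for illustration; the patch defines the actual ones):

```xml
<updateRequestProcessorChain name="script">
  <!-- hypothetical class name for the factory described in this issue -->
  <processor class="solr.ScriptUpdateProcessorFactory">
    <!-- comma-separated; resolved under the conf directory and executed
         in lexicographical order of file name -->
    <str name="scripts">scriptA.js,scriptB.js</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```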
[jira] Created: (SOLR-1733) Allow specification of date output format on date faceting
Allow specification of date output format on date faceting
--
Key: SOLR-1733
URL: https://issues.apache.org/jira/browse/SOLR-1733
Project: Solr
Issue Type: Improvement
Components: SearchComponents - other
Environment: my local mac book pro.
Reporter: Chris A. Mattmann
Fix For: 1.5

It would be really great if the facet component allowed the specification of the date output format, so that e.g., if I wanted to facet by month, I could also specify what the resultant date facets look like. In other words, a facet query like this:

http://localhost:8993/solr/select/?q=*:*&version=2.2&start=0&rows=146&indent=on&facet=on&facet.date=startdate&facet.date.start=NOW/YEAR-50YEARS&facet.date.end=NOW&facet.date.gap=%2B1MONTH&facet.date.output=yy-MM-dd

showed output like:

{code:xml}
<lst name="facet_dates">
  <lst name="startdate">
    <int name="1960-01-01">0</int>
    <int name="1960-02-01">1</int>
    <int name="1960-03-01">0</int>
    <int name="1960-04-01">0</int>
    <int name="1960-05-01">2</int>
    ...
  </lst>
</lst>
{code}

Patch forthcoming that implements this.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
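The proposed facet.date.output=yy-MM-dd value reads like a java.text.SimpleDateFormat pattern; assuming that is the intent, the pattern semantics are easy to check in isolation:

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Locale;
import java.util.TimeZone;

public class DateFacetFormat {
    // Format a facet-gap boundary date with a user-supplied pattern,
    // e.g. the "yy-MM-dd" pattern from the example query above.
    static String format(String pattern, int year, int month, int day) {
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"), Locale.US);
        cal.clear();
        cal.set(year, month - 1, day); // Calendar months are zero-based
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(cal.getTime());
    }

    public static void main(String[] args) {
        // 1960-02-01 rendered with the pattern from the example query
        System.out.println(format("yy-MM-dd", 1960, 2, 1));   // 60-02-01
        System.out.println(format("yyyy-MM-dd", 1960, 2, 1)); // 1960-02-01
    }
}
```

Note that the sample output in the description uses four-digit years, which would correspond to a yyyy-MM-dd pattern rather than the yy-MM-dd in the example URL.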
Folder monitoring for differential index
Hi,
We are developing a folder monitor (watchdog) to add to a client's Solr implementation. Tell me if it could be interesting for any of you guys; we will be glad to share it with the community and eventually to integrate it in the trunk. Also I would like the gurus' opinion about the design (I wrote a proposal). Awaiting feedback.

http://www.linebee.com/wp-content/uploads/2010/01/File_monitoring_for_diferencial_index_1.rtf

linebee labs http://www.linebee.com/?page_id=71
Zacarias
zacar...@linebee.com
www.linebee.com
Re: Folder monitoring for differential index
http://www.linebee.com/wp-content/uploads/2010/01/File_monitoring_for_diferencial_index_1.rtf

On Tue, Jan 26, 2010 at 12:06 PM, Zacarias zacar...@linebee.com wrote:
> Hi,
> We are developing a folder monitor (watchdog) to add to a client's Solr implementation. Tell me if it could be interesting for any of you guys; we will be glad to share it with the community and eventually to integrate it in the trunk. Also I would like the gurus' opinion about the design (I wrote a proposal). Awaiting feedback.
> http://www.linebee.com/wp-content/uploads/2010/01/File_monitoring_for_diferencial_index_1.rtf
> linebee labs http://www.linebee.com/?page_id=71
> Zacarias
> zacar...@linebee.com
> www.linebee.com
Problem with German word endings
Hi List.

We have made a suggest search and send this query with a facet.prefix of kinderzim:

facet=on
facet.prefix=kinderzim
facet.mincount=1
facet.field=content
facet.limit=10
fl=content
omitHeader=true
bf=log%28supplier_faktor%29
version=1.2
wt=json
json.nl=map
q=
start=0
rows=0

Now we get:

<lst name="content">
  <int name="kinderzimm">7</int>
</lst>

Solr doesn't return the endings of the output words. It should be kinderzimmer; same with kindermode, we get kindermod. We added the words to our protwords.txt and include it with this line in schema.xml:

<filter class="solr.SnowballPorterFilterFactory" language="German" protected="protwords.txt"/>

Can anybody help us? Thanks, and sorry about my English.

So Long,
David
how to sort facets?
hi,

We made a filter with the faceting feature. In our faceting list the order is by match count (facet.sort=count), but we need to sort by facet.sort=manufacturer. URL manipulation doesn't change anything, why?

select?fl=*%2Cscore&fq=type%3Apage&spellcheck=true&facet=true&facet.mincount=1&facet.sort=manufacturer&bf=log(supplier_faktor)&facet.field=supplier&facet.field=manufacturer&version=1.2&q=kind&start=0&rows=10

so long,
David
[jira] Updated: (SOLR-1283) Mark Invalid error on indexing
[ https://issues.apache.org/jira/browse/SOLR-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Coloos updated SOLR-1283:

    Attachment: SOLR-1283.patch

The issue is also happening in current trunk (revision 903234), with the class {{HTMLStripCharFilter}} (replacing deprecated {{HTMLStripReader}} it seems). Example of stacktrace:

{noformat}
26 janv. 2010 16:02:56 org.apache.solr.common.SolrException log
GRAVE: java.io.IOException: Mark invalid
        at java.io.BufferedReader.reset(BufferedReader.java:485)
        at org.apache.lucene.analysis.CharReader.reset(CharReader.java:63)
        at org.apache.solr.analysis.HTMLStripCharFilter.restoreState(HTMLStripCharFilter.java:172)
        at org.apache.solr.analysis.HTMLStripCharFilter.read(HTMLStripCharFilter.java:734)
        at org.apache.solr.analysis.HTMLStripCharFilter.read(HTMLStripCharFilter.java:748)
        at java.io.Reader.read(Reader.java:122)
        at org.apache.lucene.analysis.CharTokenizer.incrementToken(CharTokenizer.java:77)
        at org.apache.lucene.analysis.ISOLatin1AccentFilter.incrementToken(ISOLatin1AccentFilter.java:43)
        at org.apache.lucene.analysis.TokenStream.next(TokenStream.java:383)
        at org.apache.lucene.analysis.ISOLatin1AccentFilter.next(ISOLatin1AccentFilter.java:64)
        at org.apache.solr.analysis.WordDelimiterFilter.next(WordDelimiterFilter.java:379)
        at org.apache.lucene.analysis.TokenStream.incrementToken(TokenStream.java:318)
        at org.apache.lucene.analysis.StopFilter.incrementToken(StopFilter.java:225)
        at org.apache.lucene.analysis.LowerCaseFilter.incrementToken(LowerCaseFilter.java:38)
        at org.apache.solr.analysis.SnowballPorterFilter.incrementToken(SnowballPorterFilterFactory.java:116)
        at org.apache.lucene.analysis.TokenStream.next(TokenStream.java:406)
        at org.apache.solr.analysis.BufferedTokenStream.read(BufferedTokenStream.java:97)
        at org.apache.solr.analysis.BufferedTokenStream.next(BufferedTokenStream.java:83)
        at org.apache.lucene.analysis.TokenStream.incrementToken(TokenStream.java:321)
        at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:138)
        at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:244)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:781)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:764)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2630)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2602)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1317)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:341)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
        at org.mortbay.jetty.Server.handle(Server.java:285)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
        at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:723)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
        at
{noformat}
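The root "Mark invalid" error is the standard java.io.BufferedReader contract: reset() throws once more characters have been read past the mark than the readAheadLimit passed to mark(). A standalone illustration of just that Reader behavior (unrelated to the Solr analysis chain above):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class MarkInvalidDemo {
    // Returns true if reset() succeeds after reading 'toRead' characters
    // beyond a mark placed with the given read-ahead limit.
    static boolean resetSurvives(int limit, int toRead) {
        try {
            // small internal buffer (4 chars) so the limit is enforced early
            BufferedReader r = new BufferedReader(new StringReader("abcdefghij"), 4);
            r.mark(limit);
            for (int i = 0; i < toRead; i++) r.read();
            r.reset();
            return true;
        } catch (IOException e) { // "Mark invalid"
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(resetSurvives(2, 2)); // true: stayed within the limit
        System.out.println(resetSurvives(2, 8)); // false: read past the limit
    }
}
```

This is why a CharFilter that calls reset() must be careful never to read further ahead than the limit it marked, which is what the attached patch addresses.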
[jira] Commented: (SOLR-1722) Allowing changing the special default core name, and as a default default core name, switch to using collection1 rather than DEFAULT_CORE
[ https://issues.apache.org/jira/browse/SOLR-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805093#action_12805093 ]

Mark Miller commented on SOLR-1722:
---

Right - the issue with cloud was that if you ask for the core name, it's known as rather than its other name. So there are definite advantages to doing this in the dispatch filter - you can use the non name, and you can get rid of all the normalization going on in CoreContainer. The problem is back compat, I think - if we just normalize in the DispatchFilter, anyone counting on getting the default core with getCore() will have their code broken.

Allowing changing the special default core name, and as a default default core name, switch to using collection1 rather than DEFAULT_CORE
---
Key: SOLR-1722
URL: https://issues.apache.org/jira/browse/SOLR-1722
Project: Solr
Issue Type: Improvement
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
Fix For: 1.5
Attachments: SOLR-1722.patch

see http://search.lucidimagination.com/search/document/f5f2af7c5041a79e/default_core

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-1711) Race condition in org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java
[ https://issues.apache.org/jira/browse/SOLR-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley resolved SOLR-1711.

    Resolution: Fixed

Thanks Attila! I just committed this.

Race condition in org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java
--
Key: SOLR-1711
URL: https://issues.apache.org/jira/browse/SOLR-1711
Project: Solr
Issue Type: Bug
Components: clients - java
Affects Versions: 1.4, 1.5
Reporter: Attila Babo
Priority: Critical
Fix For: 1.5
Attachments: StreamingUpdateSolrServer.patch
Original Estimate: 1h
Remaining Estimate: 1h

While inserting a large pile of documents using StreamingUpdateSolrServer there is a race condition: all Runner instances stop processing while the blocking queue is full. With a high-performance client this can happen quite often, and there is no way to recover from it on the client side.

In StreamingUpdateSolrServer there is a BlockingQueue called queue to store UpdateRequests, and up to threadCount worker threads of StreamingUpdateSolrServer.Runner that read the queue and push requests to a Solr instance. If at some point the BlockingQueue is empty, all workers stop processing it and push their collected content to Solr, which can be a time-consuming process; sometimes all worker threads are waiting on Solr. If at this moment the client fills the BlockingQueue, all worker threads will quit without processing any further, and the main thread will block forever. There is a simple, well tested patch attached to handle this situation.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
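The failure mode described is the classic race between a worker deciding the queue is empty and the producer's next put(). A minimal sketch of the safe shape (an illustration of the pattern, not the actual committed patch): the worker only exits after re-checking emptiness under the same lock the producer uses to guarantee a live worker before enqueuing.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class RunnerRace {
    static final Object LOCK = new Object();
    static final BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(4);
    static final AtomicInteger consumed = new AtomicInteger();
    static Thread runner; // the single worker, guarded by LOCK

    // Worker: drains the queue; before exiting it re-checks emptiness under
    // the same lock the producer uses, so no request can be stranded.
    static void runWorker() {
        while (true) {
            Integer item = queue.poll();
            if (item != null) {
                consumed.incrementAndGet(); // stand-in for the HTTP push to Solr
                continue;
            }
            synchronized (LOCK) {
                if (queue.isEmpty()) { runner = null; return; }
            }
        }
    }

    // Producer: ensure a live worker exists, then enqueue, all under LOCK,
    // so put() can never block with no worker left to drain the queue.
    static void submit(int item) throws InterruptedException {
        synchronized (LOCK) {
            if (runner == null) {
                runner = new Thread(RunnerRace::runWorker);
                runner.start();
            }
            queue.put(item);
        }
    }

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 1000; i++) submit(i);
        Thread t;
        synchronized (LOCK) { t = runner; }
        if (t != null) t.join(); // worker exits only once the queue is empty
        System.out.println(consumed.get());
    }
}
```

There is no deadlock between put() inside the lock and the worker: the worker only needs the lock when the queue is empty, and the producer only blocks in put() when the queue is full.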
[jira] Commented: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805167#action_12805167 ]

Hoss Man commented on SOLR-1677:

bq. And here are the JIRA issues for stemming bugs, since you didnt take my hint to go and actually read them.

sigh. I read both those issues when you filed them, and I agreed with your assessment that they are bugs we should fix -- if i had thought you were wrong i would have said so in the issue comments. But that doesn't change the fact that sometimes people depend on buggy behavior -- and sometimes those people depend on the buggy behavior without even realizing it. Bug fixes in a stemmer might make it more correct according to the stemmer algorithm specification, or the language semantics, but in some peculiar use cases an application might find the correct implementation less useful than the previous buggy version. This is one reason why things like CHANGES.txt are important: to draw attention to what has changed between two versions of a piece of software, so people can make informed opinions about what they should test in their own applications when they upgrade things under the covers. luceneMatchVersion should be no different. We should try to find a simple way to inform people "when you switch from luceneMatchVersion=X to luceneMatchVersion=Y, here are the bug fixes you will get" so they know what to test to determine if they are adversely affected by a bug fix in some way (and can find their own work-around).

bq. Perhaps you should come up with a better example than stemming, as you don't know what you are talking about.

1) It's true, I frequently don't know what i'm talking about ... this issue was a prime example, and i thank you, Uwe, and Miller for helping me realize that i was completely wrong in my understanding about the intended purpose of o.a.l.Version, and that a global setting for it in Solr makes total sense -- but that doesn't make my concerns about documenting the effects of that global setting any less valid.

2) Perhaps you should read the StopFilter example i already posted in my last comment...

{quote}
bq. Robert mentioned in an earlier comment that StopFilter's position increment behavior changes depending on the luceneMatchVersion -- what if an existing Solr 1.3 user notices a bug in some Tokenizer, and adds {{<luceneMatchVersion>3.0</luceneMatchVersion>}} to his schema.xml to fix it? Without clear documentation on _everything_ that is affected when doing that, he may not realize that StopFilter changed at all -- and even though the position increment behavior may now be more correct, it might drastically change the results he gets when using dismax with a particular qs or ps value. Hence my point that this becomes a serious documentation concern: finding a way to make it clear to users what they need to consider when modifying luceneMatchVersion.
{quote}

Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
---
Key: SOLR-1677
URL: https://issues.apache.org/jira/browse/SOLR-1677
Project: Solr
Issue Type: Sub-task
Components: Schema and Analysis
Reporter: Uwe Schindler
Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch

Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards compatibility with old indexes created using older versions of Lucene. The most important example is StandardTokenizer, which changed its behaviour with posIncr and incorrect host token types in 2.4 and also in 2.9.
In Lucene 3.0 this matchVersion ctor parameter is mandatory, and in 3.1, with much more Unicode support, almost every Tokenizer/TokenFilter needs this Version parameter. In 2.9, the deprecated old ctors without Version take LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer. This patch adds basic support for the Lucene Version property to the base factories. Subclasses can then use the luceneMatchVersion decoded enum (in 3.0) / parameter (in 2.9) for constructing TokenStreams. The code currently contains a helper map to decode the version strings, but in 3.0 it can be replaced by Version.valueOf(String), as Version is a subclass of Java5 enums. The default value is Version.LUCENE_24 (as this is the default for the no-version ctors in Lucene). This patch also removes unneeded conversions to CharArraySet from StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed to match Lucene 3.0.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
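As a sketch of what this could look like in a schema (attribute placement and constant names here are assumptions based on the description, not the final patch syntax), a factory could be pinned to an older analyzer behaviour like so:

```xml
<fieldType name="text" class="solr.TextField">
  <analyzer>
    <!-- hypothetical: pin the tokenizer to its Lucene 2.4 behaviour -->
    <tokenizer class="solr.StandardTokenizerFactory" luceneMatchVersion="LUCENE_24"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
  </analyzer>
</fieldType>
```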
Re: CHANGES.txt updates for SOLR-1516 and SOLR-1592
: Not to be a pest, but there's no CHANGES.txt updates for SOLR-1516 and
: SOLR-1592. Could someone update them? A trivial patch is attached...

Sorry about that. Every change (with the possible exception of fixing formatting or documentation typos) *should* have a CHANGES.txt entry. Every change that affects the public API *MUST* have a CHANGES.txt entry.

Committed revision 903398.

-Hoss
[jira] Commented: (SOLR-1603) Perl Response Writer
[ https://issues.apache.org/jira/browse/SOLR-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805186#action_12805186 ]

Hoss Man commented on SOLR-1603:

I realize this is analogous to the python, php, and ruby writers, but while i can't speak much to how those (language) communities feel about evaling code from remote sources to generate data structures, i know that the majority of the Perl community considers that a bad practice ... it's the reason things like YAML were created: to allow simple serialization w/o needing to execute untrusted code. So i'm a little leery about adding this (beyond my general leeryness of adding code w/o tests).

Perl Response Writer
--
Key: SOLR-1603
URL: https://issues.apache.org/jira/browse/SOLR-1603
Project: Solr
Issue Type: New Feature
Components: Response Writers
Reporter: Claudio Valente
Priority: Minor
Attachments: SOLR-1603.patch

I've made a patch that implements a Perl response writer for Solr. It's nan/inf and unicode aware. I don't know whether some fields can be binary, but if so I can probably extend it to support that.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805187#action_12805187 ]

Robert Muir commented on SOLR-1677:
---

bq. 2) Perhaps you should read the StopFilter example i already posted in my last comment...

https://issues.apache.org/jira/browse/LUCENE-2094?focusedCommentId=12783932&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12783932

As far as this one goes, i specifically commented before on this not being 'hidden' by Version (with Solr users in mind), but instead its own option that every user should consider, regardless of defaults. For the stopfilter posInc the user should think it through; it's pretty strange, like i mention in my comment, that a definite article like 'the' gets a posInc bump in one language but not another, simply because it happens to be separated by a space. I guess I could care less what the default is; if you care about such things you shouldn't be using the defaults and should instead specify this yourself in the schema, and Version has no effect. I can't really defend the whole stopfilter posInc thing, as again i think it doesn't make a whole lot of sense; maybe it works good for english I guess, I won't argue about it.

Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
---
Key: SOLR-1677
URL: https://issues.apache.org/jira/browse/SOLR-1677
Project: Solr
Issue Type: Sub-task
Components: Schema and Analysis
Reporter: Uwe Schindler
Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch

Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards compatibility with old indexes created using older versions of Lucene. The most important example is StandardTokenizer, which changed its behaviour with posIncr and incorrect host token types in 2.4 and also in 2.9.
In Lucene 3.0 this matchVersion ctor parameter is mandatory, and in 3.1, with much more Unicode support, almost every Tokenizer/TokenFilter needs this Version parameter. In 2.9, the deprecated old ctors without Version take LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer. This patch adds basic support for the Lucene Version property to the base factories. Subclasses can then use the luceneMatchVersion decoded enum (in 3.0) / parameter (in 2.9) for constructing TokenStreams. The code currently contains a helper map to decode the version strings, but in 3.0 it can be replaced by Version.valueOf(String), as Version is a subclass of Java5 enums. The default value is Version.LUCENE_24 (as this is the default for the no-version ctors in Lucene). This patch also removes unneeded conversions to CharArraySet from StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed to match Lucene 3.0.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1718) Carriage return should submit query admin form
[ https://issues.apache.org/jira/browse/SOLR-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805190#action_12805190 ]

Hoss Man commented on SOLR-1718:

I don't understand what you mean. Both forms use a {{textarea}}; why should the behavior of one textarea be different from the behavior of the other (and every other html textarea on the web)?

Carriage return should submit query admin form
--
Key: SOLR-1718
URL: https://issues.apache.org/jira/browse/SOLR-1718
Project: Solr
Issue Type: Improvement
Components: web gui
Affects Versions: 1.4
Reporter: David Smiley
Priority: Minor

Hitting the carriage return on the keyboard should submit the search query on the admin front screen.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1718) Carriage return should submit query admin form
[ https://issues.apache.org/jira/browse/SOLR-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805219#action_12805219 ]

David Smiley commented on SOLR-1718:

Consider the JIRA interface we are using to comment on this issue. At the top-right of the screen is a "QUICK SEARCH:" box. It doesn't even have a submit button; it just works by hitting the return key.

Carriage return should submit query admin form
--
Key: SOLR-1718
URL: https://issues.apache.org/jira/browse/SOLR-1718
Project: Solr
Issue Type: Improvement
Components: web gui
Affects Versions: 1.4
Reporter: David Smiley
Priority: Minor

Hitting the carriage return on the keyboard should submit the search query on the admin front screen.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1734) Add pid file to snappuller to skip script overruns, and recover from failure
Add pid file to snappuller to skip script overruns, and recover from failure Key: SOLR-1734 URL: https://issues.apache.org/jira/browse/SOLR-1734 Project: Solr Issue Type: Improvement Components: replication (scripts) Affects Versions: 1.4 Reporter: Bill Au Assignee: Bill Au Priority: Minor The pid file will allow snappuller to be run as fast as possible without overruns. It will also recover from a previous failed run if the old snappuller process is no longer running. The same has already been done to snapinstaller in SOLR-990. Overlapping snappuller runs could cause replication traffic to saturate the network if a large Solr index is being replicated to a large number of clients. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805235#action_12805235 ] Uri Boness commented on SOLR-1725: -- Lance, I lost you a bit as well. bq. Uri, I'd prefer if the manner of configuration was as similar as possible, i.e. if we could get rid of the <lst name="params"> part, and instead pass all top-level params directly to the script (except the scripts param itself). Hmm... personally I prefer configurations that clearly indicate their purpose. Leaving out the _params_ list will make things a bit confusing - some parameters are available for the scripts, others are not... it's not really clear. bq. manner of configuration was as similar as possible The configurations are similar. All elements in solrconfig.xml have one standard way of configuration, which can be anything from a _lst_, _bool_, _str_, etc. Tomorrow a new processor will pop up which will also require a _lst_ configuration... and that's fine. bq. Even better if the definition of a processor was in a separate xml section and then refer by name only in each chain, but that is a bigger change outside scope of this patch. Well, indeed that's a bigger change. Like everything, this kind of configuration has its pros and cons. I guess it's best if people will just state their preferences regarding how they would like to see this processor configured, and based on that I'll adjust the patch. Script based UpdateRequestProcessorFactory -- Key: SOLR-1725 URL: https://issues.apache.org/jira/browse/SOLR-1725 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.4 Reporter: Uri Boness Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code.
The update request processor factory enables writing update processors in scripts located in the {{solr.solr.home}} directory. The factory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file names (so {{scriptA.js}} will be executed before {{scriptB.js}}). The script language is resolved based on the script file extension (that is, a *.js file will be treated as a JavaScript script), therefore an extension is mandatory. Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those that are required by the processing logic. The following variables are defined as global variables for each script: * {{req}} - The SolrQueryRequest * {{rsp}} - The SolrQueryResponse * {{logger}} - A logger that can be used for logging purposes in the script -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
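As a rough illustration of the extension-based resolution described above (a sketch, not the patch itself), the standard JDK 6 javax.script API already supports looking an engine up by file extension; the helper name below is invented:

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

// Illustrative sketch only: resolve a script engine from the file extension,
// the way the factory described above would for each configured script.
class ScriptEngineResolution {
    // Hypothetical helper: "updateProcessor1.js" -> "js"
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException(
                "script file name must have an extension: " + fileName);
        }
        return fileName.substring(dot + 1);
    }

    public static void main(String[] args) throws Exception {
        ScriptEngine engine = new ScriptEngineManager()
            .getEngineByExtension(extensionOf("updateProcessor1.js"));
        if (engine == null) {
            // Modern JVMs may ship without a bundled JavaScript engine.
            System.out.println("no JavaScript engine on this JVM");
            return;
        }
        // The factory would bind req, rsp and logger as globals; a plain
        // string stands in for the logger here.
        engine.put("logger", "logger-stand-in");
        engine.eval("var x = logger;");
    }
}
```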
[jira] Updated: (SOLR-1734) Add pid file to snappuller to skip script overruns, and recover from failure
[ https://issues.apache.org/jira/browse/SOLR-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Au updated SOLR-1734: -- Attachment: SOLR-1734.patch I am reusing the code from SOLR-990 which adds the same feature to snapinstaller. I have added a -f command line argument to force the snappuller to run even if one is already running. That will be useful if network capacity is not an issue. Add pid file to snappuller to skip script overruns, and recover from failure Key: SOLR-1734 URL: https://issues.apache.org/jira/browse/SOLR-1734 Project: Solr Issue Type: Improvement Components: replication (scripts) Affects Versions: 1.4 Reporter: Bill Au Assignee: Bill Au Priority: Minor Attachments: SOLR-1734.patch The pid file will allow snappuller to be run as fast as possible without overruns. It will also recover from a previous failed run if the old snappuller process is no longer running. The same has already been done to snapinstaller in SOLR-990. Overlapping snappuller runs could cause replication traffic to saturate the network if a large Solr index is being replicated to a large number of clients. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: how to sort facets?
David Rühr wrote: hi, we built a filter with the faceting feature. In our facet list the ordering is by count of the matches (facet.sort=count), but we need to sort by manufacturer (facet.sort=manufacturer). URL manipulation doesn't change anything - why? select?fl=*%2Cscore&fq=type%3Apage&spellcheck=true&facet=true&facet.mincount=1&facet.sort=manufacturer&bf=log(supplier_faktor)&facet.field=supplier&facet.field=manufacturer&version=1.2&q=kind&start=0&rows=10 so long, David Try facet.sort=index. facet.sort accepts only count or index. http://wiki.apache.org/solr/SimpleFacetParameters#facet.sort Koji -- http://www.rondhuit.com/en/
[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805247#action_12805247 ] Hoss Man commented on SOLR-1725: Some random comments/questions from the peanut gallery... 1) what is the value add in making ScriptUpdateProcessorFactory support multiple scripts? ... wouldn't it be simpler to require that users declare multiple instances of ScriptUpdateProcessorFactory (that the processor chain already executes in sequence) than to add sequential processing to the ScriptUpdateProcessor? 2) The NamedList init args can be as deep of a data structure as you want, so something like this would be totally feasible (if desired) ...
{code}
<processor class="solr.ScriptUpdateProcessorFactory">
  <lst name="scripts">
    <lst name="updateProcessor1.js">
      <bool name="someParamName">true</bool>
      <int name="someOtherParamName">3</int>
    </lst>
    <lst name="updateProcessor2.js">
      <bool name="fooParam">true</bool>
      <str name="barParam">3</str>
    </lst>
  </lst>
  <lst name="otherProcessorOptionsIfNeeded">
    ...
  </lst>
</processor>
{code}
Script based UpdateRequestProcessorFactory -- Key: SOLR-1725 URL: https://issues.apache.org/jira/browse/SOLR-1725 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.4 Reporter: Uri Boness Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support). The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code. The update request processor factory enables writing update processors in scripts located in the {{solr.solr.home}} directory. The factory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home.
When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file names (so {{scriptA.js}} will be executed before {{scriptB.js}}). The script language is resolved based on the script file extension (that is, a *.js file will be treated as a JavaScript script), therefore an extension is mandatory. Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those that are required by the processing logic. The following variables are defined as global variables for each script: * {{req}} - The SolrQueryRequest * {{rsp}} - The SolrQueryResponse * {{logger}} - A logger that can be used for logging purposes in the script -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Upgrading to the latest versions of wstx jars
: It has been a long time since the Woodstox jars have been updated. Is : this intentional, as in are there any issues if we use the latest jars : with Solr? I have no idea ... dependencies tend to get updated when people point out that newer releases have bug fixes or new features we want to take advantage of. If you could test out the new version, then open a Jira issue (to track upgrading the jar) and post your comments there letting us know if it works better (or just plain works), that would be very helpful. -Hoss
Re: Planned release date for 1.5 with SOLR-236 fixed?
: Would anybody happen to know the planned release date for 1.5? Release dates don't tend to be explicitly planned in advance. Instead, releases tend to coalesce when the community feels that new features warrant a new release, and that the APIs introduced by those new features are ready to be considered stable and supported. : And if so, whether or not the final fix for SOLR-236 will be included. I'm not really sure how to answer that ... SOLR-236 is not a bug report, it proposes a new feature -- so there is no "fix". Many, MANY people are interested in various aspects of the proposed feature, and i know lots of people are looking to try and get some version(s) of that functionality committed, but the answer to whether or not it will be included in 1.5 will depend on when 1.5 happens and whether some version of the functionality is committed to the trunk prior to that release. -Hoss
[jira] Commented: (SOLR-1725) Script based UpdateRequestProcessorFactory
[ https://issues.apache.org/jira/browse/SOLR-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805254#action_12805254 ] Uri Boness commented on SOLR-1725: -- bq. 1) what is the value add in making ScriptUpdateProcessorFactory support multiple scripts? ... wouldn't it be simpler to require that users declare multiple instances of ScriptUpdateProcessorFactory (that the processor chain already executes in sequence) than to add sequential processing to the ScriptUpdateProcessor? Well... to my taste it makes the configuration cleaner (no need to define several script processors). The thing is, you have the choice here - either specify several scripts (comma separated) or split them into several processors. bq. 2) The NamedList init args can be as deep of a data structure as you want, so something like this would be totally feasible (if desired) ... That's definitely another option. The only thing is that you'd probably want some way to define shared parameters (shared between the scripts, that is) and not be forced to specify them several times for each script. I guess you can do something like this:
{code}
<processor class="solr.ScriptUpdateProcessorFactory">
  <lst name="sharedParams">
    <bool name="paramName">true</bool>
  </lst>
  <lst name="scripts">
    <lst name="updateProcessor1.js">
      <bool name="someParamName">true</bool>
      <int name="someOtherParamName">3</int>
    </lst>
    <lst name="updateProcessor2.js">
      <bool name="fooParam">true</bool>
      <str name="barParam">3</str>
    </lst>
  </lst>
  <lst name="otherProcessorOptionsIfNeeded">
    ...
  </lst>
</processor>
{code}
Script based UpdateRequestProcessorFactory -- Key: SOLR-1725 URL: https://issues.apache.org/jira/browse/SOLR-1725 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.4 Reporter: Uri Boness Attachments: SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch, SOLR-1725.patch A script based UpdateRequestProcessorFactory (Uses JDK6 script engine support).
The main goal of this plugin is to be able to configure/write update processors without the need to write and package Java code. The update request processor factory enables writing update processors in scripts located in the {{solr.solr.home}} directory. The factory accepts one (mandatory) configuration parameter named {{scripts}} which accepts a comma-separated list of file names. It will look for these files under the {{conf}} directory in solr home. When multiple scripts are defined, their execution order is defined by the lexicographical order of the script file names (so {{scriptA.js}} will be executed before {{scriptB.js}}). The script language is resolved based on the script file extension (that is, a *.js file will be treated as a JavaScript script), therefore an extension is mandatory. Each script file is expected to have one or more methods with the same signature as the methods in the {{UpdateRequestProcessor}} interface. It is *not* required to define all methods, only those that are required by the processing logic. The following variables are defined as global variables for each script: * {{req}} - The SolrQueryRequest * {{rsp}} - The SolrQueryResponse * {{logger}} - A logger that can be used for logging purposes in the script -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1735) shut down TimeLimitedCollection timer thread on application unload
shut down TimeLimitedCollection timer thread on application unload -- Key: SOLR-1735 URL: https://issues.apache.org/jira/browse/SOLR-1735 Project: Solr Issue Type: Improvement Affects Versions: 1.4, 1.3 Reporter: Chris Darroch As described in https://issues.apache.org/jira/browse/LUCENE-2237, shutting down the timer thread created by Lucene's TimeLimitedCollector allows Tomcat or another application server to cleanly unload solr.war (or any application using Lucene, for that matter). I'm attaching two patches for Solr 1.3 which use the patch provided in LUCENE-2237 to shut down the timer thread when a new servlet context listener for the solr.war application is informed the application is about to be unloaded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1735) shut down TimeLimitedCollection timer thread on application unload
[ https://issues.apache.org/jira/browse/SOLR-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Darroch updated SOLR-1735: Attachment: SOLR-1735-1_3.patch shut down TimeLimitedCollection timer thread on application unload -- Key: SOLR-1735 URL: https://issues.apache.org/jira/browse/SOLR-1735 Project: Solr Issue Type: Improvement Affects Versions: 1.3, 1.4 Reporter: Chris Darroch Attachments: SOLR-1735-1_3.patch As described in https://issues.apache.org/jira/browse/LUCENE-2237, shutting down the timer thread created by Lucene's TimeLimitedCollector allows Tomcat or another application server to cleanly unload solr.war (or any application using Lucene, for that matter). I'm attaching two patches for Solr 1.3 which use the patch provided in LUCENE-2237 to shut down the timer thread when a new servlet context listener for the solr.war application is informed the application is about to be unloaded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: Date Facet duplicate counts
: should we be inclusive of the lower or the upper? ... even if we make it : an option, how should it apply to the first and last ranges computed? : do the answers change if facet.date.other includes before and/or after : should the between option be inclusive of both end points as well? : I guess to be consistent, the 'inclusiveness' should be intrinsically handled in : detection of '[', '{' and/or ']', '}' -- i.e. match the facet.date field with the corresponding field : in the query - e.g.: q= *:* AND timestamp:[then TO now}&date.facet=timestamp . ...that only works if the datefield being faceted on is included in the query -- which is frequently not the case, particularly on the first request of a session, where you want to facet on date, but the user has not yet made any attempt to restrict by any of those facets. : If no such token exists in the query, perhaps the date.facet token parsing could process : an option ala: date.facet=[timestamp} to explicitly set the edge behaviour, or to override a match : in the query parser tokenization. : : This way, there's no new explicit option; it would work with existing queries (no extra []{}'s = default behaviour); : and people could easily add it if they need custom edge behaviour. I suppose ... but it still doesn't address some of the outstanding questions i pointed out before (handling the first/last range in the block ... ie: i want inclusive of the lower, exclusive of the upper, except for the last range which should be inclusive of both). Personally i think adding a new option is just as clear as adding markup to the date.facet param parsing ... the less we make assumptions about what special characters people have in their fieldnames the better. : Another way to deal with it is to add MILLISECOND logic to the DateMathParser. Then the '1ms' adjustment : at one end and/or the other could be done by the caller at query time, leaving the stored data intact, and leaving : the server-side date faceting as it is.
In fact, this can be done today using SECOND, but can be a problem if: : - You're using HOURS or DAYS and don't want to convert to SECONDS each time, or : - You need granularity to the SECOND I don't follow you at all ... yes this can be done today, but i don't understand what you mean about needing to convert to seconds, or requiring second granularity. If you don't index with millisecond precision, then no matter what precision you index with, this example would let you get ranges including the lower bound, but not the upper bound of each range using a 1ms fudge ... facet.date=timestamp&facet.date.start=NOW/DAY-5DAYS-1MILLI&facet.date.end=NOW/DAY+1DAY-1MILLI&facet.date.gap=+1DAY Brainstorming a bit... I think the semantics that might make the most sense is to add a multivalued facet.date.include param that supports the following options: all, lower, upper, edge, outer
- all is shorthand for lower,upper,edge,outer and is the default (for back compat)
- if lower is specified, then all ranges include their lower bound
- if upper is specified, then all ranges include their upper bound
- if edge is specified, then the first and last ranges include their edge bounds (ie: lower for the first one, upper for the last one) even if the corresponding upper/lower option is not specified.
- the between count is inclusive of each of the start and end bounds iff the first and last range are inclusive of them
- the before and after ranges are inclusive of their respective bounds if:
  - outer is specified ... OR ...
  - the first and last ranges don't already include them
so assuming you started with something like... facet.date.start=1&facet.date.end=3&facet.date.gap=+1&facet.date.other=all ...your ranges would be... [1 TO 2], [2 TO 3] and [* TO 1], [1 TO 3], [3 TO *] w/ facet.date.include=lower ... [1 TO 2}, [2 TO 3} and [* TO 1}, [1 TO 3}, [3 TO *] w/ facet.date.include=upper ...
{1 TO 2], {2 TO 3] and [* TO 1], {1 TO 3], {3 TO *] w/ facet.date.include=lower&facet.date.include=edge ... [1 TO 2}, [2 TO 3] and [* TO 1}, [1 TO 3], {3 TO *] w/ facet.date.include=upper&facet.date.include=edge ... [1 TO 2], {2 TO 3] and [* TO 1}, [1 TO 3], {3 TO *] w/ facet.date.include=upper&facet.date.include=outer ... {1 TO 2], {2 TO 3] and [* TO 1], {1 TO 3], [3 TO *] ...etc. what do you think? -Hoss
[jira] Commented: (SOLR-1728) ResponseWriters should support byte[], ByteBuffer
[ https://issues.apache.org/jira/browse/SOLR-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805291#action_12805291 ] Hoss Man commented on SOLR-1728: Noble: your issue description is a bit terse, so i'm a little confused. Are you suggesting an API change such that binary write methods are added to QueryResponseWriter (making it equivalent to BinaryQueryResponseWriter) ? Or are you suggesting that the existing classes which implement QueryResponseWriter ( JSONResponseWriter, PHPResponseWriter, PythonResponseWriter, XMLResponseWriter, etc...) should start implementing BinaryQueryResponseWriter? In either case: what's the motivation? ResponseWriters should support byte[], ByteBuffer - Key: SOLR-1728 URL: https://issues.apache.org/jira/browse/SOLR-1728 Project: Solr Issue Type: Improvement Reporter: Noble Paul Assignee: Noble Paul Priority: Minor Fix For: 1.5 Only BinaryResponseWriter supports byte[] and ByteBuffer. Other writers also should support these -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1553) extended dismax query parser
[ https://issues.apache.org/jira/browse/SOLR-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805303#action_12805303 ] Peter Wolanin commented on SOLR-1553: - Some commented-out debug code left in the committed parser?
{code}
protected void addClause(List clauses, int conj, int mods, Query q) {
  //System.out.println("addClause:clauses="+clauses+" conj="+conj+" mods="+mods+" q="+q);
  super.addClause(clauses, conj, mods, q);
}
{code}
extended dismax query parser Key: SOLR-1553 URL: https://issues.apache.org/jira/browse/SOLR-1553 Project: Solr Issue Type: New Feature Reporter: Yonik Seeley Fix For: 1.5 Attachments: SOLR-1553.patch, SOLR-1553.pf-refactor.patch An improved user-facing query parser based on dismax -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1729) Date Facet now override time parameter
[ https://issues.apache.org/jira/browse/SOLR-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805313#action_12805313 ] Hoss Man commented on SOLR-1729: bq. (e.g. they are in a different time-zone, not time-synced etc.). time-zones should be irrelevant since all calculations are done in UTC ... lack of time-sync is a legitimate concern, but the more serious problem is distributed requests and network lag. Even if all of the boxes have synchronized clocks, they might not all get queried at the exact same time, and multiple requests might be made to a single server for different phases of the distributed request that expect to get the same answers. It should be noted that while adding support to date faceting for this type of "when is now?" override is certainly _necessary_ to make distributed date faceting work sanely, it is not _sufficient_ ... unless filter queries that use date math also respect it, the counts returned from date faceting will still potentially be nonsensical. Date Facet now override time parameter -- Key: SOLR-1729 URL: https://issues.apache.org/jira/browse/SOLR-1729 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Environment: Solr 1.4 Reporter: Peter Sturge Priority: Minor Attachments: FacetParams.java, SimpleFacets.java This PATCH introduces a new query parameter that tells a (typically, but not necessarily) remote server what time to use as 'NOW' when calculating date facets for a query (and, for the moment, date facets *only*) - overriding the default behaviour of using the local server's current time. This gets 'round a problem whereby an explicit time range is specified in a query (e.g. timestamp:[then0 TO then1]), and date facets are required for the given time range (in fact, any explicit time range).
Because DateMathParser performs all its calculations from 'NOW', remote callers have to work out how long ago 'then0' and 'then1' are from 'now', and use the relative-to-now values in the facet.date.xxx parameters. If a remote server has a different opinion of NOW compared to the caller, the results will be skewed (e.g. they are in a different time-zone, not time-synced etc.). This becomes particularly salient when performing distributed date faceting (see SOLR-1709), where multiple shards may all be running with different times, and the faceting needs to be aligned. The new parameter is called 'facet.date.now', and takes as a parameter a (stringified) long that is the number of milliseconds from the epoch (1 Jan 1970 00:00) - i.e. the returned value from a System.currentTimeMillis() call. This was chosen over a formatted date to delineate it from a 'searchable' time and to avoid superfluous date parsing. This makes it generally a programmatically set value, but as that is where the use-case is for this type of parameter, this should be ok. NOTE: This parameter affects date facet timing only. If there are other areas of a query that rely on 'NOW', these will not interpret this value. This is a broader issue about setting a 'query-global' NOW that all parts of query analysis can share. Source files affected: FacetParams.java (holds the new constant FACET_DATE_NOW), SimpleFacets.java (getFacetDateCounts() NOW parameter modified). This PATCH is mildly related to SOLR-1709 (Distributed Date Faceting), but as it's a general change for date faceting, it was deemed deserving of its own patch. I will be updating SOLR-1709 in due course to include the use of this new parameter, after some RFC acceptance. A possible enhancement to this is to detect facet.date fields, look for and match these fields in queries (if they exist), and potentially determine automatically the required time skew, if any.
There are a whole host of reasons why this could be problematic to implement, so an explicit facet.date.now parameter is the safest route. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
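To make the intended usage concrete, a client might stamp a single millisecond timestamp and attach it to every shard request. This is only a sketch of the parameter described above: the facet.date.now name comes from the patch description, while the URLs and helper name are illustrative.

```java
// Sketch of a client stamping one "NOW" (ms since the epoch) and reusing it
// across shard requests, so all shards compute identical date ranges even if
// their clocks differ. Only the facet.date.now parameter name comes from the
// patch; everything else here is illustrative.
class FacetDateNowSketch {
    static String facetDateNow(long nowMillis) {
        return "facet.date.now=" + nowMillis;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis(); // computed once, shared by all shards
        String[] shards = {"http://shard1/solr/select", "http://shard2/solr/select"};
        for (String shard : shards) {
            System.out.println(shard + "?q=*:*&facet=true&facet.date=timestamp&"
                + facetDateNow(now));
        }
    }
}
```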
Re: [PMX:FAKE_SENDER] Re: large OR-boolean query
: for reasons forthcoming. The QParser then just returns a : ConstantScoreQuery wrapped around a Filter subclass that I wrote which uses : these Terms. The Filter subclass does most of the work. If there is a lot of overlap in the Terms that are used from query to query, you might find it more efficient to construct individual TermFilters for each term, and utilize the filterCache to reuse them from request to request -- then your plugin (it would probably have to be a SearchComponent instead of a QParser) would only need to find the union of the individual DocSets. : Correct me if I'm wrong, but it seemed important to have my input terms in : natural order of a TreeSet in order to take advantage of the seek() approach : to TermDocs (presuming it is sort of like a database cursor?). (I believe) You are correct .. seek can only move ahead. : In any event, we're getting roughly 2-3 second query times, with an : additional 1-2 seconds parsing input from the request. so our local client : app sees about a 6-8 second roundtrip on its queries, with faceting turned : on. For such a large query: not bad! Unless the individual terms tend to be extremely unique, or you are opening a new searcher extremely frequently, i would suggest you try the filterCache and DocSet union based approach:
DocSet main = new BitDocSet();
for (Term t : myTerms) {
  main = searcher.getDocSet(new TermQuery(t)).union(main);
}
-Hoss
[jira] Commented: (SOLR-1728) ResponseWriters should support byte[], ByteBuffer
[ https://issues.apache.org/jira/browse/SOLR-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805357#action_12805357 ] Noble Paul commented on SOLR-1728: -- Everything works now in non-distributed search because the BinaryField takes care of writing out the data as strings. In distributed search, when the writers have to emit a SolrDocument that contains a byte[], the XML, JSON and other response writers would do a toString() on the byte[]. ResponseWriters should support byte[], ByteBuffer - Key: SOLR-1728 URL: https://issues.apache.org/jira/browse/SOLR-1728 Project: Solr Issue Type: Improvement Reporter: Noble Paul Assignee: Noble Paul Priority: Minor Fix For: 1.5 Only BinaryResponseWriter supports byte[] and ByteBuffer. Other writers should also support these -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
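A quick way to see the failure mode Noble describes: toString() on a Java byte[] returns the array's type-and-identity string, not its contents, so a text response writer emitting it that way produces garbage. A minimal demonstration:

```java
import java.util.Arrays;

// Demonstrates why toString() on a byte[] is useless in a text response:
// arrays inherit Object.toString(), which prints "[B@<hashcode>" rather
// than the array's contents.
class ByteArrayToStringDemo {
    public static void main(String[] args) {
        byte[] data = {72, 105};
        System.out.println(data.toString());       // e.g. [B@6d06d69c (identity, not contents)
        System.out.println(Arrays.toString(data)); // [72, 105]
    }
}
```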
[jira] Created: (SOLR-1736) In the slave, if 'mov'ing file does not succeed, copy the file
In the slave, if 'mov'ing file does not succeed, copy the file Key: SOLR-1736 URL: https://issues.apache.org/jira/browse/SOLR-1736 Project: Solr Issue Type: Bug Components: replication (java) Affects Versions: 1.4 Reporter: Noble Paul Assignee: Noble Paul Priority: Minor Fix For: 1.5 A user has reported instances where File#renameTo fails and replication fails with it. If renameTo does not succeed, try doing a manual copy. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
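The fallback described in the issue might look roughly like this (a sketch, not the actual patch; the method name is invented). renameTo commonly fails when source and destination are on different filesystems, which is exactly the situation where a manual stream copy still works:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Sketch: try an atomic rename first; fall back to a stream copy plus delete
// when renameTo fails (e.g. across filesystem boundaries).
class MoveOrCopy {
    static void moveFile(File src, File dest) throws IOException {
        if (src.renameTo(dest)) {
            return; // fast path: rename succeeded
        }
        InputStream in = new FileInputStream(src);
        try {
            OutputStream out = new FileOutputStream(dest);
            try {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            } finally {
                out.close();
            }
        } finally {
            in.close();
        }
        if (!src.delete()) {
            throw new IOException("copied " + src + " but could not delete it");
        }
    }
}
```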
[jira] Created: (SOLR-1737) Add a FieldStreamDataSource
Add a FieldStreamDataSource --- Key: SOLR-1737 URL: https://issues.apache.org/jira/browse/SOLR-1737 Project: Solr Issue Type: New Feature Components: contrib - DataImportHandler Reporter: Noble Paul Assignee: Noble Paul Priority: Minor Fix For: 1.5 TikaEntityProcessor needs a DataSource which returns a Stream instead of a Reader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.