Re: How to patch Solr4.2 for SolrEnityProcessor Sub-Enity issue
You are right. The fix committed to source was not complete. I've reopened SOLR-3336 and I will put up a test and fix. https://issues.apache.org/jira/browse/SOLR-3336

On Mon, Aug 26, 2013 at 9:41 AM, harshchawla ha...@livecareer.com wrote: In the second reply of this link it is discussed, and moreover I am facing the same issue here: http://stackoverflow.com/questions/15734308/solrentityprocessor-is-called-only-once-for-sub-entities?lq=1. See attached the data-config.xml of my new core (let's say) "test":

<entity name="can" dataSource="dsms" query="select candidateid from Candidate c">
  <field column="CandidateID" name="candidateid" />
  <entity name="dt" processor="SolrEntityProcessor"
          url="http://localhost:8983/solr/csearch"
          query="candidateid:${can.CandidateID}" fl="*">
  </entity>
  <entity name="psu" dataSource="dsms"
          query="select Value from [CandidateData] where candidateid=${can.CandidateID}">
    <field column="Value" name="psu"/>
  </entity>
</entity>

Here only the first record is getting parsed properly; for all the remaining records only two fields are coming into the new core "test", even though core "csearch" contains all the field values for all the records. I hope this clarifies my situation.
-- View this message in context: http://lucene.472066.n3.nabble.com/How-to-patch-Solr4-2-for-SolrEnityProcessor-Sub-Enity-issue-tp4086292p4086564.html Sent from the Solr - User mailing list archive at Nabble.com.

-- Regards, Shalin Shekhar Mangar.
Re: How to patch Solr4.2 for SolrEnityProcessor Sub-Enity issue
Thanks a lot in advance. I am eagerly waiting for your response. -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-patch-Solr4-2-for-SolrEnityProcessor-Sub-Enity-issue-tp4086292p4086572.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: custom names for replicas in solrcloud
Hi smanad, If I'm not mistaken, you can append the coreNodeName parameter to your creation command:

http://10.7.23.125:8080/solr/admin/cores?action=CREATE&name=dfscore8_3&shard=shard3_3&collection.configName=myconf&schema=schema.xml&config=solrconfig3.xml&collection=collection1&dataDir=/soData/&coreNodeName=heihei

May it be helpful. Regards

2013/8/23 smanad sma...@gmail.com: Hi, I am using Solr 4.3 with 3 Solr hosts and an external ZooKeeper ensemble of 3 servers, and just 1 shard currently. When I create collections using the Collections API it creates collections with the names collection1_shard1_replica1, collection1_shard1_replica2, collection1_shard1_replica3. Is there any way to pass a custom name? Or can I have all the replicas with the same name? Any pointers will be much appreciated. Thanks, -Manasi
-- View this message in context: http://lucene.472066.n3.nabble.com/custom-names-for-replicas-in-solrcloud-tp4086205.html Sent from the Solr - User mailing list archive at Nabble.com.
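Pulling the query string apart makes it easier to see which ampersand-separated parameters the CoreAdmin CREATE call carries. A minimal sketch of building such a URL — the host, core and node names below are placeholders, not values from the thread, and this is plain string assembly, not a SolrJ API:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class CreateCoreUrl {
    // Join the CoreAdmin CREATE parameters with '&'. URL-encoding of
    // values is omitted for brevity in this sketch.
    static String buildCreateUrl(String base, Map<String, String> params) {
        String query = params.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
        return base + "/admin/cores?action=CREATE&" + query;
    }

    public static void main(String[] args) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("name", "collection1_shard1_myname");   // placeholder core name
        p.put("collection", "collection1");
        p.put("shard", "shard1");
        p.put("coreNodeName", "myNodeName");          // the parameter under discussion
        System.out.println(buildCreateUrl("http://localhost:8080/solr", p));
    }
}
```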
SimpleFacet feature combinations..
Hi folks, Some of the features of SimpleFacet can't be combined -- the most notable missing combination being range + pivot. Another combination which we'd find very useful is integration with StatsComponent (pivot/ranged stats). Is anyone working on this? Or willing to work on this? This is a rather important feature for us, one which we currently implement by launching N+1 queries (or worse). Given the importance, I would be willing and able to donate some of my time to work on this. However, not being very familiar with the solr internals, it would probably be easier to team up with someone else on this? If anyone is interested, feel free to get in touch. - Bram
Re: Dropping Caches of Machine That Solr Runs At
Hi Walter; You are right about performance. However, when I index documents on a machine that has a high percentage of physical memory usage I get EOF errors.

2013/8/26 Walter Underwood wun...@wunderwood.org: On Aug 25, 2013, at 1:41 PM, Furkan KAMACI wrote: Sometimes physical memory usage of Solr is over 99% and this may cause problems. Do you run this kind of command periodically:

sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"

to force dropping the caches of the machine that Solr runs on, to avoid problems? This is a terrible idea. The OS automatically manages the file buffers. When they are all used, that is a good thing, because it reduces disk IO. After dropping the caches, no files will be cached in RAM. Every single read from a file will have to go to disk. This will cause very slow performance until the files are recached. Recently, I did exactly the opposite to improve performance in our Solr installation. Before starting the Solr process, a script reads every file in the index so that it will already be in file buffers. This avoids several minutes of high disk IO and slow performance after startup. wunder Search Guy, Chegg.com
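Wunder's warm-up script itself isn't shown in the thread; a minimal Java sketch of the same idea — read every file under the index directory once so the OS page cache already holds it before Solr starts — might look like this. The index path is a placeholder:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class WarmFileCache {
    // Read every regular file under dir once so the OS page cache is
    // populated before Solr starts. Returns the total bytes read.
    static long warm(Path dir) throws IOException {
        long total = 0;
        byte[] buf = new byte[1 << 20];
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path p : files) {
                if (!Files.isRegularFile(p)) continue;
                try (InputStream in = Files.newInputStream(p)) {
                    int n;
                    while ((n = in.read(buf)) != -1) total += n;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // e.g. java WarmFileCache /var/solr/data/index  (path is hypothetical)
        System.out.println(warm(Paths.get(args[0])) + " bytes pre-read");
    }
}
```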
Re: Tokenization at query time
Hi Erick, escaping spaces doesn't work... Briefly:

- In a document I have an ISBN field whose stored value is 978-90-04-23560-1
- In the index I have this value: 9789004235601

Now, I want to be able to search the document using:
1) q=978-90-04-23560-1
2) q=978 90 04 23560 1
3) q=9789004235601

1 and 3 work perfectly; 2 doesn't work. My code is:

SolrQuery query = new SolrQuery(ClientUtils.escapeQueryChars(req.getParameter("q")));

isbn is declared this way:

<fieldtype name="isbn_issn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
            generateNumberParts="0" catenateWords="0" catenateNumbers="0"
            catenateAll="1" splitOnCaseChange="0"/>
  </analyzer>
</fieldtype>

<field name="isbn_issn_search" type="issn_isbn" indexed="true"/>

The search handler is:

<requestHandler name="any_bc" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="mm">100%</str>
    <str name="qf">isbn_issn_search^1</str>
    <str name="pf">isbn_issn_search^10</str>
    <int name="ps">0</int>
    <float name="tie">0.1</float>
    ...
</requestHandler>

This is what I get:

1) 978-90-04-23560-1
path=/select params={start=0&q=978\-90\-04\-23560\-1&sfield=&qt=any_bc&wt=javabin&rows=10&version=2} hits=1 status=0 QTime=5

2) 9789004235601
webapp=/solr path=/select params={start=0&q=9789004235601&sfield=&qt=any_bc&wt=javabin&rows=10&version=2} hits=1 status=0 QTime=5

3) 978 90 04 23560 1
path=/select params={start=0&q=978\+90\+04\+23560\+1&sfield=&qt=any_bc&wt=javabin&rows=10&version=2} hits=0 status=0 QTime=2

Extract from debugQuery=true:

<str name="q">978\ 90\ 04\ 23560\ 1</str>
...
<str name="rawquerystring">978\ 90\ 04\ 23560\ 1</str>
<str name="querystring">978\ 90\ 04\ 23560\ 1</str>
...
<str name="parsedquery">
+((DisjunctionMaxQuery((isbn_issn_search:978^1.0)~0.1)
   DisjunctionMaxQuery((isbn_issn_search:90^1.0)~0.1)
   DisjunctionMaxQuery((isbn_issn_search:04^1.0)~0.1)
   DisjunctionMaxQuery((isbn_issn_search:23560^1.0)~0.1)
   DisjunctionMaxQuery((isbn_issn_search:1^1.0)~0.1))~5)
DisjunctionMaxQuery((isbn_issn_search:9789004235601^10.0)~0.1)
</str>

Probably this is a very stupid question but I'm going crazy. On this page http://wiki.apache.org/solr/DisMaxQParserPlugin, under "Query Structure", it says: "For each word in the query string, dismax builds a DisjunctionMaxQuery object for that word across all of the fields in the qf param..." And that seems to be exactly what it is doing... but what is a "word"? How can I force (without using double quotes) spaces to be considered part of the word? Many many many thanks, Andrea

On 08/13/2013 04:18 PM, Erick Erickson wrote: I think you can get what you want by escaping the space with a backslash. YMMV of course. Erick

On Tue, Aug 13, 2013 at 9:11 AM, Andrea Gazzarini andrea.gazzar...@gmail.com wrote: Hi Erick, sorry if that wasn't clear: this is what I'm actually observing in my application. I wrote the first post after looking at the explain (debugQuery=true): the query q=mag 778 G 69 is translated as follows:

+((DisjunctionMaxQuery((myfield:mag^3000.0)~0.1)
   DisjunctionMaxQuery((myfield:778^3000.0)~0.1)
   DisjunctionMaxQuery((myfield:g^3000.0)~0.1)
   DisjunctionMaxQuery((myfield:69^3000.0)~0.1))~4)
DisjunctionMaxQuery((myfield:mag778g69^3.0)~0.1)

It seems that although I declare myfield with this type:

<fieldtype name="type1" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
            generateNumberParts="0" catenateWords="0" catenateNumbers="0"
            catenateAll="1" splitOnCaseChange="0" />
  </analyzer>
</fieldtype>

Solr is tokenizing it anyway, producing several tokens (mag, 778, g, 69). And I can't put double quotes around the query (q="mag 778 G 69") because the request handler also searches other fields (with different configuration chains). As I understand it, the query parser (i.e. at query time) does a whitespace tokenization on its own before invoking my (query-time) chain. The same doesn't happen at index time... this is my problem... because at index time the field is analyzed exactly as I want, but unfortunately I cannot say the same at query time. Sorry for my wonderful English, did you get the point?

On 08/13/2013 02:18 PM, Erick Erickson wrote: On a quick scan I don't see a problem here. Attach debug=query to your url and that'll show you the parsed query, which will in turn show you what's been pushed
Re: Solr 4.2.1 update to 4.3/4.4 problem
Hello All, I am still facing the same issue. Case-insensitive search is not working on Solr 4.3. I am using the below configuration in schema.xml:

<fieldType name="string_lower_case" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="select">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Basically I want my string, which could have spaces or characters like '-' or '\', to be searched upon case-insensitively. Please help.
-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-2-1-update-to-4-3-4-4-problem-tp4081896p4086601.html Sent from the Solr - User mailing list archive at Nabble.com.
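What the chain above is supposed to do to a value can be sketched outside Solr: split on whitespace only, then lowercase each token, leaving characters like '-' or '\' inside the tokens. A small illustration (the sample values are made up, and this is an approximation of the analyzers, not Lucene code):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class LowercaseChainSketch {
    // Mimic WhitespaceTokenizerFactory + LowerCaseFilterFactory:
    // split on whitespace only, then lowercase each token. Punctuation
    // such as '-' or '\' stays inside the tokens, which is what the
    // poster wants for case-insensitive matching.
    static List<String> analyze(String value) {
        return Arrays.stream(value.split("\\s+"))
                .map(String::toLowerCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(analyze("Foo-Bar BAZ\\qux")); // [foo-bar, baz\qux]
    }
}
```

If index-time and query-time analysis both behave like this, "FOO-BAR" and "foo-bar" produce the same token, so a mismatch usually points at the field not actually using this type, or at data indexed before the type change.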
RE: Caused by: java.net.SocketException: Connection reset by peer: socket write error solr querying
Hi Greg, thanks for the reply. I tried setting maxIdleTime to 30 milliseconds, but I'm still getting the same error.

WARN - 2013-08-26 09:44:29.058; org.eclipse.jetty.server.Response; Committed before 500 {msg=Connection reset by peer: socket write error,trace=org.eclipse.jetty.io.EofException
    at org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:914)
    at org.eclipse.jetty.http.AbstractGenerator.blockForOutput(AbstractGenerator.java:507)
    at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:170)
    at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:107)
    at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
    at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:272)
    at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:276)
    at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)
    at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)
    at org.apache.solr.util.FastWriter.flush(FastWriter.java:137)
    at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:648)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:375)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.SocketException: Connection reset by peer: socket write error
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    at org.eclipse.jetty.io.ByteArrayBuffer.writeTo(ByteArrayBuffer.java:375)
    at org.eclipse.jetty.io.bio.StreamEndPoint.flush(StreamEndPoint.java:164)
    at org.eclipse.jetty.io.bio.StreamEndPoint.flush(StreamEndPoint.java:182)
    at org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:841)
    ... 37 more ,code=500}

WARN - 2013-08-26 09:44:29.060; org.eclipse.jetty.servlet.ServletHandler; /solr/324/select
java.lang.IllegalStateException: Committed
    at org.eclipse.jetty.server.Response.resetBuffer(Response.java:1144)
    at org.eclipse.jetty.server.Response.sendError(Response.java:314)
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:695)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:383)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at
ERROR org.apache.solr.update.CommitTracker – auto commit error...:org.apache.solr.common.SolrException: Error opening new searcher
470665 [commitScheduler-14-thread-1] ERROR org.apache.solr.update.CommitTracker – auto commit error...:org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1522)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1634)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:574)
    at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.ClassCastException
-- View this message in context: http://lucene.472066.n3.nabble.com/ERROR-org-apache-solr-update-CommitTracker-auto-commit-error-org-apache-solr-common-SolrException-Err-tp4086576.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.2.1 update to 4.3/4.4 problem
I have also re-indexed the data and tried again, including with the configuration below:

<fieldType name="string_lower_case" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="select">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

This didn't work as well...

On Mon, Aug 26, 2013 at 4:03 PM, skorrapa [via Lucene] ml-node+s472066n4086601...@n3.nabble.com wrote: [quoted message trimmed]

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-2-1-update-to-4-3-4-4-problem-tp4081896p4086606.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Tokenization at query time
Andrea: Works for me, admittedly through the browser. I suspect the problem is here: ClientUtils.escapeQueryChars. That doesn't do anything about escaping the spaces, it just handles characters that have special meaning to the query syntax, things like + and - etc. Using your field definition, this:

http://localhost:8983/solr/select?wt=json&q=ab\ cd\ ef&debug=query&defType=edismax&qf=name eoe

produced this output:

parsedquery_toString: "+(eoe:abcdef | (name:ab name:cd name:ef))"

where the field eoe is your isbn_issn type. Best, Erick

On Mon, Aug 26, 2013 at 4:55 AM, Andrea Gazzarini andrea.gazzar...@gmail.com wrote: [quoted message trimmed]
Re: Tokenization at query time
Hi Erick, sorry, I forgot the Solr version... it's 3.6.0. ClientUtils in that version does do whitespace escaping:

public static String escapeQueryChars(String s) {
  StringBuilder sb = new StringBuilder();
  for (int i = 0; i < s.length(); i++) {
    char c = s.charAt(i);
    // These characters are part of the query syntax and must be escaped
    if (c == '\\' || c == '+' || c == '-' || c == '!' || c == '(' || c == ')'
        || c == ':' || c == '^' || c == '[' || c == ']' || c == '"' || c == '{'
        || c == '}' || c == '~' || c == '*' || c == '?' || c == '|' || c == '&'
        || c == ';' || Character.isWhitespace(c)) {
      sb.append('\\');
    }
    sb.append(c);
  }
  return sb.toString();
}

Now, I solved the issue but I'm not really sure about the fix. Debugging the code I saw that the query string (in the SearchHandler) 978\ 90\ 04\ 23560\ 1, once passed through DismaxQueryParser (specifically through SolrPluginUtils.partialEscape(CharSequence)), becomes 978\\ 90\\ 04\\ 23560\\ 1 because that method escapes the backslashes. So, using the Eclipse debugger, I removed the additional backslash at runtime and it works perfectly, but of course... I can't do that in production for every search :D So, just to try, I changed dismax to edismax which, I saw, doesn't call SolrPluginUtils... and it works perfectly! I saw in your query string that you used edismax too... maybe that is the point? Many thanks, Andrea

On 08/26/2013 02:47 PM, Erick Erickson wrote: [quoted message trimmed]
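The double escaping Andrea describes is easy to reproduce outside Solr: apply the whitespace-escaping that the 3.6 ClientUtils does on the client, then escape backslashes again the way the dismax parser's partialEscape step does. A self-contained sketch that re-implements both steps locally (partialEscape is simplified here to just the backslash handling relevant to this thread):

```java
public class DoubleEscapeDemo {
    // Step 1: mimic the 3.6 ClientUtils.escapeQueryChars handling of
    // whitespace and backslashes on the client side.
    static String escapeWhitespace(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '\\' || Character.isWhitespace(c)) sb.append('\\');
            sb.append(c);
        }
        return sb.toString();
    }

    // Step 2: what partialEscape then does to the backslashes the
    // client already added (simplified: every '\' gets escaped again).
    static String escapeBackslashes(String s) {
        return s.replace("\\", "\\\\");
    }

    public static void main(String[] args) {
        String once = escapeWhitespace("978 90 04 23560 1");
        System.out.println(once);                    // 978\ 90\ 04\ 23560\ 1
        System.out.println(escapeBackslashes(once)); // 978\\ 90\\ 04\\ 23560\\ 1
    }
}
```

The second output is exactly the doubled-up form Andrea saw in the debugger: the escape the client added is itself escaped, so the space is no longer protected.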
adding support for deleteInstanceDir from solrj
Hi all, Did anyone have a chance to look at the code? It's attached here: https://issues.apache.org/jira/browse/SOLR-5023. Thank you very much. Lyuba
Re: Adding one core to an existing core?
Dear Solr User, now I have 2 cores: collection1 and collection2. The default collection is collection1. I have two questions:

- Does a parameter exist that I can add to my HTML link to indicate the selected core? http://xxx.xxx.xxx.xxx/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on I mean, by default it is collection1; if I want collection2 I use the link http://xxx.xxx.xxx.xxx/solr/collection2/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on Does a param like core=collection2 exist, instead of using a different link?

- My second question concerns updating. Currently, with one core, I do: java -jar post.jar foo.xml I suppose now I must add the desired core, no? i.e.: -Dcore=collection2 What is the param to add to my command line? Thanks a lot! Bruno

On 22/08/2013 16:23, Andrea Gazzarini wrote: First, a core is a separate index, so it is completely independent from the already existing core(s). So basically you don't need to reindex. In order to have two cores (the same applies for n cores) you must have in your solr.home the file (solr.xml) described here: http://wiki.apache.org/solr/Solr.xml%20%28supported%20through%204.x%29 Then, you must obviously have one or two directories (corresponding to the instanceDir attribute). I said one or two because, if the indexes' configuration is basically the same (or something changes but is dynamically configured, i.e. the core name), you can create two instances starting from the same configuration. I mean:

<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="conf.dir" />
    <core name="core1" instanceDir="conf.dir" />
  </cores>
</solr>

Otherwise you must have two different conf directories that contain the indexes' configuration. You should already have a first one (the current core); you just need another conf dir with solrconfig.xml, schema.xml and the other required files. In this case each core will have its own instanceDir:

<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="conf.dir.core0" />
    <core name="core1" instanceDir="conf.dir.core1" />
  </cores>
</solr>

Best, Andrea

On 08/22/2013 04:04 PM, Bruno Mannina wrote: Small precision: I'm on Ubuntu 12.04 LTS. On 22/08/2013 15:56, Bruno Mannina wrote: Dear Users, (Solr 3.6 + Tomcat 7) I have used Solr with one core for two years; I would now like to add another core (a new database). Can I do this without re-indexing my core1? Could you point me to a good tutorial for that? (My current database is around 200 GB for 86,000,000 docs.) My new database will be small, around 1000 documents of 5 KB each. Thanks a lot, Bruno
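On the second question, post.jar should accept a url system property pointing at a specific core's update handler (e.g. java -Durl=http://localhost:8080/solr/collection2/update -jar post.jar foo.xml); there is no core= request parameter, the core name goes in the URL path. The per-core URL is just the core name spliced into the path — a trivial sketch, with host and port as placeholders:

```java
public class CoreUpdateUrl {
    // Build the update-handler URL for a named core; host/port below
    // are placeholders, only the path shape matters.
    static String updateUrl(String baseUrl, String core) {
        return baseUrl + "/" + core + "/update";
    }

    public static void main(String[] args) {
        // e.g. pass this value via -Durl= to post.jar
        System.out.println(updateUrl("http://localhost:8080/solr", "collection2"));
    }
}
```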
Re: Different Responses for 4.4 and 3.5 solr index
Did you check the scoring? (Use fl=*,score to retrieve it.) Additionally, debugQuery=true might provide more information about how the score was calculated. - Stefan

On Monday, August 26, 2013 at 12:46 AM, Kuchekar wrote: Hi, the responses from 4.4 and 3.5 in the current scenario differ in the sequence in which results are given back. For example: Response from 3.5 Solr is: id:A, id:B, id:C, id:D... Response from 4.4 Solr is: id:C, id:A, id:D, id:B... Looking forward to your reply. Thanks. Kuchekar, Nilesh

On Sun, Aug 25, 2013 at 11:32 AM, Stefan Matheis matheis.ste...@gmail.com wrote: Kuchekar (hope that's your first name?), you didn't tell us how they differ. Do you get an actual error? Or does the result contain documents you didn't expect? Or, the other way round, are some missing that you'd expect to be there? - Stefan

On Sunday, August 25, 2013 at 4:43 PM, Kuchekar wrote: Hi, we get different responses when we query 4.4 and 3.5 Solr using the same query params. My query params are as follows:

facet=true
facet.mincount=1
facet.limit=25
qf=content^0.0+p_last_name^500.0+p_first_name^50.0+strong_topic^0.0+first_author_topic^0.0+last_author_topic^0.0+title_topic^0.0
wt=javabin
version=2
rows=10
f.affiliation_org.facet.limit=150
fl=p_id,p_first_name,p_last_name
start=0
q=Apple
facet.field=affiliation_org
fq=table:profile
fq=num_content:[*+TO+1500]
fq=name:Apple

The content in both (Solr 4.4 and Solr 3.5) is the same. The solrconfig.xml files from 3.5 and 4.4 are similarly constructed. Is there something I am missing that might have changed in 4.4 and might be causing this issue? The qf params look the same. Looking forward to your reply. Thanks. Kuchekar, Nilesh
Default query operator OR won't work in some cases
Hi, I have some documents with the keyword "egg", some with "salad", and some with "egg salad". When I search for "egg salad", I expect to see egg results + salad results, but I don't see them. "egg" and "salad" queries individually work fine. I am using WhitespaceTokenizer. Not sure if I am missing something. Thanks, -Manasi -- View this message in context: http://lucene.472066.n3.nabble.com/Default-query-operator-OR-wont-work-in-some-cases-tp4086624.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: custom names for replicas in solrcloud
Is coreNodeName exposed via collections api? -- View this message in context: http://lucene.472066.n3.nabble.com/custom-names-for-replicas-in-solrcloud-tp4086205p4086628.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Tokenization at query time
right, edismax is much preferred, dismax hasn't been formally deprecated, but almost nobody uses it... I'd be really careful about adding whitespace to the list of escape chars because it changes the semantics of the search. While it'll work for this specific case, if you use it in other cases it will change the sense of the query. This may be OK, but be careful; it might be better to do this specifically on an as-needed basis... But you know your problem space best Best, Erick On Mon, Aug 26, 2013 at 9:04 AM, Andrea Gazzarini andrea.gazzar...@gmail.com wrote: Hi Erick, sorry, I forgot the Solr version... it is 3.6.0. ClientUtils in that version does whitespace escaping: public static String escapeQueryChars(String s) { StringBuilder sb = new StringBuilder(); for (int i = 0; i < s.length(); i++) { char c = s.charAt(i); // These characters are part of the query syntax and must be escaped if (c == '\\' || c == '+' || c == '-' || c == '!' || c == '(' || c == ')' || c == ':' || c == '^' || c == '[' || c == ']' || c == '"' || c == '{' || c == '}' || c == '~' || c == '*' || c == '?' || c == '|' || c == '&' || c == ';' || Character.isWhitespace(c)) { sb.append('\\'); } sb.append(c); } return sb.toString(); } Now, I solved the issue but I'm not really sure about it. Debugging the code I saw that the query string (on the SearchHandler) 978\ 90\ 04\ 23560\ 1, once passed through DismaxQueryParser (specifically through SolrPluginUtils.partialEscape(CharSequence)), becomes 978\\ 90\\ 04\\ 23560\\ 1, because that method escapes the backslashes. So, using the Eclipse debugger, I removed the additional backslash at runtime and it works perfectly, but of course... I can't do that in production for every search :D So, just to try, I changed dismax to edismax which, I saw, doesn't call SolrPluginUtils, and it works perfectly! I saw in your query string that you used edismax too... maybe that is the point?
Many thanks Andrea On 08/26/2013 02:47 PM, Erick Erickson wrote: Andrea: Works for me, admittedly through the browser I suspect the problem is here: ClientUtils.escapeQueryChars That doesn't do anything about escaping the spaces, it just handles characters that have special meaning to the query syntax, things like +- etc. Using your field definition, this: http://localhost:8983/solr/select?wt=json&q=ab\ cd\ ef&debug=query&defType=edismax&qf=name eoe produced this output: parsedquery_toString: +(eoe:abcdef | (name:ab name:cd name:ef)), where the field eoe is your isbn_issn type. Best, Erick On Mon, Aug 26, 2013 at 4:55 AM, Andrea Gazzarini andrea.gazzar...@gmail.com wrote: Hi Erick, escaping spaces doesn't work... Briefly: - In a document I have an ISBN field whose stored value is 978-90-04-23560-1 - In the index I have this value: 9789004235601 Now, I want to be able to search the document by using: 1) q=978-90-04-23560-1 2) q=978 90 04 23560 1 3) q=9789004235601 1 and 3 work perfectly, 2 doesn't work. My code is: SolrQuery query = new SolrQuery(ClientUtils.escapeQueryChars(req.getParameter("q"))); isbn is declared in this way: <fieldtype name="isbn_issn" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.KeywordTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"/> </analyzer> </fieldtype> <field name="isbn_issn_search" type="issn_isbn" indexed="true"/> The search handler is: <requestHandler name="any_bc" class="solr.SearchHandler" default="true"> <lst name="defaults"> <str name="defType">dismax</str> <str name="mm">100%</str> <str name="qf">isbn_issn_search^1</str> <str name="pf">isbn_issn_search^10</str> <int name="ps">0</int> <float name="tie">0.1</float> ...
</requestHandler> This is what I get: 1) 978-90-04-23560-1: path=/select params={start=0&q=978\-90\-04\-23560\-1&sfield=&qt=any_bc&wt=javabin&rows=10&version=2} hits=1 status=0 QTime=5 2) 9789004235601: webapp=/solr path=/select params={start=0&q=9789004235601&sfield=&qt=any_bc&wt=javabin&rows=10&version=2} hits=1 status=0 QTime=5 3) 978 90 04 23560 1: path=/select params={start=0&q=978\+90\+04\+23560\+1&sfield=&qt=any_bc&wt=javabin&rows=10&version=2} hits=0 status=0 QTime=2 Extract from debugQuery=true: <str name="q">978\ 90\ 04\ 23560\ 1</str> ... <str name="rawquerystring">978\ 90\ 04\ 23560\ 1</str> <str name="querystring">978\ 90\ 04\ 23560\ 1</str> ... <str name="parsedquery">
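[Editor's note] The ClientUtils escaping discussed in this thread can be reproduced standalone. The sketch below is a reconstruction of the quoted 3.6.0 method, not the authoritative SolrJ source — some character literals were mangled in the archive, and '"' and '&' are restored here on that assumption:

```java
public class EscapeDemo {

    // Reconstructed from the SolrJ 3.6 ClientUtils source quoted above.
    // Note the final Character.isWhitespace(c) clause: spaces are escaped
    // along with the query-syntax characters.
    static String escapeQueryChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' || c == '+' || c == '-' || c == '!' || c == '(' || c == ')'
                    || c == ':' || c == '^' || c == '[' || c == ']' || c == '"'
                    || c == '{' || c == '}' || c == '~' || c == '*' || c == '?'
                    || c == '|' || c == '&' || c == ';' || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Each space gets a backslash prefix, which dismax then re-escapes
        // via SolrPluginUtils.partialEscape -- the behavior debugged above.
        System.out.println(escapeQueryChars("978 90 04 23560 1"));
        // prints: 978\ 90\ 04\ 23560\ 1
    }
}
```

This matches the thread's finding: the already-escaped whitespace is escaped a second time on the dismax path, so query 2 matches nothing, while edismax leaves it alone.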
Re: Tokenization at query time
On 08/26/2013 04:09 PM, Erick Erickson wrote: right, edismax is much preferred, dismax hasn't been formally deprecated, but almost nobody uses it... Good to know... I basically use dismax in ALL my Solr instances :D I'd be really careful about adding whitespace to the list of escape chars because it changes the semantics of the search. While it'll work for this specific case, if you use it in other cases it will change the sense of the query. This may be OK, but be careful, it might be better to do this specifically on an as-needed basis... Yes, that's the reason why I'm not really sure about what I did... I'm running my regression tests... all seems green... let's see But you know your problem space best Best, Erick Thank you very much Best, Gazza On Mon, Aug 26, 2013 at 9:04 AM, Andrea Gazzarini andrea.gazzar...@gmail.com wrote: Hi Erick, sorry, I forgot the Solr version... it is 3.6.0. ClientUtils in that version does whitespace escaping: public static String escapeQueryChars(String s) { StringBuilder sb = new StringBuilder(); for (int i = 0; i < s.length(); i++) { char c = s.charAt(i); // These characters are part of the query syntax and must be escaped if (c == '\\' || c == '+' || c == '-' || c == '!' || c == '(' || c == ')' || c == ':' || c == '^' || c == '[' || c == ']' || c == '"' || c == '{' || c == '}' || c == '~' || c == '*' || c == '?' || c == '|' || c == '&' || c == ';' || Character.isWhitespace(c)) { sb.append('\\'); } sb.append(c); } return sb.toString(); } Now, I solved the issue but I'm not really sure about it.
autoCommit and autoSoftCommit
I'm running Solr 4.3 with: <autoCommit> <maxTime>6</maxTime> <openSearcher>false</openSearcher> </autoCommit> <autoSoftCommit> <maxTime>5000</maxTime> </autoSoftCommit> When I start Solr and send in a couple of hundred documents, I am able to retrieve documents after 5 seconds using SolrJ. However, from the Solr admin console if I query for *:* it will show that there are docs in the numFound attribute, but none of the results have the stored fields present. As a test I also tried modifying the autoCommit to add maxDocs like this: <autoCommit> <maxDocs>100</maxDocs> <maxTime>6</maxTime> <openSearcher>false</openSearcher> </autoCommit> It seems like with this configuration something different happens... if I send in 150 docs then the first 100 will show up correctly through Solr admin, but the last 50 that didn't hit the maxDocs threshold still don't show the stored fields. Is it expected that maxDocs and maxTime do something different when committing? If using autoCommit with openSearcher=false and autoSoftCommit, does the client ever have to send a hard commit with openSearcher=true? - Bryan
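[Editor's note] For reference, a cleaned-up sketch of the configuration being described. The maxTime values are reproduced from the post as written (the hard-commit value of 6 may be a truncated larger number in the archive); whether they suit a given load is a separate question:

```xml
<!-- Hard commit: flushes the transaction log to durable storage but,
     with openSearcher=false, does not by itself make new documents
     visible to queries. -->
<autoCommit>
  <maxTime>6</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Soft commit: opens a new searcher every 5000 ms, which is what makes
     recently indexed documents visible in this setup. -->
<autoSoftCommit>
  <maxTime>5000</maxTime>
</autoSoftCommit>
```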
Can a data import handler grab all pages of an RSS feed?
Good morning, I have an IBM Portal atom feed that spans multiple pages. Is there a way to instruct the DIH to grab all available pages? I can put a huge range in but that can be extremely slow with large amounts of XML data. I'm currently using Solr 4.0 final. Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/Can-a-data-import-handler-grab-all-pages-of-an-RSS-feed-tp4086635.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Dropping Caches of Machine That Solr Runs At
What is the precise error? What kind of machine? File buffers are a robust part of the OS. Unix has had file buffer caching for decades. wunder On Aug 26, 2013, at 1:37 AM, Furkan KAMACI wrote: Hi Walter; You are right about performance. However, when I index documents on a machine that has a high percentage of Physical Memory usage I get EOF errors? 2013/8/26 Walter Underwood wun...@wunderwood.org On Aug 25, 2013, at 1:41 PM, Furkan KAMACI wrote: Sometimes Physical Memory usage of Solr is over 99% and this may cause problems. Do you run a command like this periodically: sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches" to force dropping the caches of the machine that Solr runs on and avoid problems? This is a terrible idea. The OS automatically manages the file buffers. When they are all used, that is a good thing, because it reduces disk IO. After this, no files will be cached in RAM. Every single read from a file will have to go to disk. This will cause very slow performance until the files are recached. Recently, I did exactly the opposite to improve performance in our Solr installation. Before starting the Solr process, a script reads every file in the index so that it will already be in file buffers. This avoids several minutes of high disk IO and slow performance after startup. wunder Search Guy, Chegg.com -- Walter Underwood wun...@wunderwood.org
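[Editor's note] The warm-up approach described above — reading every index file once before startup so it lands in the OS file buffer cache — can be sketched as a small standalone program. The class name and default path are illustrative, not from the original script:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WarmIndex {

    // Read every regular file under the given directory once so the OS
    // page cache is populated before Solr starts serving queries.
    // Returns the total number of bytes read.
    static long warm(Path indexDir) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(indexDir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        long total = 0;
        byte[] buf = new byte[1 << 20]; // 1 MB read buffer
        for (Path p : files) {
            try (InputStream in = Files.newInputStream(p)) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    total += n;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Point this at your Solr data directory; "." is just a fallback.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(warm(dir) + " bytes pre-read from " + dir);
    }
}
```

A shell one-liner over the index directory would do the same job; the point is simply that every file is read once so the first queries after startup hit warm buffers instead of disk.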
Re: custom names for replicas in solrcloud
No, it is part of the core admin API. -- Jack Krupansky -Original Message- From: smanad Sent: Monday, August 26, 2013 10:02 AM To: solr-user@lucene.apache.org Subject: Re: custom names for replicas in solrcloud Is coreNodeName exposed via collections api? -- View this message in context: http://lucene.472066.n3.nabble.com/custom-names-for-replicas-in-solrcloud-tp4086205p4086628.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Caused by: java.net.SocketException: Connection reset by peer: socket write error solr querying
AnilJayanti, Have you checked your entire stack from the client all the way to solr along with anything between them? Your timeout values should match everywhere and if there's something between the client and server that'll timeout before either the client or server does it'll cause that error as well. A quick google search shows similar causes: http://stackoverflow.com/questions/13719645/comitted-before-500-null-error-in-solr-3-6-1 http://lucene.472066.n3.nabble.com/jetty-error-broken-pipe-td3522120.html How long after the client sends a request does it take for that error to show up in the logs and what happens client side when you see the error? -Original Message- From: aniljayanti [mailto:aniljaya...@yahoo.co.in] Sent: Sunday, August 25, 2013 11:28 PM To: solr-user@lucene.apache.org Subject: RE: Caused by: java.net.SocketException: Connection reset by peer: socket write error solr querying Hi Greg, thanks for reply, I tried to set the maxIdleTime to 30 milliSeconds. But still getting same error. 
WARN - 2013-08-26 09:44:29.058; org.eclipse.jetty.server.Response; Committed before 500 {msg=Connection reset by peer: socket write error,trace=org.eclipse.jetty.io.EofException at org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:914) at org.eclipse.jetty.http.AbstractGenerator.blockForOutput(AbstractGenerator.java:507) at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:170) at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:107) at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202) at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:272) at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:276) at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122) at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212) at org.apache.solr.util.FastWriter.flush(FastWriter.java:137) at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:648) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:375) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:662) Caused by: java.net.SocketException: Connection reset by peer: socket write error at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) at java.net.SocketOutputStream.write(SocketOutputStream.java:136) at org.eclipse.jetty.io.ByteArrayBuffer.writeTo(ByteArrayBuffer.java:375) at
Re: Dropping Caches of Machine That Solr Runs At
It has a 48 GB of RAM and index size is nearly 100 GB at each node. I have CentOS 6.4. While indexing I got that error and I am suspicious about that it is because of high percentage of Physical Memory usage. ERROR - 2013-08-21 22:01:30.979; org.apache.solr.common.SolrException; java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException] early EOF at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18) at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731) at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657) at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809) at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:393) at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:245) at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173) at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1812) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072) at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:365) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:948) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:722) Caused by: org.eclipse.jetty.io.EofException: early EOF at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:65) at java.io.InputStream.read(InputStream.java:101) at com.ctc.wstx.io.UTF8Reader.loadMore(UTF8Reader.java:365) at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:110) at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101) at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84) at 
com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57) at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992) at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4628) at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126) at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701) at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649) ... 36 more 2013/8/26 Walter Underwood wun...@wunderwood.org What is the precise error? What kind of machine? File buffers are a robust part of the OS. Unix has had file buffer caching for decades. wunder On Aug 26, 2013, at 1:37 AM, Furkan KAMACI wrote: Hi Walter; You are right about performance. However when I index documents on a machine that has a high percentage of Physical Memory usage I get EOF errors? 2013/8/26 Walter Underwood wun...@wunderwood.org On Aug 25, 2013, at 1:41 PM, Furkan KAMACI wrote:
Re: Dropping Caches of Machine That Solr Runs At
It looks lik that error happens when reading XML from an HTTP request. The XML ends too soon. This should be unrelated to file buffers. wunder On Aug 26, 2013, at 9:17 AM, Furkan KAMACI wrote: It has a 48 GB of RAM and index size is nearly 100 GB at each node. I have CentOS 6.4. While indexing I got that error and I am suspicious about that it is because of high percentage of Physical Memory usage. ERROR - 2013-08-21 22:01:30.979; org.apache.solr.common.SolrException; java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException] early EOF at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18) at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731) at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657) at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809) at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:393) at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:245) at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173) at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1812) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:365) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:948) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:722) Caused by: org.eclipse.jetty.io.EofException: early EOF at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:65) at 
java.io.InputStream.read(InputStream.java:101) at com.ctc.wstx.io.UTF8Reader.loadMore(UTF8Reader.java:365) at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:110) at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101) at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84) at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57) at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992) at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4628) at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126) at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701) at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649) ... 36 more 2013/8/26 Walter Underwood wun...@wunderwood.org What is the precise error? What kind of machine? File buffers are a robust part of the OS. Unix has had file buffer caching for decades. wunder On Aug 26, 2013, at 1:37 AM,
Master / Slave Set Up Documentation
Hello, I'm new to this Solr thing, and I was wondering if there is any good / solid documentation on setting up and running replication. I'm going through the Wiki but I am not seeing anything that is obvious there. -- Jared Griffith Linux Administrator, PICS Auditing, LLC P: (949) 936-4574 C: (909) 653-7814 http://www.picsauditing.com 17701 Cowan #140 | Irvine, CA | 92614 Join PICS on LinkedIn and Twitter! https://twitter.com/PICSAuditingLLC
Re: ERROR org.apache.solr.update.CommitTracker – auto commit error...:org.apache.solr.common.SolrException: Error opening new searcher
On 8/26/2013 1:54 AM, zhaoxin wrote: Caused by: java.lang.ClassCastException Generally when you get this kind of error with Solr, it means you have a mix of old and new jars. This might be from an upgrade, where either the old war expansion doesn't get removed, or from unnecessarily including jars on your classpath. If you are using custom code or a code patch, it probably needs changing for a new Solr version. Thanks, Shawn
Re: Dropping Caches of Machine That Solr Runs At
Hi Walter; You said you are caching your documents. What is the average Physical Memory usage of your Solr Nodes? 2013/8/26 Walter Underwood wun...@wunderwood.org It looks like that error happens when reading XML from an HTTP request. The XML ends too soon. This should be unrelated to file buffers. wunder On Aug 26, 2013, at 9:17 AM, Furkan KAMACI wrote: It has 48 GB of RAM and the index size is nearly 100 GB at each node. I have CentOS 6.4. While indexing I got this error, and I suspect it is because of the high percentage of Physical Memory usage.
ERROR - 2013-08-21 22:01:30.979; org.apache.solr.common.SolrException; java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException] early EOF
  at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
  at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
  at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
  at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
  at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:393)
  at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:245)
  at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
  at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1812)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
  at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
  at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
  at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
  at org.eclipse.jetty.server.Server.handle(Server.java:365)
  at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
  at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
  at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
  at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
  at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:948)
  at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
  at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
  at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
  at java.lang.Thread.run(Thread.java:722)
Caused by: org.eclipse.jetty.io.EofException: early EOF
  at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:65)
  at java.io.InputStream.read(InputStream.java:101)
  at com.ctc.wstx.io.UTF8Reader.loadMore(UTF8Reader.java:365)
  at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:110)
  at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101)
  at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
  at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
  at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992)
  at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4628)
  at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
  at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701) at
Re: Adding one core to an existing core?
Unfortunately, there is no -Dcore property, so you have to use -Durl: java -Durl=http://localhost:8983/solr/collection2/update ... -jar post.jar ... You have the proper /select syntax. -- Jack Krupansky -Original Message- From: Bruno Mannina Sent: Monday, August 26, 2013 9:36 AM To: solr-user@lucene.apache.org Subject: Re: Adding one core to an existing core? Dear Solr Users, now I have 2 cores: collection1 and collection2. The default collection is collection1. I have two questions: - Is there a parameter I can add to my HTML link to indicate the selected core? http://xxx.xxx.xxx.xxx/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on I mean, by default it is collection1; if I want collection2 I use the link: http://xxx.xxx.xxx.xxx/solr/collection2/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on Is there a param core=collection2 instead of using a different link? - My second question concerns updating. Currently, with one core, I do: java -jar post.jar foo.xml I suppose now I must add the desired core, no? i.e.: -Dcore=collection2 What is the param to add to my command line? Thanks a lot! Bruno On 22/08/2013 16:23, Andrea Gazzarini wrote: First, a core is a separate index, so it is completely independent from the already existing core(s). So basically you don't need to reindex. In order to have two cores (the same applies for n cores): you must have in your solr.home the file (solr.xml) described here http://wiki.apache.org/solr/Solr.xml%20%28supported%20through%204.x%29 and then you must have one or two directories (corresponding to the instanceDir attribute). I said one or two because, if the index configuration is basically the same (or something changes but is dynamically configured - i.e. core name), you can create two instances starting from the same configuration.
I mean:
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="conf.dir" />
    <core name="core1" instanceDir="conf.dir" />
  </cores>
</solr>
Otherwise you must have two different conf directories that contain the index configuration. You should already have a first one (the current core); you just need another conf dir with solrconfig.xml, schema.xml and the other required files. In this case each core will have its own instanceDir.
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="conf.dir.core0" />
    <core name="core1" instanceDir="conf.dir.core1" />
  </cores>
</solr>
Best, Andrea On 08/22/2013 04:04 PM, Bruno Mannina wrote: Little precision: I'm on Ubuntu 12.04 LTS. On 22/08/2013 15:56, Bruno Mannina wrote: Dear Users, (Solr 3.6 + Tomcat 7) I have been using Solr with one core for two years, and I would now like to add another core (a new database). Can I do this without re-indexing my core1? Could you point me to a good tutorial for that? (My current database is around 200 GB for 86,000,000 docs.) My new database will be small, around 1000 documents of 5 KB each. Thanks a lot, Bruno
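For readers following the thread, the per-core addressing above can be sketched as shell commands. Host, port, and core name are hypothetical; adjust them to your deployment. There is no core= request parameter and no -Dcore property: the core name goes in the URL path for queries, and post.jar is pointed at the core's /update handler via -Durl.

```shell
SOLR_BASE="http://localhost:8983/solr"   # hypothetical host/port
CORE="collection2"

# Query a specific core by putting its name in the URL path:
SELECT_URL="$SOLR_BASE/$CORE/select/?q=*%3A*&start=0&rows=10&indent=on"
echo "$SELECT_URL"

# Post documents to the same core (shown, not executed, here):
echo java -Durl="$SOLR_BASE/$CORE/update" -jar post.jar foo.xml
```

Without -Durl, post.jar sends documents to the default core's update URL.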
Re: Master / Slave Set Up Documentation
You mean this: http://wiki.apache.org/solr/SolrReplication ? What's wrong with that page? It seems clear. I use replication widely, and the first time I set up 1 master + 2 slaves I did it by simply following that page. On 26 Aug 2013 18:54, Jared Griffith jgriff...@picsauditing.com wrote: Hello, I'm new to this Solr thing, and I was wondering if there is any good / solid documentation on setting up and running replication. I'm going through the Wiki but I am not seeing anything obvious there. -- Jared Griffith Linux Administrator, PICS Auditing, LLC P: (949) 936-4574 C: (909) 653-7814 http://www.picsauditing.com 17701 Cowan #140 | Irvine, CA | 92614 Join PICS on LinkedIn and Twitter! https://twitter.com/PICSAuditingLLC
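As a quick orientation for readers, the wiki page referenced above boils down to one /replication requestHandler in solrconfig.xml on each side. This is a minimal sketch, not a complete configuration; the host name, core name, poll interval, and confFiles list are placeholders to adjust.

```xml
<!-- Master solrconfig.xml: publish the index after commit/startup -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- Slave solrconfig.xml: poll the master's replication handler periodically -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/corename/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```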
Re: Dropping Caches of Machine That Solr Runs At
We use Amazon EC2 machines with 34GB of memory (m2.2xlarge). The Solr heap is 8GB. We have several cores, totaling about 14GB on disk. This configuration allows 100% of the indexes to be in file buffers. wunder On Aug 26, 2013, at 9:57 AM, Furkan KAMACI wrote: Hi Walter; You said you are caching your documents. What is the average Physical Memory usage of your Solr Nodes? 2013/8/26 Walter Underwood wun...@wunderwood.org It looks like that error happens when reading XML from an HTTP request. The XML ends too soon. This should be unrelated to file buffers. wunder On Aug 26, 2013, at 9:17 AM, Furkan KAMACI wrote: It has 48 GB of RAM and the index size is nearly 100 GB at each node. I have CentOS 6.4. While indexing I got this error, and I suspect it is because of the high percentage of Physical Memory usage. ERROR - 2013-08-21 22:01:30.979; org.apache.solr.common.SolrException; java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException] early EOF
Re: Master / Slave Set Up Documentation
Ha, I guess I didn't see that page listed in the table of contents; it's definitely Monday. Thanks. On Mon, Aug 26, 2013 at 10:36 AM, Andrea Gazzarini andrea.gazzar...@gmail.com wrote: You mean this http://wiki.apache.org/solr/SolrReplication ? -- Jared Griffith Linux Administrator, PICS Auditing, LLC P: (949) 936-4574 C: (909) 653-7814 http://www.picsauditing.com 17701 Cowan #140 | Irvine, CA | 92614 Join PICS on LinkedIn and Twitter! https://twitter.com/PICSAuditingLLC
Re: Grouping
I'm getting the same error...Is there any workaround to this? -- View this message in context: http://lucene.472066.n3.nabble.com/Grouping-tp2820116p4086674.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.2.1 update to 4.3/4.4 problem
What is a select analyzer type? I've never seen one of those before, or I'm just blanking. Either of those types should work for case-insensitive search; did you re-index? And please don't hijack threads; start a new subject with new questions. Best Erick On Mon, Aug 26, 2013 at 7:42 AM, skorrapa korrapati.sus...@gmail.com wrote: I have also re-indexed the data and tried. And I also tried with the below:
<fieldType name="string_lower_case" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="select">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
This didn't work either... On Mon, Aug 26, 2013 at 4:03 PM, skorrapa [via Lucene] ml-node+s472066n4086601...@n3.nabble.com wrote: Hello All, I am still facing the same issue. Case-insensitive search is not working on Solr 4.3. I am using the below configuration in schema.xml:
<fieldType name="string_lower_case" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="select">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Basically I want my string, which could have spaces or characters like '-' or \, to be searched upon case-insensitively. Please help.
-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-2-1-update-to-4-3-4-4-problem-tp4081896p4086606.html Sent from the Solr - User mailing list archive at Nabble.com.
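For reference, Solr field types only support index and query analyzer types, which is why the select analyzer above does nothing. A minimal sketch of a case-insensitive "whole string" type follows; this is an editor's suggestion rather than something proposed in the thread, and the KeywordTokenizerFactory choice assumes the goal is to match the entire stored value (spaces, '-' and \ included) as one token. Re-indexing is required after any schema change.

```xml
<!-- Hypothetical type name; KeywordTokenizerFactory emits the whole value
     as a single token, and LowerCaseFilterFactory makes matching
     case-insensitive at both index and query time. -->
<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a single <analyzer> element (no type attribute), the same analysis chain applies at index and query time, which avoids index/query mismatches entirely.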
Re: Default query operator OR wont work in some cases
Try adding debug=query to your URL; that'll show you how the parsing actually happened and should give you some pointers. Best, Erick On Mon, Aug 26, 2013 at 9:55 AM, smanad sma...@gmail.com wrote: Hi, I have some documents with the keyword egg, some with salad, and some with egg salad. When I search for egg salad, I expect to see egg results + salad results. I don't see them. The egg and salad queries individually work fine. I am using WhitespaceTokenizer. Not sure if I am missing something. Thanks, -Manasi -- View this message in context: http://lucene.472066.n3.nabble.com/Default-query-operator-OR-wont-work-in-some-cases-tp4086624.html Sent from the Solr - User mailing list archive at Nabble.com.
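The suggestion above can be tried straight from the command line. A sketch follows; the host, port, and path are hypothetical, and the encoding of the space is deliberately crude (spaces only), just enough for this query.

```shell
# Build a /select URL with debug=query; the parsedquery section of the
# response shows whether "egg salad" was parsed as (egg OR salad) or not.
Q='egg salad'
Q_ENC=$(printf '%s' "$Q" | sed 's/ /+/g')   # encode spaces as '+'
URL="http://localhost:8983/solr/select?q=${Q_ENC}&debug=query&indent=on"
echo "$URL"
# curl "$URL"   # run against a live Solr instance to inspect the output
```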
Re: Adding one core to an existing core?
ok thanks ! On 26/08/2013 17:52, Jack Krupansky wrote: Unfortunately, there is no -Dcore property, so you have to use -Durl: java -Durl=http://localhost:8983/solr/collection2/update ... -jar post.jar ... You have the proper /select syntax. -- Jack Krupansky
Re: More on topic of Meta-search/Federated Search with Solr
I have now come to the task of estimating man-days to add Blended Search Results to Apache Solr. The argument has been made that this is not desirable (see Jonathan Rochkind's blog entries on Bento search with blacklight), but the estimate remains. No estimate is worth much without a design. So, I have come to the difficulty of estimating this without having an in-depth knowledge of the Apache core. Here is my design, likely imperfect, as it stands.
- Configure a core specific to each search source (local or remote).
- On cores that index remote content, implement a periodic delete query that deletes documents whose timestamp is too old.
- Implement a custom requestHandler for the remote cores that goes out and queries the remote source. For each result in the top N (configurable), it computes an id that is stable (e.g. based on the remote resource URL, doi, or hash of data returned). It uses that id to look up the document in the lucene database. If the data is not there, it updates the lucene core and sets a flag that commit is required. Once it is done, it commits if needed.
- Configure a core that uses a custom SearchComponent to call the requestHandler that goes and gets new documents and commits them. Since the cores for remote content are different cores, they can restart their searcher at this point if any commit is needed. The custom SearchComponent will wait for commit and reload to be completed. Then, search continues using the other cores as shards.
- Auto-warming on this will assure that the most recently requested data is present. It will, of course, be very slow a good part of the time.
Erick and others, I need to know whether this design has legs and what other alternatives I might consider. On Sun, Aug 18, 2013 at 3:14 PM, Erick Erickson erickerick...@gmail.com wrote: The lack of global TF/IDF has been answered in the past, in the sharded case, by usually you have similar enough stats that it doesn't matter. This pre-supposes a fairly evenly distributed set of documents. But if you're talking about federated search across different types of documents, then what would you rescore with? How would you even consider scoring docs that are somewhat/totally different? Think magazine articles and meta-data associated with pictures. What I've usually found is that one can use grouping to show the top N of a variety of results. Or show tabs with different types. Or have the app intelligently combine the different types of documents in a way that makes sense. But I don't know how you'd just get the right thing to happen with some kind of scoring magic. Best Erick On Fri, Aug 16, 2013 at 4:07 PM, Dan Davis dansm...@gmail.com wrote: I've thought about it, and I have no time to really do a meta-search during evaluation. What I need to do is to create a single core that contains both of my data sets, and then describe the architecture that would be required to do blended results, with liberal estimates. From the perspective of evaluation, I need to understand whether any of the solutions to better ranking in the absence of global IDF have been explored? I suspect that one could retrieve a much larger than N set of results from a set of shards, and re-score in some way that doesn't require IDF, e.g. storing both results in the same priority queue and *re-scoring* before *re-ranking*. The other way to do this would be to have a custom SearchHandler that works differently - it performs the query, retrieves all results deemed relevant by another engine, adds them to the Lucene index, and then performs the query again in the standard way. This would be quite slow, but perhaps useful as a way to evaluate my method. I still welcome any suggestions on how such a SearchHandler could be implemented.
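The periodic purge step in the design above is the one piece that needs no custom code: a plain delete-by-query against the remote-content core would do it. A sketch follows; the core name and the timestamp field are hypothetical, and it assumes harvested documents carry an indexing timestamp and uses Solr date math for the cutoff.

```shell
CORE_URL="http://localhost:8983/solr/remote_core"   # hypothetical core
# Delete everything harvested more than 7 days ago:
PURGE='<delete><query>timestamp:[* TO NOW-7DAY]</query></delete>'
echo "$PURGE"
# Against a live instance this would be sent as, e.g.:
# curl "$CORE_URL/update?commit=true" -H 'Content-Type: text/xml' --data-binary "$PURGE"
```

Run from cron, this keeps the remote cores bounded to recently requested data, which is what the auto-warming point in the design relies on.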
Re: More on topic of Meta-search/Federated Search with Solr
Why not simply create a meta search engine that indexes everything from each of the nodes? (I think one calls this harvesting.) I believe that this is the way to avoid all sorts of performance bottlenecks. As far as I could analyze, the performance of a federated search is the performance of the least speedy node, which can turn out to be quite bad if you cannot obtain guarantees from the remote sources. Or are the remote cores below actually things that you manage on your side? If yes, guarantees are easy to manage. Paul On 26 August 2013 at 22:38, Dan Davis wrote: I have now come to the task of estimating man-days to add Blended Search Results to Apache Solr.
Re: Dropping Caches of Machine That Solr Runs At
EOF exception seems like a generic exception to me. I should find the underlying problem within my infrastructure. On Monday, August 26, 2013, Walter Underwood wun...@wunderwood.org wrote: We use Amazon EC2 machines with 34GB of memory (m2.2xlarge). The Solr heap is 8GB. We have several cores, totaling about 14GB on disk. This configuration allows 100% of the indexes to be in file buffers. wunder On Aug 26, 2013, at 9:57 AM, Furkan KAMACI wrote: Hi Walter; You said you are caching your documents. What is the average Physical Memory usage of your Solr Nodes? 2013/8/26 Walter Underwood wun...@wunderwood.org It looks like that error happens when reading XML from an HTTP request. The XML ends too soon. This should be unrelated to file buffers. wunder On Aug 26, 2013, at 9:17 AM, Furkan KAMACI wrote: It has 48 GB of RAM and the index size is nearly 100 GB at each node. I have CentOS 6.4. While indexing I got this error, and I suspect it is because of the high percentage of Physical Memory usage. ERROR - 2013-08-21 22:01:30.979; org.apache.solr.common.SolrException; java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException] early EOF -- Walter Underwood wun...@wunderwood.org
Re: More on topic of Meta-search/Federated Search with Solr
First answer: My employer is a library and does not have a license to harvest everything indexed by a web-scale discovery service such as PRIMO or Summon. If our design automatically relays searches entered by users, and then periodically purges results, I think it is reasonable from a licensing perspective. Second answer: What if you wanted your Apache Solr powered search to include all results from Google Scholar for any query? Do you think you could easily or cheaply configure a Zookeeper cluster large enough to harvest and index all of Google Scholar? Would that violate robot rules? Is it even possible to do this from an API perspective? Wouldn't Google notice? Third answer: On Gartner's 2013 Enterprise Search Magic Quadrant, LucidWorks and the other Enterprise Search firm based on Apache Solr were dinged on the lack of Federated Search. I do not have the hubris to think I can fix that, and it is not really my role to try, but something that works without harvesting and local indexing is obviously desirable to Enterprise Search users. On Mon, Aug 26, 2013 at 4:46 PM, Paul Libbrecht p...@hoplahup.net wrote: Why not simply create a meta search engine that indexes everything from each of the nodes? (I think one calls this harvesting.)
Re: More on topic of Meta-search/Federated Search with Solr
One more question here - is this topic more appropriate to a different list? On Mon, Aug 26, 2013 at 4:38 PM, Dan Davis dansm...@gmail.com wrote: I have now come to the task of estimating man-days to add Blended Search Results to Apache Solr. The argument has been made that this is not desirable (see Jonathan Rochkind's blog entries on Bento search with Blacklight). But the estimate remains. No estimate is worth much without a design. So, I have come to the difficulty of estimating this without having an in-depth knowledge of the Apache Solr core. Here is my design, likely imperfect, as it stands.
- Configure a core specific to each search source (local or remote).
- On cores that index remote content, implement a periodic delete query that deletes documents whose timestamp is too old.
- Implement a custom requestHandler for the remote cores that goes out and queries the remote source. For each result in the top N (configurable), it computes an id that is stable (e.g. based on the remote resource URL, DOI, or a hash of the data returned). It uses that id to look up the document in the Lucene database. If the data is not there, it updates the Lucene core and sets a flag that a commit is required. Once it is done, it commits if needed.
- Configure a core that uses a custom SearchComponent to call the requestHandler that goes and gets new documents and commits them. Since the cores for remote content are separate cores, they can restart their searcher at this point if any commit is needed. The custom SearchComponent will wait for the commit and reload to be completed. Then search continues using the other cores as shards.
- Auto-warming on this will assure that the most recently requested data is present. It will, of course, be very slow a good part of the time.
Erick and others, I need to know whether this design has legs and what other alternatives I might consider. 
On Sun, Aug 18, 2013 at 3:14 PM, Erick Erickson erickerick...@gmail.com wrote: The lack of global TF/IDF has been answered in the past, in the sharded case, by usually you have similar enough stats that it doesn't matter. This pre-supposes a fairly evenly distributed set of documents. But if you're talking about federated search across different types of documents, then what would you rescore with? How would you even consider scoring docs that are somewhat/totally different? Think magazine articles and meta-data associated with pictures. What I've usually found is that one can use grouping to show the top N of a variety of results. Or show tabs with different types. Or have the app intelligently combine the different types of documents in a way that makes sense. But I don't know how you'd just get the right thing to happen with some kind of scoring magic. Best Erick On Fri, Aug 16, 2013 at 4:07 PM, Dan Davis dansm...@gmail.com wrote: I've thought about it, and I have no time to really do a meta-search during evaluation. What I need to do is to create a single core that contains both of my data sets, and then describe the architecture that would be required to do blended results, with liberal estimates. From the perspective of evaluation, I need to understand whether any of the solutions to better ranking in the absence of global IDF have been explored? I suspect that one could retrieve a much larger than N set of results from a set of shards, re-score in some way that doesn't require IDF, e.g. storing both results in the same priority queue and *re-scoring* before *re-ranking*. The other way to do this would be to have a custom SearchHandler that works differently - it performs the query, retrieves all results deemed relevant by another engine, adds them to the Lucene index, and then performs the query again in the standard way. This would be quite slow, but perhaps useful as a way to evaluate my method. 
I still welcome any suggestions on how such a SearchHandler could be implemented.
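One step of the design above — the periodic delete query that purges remote documents whose timestamp is too old — can be sketched as a Solr delete-by-query. This is only a sketch under assumptions: the date field name `fetched_at` and the core URL are placeholders, not names from the thread.

```python
import json
import urllib.request

def build_stale_delete(max_age_days: int) -> bytes:
    """Build a Solr JSON delete-by-query payload removing documents whose
    (hypothetical) fetched_at field is older than max_age_days."""
    query = "fetched_at:[* TO NOW-%dDAYS]" % max_age_days
    return json.dumps({"delete": {"query": query}}).encode("utf-8")

def purge_stale(core_url: str, max_age_days: int = 7) -> None:
    """POST the delete to a remote-content core and commit in one request."""
    req = urllib.request.Request(
        core_url + "/update?commit=true",
        data=build_stale_delete(max_age_days),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # raises on HTTP errors

# Example (placeholder core name):
# purge_stale("http://localhost:8983/solr/remote_core", max_age_days=7)
```

A cron job or a scheduled task per remote core would be enough to drive this; committing as part of the same update request keeps the purge atomic from the searcher's point of view.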
No documents found for some queries with special chars like m&m
Some of the queries (not all) with special chars return no documents. Example: queries returning no documents: q=m&m (this can be explained: when I search for m m, no documents are returned); q=o'reilly (when I search for o reilly, I get documents back). Queries returning documents: q=helloworld (document matched is Hello World: A Life in Ham Radio). My questions are: 1. What's wrong with o'reilly? What changes do I need in my field type? 2. How can I make the query m&m work? My index has a bunch of M&M's docs like: M&M's Milk Chocolate Candy Coated Peanuts 19.2 oz and M and Ms Chocolate Candies - Peanut - 1 Bag (42 oz). Field type:
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" preserveOriginal="1"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
-- Thanks, -Utkarsh
Re: No documents found for some queries with special chars like m&m
First thing to do is attach &debug=query to your queries and look at the parsed output. Second thing to do is look at the admin/analysis page and see what happens at index and query time to things like o'reilly. You have WordDelimiterFilterFactory configured in your query but not your index analysis chain. My bet on that is that you're getting different tokens at query and index time... Third thing is that you need to escape the & character. It's probably being interpreted as a parameter delimiter in the URL, and Solr ignores params it doesn't understand. Best Erick On Mon, Aug 26, 2013 at 5:08 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote: Some of the queries (not all) with special chars return no documents. Example: queries returning no documents: q=m&m (this can be explained: when I search for m m, no documents are returned); q=o'reilly (when I search for o reilly, I get documents back). Queries returning documents: q=helloworld (document matched is Hello World: A Life in Ham Radio). My questions are: 1. What's wrong with o'reilly? What changes do I need in my field type? 2. How can I make the query m&m work? 
My index has a bunch of M&M's docs like: M&M's Milk Chocolate Candy Coated Peanuts 19.2 oz and M and Ms Chocolate Candies - Peanut - 1 Bag (42 oz). Field type:
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" preserveOriginal="1"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
-- Thanks, -Utkarsh
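Erick's third point — the & being swallowed as a URL parameter delimiter — is easy to demonstrate with percent-encoding. A minimal sketch; the host, port, and core name (collection1) are assumptions for illustration:

```python
from urllib.parse import urlencode, quote_plus

# Sent raw, the & inside m&m ends the q parameter, so Solr sees q=m plus
# a stray parameter "m" that it silently ignores -- matching Erick's
# explanation of why the query returns nothing.
raw = "http://localhost:8983/solr/collection1/select?q=m&m"

# Percent-encoding the value keeps the ampersand inside the q parameter.
params = urlencode({"q": "m&m", "debug": "query"})
escaped = "http://localhost:8983/solr/collection1/select?" + params
# escaped ends with ...select?q=m%26m&debug=query
```

Client libraries such as SolrJ or pysolr do this encoding automatically; the problem typically only appears when URLs are built by hand.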
Re: Default query operator OR wont work in some cases
Here is the keywords field for 3 docs:
Simply Asia products,Simply Asia,Sesame Chicken Egg Drop Soup,Soy Ginger Shrimp and Noodle Salad,Sesame Teriyaki Noodle Bowl
Eggs,AllWhites,Better'n Eggs,Foods,AllWhites or Better'n Eggs
DOLE Salad Blend Salad Kit,Salad Kit,Salad,DOLE,produce
Here is my debug query:
<str name="parsedquery">(+((DisjunctionMaxQuery((keywords:egg^2.0)~0.1) DisjunctionMaxQuery((keywords:salad^2.0)~0.1))~2) DisjunctionMaxQuery((keywords:"egg salad")~0.1))/no_coord</str>
Here is my fieldtype definition for keywords:
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" types="word-delim-types.txt"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" types="word-delim-types.txt"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>
-- View this message in context: http://lucene.472066.n3.nabble.com/Default-query-operator-OR-wont-work-in-some-cases-tp4086624p4086723.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: More on topic of Meta-search/Federated Search with Solr
Hi, I would suggest the following. 1. Create custom search connectors for each individual source. 2. The connector will be responsible for querying the source of any type (web, gateways, etc.), getting the results, and writing the top N results to Solr. 3. Query the same keyword against Solr and display the result. Would you like to create something like http://knimbus.com? Rgds AJ On 27-Aug-2013, at 2:28, Dan Davis dansm...@gmail.com wrote: One more question here - is this topic more appropriate to a different list? On Mon, Aug 26, 2013 at 4:38 PM, Dan Davis dansm...@gmail.com wrote: I have now come to the task of estimating man-days to add Blended Search Results to Apache Solr. The argument has been made that this is not desirable (see Jonathan Rochkind's blog entries on Bento search with Blacklight). But the estimate remains. No estimate is worth much without a design. So, I have come to the difficulty of estimating this without having an in-depth knowledge of the Apache Solr core. Here is my design, likely imperfect, as it stands.
- Configure a core specific to each search source (local or remote).
- On cores that index remote content, implement a periodic delete query that deletes documents whose timestamp is too old.
- Implement a custom requestHandler for the remote cores that goes out and queries the remote source. For each result in the top N (configurable), it computes an id that is stable (e.g. based on the remote resource URL, DOI, or a hash of the data returned). It uses that id to look up the document in the Lucene database. If the data is not there, it updates the Lucene core and sets a flag that a commit is required. Once it is done, it commits if needed.
- Configure a core that uses a custom SearchComponent to call the requestHandler that goes and gets new documents and commits them. Since the cores for remote content are separate cores, they can restart their searcher at this point if any commit is needed. 
The custom SearchComponent will wait for the commit and reload to be completed. Then search continues using the other cores as shards. - Auto-warming on this will assure that the most recently requested data is present. It will, of course, be very slow a good part of the time. Erick and others, I need to know whether this design has legs and what other alternatives I might consider. On Sun, Aug 18, 2013 at 3:14 PM, Erick Erickson erickerick...@gmail.com wrote: The lack of global TF/IDF has been answered in the past, in the sharded case, by usually you have similar enough stats that it doesn't matter. This pre-supposes a fairly evenly distributed set of documents. But if you're talking about federated search across different types of documents, then what would you rescore with? How would you even consider scoring docs that are somewhat/totally different? Think magazine articles and meta-data associated with pictures. What I've usually found is that one can use grouping to show the top N of a variety of results. Or show tabs with different types. Or have the app intelligently combine the different types of documents in a way that makes sense. But I don't know how you'd just get the right thing to happen with some kind of scoring magic. Best Erick On Fri, Aug 16, 2013 at 4:07 PM, Dan Davis dansm...@gmail.com wrote: I've thought about it, and I have no time to really do a meta-search during evaluation. What I need to do is to create a single core that contains both of my data sets, and then describe the architecture that would be required to do blended results, with liberal estimates. From the perspective of evaluation, I need to understand whether any of the solutions to better ranking in the absence of global IDF have been explored? I suspect that one could retrieve a much larger than N set of results from a set of shards, re-score in some way that doesn't require IDF, e.g. storing both results in the same priority queue and *re-scoring* before *re-ranking*. 
The other way to do this would be to have a custom SearchHandler that works differently - it performs the query, retrieves all results deemed relevant by another engine, adds them to the Lucene index, and then performs the query again in the standard way. This would be quite slow, but perhaps useful as a way to evaluate my method. I still welcome any suggestions on how such a SearchHandler could be implemented.
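AJ's connector outline above (query the remote source, write the top N hits into Solr, then search locally) can be sketched in a few lines. This is a hypothetical shape, not a real connector API: the `url`/`title` hit fields and the update endpoint mentioned in the comment are illustrative placeholders.

```python
import json

def to_solr_update(hits, top_n=10):
    """Turn remote search hits into a Solr JSON update payload.
    Assumes each hit is a dict with 'url' and 'title' keys; a real
    connector would map whatever fields its source actually returns,
    using the stable URL as the Solr uniqueKey for deduplication."""
    docs = [{"id": h["url"], "title": h["title"]} for h in hits[:top_n]]
    return json.dumps(docs)

# A connector would POST this payload to <core>/update?commit=true,
# after which the front end re-runs the same keyword query against Solr.
hits = [
    {"url": "http://example.org/1", "title": "First remote result"},
    {"url": "http://example.org/2", "title": "Second remote result"},
]
payload = to_solr_update(hits, top_n=1)
```

Using the remote URL as the document id means repeated harvests of the same result overwrite rather than duplicate, which matches the stable-id idea in Dan's design.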
Re: Default query operator OR wont work in some cases
The phrase "egg salad" does not occur in your input. And, quoted phrases are an implicit AND, not an OR. Either you wanted egg and salad but not as a phrase, or as a very loose sloppy phrase, such as "egg salad"~10. Or, who knows what you really want - your requirements are expressed too imprecisely. -- Jack Krupansky -Original Message- From: smanad Sent: Monday, August 26, 2013 8:50 PM To: solr-user@lucene.apache.org Subject: Re: Default query operator OR wont work in some cases Here is the keywords field for 3 docs:
Simply Asia products,Simply Asia,Sesame Chicken Egg Drop Soup,Soy Ginger Shrimp and Noodle Salad,Sesame Teriyaki Noodle Bowl
Eggs,AllWhites,Better'n Eggs,Foods,AllWhites or Better'n Eggs
DOLE Salad Blend Salad Kit,Salad Kit,Salad,DOLE,produce
Here is my debug query:
<str name="parsedquery">(+((DisjunctionMaxQuery((keywords:egg^2.0)~0.1) DisjunctionMaxQuery((keywords:salad^2.0)~0.1))~2) DisjunctionMaxQuery((keywords:"egg salad")~0.1))/no_coord</str>
Here is my fieldtype definition for keywords:
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" types="word-delim-types.txt"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" types="word-delim-types.txt"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>
-- View this message in context: http://lucene.472066.n3.nabble.com/Default-query-operator-OR-wont-work-in-some-cases-tp4086624p4086723.html Sent from the Solr - User mailing list archive at Nabble.com.
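The behaviour Jack describes can be probed by sending the query with explicit edismax parameters and debug output. A sketch of the request parameters only; the `qf` boost and `pf2` field are illustrative assumptions rather than values taken from the poster's config:

```python
from urllib.parse import urlencode

# Explicit edismax parameters for the "egg salad" query. Under q.op=OR
# the two terms are optional clauses; the pf2 parameter is what adds the
# extra word-pair phrase clause seen in the parsed query.
params = urlencode({
    "q": "egg salad",
    "defType": "edismax",
    "q.op": "OR",
    "qf": "keywords^2.0",
    "pf2": "keywords",
    "debugQuery": "true",
})
# Append params to .../select? to inspect the parsedquery in the response.
```

Removing `pf2` (or setting it empty) makes the phrase boost disappear from the parsed query, which is a quick way to confirm where that clause comes from.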
ANNOUNCE: Lucene/Solr Revolution EU 2013: Registration Community Voting
(NOTE: cross-posted to various lists, please reply only to general@lucene w/ any questions or follow ups) 2 Announcements folks should be aware of regarding the upcoming Lucene/Solr Revolution EU 2013 in Dublin... # 1) Registration Now Open Registration is now open for Lucene/Solr Revolution EU 2013, the biggest open source conference dedicated to Apache Lucene/Solr. Two-day training workshops will precede the conference. You can benefit from discounted conference rates if you register early. http://lucenerevolution.org/registration More info... http://searchhub.org/2013/08/15/lucenesolr-revolution-eu-registration-is-open/ # 2) Community Voting on Agenda (Until September 9th) The Lucene/Solr Revolution free voting system allows you to vote on your favorite topics. The sessions that receive the highest number of votes will be automatically added to the Lucene/Solr Revolution EU 2013 agenda. The remaining sessions will be selected by a committee of industry experts who will take into account the community’s votes as well as their own expertise in the area. http://lucenerevolution.org/2013/call-for-papers-survey More info... http://searchhub.org/2013/08/23/help-us-set-the-agenda-for-lucenesolr-revolution-eu/ -Hoss
Re: Default query operator OR wont work in some cases
I am not searching for a phrase query, so I am not sure why it shows up in parsedquery.
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">3</int>
  <lst name="params">
    <str name="debugQuery">true</str>
    <str name="indent">true</str>
    <str name="q">egg salad</str>
    <str name="_">1377569284170</str>
    <str name="wt">xml</str>
  </lst>
</lst>
-- View this message in context: http://lucene.472066.n3.nabble.com/Default-query-operator-OR-wont-work-in-some-cases-tp4086624p4086732.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Filter cache pollution during sharded edismax queries
Hi Otis, Sorry I missed your reply, and thanks for trying to find a similar report. Wondering if I should file a Jira issue? That might get more attention :) -- Ken On Jul 5, 2013, at 1:05pm, Otis Gospodnetic wrote: Hi Ken, Uh, I left this email until now hoping I could find you a reference to similar reports, but I can't find them now. I am quite sure I saw somebody with a similar report within the last month. Plus, several people have reported issues with performance dropping when they went from 3.x to 4.x and maybe this is why. Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Tue, Jul 2, 2013 at 3:01 PM, Ken Krugler kkrugler_li...@transpac.com wrote: Hi all, After upgrading from Solr 3.5 to 4.2.1, I noticed our filterCache hit ratio had dropped significantly. Previously it was at 95+%, but now it's 50%. I enabled recording 100 entries for debugging, and in looking at them it seems that edismax (and faceting) is creating entries for me. This is in a sharded setup, so it's a distributed search. If I do a search for the string bogus text using edismax on two fields, I get an entry in each of the shard's filter caches that looks like: item_+(((field1:bogus | field2:bogu) (field1:text | field2:text))~2): Is this expected? I have a similar situation happening during faceted search, even though my fields are single-value/untokenized strings, and I'm not using the enum facet method. But I'll get many, many entries in the filterCache for facet values, and they all look like item_facet field:facet value: The net result of the above is that even with a very big filterCache size of 2K, the hit ratio is still only 60%. Thanks for any insights, -- Ken -- Ken Krugler +1 530-210-6378 http://www.scaleunlimited.com custom big data solutions training Hadoop, Cascading, Cassandra Solr
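For reference, Ken's debugging setup (a 2K-entry cache with the top 100 entries recorded) corresponds to filterCache settings in solrconfig.xml along these lines. The cache class and sizes shown are assumptions for illustration; `showItems` is the attribute that exposes the recorded entries in the admin cache statistics:

```xml
<!-- solrconfig.xml: enlarged filterCache with entry recording enabled -->
<filterCache class="solr.FastLRUCache"
             size="2048"
             initialSize="512"
             autowarmCount="128"
             showItems="100"/>
```

With `showItems` set, the admin plugin/stats page lists the actual cached keys, which is how entries like the edismax and facet items above can be inspected.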
Re: Default query operator OR wont work in some cases
Yeah, sorry, I read the parsed query too quickly - the phrase is the optional relevancy boost due to the pf2 parameter. -- Jack Krupansky -Original Message- From: smanad Sent: Monday, August 26, 2013 10:08 PM To: solr-user@lucene.apache.org Subject: Re: Default query operator OR wont work in some cases I am not searching for a phrase query, so I am not sure why it shows up in parsedquery.
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">3</int>
  <lst name="params">
    <str name="debugQuery">true</str>
    <str name="indent">true</str>
    <str name="q">egg salad</str>
    <str name="_">1377569284170</str>
    <str name="wt">xml</str>
  </lst>
</lst>
-- View this message in context: http://lucene.472066.n3.nabble.com/Default-query-operator-OR-wont-work-in-some-cases-tp4086624p4086732.html Sent from the Solr - User mailing list archive at Nabble.com.