Re: Disable all caches in Solr
Thanks Chris. I understand this. But this test is to determine the *maximum* latency a query can have, hence I have disabled all caches. After disabling all caches in solrconfig, I was able to remove the latency variation for a single query in most cases. But *sort* queries still show variation in latency when executed multiple times. Is there some hidden cache for sorting? When I run the query below for the first time it shows higher latency, but when I run it a second time it shows a lower QTime.

http://localhost:7000/solr/collection1/select?q=field1:keyword&rows=20&sort=field2+desc

*If I remove the sorting then I always get a fixed QTime.* field2 is of type tlong. Any ideas why this is happening and how to prevent this variation?

-- View this message in context: http://lucene.472066.n3.nabble.com/Disable-all-caches-in-Solr-tp4144933p4146039.html Sent from the Solr - User mailing list archive at Nabble.com.
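The one-off slow first sort is most likely Lucene's FieldCache being populated: sorting on an indexed field un-inverts it into memory on first use, and that cache is internal to Lucene, not one of the Solr caches configurable in solrconfig.xml. A hedged sketch of one way around it (Solr 4.2+, and the exact schema line is an assumption based on the field named in the question): declare the sort field with docValues so the sort data is read from per-segment column storage instead of being un-inverted on the first query.

```xml
<!-- Assumed schema fragment: docValues="true" avoids the first-query
     FieldCache un-inversion that shows up as a one-off high QTime. -->
<field name="field2" type="tlong" indexed="true" stored="true" docValues="true"/>
```

Changing this requires a full re-index. Alternatively, a static warming query with the same sort in a newSearcher listener hides the first-hit cost, though that warming would defeat a "maximum latency" measurement.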
Changing default behavior of solr for overwrite the whole document on uniquekey duplication
Dears, Hi, For my requirements I need to change the default behavior of Solr on unique-key duplication, where it overwrites the whole document. I want it to overwrite just part of the document (some fields) while the other parts of the document (other fields) remain unchanged. First of all, I need to know whether such a change in Solr's behavior is possible. Second, I would really appreciate guidance on which class/classes I should look at to change this. Best regards. -- A.Nazemian
Solr irregularly having QTime 50000ms, stracing solr cures the problem
Hi all, This is what happens when I run a regular wget query to log the current number of documents indexed:

2014-07-08:07:23:28 QTime=20 numFound=5720168
2014-07-08:07:24:28 QTime=12 numFound=5721126
2014-07-08:07:25:28 QTime=19 numFound=5721126
2014-07-08:07:27:18 QTime=50071 numFound=5721126
2014-07-08:07:29:08 QTime=50058 numFound=5724494
2014-07-08:07:30:58 QTime=50033 numFound=5730710
2014-07-08:07:31:58 QTime=13 numFound=5730710
2014-07-08:07:33:48 QTime=50065 numFound=5734069
2014-07-08:07:34:48 QTime=16 numFound=5737742
2014-07-08:07:36:38 QTime=50037 numFound=5737742
2014-07-08:07:37:38 QTime=12 numFound=5738190
2014-07-08:07:38:38 QTime=23 numFound=5741208
2014-07-08:07:40:29 QTime=50034 numFound=5742067
2014-07-08:07:41:29 QTime=12 numFound=5742067
2014-07-08:07:42:29 QTime=17 numFound=5742067
2014-07-08:07:43:29 QTime=20 numFound=5745497
2014-07-08:07:44:29 QTime=13 numFound=5745981
2014-07-08:07:45:29 QTime=23 numFound=5746420

As you can see, the QTime jumps to just over 50 seconds at irregular intervals. This happens independent of whether I am indexing documents at around 20 dps or not. First I thought it depended on the auto-commit of 5 minutes, but the 50-second hits are too irregular.

Furthermore, and this is *really strange*: when hooking strace onto the solr process, the 50-second QTimes disappear completely and consistently --- a real Heisenbug. Nevertheless, strace shows that there is a socket timeout of 50 seconds defined in calls like this:

[pid 1253] 09:09:37.857413 poll([{fd=96, events=POLLIN|POLLERR}], 1, 5) = 1 ([{fd=96, revents=POLLIN}]) 0.40

where fd=96 is the result of

[pid 25446] 09:09:37.855235 accept(122, {sa_family=AF_INET, sin_port=htons(57236), sin_addr=inet_addr(ip address of local host)}, [16]) = 96 0.54

where again fd=122 is the TCP port on which solr was started. My hunch is that this is communication between the cores of solr.

I tried to search the internet for such a strange connection between socket timeouts and strace, but could not find anything (the stackoverflow entry from yesterday is my own :-( This smells a bit like a race condition/deadlock kind of thing which is broken up by timing differences introduced by stracing the process. Any hints appreciated.

For completeness, here is my setup:
- solr-4.8.1,
- cloud version running
- 10 shards on 10 cores in one instance
- hosted on SUSE Linux Enterprise Server 11 (x86_64), VERSION 11, PATCHLEVEL 2
- hosted on a vmware, 4 CPU cores, 16 GB RAM
- single digit million docs indexed, exact number does not matter
- zero query load

Harald.
Re: Changing default behavior of solr for overwrite the whole document on uniquekey duplication
Please look at https://wiki.apache.org/solr/Atomic_Updates This does what you want: just update the relevant fields. Thanks, Himanshu On Tue, Jul 8, 2014 at 1:09 PM, Ali Nazemian alinazem...@gmail.com wrote: Dears, Hi, According to my requirement I need to change the default behavior of Solr for overwriting the whole document on unique-key duplication. [...] -- A.Nazemian
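To make the Atomic Updates suggestion concrete, here is a hedged sketch (the uniqueKey value and field names are made up). It requires Solr 4.0+, and every field in the schema must be stored="true", because Solr rebuilds the untouched parts of the document from its stored fields.

```shell
# Sketch (assumed id and field names): this atomic update changes only the
# listed fields; all other fields of doc-1 are preserved from stored values.
cat > /tmp/atomic.json <<'EOF'
[{"id":"doc-1",
  "title":{"set":"new title"},
  "views":{"inc":1}}]
EOF
cat /tmp/atomic.json
# To apply against a running Solr (host/core are placeholders):
# curl -H 'Content-Type: application/json' \
#   'http://localhost:8983/solr/collection1/update?commit=true' \
#   --data-binary @/tmp/atomic.json
```

The "set" modifier replaces a field's value and "inc" increments a numeric field; fields not mentioned in the update are left as they were.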
Re: SOLR on hdfs
Hi all, I am new to Solr and HDFS. I am trying to index text content extracted from binary files like PDF, MS Office, etc., which are stored on HDFS (single node). So far I have Solr running on HDFS and have created the core, but I couldn't send the files to Solr for indexing. Can someone please help me do that? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-on-hdfs-tp4045128p4146049.html Sent from the Solr - User mailing list archive at Nabble.com.
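One common route for this, sketched here with placeholder host, core, and file names: Solr's ExtractingRequestHandler (Tika, from the solr-cell contrib, which must be configured in solrconfig.xml) extracts and indexes text from a binary file posted to /update/extract. Running Solr's index on HDFS does not let it read input files from HDFS, so the file is first copied locally with hdfs dfs -get.

```shell
# Sketch: commands are printed rather than executed, since they need a
# running Solr and HDFS; drop the echos to run them for real.
FILE=/tmp/report.pdf
CMD="curl 'http://localhost:8983/solr/collection1/update/extract?literal.id=doc1&commit=true' -F 'myfile=@$FILE'"
echo "hdfs dfs -get /data/report.pdf $FILE"   # copy the binary out of HDFS
echo "$CMD"                                   # post it for Tika extraction
```

The literal.id parameter supplies the uniqueKey for the extracted document; additional literal.* parameters can attach more metadata fields.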
Re: Solr irregularly having QTime 50000ms, stracing solr cures the problem
My first assumption: full GC. Can you please tell us about your JVM setup, and maybe trace what happens in the JVMs? On Jul 8, 2014 9:54 AM, Harald Kirsch harald.kir...@raytion.com wrote: Hi all, This is what happens when I run a regular wget query to log the current number of documents indexed: [...]
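One way to confirm or rule out full GCs, as suggested above, is to restart Solr with GC logging enabled. A sketch with HotSpot flags of the Java 6/7 era; the log path and launch command are placeholders:

```shell
# Sketch: GC-logging flags to append to however Solr's JVM is launched.
# Every stop-the-world pause will then show up with a timestamp in the log.
GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/tmp/solr-gc.log"
echo "java $GC_OPTS -jar start.jar"
```

If a 50-second outage has no matching entry in the GC log, garbage collection can be excluded as the cause.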
Re: Changing default behavior of solr for overwrite the whole document on uniquekey duplication
Dear Himanshu, Hi, You misunderstood what I meant. I am not going to update some fields from the client side; I want to change what Solr does on duplication of the uniqueKey field. I don't want Solr to overwrite the whole document; I just want it to overwrite some parts of the document. This situation does not come from the user side; this is what Solr does to documents with a duplicated uniqueKey. Regards. On Tue, Jul 8, 2014 at 12:29 PM, Himanshu Mehrotra himanshu.mehro...@snapdeal.com wrote: Please look at https://wiki.apache.org/solr/Atomic_Updates This does what you want just update relevant fields. [...] -- A.Nazemian
Re: Solr irregularly having QTime 50000ms, stracing solr cures the problem
No, no full GC. The JVM does nothing during the outages, no CPU, no GC, as checked with jvisualvm and htop. Harald. On 08.07.2014 10:12, Heyde, Ralf wrote: My First assumption: full gc. Can you please tell us about your jvm setup and maybe trace what happens the jvms? [...] -- Harald Kirsch Raytion GmbH Kaiser-Friedrich-Ring 74 40547 Duesseldorf Fon +49 211 53883-216 Fax +49-211-550266-19 http://www.raytion.com
Parallel optimize of index on SolrCloud.
Hi, I need to optimize an index created using the CloudSolrServer APIs under a SolrCloud setup of 3 instances on separate machines. Currently it optimizes sequentially if I invoke cloudSolrServer.optimize(). To make it parallel I tried creating three separate HttpSolrServer instances and invoked httpSolrServer.optimize() on them in parallel, but it still seems to optimize sequentially. I also tried invoking optimize directly using HttpPost with the following URL and parameters, but it still seems to be sequential.

*URL*: http://host:port/solr/collection/update
*Parameters*:
params.add(new BasicNameValuePair("optimize", "true"));
params.add(new BasicNameValuePair("maxSegments", "1"));
params.add(new BasicNameValuePair("waitFlush", "true"));
params.add(new BasicNameValuePair("distrib", "false"));

Kindly provide your suggestions and help. Regards, Modassar
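A hedged sketch of the direct-to-node approach described above (host names are placeholders): send the optimize to each node yourself with distrib=false, so no single node fans the request out, and background the requests so the three optimizes overlap. For this to help, each request must go to a different machine.

```shell
# Sketch: build the per-node optimize URL; distrib=false keeps each
# request local to the node that receives it.
optimize_url() {
  echo "http://$1/solr/collection/update?optimize=true&maxSegments=1&waitFlush=true&distrib=false"
}
for HOST in host1:8983 host2:8983 host3:8983; do
  # Printed rather than run; remove 'echo' (keeping the trailing &) to
  # fire the three requests in parallel, then 'wait' for them to finish.
  echo "curl -s '$(optimize_url "$HOST")' &"
done
```

Note that even in parallel, an optimize down to one segment is heavily I/O-bound on each machine, so wall-clock gains depend on the nodes having independent disks.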
[Solr Schema API] SolrJ Access
Hi guys, wondering if there is a proper way to access the Schema API via SolrJ. Of course it is possible to reach it from Java with a hand-built HTTP request, but that way, using SolrCloud for example, we become coupled to one specific instance (and we don't want that). Code example:

HttpResponse response;
String url = this.solrBase + "/" + core + SCHEMA_SOLR_FIELDS_ENDPOINT + fieldName;
HttpPut httpPut = new HttpPut(url);
StringEntity entity = new StringEntity(
    "{\"type\":\"text_general\",\"stored\":\"true\"}",
    ContentType.APPLICATION_JSON);
httpPut.setEntity(entity);
HttpClient client = new DefaultHttpClient();
response = client.execute(httpPut);

Any suggestions? In my opinion it would be interesting to have some auxiliary method in SolrServer if it's not there yet. Cheers -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
Fwd: Language detection for solr 3.6.1
-- Forwarded message --
From: Poornima Jay poornima...@rocketmail.com
Date: Tue, Jul 8, 2014 at 5:03 PM
Subject: Re: Language detection for solr 3.6.1

When I try to use the solr-langid-3.6.1.jar file in my path /apache-tomcat-5.5.25/webapps/solr_multilangue_3.6_jar/WEB-INF/lib/ and define the path in solrconfig.xml as below

<lib dir="/home/searchuser/apache-tomcat-5.5.25/webapps/solr_multilangue_3.6_jar/WEB-INF/lib/" regex="solr-langid-.*\.jar" />

I am getting the error below while reloading the core:

SEVERE: java.lang.NoClassDefFoundError: com/cybozu/labs/langdetect/DetectorFactory

Please advise. Thanks, Poornima

On Tuesday, 8 July 2014 9:58 AM, Alexandre Rafalovitch arafa...@gmail.com wrote:

If you are having trouble with the jar location, just use an absolute path in your lib statement and use path, not dir/regex. That will complain louder. You should be using the latest jar matching the version; they should be shipped with Solr itself. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

On Tue, Jul 8, 2014 at 11:14 AM, Poornima Jay poornima...@rocketmail.com wrote:

I am facing an issue with the jar file location. Where should I place solr-langid-3.6.1.jar? If I place it in the instance folder inside /lib/solr-langid-3.6.1.jar, the language detection classes are not loaded. Should I use solr-langid-3.5.1.jar in Solr version 3.6.1? Can you please attach the schema file also for reference?

<lib dir="${user.dir}/../dist/" regex="solr-langid-.*\.jar" />
<lib dir="${user.dir}/../contrib/langid/lib/" />

Where exactly should the jar file be placed, /dist/ or /contrib/langid/lib/? Thanks for your time. Regards, Poornima

On Monday, 7 July 2014 2:42 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

I've had an example in my book: https://github.com/arafalov/solr-indexing-book/blob/master/published/languages/conf/solrconfig.xml , though it was for Solr 4.2+. Solr in Action also has a section on multilingual indexing. There is no generic advice, as everybody seems to have slightly different multilingual requirements, but the books will at least discuss the main issues. Regarding your specific email from a week ago, you haven't actually said what the problem was, just what you did. So we don't know where you are stuck and what - specifically - you need help with. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

On Mon, Jul 7, 2014 at 4:06 PM, Poornima Jay poornima...@rocketmail.com wrote:

Hi, Please let me know if anyone has used Google language detection for implementing multilanguage search in one schema. Thanks, Poornima

On Tuesday, 1 July 2014 6:54 PM, Poornima Jay poornima...@rocketmail.com wrote:

Hi, Can anyone please let me know how to integrate http://code.google.com/p/language-detection/ in Solr 3.6.1? I want four languages (English, Chinese simplified, Chinese traditional, Japanese, and Korean) to be added in one schema, i.e. multilingual search from a single schema file.

I tried adding solr-langdetect-3.5.0.jar in my /solr/contrib/langid/lib/ location and in /webapps/solr/WEB-INF/contrib/langid/lib/ and made the changes in solrconfig.xml as below:

<directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>

<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <lst name="invariants">
      <str name="langid.fl">content_eng</str>
      <str name="langid.map">true</str>
      <str name="langid.map.fl">content_eng,content_ja</str>
      <str name="langid.whitelist">en,ja</str>
      <str name="langid.map.lcmap">en:english ja:japanese</str>
      <str name="langid.fallback">en</str>
    </lst>
  </processor>
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">langid</str>
  </lst>
</requestHandler>

Please suggest a solution. Thanks, Poornima
Re: Fwd: Language detection for solr 3.6.1
When I use the solr-langid-3.5.0.jar file, after reloading the core I am getting the error below:

SEVERE: java.lang.NoClassDefFoundError: net/arnx/jsonic/JSONException

even after adding the solr-jsonic-3.5.0.jar file in the webapps folder. Thanks, Poornima

On Tuesday, 8 July 2014 3:36 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

-- Forwarded message --
From: Poornima Jay poornima...@rocketmail.com
Date: Tue, Jul 8, 2014 at 5:03 PM
Subject: Re: Language detection for solr 3.6.1

When i try to use solr-langid-3.6.1.jar file in my path /apache-tomcat-5.5.25/webapps/solr_multilangue_3.6_jar/WEB-INF/lib/ and define the path in the solrconfig.xml [...]
Re: Fwd: Language detection for solr 3.6.1
I just realized you are not using Solr's language detection libraries; you are using a third-party one. You did mention that in your first message. I don't see that library integrated with Solr, though, just as a standalone library. So you can't just plug it in. Is there any reason you cannot use one of the two libraries Solr does already have (Tika's and Google's)? What's so special about that one? Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

On Tue, Jul 8, 2014 at 5:08 PM, Poornima Jay poornima...@rocketmail.com wrote:

When i use solr-langid-3.5.0.jar file after reloading the core i am getting the below error SEVERE: java.lang.NoClassDefFoundError: net/arnx/jsonic/JSONException Even after adding the solr-jsonic-3.5.0.jar file in the webapps folder. [...]
don't count facet on blank values
Hi, Is it possible not to count the facets for blank values? e.g. for the field cat:

cats:[*"",34324,* 10,8635, 20,8226, 50,5162, 30,759, 100,188, 40,13, 200,7]

How is this possible? With Regards Aman Tandon
Re: Fwd: Language detection for solr 3.6.1
I'm using the Google library, which I had mentioned in my first mail, saying I'm using http://code.google.com/p/language-detection/. I have downloaded the jar file from the URL below:

https://www.versioneye.com/java/org.apache.solr:solr-langid/3.6.1

Please let me know from where I need to download the correct jar file. Regards, Poornima

On Tuesday, 8 July 2014 3:42 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

I just realized you are not using Solr language detect libraries. You are using third party one. You did mention that in your first message. I don't see that library integrated with Solr though, just as a standalone library. So, you can't just plug in it. Is there any reason you cannot use one of the two libraries Solr does already have (Tika's and Google's)? [...]
Re: don't count facet on blank values
Do you need those values stored/indexed? If not, why not remove them before they hit Solr with an appropriate UpdateRequestProcessor? Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Tue, Jul 8, 2014 at 5:16 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, Is it possible to not count the facets for blank values? e.g. cat: ["",34324, "10",8635, "20",8226, "50",5162, "30",759, "100",188, "40",13, "200",7] How is this possible? With Regards Aman Tandon
Re: don't count facet on blank values
On 8 July 2014 15:46, Aman Tandon amantandon...@gmail.com wrote: Hi, Is it possible to not count the facets for blank values? e.g. cat: [...] Either filter them out in the query, or remove them client-side when displaying the results. Regards, Gora
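Gora's second option — dropping the blank bucket client-side before display — can be sketched as a small standalone helper. This is plain Java with no Solr dependency; the Map simply stands in for the parsed facet value/count pairs:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FacetFilter {
    // Drop facet buckets whose value is null or blank, preserving order.
    public static Map<String, Integer> withoutBlanks(Map<String, Integer> facets) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : facets.entrySet()) {
            if (e.getKey() != null && !e.getKey().trim().isEmpty()) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }
}
```

Applied to the example counts above, the "",34324 bucket would be removed while the real category buckets pass through untouched.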
Re: don't count facet on blank values
@Alex, yes we need them to be indexed and stored, as we do some processing when fields are blank. @Gora Thanks, I will try that. Thanks for your quick replies. With Regards Aman Tandon On Tue, Jul 8, 2014 at 3:53 PM, Gora Mohanty g...@mimirtech.com wrote: On 8 July 2014 15:46, Aman Tandon amantandon...@gmail.com wrote: Hi, Is it possible to not count the facets for blank values? e.g. cat: [...] Either filter them out in the query, or remove them client-side when displaying the results. Regards, Gora
Re: don't count facet on blank values
Right, but a blank field and a missing field are different things. Are they for you? If yes, then correct, you are stuck with getting them back. But if a blank field is the same as a missing/empty field, then you can pre-process to unify them. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Tue, Jul 8, 2014 at 5:26 PM, Aman Tandon amantandon...@gmail.com wrote: @Alex, yes we need them to be indexed and stored, as we do some processing when fields are blank. @Gora Thanks, I will try that. Thanks for your quick replies. With Regards Aman Tandon On Tue, Jul 8, 2014 at 3:53 PM, Gora Mohanty g...@mimirtech.com wrote: On 8 July 2014 15:46, Aman Tandon amantandon...@gmail.com wrote: Hi, Is it possible to not count the facets for blank values? e.g. cat: [...] Either filter them out in the query, or remove them client-side when displaying the results. Regards, Gora
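When blank and missing really are the same thing, the pre-processing Alex suggests can be sketched with the stock update processors that ship with Solr 4.x — the chain name here is illustrative, and with no field selector configured these factories apply to all string-valued fields:

```xml
<updateRequestProcessorChain name="strip-blanks">
  <!-- Trim surrounding whitespace, then drop any field whose value is now empty,
       so "" and a missing field become indistinguishable at index time -->
  <processor class="solr.TrimFieldUpdateProcessorFactory" />
  <processor class="solr.RemoveBlankFieldUpdateProcessorFactory" />
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

As with any chain, it must be referenced from the update handler (e.g. update.chain=strip-blanks) to take effect, and blank values already in the index are only removed once the documents are reindexed.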
Re: Facets on Nested documents
Yes, I have the same problem. In my case I have two types (parent and children) in a single collection, and I want to retrieve only the parents with a facet on a child field. I've seen that this is possible via a block join query (available since Solr 4.5). I have Solr 1.2, and I've thought about a static facet field calculated at indexing time, but I don't see any guide or reference about it. Walter Ing. Walter Liguori 2014-07-07 17:59 GMT+02:00 adfel70 adfe...@gmail.com: Hi, I indexed different types (different fields) of child docs for every parent. I want to facet on a field in one type of child doc and then do another facet on a different type of child doc. It doesn't work. Any idea how I can do something like that? thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Facets-on-Nested-documents-tp4145931.html Sent from the Solr - User mailing list archive at Nabble.com.
JOB: Solr / Elasticsearch engineer @ Sematext
Hi, I think most people on this list have heard of Sematext http://sematext.com/, so I'll skip the company info, and just jump to the meat, which involves a lot of fun work with Solr and/or Elasticsearch: We have an opening for an engineer who knows either Elasticsearch or Solr or both and wants to use these technologies to implement search and analytics solutions for both Sematext's own products http://sematext.com/products/ such as SPM http://sematext.com/spm/ (monitoring, alerting, machine learning-based anomaly detection, etc.) and Logsene http://sematext.com/logsene/ (logging), as well as for Sematext's clients http://sematext.com/clients/. More info at: * http://blog.sematext.com/2014/07/07/job-elasticsearch-solr-engineer/ * http://sematext.com/about/jobs.html Thanks, Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/
[ANN] Solr Users Thailand - unofficial group
Hello, A new Google Group has recently been started for Solr users who want to discuss Solr in Thai or need to discuss Solr issues around the Thai language (in Thai or English). https://groups.google.com/forum/#!forum/solr-user-thailand The group is monitored by the local Solr consultancy, one of the Thai LucidWorks employees, and myself. It has just started, but if this language is of interest to you, please join and help build a vibrant community. As mentioned in the subject, this is not an official group. I hope, though, that it will become active enough over time to be listed next to the other user groups on the Wiki. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency
I need a replacement for the QueryElevation Component
Good morning to one and all, I'm using Solr 4.0 Final and I've been struggling mightily with the elevation component. It is too limited for our needs; it doesn't handle phrases very well, and I need to have more than one doc with the same keyword or phrase. So I need a better solution. One that allows us to tag a doc with keywords that clearly identify it as a promoted document would be ideal. I tried using an external file field, but that only allows numbers and not strings (please correct me if I'm wrong). EFF would be ideal if there were a way to make it take strings. I also need an easy way to add these tags to specific docs. If possible, I would like to avoid creating a separate elevation core, but it may come down to that... Thank you, -- View this message in context: http://lucene.472066.n3.nabble.com/I-need-a-replacement-for-the-QueryElevation-Component-tp4146077.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: don't count facet on blank values
No, both are the same for me. With Regards Aman Tandon On Tue, Jul 8, 2014 at 4:01 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Right, but a blank field and a missing field are different things. Are they for you? If yes, then correct, you are stuck with getting them back. But if a blank field is the same as a missing/empty field, then you can pre-process to unify them. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Tue, Jul 8, 2014 at 5:26 PM, Aman Tandon amantandon...@gmail.com wrote: @Alex, yes we need them to be indexed and stored, as we do some processing when fields are blank. @Gora Thanks, I will try that. Thanks for your quick replies. With Regards Aman Tandon On Tue, Jul 8, 2014 at 3:53 PM, Gora Mohanty g...@mimirtech.com wrote: On 8 July 2014 15:46, Aman Tandon amantandon...@gmail.com wrote: Hi, Is it possible to not count the facets for blank values? e.g. cat: [...] Either filter them out in the query, or remove them client-side when displaying the results. Regards, Gora
Slow inserts when using Solr Cloud
Hi I'm encountering a surprisingly high increase in response times when I insert new documents into a SolrCloud, compared with a standalone Solr instance. I have a SolrCloud set up for test and evaluation purposes. I have four shards, each with a leader and a replica, distributed over four Windows virtual servers. I have ZooKeeper running on three of the four servers. There are not many documents in my SolrCloud (just a few hundred). I am using composite id routing, specifying a prefix to my document ids which is then used by Solr to determine which shard the document should be stored on. I determine in advance which shard a document with a given id prefix will end up in, by trying it out in advance. I then try the following scenarios, using inserts without commits. E.g. I use: curl http://servername:port/solr/update -H "Content-Type: text/xml" --data-binary @test.txt 1. Insert a document, sending it to the server hosting the correct shard, with replicas turned off (response time 20ms) I find that if I 'switch off' the replicas for my shard (by shutting down Solr for the replicas), and then I send the new document to the server hosting the leader for the correct shard, then I get a very fast response, i.e. under 10ms, which is similar to the performance I get when not using SolrCloud. This is expected, as I've removed any overhead to do with replicas or routing to the correct shard. 2. Insert a document, sending it to the server hosting the correct shard, but with replicas turned on (response time approx 250ms) If I switch on the replica for that shard, then my average response time for an insert increases from 10ms to around 250ms. Now I expect an overhead, because the leader has to find out where the replica is (from ZooKeeper?) and then forward the request to that replica, then wait for a reply - but an increase from 20ms to 250ms seems very high? 3.
Insert a document, sending it to a server hosting the incorrect shard, with replicas turned on (response time approx 500ms) If I do the same thing again but this time send to a server hosting a different shard from the one my document will end up in, the average response times increase again to around 500ms. Again, I'd expect an increase because of the extra step of needing to forward to the correct shard, but the increase seems very high? Should I expect this much of an overhead for shard routing and replicas, or might this indicate a problem in my configuration? Many thanks Ian --- Mae'r wybodaeth a gynhwysir yn y neges e-bost hon ac yn unrhyw atodiadau'n gyfrinachol. Os ydych yn ei derbyn ar gam, rhowch wybod i'r anfonwr a'i dileu'n ddi-oed. Ni fwriedir i ddatgelu i unrhyw un heblaw am y derbynnydd, boed yn anfwriadol neu fel arall, hepgor cyfrinachedd. Efallai bydd Gwasanaeth Gwybodeg GIG Cymru (NWIS) yn monitro ac yn cofnodi pob neges e-bost rhag firysau a defnydd amhriodol. Mae'n bosibl y bydd y neges e-bost hon ac unrhyw atebion neu atodiadau dilynol yn ddarostyngedig i'r Ddeddf Rhyddid Gwybodaeth. Mae'r farn a fynegir yn y neges e-bost hon yn perthyn i'r anfonwr ac nid ydynt o reidrwydd yn perthyn i NWIS. The information included in this email and any attachments is confidential. If received in error, please notify the sender and delete it immediately. Disclosure to any party other than the addressee, whether unintentional or otherwise, is not intended to waive confidentiality. The NHS Wales Informatics Service (NWIS) may monitor and record all emails for viruses and inappropriate use. This e-mail and any subsequent replies or attachments may be subject to the Freedom of Information Act. The views expressed in this email are those of the sender and not necessarily of NWIS. ---
RE: Exact Match first in the list.
Thanks Shawn, I am already using boosting, but the OR condition works for me as you mentioned. One question: if I search the field (TAGs) without parentheses it returns a lot of results, but if I try with parentheses, something like (TAGs), it returns fewer. Why do the parentheses change the results? They don't take the exact match..? Let me know if I am missing something. Thanks -Original Message- From: Shawn Heisey [mailto:s...@elyograg.org] Sent: Monday, July 07, 2014 8:22 PM To: solr-user@lucene.apache.org Subject: Re: Exact Match first in the list. Hi, I have a situation where I am applying the search rules below. When I search columns for the full-text search "Product Variant Name", the exact match has to be first in the list, and other matches, like product or variant or name or any combination, will be next in the results. Any thoughts on which analyzer or tokenizer or filter I need to use? This is more a matter of boosting than analysis. If you are using edismax, this is particularly easy. Just put large boost values on the fields in the pf parameter, and you'd likely want to use the same field list as the qf parameter. If you are not using edismax and can construct such a query yourself, you can boost the phrase over the individual terms. Here's a sample query: "Product Variant Name"^10 OR (Product Variant Name) This is essentially what edismax will do with a boost on the pf values, except that it will work with more than one field. The edismax parser is a wonderful creation. Thanks, Shawn
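Shawn's edismax suggestion can be sketched as a solrconfig.xml request-handler definition — the field names and boost values below are made up for illustration:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- qf: fields matched term-by-term; pf: the same fields, boosted
         heavily when the whole query matches as a phrase -->
    <str name="qf">product_name^2 description</str>
    <str name="pf">product_name^100 description^10</str>
  </lst>
</requestHandler>
```

With this in place, a query like q=Product Variant Name ranks documents containing the exact phrase above documents that merely contain the individual terms, which is the "exact match first" behavior being asked for.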
Re: I need a replacement for the QueryElevation Component
You can sponsor more than one document per keyword: <query text="AAA"> <doc id="A" /> <doc id="B" /> </query> And you might want to try <str name="queryFieldType">string</str> instead of another field type. I found that text fields removed whitespace and concatenated the tokens. Not sure if this is intended or not. -- View this message in context: http://lucene.472066.n3.nabble.com/I-need-a-replacement-for-the-QueryElevation-Component-tp4146077p4146090.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Slow inserts when using Solr Cloud
Updates are currently done locally before concurrently being sent to all replicas - so on a single update, you can expect 2x just from that. As for your results, it sounds like perhaps there is more overhead than we would like in the code that sends to replicas and forwards updates? Someone would have to dig in to really know I think. I would doubt it’s a configuration issue, but you never know. -- Mark Miller about.me/markrmiller On July 8, 2014 at 9:18:28 AM, Ian Williams (NWIS - Applications Design) (ian.willi...@wales.nhs.uk) wrote: Hi I'm encountering a surprisingly high increase in response times when I insert new documents into a SolrCloud, compared with a standalone Solr instance. I have a SolrCloud set up for test and evaluation purposes. I have four shards, each with a leader and a replica, distributed over four Windows virtual servers. I have zookeeper running on three of the four servers. There are not many documents in my SolrCloud (just a few hundred). I am using composite id routing, specifying a prefix to my document ids which is then used by Solr to determine which shard the document should be stored on. I determine in advance which shard a document with a given id prefix will end up in, by trying it out in advance. I then try the following scenarios, using inserts without commits. E.g. I use: curl http://servername:port/solr/update -H Content-Type: text/xml --data-binary @test.txt 1. Insert a document, sending it to the server hosting the correct shard, with replicas turned off (response time 20ms) I find that if I 'switch off' the replicas for my shard (by shutting down Solr for the replicas), and then I send the new document to the server hosting the leader for the correct shard, then I get a very fast response, i.e. under 10ms, which is similar to the performance I get when not using SolrCloud. This is expected, as I've removed any overhead to do with replicas or routing to the correct shard. 2. 
Insert a document, sending it to the server hosting the correct shard, but with replicas turned on (response time approx 250ms) If I switch on the replica for that shard, then my average response time for an insert increases from 10ms to around 250ms. Now I expect an overhead, because the leader has to find out where the replica is (from Zookeeper?) and then forward the request to that replica, then wait for a reply - but an increase from 20ms to 250ms seems very high? 3. Insert a document, sending it to a server hosting the incorrect shard, with replicas turned on (response time approx 500ms) If I do the same thing again but this time send to the server hosting a different shard to the shard my document will end up in, the average response times increase again to around 500ms. Again, I'd expect an increase because of the extra step of needing to forward to the correct shard, but the increase seems very high? Should I expect this much of an overhead for shard routing and replicas, or might this indicate a problem in my configuration? Many thanks Ian
Re: Parallel optimize of index on SolrCloud.
You probably do not need to force merge (mistakenly called optimize) your index. Solr does automatic merges, which work just fine. There are only a few situations where a forced merge is even a good idea. The most common one is a replicated (non-cloud) setup with a full reindex every night. If you use SolrCloud, I cannot think of a situation where you would want a forced merge. wunder On Jul 8, 2014, at 2:01 AM, Modassar Ather modather1...@gmail.com wrote: Hi, I need to optimize an index created using the CloudSolrServer API under a SolrCloud setup of 3 instances on separate machines. Currently it optimizes sequentially if I invoke cloudSolrServer.optimize(). To make it parallel I tried making three separate HttpSolrServer instances and invoked httpSolrServer.optimize() on them in parallel, but it still seems to do the optimization sequentially. I tried invoking optimize directly using HttpPost with the following URL and parameters, but it still seems to be sequential. *URL*: http://host:port/solr/collection/update *Parameters*: params.add(new BasicNameValuePair("optimize", "true")); params.add(new BasicNameValuePair("maxSegments", "1")); params.add(new BasicNameValuePair("waitFlush", "true")); params.add(new BasicNameValuePair("distrib", "false")); Kindly provide your suggestion and help. Regards, Modassar
Re: Transparently rebalancing a Solr cluster without splitting or moving shards
Thanks for your suggestions and recommendations. If I understand correctly, the MIGRATE command does shard splitting (around the range of the split.key) and merging behind the scenes. Though it's a bit difficult to properly monitor the actual migration, set the proper timeouts, know when to direct indexing and search traffic to the destination collection, etc. Not sure how to MIGRATE an entire collection. By providing the full list of split.keys? I'd be surprised if that was doable, but I guess it would skip the splitting part, which makes it easier ;-) Or much tougher, by splitting around all the ranges. More seriously, doing a MERGEINDEXES at the core level might not be a bad alternative, provided the hash ranges are compatible. Damien On 07/07/2014 05:14 PM, Shawn Heisey wrote: I don't think you'd want to disable mmap. It could be done, by choosing another DirectoryFactory object. Adding memory is likely to be the only sane way forward. Another possibility would be to bump up the maxShardsPerNode value and build the new collection (with the proper number of shards) only on the new machines... Then when they are built, move them to their proper homes and manually adjust the cluster state in ZooKeeper. This will still generate a lot of I/O, but hopefully it will last for less time on the wall clock, and it will be something you can do when load is low. After that's done and you've switched to it, you can add replicas with either the ADDREPLICA collections API or the core admin API. You should be on the newest Solr version... Lots of bugs have been found and fixed. One thing I wonder is whether the MIGRATE API can be used on an entire collection. It says it works by shard key, but I suspect that most users will not be using that functionality. Thanks, Shawn
SolrCloud delete replica
Hi, I have an issue regarding collection deletion. When a Solr node is down and I delete a collection, everything seems fine and the collection is deleted from the cluster state too. But when the dead node comes back, it registers the collection again. Even when I delete the replica with the DELETEREPLICA collection API, the core inside the dead node starts to push the collection back into clusterstate.json. What is the source of truth for SolrCloud: ZooKeeper, the Solr node, or the leader? Is there a way to unload or delete the core on the down node after it becomes active? Thanks
Re: Changing default behavior of solr for overwrite the whole document on uniquekey duplication
I think you are misunderstanding what Himanshu is suggesting to you. You don't need to make lots of big changes to the internals of Solr's code to get what you want -- instead you can leverage the Atomic Updates and Optimistic Concurrency features of Solr to get the existing Solr internals to reject any attempt to add a duplicate document unless the client code sending the document specifies it should be an update. This means your client code needs to be a bit more sophisticated, but the benefit is that you don't have to try to make complex changes to the internals of Solr that may be impossible and/or difficult to support/upgrade later. More details... https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-OptimisticConcurrency Simplest possible idea based on the basic info you have given so far... 1) send every doc using _version_=-1 2a) if the doc update fails with error 409, that means a version of this doc already exists 2b) resend just the field changes (using the "set" atomic operation) and specify _version_=1 : Dear Himanshu, : Hi, : You misunderstood what I meant. I am not going to update some field. I am : going to change what Solr does on duplication of the uniqueKey field. I don't want : Solr to overwrite the whole document; I just want to overwrite some parts of the : document. This situation does not come from the user side; this is what Solr does : to documents with a duplicated uniqueKey. : Regards. : : : On Tue, Jul 8, 2014 at 12:29 PM, Himanshu Mehrotra : himanshu.mehro...@snapdeal.com wrote: : : Please look at https://wiki.apache.org/solr/Atomic_Updates : : This does what you want: just update the relevant fields. : : Thanks, : Himanshu : : : On Tue, Jul 8, 2014 at 1:09 PM, Ali Nazemian alinazem...@gmail.com : wrote: : : Dears, : Hi, : According to my requirement I need to change the default behavior of Solr : for overwriting the whole document on unique-key duplication.
I am going : to : change that the overwrite just part of document (some fields) and other : parts of document (other fields) remain unchanged. First of all I need to : know such changing in Solr behavior is possible? Second, I really : appreciate if you can guide me through what class/classes should I : consider : for changing that? : Best regards. : : -- : A.Nazemian : : : : : : -- : A.Nazemian : -Hoss http://www.lucidworks.com/
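Hoss's two-step flow can be sketched as XML update messages — the document id and field names below are hypothetical, while the _version_ semantics (-1 = "only if the doc does NOT exist", 1 = "only if it DOES exist") and the update="set" atomic operation are the documented Solr behavior:

```xml
<!-- Step 1: plain add; _version_=-1 makes Solr reject it with HTTP 409
     if a document with this uniqueKey already exists -->
<add>
  <doc>
    <field name="id">doc1</field>
    <field name="title">first version</field>
    <field name="_version_">-1</field>
  </doc>
</add>

<!-- Step 2 (only on a 409): resend just the changed fields as atomic
     "set" operations; untouched fields keep their existing values -->
<add>
  <doc>
    <field name="id">doc1</field>
    <field name="title" update="set">new title</field>
    <field name="_version_">1</field>
  </doc>
</add>
```

Note that atomic updates require the unchanged fields to be stored (or otherwise recoverable), since Solr rebuilds the full document internally when applying the "set" operations.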
Hyphen in search keyword
I have the config below for the field type text_general. But when I search with a keyword, e.g. 100-001, I get 100-001 plus records starting with 100 and ending with 001. I want to treat '-' as an ordinary character, not a split point. <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <charFilter class="solr.HTMLStripCharFilterFactory" /> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" /> <filter class="solr.PorterStemFilterFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/> <filter class="solr.LowerCaseFilterFactory"/> </analyzer> <analyzer type="query"> <charFilter class="solr.HTMLStripCharFilterFactory" /> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.PorterStemFilterFactory"/> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" /> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/> </analyzer> </fieldType> Thanks Ravi
Re: Hyphen in search keyword
The word delimiter filter has a "types" parameter where you specify a file that can map hyphen to alpha or numeric. There is an example in my e-book. -- Jack Krupansky -Original Message- From: EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) Sent: Tuesday, July 8, 2014 2:18 PM To: solr-user@lucene.apache.org Subject: Hyphen in search keyword I have the below config for the field type text_general. But then I search with keyword e.g 100-001, it get 100-001, 100 in starting records ending with 001. I want to treat - as another character not to split. [fieldType definition quoted above] Thanks Ravi
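A sketch of the "types" mapping Jack describes — the filename wdfftypes.txt is arbitrary, while the types attribute and the character-class mapping syntax belong to WordDelimiterFilterFactory:

```xml
<!-- wdfftypes.txt (any name works) would contain the single mapping:
       - => ALPHA
     so the hyphen is classified as a letter and "100-001" is no longer
     split at the hyphen. The same file must be referenced in BOTH the
     index and query analyzers so the two stay consistent. -->
<filter class="solr.WordDelimiterFilterFactory" types="wdfftypes.txt"
        generateWordParts="0" generateNumberParts="0"
        catenateWords="1" catenateNumbers="1" catenateAll="0"/>
```

After changing the analysis chain, the field must be reindexed for existing documents to pick up the new tokenization.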
Re: Solr irregularly having QTime 50000ms, stracing solr cures the problem
On 7/8/2014 1:53 AM, Harald Kirsch wrote: Hi all, This is what happens when I run a regular wget query to log the current number of documents indexed: 2014-07-08:07:23:28 QTime=20 numFound=5720168 2014-07-08:07:24:28 QTime=12 numFound=5721126 2014-07-08:07:25:28 QTime=19 numFound=5721126 2014-07-08:07:27:18 QTime=50071 numFound=5721126 2014-07-08:07:29:08 QTime=50058 numFound=5724494 2014-07-08:07:30:58 QTime=50033 numFound=5730710 2014-07-08:07:31:58 QTime=13 numFound=5730710 2014-07-08:07:33:48 QTime=50065 numFound=5734069 2014-07-08:07:34:48 QTime=16 numFound=5737742 2014-07-08:07:36:38 QTime=50037 numFound=5737742 2014-07-08:07:37:38 QTime=12 numFound=5738190 2014-07-08:07:38:38 QTime=23 numFound=5741208 2014-07-08:07:40:29 QTime=50034 numFound=5742067 2014-07-08:07:41:29 QTime=12 numFound=5742067 2014-07-08:07:42:29 QTime=17 numFound=5742067 2014-07-08:07:43:29 QTime=20 numFound=5745497 2014-07-08:07:44:29 QTime=13 numFound=5745981 2014-07-08:07:45:29 QTime=23 numFound=5746420 As you can see, the QTime is just over 50 seconds at irregular intervals. This happens independent of whether I am indexing documents with around 20 dps or not. First I thought about a dependence on the auto-commit of 5 minutes, but the 50-second hits are too irregular. Furthermore, and this is *really strange*: when hooking strace on the solr process, the 50 seconds QTimes disappear completely and consistently --- a real Heisenbug. Nevertheless, strace shows that there is a socket timeout of 50 seconds defined in calls like this: [pid 1253] 09:09:37.857413 poll([{fd=96, events=POLLIN|POLLERR}], 1, 5) = 1 ([{fd=96, revents=POLLIN}]) 0.40 where the fd=96 is the result of [pid 25446] 09:09:37.855235 accept(122, {sa_family=AF_INET, sin_port=htons(57236), sin_addr=inet_addr(ip address of local host)}, [16]) = 96 0.54 where again fd=122 is the TCP port on which solr was started. My hunch is that this is communication between the cores of solr.
I tried to search the internet for such a strange connection between socket timeouts and strace, but could not find anything (the Stack Overflow entry from yesterday is my own :-( ). This smells a bit like a race condition/deadlock kind of thing which is broken up by timing differences introduced by stracing the process. Any hints appreciated. For completeness, here is my setup: - solr-4.8.1, - cloud version running - 10 shards on 10 cores in one instance - hosted on SUSE Linux Enterprise Server 11 (x86_64), VERSION 11, PATCHLEVEL 2 - hosted on a vmware, 4 CPU cores, 16 GB RAM - single digit million docs indexed, exact number does not matter - zero query load Long GC pauses would also be my first guess. DNS problems on the inter-server communication for SolrCloud would be a second guess. If it's not one of these, then I really have no idea. http://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems http://serverfault.com/questions/339791/5-second-resolving-delay Thanks, Shawn
Re: Solr irregularly having QTime 50000ms, stracing solr cures the problem
Local disks or shared network disks? --wunder On Jul 8, 2014, at 11:43 AM, Shawn Heisey s...@elyograg.org wrote: [...]
SOLR Talk at AOL Dulles Campus.
All, There is a tech talk on AOL Dulles campus tomorrow. Do swing by if you can and share it with your colleagues and friends. www.meetup.com/Code-Brew/events/192361672/ There will be free food and beer served at this event :) Thanks, Rishi.
RE: [Solr Schema API] SolrJ Access
Alessandro, I just got this to work myself:

public static final String DEFINED_FIELDS_API = "/schema/fields";
public static final String DYNAMIC_FIELDS_API = "/schema/dynamicfields";
...
// just get a connection to Solr as usual (the factory is mine - it will use
// CloudSolrServer or HttpSolrServer depending on if we're using SolrCloud or not)
SolrClient client = SolrClientFactory.getSolrClientInstance(CLOUD_ENABLED);
SolrServer solrConn = client.getConnection(SOLR_URL, collection);
SolrQuery query = new SolrQuery();
if (dynamicFields)
    query.setRequestHandler(DYNAMIC_FIELDS_API);
else
    query.setRequestHandler(DEFINED_FIELDS_API);
query.setParam("showDefaults", true);
QueryResponse response = solrConn.query(query);

Then you've got to parse the response using NamedList etc. etc.

-Original Message- From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com] Sent: Tuesday, July 08, 2014 5:54 AM To: solr-user@lucene.apache.org Subject: [Solr Schema API] SolrJ Access

Hi guys, wondering if there is any proper way to access the Schema API via SolrJ. Of course it is possible to reach them in Java with a specific Http Request, but in this way, using SolrCloud for example, we become coupled to one specific instance (and we don't want that). Code example:

HttpResponse httpResponse;
String url = this.solrBase + "/" + core + SCHEMA_SOLR_FIELDS_ENDPOINT + fieldName;
HttpPut httpPut = new HttpPut(url);
StringEntity entity = new StringEntity(
    "{\"type\":\"text_general\",\"stored\":\"true\"}",
    ContentType.APPLICATION_JSON);
httpPut.setEntity(entity);
HttpClient client = new DefaultHttpClient();
response = client.execute(httpPut);

Any suggestions? In my opinion it would be interesting to have some auxiliary method in SolrServer if it's not there yet. Cheers -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
Solr atomic updates question
Solr atomic update allows for changing only one or more fields of a document without having to re-index the entire document. But what about the case where I am sending in the entire document? In that case the whole document will be re-indexed anyway, right? So I assume that there will be no saving. I am actually thinking that there will be a performance penalty, since atomic update requires Solr to first retrieve all the fields before updating. Bill
What does the getSearcher method of SolrQueryRequest mean?
Hello there, I'm using a project named LIRE for image retrieval based on the Solr platform. There is part of the code which I can't understand, so maybe you could help me. The project implements a request handler named lireq:

public class LireRequestHandler extends RequestHandlerBase

The search method in this handler is computed from a Lucene search + reranking. The first part goes like this:

public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
    ...
    BooleanQuery query = new BooleanQuery();
    for (int i = 0; i < numHashes; i++) {
        query.add(new BooleanClause(new TermQuery(new Term(paramField,
                Integer.toHexString(hashes[i]))), BooleanClause.Occur.SHOULD));
    }
    SolrIndexSearcher searcher = req.getSearcher();
    TopDocs docs = searcher.search(query, candidateResultNumber);
Re: Solr irregularly having QTime 50000ms, stracing solr cures the problem
Sure sounds like a socket bug, doesn't it? I turn to tcpdump when Solr starts behaving strangely in a socket-related way. Knowing exactly what's happening at the transport level is worth a month of guessing and poking.

On Jul 8, 2014, at 3:53 AM, Harald Kirsch harald.kir...@raytion.com wrote:

Hi all, This is what happens when I run a regular wget query to log the current number of documents indexed:

2014-07-08:07:23:28 QTime=20 numFound=5720168
2014-07-08:07:24:28 QTime=12 numFound=5721126
2014-07-08:07:25:28 QTime=19 numFound=5721126
2014-07-08:07:27:18 QTime=50071 numFound=5721126
2014-07-08:07:29:08 QTime=50058 numFound=5724494
2014-07-08:07:30:58 QTime=50033 numFound=5730710
2014-07-08:07:31:58 QTime=13 numFound=5730710
2014-07-08:07:33:48 QTime=50065 numFound=5734069
2014-07-08:07:34:48 QTime=16 numFound=5737742
2014-07-08:07:36:38 QTime=50037 numFound=5737742
2014-07-08:07:37:38 QTime=12 numFound=5738190
2014-07-08:07:38:38 QTime=23 numFound=5741208
2014-07-08:07:40:29 QTime=50034 numFound=5742067
2014-07-08:07:41:29 QTime=12 numFound=5742067
2014-07-08:07:42:29 QTime=17 numFound=5742067
2014-07-08:07:43:29 QTime=20 numFound=5745497
2014-07-08:07:44:29 QTime=13 numFound=5745981
2014-07-08:07:45:29 QTime=23 numFound=5746420

As you can see, the QTime is just over 50 seconds at irregular intervals. This happens independent of whether I am indexing documents with around 20 dps or not. First I thought about a dependence on the auto-commit of 5 minutes, but the 50-second hits are too irregular. Furthermore, and this is *really strange*: when hooking strace on the solr process, the 50-second QTimes disappear completely and consistently --- a real Heisenbug.

Nevertheless, strace shows that there is a socket timeout of 50 seconds defined in calls like this:

[pid 1253] 09:09:37.857413 poll([{fd=96, events=POLLIN|POLLERR}], 1, 5) = 1 ([{fd=96, revents=POLLIN}]) 0.40

where the fd=96 is the result of

[pid 25446] 09:09:37.855235 accept(122, {sa_family=AF_INET, sin_port=htons(57236), sin_addr=inet_addr(ip address of local host)}, [16]) = 96 0.54

where again fd=122 is the TCP port on which solr was started. My hunch is that this is communication between the cores of solr. I tried to search the internet for such a strange connection between socket timeouts and strace, but could not find anything (the stackoverflow entry from yesterday is my own :-( ). This smells a bit like a race condition/deadlock kind of thing which is broken up by timing differences introduced by stracing the process. Any hints appreciated. For completeness, here is my setup:

- solr-4.8.1,
- cloud version running
- 10 shards on 10 cores in one instance
- hosted on SUSE Linux Enterprise Server 11 (x86_64), VERSION 11, PATCHLEVEL 2
- hosted on a vmware, 4 CPU cores, 16 GB RAM
- single digit million docs indexed, exact number does not matter
- zero query load

Harald.
Re: What does the getSearcher method of SolrQueryRequest mean?
(Sorry - my mail was sent half ready) hashes is an array of hash values generated somehow from the image. So my question is: what is the query being done in this part? I tried to reconstruct it on my own, by constructing a select query with the hash values separated by OR, but the results were different. Can anyone tell me why? This is where the source code is: http://code.google.com/p/lire/

On Wed, Jul 9, 2014 at 1:29 AM, Yossi Biton yossibi...@gmail.com wrote:

Hello there, I'm using a project named LIRE for image retrieval based on the Solr platform. There is part of the code which I can't understand, so maybe you could help me. The project implements a request handler named lireq:

public class LireRequestHandler extends RequestHandlerBase

The search method in this handler is computed from a Lucene search + reranking. The first part goes like this:

public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
    ...
    BooleanQuery query = new BooleanQuery();
    for (int i = 0; i < numHashes; i++) {
        query.add(new BooleanClause(new TermQuery(new Term(paramField,
                Integer.toHexString(hashes[i]))), BooleanClause.Occur.SHOULD));
    }
    SolrIndexSearcher searcher = req.getSearcher();
    TopDocs docs = searcher.search(query, candidateResultNumber);

-- יוסי
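[Editor's note: a sketch, not from the thread.] The loop in the handler ORs together one TermQuery per hex-encoded hash, so a hand-built select query has to encode each hash the way Java's Integer.toHexString does: lowercase hex of the value treated as an unsigned 32-bit int. The field name and hash values below are placeholders:

```python
def hash_query(field, hashes):
    """Build the Lucene query string equivalent to the handler's BooleanQuery:
    one term per hash, hex-encoded like Java's Integer.toHexString (unsigned
    32-bit, lowercase), OR-ed together with SHOULD semantics."""
    terms = ["%s:%s" % (field, format(h & 0xFFFFFFFF, "x")) for h in hashes]
    return " OR ".join(terms)

# Placeholder field name and hash values, for illustration only.
print(hash_query("hashes_field", [255, 4096]))
# -> hashes_field:ff OR hashes_field:1000
```

If a manually constructed OR query returns different results, one thing worth checking is that the hex encoding matches this unsigned lowercase form (e.g. negative Java ints must not come out with a minus sign).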
Re: Solr atomic updates question
Atomic updates fetch the doc with RealTimeGet, apply the updates to the fetched doc, then reindex. Whether you use atomic updates or send the entire doc to Solr, it has to deleteById then add. The perf difference between the atomic updates and normal updates is likely minimal. Atomic updates are for when you have changes and want to apply them to a document without affecting the other fields. A regular add will replace an existing document completely. AFAIK Solr will let you mix atomic updates with regular field values, but I don't think it's a good idea. Steve On Jul 8, 2014, at 5:30 PM, Bill Au bill.w...@gmail.com wrote: Solr atomic update allows for changing only one or more fields of a document without having to re-index the entire document. But what about the case where I am sending in the entire document? In that case the whole document will be re-indexed anyway, right? So I assume that there will be no saving. I am actually thinking that there will be a performance penalty since atomic update requires Solr to first retrieve all the fields first before updating. Bill
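[Editor's note: an illustrative sketch, not from the thread.] The set/add modifiers Steve describes appear in the update payload as nested objects per field. Assuming hypothetical field names, an atomic-update JSON body POSTed to /update (Content-type: application/json) would look roughly like this:

```python
import json

# Hypothetical document: only "office" and "skills" are touched; other
# stored fields on the existing doc are left unchanged by the atomic update.
doc = {
    "id": "05991",
    "office": {"set": "Walla Walla"},  # "set" replaces the field's value
    "skills": {"add": "Python"},       # "add" appends to a multi-valued field
}
payload = json.dumps([doc])  # /update accepts a JSON array of documents
print(payload)
```

Under the hood this still triggers the fetch-modify-reindex cycle described above; the atomicity is about not clobbering the untouched fields, not about skipping the reindex.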
Re: Solr atomic updates question
Thanks for that under-the-cover explanation. I am not sure what you mean by mix atomic updates with regular field values. Can you give an example? Thanks. Bill On Tue, Jul 8, 2014 at 6:56 PM, Steve McKay st...@b.abbies.us wrote: Atomic updates fetch the doc with RealTimeGet, apply the updates to the fetched doc, then reindex. Whether you use atomic updates or send the entire doc to Solr, it has to deleteById then add. The perf difference between the atomic updates and normal updates is likely minimal. Atomic updates are for when you have changes and want to apply them to a document without affecting the other fields. A regular add will replace an existing document completely. AFAIK Solr will let you mix atomic updates with regular field values, but I don't think it's a good idea. Steve On Jul 8, 2014, at 5:30 PM, Bill Au bill.w...@gmail.com wrote: Solr atomic update allows for changing only one or more fields of a document without having to re-index the entire document. But what about the case where I am sending in the entire document? In that case the whole document will be re-indexed anyway, right? So I assume that there will be no saving. I am actually thinking that there will be a performance penalty since atomic update requires Solr to first retrieve all the fields first before updating. Bill
Re: Solr atomic updates question
Take a look at this update XML:

<add>
  <doc>
    <field name="employeeId">05991</field>
    <field name="employeeName">Steve McKay</field>
    <field name="office" update="set">Walla Walla</field>
    <field name="skills" update="add">Python</field>
  </doc>
</add>

Let's say employeeId is the key. If there's a fourth field, salary, on the existing doc, should it be deleted or retained? With this update it will obviously be deleted:

<add>
  <doc>
    <field name="employeeId">05991</field>
    <field name="employeeName">Steve McKay</field>
  </doc>
</add>

With this XML it will be retained:

<add>
  <doc>
    <field name="employeeId">05991</field>
    <field name="office" update="set">Walla Walla</field>
    <field name="skills" update="add">Python</field>
  </doc>
</add>

I'm not willing to guess what will happen in the case where non-atomic and atomic updates are present on the same add because I haven't looked at that code since 4.0, but I think I could make a case for retaining salary or for discarding it. That by itself reeks--and it's also not well documented. Relying on iffy, poorly-documented behavior is asking for pain at upgrade time.

Steve

On Jul 8, 2014, at 7:02 PM, Bill Au bill.w...@gmail.com wrote: Thanks for that under-the-cover explanation. I am not sure what you mean by mix atomic updates with regular field values. Can you give an example? Thanks. Bill On Tue, Jul 8, 2014 at 6:56 PM, Steve McKay st...@b.abbies.us wrote: Atomic updates fetch the doc with RealTimeGet, apply the updates to the fetched doc, then reindex. Whether you use atomic updates or send the entire doc to Solr, it has to deleteById then add. The perf difference between the atomic updates and normal updates is likely minimal. Atomic updates are for when you have changes and want to apply them to a document without affecting the other fields. A regular add will replace an existing document completely. AFAIK Solr will let you mix atomic updates with regular field values, but I don't think it's a good idea.

Steve On Jul 8, 2014, at 5:30 PM, Bill Au bill.w...@gmail.com wrote: Solr atomic update allows for changing only one or more fields of a document without having to re-index the entire document. But what about the case where I am sending in the entire document? In that case the whole document will be re-indexed anyway, right? So I assume that there will be no saving. I am actually thinking that there will be a performance penalty since atomic update requires Solr to first retrieve all the fields before updating. Bill
Re: Solr atomic updates question
I see what you mean now. Thanks for the example. It makes things very clear. I have been thinking about the explanation in the original response more. According to that, both a regular update with the entire doc and an atomic update involve a delete by id followed by an add. But the Solr reference doc (https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents) says that: The first is *atomic updates*. This approach allows changing only one or more fields of a document without having to re-index the entire document. But since Solr is doing a delete by id followed by an add, does "without having to re-index the entire document" apply to the client side only? On the server side the add means that the entire document is re-indexed, right? Bill

On Tue, Jul 8, 2014 at 7:32 PM, Steve McKay st...@b.abbies.us wrote: Take a look at this update XML:

<add>
  <doc>
    <field name="employeeId">05991</field>
    <field name="employeeName">Steve McKay</field>
    <field name="office" update="set">Walla Walla</field>
    <field name="skills" update="add">Python</field>
  </doc>
</add>

Let's say employeeId is the key. If there's a fourth field, salary, on the existing doc, should it be deleted or retained? With this update it will obviously be deleted:

<add>
  <doc>
    <field name="employeeId">05991</field>
    <field name="employeeName">Steve McKay</field>
  </doc>
</add>

With this XML it will be retained:

<add>
  <doc>
    <field name="employeeId">05991</field>
    <field name="office" update="set">Walla Walla</field>
    <field name="skills" update="add">Python</field>
  </doc>
</add>

I'm not willing to guess what will happen in the case where non-atomic and atomic updates are present on the same add because I haven't looked at that code since 4.0, but I think I could make a case for retaining salary or for discarding it. That by itself reeks--and it's also not well documented. Relying on iffy, poorly-documented behavior is asking for pain at upgrade time.

Steve

On Jul 8, 2014, at 7:02 PM, Bill Au bill.w...@gmail.com wrote: Thanks for that under-the-cover explanation. I am not sure what you mean by mix atomic updates with regular field values. Can you give an example? Thanks. Bill On Tue, Jul 8, 2014 at 6:56 PM, Steve McKay st...@b.abbies.us wrote: Atomic updates fetch the doc with RealTimeGet, apply the updates to the fetched doc, then reindex. Whether you use atomic updates or send the entire doc to Solr, it has to deleteById then add. The perf difference between the atomic updates and normal updates is likely minimal. Atomic updates are for when you have changes and want to apply them to a document without affecting the other fields. A regular add will replace an existing document completely. AFAIK Solr will let you mix atomic updates with regular field values, but I don't think it's a good idea. Steve On Jul 8, 2014, at 5:30 PM, Bill Au bill.w...@gmail.com wrote: Solr atomic update allows for changing only one or more fields of a document without having to re-index the entire document. But what about the case where I am sending in the entire document? In that case the whole document will be re-indexed anyway, right? So I assume that there will be no saving. I am actually thinking that there will be a performance penalty since atomic update requires Solr to first retrieve all the fields before updating. Bill
fix wiki error
The url for solr atomic update documentation should contain json in the end. Here is the page - https://wiki.apache.org/solr/UpdateJSON#Solr_4.0_Example curl http://localhost:8983/solr/update/*json* -H 'Content-type:application/json'
Re: fix wiki error
Why do you think so? As of Solr 4, the CSV and JSON handlers have been unified in the general update handler, and /update/json is there for legacy reasons. The example should work. If it is not working for you, it might be for a different reason. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Wed, Jul 9, 2014 at 9:56 AM, Susmit Shukla shukla.sus...@gmail.com wrote: The url for solr atomic update documentation should contain json in the end. Here is the page - https://wiki.apache.org/solr/UpdateJSON#Solr_4.0_Example curl http://localhost:8983/solr/update/*json* -H 'Content-type:application/json'
Add a new replica to SolrCloud
Hi, I am currently using Solr 4.7.2 and have a SolrCloud setup running on 2 servers with number of shards as 2, replication factor as 2 and max shards per node as 4. Now, I want to add another server to the SolrCloud as a replica. I can see the Collection API to add a new replica, but that was added in Solr 4.8. Is there some way to add a new replica in Solr 4.7.2? -- Thanks Varun Gupta
Synchronising two masters
Hi, Our Solr setup consists of 2 Masters and 2 Slaves. The slaves would point to any one of the Masters through a load balancer and replicate the data. Master1 (M1) is the primary indexer. I send data to M1. In case M1 fails, I have a failover master, M2, and that would be indexing the data. The problem is, once Master1 comes up, how to synchronize M1 and M2? SolrCloud would be the option rather than going with this setup. But, currently we want it to be implemented in Master-Slave mode. Any suggestions? Thanks, Prasi
Re: Parallel optimize of index on SolrCloud.
Thanks Walter for your inputs. Our use case and performance benchmark requires us to invoke optimize. Here we see a chance of improvement in performance of optimize() if invoked in parallel. I found that if *distrib=false* is used, the optimization will happen in parallel. But I could not find a way to set it using HttpSolrServer/CloudSolrServer. Also the parameter setting as given in my mail above does not seem to work. Please let me know in what ways I can achieve the parallel optimize on SolrCloud. Thanks, Modassar

On Tue, Jul 8, 2014 at 7:53 PM, Walter Underwood wun...@wunderwood.org wrote: You probably do not need to force merge (mistakenly called optimize) your index. Solr does automatic merges, which work just fine. There are only a few situations where a forced merge is even a good idea. The most common one is a replicated (non-cloud) setup with a full reindex every night. If you need Solr Cloud, I cannot think of a situation where you would want a forced merge. wunder

On Jul 8, 2014, at 2:01 AM, Modassar Ather modather1...@gmail.com wrote: Hi, Need to optimize an index created using CloudSolrServer APIs under a SolrCloud setup of 3 instances on separate machines. Currently it optimizes sequentially if I invoke cloudSolrServer.optimize(). To make it parallel I tried making three separate HttpSolrServer instances and invoked httpSolrServer.optimize() on them in parallel, but it still seems to be doing the optimization sequentially. I tried invoking optimize directly using HttpPost with the following URL and parameters, but it still seems to be sequential.

*URL*: http://host:port/solr/collection/update
*Parameters*:
params.add(new BasicNameValuePair("optimize", "true"));
params.add(new BasicNameValuePair("maxSegments", "1"));
params.add(new BasicNameValuePair("waitFlush", "true"));
params.add(new BasicNameValuePair("distrib", "false"));

Kindly provide your suggestion and help. Regards, Modassar
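[Editor's note: a sketch of the approach discussed above, not a tested recipe.] With distrib=false, the optimize request is addressed to one core at a time, so one request per shard core can be built and then fired concurrently (e.g. one thread per core). Host and core names below are placeholders; this only constructs the URLs:

```python
from urllib.parse import urlencode

def optimize_url(host, core):
    """Build a per-core optimize request carrying distrib=false, mirroring the
    parameters from the mail above, as query parameters on /update."""
    params = urlencode({
        "optimize": "true",
        "maxSegments": "1",
        "waitFlush": "true",
        "distrib": "false",   # keep the request local to this core
    })
    return "http://%s/solr/%s/update?%s" % (host, core, params)

# Placeholder core names, one per shard.
cores = ["collection1_shard1_replica1", "collection1_shard2_replica1"]
for url in (optimize_url("host:8983", c) for c in cores):
    print(url)
```

Each URL would then be issued by a separate HttpSolrServer (or plain HTTP client) so the merges run concurrently rather than through the collection-level endpoint.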
Planning ahead for Solr Cloud and Scaling
I'm working on a product hosted with AWS that uses Elastic Beanstalk auto-scaling to good effect and we are trying to set up similar (more or less) runtime scaling support with Solr. I think I understand how to set this up, and wanted to check I was on the right track. We currently run 3 cores on a single host / Solr server / shard. This is just fine for now, and we have overhead for the near future. However, I need to have a plan, and then test, for a higher capacity future. 1) I gather that if I set up SolrCloud, and then later load increases, I can spin up a second host / Solr server, create a new shard, and then split the first shard: https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3 And doing this, we no longer have to commit to shards out of the gate. 2) I'm not clear whether there's a big advantage splitting up the cores or not. Two of the three cores will have about the same number of documents, though only one contains large amounts of text. The third core is much smaller in both bytes and documents (2 orders of magnitude). 3) We are also looking at moving multi-lingual. The current plan is to store the localized text in fields within the same core. The languages will be added over time. We can update the schema (as each will be optional). This seems easier than adding a core for each language. Is there a downside? Thanks for any pointers.
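[Editor's note: an illustrative sketch for item 1 above.] The shard split mentioned there is a Collections API call with action=SPLITSHARD. Host, collection, and shard names below are placeholders; this only builds the request URL:

```python
from urllib.parse import urlencode

def splitshard_url(host, collection, shard):
    """Build the Collections API request that splits an existing shard in two,
    after which the sub-shards can be moved onto the newly added node."""
    params = urlencode({
        "action": "SPLITSHARD",
        "collection": collection,
        "shard": shard,
    })
    return "http://%s/solr/admin/collections?%s" % (host, params)

# Placeholder names, for illustration only.
print(splitshard_url("host:8983", "products", "shard1"))
```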
Re: Add a new replica to SolrCloud
Yes, you can just call a Core Admin CREATE on the new node with the collection name and optionally the shard name. On Wed, Jul 9, 2014 at 9:46 AM, Varun Gupta varun.vgu...@gmail.com wrote: Hi, I am currently using Solr 4.7.2 and have a SolrCloud setup running on 2 servers with number of shards as 2, replication factor as 2 and max shards per node as 4. Now, I want to add another server to the SolrCloud as a replica. I can see the Collection API to add a new replica, but that was added in Solr 4.8. Is there some way to add a new replica in Solr 4.7.2? -- Thanks Varun Gupta -- Regards, Shalin Shekhar Mangar.
Re: Add a new replica to SolrCloud
Yes, there is a way. On the node on which the replica needs to be created, hit:

curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=corename&collection=collectionname&shard=shardid'

For example:

curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=mycore&collection=collection1&shard=shard2'

See http://wiki.apache.org/solr/SolrCloud#Creating_cores_via_CoreAdmin for details. Thanks, Himanshu

On Wed, Jul 9, 2014 at 9:46 AM, Varun Gupta varun.vgu...@gmail.com wrote: Hi, I am currently using Solr 4.7.2 and have a SolrCloud setup running on 2 servers with number of shards as 2, replication factor as 2 and max shards per node as 4. Now, I want to add another server to the SolrCloud as a replica. I can see the Collection API to add a new replica, but that was added in Solr 4.8. Is there some way to add a new replica in Solr 4.7.2? -- Thanks Varun Gupta
Re: Parallel optimize of index on SolrCloud.
I seriously doubt that you are required to force merge. How much improvement? And is the big performance cost also OK? I have worked on search engines that do automatic merges and offer forced merges for over fifteen years. For all that time, forced merges have usually caused problems. Stop doing forced merges. wunder

On Jul 8, 2014, at 10:09 PM, Modassar Ather modather1...@gmail.com wrote: Thanks Walter for your inputs. Our use case and performance benchmark requires us to invoke optimize. Here we see a chance of improvement in performance of optimize() if invoked in parallel. I found that if *distrib=false* is used, the optimization will happen in parallel. But I could not find a way to set it using HttpSolrServer/CloudSolrServer. Also the parameter setting as given in my mail above does not seem to work. Please let me know in what ways I can achieve the parallel optimize on SolrCloud. Thanks, Modassar

On Tue, Jul 8, 2014 at 7:53 PM, Walter Underwood wun...@wunderwood.org wrote: You probably do not need to force merge (mistakenly called optimize) your index. Solr does automatic merges, which work just fine. There are only a few situations where a forced merge is even a good idea. The most common one is a replicated (non-cloud) setup with a full reindex every night. If you need Solr Cloud, I cannot think of a situation where you would want a forced merge. wunder

On Jul 8, 2014, at 2:01 AM, Modassar Ather modather1...@gmail.com wrote: Hi, Need to optimize an index created using CloudSolrServer APIs under a SolrCloud setup of 3 instances on separate machines. Currently it optimizes sequentially if I invoke cloudSolrServer.optimize(). To make it parallel I tried making three separate HttpSolrServer instances and invoked httpSolrServer.optimize() on them in parallel, but it still seems to be doing the optimization sequentially. I tried invoking optimize directly using HttpPost with the following URL and parameters, but it still seems to be sequential.

*URL*: http://host:port/solr/collection/update
*Parameters*:
params.add(new BasicNameValuePair("optimize", "true"));
params.add(new BasicNameValuePair("maxSegments", "1"));
params.add(new BasicNameValuePair("waitFlush", "true"));
params.add(new BasicNameValuePair("distrib", "false"));

Kindly provide your suggestion and help. Regards, Modassar

-- Walter Underwood wun...@wunderwood.org
Re: Parallel optimize of index on SolrCloud.
Our index has almost 100M documents running on a SolrCloud of 3 shards, and each shard has an index size of about 700GB (for the record, we are not using stored fields - our documents are pretty large). We perform a full indexing every weekend and during the week there are no updates made to the index. Most of the queries that we run are pretty complex with hundreds of terms using PhraseQuery, BooleanQuery, SpanQuery, Wildcards, boosts etc. and take many minutes to execute. A difference of 10-20% is also a big advantage for us. We have been optimizing the index after indexing for years and it has worked well for us. Every once in a while, we upgrade Solr to the latest version and try without optimizing so that we can save the many hours it takes to optimize such a huge index, but it does not work well. Kindly provide your suggestion. Thanks, Modassar

On Wed, Jul 9, 2014 at 10:47 AM, Walter Underwood wun...@wunderwood.org wrote: I seriously doubt that you are required to force merge. How much improvement? And is the big performance cost also OK? I have worked on search engines that do automatic merges and offer forced merges for over fifteen years. For all that time, forced merges have usually caused problems. Stop doing forced merges. wunder

On Jul 8, 2014, at 10:09 PM, Modassar Ather modather1...@gmail.com wrote: Thanks Walter for your inputs. Our use case and performance benchmark requires us to invoke optimize. Here we see a chance of improvement in performance of optimize() if invoked in parallel. I found that if *distrib=false* is used, the optimization will happen in parallel. But I could not find a way to set it using HttpSolrServer/CloudSolrServer. Also the parameter setting as given in my mail above does not seem to work. Please let me know in what ways I can achieve the parallel optimize on SolrCloud. Thanks, Modassar

On Tue, Jul 8, 2014 at 7:53 PM, Walter Underwood wun...@wunderwood.org wrote: You probably do not need to force merge (mistakenly called optimize) your index. Solr does automatic merges, which work just fine. There are only a few situations where a forced merge is even a good idea. The most common one is a replicated (non-cloud) setup with a full reindex every night. If you need Solr Cloud, I cannot think of a situation where you would want a forced merge. wunder

On Jul 8, 2014, at 2:01 AM, Modassar Ather modather1...@gmail.com wrote: Hi, Need to optimize an index created using CloudSolrServer APIs under a SolrCloud setup of 3 instances on separate machines. Currently it optimizes sequentially if I invoke cloudSolrServer.optimize(). To make it parallel I tried making three separate HttpSolrServer instances and invoked httpSolrServer.optimize() on them in parallel, but it still seems to be doing the optimization sequentially. I tried invoking optimize directly using HttpPost with the following URL and parameters, but it still seems to be sequential.

*URL*: http://host:port/solr/collection/update
*Parameters*:
params.add(new BasicNameValuePair("optimize", "true"));
params.add(new BasicNameValuePair("maxSegments", "1"));
params.add(new BasicNameValuePair("waitFlush", "true"));
params.add(new BasicNameValuePair("distrib", "false"));

Kindly provide your suggestion and help. Regards, Modassar

-- Walter Underwood wun...@wunderwood.org