Problem with Master
Hello, I'm having some issues with replication in my production environment. I have a master and 4 slaves. I had some data indexed and it replicated successfully. We are close to making the production environment public, so I deleted the old data by removing the data folder on the master. Then I reloaded the master (from the Tomcat manager), hoping the slaves would pick up the new, empty index.

But when I open the replication page on each slave, I see that they still have the old index. Even if I manually tell them to replicate, the replication count increases but the indexed data does not change. I checked the logs of the master and the slaves and I see no error; I do see the /replication request reaching the master. I set the SolrCore and ReplicationHandler log levels to FINEST. Still nothing.

I queried the slave with command=details and saw a ReplicatedAtList and a FailedList, and the failed list indicates that the replication is failing. But I don't know why, and I don't know where to look for the error.

Thanks in advance, this is really urgent.

PS: I hope this goes through the spam filter...
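A side note on emptying an index: rather than deleting the data folder on disk, a commonly recommended approach is to issue a delete-by-query followed by a commit, so the master keeps a valid index whose new version the slaves can then replicate. A minimal sketch against the standard /update handler (host, port and shell quoting are placeholders; assuming a Unix-like shell, Windows quoting differs):

curl 'http://localhost:8983/solr/update' -H 'Content-type: text/xml' --data-binary '<delete><query>*:*</query></delete>'
curl 'http://localhost:8983/solr/update' -H 'Content-type: text/xml' --data-binary '<commit/>'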
Re: Problem with Master
Just to add more info... this is the result of a replication command=details request. I'm confused: masterDetails/indexSize is 52 bytes (which is correct), but the slave's own indexSize is 303.8 KB?

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">15</int>
  </lst>
  <lst name="details">
    <str name="indexSize">303.8 KB</str>
    <str name="indexPath">D:\Solr\data\solr\index</str>
    <arr name="commits"/>
    <str name="isMaster">false</str>
    <str name="isSlave">true</str>
    <long name="indexVersion">1301331343628</long>
    <long name="generation">3</long>
    <lst name="slave">
      <lst name="masterDetails">
        <str name="indexSize">52 bytes</str>
        <str name="indexPath">D:\Solr\data\solr\index</str>
        <arr name="commits">
          <lst>
            <long name="indexVersion">1304086785516</long>
            <long name="generation">1</long>
            <arr name="filelist">
              <str>segments_1</str>
            </arr>
          </lst>
        </arr>
        <str name="isMaster">true</str>
        <str name="isSlave">false</str>
        <long name="indexVersion">1304086785516</long>
        <long name="generation">1</long>
      </lst>
      <str name="masterUrl">http://192.168.211.185:8787/solr/replication</str>
      <str name="pollInterval">00:00:60</str>
      <str name="nextExecutionAt">Fri Apr 29 12:04:57 ART 2011</str>
      <str name="indexReplicatedAt">Fri Apr 29 12:03:57 ART 2011</str>
      <arr name="indexReplicatedAtList">
        <str>Fri Apr 29 12:03:57 ART 2011</str>
        <str>Fri Apr 29 12:02:57 ART 2011</str>
        <str>Fri Apr 29 12:01:57 ART 2011</str>
        <str>Fri Apr 29 12:00:57 ART 2011</str>
        <str>Fri Apr 29 11:59:57 ART 2011</str>
        <str>Fri Apr 29 11:58:57 ART 2011</str>
        <str>Fri Apr 29 11:57:57 ART 2011</str>
        <str>Fri Apr 29 11:56:57 ART 2011</str>
        <str>Fri Apr 29 11:55:57 ART 2011</str>
        <str>Fri Apr 29 11:54:57 ART 2011</str>
      </arr>
      <arr name="replicationFailedAtList">
        <str>Fri Apr 29 12:03:57 ART 2011</str>
        <str>Fri Apr 29 12:02:57 ART 2011</str>
        <str>Fri Apr 29 12:01:57 ART 2011</str>
        <str>Fri Apr 29 12:00:57 ART 2011</str>
        <str>Fri Apr 29 11:59:57 ART 2011</str>
        <str>Fri Apr 29 11:58:57 ART 2011</str>
        <str>Fri Apr 29 11:57:57 ART 2011</str>
        <str>Fri Apr 29 11:56:57 ART 2011</str>
        <str>Fri Apr 29 11:55:57 ART 2011</str>
        <str>Fri Apr 29 11:54:57 ART 2011</str>
      </arr>
      <str name="timesIndexReplicated">44794</str>
      <str name="confFilesReplicated">[solrconfig_slave.xml]</str>
      <str name="timesConfigReplicated">1</str>
      <str name="confFilesReplicatedAt">1301405968250</str>
      <str name="lastCycleBytesDownloaded">0</str>
      <str name="timesFailed">44792</str>
      <str name="replicationFailedAt">Fri Apr 29 12:03:57 ART 2011</str>
      <str name="previousCycleTimeInSeconds">0</str>
      <str name="isPollingDisabled">false</str>
      <str name="isReplicating">false</str>
    </lst>
  </lst>
  <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>

--
Ezequiel.
http://www.ironicnet.com
Re: Curl bulk XML
With post.jar I think you can do something like this (I'm on Windows):

java -jar post.jar A*.xml
java -jar post.jar B*.xml
java -jar post.jar C*.xml
java -jar post.jar D*.xml

On Wed, Apr 13, 2011 at 4:41 PM, Markus Jelsma markus.jel...@openindex.io wrote:
Either put all documents in one large file or loop over them with a simple shell script.

Hey guys, how do you curl update all the XML inside a folder from A-D? Example: curl http://localhost:8080/solr update
Sent from my iPhone

--
Ezequiel.
http://www.ironicnet.com
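For the shell-script route Markus mentions, a minimal sketch (assuming a Unix-like shell, the default /update handler on port 8080, and XML update documents in the current directory):

# post every XML file starting with A, B, C or D, then commit
for f in [A-D]*.xml; do
  curl 'http://localhost:8080/solr/update' -H 'Content-type: text/xml' --data-binary @"$f"
done
curl 'http://localhost:8080/solr/update' -H 'Content-type: text/xml' --data-binary '<commit/>'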
Re: Problems indexing very large set of documents
Maybe those files were created with a different Adobe format version... See this: http://lucene.472066.n3.nabble.com/PDF-parser-exception-td644885.html

On Fri, Apr 8, 2011 at 12:14 PM, Brandon Waterloo brandon.water...@matrix.msu.edu wrote:
A second test has revealed that it is something to do with the contents, and not the literal filenames, of the second set of files. I renamed one of the second-format files and tested it, and Solr still failed. However, the problem still only applies to those files of the second naming format.

From: Brandon Waterloo [brandon.water...@matrix.msu.edu]
Sent: Friday, April 08, 2011 10:40 AM
To: solr-user@lucene.apache.org
Subject: RE: Problems indexing very large set of documents

I had some time to do some research into the problems. From what I can tell, it appears Solr is tripping up over the filename. These are strictly examples, but Solr handles this filename fine:

32-130-A0-84-african_activist_archive-a0a6s3-b_12419.pdf

However, it fails with either a parsing error or an EOF exception on this filename:

32-130-A08-84-al.sff.document.nusa197102.pdf

The only significant difference is that the second filename contains multiple periods. As there are about 1700 files whose filenames are similar to the second format, it is simply not possible to change their filenames. In addition, they are being used by other applications. Is there something I can change in the Solr configs to fix this issue, or am I simply SOL until the Solr dev team can work on this (assuming I put in a ticket)?

Thanks again everyone,
~Brandon Waterloo

From: Chris Hostetter [hossman_luc...@fucit.org]
Sent: Tuesday, April 05, 2011 3:03 PM
To: solr-user@lucene.apache.org
Subject: RE: Problems indexing very large set of documents

: It wasn't just a single file, it was dozens of files all having problems
: toward the end just before I killed the process.
...
: That is by no means all the errors, that is just a sample of a few.
: You can see they all threw HTTP 500 errors. What is strange is, nearly
: every file succeeded before about the 2200-files-mark, and nearly every
: file after that failed.

The root question is: do those files *only* fail if you have already indexed ~2200 files, or do they fail if you start up your server and index them first? There may be a resource issue (if it only happens after indexing 2200), or it may just be a problem with a large number of your PDFs that your iteration code happens to reach at that point. If it's the former, then there may be something buggy about how Solr is using Tika to cause the problem -- if it's the latter, then it's a straight Tika parsing issue.

: now, commit is set to false to speed up the indexing, and I'm assuming that
: Solr should be auto-committing as necessary. I'm using the default
: solrconfig.xml file included in apache-solr-1.4.1\example\solr\conf.

Solr does no autocommitting by default, you need to check your solrconfig.xml

-Hoss

--
Ezequiel.
http://www.ironicnet.com
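On Hoss's last point: autocommit lives in the <updateHandler> section of solrconfig.xml and is commented out in the 1.4.1 example config. A minimal sketch of enabling it (the thresholds are placeholders to tune for your indexing load):

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>1000</maxDocs>   <!-- commit after this many pending documents -->
    <maxTime>60000</maxTime>  <!-- or after this many milliseconds -->
  </autoCommit>
</updateHandler>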
Re: Problems indexing very large set of documents
Ohh, sorry... I didn't realize they had already sent you that link :P

--
Ezequiel.
http://www.ironicnet.com
Re: Solr without Server / Search solutions with Solr on DVD (examples?)
Can't you just run a Jetty server in the background? Although some antivirus or antispyware might flag that as a trojan or something like that. How little is the main memory? 1 GB? Less? I don't think you are going to have problems above 1 GB. The index will be static: no changes, no optimizations... That's my thought.

On Thu, Apr 7, 2011 at 11:12 AM, karsten-s...@gmx.de wrote:
Hi folks,

we want to migrate our search portal to Solr. But some of our customers search our information offline with a DVD version, so we want to estimate the complexity of a Solr DVD version. This means trimming Solr to work on small computers, the opposite of heavy loads: no server optimizations, no cache, fewer facet terms in memory...

My question: Does anyone know examples of solutions with Solr starting from DVD? Is there a tutorial for “configure a slow Solr for a computer with little main memory”? Any best-practice tips of your own?

Best regards
Karsten

--
Ezequiel.
http://www.ironicnet.com
Re: Solr without Server / Search solutions with Solr on DVD (examples?)
Try setting up a virtual machine and see how it performs. I'm really not a Java guy, so I don't know how to tune it for performance... But AFAIK Solr handles itself pretty well in RAM if the index is static...

On Thu, Apr 7, 2011 at 2:48 PM, Karsten Fissmer karsten-s...@gmx.de wrote:
Hi Yonik, hi Ezequiel,

Java is no problem for a DVD version. We already have a DVD version with a servlet container (but it does not currently use Solr). Some of our customers work in public-sector institutions and have less than 1 GB of main memory, but they use MS Word and IE and... Let us say that we can set Xmx384m (we have 14m documents). Xmx384m with 14m units of retrieval means, e.g., that we do not allow the same fields for sorting as on the server.

My main interest is an example of other companies/products that delivered information on DVD with a stand-alone Solr.

Best regards
Karsten

--- Yonik:
Including a JRE on the DVD and a launch script that uses that JRE by default should be doable as well.
-Yonik

--- Jeffrey:
Even if you can ship your DVD with a Jetty server, you'll still need Java installed on the customer machine...

--
Ezequiel.
http://www.ironicnet.com
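For what it's worth, a minimal sketch of the kind of low-memory launch a DVD start script could run against the stock Jetty example (the heap size and Solr home are placeholders, and the java binary could be the JRE bundled on the DVD as Yonik suggests):

java -Xmx384m -Dsolr.solr.home=solr -jar start.jar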
Re: Trying to Post. Emails rejected as spam.
Happened to me a couple of times; I couldn't find a workaround...

On Thu, Apr 7, 2011 at 4:14 PM, Parker Johnson pjoh...@yahoo.com wrote:
Hello everyone. Does anyone else have problems posting to the list? My messages keep getting rejected with the response below. I'll be surprised if this one makes it through :)
-Park

Sorry, we were unable to deliver your message to the following address.
solr-user@lucene.apache.org:
Remote host said: 552 spam score (8.0) exceeded threshold (FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,FS_REPLICA,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,RFC_ABUSE_POST,SPF_PASS,T_TO_NO_BRKTS_FREEMAIL) [BODY]
--- Below this line is a copy of the message.

--
Ezequiel.
http://www.ironicnet.com
Solr: Images, Docs and Binary data
Hello everyone, I need to know if someone has used Solr for indexing and storing images (up to 16 MB) or binary docs. How does Solr behave with this type of document? How does it affect performance?

Thanks everyone

--
Ezequiel.
http://www.ironicnet.com
Re: Solr: Images, Docs and Binary data
Another question that is maybe easier to answer: how can I store binary data? Any example schema?

--
Ezequiel.
http://www.ironicnet.com
Re: Solr: Images, Docs and Binary data
Hi, your answers were really helpful. I was thinking of putting the base64-encoded file into a string field, but I was a little worried about Solr trying to stem it or vectorize it or that kind of thing. Then I saw this in the example schema.xml:

<!-- Binary data type. The data should be sent/retrieved in as Base64 encoded Strings -->
<fieldtype name="binary" class="solr.BinaryField"/>

Does anyone know any storage for images that performs well, other than the filesystem?

Thanks

On Wed, Apr 6, 2011 at 3:31 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
Ha, there's a binary field type?! I've stored binary data in an ordinary String field type, and it's worked. But there were some headaches to get it to work; it might have been smoother if I had realized there was actually a binary field type. But wait, I'm talking about a Solr 'stored field', not about indexing. I didn't try to index my binary data, just store it for later retrieval (knowing this can sometimes be a performance problem, doing it anyway with relatively small data, got away with it). Does the field type even affect the _stored values_ in a Solr field?

On 4/6/2011 2:25 PM, Ryan McKinley wrote:
You can store binary data using a binary field type -- then you need to send the data base64 encoded. I would strongly recommend against storing large binary files in solr -- unless you really don't care about performance -- the file system is a good option that springs to mind.
ryan

--
Ezequiel.
http://www.ironicnet.com
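To make the schema question concrete, a minimal sketch of declaring a stored-only field with that type (the field name "attachment" is just a placeholder; indexed="false" because the raw bytes are only meant to be retrieved, not searched):

<fieldtype name="binary" class="solr.BinaryField"/>
<field name="attachment" type="binary" indexed="false" stored="true"/>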
Re: Solr: Images, Docs and Binary data
On Wed, Apr 6, 2011 at 15:31 PM, Adam Estrada estrada.adam.gro...@gmail.com wrote:
Well... by default there is a pretty decent schema that you can use as a template in the example project that builds with Solr. Tika is the library that does the actual content extraction, so it would be a good idea to try the example project out first.

I wanted to know how a large field size affects performance, but I wasn't sure how to design the schema.

On Wed, Apr 6, 2011 at 4:23 PM, Stefan Matheis matheis.ste...@googlemail.com wrote:
Ezequiel,
On 06.04.2011 20:38, Ezequiel Calderara wrote:
Does anyone know any storage for images that performs well, other than FS?
you may have a look at http://www.danga.com/mogilefs/ ? :)
Regards Stefan

Stefan, we looked at MogileFS, and also CouchDB and MongoDB. AFAIR (As Far As I Read :P), MogileFS runs on *nix OSes, while we are using Microsoft as the OS (yeah, we are the open-source evangelists in our company :P). For the moment we will start using Solr for storing and indexing (some info at least) images and docs. We have yet to see what the needs are in terms of scalability before choosing between the options.

Thanks all... If you have more info, send it :)

--
Ezequiel.
http://www.ironicnet.com
Re: Solrcore.properties
Hi Jayendra, this is the content of the files:

In the Master:
+ solrconfig.xml: http://pastebin.com/JhvwMTdd

In the Slave:
+ solrconfig.xml: http://pastebin.com/XPuwAkmW
+ solrcore.properties: http://pastebin.com/6HZhQG8z

I don't know which other files you need or which could be involved in this. I checked the home environment key in the Tomcat instance and it's OK too. Any light on this would be appreciated!

On Mon, Mar 28, 2011 at 6:26 PM, Jayendra Patil jayendra.patil@gmail.com wrote:
Can you please attach the other files? It doesn't seem to find the enable.master property, so you may want to check that the properties file exists on the box having issues.

We have the following configuration in the core:

- solrconfig.xml - Master/Slave

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">commit</str>
    <str name="confFiles">solrcore_slave.properties:solrcore.properties,solrconfig.xml,schema.xml</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://master_host:port/solr/corename/replication</str>
  </lst>
</requestHandler>

- solrcore.properties - Master

enable.master=true
enable.slave=false

- solrcore_slave.properties - Slave

enable.master=false
enable.slave=true

We have the default values and separate properties files for master and slave. Replication is enabled for the solrcore.properties file.

Regards,
Jayendra

On Mon, Mar 28, 2011 at 2:06 PM, Ezequiel Calderara ezech...@gmail.com wrote:
Hi all, I'm having problems when deploying Solr on the production machines. I have a master Solr and 3 slaves. The master replicates the schema and the solrconfig to the slaves (this file on the master is named solrconfig_slave.xml). The solrconfig of the slaves uses, for example, ${data.dir} and other values from the solrcore.properties. I think that Solr isn't recognizing that file, because I get this error:

HTTP Status 500 - Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after configuration errors, change: <abortOnConfigurationError>false</abortOnConfigurationError> in null
- org.apache.solr.common.SolrException: No system property or default value specified for enable.master
  at org.apache.solr.common.util.DOMUtil.substituteProperty(DOMUtil.java:311)
  ... MORE STACK TRACE INFO ...

But here is the thing: org.apache.solr.common.SolrException: No system property or default value specified for enable.master

I'm attaching the master schema, the master solrconfig, the solrconfig of the slaves and the solrcore.properties. If anyone has any info on this I would be more than grateful!... Thanks

--
Ezequiel.
http://www.ironicnet.com
Re: Solrcore.properties
I think I found the problem. The contents of solrcore.properties were:

#solrcore.properties
data.dir=D:\Solr\data\solr\
enable.master=false
enable.slave=true
masterUrl=http://url:8787/solr/
pollInterval=00:00:60

I found a folder in D:\ called "SolrDatasolrenable.master=false". So I researched a little, tested a little more, and found that I have to escape the backslashes in data.dir like this:

#solrcore.properties
data.dir=D:\\Solr\\data\\solr\\
enable.master=false
enable.slave=true
masterUrl=http://url:8787/solr/
pollInterval=00:00:60

And problem solved, for now at least :P

--
Ezequiel.
http://www.ironicnet.com
Re: Solrcore.properties
Just for the record, in case anyone else is having trouble: the masterUrl should be http://url:port/solr/replication (don't forget the /replication part!).

--
Ezequiel.
http://www.ironicnet.com
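One alternative worth noting (not from this thread, so treat it as a suggestion to verify): Java properties files also accept forward slashes in Windows paths, which sidesteps the backslash-escaping issue entirely. A sketch of the same slave file written that way, using the corrected masterUrl from above:

#solrcore.properties
data.dir=D:/Solr/data/solr/
enable.master=false
enable.slave=true
masterUrl=http://url:8787/solr/replication
pollInterval=00:00:60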
Re: help with Solr installation within Tomcat7
Where are your Solr files (war, conf files) located? How did you set up the Solr instance in Tomcat?

On Tue, Mar 22, 2011 at 7:08 PM, Erick Erickson erickerick...@gmail.com wrote:
What error are you receiving? Checking your config files for any absolute rather than relative paths would be my first guess...
Best
Erick

On Tue, Mar 22, 2011 at 10:09 AM, ramdev.wud...@thomsonreuters.com wrote:
Hi All:
I have just started using Solr and have it successfully installed within a Tomcat7 webapp server. I have also indexed documents using the SolrJ interfaces. The following is my problem: I installed Solr under the Tomcat7 folders and set up an XML configuration file to indicate the Solr home variables, as detailed on the wiki (for a Solr install within Tomcat). The indexes seem to reside within the solr_home folder, under the data folder (solr_home/data/index).

However, when I make a zip copy of the complete install (i.e. Tomcat with Solr), move it to a different machine and unzip/install it, the index seems to be inaccessible. (I did change the solr.xml configuration variables to point to the new location.) From what I know, with Tomcat installations it should be as simple as zipping a current working installation and unzipping/installing it on a different machine/location. Am I missing something that makes Solr hard-code the path to the index in an install?

Simply put, I would like to know how to transport an existing install of Solr within Tomcat 7 from one machine to another and still have it working.

Ramdev

--
Ezequiel.
http://www.ironicnet.com
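In case it helps with the comparison: the usual way to wire Solr into Tomcat is a small per-application context file whose solr/home value is an absolute path, so it has to be updated whenever the whole tree is moved. A minimal sketch (the paths are placeholders), typically placed under conf/Catalina/localhost/solr.xml:

<Context docBase="/path/to/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/path/to/solr_home" override="true"/>
</Context>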
Dismax problem
Hi, I'm having a problem while trying to do a dismax search. The standard query URL returns 1 result, but when I try to use the dismax query type I get the following error:

15/02/2011 10:27:07 org.apache.solr.common.SolrException log
GRAVE: java.lang.ArrayIndexOutOfBoundsException: 28
    at org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:721)
    at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:224)
    at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:692)
    at org.apache.solr.search.function.StringIndexDocValues.init(StringIndexDocValues.java:35)
    at org.apache.solr.search.function.OrdFieldSource$1.init(OrdFieldSource.java:84)
    at org.apache.solr.search.function.OrdFieldSource.getValues(OrdFieldSource.java:58)
    at org.apache.solr.search.function.FunctionQuery$AllScorer.init(FunctionQuery.java:123)
    at org.apache.solr.search.function.FunctionQuery$FunctionWeight.scorer(FunctionQuery.java:93)
    at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:297)
    at org.apache.lucene.search.IndexSearcher.searchWithFilter(IndexSearcher.java:268)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:258)
    at org.apache.lucene.search.Searcher.search(Searcher.java:171)
    at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:203)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:242)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:243)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:201)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:163)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:108)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:556)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:401)
    at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:281)
    at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579)
    at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:1568)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

The Solr instance is running as a replication slave.
This is the solrconfig.xml: http://pastebin.com/GSv2wBB4
This is the schema.xml: http://pastebin.com/5VpRT5Jj

Any help? How can I find what is causing this exception? I thought that dismax didn't throw exceptions...

--
Ezequiel.
http://www.ironicnet.com
Re: please help Problem with dataImportHandler
And the answer there didn't help? Why not copy the logs of this new error too? Every time you encounter an error, take the time to send the log output and, if needed, the schema.xml or solrconfig.xml.

Thanks

On Tue, Jan 25, 2011 at 6:44 AM, Dinesh mdineshkuma...@karunya.edu.in wrote:
http://lucene.472066.n3.nabble.com/Getting-started-with-writing-parser-tp2278092p2327738.html
this thread explains my problem
-
DINESHKUMAR . M
I am neither especially clever nor especially gifted. I am only very, very curious.
--
View this message in context: http://lucene.472066.n3.nabble.com/please-help-Problem-with-dataImportHandler-tp2318585p2327745.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
Ezequiel.
http://www.ironicnet.com
Re: please help Problem with dataImportHandler
This may be a dumb question, but is the XML encoded in UTF-8?

On Mon, Jan 24, 2011 at 7:08 AM, Dinesh mdineshkuma...@karunya.edu.in wrote:
this is the error that I'm getting.. no idea what it is..

/apache-solr-1.4.1/example/exampledocs# java -jar post.jar sample.txt
SimplePostTool: version 1.2
SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file sample.txt
SimplePostTool: FATAL: Solr returned an error: Severe_errors_in_solr_configuration__Check_your_log_files_for_more_detailed_information_on_what_may_be_wrong__If_you_want_solr_to_continue_after_configuration_errors_changeabortOnConfigurationErrorfalseabortOnConfigurationError__in_null___orgapachesolrhandlerdataimportDataImportHandlerException_Exception_occurred_while_initializing_context__at_orgapachesolrhandlerdataimportDataImporterloadDataConfigDataImporterjava190__at_orgapachesolrhandlerdataimportDataImporterinitDataImporterjava101__at_orgapachesolrhandlerdataimportDataImportHandlerinformDataImportHandlerjava113__at_orgapachesolrcoreSolrResourceLoaderinformSolrResourceLoaderjava508__at_orgapachesolrcoreSolrCoreinitSolrCorejava588__at_orgapachesolrcoreCoreContainer$InitializerinitializeCoreContainerjava137__at_orgapachesolrservletSolrDispatchFilterinitSolrDispatchFilterjava83__at_orgmortbayjettyservletFilterHolderdoStartFilterHolderjava99__at_orgmortbaycomponentAbstractLifeCyclestartAbstractLifeCyclejava40__at_orgmortbayjettyservletServletHandlerinitializeServletHandlerjava594__at_orgmortbayjettyservletContextstartContextContextjava139__at_orgmortbayjettywebappWebAppContextstartContextWebAppContextjava1218__at_orgmortbayjettyhandlerContextHandlerdoStartContextHandlerjava500__at_orgmortbayjettywebappWebAppContextdoStartWebAppContextjava448__at_orgmortbaycomponentAbstractLifeCyclestartAbstractLifeCyclejava40__at_orgmortbayjettyhandlerHandlerCollectiondoStartHandlerCollectionjava147__at_orgmortbayjettyhandlerContextHandlerCollectiondoStartContextHandlerCollectionjava161__at_orgmortbaycomponentAbstractLifeCyclestartAbstractLifeCyclejava40__at_orgmortbayjettyhandlerHandlerCollectiondoStartHandlerCollectionjava147__at_orgmortbaycomponentAbstractLifeCyclestartAbstractLifeCyclejava40__at_orgmortbayjettyhan
root@karunya-desktop:/home/karunya/apache-solr-1.4.1/example/exampledocs#
-
DINESHKUMAR . M
I am neither especially clever nor especially gifted. I am only very, very curious.
--
View this message in context: http://lucene.472066.n3.nabble.com/please-help-Problem-with-dataImportHandler-tp2318585p2318585.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
Ezequiel.
http://www.ironicnet.com
Re: please help Problem with dataImportHandler
And what do the logs say about it?

On Mon, Jan 24, 2011 at 7:15 AM, Dinesh mdineshkuma...@karunya.edu.in wrote:
actually it's a log file, I separately created a handler for that... it's not XML
-
DINESHKUMAR . M
I am neither especially clever nor especially gifted. I am only very, very curious.
--
View this message in context: http://lucene.472066.n3.nabble.com/please-help-Problem-with-dataImportHandler-tp2318585p2318617.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
Ezequiel.
http://www.ironicnet.com
Re: please help Problem with dataImportHandler
I mean, when you run the DIH, what is the output of the Solr log? There is probably more info there about what's happening...

On Mon, Jan 24, 2011 at 10:28 AM, Dinesh mdineshkuma...@karunya.edu.in wrote:
it's a DHCP log.. I want to index it
-
DINESHKUMAR . M
I am neither especially clever nor especially gifted. I am only very, very curious.
--
View this message in context: http://lucene.472066.n3.nabble.com/please-help-Problem-with-dataImportHandler-tp2318585p2319627.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
Ezequiel.
http://www.ironicnet.com
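For reading a plain-text log like that with the DataImportHandler, Solr 1.4 ships a LineEntityProcessor that emits each line of a file as a rawLine column. A rough sketch of a data-config.xml; the file path and field names are placeholders, and the schema would still need a uniqueKey for each document:

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- each line of the log becomes one document; rawLine is the column LineEntityProcessor produces -->
    <entity name="dhcplog"
            processor="LineEntityProcessor"
            url="/var/log/dhcpd.log"
            rootEntity="true">
      <field column="rawLine" name="line"/>
    </entity>
  </document>
</dataConfig>

Splitting out individual fields (timestamp, MAC address, etc.) would need a transformer such as RegexTransformer on top of this.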
Backup and Recover strategy
Hello, I just finished implementing a master with two slaves (this is a test for now :P). I'm trying to figure out how to do backups and restores without stopping the service or using a passive slave.

Right now I'm backing up using <str name="backupAfter">optimize</str>, and it creates a snapshot folder for the backup. Is there any way to indicate where to back up, or some other options? Is there any other way of doing backups without stopping the service?

Thanks all!

--
Ezequiel.
http://www.ironicnet.com
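Besides backupAfter, the ReplicationHandler in Solr 1.4 also accepts an explicit backup command over HTTP, which makes it possible to trigger a snapshot on demand (e.g. from a scheduled task) without stopping the service. A minimal sketch, assuming the handler is mounted at /replication as elsewhere in these threads; the snapshot still lands in a snapshot.<timestamp> directory under the data directory:

curl 'http://localhost:8983/solr/replication?command=backup'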
Master and Slaves
I have set up a master with two slaves. Let's call the master Jabba and the slaves Leia and C3PO (very nerdy! lol). I have set up replication on Jabba with the following confFiles:

<str name="confFiles">solrconfig_slave.xml:solrconfig.xml,schema.xml,stopwords.txt,elevate.xml</str>

On the slaves I want to override the dataDir value of the solrconfig.xml, but it gets overridden by the replicated one. Is there a way to have the slaves receive their solrconfig via replication, but with some special per-slave configuration? I want to avoid having to log in to each slave to configure it; I prefer to do it in a centralized way.

--
Ezequiel.
http://www.ironicnet.com
Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?
On Tue, Jan 18, 2011 at 6:04 PM, Grant Ingersoll gsing...@apache.org wrote:
Where do you get your Lucene/Solr downloads from?

[X] ASF Mirrors (linked in our release announcements or via the Lucene website)
[ ] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[ ] I/we build them from source via an SVN/Git checkout.
[X] Other (someone in your company mirrors them internally or via a downstream project)

--
Ezequiel.
http://www.ironicnet.com
Re: Master and Slaves
Thanks, that's what I needed! There is always so much to learn about Solr/Lucene!

On Fri, Jan 21, 2011 at 10:08 AM, Markus Jelsma markus.jel...@openindex.io wrote:
solrcore.properties

--
Ezequiel.
http://www.ironicnet.com
Re: Master and Slaves
Somehow it's not working :( I have set it up like this:

#solrcore.properties
data.dir=D:\Solr\PAU\data

But it keeps going to the dataDir configured in the solrconfig.xml. Also, when I go to the replication admin page I see this:

Master: http://10.11.33.180:8787/solr/replication
Poll Interval: 00:00:60
Local Index:
  Index Version: 1295466104884, Generation: 4
  Location: C:\Program Files\Apache Software Foundation\Tomcat 7.0\data\index
  Size: 6,99 KB
  Times Replicated Since Startup: 50
  Previous Replication Done At: Fri Jan 21 11:36:19 ART 2011
  Config Files Replicated At: null
  Config Files Replicated: null
  Times Config Files Replicated Since Startup: null
  Next Replication Cycle At: Fri Jan 21 11:37:19 ART 2011

And I know the files were replicated OK: I see the solrconfig backup named solrconfig.xml.20110120030345, and the dataDir changed also... So I don't understand why it isn't showing as replicated. Maybe I'm doing something wrong, I don't know.

--
Ezequiel.
http://www.ironicnet.com
Re: Master and Slaves
Ohh, I see... I was setting a default value in the solrconfig_slave like this:

<dataDir>${solr.data.dir:.\data}</dataDir>

I will try ${data.dir}.

2011/1/21 Tomás Fernández Löbbe tomasflo...@gmail.com:
Did you modify the solrconfig file with <dataDir>${data.dir}</dataDir> ??

--
Ezequiel.
http://www.ironicnet.com
Re: Master and Slaves
It worked! :)

--
Ezequiel.
http://www.ironicnet.com
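To recap the combination that ended up working in this thread (the path is the one from the earlier messages and would differ per install): the master replicates solrconfig_slave.xml to the slaves as solrconfig.xml, that file references a property instead of a hard-coded path, and each slave supplies its own value in solrcore.properties.

<!-- solrconfig_slave.xml on the master (becomes solrconfig.xml on each slave) -->
<dataDir>${data.dir}</dataDir>

# solrcore.properties on each slave (backslashes doubled, see the Solrcore.properties thread above)
data.dir=D:\\Solr\\PAU\\data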
Re: Can I host TWO separate datasets in Solr?
You can configure it as two different instances in a Tomcat server, or keep running two Jetty apps... :P

On Fri, Jan 21, 2011 at 8:51 PM, Igor Chudov ichu...@gmail.com wrote:
I would like to have two sets of data and search them separately (they are used for two different websites). How can I do it?
Thanks!

--
Ezequiel.
http://www.ironicnet.com
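A sketch of the two-instances-on-Tomcat option, assuming a single Solr WAR and one Solr home directory per site (all paths are placeholders); each fragment goes in its own file under conf/Catalina/localhost/, giving two independent webapps such as /site1 and /site2:

<!-- conf/Catalina/localhost/site1.xml -->
<Context docBase="/path/to/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/path/to/solr_home_site1" override="true"/>
</Context>

<!-- conf/Catalina/localhost/site2.xml -->
<Context docBase="/path/to/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/path/to/solr_home_site2" override="true"/>
</Context>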
Re: Query Problem
Hi Erick, you were right. I'm looking at the source of the search result (instead of the rendering in Internet Explorer :$) and I see this:

<str name="SectionName">Programas_Home </str>

(Note the trailing whitespace before the closing tag.) So I think the problem is in the SSIS process that retrieves data from the DB and sends it to Solr. The data type in the DB is VARCHAR(100)... but I'm sure that somewhere it is being mapped to CHAR(100), so its length is always 100.

Thank you very much, I will keep you informed.

Thanks

On Thu, Dec 16, 2010 at 9:38 PM, Erick Erickson erickerick...@gmail.com wrote:
OK, it works perfectly for me on a 1.4.1 instance. I've looked over your files a couple of times and see nothing obvious (but you'll never find anyone better at overlooking the obvious than me!). Tokenizing and stemming are irrelevant in this case because your type is "string", which is an untokenized type, so you don't need to go there. The way your query parses and analyzes backs this up, so you're getting to the right schema definition.

Which may bring us to whether what's in the index is what you *think* is in there. I'm betting not. Either you changed the schema and didn't re-index (say, changed indexed=false to indexed=true), you didn't commit the documents after indexing or other such-like, or you changed the field type and didn't reindex.

So go into /solr/admin. Click on "schema browser", then click on "fields". Along the left you should see SectionName; click on that. That will show you the *indexed* terms, and you should see, exactly, Programas_Home in there, just like in your returned documents. Let us know if that's in fact what you do see. It's possible you're being misled by the difference between seeing the value in a returned document (the stored value) and what's searched on (the indexed token(s)). And I'm assuming that some asterisks in your mails were really there for bolding and you are NOT doing wildcard searches for, for instance, *SectionName:Programas_Home*.

But we're at a point where my 1.4.1 instance produces the results you're expecting, at least as I understand them, so I don't think it's a problem with Solr; some change you've made is producing results you don't expect but that are correct. Like I said, look at the indexed terms. If you see Programas_Home in the admin console after following the steps above, then I don't know what to suggest.

Best
Erick

--
Ezequiel.
http://www.ironicnet.com
Re: Query Problem
Well... finally... it isn't a Solr problem. It isn't a Solr config problem. It is Microsoft's problem: http://flyingtriangles.blogspot.com/2006/08/workaround-to-ssis-strings-are-not.html

Thank you very much, Erick!! You really helped with the solution to this!

On Fri, Dec 17, 2010 at 10:52 AM, Erick Erickson erickerick...@gmail.com wrote:
Right, I *love* problems like this... NOT

You might get some joy out of using TrimFilterFactory along with KeywordAnalyzer, something like this:

<fieldType name="trimField" class="solr.TextField"> <!-- your options here -->
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

but it depends upon what your fields are padded with.

Best
Erick

--
Ezequiel.
http://www.ironicnet.com
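To use Erick's suggestion, the padded field would be switched to that trimmed type and the data reindexed. A minimal sketch of the field declaration (the indexed/stored flags are placeholders to match your schema):

<field name="SectionName" type="trimField" indexed="true" stored="true"/>

Note that TrimFilterFactory only strips leading and trailing whitespace, so this helps with space-padded CHAR(100) values but not with other padding characters.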
Query Problem
Hi all, I have the following problems. I have this set of data: View data (Pastebin) http://pastebin.com/jKbUhjVS

If I do a search for *SectionName:Programas_Home* I have no results: Returned Data (Pastebin) http://pastebin.com/wnPdHqBm
If I do a search for *Programas_Home* I have only 1 result: Result Returned (Pastebin) http://pastebin.com/fMZkLvYK
If I do a search for SectionName:Programa* I have 1 result: Result Returned (Pastebin) http://pastebin.com/kLLnVp4b

This is my schema (Pastebin): http://pastebin.com/PQM8uap4
And this is my solrconfig (Pastebin).

I don't understand why searching for SectionName:Programas_Home isn't returning any results at all... Can someone shed some light on this?

--
Ezequiel.
http://www.ironicnet.com
Re: Query Problem
I'll check the tokenizer to see if that's the problem. The results of the Analysis page for SectionName:Programas_Home:

Query Analyzer: org.apache.solr.schema.FieldType$DefaultAnalyzer {}
term position: 1
term text: Programas_Home
term type: word
source start,end: 0,14
payload:

So it's not having problems with that... Also, in the debug output you can see that the parsed query is correct... So I don't know where to look. I know nothing about stemming or tokenizing, but I will check whether that has anything to do with it. If anyone can help me out, please do :D

On Thu, Dec 16, 2010 at 5:55 PM, Erick Erickson erickerick...@gmail.com wrote:
Ezequiel:

Nice job of including relevant details, by the way. Unfortunately I'm puzzled too. Your SectionName is a "string" type, so it should be placed in the index as-is. Be a bit cautious about looking at returned results (as I see in one of your xml files) because the returned values are the verbatim, stored field, NOT what's tokenized, and the tokenized data is what's searched. That said, your SectionName should not be tokenized at all because it's a string type.

Take a look at the admin page, schema browser, and see what the values for SectionName look like (these will be the tokenized values). They should be exactly Programas_Home, complete with underscore, case changes, etc. Is that the case?

Another place that might help is the admin/analysis page. Check the debug boxes and input your steps and it'll show you what transformations are applied.

But a quick look leaves me completely baffled. Sorry I can't be more help.
Erick

--
Ezequiel.
http://www.ironicnet.com
Re: Query Problem
The jars are named like *1.4.1*, so I suppose it's version 1.4.1. Thanks!

On Thu, Dec 16, 2010 at 6:54 PM, Erick Erickson erickerick...@gmail.com wrote:
OK, what version of Solr are you using? I can take a quick check to see what behavior I get. Erick

-- Ezequiel. http://www.ironicnet.com
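For anyone hitting the same symptom, two quick checks can be run straight against Solr before digging further. The host, port and core path below are placeholders, and the second URL assumes the Luke request handler is enabled (it is in the example solrconfig.xml):

http://localhost:8983/solr/select?q=SectionName:Programas_Home&debugQuery=true
http://localhost:8983/solr/admin/luke?fl=SectionName&numTerms=50

The first returns the parsed query and scoring explanations, which shows exactly which field and term were searched; the second lists the top terms actually stored in the index for SectionName, so you can see whether Programas_Home was indexed verbatim (as a string field should be) or altered on its way in.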
Re: Best practice for emailing this list?
Mmmm, maybe it's your mail address? :P Weird, I didn't have any problem with it using Gmail... Send in plain text and avoid links... maybe that could work. If you want, send me the mail and I will forward it to the list, just to test!

On Wed, Nov 10, 2010 at 3:59 PM, robo - robom...@gmail.com wrote:
No matter how much I limit my other email it will not get through the Solr mailing list spam filter. This has to be the most frustrating mailing list I have ever tried to work with. All I need are some answers on replication and load balancing but I can't even get them to the list.

On Wed, Nov 10, 2010 at 10:17 AM, Ken Stanley doh...@gmail.com wrote:
On Wed, Nov 10, 2010 at 1:11 PM, robo - robom...@gmail.com wrote: How do people email this list without getting spam filter problems?
Depends on which side of the spam filter you're referring to. I've found that the way to keep these emails from entering my spam filter is to add a rule to Gmail that says "Never send to spam". As for when I send emails, I make sure that I send them as plain text to avoid getting bounce-backs. - Ken

-- Ezequiel. http://www.ironicnet.com
Re: Best practice for emailing this list?
Tried to forward robomon's mail but got the same error:

Delivery to the following recipient failed permanently: solr-user@lucene.apache.org
Technical details of permanent failure: Google tried to deliver your message, but it was rejected by the recipient domain. We recommend contacting the other email provider for further information about the cause of this error. The error that the other server returned was: 552 552 spam score (5.8) exceeded threshold (state 18).

-- Ezequiel. http://www.ironicnet.com
Re: Updating Solr index - DIH delta vs. task queues
I'm in the same scenario, so this answer would be helpful to me too. I'm adding: 3) Web service - request from a web service all the new data that has been updated (can this be done?).

On Thu, Nov 4, 2010 at 2:38 PM, Andy angelf...@yahoo.com wrote:
Hi, I have data stored in a database that is being updated constantly. I need to find a way to update the Solr index as data in the database is being updated. There seem to be two main schools of thought on this: 1) DIH delta - query the database for all records that have a timestamp later than last_index_time and import those records into Solr. 2) Task queue - every time a record is updated in the database, push a task onto a queue to index that record into Solr. I just want to know the pros and cons of each approach and what your experience is. For someone starting new, what would be your recommendation? Thanks, Andy

-- Ezequiel. http://www.ironicnet.com
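For reference, option 1 is driven entirely from DIH's data-config.xml. A minimal sketch, where the item table, its columns, and the dataSource wiring (omitted) are assumptions made up for illustration:

<entity name="item" pk="id"
        query="SELECT id, name FROM item"
        deltaQuery="SELECT id FROM item WHERE updated_at > '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT id, name FROM item WHERE id = '${dataimporter.delta.id}'"/>

A scheduled job then calls /dataimport?command=delta-import, and DIH records the last index time for the next run. Option 2 (and the web-service idea above) instead needs something outside Solr that reacts to each change and posts the affected document to /update.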
Re: Custom Sorting in Solr
OK, I imagined that the doubly linked list would be far too complicated for Solr. Now, how can I get Solr to connect to a web service and do the import? I'm sorry if I'm not clear; sometimes my English gets fuzzy :P

On Fri, Oct 29, 2010 at 4:51 PM, Yonik Seeley yo...@lucidimagination.com wrote:
On Fri, Oct 29, 2010 at 3:39 PM, Ezequiel Calderara ezech...@gmail.com wrote: Hi all guys! I'm in a weird situation here. We have indexed a set of documents which are ordered using a linked list (each document has a reference to the previous and the next one). Is there a way, when sorting in the Solr search, to use the linked list to sort?
It seems like you should be able to encode this linked list as an integer instead, and sort by that. If there are multiple linked lists in the index, it seems like you could even use the high bits of the int to designate which list the doc belongs to, and the low-order bits as the order in that list. -Yonik http://www.lucidimagination.com

-- Ezequiel. http://www.ironicnet.com
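Yonik's encoding can be sketched in a few lines. This assumes at most 2,048 lists and 1,048,576 documents per list, plus a sortable integer field in the schema (for example a field named sortkey of type sint in a 1.4 schema); the limits and the field name are only illustrative:

// Pack the list id into the high bits and the position within the list
// into the low 20 bits, so sorting by the resulting int reproduces list order.
public static int encodeSortKey(int listId, int position) {
    if (listId < 0 || listId >= (1 << 11) || position < 0 || position >= (1 << 20)) {
        throw new IllegalArgumentException("listId or position out of range for this encoding");
    }
    return (listId << 20) | position;
}

Each document gets its computed value at index time, and queries then just add sort=sortkey asc to walk a list in order.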
Custom Sorting in Solr
Hi all guys! I'm in a weird situation here. We have indexed a set of documents which are ordered using a linked list (each document has a reference to the previous and the next one). Is there a way, when sorting in the Solr search, to use the linked list to sort? If that is not possible, how can I use the DIH to access a WCF service or a web service? Should I develop my own DIH handler?

-- Ezequiel. http://www.ironicnet.com
Re: How to delete a SOLR document if that particular data doesnt exist in DB?
Can't you, on each delete of that data, save the ids in another table, and then process those ids against Solr to delete the corresponding documents?

On Wed, Oct 20, 2010 at 11:51 AM, bbarani bbar...@gmail.com wrote:
Hi, I have a very common question but couldn't find any post related to it in this forum. I currently initiate a full import each week, but data that has been deleted in the source is not removed from my documents because I am using clean=false. We index multiple data sets by data type, hence we can't delete the index and do a complete re-indexing each week; we also want to delete the orphan Solr documents (for which the data is not present in the back-end DB) on a daily basis. Now my question is: is there a way I can use preImportDeleteQuery to delete the documents from Solr for which the data doesn't exist in the back-end DB? I don't have anything like a delete status in the DB; instead I need to get all the UIDs from the Solr documents, compare them with all the UIDs in the back end, and delete from the Solr index the documents whose UIDs are not present in the DB. Any suggestions / ideas would be of great help. Note: I have currently developed a simple program which fetches the UIDs from the Solr documents, connects to the back-end DB to check for orphan UIDs, and deletes the Solr documents corresponding to those orphan UIDs. I just don't want to re-invent the wheel if this feature is already present in Solr, as I need to do more testing in terms of performance / scalability for my program. Thanks, Barani
-- View this message in context: http://lucene.472066.n3.nabble.com/How-to-delete-a-SOLR-document-if-that-particular-data-doesnt-exist-in-DB-tp1739222p1739222.html Sent from the Solr - User mailing list archive at Nabble.com.

-- Ezequiel. http://www.ironicnet.com
Re: How to delete a SOLR document if that particular data doesnt exist in DB?
Maybe you could also set an expiration policy and delete documents that expire after some time... but I don't know if you can iterate over the existing ids...

On Wed, Oct 20, 2010 at 1:34 PM, Shawn Heisey s...@elyograg.org wrote:
On 10/20/2010 9:59 AM, bbarani wrote: We actually use a virtual DB modelling tool to fetch the data from various sources at run time, so we don't have any control over the source. We consolidate the data from more than one source and index the consolidated data using Solr. We don't have any kind of update / access rights to the source data.
It seems likely that those who are in control of the data sources would be maintaining some kind of delete log, and that they should be able to make those logs available to you. For my index, the data comes from a MySQL database. When a delete is done at the database level, a database trigger records the old information to a main delete log table, as well as a separate table for the search system. The build system uses that separate table to run deletes every ten minutes and keeps it trimmed to 48 hours of delete history.

-- Ezequiel. http://www.ironicnet.com
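A rough sketch of the cleanup job Shawn describes, using the SolrJ API as it existed around Solr 1.4; the JDBC URL, the search_delete_log table and the uid column are assumptions standing in for whatever the database triggers actually populate:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class DeleteOrphans {
    public static void main(String[] args) throws Exception {
        // Collect the ids recorded by the delete trigger since the last run.
        List<String> ids = new ArrayList<String>();
        Connection db = DriverManager.getConnection("jdbc:mysql://dbhost/mydb", "user", "pass");
        ResultSet rs = db.createStatement().executeQuery("SELECT uid FROM search_delete_log");
        while (rs.next()) {
            ids.add(rs.getString("uid"));
        }
        db.close();

        // Remove the matching documents from the index and make the change visible.
        if (!ids.isEmpty()) {
            SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
            solr.deleteById(ids);
            solr.commit();
        }
    }
}

Run on a schedule (Shawn mentions every ten minutes), this keeps the index in step with the source without re-walking every UID, which is what makes the delete-log approach cheaper than the full compare described earlier in the thread.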
Re: query results file for trec_eval
I don't know anything about the TREC document format, but I think that if you want text output you can do it with the XsltResponseWriter (http://wiki.apache.org/solr/XsltResponseWriter) to transform the XML response into text...

On Tue, Oct 19, 2010 at 12:29 PM, Valli Indraganti valli.indraga...@gmail.com wrote:
Hello! I am a student and I am trying to run an evaluation on TREC-format documents. I have the judgments. I would like to have the output of my queries in a form usable with the trec_eval software. Can someone please point me to how to make Solr output results in this format, or at least point me to some material that guides me through this? Thanks, Valli

-- Ezequiel. http://www.ironicnet.com
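For what it's worth, the XSLT route can emit TREC run lines directly. A rough sketch: it assumes the unique key field is named id, that the query requests fl=*,score so the score appears in the response, and it hard-codes the topic number and run tag (in practice you would substitute them per query). Saved as conf/xslt/trec.xsl, it is selected with wt=xslt&tr=trec.xsl:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- one line per hit: topic Q0 docno rank score runtag -->
    <xsl:for-each select="response/result/doc">
      <xsl:value-of select="concat('1 Q0 ', str[@name='id'], ' ', position(), ' ', float[@name='score'], ' myrun')"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>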
Commits on service after shutdown
Hi, I'm new on the mailing list. I'm implementing Solr at my current job, and I'm having some problems. I was testing the consistency of commits. I found, for example, that if we add X documents to the index (without committing) and then restart the service, the documents are committed: they show up in the results. To me this looks like an error. But when we add X documents to the index (without committing) and then kill the process and start it again, the documents don't appear. That is the behaviour I want. Is there any parameter to avoid the auto-committing of documents after a shutdown? Is there any parameter to keep those uncommitted documents alive after a kill? Thanks!

-- Ezequiel. http://www.ironicnet.com
Re: Commits on service after shutdown
I understand, but I want to have control over what gets committed and what doesn't. In our scenario we want to add documents to the index and maybe trigger the commit an hour later. If, in the middle, we have a server shutdown or some process sends a shutdown signal to the Solr process, I don't want those documents to be committed. Should I file a bug or an enhancement issue? Thanks

On Mon, Oct 18, 2010 at 3:54 PM, Israel Ekpo israele...@gmail.com wrote:
The documents should be implicitly committed when the Lucene index is closed. When you perform a graceful shutdown, the Lucene index gets closed and the documents get committed implicitly. When the shutdown is abrupt, as in a KILL -9, this does not happen and the updates are lost. You can use the auto commit parameter when sending your updates so that the changes are saved right away, though this could slow down indexing speed considerably; but I do not believe there are parameters to keep those uncommitted documents alive after a kill.
-- °O° Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/

-- Ezequiel. http://www.ironicnet.com
Re: Commits on service after shutdown
But if something happens within that hour, I will either have lost the documents or committed them to the index outside the schedule. How can I handle this scenario? I think that Solr (or Lucene) should ensure the durability (http://en.wikipedia.org/wiki/Durability_(database_systems)) of the data even if it is in an uncommitted state.

On Mon, Oct 18, 2010 at 4:53 PM, Matthew Hall mh...@informatics.jax.org wrote:
No.. you would just turn autocommit off, and have the thread that is doing updates to your indexes commit every hour. I'd think that this would take care of the scenario that you are describing. Matt

-- Ezequiel. http://www.ironicnet.com
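Matthew's suggestion maps onto two concrete pieces, sketched here with placeholder values: first, make sure no <autoCommit> block is active inside <updateHandler> in solrconfig.xml and that the indexing code does not send commit=true with its updates; second, have an external scheduled job (cron, a timer thread, etc.) issue the commit explicitly once an hour, for example:

curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' --data-binary '<commit/>'

As Israel noted above, though, this only governs commits issued while Solr is running: a graceful shutdown still closes the Lucene index and implicitly makes the pending documents visible, so by itself it does not give the strict durability control being asked for here.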
Re: Commits on service after shutdown
I'll see if I can resolve this by adding an extra core with the same schema to hold these documents. Core0 will act as a queue and Core1 will be the real index; a commit on Core0 will trigger an add to Core1 followed by a commit there. That way I can be sure of not losing data. It surprises me that Solr doesn't have this feature built in. I still have to verify the performance, but it looks good to me. Anyway, any help would be appreciated.

-- Ezequiel. http://www.ironicnet.com
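For the two-core idea, the cores themselves are only a solr.xml declaration; the names and instance directories below are illustrative. The forwarding step ("a commit on Core0 triggers an add to Core1") is not something Solr does by itself, so the application, or a custom update processor, has to read from the queue core and re-post to the live core:

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="queue" instanceDir="queue"/>
    <core name="live" instanceDir="live"/>
  </cores>
</solr>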
Re: Spell checking question from a Solr novice
You can check the new words against a dictionary and keep them in the file as Jason described... What Pradeep said is true: it's always better to have suggestions related to your index than to have suggestions with no results...

On Mon, Oct 18, 2010 at 6:24 PM, Jason Blackerby jblacke...@gmail.com wrote:
If you know the misspellings, you could prevent them from being added to the dictionary with a StopFilterFactory, like so:

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="misspelled_words.txt"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>
    <filter class="solr.LengthFilterFactory" min="2" max="50"/>
  </analyzer>
</fieldType>

where misspelled_words.txt contains the misspellings.

On Mon, Oct 18, 2010 at 5:14 PM, Pradeep Singh pksing...@gmail.com wrote:
I think a spellchecker based on your index has clear advantages. You can spellcheck words specific to your domain which may not be available in an outside dictionary. You can always dump the word list from WordNet to get a starter English dictionary. But then it also means that misspelled words from your domain become the suggested correct words. Hmmm... you'll need to have a way to prune out such words. Even then, your own domain-based dictionary is a total go.

On Mon, Oct 18, 2010 at 1:55 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
In general, the benefit of the built-in Solr spellcheck is that it can use a dictionary based on your actual index. If you want to use some external API, you certainly can, in your actual client app -- but then it doesn't really need to involve Solr at all anymore, does it? Is there any benefit I'm not thinking of to doing that on the Solr side, instead of just in your client app? I think Yahoo (and maybe Microsoft?) have similar APIs with more generous terms of service, but I haven't looked in a while.

Xin Li wrote:
Oops, never mind. I just read the Google API policy: a 1000-queries-per-day limit, for non-commercial use only.

-----Original Message----- From: Xin Li Sent: Monday, October 18, 2010 3:43 PM To: solr-user@lucene.apache.org Subject: Spell checking question from a Solr novice
Hi, I am looking for a quick solution to improve a search engine's spell-checking performance. I was wondering if anyone has tried to integrate the Google SpellCheck API with a Solr search engine (if that is possible). Google spellcheck came to mind for two reasons. First, it is costly to clean up the data to be used as a spell-check baseline. Secondly, Google probably has the most complete set of misspelled search terms. That's why I would like to know if it is a feasible way to go. Thanks, Xin

-- Ezequiel. http://www.ironicnet.com
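For completeness, a field analyzed with the textSpell type above is normally wired to an index-based spellchecker in solrconfig.xml. A minimal sketch, where the field name spell is an assumption about the schema:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

The component is then added to a request handler's last-components list and enabled per query with spellcheck=true, which is how the index-based dictionary that Pradeep and Jonathan describe gets built and queried.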
Re: I need to indexing the first character of a field in another field
How are you declaring the transformer in the data config?

On Mon, Oct 18, 2010 at 6:31 PM, Renato Wesenauer renato.wesena...@gmail.com wrote:
Hello guys, I need to index the first character of the field "autor" in another field, "inicialautor". Example: autor = Mark Webber, inicialautor = M. I wrote a JavaScript function in the data import, but the field inicialautor is indexed empty. The function:

function InicialAutor(linha) {
  var aut = linha.get("autor");
  if (aut != null) {
    if (aut.length > 0) {
      var ch = aut.charAt(0);
      linha.put("inicialautor", ch);
    } else {
      linha.put("inicialautor", '');
    }
  } else {
    linha.put("inicialautor", '');
  }
  return linha;
}

What's wrong? Thanks, Renato Wesenauer

-- Ezequiel. http://www.ironicnet.com
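To the question above: with the DIH ScriptTransformer the function goes in a <script> element and the entity references it by name. A sketch, where the entity name, the SQL and the dataSource wiring (omitted here) are assumptions based on Renato's description:

<dataConfig>
  <script><![CDATA[
    function InicialAutor(linha) {
      var aut = linha.get("autor");
      // put the first character of autor into inicialautor, or an empty string
      linha.put("inicialautor", (aut != null && aut.length > 0) ? aut.charAt(0) : "");
      return linha;
    }
  ]]></script>
  <document>
    <entity name="libros" transformer="script:InicialAutor" query="SELECT autor, titulo FROM libros">
      <field column="autor" name="autor"/>
      <field column="inicialautor" name="inicialautor"/>
    </entity>
  </document>
</dataConfig>

The target field (inicialautor) also has to exist in schema.xml; a row key written by the script that matches no schema field is simply dropped, which would look exactly like the field indexing empty.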
Re: Admin for spellchecker?
I was thinking about that too. You would also need a way to mark a word as valid, so it doesn't get flagged as wrong.

On Mon, Oct 18, 2010 at 6:37 PM, Pradeep Singh pksing...@gmail.com wrote:
Do we need an admin screen for the spellchecker? Where you can browse the words and delete the ones you don't like so that they don't get suggested?

-- Ezequiel. http://www.ironicnet.com