solr facet fields doesn't honor fq
Hi all, I have a question related to Solr 3.5 field faceting. Here is my query: http://localhost:8081/solr_new/select?tie=0.1&q.alt=*:*&q=bank&qf=nameaddress&fq=portal_uuid:+A4E7890F-A188-4663-89EB-176D94DF6774&defType=dismax&facet=true&facet.field=location_uuid&facet.field=sub_category_uuids What I get back with field faceting is: 1. Some location_uuids which are in the current portal_uuid (facet count > 0) 2. Some location_uuids which are not in the current portal_uuid at all (facet count = 0) It seems that Solr doesn't honor the fq at all when returning field facets. I need to add one more parameter, facet.mincount=1, in order not to return the location_uuids from (2). I think Solr does faceting on all location_uuid values; it should do that scoped to the current portal_uuid. Any idea? -- Chhorn Chamnap http://chamnap.github.com/
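With the ampersands that the archive swallowed restored, the query above can be assembled programmatically. A minimal sketch (host, port, and field names are taken verbatim from the message; quoting the UUID and adding facet.mincount=1 are the adjustments discussed in the thread, not part of the original request):

```python
from urllib.parse import urlencode

# facet.field is repeated, so use a list of (key, value) tuples.
params = [
    ("tie", "0.1"),
    ("q.alt", "*:*"),
    ("q", "bank"),
    ("qf", "nameaddress"),  # as it appears in the message
    ("fq", 'portal_uuid:"A4E7890F-A188-4663-89EB-176D94DF6774"'),  # quoted UUID
    ("defType", "dismax"),
    ("facet", "true"),
    ("facet.field", "location_uuid"),
    ("facet.field", "sub_category_uuids"),
    ("facet.mincount", "1"),  # suppress zero-count facet values
]
url = "http://localhost:8081/solr_new/select?" + urlencode(params)
print(url)
```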
Re: Solr Faceting
You could add this filter directly in the Solr query. Here is an example using SolrJ: SolrQuery solrQuery = new SolrQuery(); solrQuery.set("q", "*:*"); solrQuery.addFilterQuery("-myfield:N/A"); Christian von Wendt-Jensen On 07/01/2012 1:32 PM, Darren Govoni dar...@ontrenet.com wrote: I don't think it comes at any added cost for Solr to return that facet, so you can filter it out in your business logic. On Sat, 2012-07-07 at 15:18 +0530, Shanu Jha wrote: Hi, I am generating facets for a field which has "NA" as one of its values, and I want Solr not to create a facet for (i.e. to ignore) this NA value. Is there any way in Solr to do that? Thanks
Re: Multi-thread UpdateProcessor
Some benchmarks added; please check the JIRA. On Fri, Jul 6, 2012 at 11:13 PM, Dmitry Kan dmitry@gmail.com wrote: Mikhail, you have my +1 and a jira comment :) // Dmitry On Fri, Jul 6, 2012 at 7:41 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Okay, why do you think this idea is not worth looking at? On Fri, Jul 6, 2012 at 12:53 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello, Most of the time when single-thread streaming http://wiki.apache.org/solr/Solrj#Streaming_documents_for_an_update is used, I have seen a lack of CPU utilization on the Solr server. A reasonable motivation is to utilize more threads to index faster, but that requires a more complicated client side. I propose to employ a special update processor which can fork the stream processing onto many threads. If you like it, please vote for https://issues.apache.org/jira/browse/SOLR-3585 . Regards -- Sincerely yours Mikhail Khludnev Tech Lead Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com -- Regards, Dmitry Kan
Re: MoreLikeThis and mlt.count
Hi Bruno, I'm not sure that makes sense for a query which does not have a boolean element to it. What is your use case? On 7 July 2012 18:36, Bruno Mannina bmann...@free.fr wrote: Dear Solr users, I have a field named fid defined as: <field name="fid" type="string" indexed="true" stored="true" required="true" termVectors="true"/> This fid can have a value like: a0001, b57855, 3254, etc. (length 20 digits) I would like to get *all* docs that the result returns. By default mlt.count is set to 5, but I don't want to have to set it to 200 in my URL just to be sure to get all results in the same XML. Is there a way to set mlt.count so as to always get *all* MLT documents? I read http://wiki.apache.org/solr/MoreLikeThis without finding a solution. Sincerely, Bruno Solr 3.6 Ubuntu
Re: Getting only one result by family?
Hi Bruno, See http://wiki.apache.org/solr/FieldCollapsing, but also consider faceting, as this often fits the bill. On 7 July 2012 22:27, Bruno Mannina bmann...@free.fr wrote: Dear Solr users, I have a field named FID for Family-ID: <field name="fid" type="string" indexed="true" stored="true" required="true" termVectors="true"/> My uniqueKey is the field PN, and I have several other fields (text-en, string, general text, etc.). When I do a request on my index, like title:airplane, I get several docs, but some docs are from the same family (their FIDs are equal). Example: Doc1 fid=A0123 Doc2 fid=B777 Doc3 fid=C008 ... Doc175 = same family as Doc1, fid=A0123 ... Is it possible to get only docs with different FIDs? I don't want to see Doc175 in my XML result. That way, if I set rows=20, I will have 20 docs from 20 different families. Thanks for your help, Bruno Solr3.6 Ubuntu
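A result-grouping request along the lines of the FieldCollapsing page can be sketched as follows (the field and query values are the ones from Bruno's message; group.limit=1 keeps one document per family):

```python
from urllib.parse import urlencode

# One result per fid (family), 20 groups per page: a sketch of
# Solr 3.6 result grouping as described on the FieldCollapsing wiki page.
params = {
    "q": "title:airplane",
    "group": "true",
    "group.field": "fid",
    "group.limit": "1",   # one document per family
    "rows": "20",         # 20 families per page
}
query_string = urlencode(params)
print(query_string)
```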
Re: MoreLikeThis and mlt.count
Hi, My docs are patents. Patents have family members, and I would like to get docs by PN (the field Patent Number, my uniqueKey). My request will be ?q=pn:EP100A1&mlt=true. With this method I will get all equivalents (family members of EP100A1). If setting mlt.count automatically to MAX is not possible, I will set it to 500. On 08/07/2012 11:17, Lee Carroll wrote: Hi Bruno, I'm not sure that makes sense for a query which does not have a boolean element to it. What is your use case? On 7 July 2012 18:36, Bruno Mannina bmann...@free.fr wrote: [snip]
Re: Getting only one result by family?
Hi Lee, I tried grouping on my FID field and ouch: error 500 + OutOfMemory... I haven't tested facets yet. Thanks, Bruno On 08/07/2012 11:19, Lee Carroll wrote: Hi Bruno, See http://wiki.apache.org/solr/FieldCollapsing, but also consider faceting, as this often fits the bill. On 7 July 2012 22:27, Bruno Mannina bmann...@free.fr wrote: [snip]
Re: Getting only one result by family?
See http://wiki.apache.org/solr/SolrPerformanceFactors#OutOfMemoryErrors On 8 July 2012 12:37, Bruno Mannina bmann...@free.fr wrote: Hi Lee, I tried grouping on my FID field and ouch: error 500 + OutOfMemory... I haven't tested facets yet. Thanks, Bruno On 08/07/2012 11:19, Lee Carroll wrote: Hi Bruno, See http://wiki.apache.org/solr/FieldCollapsing, but also consider faceting, as this often fits the bill. On 7 July 2012 22:27, Bruno Mannina bmann...@free.fr wrote: [snip]
Re: solr facet fields doesn't honor fq
Solr faceting only counts documents that satisfy the query. Think of it as assembling a list of all possible values for a field and then adding 1 for each value found in each document that satisfies the overall query (including the filter query). So you can get counts of 0; that's expected. Adding facet.mincount=1 will keep these from being returned. I suspect that your query is not finding the documents you think it is, or your filter query is not parsed as you expect. If you add debugQuery=on you'll see the parsed form of both. In particular, look for your complex fq to be broken up and distributed, with some parts against your portal_uuid and some against the default search field. '+' and '-' are operators, and the top-level parsers may be splitting these up. Quoting or parenthesizing may help. Best, Erick On Sun, Jul 8, 2012 at 2:32 AM, Chamnap Chhorn chamnapchh...@gmail.com wrote: [snip]
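Following Erick's suggestion, one way to avoid the '+' being read as an operator is to phrase-quote the UUID and URL-encode the whole fq value. A sketch (the UUID is the one from the question):

```python
from urllib.parse import quote

# Unquoted, '+' is a query operator and the parser may split the value;
# quoting the whole UUID makes the filter a single phrase on portal_uuid.
raw = "portal_uuid:+A4E7890F-A188-4663-89EB-176D94DF6774"
quoted = 'portal_uuid:"A4E7890F-A188-4663-89EB-176D94DF6774"'
fq_param = "fq=" + quote(quoted, safe="")
print(fq_param)
```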
SolrCloud error while propagating update to primary ZK node
I get a JSON parse error (pasted below) when I send an update to a replica node. I downloaded Solr 4 alpha, followed the instructions at http://wiki.apache.org/solr/SolrCloud/, and set up numShards=1 with 3 total servers managed by a ZooKeeper ensemble, the primary at 8983 and the other two at 7574 and 8900 respectively. The error below shows up in the primary's log when I try to add a document to either replica. The document add fails. I am able to successfully add documents by sending them directly to the primary. How do I correctly add documents to replicas?

SEVERE: org.apache.noggit.JSONParser$ParseException: JSON Parse Error: char=<,position=0 BEFORE='' AFTER='<add><doc boost="1.0"><field name="id">2'
at org.apache.noggit.JSONParser.err(JSONParser.java:221)
at org.apache.noggit.JSONParser.next(JSONParser.java:620)
at org.apache.noggit.JSONParser.nextEvent(JSONParser.java:661)
at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:105)
at org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:95)
at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:59)
at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1561)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337) ...
[snip] -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-error-while-propagating-update-to-primary-ZK-node-tp3993760.html Sent from the Solr - User mailing list archive at Nabble.com.
SolrCloud replication question
I am trying to wrap my head around replication in SolrCloud. I tried the setup at http://wiki.apache.org/solr/SolrCloud/. I mainly need replication for high query throughput. The setup at the URL above appears to maintain just one copy of the index at the primary node (instead of a replicated index as in a master/slave configuration). Will I still get roughly an n-fold increase in query throughput with n replicas? And if so, why would one do master/slave replication with multiple copies of the index at all? -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-replication-question-tp3993761.html Sent from the Solr - User mailing list archive at Nabble.com.
DataImport using last_indexed_id or getting max(id) quickly
My understanding is that the DIH in Solr only records last_index_time in dataimport.properties, not, say, a last_indexed_id for a primary key 'id'. How can I efficiently get max(id) (note that 'id' is an auto-increment field in the database)? Maintaining max(id) outside of Solr is brittle, and calling max(id) before each dataimport can take several minutes when the index has several hundred million records. How can I either import based on ID or get max(id) quickly? I cannot use timestamp-based import because I get out-of-memory errors if/when Solr falls behind, and the suggested fixes online did not work for me. -- View this message in context: http://lucene.472066.n3.nabble.com/DataImport-using-last-indexed-id-or-getting-max-id-quickly-tp3993763.html Sent from the Solr - User mailing list archive at Nabble.com.
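One workaround consistent with the question is to track the last indexed id yourself and pass it to DIH as a request parameter, avoiding both full max(id) scans and timestamp deltas. This is only a sketch; the side file name, the docs table, and the last_id request parameter are all hypothetical:

```python
import os

STATE_FILE = "last_indexed_id.txt"  # hypothetical side file next to dataimport.properties

def read_last_id(path=STATE_FILE):
    # Default to 0 on the very first import run
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return int(f.read().strip())

def build_delta_query(last_id):
    # A DIH entity could use the same shape via a request parameter, e.g.
    # query="SELECT * FROM docs WHERE id > ${dataimporter.request.last_id}"
    return "SELECT * FROM docs WHERE id > %d ORDER BY id" % last_id

def write_last_id(last_id, path=STATE_FILE):
    # Persist the highest id seen after a successful import
    with open(path, "w") as f:
        f.write(str(last_id))

print(build_delta_query(read_last_id()))
```

Since id is auto-increment, `id > last_id` selects exactly the new rows, and the database can satisfy it from the primary-key index.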
Re: Regression of JIRA 1826?
Is there any more information that folks need to dig into this? I have been unable to this point to figure out what specifically is happening, so I would appreciate any help. On Fri, Jul 6, 2012 at 2:13 PM, Jamie Johnson jej2...@gmail.com wrote: A little more information on this. I tinkered a bit with the schema, and it appears to be related to WordDelimiterFilterFactory and splitOnCaseChange being true, or at least setting this exhibits the issue. Also, I am using the edismax query parser. Again, any ideas/help would be greatly appreciated. On Fri, Jul 6, 2012 at 1:40 AM, Jamie Johnson jej2...@gmail.com wrote: I just upgraded to trunk to try to fix an issue I was having with the highlighter described in JIRA 1826, but it appears that this issue still exists on trunk. I'm running the query subject:ztest* (subject is a text field, not multivalued), and the return in highlighting is <em>ZTest</em>For<em>ZTestForJamie</em>; the actual stored value is ZTestForJamie. Is anyone else experiencing this?
Top 5 high freq words - UpdateProcessorChain or DIH Script?
Hi, I want to store the top 5 high-frequency non-stopword words. I use DIH to import data. Now I have two approaches: 1. Use DIH JavaScript to find the top 5 frequency words and put them in a copy field. The copy field will then stem them and remove stop words based on the appropriate tokenizers. 2. Write a custom function for the same and add it to the UpdateRequestProcessor chain. Which of the two would be better suited? I find the first approach rather simple, but the issue is that I won't have access to stop words/synonyms etc. at DIH time. In the second approach, if I add the function to the UpdateRequestProcessor chain after StopWordsFilterFactory and DuplicateRemoveFilterFactory, would that be a good way of doing this? -- *Pranav Prakash* temet nosce
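As a rough illustration of the counting step itself, a top-5 computation is straightforward; this sketch does it client-side with a hand-rolled, purely illustrative stop list (real stemming and stopword handling would come from the analysis chain, as the message notes):

```python
import re
from collections import Counter

# Illustrative stop list; a real deployment would reuse the analyzer's stopwords file.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}

def top_terms(text, n=5):
    # Lowercase, tokenize on runs of letters/digits, drop stopwords, count.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [term for term, _ in counts.most_common(n)]

print(top_terms("the quick brown fox jumps over the lazy dog, the fox is quick"))
```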
Re: SolrCloud error while propagating update to primary ZK node
In theory, with SolrCloud you can add to any replica and the change gets propagated automatically to all of the other replicas for that shard. In theory. The stack trace suggests that Solr is trying to parse your input as JSON when in fact your input is XML. I vaguely recall that Yonik was working on updates and had implemented something with JSON, but I don't recall whether XML was also implemented (or maybe the work was done in trunk but not backported to 4.x; I just don't recall exactly). For now, it sounds as if you have to send updates to the primary node of the shard and then let Solr replicate them. I'll defer to the Cloud experts on the details. -- Jack Krupansky -----Original Message----- From: avenka Sent: Sunday, July 08, 2012 11:52 AM To: solr-user@lucene.apache.org Subject: SolrCloud error while propagating update to primary ZK node [quoted message snipped]
SEVERE: org.apache.noggit.JSONParser$ParseException: JSON Parse Error [full stack trace as in the original message, snipped] -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-error-while-propagating-update-to-primary-ZK-node-tp3993760.html Sent from the Solr - User mailing list archive at Nabble.com.
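The parse error in this thread shows an XML body reaching the JSON loader. Whatever the server-side cause, it is worth confirming the client sends an explicit XML content type. A sketch with Python's urllib (the host, port, and sample document are illustrative; the snippet only constructs the request and does not send it):

```python
import urllib.request

# The document shape mirrors the fragment visible in the stack trace.
xml_doc = '<add><doc boost="1.0"><field name="id">2</field></doc></add>'
req = urllib.request.Request(
    "http://localhost:8983/solr/update",
    data=xml_doc.encode("utf-8"),
    # Without an explicit XML content type, a handler may pick another loader.
    headers={"Content-Type": "text/xml; charset=utf-8"},
)
print(req.get_header("Content-type"))
# urllib.request.urlopen(req) would actually send it (requires a running Solr)
```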
Re: SolrCloud error while propagating update to primary ZK node
Can you show us exactly how you are adding the document? E.g., what update handler are you using, and what is the document you are adding? On Jul 8, 2012, at 12:52 PM, avenka wrote: I get a JSON parse error (pasted below) when I send an update to a replica node. [rest of quoted message and stack trace snipped] - Mark Miller lucidimagination.com
Re: SolrCloud error while propagating update to primary ZK node
I tried adding in two ways, with the same outcome: (1) using SolrJ to call HttpSolrServer.add(docList) with BinaryRequestWriter; (2) using DataImportHandler to import directly from a database through a db-data-config.xml file. The document I'm adding has a long primary key id field and a few other string and timestamp fields. I also added a long _version_ field because the URL said so. I've been using this schema without problems with 3.6 for a while, and it works fine when added to the primary in 4.0. Mark Miller-3 [via Lucene] ml-node+s472066n3993780...@n3.nabble.com wrote: Can you show us exactly how you are adding the document? E.g., what update handler are you using, and what is the document you are adding? [rest of quoted thread snipped]
-- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-error-while-propagating-update-to-primary-ZK-node-tp3993760p3993781.html Sent from the Solr - User mailing list archive at Nabble.com.
Deployment with LUCENE-2899 patch can't load solr.OpenNLPTokenizerFactory
Hi,

Platform: Ubuntu 12.04
Package: apache-solr-4.0-2012-07-07_11-55-05-src.tgz
Web: Apache Tomcat/7.0.26

I'm trying to use the LUCENE-2899 patch (https://issues.apache.org/jira/browse/LUCENE-2899). As an end user I believe this is the correct list to post to. I'm new to Solr, so I started by successfully deploying the solr/example project to tomcat7. To deploy a 2nd instance of Solr that uses the OpenNLP configuration, I performed the following steps as per the OpenNLP wiki page (http://wiki.apache.org/solr/OpenNLP):
- ant compile
- cd solr/contrib/opennlp/src/test-files/training
- run 'bin/trainall.sh'
- run 'ant test-contrib'
All these tasks executed successfully. Then I attempted to actually deploy my Solr w/ OpenNLP instance with the following steps:
- Downloaded real models from http://opennlp.sourceforge.net/models-1.5/ (except for the content of coref; do I need to get this?)
- Copied my solr-example deployment to create a solr-nlp deployment
- Copied the OpenNLP config to my deployment config
  - source: solr/contrib/opennlp/src/test-files/opennlp/solr/collection1/conf
  - dest: /var/tomcat/solr/nlp/solr/collection1/conf
- Copied the OpenNLP libs to my deployment libs
  - source: solr/contrib/opennlp/lib
  - dest: /var/tomcat/solr/nlp/solr/collection1/lib
- Updated my deployed solrconfig.xml:
  - set dataDir: <dataDir>${solr.data.dir:/var/tomcat/solr/nlp/solr/collection1/data}</dataDir> (an absolute path was recommended by the Tomcat7 Solr deployment guide)
  - added a lib: <lib dir="/var/tomcat/solr/nlp/solr/collection1/lib" regex=".*\.jar" /> (again, I specified an absolute path so Tomcat would know exactly where to load the OpenNLP libs from)
When I attempt to hit my NLP instance of Solr (http://localhost:8080/solr-nlp/admin) I get the following error: "This interface requires that you activate the admin request handlers, add the following configuration to your solrconfig.xml:" However, I have the admin requestHandler defined exactly as requested. Is this a catch-all error of some sort?
When I dig a little deeper and look at the Catalina logs I find an error and stack trace:

SEVERE: null:org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType text_opennlp: Plugin init failure for [schema.xml] analyzer/tokenizer: Error loading class 'solr.OpenNLPTokenizerFactory'
at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:168)
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:364)
at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:111)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:816)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:514)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:335)
at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:284)
at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:106)
at org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:277)
at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:258)
at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382)
at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:103)
at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4638)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5294)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:895)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:871)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:615)
at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:649)
at org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1581)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] analyzer/tokenizer: Error loading class 'solr.OpenNLPTokenizerFactory'
at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:168)
at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:322)
at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95)
PostCommit Document
Hi All, I would like to know how to use postCommit in Solr properly. I would like to grab the indexed documents and do further processing with them. How do I capture the documents being committed to Solr through the arguments in the postCommit config? I'm not using SolrJ and have no intention of using Java at the moment. If this is not possible, please let me know. Thank you, Dewi
Re: Regression of JIRA 1826?
Please post a trimmed-down version of your schema.xml and a sample document. On Sun, Jul 8, 2012 at 11:54 AM, Jamie Johnson jej2...@gmail.com wrote: [quoted message snipped] -- Lance Norskog goks...@gmail.com
Re: Indexing Wikipedia
Hi, I would recommend indexing the Wikipedia XML dump. Check out the DataImportHandler example of indexing Wikipedia (http://wiki.apache.org/solr/DataImportHandler#Example%3a_Indexing_wikipedia). Thanks, Vineet Yadav On Sun, Jul 8, 2012 at 9:15 AM, kiran kumar kirankumarsm...@gmail.com wrote: Hi, In our office we have a Wikipedia setup for the intranet. I want to index it. I have recently been studying how all the wiki pages are stored in a database whose schema is more or less the standard one from MediaWiki. I am also considering whether to use xmldumper to dump all the wiki pages into XML and index from there. Has anybody done something like this? If so, which way is more efficient and easier to implement? To me the DB schema looks quite complicated. Can somebody please help me understand which is the better implementation for this? Thanks, Kiran Bushireddy.
RE: Better (and valid) Spellcheck in combination with other parameters with at least one occurance
Thanks James for your reply. I am using the spellcheck collation options (except spellcheck.maxCollationTries). However, will spellcheck.maxCollationTries consider the other parameters in the query, or just the spellcheck words in q? Because in my case, if the original query is solr/search/?q=hangry&c=CA (plus all suggestion params), then what I want the suggestions to return is: if q=hungry has hits with param c=CA, then return the suggestion hungry; if q=angry has hits with param c=CA, then return the suggestion angry. So, does maxCollationTries consider other parameters while collating the results? - Ninad -- View this message in context: http://lucene.472066.n3.nabble.com/Better-and-valid-Spellcheck-in-combination-with-other-parameters-with-at-least-one-occurance-tp3993484p3993816.html Sent from the Solr - User mailing list archive at Nabble.com.
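For reference, the collation parameters under discussion can be combined as below. This is only a sketch of the request shape; whether maxCollationTries re-checks collations against the full query, including the extra c=CA parameter, depends on the handler configuration, which is exactly the open question above:

```python
from urllib.parse import urlencode

params = {
    "q": "hangry",
    "c": "CA",  # the extra application parameter from the question
    "spellcheck": "true",
    "spellcheck.collate": "true",
    "spellcheck.maxCollations": "5",
    "spellcheck.maxCollationTries": "10",  # how many candidate collations to test
    "spellcheck.collateExtendedResults": "true",
}
qs = urlencode(params)
print(qs)
```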