Schema / Config Error?
Hi, I installed a fresh copy of Solr 3.6.0 on my server, but I get the following page when I try to access Solr: http://176.58.103.78:8080/solr/ It reports errors relating to my solr.xml. This is my solr.xml: I really can't figure out how I am meant to fix this, so if anyone is able to give some input I would really appreciate it. James -- View this message in context: http://lucene.472066.n3.nabble.com/Schema-Config-Error-tp3987923.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Schema / Config Error?
Hi :) Looks like you forgot to paste your schema.xml and the error into your e-mail :o Gary

On 06/06/2012 10:14, Spadez wrote: Hi, I installed a fresh copy of Solr 3.6.0 on my server, but I get the following page when I try to access Solr: http://176.58.103.78:8080/solr/ It reports errors relating to my solr.xml. This is my solr.xml: I really can't figure out how I am meant to fix this, so if anyone is able to give some input I would really appreciate it. James
Re: How to find the age of a page
Hi Abdul and Jack, I got the tstamp working, but I really need to know the published date of each page.

On Sat, Jun 2, 2012 at 12:01 AM, Jack Krupansky j...@basetechnology.com wrote: If you uncomment the timestamp field in the Solr example schema, Solr will automatically initialize it for each new document to the time when the document is indexed (or most recently indexed). Any field declared with default="NOW" and not explicitly initialized will have the current time when indexed (or re-indexed). -- Jack Krupansky

-----Original Message----- From: in.abdul Sent: Friday, June 01, 2012 6:55 AM To: solr-user@lucene.apache.org Subject: Re: How to find the age of a page Shameema Umer, you can add another new field in the schema; while updating or indexing, add the timestamp to that field. Thanks and Regards, S SYED ABDUL KATHER

On Fri, Jun 1, 2012 at 3:44 PM, Shameema Umer [via Lucene] wrote: Hi all, how can I find the age of a page in Solr results? That is, the last updated time. tstamp refers to the fetch time, not the exact updated time, right?
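For reference, the timestamp field Jack describes looks like this in the stock 3.x example schema.xml; a minimal sketch (adapt the field name and type to your own schema):

```xml
<!-- default="NOW" makes Solr set this to the (re)indexing time automatically -->
<field name="timestamp" type="date" indexed="true" stored="true"
       default="NOW" multiValued="false"/>
```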
Issue with Solrcloud /solr 4.0 : Discrepancy in number of groups and ngroups value
We are using Solr 4.0 (svn build, 30 May 2012) with SolrCloud. While querying, we use field collapsing with ngroups set to true. However, there is a difference between the number of results returned and the ngroups value. Ex: http://localhost:8983/solr/select?q=messagebody:monit%20AND%20usergroupid:3&group=true&group.field=id&facet.limit=20&group.ngroups=true The values returned are like <int name="matches">10</int> <int name="ngroups">9</int> Actual groups returned: 4. Why do we have this discrepancy between ngroups, matches, and the actual number of groups? Earlier we were using the same query with Solr 3.5 (without SolrCloud) and it gave correct results. Any kind of help is appreciated. -- Regards, Nitesh Nandy
Re: How to find the age of a page
Whenever you re-index, add the current timestamp; that will be the publish date, and from there you can calculate the age. Thanks and Regards, S SYED ABDUL KATHER

On Wed, Jun 6, 2012 at 2:16 PM, Shameema Umer [via Lucene] wrote: Hi Abdul and Jack, I got the tstamp working, but I really need to know the published date of each page.
issues with spellcheck.maxCollationTries and spellcheck.collateExtendedResults
Hi, We've had some issues with a bad zero-hits collation being returned for a two-word query where one word was only one edit away from the required collation. With spellcheck.maxCollations set to a reasonable number we saw the various suggestions without the required collation. We decreased thresholdTokenFrequency to make it appear in the list of collations. However, with collateExtendedResults=true the hits field for each collation was zero, which is incorrect. Required collation = "huub stapel" (two hits) and q=huup stapel:

"collation":{"collationQuery":"heup stapel","hits":0,"misspellingsAndCorrections":{"huup":"heup"}},
"collation":{"collationQuery":"hugo stapel","hits":0,"misspellingsAndCorrections":{"huup":"hugo"}},
"collation":{"collationQuery":"hulp stapel","hits":0,"misspellingsAndCorrections":{"huup":"hulp"}},
"collation":{"collationQuery":"hup stapel","hits":0,"misspellingsAndCorrections":{"huup":"hup"}},
"collation":{"collationQuery":"huub stapel","hits":0,"misspellingsAndCorrections":{"huup":"huub"}},
"collation":{"collationQuery":"huur stapel","hits":0,"misspellingsAndCorrections":{"huup":"huur"}}

Now, with maxCollationTries set to 3 or higher we finally get the required collation, the only collation able to return results. How can we determine the best value for maxCollationTries given the decrease of thresholdTokenFrequency? Why is hits always zero? This is with today's build and distributed search enabled. Thanks, Markus
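For anyone comparing setups: the parameters under discussion can be supplied per-request or as handler defaults in solrconfig.xml. A hedged sketch (the handler name and values are illustrative, not recommendations):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.collate">true</str>
    <!-- how many collations to build, and how many to actually test against the index -->
    <str name="spellcheck.maxCollations">5</str>
    <str name="spellcheck.maxCollationTries">5</str>
    <!-- return hits/misspellingsAndCorrections per collation -->
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```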
Re: How to find the age of a page
Hi Syed Abdul, I am sorry to ask this basic question, as I am new to Nutch/Solr (even new to Java applications). Can you tell me how to set tstamp to the published date after re-indexing? Is an update query enough? Also, I am not able to get the field *publishedDate* in my query results to check whether it is working properly. Thanks Shameema
Re: Schema / Config Error?
That implies one of two things: 1) you changed solr.xml; I'd go back to the original and re-edit anything you've changed. 2) you somehow got a corrupted download; try blowing your installation away and getting a new copy. Because it works perfectly for me. Best Erick

On Wed, Jun 6, 2012 at 4:14 AM, Spadez james_will...@hotmail.com wrote: Hi, I installed a fresh copy of Solr 3.6.0 on my server, but I get the following page when I try to access Solr: http://176.58.103.78:8080/solr/ It reports errors relating to my solr.xml. This is my solr.xml: I really can't figure out how I am meant to fix this, so if anyone is able to give some input I would really appreciate it. James
Re: ExtendedDisMax Question - Strange behaviour
Sorry, but your post is really hard to read with all the data inline. Try running with debugQuery=on and looking at the parsed query; I suspect your field lists aren't the same even though you think they are. Perhaps a typo somewhere? Best Erick

On Mon, Jun 4, 2012 at 1:26 PM, André Maldonado andre.maldon...@gmail.com wrote: I'm doing a query with edismax. When I don't tell Solr which fields to search (so it searches the default field), it returns 2752 documents. Ex:

http://000.000.0.0:/solr/select/?q=apartamento+moema+praia+churrasqueira&version=2.2&start=0&rows=10&indent=on&defType=dismax&mm=75%25

The same search, specifying the fields that compose the default field, returns 1434 docs. Ex: the same URL with

qf=agrupamentos+agrupamentos2+bairro+campanhalocalempreendimento+caracteristicas+caracteristicacomum+categoria+cep+chamada+cidade+codigoanuncio+complemento+descricaopermuta+docid+empreendimento+endereco+estado+informacoescomplementares+conteudoobservacao+sigla+subtipoimovel+tipoimovel+transacao+zapid+caminhomapa+codigooferta+segmento+anuncianteorigem+zapidcorporativo+estagiodaobra+condicoescomerciais+nomejornal+nomejornalordem+textomanual

This is the important part of the schema:

<defaultSearchField>textoboost</defaultSearchField>
<copyField source="agrupamentos2" dest="textoboost"/>
<copyField source="agrupamentos" dest="textoboost"/>
<copyField source="bairro" dest="textoboost"/>
<copyField source="campanhalocalempreendimento" dest="textoboost"/>
<copyField source="caracteristicas" dest="textoboost"/>
<copyField source="caracteristicacomum" dest="textoboost"/>
<copyField source="categoria" dest="textoboost"/>
<copyField source="cep" dest="textoboost"/>
<copyField source="chamada" dest="textoboost"/>
<copyField source="cidade" dest="textoboost"/>
<copyField source="codigoanuncio" dest="textoboost"/>
<copyField source="complemento" dest="textoboost"/>
<copyField source="descricaopermuta" dest="textoboost"/>
<copyField source="docid" dest="textoboost"/>
<copyField source="empreendimento" dest="textoboost"/>
<copyField source="endereco" dest="textoboost"/>
<copyField source="estado" dest="textoboost"/>
<copyField source="informacoescomplementares" dest="textoboost"/>
<copyField source="conteudoobservacao" dest="textoboost"/>
<copyField source="sigla" dest="textoboost"/>
<copyField source="subtipoimovel" dest="textoboost"/>
<copyField source="tipoimovel" dest="textoboost"/>
<copyField source="transacao" dest="textoboost"/>
<copyField source="zapid" dest="textoboost"/>
<copyField source="caminhomapa" dest="textoboost"/>
<copyField source="codigooferta" dest="textoboost"/>
<copyField source="segmento" dest="textoboost"/>
<copyField source="anuncianteorigem" dest="textoboost"/>
<copyField source="zapidcorporativo" dest="textoboost"/>
<copyField source="estagiodaobra" dest="textoboost"/>
<copyField source="condicoescomerciais" dest="textoboost"/>
<copyField source="nomejornal" dest="textoboost"/>
<copyField source="nomejornalordem" dest="textoboost"/>
<copyField source="textomanual" dest="textoboost"/>

What's the problem? Thanks

*"And you shall know the truth, and the truth shall set you free." (John 8:32)* andre.maldonado@gmail.com (11) 9112-4227
Re: ReadTimeout on commit
You're probably hitting a background merge, and the request is timing out even though the commit succeeds. Try querying for the data in the last packet to test this. And you don't say what version of Solr you're using. One test you can do is increase the number of documents before a commit. If merging is the problem, I'd expect you to _still_ encounter this problem, just much less often. That would at least tell you whether this is the right path to investigate. Best Erick

On Tue, Jun 5, 2012 at 6:51 AM, spr...@gmx.eu wrote: Hi, I'm indexing documents in batches of 100 docs, then commit. Sometimes I get this exception:

org.apache.solr.client.solrj.SolrServerException: java.net.SocketTimeoutException: Read timed out
  at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:475)
  at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:249)
  at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
  at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:178)

I found some similar postings on the web, all recommending autocommit. That is unfortunately not an option for me, because I have to know whether Solr committed or not. What is causing this timeout? I'm using these settings in SolrJ:

server.setSoTimeout(1000);
server.setConnectionTimeout(100);
server.setDefaultMaxConnectionsPerHost(100);
server.setMaxTotalConnections(100);
server.setFollowRedirects(false);
server.setAllowCompression(true);
server.setMaxRetries(1);

Thank you
Re: Schema / Config Error?
Make sure your port is 8983 or 8080.

On Wed, Jun 6, 2012 at 4:27 PM, Erick Erickson erickerick...@gmail.com wrote: That implies one of two things: 1) you changed solr.xml; I'd go back to the original and re-edit anything you've changed. 2) you somehow got a corrupted download; try blowing your installation away and getting a new copy. Because it works perfectly for me. Best Erick
Re: sort by publishedDate and get published Date in solr query results
Step 1: Verify that publishedDate is in fact the field name that Nutch uses for the published date.

Step 2: Make sure that Nutch is passing the date in the format YYYY-MM-DDTHH:MM:SSZ. Whether you need a Nutch plugin to do that is not a question for this Solr mailing list. My (very limited) understanding is that there was a Nutch plugin that worked for the old version of Nutch but that it was not updated for the new version.

Step 3: Have you added the field publishedDate to your Solr schema with a field type of date or tdate?

If you can't figure out how to fix the problem on the Nutch side of the fence, then you will have to do a custom update processor for Solr. Solr 4.x has some new tools that should make that easier. See: https://issues.apache.org/jira/browse/SOLR-2802 -- Jack Krupansky

-----Original Message----- From: Shameema Umer Sent: Wednesday, June 06, 2012 4:12 AM To: solr-user@lucene.apache.org Subject: sort by publishedDate and get published Date in solr query results Hi, please help me sort by publishedDate and get publishedDate in Solr query results. Do I need to install anything (a plugin)? Thanks Shameema
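To illustrate steps 2 and 3: a matching field declaration and a conforming date value might look like the sketch below (the field name publishedDate comes from the thread; the value shown is only an example of the canonical UTC form):

```xml
<!-- date/tdate fields require values like 2012-06-06T10:14:00Z -->
<field name="publishedDate" type="date" stored="true" indexed="true"/>
```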
Re: How to find the age of a page
My misunderstanding. I thought you were publishing to Solr and wanted the date when that occurred (indexing). -- Jack Krupansky

-----Original Message----- From: Shameema Umer Sent: Wednesday, June 06, 2012 4:45 AM To: solr-user@lucene.apache.org Subject: Re: How to find the age of a page Hi Abdul and Jack, I got the tstamp working, but I really need to know the published date of each page.

On Sat, Jun 2, 2012 at 12:01 AM, Jack Krupansky j...@basetechnology.com wrote: If you uncomment the timestamp field in the Solr example, Solr will automatically initialize it for each new document to the time when the document is indexed (or most recently indexed). Any field declared with default="NOW" and not explicitly initialized will have the current time when indexed (or re-indexed). -- Jack Krupansky
Re: How to find the age of a page
See the reply on the other email thread you started. -- Jack Krupansky

-----Original Message----- From: Shameema Umer Sent: Wednesday, June 06, 2012 6:28 AM To: solr-user@lucene.apache.org Subject: Re: How to find the age of a page Hi Syed Abdul, I am sorry to ask this basic question, as I am new to Nutch/Solr (even new to Java applications). Can you tell me how to set tstamp to the published date after re-indexing? Is an update query enough? Also, I am not able to get the field *publishedDate* in my query results to check whether it is working properly. Thanks Shameema
Re: Schema / Config Error?
Read CHANGES.txt carefully, especially the section entitled "Upgrading from Solr 3.5". For example:

* As of Solr 3.6, the <indexDefaults> and <mainIndex> sections of solrconfig.xml are deprecated and replaced with a new <indexConfig> section. Read more in SOLR-1052 below.

If you simply copied your schema/config over directly, unchanged, then this could be the problem. You may need to compare your schema/config line by line with the new 3.6 schema/config for any differences. -- Jack Krupansky

-----Original Message----- From: Erick Erickson Sent: Wednesday, June 06, 2012 6:57 AM To: solr-user@lucene.apache.org Subject: Re: Schema / Config Error? That implies one of two things: 1) you changed solr.xml; I'd go back to the original and re-edit anything you've changed. 2) you somehow got a corrupted download; try blowing your installation away and getting a new copy. Because it works perfectly for me. Best Erick
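The SOLR-1052 change Jack quotes means a 3.5-era solrconfig.xml has to be folded into a single section. A hedged sketch of the migration (the child settings shown are examples only, not a complete list):

```xml
<!-- Solr 3.5 and earlier (deprecated as of 3.6) -->
<indexDefaults>
  <mergeFactor>10</mergeFactor>
</indexDefaults>
<mainIndex>
  <unlockOnStartup>false</unlockOnStartup>
</mainIndex>

<!-- Solr 3.6: both sections combined into one -->
<indexConfig>
  <mergeFactor>10</mergeFactor>
  <unlockOnStartup>false</unlockOnStartup>
</indexConfig>
```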
Re: sort by publishedDate and get published Date in solr query results
Versions: Nutch 1.4 and Solr 3.4. My schema file contains:

<!-- fields for feed plugin (tag is also used by microformats-reltag) -->
<field name="author" type="string" stored="true" indexed="true"/>
<field name="tag" type="string" stored="true" indexed="true" multiValued="true"/>
<field name="feed" type="string" stored="true" indexed="true"/>
<field name="publishedDate" type="date" stored="true" indexed="true"/>
<field name="updatedDate" type="date" stored="true" indexed="true"/>

But I do not know whether this feed plugin is working or not, as I am new to Nutch and Solr. Here is my query:

http://localhost:8983/solr/select/?q=title:'.$v.' content:'.$v.'&sort=publishedDate desc&fl=tilte content url publishedDate&start=0&rows=1&version=2.2&indent=on&hl=true&hl.fl=content&hl.fragsize=300'

But this is not returning publishedDate in the results. Should I post this on the Nutch users mailing list? Thanks.

On Wed, Jun 6, 2012 at 4:52 PM, Jack Krupansky j...@basetechnology.com wrote: Step 1: Verify that publishedDate is in fact the field name that Nutch uses for the published date. Step 2: Make sure that Nutch is passing the date in the format YYYY-MM-DDTHH:MM:SSZ. Whether you need a Nutch plugin to do that is not a question for this Solr mailing list. My (very limited) understanding is that there was a Nutch plugin that worked for the old version of Nutch but that it was not updated for the new version. Step 3: Have you added the field publishedDate to your Solr schema with a field type of date or tdate? If you can't figure out how to fix the problem on the Nutch side of the fence, then you will have to do a custom update processor for Solr. Solr 4.x has some new tools that should make that easier. See: https://issues.apache.org/jira/browse/SOLR-2802 -- Jack Krupansky

-----Original Message----- From: Shameema Umer Sent: Wednesday, June 06, 2012 4:12 AM To: solr-user@lucene.apache.org Subject: sort by publishedDate and get published Date in solr query results Hi, please help me sort by publishedDate and get publishedDate in Solr query results. Do I need to install anything (a plugin)? Thanks Shameema
Re: ReadTimeout on commit
As Erick says, you are probably hitting an occasional automatic background merge which takes a bit longer. That is not an indication of a problem. Increase your connection timeout. Check the log to see how long the merge or slow commit takes. You have a timeout of 1000, which is 1 second. Make it longer, and possibly put the commit or other indexing operations in a loop with a few retries before considering a connection timeout a fatal error. Occasional delays are a fact of life in a multi-process, networked environment. -- Jack Krupansky

-----Original Message----- From: Erick Erickson Sent: Wednesday, June 06, 2012 7:02 AM To: solr-user@lucene.apache.org Subject: Re: ReadTimeout on commit You're probably hitting a background merge, and the request is timing out even though the commit succeeds. Try querying for the data in the last packet to test this. And you don't say what version of Solr you're using. One test you can do is increase the number of documents before a commit. If merging is the problem, I'd expect you to _still_ encounter this problem, just much less often. That would at least tell you whether this is the right path to investigate. Best Erick

On Tue, Jun 5, 2012 at 6:51 AM, spr...@gmx.eu wrote: Hi, I'm indexing documents in batches of 100 docs, then commit. Sometimes I get this exception:

org.apache.solr.client.solrj.SolrServerException: java.net.SocketTimeoutException: Read timed out
  at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:475)
  at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:249)
  at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
  at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:178)

I found some similar postings on the web, all recommending autocommit. That is unfortunately not an option for me, because I have to know whether Solr committed or not. What is causing this timeout? I'm using these settings in SolrJ:

server.setSoTimeout(1000);
server.setConnectionTimeout(100);
server.setDefaultMaxConnectionsPerHost(100);
server.setMaxTotalConnections(100);
server.setFollowRedirects(false);
server.setAllowCompression(true);
server.setMaxRetries(1);

Thank you
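Jack's suggestion of retrying instead of treating a read timeout as fatal can be sketched generically; this wrapper uses hypothetical names (it is not SolrJ API) and retries a commit-like operation a few times before giving up:

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

public class RetryingCommit {
    // Run the operation, retrying on timeout up to maxRetries extra times.
    static <T> T withRetries(Callable<T> op, int maxRetries) throws Exception {
        SocketTimeoutException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return op.call();
            } catch (SocketTimeoutException e) {
                last = e; // a background merge may just be slow; try again
            }
        }
        throw last; // still failing after all retries: treat as fatal
    }

    public static void main(String[] args) throws Exception {
        // Simulated commit that times out twice, then succeeds.
        final int[] calls = {0};
        String result = withRetries(() -> {
            if (++calls[0] < 3) throw new SocketTimeoutException("Read timed out");
            return "committed";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In real code the Callable body would call server.commit(), with the SoTimeout raised well above 1000 ms so that only genuinely stuck requests retry.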
Re: sort by publishedDate and get published Date in solr query results
Check your Solr log file to see whether errors or warnings are issued. If Nutch is sending bogus date values, they should produce warnings. At this stage there are two strong possibilities: 1) Nutch is simply not sending that date field value at all. 2) Solr is rejecting the date field value because it is not in the required yyyy-mm-ddThh:mm:ssZ format. If #2, you need to go the update-processor route I mentioned previously. -- Jack Krupansky

-----Original Message----- From: Shameema Umer Sent: Wednesday, June 06, 2012 7:37 AM To: solr-user@lucene.apache.org Subject: Re: sort by publishedDate and get published Date in solr query results Versions: Nutch 1.4 and Solr 3.4. My schema file contains:

<!-- fields for feed plugin (tag is also used by microformats-reltag) -->
<field name="author" type="string" stored="true" indexed="true"/>
<field name="tag" type="string" stored="true" indexed="true" multiValued="true"/>
<field name="feed" type="string" stored="true" indexed="true"/>
<field name="publishedDate" type="date" stored="true" indexed="true"/>
<field name="updatedDate" type="date" stored="true" indexed="true"/>

But I do not know whether this feed plugin is working or not, as I am new to Nutch and Solr. Here is my query:

http://localhost:8983/solr/select/?q=title:'.$v.' content:'.$v.'&sort=publishedDate desc&fl=tilte content url publishedDate&start=0&rows=1&version=2.2&indent=on&hl=true&hl.fl=content&hl.fragsize=300'

But this is not returning publishedDate in the results. Should I post this on the Nutch users mailing list? Thanks.
Re: Efficiently mining or parsing data out of XML source files
I agree, that seems odd. We routinely index XML using either HTMLStripCharFilter or XmlCharFilter (see patch: https://issues.apache.org/jira/browse/SOLR-2597), both of which parse the XML, and we don't see such a huge speed difference from indexing other field types. XmlCharFilter also allows you to specify which elements to index if you don't want the whole file. -Mike On 6/3/2012 8:42 AM, Erick Erickson wrote: This seems really odd. How big are these XML files? Where are you parsing them? You could consider using a SolrJ program with a SAX-style parser. But the first question I'd answer is "what is slow?". The implication of your post is that parsing the XML is the slow part; it really shouldn't be taking anywhere near this long IMO... Best, Erick On Thu, May 31, 2012 at 9:14 AM, Van Tassell, Kristian kristian.vantass...@siemens.com wrote: I'm just wondering what the general consensus is on indexing XML data to Solr in terms of parsing and mining the relevant data out of the file and putting them into Solr fields. Assume that this is the XML file and the resulting Solr fields:

XML data:

<mydoc id="1234">
  <title>foo</title>
  <bar attr1="val1"/>
  <baz>garbage data</baz>
</mydoc>

Solr fields:

Id=1234
Title=foo
Bar=val1

I'd previously set this process up using XSLT and have since tested using XMLBeans, JAXB, etc. to get the relevant data. The speed at which this occurs, however, is not acceptable: 2800 objects take 11 minutes to parse and index into Solr. The big slowdown appears to be that I'm parsing the data with an XML parser. So now I'm testing mining the data by opening the file as just a text file (using Groovy) and picking out relevant data using regular expression matching. I'm now able to parse (mine) the data and index the 2800 files in 72 seconds. So I'm wondering if the typical solution people use is to go with a non-XML solution.
It seems to make sense, considering the search index would only want to store as much data as necessary and not rely on the incoming documents being XML-compliant. Thanks in advance for any thoughts on this! -Kristian
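As a concrete sketch of the regex-mining approach Kristian describes (the tag names and field mapping come from his sample document above; treat this as an illustration, not a general XML parser — it only works on regular, trusted input):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Pull a handful of known fields out of a small XML document with plain
// regular expressions, skipping a full XML parse entirely.
public class RegexMiner {
    private static final Pattern ID = Pattern.compile("<mydoc\\s+id=\"([^\"]+)\"");
    private static final Pattern TITLE = Pattern.compile("<title>([^<]*)</title>");
    private static final Pattern BAR = Pattern.compile("<bar\\s+attr1=\"([^\"]+)\"");

    public static Map<String, String> mine(String xml) {
        Map<String, String> fields = new HashMap<String, String>();
        put(fields, "Id", ID, xml);
        put(fields, "Title", TITLE, xml);
        put(fields, "Bar", BAR, xml);
        return fields;
    }

    private static void put(Map<String, String> fields, String name, Pattern p, String xml) {
        Matcher m = p.matcher(xml);
        if (m.find()) fields.put(name, m.group(1));
    }

    public static void main(String[] args) {
        String doc = "<mydoc id=\"1234\"><title>foo</title><bar attr1=\"val1\"/>"
                   + "<baz>garbage data</baz></mydoc>";
        // Prints the three extracted field/value pairs.
        System.out.println(mine(doc));
    }
}
```

The speed comes from never building a DOM or firing SAX events; the trade-off is that entity references, CDATA, and attribute reordering will all break the patterns, which is the usual argument for keeping a real parser in the loop.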
Re: ReadTimeout on commit
Looks like the commit is taking longer than your set timeout. On Jun 5, 2012, at 6:51 AM, spr...@gmx.eu wrote: Hi, I'm indexing documents in batches of 100 docs, then commit. Sometimes I get this exception:

org.apache.solr.client.solrj.SolrServerException: java.net.SocketTimeoutException: Read timed out
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:475)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:249)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:178)

I found some similar postings on the web, all recommending autocommit. This is unfortunately not an option for me, because I have to know whether Solr committed or not. What is causing this timeout? I'm using these settings in SolrJ:

server.setSoTimeout(1000);
server.setConnectionTimeout(100);
server.setDefaultMaxConnectionsPerHost(100);
server.setMaxTotalConnections(100);
server.setFollowRedirects(false);
server.setAllowCompression(true);
server.setMaxRetries(1);

Thank you

- Mark Miller lucidimagination.com
Re: sort by publishedDate and get published Date in solr query results
OK Jack. Will do. On Wed, Jun 6, 2012 at 5:29 PM, Jack Krupansky j...@basetechnology.com wrote: Check your Solr log file to see whether errors or warnings are issued. If Nutch is sending bogus date values, they should produce warnings. At this stage there are two strong possibilities: 1. Nutch is simply not sending that date field value at all. 2. Solr is rejecting the date field value because it is not in the required yyyy-mm-ddThh:mm:ssZ format. If #2, you need to go the update processor route I mentioned previously. -- Jack Krupansky -Original Message- From: Shameema Umer Sent: Wednesday, June 06, 2012 7:37 AM To: solr-user@lucene.apache.org Subject: Re: sort by publishedDate and get published Date in solr query results Versions: Nutch 1.4 and Solr 3.4. My schema file contains:

<!-- fields for feed plugin (tag is also used by microformats-reltag) -->
<field name="author" type="string" stored="true" indexed="true"/>
<field name="tag" type="string" stored="true" indexed="true" multiValued="true"/>
<field name="feed" type="string" stored="true" indexed="true"/>
<field name="publishedDate" type="date" stored="true" indexed="true"/>
<field name="updatedDate" type="date" stored="true" indexed="true"/>

But I do not know whether this feed plugin is working or not, as I am new to Nutch and Solr. Here is my query:

http://localhost:8983/solr/select/?q=title:'.$v.' content:'.$v.'&sort=publishedDate desc&fl=title content url publishedDate&start=0&rows=1&version=2.2&indent=on&hl=true&hl.fl=content&hl.fragsize=300

But this is not returning publishedDate in the results. Should I post this on the Nutch users mailing list? Thanks. On Wed, Jun 6, 2012 at 4:52 PM, Jack Krupansky j...@basetechnology.com wrote: Step 1: Verify that publishedDate is in fact the field name that Nutch uses for published date. Step 2: Make sure that Nutch is passing the date in the format YYYY-MM-DDTHH:MM:SSZ. Whether you need a Nutch plugin to do that is not a question for this Solr mailing list.
My (very limited) understanding is that there was a Nutch plugin that worked for the old version of Nutch but that it was not updated for the new version of Nutch. Step 3: Have you added the field publishedDate to your Solr schema with a field type of date or tdate? If you can't figure out how to fix the problem on the Nutch side of the fence, then you will have to do a custom update processor for Solr. Solr 4.x has some new tools that should make that easier. See: https://issues.apache.org/jira/browse/SOLR-2802 -- Jack Krupansky -Original Message- From: Shameema Umer Sent: Wednesday, June 06, 2012 4:12 AM To: solr-user@lucene.apache.org Subject: sort by publishedDate and get published Date in solr query results Hi, please help me sort by publishedDate and get publishedDate in Solr query results. Do I need to install anything (a plugin)? Thanks, Shameema
RE: ReadTimeout on commit
Hi Jack, hi Erick, thanks for the tips! It's Solr 3.6. I increased the batch to 1000 docs and the timeout to 10 s. Now it works. And I will implement the retry around the commit call. Thx! -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Wednesday, June 6, 2012 13:52 To: solr-user@lucene.apache.org Subject: Re: ReadTimeout on commit As Erick says, you are probably hitting an occasional automatic background merge which takes a bit longer. That is not an indication of a problem. Increase your connection timeout. Check the log to see how long the merge or slow commit takes. You have a timeout of 1000, which is 1 second. Make it longer, and possibly put the commit or other indexing operations in a loop with a few retries before considering a connection timeout a fatal error. Occasional delays are a fact of life in a multi-process, networked environment. -- Jack Krupansky -Original Message- From: Erick Erickson Sent: Wednesday, June 06, 2012 7:02 AM To: solr-user@lucene.apache.org Subject: Re: ReadTimeout on commit You're probably hitting a background merge and the request is timing out even though the commit succeeds. Try querying for the data in the last packet to test this. And you don't say what version of Solr you're using. One test you can do is increase the number of documents before a commit. If merging is the problem I'd expect you to _still_ encounter this problem, just much less often. That would at least tell you if this is the right path to investigate. Best, Erick On Tue, Jun 5, 2012 at 6:51 AM, spr...@gmx.eu wrote: Hi, I'm indexing documents in batches of 100 docs. Then commit.
Sometimes I get this exception:

org.apache.solr.client.solrj.SolrServerException: java.net.SocketTimeoutException: Read timed out
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:475)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:249)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:178)

I found some similar postings on the web, all recommending autocommit. This is unfortunately not an option for me, because I have to know whether Solr committed or not. What is causing this timeout? I'm using these settings in SolrJ:

server.setSoTimeout(1000);
server.setConnectionTimeout(100);
server.setDefaultMaxConnectionsPerHost(100);
server.setMaxTotalConnections(100);
server.setFollowRedirects(false);
server.setAllowCompression(true);
server.setMaxRetries(1);

Thank you
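The retry-loop idea Jack suggests can be sketched generically like this (independent of SolrJ, so it compiles on its own; in real code the Callable body would wrap server.commit()):

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

// Retry a flaky operation (e.g. a SolrJ commit) a few times before giving
// up, so an occasional read timeout during a background merge is not fatal.
public class RetryingCommit {
    public static <T> T withRetries(Callable<T> op, int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (SocketTimeoutException e) {
                // Transient: note the commit may still have succeeded
                // server-side, so the retried operation must be idempotent.
                last = e;
                Thread.sleep(100L * attempt); // simple linear backoff
            }
        }
        throw last;
    }
}
```

Commit is idempotent, so retrying it is safe; a retried add with a unique key is also safe because the re-sent document simply overwrites itself.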
Re: Efficiently mining or parsing data out of XML source files
I did see a mention yesterday of a situation involving DIH and large XML files where it was unusually slow, but if the big XML file was broken into many smaller files it went really fast for the same amount of data. If that is the case, you don't need to parse all of the XML, just detect the boundaries between documents and break them into smaller XML files. -- Jack Krupansky -Original Message- From: Mike Sokolov Sent: Wednesday, June 06, 2012 8:02 AM To: solr-user@lucene.apache.org Cc: Erick Erickson Subject: Re: Efficiently mining or parsing data out of XML source files I agree, that seems odd. We routinely index XML using either HTMLStripCharFilter or XmlCharFilter (see patch: https://issues.apache.org/jira/browse/SOLR-2597), both of which parse the XML, and we don't see such a huge speed difference from indexing other field types. XmlCharFilter also allows you to specify which elements to index if you don't want the whole file. -Mike On 6/3/2012 8:42 AM, Erick Erickson wrote: This seems really odd. How big are these XML files? Where are you parsing them? You could consider using a SolrJ program with a SAX-style parser. But the first question I'd answer is "what is slow?". The implication of your post is that parsing the XML is the slow part; it really shouldn't be taking anywhere near this long IMO... Best, Erick On Thu, May 31, 2012 at 9:14 AM, Van Tassell, Kristian kristian.vantass...@siemens.com wrote: I'm just wondering what the general consensus is on indexing XML data to Solr in terms of parsing and mining the relevant data out of the file and putting them into Solr fields. Assume that this is the XML file and the resulting Solr fields:

XML data:

<mydoc id="1234">
  <title>foo</title>
  <bar attr1="val1"/>
  <baz>garbage data</baz>
</mydoc>

Solr fields:

Id=1234
Title=foo
Bar=val1

I'd previously set this process up using XSLT and have since tested using XMLBeans, JAXB, etc. to get the relevant data. The speed at which this occurs, however, is not acceptable.
2800 objects take 11 minutes to parse and index into Solr. The big slowdown appears to be that I'm parsing the data with an XML parser. So now I'm testing mining the data by opening the file as just a text file (using Groovy) and picking out relevant data using regular expression matching. I'm now able to parse (mine) the data and index the 2800 files in 72 seconds. So I'm wondering if the typical solution people use is to go with a non-XML solution. It seems to make sense, considering the search index would only want to store as much data as necessary and not rely on the incoming documents being XML-compliant. Thanks in advance for any thoughts on this! -Kristian
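Jack's boundary-detection idea can be sketched without any XML parsing at all — scan for the opening and closing tags of each document and cut between them (the tag name follows the sample document in this thread; this assumes documents of that type are not nested):

```java
import java.util.ArrayList;
import java.util.List;

// Split a large XML string into one small XML string per document by
// scanning for <tag ...> ... </tag> boundaries, without parsing the XML.
public class XmlSplitter {
    public static List<String> split(String bigXml, String tag) {
        List<String> docs = new ArrayList<String>();
        String open = "<" + tag;          // matches "<mydoc ..." (and, caveat,
                                          // any tag sharing the prefix)
        String close = "</" + tag + ">";
        int from = 0;
        while (true) {
            int start = bigXml.indexOf(open, from);
            if (start < 0) break;
            int end = bigXml.indexOf(close, start);
            if (end < 0) break;          // truncated document: stop cleanly
            end += close.length();
            docs.add(bigXml.substring(start, end));
            from = end;
        }
        return docs;
    }
}
```

Each returned chunk can then be written to its own file, or fed individually to whatever parser extracts the fields, which keeps per-document parse cost small and bounded.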
Fielded searches with Solr ExtendedDisMax Query Parser
Hi all, I'm having a problem using the Solr ExtendedDisMax Query Parser with queries that contain fielded searches inside non-plain queries. The case is the following. If I send to Solr an edismax request (defType=edismax) with parameters 1. qf=field1^10 2. q=field2:ciao 3. debugQuery=on (for debug purposes), Solr parses the query as I expect; in fact the debug part of the response tells me that [parsedquery_toString] = +field2:ciao. But if I make the expression only a bit more complex, like putting the condition into brackets: 1. qf=field1^10 2. q=(field2:ciao), I get [parsedquery_toString] = +(((field1:field2:^2.0) (field1:ciao^2.0))~2), where Solr seems not to recognize the fielded syntax. I've not found any mention of this behavior in the [documentation][1], where instead they say that "This parser supports full Lucene QueryParser syntax including boolean operators 'AND', 'OR', 'NOT', '+' and '-', fielded search, term boosting, fuzzy...". This problem is really annoying me because I would like to do complex boolean and fielded queries even with the edismax parser. Do you know a way to work around this? Thank you in advance. Nicolò Martini [1]: http://wiki.apache.org/solr/ExtendedDisMax
Re: Exception when optimizing index
It could be related to https://issues.apache.org/jira/browse/LUCENE-2975. At least the exception comes from the same function. Caused by: java.io.IOException: Invalid vInt detected (too many bits) at org.apache.lucene.store.DataInput.readVInt(DataInput.java:112) What hardware and Java version are you running? -- Jack Krupansky -Original Message- From: Rok Rejc Sent: Wednesday, June 06, 2012 3:45 AM To: solr-user@lucene.apache.org Subject: Exception when optimizing index Hi all, I have a Solr installation (version 4.0 from trunk, 1st May 2012). After I imported documents (99831145 documents) I ran the optimization. I got an exception:

<response><lst name="responseHeader"><int name="status">500</int><int name="QTime">281615</int></lst><lst name="error"><str name="msg">background merge hit exception: _8x(4.0):C202059 _e0(4.0):C192649 _3r(4.0):C205785 _1s(4.0):C203526 _4w(4.0):C199793 _7f(4.0):C193108 _dy(4.0):C185814 _7d(4.0):C190364 _c5(4.0):C187881 _8u(4.0):C185001 _r(4.0):C183475 _1r(4.0):C185622 _2s(4.0):C174349 _3s(4.0):C171683 _7h(4.0):C170618 _fj(4.0):C179232 _2t(4.0):C161907 _fi(4.0):C168713 _1q(4.0):C165402 _2r(4.0):C152995 _e1(4.0):C146080 _f4(4.0):C155072 _af(4.0):C149113 _dx(4.0):C147298 _3t(4.0):C150806 _q(4.0):C146874 _4v(4.0):C146324 _fc(4.0):C141426 _al(4.0):C125361 _64(4.0):C119208 into _ft [maxNumSegments=1]</str><str name="trace">java.io.IOException: background merge hit exception: _8x(4.0):C202059 _e0(4.0):C192649 _3r(4.0):C205785 _1s(4.0):C203526 _4w(4.0):C199793 _7f(4.0):C193108 _dy(4.0):C185814 _7d(4.0):C190364 _c5(4.0):C187881 _8u(4.0):C185001 _r(4.0):C183475 _1r(4.0):C185622 _2s(4.0):C174349 _3s(4.0):C171683 _7h(4.0):C170618 _fj(4.0):C179232 _2t(4.0):C161907 _fi(4.0):C168713 _1q(4.0):C165402 _2r(4.0):C152995 _e1(4.0):C146080 _f4(4.0):C155072 _af(4.0):C149113 _dx(4.0):C147298 _3t(4.0):C150806 _q(4.0):C146874 _4v(4.0):C146324 _fc(4.0):C141426 _al(4.0):C125361 _64(4.0):C119208 into _ft [maxNumSegments=1] at
org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1475) at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1412) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:385) at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:82) at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64) at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:783) at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154) at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:155) at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:865) at 
org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:579) at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1556) at java.lang.Thread.run(Thread.java:679) Caused by: java.io.IOException: Invalid vInt detected (too many bits) at org.apache.lucene.store.DataInput.readVInt(DataInput.java:112) at org.apache.lucene.codecs.lucene40.Lucene40PostingsReader$AllDocsSegmentDocsEnum.nextUnreadDoc(Lucene40PostingsReader.java:557) at org.apache.lucene.codecs.lucene40.Lucene40PostingsReader$SegmentDocsEnumBase.refill(Lucene40PostingsReader.java:408) at org.apache.lucene.codecs.lucene40.Lucene40PostingsReader$AllDocsSegmentDocsEnum.nextDoc(Lucene40PostingsReader.java:508) at org.apache.lucene.codecs.MappingMultiDocsEnum.nextDoc(MappingMultiDocsEnum.java:85) at org.apache.lucene.codecs.PostingsConsumer.merge(PostingsConsumer.java:65) at
Re: Fielded searches with Solr ExtendedDisMax Query Parser
This is a known (unfixed) bug. The workaround is to add a space between each left parenthesis and the field name. See: https://issues.apache.org/jira/browse/SOLR-3377 So, q=(field2:ciao) becomes: q=( field2:ciao) -- Jack Krupansky -Original Message- From: Nicolò Martini Sent: Wednesday, June 06, 2012 8:35 AM To: solr-user@lucene.apache.org Subject: Fielded searches with Solr ExtendedDisMax Query Parser Hi all, I'm having a problem using the Solr ExtendedDisMax Query Parser with queries that contain fielded searches inside non-plain queries. The case is the following. If I send to Solr an edismax request (defType=edismax) with parameters 1. qf=field1^10 2. q=field2:ciao 3. debugQuery=on (for debug purposes), Solr parses the query as I expect; in fact the debug part of the response tells me that [parsedquery_toString] = +field2:ciao. But if I make the expression only a bit more complex, like putting the condition into brackets: 1. qf=field1^10 2. q=(field2:ciao), I get [parsedquery_toString] = +(((field1:field2:^2.0) (field1:ciao^2.0))~2), where Solr seems not to recognize the fielded syntax. I've not found any mention of this behavior in the [documentation][1], where instead they say that "This parser supports full Lucene QueryParser syntax including boolean operators 'AND', 'OR', 'NOT', '+' and '-', fielded search, term boosting, fuzzy...". This problem is really annoying me because I would like to do complex boolean and fielded queries even with the edismax parser. Do you know a way to work around this? Thank you in advance. Nicolò Martini [1]: http://wiki.apache.org/solr/ExtendedDisMax
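To make the workaround concrete, the whole difference is a single space after the opening parenthesis (parsed-query output taken from the thread above):

```
q=(field2:ciao)    ->  +(((field1:field2:^2.0) (field1:ciao^2.0))~2)   (field name swallowed)
q=( field2:ciao)   ->  +field2:ciao                                    (fielded search honored)
```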
Re: highlighter not respecting sentence boundry
I don't quite understand the problem. What is an example snippet that you think is incorrect, and what do you think the snippet should be? Also, try the /browse handler in the Solr example after following the Solr tutorial to post data. Do a search that will highlight terms similar to what you want. When you see that it works in /browse, try to replicate the settings for your own handler. -- Jack Krupansky -Original Message- From: abhayd Sent: Tuesday, June 05, 2012 2:41 AM To: solr-user@lucene.apache.org Subject: Re: highlighter not respecting sentence boundry Any help on this one? Seems like the highlighting component does not always start the snippet at the start of a sentence. I tried several options. Has anyone successfully got this working? -- View this message in context: http://lucene.472066.n3.nabble.com/highlighter-not-respecting-sentence-boundry-tp3984327p3987718.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: ExtendedDisMax Question - Strange behaviour
Erick, thanks for your reply and sorry for the confusion in last e-mail. But it is hard to explain the situation without that bunch of code. In my schema I have a field called textoboost that contains copies of a lot of other fields. Doing the query in this field I got this: +(((textoboost:apartamento) (textoboost:moema) (textoboost:praia) (textoboost:churrasqueira))~3) This is correct and returns 2452 documents. When I do the same search but, instead of doing it on textoboost field, doing in all fields that textoboost contains I got the following query (without typos and returning only 1434 documents). +(((estagiodaobra:apartamento | campanhalocalempreendimento:apartamento | textomanual:apartamento | codigooferta:apartamento | zapidcorporativo:apartamento | conteudoobservacao:apartamento | categoria:apartamento | docid:apartamento | cep:apartamento | caracteristicas:apartamento | condicoescomerciais:apartamento | anuncianteorigem:apartamento | empreendimento:apartamento | complemento:apartamento | caracteristicacomum:apartamento | codigoanuncio:apartamento | nomejornal:apartamento | agrupamentos2:apartamento | subtipoimovel:apartamento | descricaopermuta:apartamento | zapid:apartamento | cidade:apartamento | bairro:apartamento | transacao:apartamento | estado:apartamento | tipoimovel:apartamento | sigla:apartamento | caminhomapa:apartamento | chamada:apartamento | segmento:apartamento | nomejornalordem:apartamento | agrupamentos:apartamento | endereco:apartamento | informacoescomplementares:apartamento) (estagiodaobra:moema | campanhalocalempreendimento:moema | textomanual:moema | codigooferta:moema | zapidcorporativo:moema | conteudoobservacao:moema | categoria:moema | docid:moema | cep:moema | caracteristicas:moema | condicoescomerciais:moema | anuncianteorigem:moema | empreendimento:moema | complemento:moema | caracteristicacomum:moema | codigoanuncio:moema | nomejornal:moema | agrupamentos2:moema | subtipoimovel:moema | descricaopermuta:moema | zapid:moema | 
cidade:moema | bairro:moema | transacao:moema | estado:moema | tipoimovel:moema | sigla:moema | caminhomapa:moema | chamada:moema | segmento:moema | nomejornalordem:moema | agrupamentos:moema | endereco:moema | informacoescomplementares:moema) (estagiodaobra:praia | campanhalocalempreendimento:praia | textomanual:praia | codigooferta:praia | zapidcorporativo:praia | conteudoobservacao:praia | categoria:praia | docid:praia | cep:praia | caracteristicas:praia | condicoescomerciais:praia | anuncianteorigem:praia | empreendimento:praia | complemento:praia | caracteristicacomum:praia | codigoanuncio:praia | nomejornal:praia | agrupamentos2:praia | subtipoimovel:praia | descricaopermuta:praia | zapid:praia | cidade:praia | bairro:praia | transacao:praia | estado:praia | tipoimovel:praia | sigla:praia | caminhomapa:praia | chamada:praia | segmento:praia | nomejornalordem:praia | agrupamentos:praia | endereco:praia | informacoescomplementares:praia) (estagiodaobra:churrasqueira | campanhalocalempreendimento:churrasqueira | textomanual:churrasqueira | codigooferta:churrasqueira | zapidcorporativo:churrasqueira | conteudoobservacao:churrasqueira | categoria:churrasqueira | docid:churrasqueira | cep:churrasqueira | caracteristicas:churrasqueira | condicoescomerciais:churrasqueira | anuncianteorigem:churrasqueira | empreendimento:churrasqueira | complemento:churrasqueira | caracteristicacomum:churrasqueira | codigoanuncio:churrasqueira | nomejornal:churrasqueira | agrupamentos2:churrasqueira | subtipoimovel:churrasqueira | descricaopermuta:churrasqueira | zapid:churrasqueira | cidade:churrasqueira | bairro:churrasqueira | transacao:churrasqueira | estado:churrasqueira | tipoimovel:churrasqueira | sigla:churrasqueira | caminhomapa:churrasqueira | chamada:churrasqueira | segmento:churrasqueira | nomejornalordem:churrasqueira | agrupamentos:churrasqueira | endereco:churrasqueira | informacoescomplementares:churrasqueira))~3) What can be wrong? 
Thanks! -- *And you shall know the truth, and the truth shall set you free. (John 8:32)* andre.maldonado@gmail.com (11) 9112-4227 http://www.orkut.com.br/Main#Profile?uid=2397703412199036664 http://www.facebook.com/profile.php?id=10659376883 http://twitter.com/andremaldonado http://www.delicious.com/andre.maldonado https://profiles.google.com/105605760943701739931 http://www.linkedin.com/pub/andr%C3%A9-maldonado/23/234/4b3 http://www.youtube.com/andremaldonado On Wed, Jun 6, 2012 at 7:59 AM, Erick Erickson erickerick...@gmail.com wrote: Sorry, but your post is really hard to read with all the data inline. Try running with debugQuery=on and looking at the parsed query; I suspect your field lists aren't the same even though you think they are. Perhaps a typo somewhere? Best, Erick On Mon, Jun 4, 2012 at 1:26 PM, André
solrj library requirements: slf4j-jdk14-1.5.5.jar
The section of the SolrJ wiki page on setting up the classpath calls for slf4j-jdk14-1.5.5.jar, which is supposed to be in a lib/ subdirectory. I don't see this jar, or any like it with a different version, anywhere in either the 3.5.0 or 3.6.0 distributions. Is it really needed, or is this just slightly outdated documentation? The top of the page (which references Solr 1.4) suggests the latter, and I see other docs on the web suggesting this is the case, but the first result that pops out of Google for "solrj" is the apparently outdated wiki page, so I imagine others will encounter the same issue. The other, more recent pages are not without issues as well; for example, this page: http://lucidworks.lucidimagination.com/display/solr/Using+SolrJ references apache-solr-common, which I'm not finding either. Thanks, Richard
problem with mapping-iso accents
Hi, I have a problem with the ISO accent mapping char filter. I have a field in my schema with this filter: <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/> If I try this filter with the analysis tool in the Solr admin panel, it works; for example: sarà => sara. But when I create indexes it doesn't work: in the index the field is "sarà", with the accent. Why? I use a MySQL connector to create indexes directly from a MySQL DB. The MySQL DB is in UTF-8, the connector charset is UTF-8, and Solr is in UTF-8 by default. Recently I changed my Java from OpenJDK to the Sun JDK. Can that be the reason? Thanks -- *Gastone Penzo*
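For reference, a field type along these lines is the usual setup (a sketch; the tokenizer and filters may differ in your schema). Two things worth checking: a char filter only affects newly indexed documents, so existing content must be fully reindexed; and it changes the *indexed* terms, not the *stored* value, so a result will still display "sarà" even once it is searchable as "sara":

```xml
<fieldType name="text_noaccent" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- folds accented characters before tokenization, e.g. sarà -> sara -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```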
Re: issues with spellcheck.maxCollationTries and spellcheck.collateExtendedResults
Do single-word queries return hits? Is this a multi-shard environment? Does the request list all the shards needed to give hits for all the collations you expect? Maybe the queries are being done locally and don't have hits for the collations locally. -- Jack Krupansky -Original Message- From: Markus Jelsma Sent: Wednesday, June 06, 2012 6:21 AM To: solr-user@lucene.apache.org Subject: issues with spellcheck.maxCollationTries and spellcheck.collateExtendedResults Hi, We've had some issues with a bad zero-hits collation being returned for a two-word query where one word was only one edit away from the required collation. With spellcheck.maxCollations set to a reasonable number we saw the various suggestions without the required collation. We decreased thresholdTokenFrequency to make it appear in the list of collations. However, with collateExtendedResults=true the hits field for each collation was zero, which is incorrect. Required collation="huub stapel" (two hits) and q="huup stapel":

"collation":{"collationQuery":"heup stapel","hits":0,"misspellingsAndCorrections":{"huup":"heup"}},
"collation":{"collationQuery":"hugo stapel","hits":0,"misspellingsAndCorrections":{"huup":"hugo"}},
"collation":{"collationQuery":"hulp stapel","hits":0,"misspellingsAndCorrections":{"huup":"hulp"}},
"collation":{"collationQuery":"hup stapel","hits":0,"misspellingsAndCorrections":{"huup":"hup"}},
"collation":{"collationQuery":"huub stapel","hits":0,"misspellingsAndCorrections":{"huup":"huub"}},
"collation":{"collationQuery":"huur stapel","hits":0,"misspellingsAndCorrections":{"huup":"huur"}}

Now, with maxCollationTries set to 3 or higher we finally get the required collation, and the only collation able to return results. How can we determine the best value for maxCollationTries relative to the decrease of thresholdTokenFrequency? Why is hits always zero? This is with today's build and distributed search enabled. Thanks, Markus
Re: Fielded searches with Solr ExtendedDisMax Query Parser
Great! Thank you a lot, that solved all my problems. Regards, Nicolò Il giorno 06/giu/2012, alle ore 14:55, Jack Krupansky ha scritto: This is a known (unfixed) bug. The workaround is to add a space between each left parenthesis and field name. See: https://issues.apache.org/jira/browse/SOLR-3377 So, q=(field2:ciao) becomes: q=( field2:ciao) -- Jack Krupansky -Original Message- From: Nicolò Martini Sent: Wednesday, June 06, 2012 8:35 AM To: solr-user@lucene.apache.org Subject: Fielded searches with Solr ExtendedDisMax Query Parser Hi all, I'm having a problem using the Solr ExtendedDisMax Query Parser with query that contains fielded searches inside not-plain queries. The case is the following. If I send to SOLR an edismax request (defType=edismax) with parameters 1. qf=field1^10 2. q=field2:ciao 3. debugQuery=on (for debug purposes) solr parses the query as I expect, in fact the debug part of the response tells me that [parsedquery_toString] = +field2:ciao But if I make the expression only a bit more complex, like putting the condition into brackets: 1. qf=field1^10 2. q=(field2:ciao) I get [parsedquery_toString] = +(((field1:field2:^2.0) (field1:ciao^2.0))~2) where Solr seems not recognize the field syntax. I've not found any mention to this behavior in the [documentation][1], where instead they say that This parser supports full Lucene QueryParser syntax including boolean operators 'AND', 'OR', 'NOT', '+' and '-', fielded search, term boosting, fuzzy... This problem is really annoying me because I would like to do compelx boolean and fielded queries even with the edismax parser. Do you know a way to workaround this? Thank you in advance. Nicolò Martini [1]: http://wiki.apache.org/solr/ExtendedDisMax=
RE: issues with spellcheck.maxCollationTries and spellcheck.collateExtendedResults
Markus, With maxCollationTries=0, it is not going out and querying the collations to see how many hits they each produce, so it doesn't know the number of hits. That is why, if you also specify collateExtendedResults=true, all the hit counts are zero. It would probably be better in this case if it would not report hits in the extended response at all. (On the other hand, if you're seeing zeros and maxCollationTries > 0, then you've hit a bug!) thresholdTokenFrequency, in my opinion, is a pretty blunt instrument for getting rid of bad suggestions. It takes out all of the rare terms, presuming that if a term is rare in the data it either is a mistake or isn't ever worthy of being suggested. But if you're using maxCollationTries, the suggestions that don't fit will be filtered out automatically, making thresholdTokenFrequency less necessary. (On the other hand, if you're using IndexBasedSpellChecker, thresholdTokenFrequency will make the dictionary smaller and spellcheck.build run faster... This is solved entirely in 4.0 with DirectSolrSpellChecker...) For the apps here, I've been using maxCollationTries=10 and have been getting good results. Keep in mind that even though you're allowing it to try up to 10 queries to find a viable collation, as long as you're setting maxCollations to something low it will (hopefully) seldom need to try more than a couple before finding one with hits. (I always ask for only 1 collation, as we just re-apply the spelling correction automatically if the original query returned nothing.) Also, if spellcheck.count is low it might not have enough terms available to try, so you might need to raise this value as well when raising maxCollationTries. The worse problem, in my opinion, is the fact that it won't ever suggest words if they're in the index (even if you are using thresholdTokenFrequency to remove them from the dictionary). For that there is https://issues.apache.org/jira/browse/SOLR-2585, which is part of Solr 4.
The only other workaround is onlyMorePopular, which has its own issues (see http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.alternativeTermCount). James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Wednesday, June 06, 2012 5:22 AM To: solr-user@lucene.apache.org Subject: issues with spellcheck.maxCollationTries and spellcheck.collateExtendedResults Hi, We've had some issues with a bad zero-hits collation being returned for a two-word query where one word was only one edit away from the required collation. With spellcheck.maxCollations set to a reasonable number we saw the various suggestions without the required collation. We decreased thresholdTokenFrequency to make it appear in the list of collations. However, with collateExtendedResults=true the hits field for each collation was zero, which is incorrect. The required collation is "huub stapel" (two hits) and q=huup stapel:

"collation":{"collationQuery":"heup stapel","hits":0,"misspellingsAndCorrections":{"huup":"heup"}},
"collation":{"collationQuery":"hugo stapel","hits":0,"misspellingsAndCorrections":{"huup":"hugo"}},
"collation":{"collationQuery":"hulp stapel","hits":0,"misspellingsAndCorrections":{"huup":"hulp"}},
"collation":{"collationQuery":"hup stapel","hits":0,"misspellingsAndCorrections":{"huup":"hup"}},
"collation":{"collationQuery":"huub stapel","hits":0,"misspellingsAndCorrections":{"huup":"huub"}},
"collation":{"collationQuery":"huur stapel","hits":0,"misspellingsAndCorrections":{"huup":"huur"}}

Now, with maxCollationTries set to 3 or higher, we finally get the required collation, the only collation able to return results. How can we determine the best value for maxCollationTries relative to the decrease of thresholdTokenFrequency? Why is hits always zero? This is with a today's build and distributed search enabled. Thanks, Markus
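To make James's advice concrete, a solrconfig.xml sketch wiring these parameters together might look like the following. The handler name and values here are illustrative, not taken from either poster's actual config:

```xml
<!-- illustrative handler config; names/values are examples, not from the thread -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.collate">true</str>
    <!-- ask for a single collation, as James suggests -->
    <str name="spellcheck.maxCollations">1</str>
    <!-- allow up to 10 test queries to find a collation that actually hits -->
    <str name="spellcheck.maxCollationTries">10</str>
    <!-- raise count along with maxCollationTries so enough terms are available -->
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```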
Levenstein Distance
I have a list of synonyms which is being expanded at query time. This yields a lot of results (in the millions). My use-case is name search. I want to sort the results by Levenshtein distance. I know this can be done with the strdist function, but sorting is inefficient, and doing it via a Solr function adds to its woes and kills performance. I want the results to be returned as quickly as possible. One way I think Levenshtein can work is applying strdist to the synonym file and getting the score of each synonym, and then using these scores to boost the results appropriately; it should be equivalent to Levenshtein distance. But I am not sure how to do this in Solr, or in fact whether Solr supports this. -- View this message in context: http://lucene.472066.n3.nabble.com/Levenstein-Distance-tp3988026.html Sent from the Solr - User mailing list archive at Nabble.com.
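One way to sketch the pre-computation step described above: score each synonym against its canonical form offline with plain Levenshtein distance, then emit per-term boosts to attach at query time (e.g. q=name:(jon^1.0 john^0.5 johann^0.25)). The names and the 1/(1+distance) boost formula below are illustrative assumptions, not Solr APIs:

```java
public class SynonymBoosts {

    // classic dynamic-programming Levenshtein edit distance
    static int levenshtein(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) dp[i][0] = i;
        for (int j = 0; j <= b.length(); j++) dp[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                dp[i][j] = Math.min(Math.min(dp[i - 1][j] + 1, dp[i][j - 1] + 1),
                                    dp[i - 1][j - 1] + cost);
            }
        }
        return dp[a.length()][b.length()];
    }

    public static void main(String[] args) {
        // hypothetical synonym group: canonical name plus variants
        String canonical = "jon";
        String[] synonyms = {"jon", "john", "johann"};
        for (String s : synonyms) {
            // 1/(1+distance) is just one arbitrary way to turn distance into a boost
            double boost = 1.0 / (1 + levenshtein(canonical, s));
            System.out.println(s + "^" + boost);
        }
    }
}
```

The boosts only need to be recomputed when the synonym file changes, so the per-query cost of strdist sorting disappears.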
Single term boosting with dismax
Hi, I'm using the dismax query parser. I would like to boost on a single term at query time, instead of on the whole field. I should probably use the standard query parser; however, I've also overridden the dismax query parser to handle payload boosting on terms. What I want to obtain is a double boost (query and index time). For example: q=cat^2.0 dog^3.0, qf=text, defType=myPayloadHandler, having text = cat|3.0 dog|3.0 in my index, obtaining (excluding other score components): score(cat) = 3.0*2.0*restOfScore, score(dog) = 3.0*3.0*restOfScore. However, it seems impossible to do this with myPayloadHandler (which simply overrides dismax); it is only possible to boost on a field, like qf=text^10.0. Am I right? How can I boost on a single term at query time? -- View this message in context: http://lucene.472066.n3.nabble.com/Single-term-boosting-with-dismax-tp3988027.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Boost by Nested Query / Join Needed?
Generally, you just have to bite the bullet and denormalize. Yes, it really runs counter to your DB mindset <G> But before jumping that way, how many denormalized records are we talking here? 1M? 100M? 1B? Solr has (4.x) some join capability, but it makes a lousy general-purpose database. You might want to look at Function Queries as a way to boost results based on numeric fields. If you want a strict ordering, you're looking at sort, but note that sorts only work on a single-valued field. Best Erick On Tue, Jun 5, 2012 at 12:48 PM, naleiden nalei...@gmail.com wrote: Hi, First off, I'm about a week into all things Solr, and still trying to figure out how to fit my relational-shaped peg through a denormalized hole. Please forgive my ignorance below :-D I need to store a one-to-N type relationship and boost on a related field. Let's say I want to index a number of different types of candy, and also a customer's preference for each type of candy (which I index/update when a customer makes a purchase), and then boost by that preference on search. Here is my pared-down attempt at a denormalized schema:

<!-- Common Fields -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="type" type="string" indexed="true" stored="true" required="true"/>

<!-- Fields for 'candy' -->
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="description" type="text_general" indexed="true" stored="true"/>

<!-- Fields for Customer-Candy Preference ('preference') -->
<field name="user" type="integer" indexed="true" stored="true"/>
<field name="candy" type="integer" indexed="true" stored="true"/>
<field name="weight" type="integer" indexed="true" stored="true" default="0"/>

I am indexing 'candy' and 'preferences' separately, and when indexing one, I leave the fields of the other empty (with the exception of the required 'id' and 'type').
Ignoring the query score, this is effectively what I'm looking to do in SQL:

SELECT candy.id, candy.name, candy.description
FROM candy
LEFT JOIN preference
  ON (preference.candy = candy.id AND preference.customer = 'someCustomerID')
-- where some match is made on the query against candy.name or candy.description
ORDER BY preference.weight DESC

My questions are: 1.) Am I making any assumptions with respect to what are effectively different document types in the schema that will not scale well? I don't think I want to be duplicating each 'candy' entry for every customer, or maybe that wouldn't be such a big deal in Solr. 2.) Can someone point me in the right direction on how to perform this type of boost in a Solr query? Thanks in advance, Nick -- View this message in context: http://lucene.472066.n3.nabble.com/Boost-by-Nested-Query-Join-Needed-tp3987818.html Sent from the Solr - User mailing list archive at Nabble.com.
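If you do denormalize as Erick suggests -- one document per customer-candy pair carrying the weight -- a function-query boost along these lines is one possible shape. Field and parameter names follow the schema in the question; the exact query is a sketch, not something tested against this index:

```
q={!boost b=log(sum(weight,1))}name:chocolate OR description:chocolate
fq=user:someCustomerID
```

The log(sum(weight,1)) wrapper is just one way to keep large weights from swamping the relevance score; a bare weight or a linear scaling would also work.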
Re: Is FileFloatSource's WeakHashMap cache only cleaned by GC?
Hmmm, it would be better to open a Solr JIRA and attach this as a patch, although we've had some folks provide a Git-based rather than an SVN-based patch. Anyone can open a JIRA, but you must create a signon to do that. It'd get more attention that way. Best Erick On Tue, Jun 5, 2012 at 2:19 PM, Gregg Donovan gregg...@gmail.com wrote: We've encountered GC spikes at Etsy after adding new ExternalFileFields a decent number of times. I was always a little confused by this behavior -- isn't it just one big float[]? why does that cause problems for the GC? -- but looking at the FileFloatSource code a little more carefully, I wonder if this is due to using a WeakHashMap that is only cleaned by GC or by manual invocation of a request handler. FileFloatSource stores a WeakHashMap mapping IndexReader to float[] (or CreationPlaceholder). In the code [1], it mentions that the implementation is modeled after the FieldCache implementation. However, FieldCacheImpl adds listeners for IndexReader close events and uses those to purge its caches. [2] Should we be doing the same in FileFloatSource? Here's a mostly untested patch [3] with a possible implementation. There are probably better ways to do it (e.g., I don't love using another WeakHashMap), but I found it tough to hook into the IndexReader lifecycle without a) relying on classes other than FileFloatSource, b) changing the public API of FileFloatSource, or c) changing the implementation too much. There is a RequestHandler inside of FileFloatSource (ReloadCacheRequestHandler) that can be used to clear the cache entirely [4], but this is sub-optimal for us for a few reasons: --It clears the entire cache. ExternalFileFields often take some non-trivial time to load and we prefer to do so during SolrCore warmups. Clearing the entire cache while serving traffic would likely cause user-facing requests to timeout. --It forces an extra commit with its consequent cache cycling, etc..
I'm thinking of ways to monitor the size of FileFloatSource's cache to track its size against GC pause times, but it seems tricky because even calling WeakHashMap#size() has side-effects. Any ideas? Overall, what do you think? Does relying on GC to clean this cache make sense as a possible cause of GC spikiness? If so, does the patch [3] look like a decent approach? Thanks! --Gregg [1] https://github.com/apache/lucene-solr/blob/a3914cb5c0243913b827762db2d616ad7cc6801d/solr/core/src/java/org/apache/solr/search/function/FileFloatSource.java#L135 [2] https://github.com/apache/lucene-solr/blob/1c0eee5c5cdfddcc715369dad9d35c81027bddca/lucene/core/src/java/org/apache/lucene/search/FieldCacheImpl.java#L166 [3] https://gist.github.com/2876371 [4] https://github.com/apache/lucene-solr/blob/a3914cb5c0243913b827762db2d616ad7cc6801d/solr/core/src/java/org/apache/solr/search/function/FileFloatSource.java#L310
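As a self-contained illustration of the failure mode Gregg describes (deliberately using plain objects, not the actual Solr classes), the following shows that entries in a WeakHashMap survive only until GC notices the key is otherwise unreachable -- so a cache keyed this way is effectively cleaned on the garbage collector's schedule, not the application's:

```java
import java.util.Map;
import java.util.WeakHashMap;

public class WeakCacheDemo {

    // populate a WeakHashMap the way FileFloatSource keys its cache (reader -> float[])
    static int cacheSizeBeforeGc() {
        Map<Object, float[]> cache = new WeakHashMap<Object, float[]>();
        Object reader = new Object();       // stand-in for an IndexReader key
        cache.put(reader, new float[1000]);
        return cache.size();                // 1 while 'reader' is strongly held
    }

    public static void main(String[] args) throws InterruptedException {
        Map<Object, float[]> cache = new WeakHashMap<Object, float[]>();
        Object reader = new Object();
        cache.put(reader, new float[1_000_000]);
        System.out.println("before GC: " + cache.size()); // 1
        reader = null;        // drop the only strong reference to the key
        System.gc();          // a hint only; usually collects the weak entry
        Thread.sleep(200);
        // typically 0 now, but there is no guarantee when the entry is purged
        System.out.println("after GC: " + cache.size());
    }
}
```

An explicit close-listener, as in Gregg's patch idea, removes the entry deterministically instead of leaving the float[] pinned until an eventual (possibly expensive) collection.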
Re: Replication
A couple of things to check. 1> Are you optimizing all the time? An optimization will merge all the segments into a single segment, which will cause the whole index to be replicated after each optimization. Best Erick On Wed, Jun 6, 2012 at 1:33 AM, William Bell billnb...@gmail.com wrote: We are using SOLR 1.4, and we are experiencing full index replication every 15 minutes. I have checked the solrconfig and it has maxsegments set to 20. It appears like it is indexing a segment, but replicating the whole index. How can I verify it and possibly fix the issue? -- Bill Bell billnb...@gmail.com cell 720-256-8076
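For reference, the replicateAfter setting on the master controls when slaves are told a new index version exists; a typical 1.4-era master configuration looks roughly like this (values are illustrative, not from Bill's setup):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- replicate after commit rather than only after optimize -->
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>
```

If replicateAfter is set to optimize and an optimize runs every cycle, each replication transfers the whole single merged segment, which matches the symptom described.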
Re: TermComponent and Optimize
It is possible to use the expungeDeletes option in the commit; that could solve your problem. http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22commit.22 Sadly, there is currently a bug with the TieredMergePolicy: https://issues.apache.org/jira/browse/SOLR-2725. But you can use another merge policy (LogMergePolicy, for instance). Your updates will be (a bit) slower if you use this solution. Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/TermComponent-and-Optimize-tp3985696p3988056.html Sent from the Solr - User mailing list archive at Nabble.com.
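The expungeDeletes option Ludovic mentions is passed on the commit message itself, and the merge-policy swap goes in solrconfig.xml; a sketch of both (the mergePolicy line is an example of the workaround, not a recommendation):

```xml
<!-- XML update message: commit and merge away segments containing deletes -->
<commit expungeDeletes="true"/>

<!-- solrconfig.xml: swap merge policy to avoid the SOLR-2725 bug -->
<mergePolicy class="org.apache.lucene.index.LogByteSizeMergePolicy"/>
```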
Re: pass custom parameters from client to solr
What would be a good place to read the custom Solr params I passed from the client to Solr? I saw that all the params passed to Solr are available in rb.req. I have a business requirement to collapse or combine some properties together based on some conditions. Currently I have a custom component (added as the first component in solrconfig) which reads the custom params from rb.req.getParams(), removes them from the request, and puts them into the context. I feel that a custom component is probably not the best place, and there could be a better place to do it. Does anyone have any suggestions? -- View this message in context: http://lucene.472066.n3.nabble.com/pass-custom-parameters-from-client-to-solr-tp3987511p3988066.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Extract information from url field
Yes, using PatternTokenizerFactory. Here's an example field type; if you define a department field with this type and do a copyField from url to department, it will end up with the department name alone. It handles embedded punctuation (e.g., dot, dash, and underscore) and mixed-case words (breaks them into separate words). It is text rather than string, so you can search on individual name words or a phrase. It also lower-cases the name, but you can skip that step.

<fieldType name="pat_url_department_text" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="://[^/]*/([^/]*)/" group="1"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

-- Jack Krupansky -Original Message- From: AlessandroF Sent: Wednesday, June 06, 2012 2:57 AM To: solr-user@lucene.apache.org Subject: Extract information from url field Hi All, I would like to know if it's possible to set up a field where Solr, after posting a document, automatically extracts part of the content, as the result of a regexp, to a field. e.g. Having a URL field containing http://www.myCompany.Com/Department/Service/index.html configured as

<field name="url" type="url" stored="true" indexed="true" required="true"/>

after posting, it should be split like:

<doc>
  <str name="url">http://www.myCompany.Com/Department/Service/index.html</str>
  <str name="department">Department</str>
</doc>

Thanks for helping! Alessandro -- View this message in context: http://lucene.472066.n3.nabble.com/Extract-information-from-url-field-tp3987913.html Sent from the Solr - User mailing list archive at Nabble.com.
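The copyField wiring Jack describes would be declared in schema.xml roughly like this ("department" is the field name from the example; the declarations are a sketch to accompany his fieldType):

```xml
<field name="department" type="pat_url_department_text" indexed="true" stored="true"/>
<copyField source="url" dest="department"/>
```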
Re: ExtendedDisMax Question - Strange behaviour
First, it appears that you are using the dismax query parser, not the extended dismax (edismax) query parser. My hunch is that some of those fields may be non-tokenized string fields in which one or more of your search keywords do appear but not as the full string value or maybe with a different case than in the query. But when you do a copyField from a string field to a tokenized text field those strings would be broken up into individual keywords and probably lowercased. So, it will be easier for a document to match the combined text field than the source string fields. A fair percentage of the terms may occur in both text and string fields, but it looks like a fair percentage may occur only in the string fields. Identify a specific document that is returned by the first query and not the second. Then examine each non-text string field value of that document to see if the query terms would match after text field analysis but are not exact string matches for the string fields in which the terms do occur. -- Jack Krupansky -Original Message- From: André Maldonado Sent: Wednesday, June 06, 2012 9:23 AM To: solr-user@lucene.apache.org Subject: Re: ExtendedDisMax Question - Strange behaviour Erick, thanks for your reply and sorry for the confusion in last e-mail. But it is hard to explain the situation without that bunch of code. ...
Re: Is FileFloatSource's WeakHashMap cache only cleaned by GC?
Thanks for the suggestion, Erick. I created a JIRA and moved the patch to SVN, just to be safe. [1] --Gregg [1] https://issues.apache.org/jira/browse/SOLR-3514 On Wed, Jun 6, 2012 at 2:35 PM, Erick Erickson erickerick...@gmail.com wrote: Hmmm, it would be better to open a Solr JIRA and attach this as a patch. Although we've had some folks provide a Git-based rather than an SVN-based patch. Anyone can open a JIRA, but you must create a signon to do that. It'd get more attention that way Best Erick
Re: Solr, I have performance problem for indexing.
Each table has 35,000 rows (35 thousand). I will check the log for each step of indexing. I run Solr 3.5. 2012/6/6 Jihyun Suh jhsuh.ourli...@gmail.com I have 128 tables of MySQL 5.x and each table has 35,000 rows. When I start dataimport (indexing) in Solr, it takes 5 minutes for one table. But when Solr indexes the 20th table, it takes around 10 minutes for one table. And then when it indexes the 40th table, it takes around 20 minutes for one table. Does Solr have some performance problem with too many documents? Should I set some configuration?
Question on addBean and deleteByQuery
When using SolrJ (1.4.1 or 3.5.0) and calling either addBean or deleteByQuery, the POST body has numbers before and after the XML (47 and 0, as noted in the example below):

***
POST /solr/123456/update?wt=xml&version=2.2 HTTP/1.1
User-Agent: Solr[org.apache.solr.client.solrj.impl.CommonsHttpSolrServer] 1.0
Host: localhost
Transfer-Encoding: chunked
Content-Type: application/xml; charset=UTF-8

47
<delete><query>name:fred AND currency:USD</query></delete>
0
***

Due to the way our servers are set up, we get an error, and we think it is due to these numbers being in the body of the request. What do these numbers mean, and is there any way to get rid of them, or do we need to make some changes to our server configs? Thanks, Darin -- View this message in context: http://lucene.472066.n3.nabble.com/Question-on-addBean-and-deleteByQuery-tp3988107.html Sent from the Solr - User mailing list archive at Nabble.com.
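Those numbers are consistent with HTTP/1.1 chunked transfer coding (note the Transfer-Encoding: chunked header in the request): each chunk is prefixed by its size in hex (0x47 = 71 bytes) and a final 0 marks the end, so they are part of the transfer framing rather than the XML body. A minimal decoder sketch -- assuming a simple body with no trailers or chunk extensions -- illustrates the framing:

```java
public class ChunkedDecode {

    // decode a chunked HTTP body: hex size line, CRLF, data, CRLF, ..., "0" terminator
    static String decode(String body) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < body.length()) {
            int eol = body.indexOf("\r\n", i);
            int size = Integer.parseInt(body.substring(i, eol).trim(), 16);
            if (size == 0) break;                 // "0" chunk ends the body
            out.append(body, eol + 2, eol + 2 + size);
            i = eol + 2 + size + 2;               // skip data plus its trailing CRLF
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String chunked = "5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n";
        System.out.println(decode(chunked)); // hello world
    }
}
```

A proxy or server that fails on these requests is likely not speaking HTTP/1.1 chunked encoding correctly.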
Re: Use of Soundex in solr spellchecker
Metaphone and DoubleMetaphone are more advanced than Soundex, and they already exist as filters. There is no independent measure of accuracy for Solr -- you have to decide if you like the results. On Wed, Jun 6, 2012 at 4:36 AM, nutchsolruser nutchsolru...@gmail.com wrote: Does incorporating the Soundex algorithm into Solr improve spellchecker accuracy? (If yes, then please provide useful pointers for doing this.) -- View this message in context: http://lucene.472066.n3.nabble.com/Use-of-Soundex-in-solr-spellchecker-tp3987968.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
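The phonetic filters Lance mentions are applied in an analyzer chain in schema.xml; a sketch using DoubleMetaphone via PhoneticFilterFactory follows (the field type name text_phonetic is an assumption for illustration; inject="true" keeps the original token alongside the phonetic code so exact matches still work):

```xml
<fieldType name="text_phonetic" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
  </analyzer>
</fieldType>
```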