Re: Problem with Query Parser
> Hi everybody,
>
> I have a simple but (for me) annoying problem. I'm a happy user of Solr 1.4 with a small collection of documents. Today one of the users reported that a query returns documents that are not pertinent to the expression. I have Spanish, Portuguese and English text in the collection. Using the Solr administration interface I found that she was right: if I search for the Spanish term represion, I get only the word root, that is, it returns every document containing the term repres. Using the admin debug search I found this:
>
>   <lst name="debug">
>     <str name="rawquerystring">description:represion</str>
>     <str name="querystring">description:represion</str>
>     <str name="parsedquery">description:repres</str>
>     <str name="parsedquery_toString">description:repres</str>
>     ...
>   </lst>
>
> The "ion" part of the term was deleted by the query parser. The first question is: I don't know where I should look to correct this, in schema.xml or in solrconfig.xml. The only thing that looks suspicious to me is the EnglishPorter.

Yes, you are right. The "ion" part of the term was removed by it. You can verify this using the /admin/analysis.jsp page; it will tell you which TokenFilterFactory removes it.

> I've deleted it from the configuration but nothing changes. Should I reindex the collection to see the changes?

Yes, re-indexing is necessary.

> Should I also delete it from the index section?

You should remove EnglishPorter from both the query and the index analyzer.

> What will I lose by deleting EnglishPorter?

You will lose stemming functionality. But since you have Spanish, Portuguese and English documents, using the English Porter stemmer for all of them is not meaningful.
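A minimal sketch of what the analyzer could look like after the change, assuming a Solr 1.4-style schema.xml. The type name `text_nostem` and the tokenizer/filter choices are illustrative, not German's actual configuration; the fieldType goes inside `<types>` and the field inside `<fields>`. The point is that the stemmer is absent from both the index and the query analyzer, and that the collection is re-indexed afterwards so the stored terms match what the query side now produces.

```xml
<!-- Hypothetical field type: same chain at index and query time, no stemmer -->
<fieldType name="text_nostem" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- solr.EnglishPorterFilterFactory removed here ... -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- ... and here, so "represion" is no longer reduced to "repres" -->
  </analyzer>
</fieldType>

<!-- Point the affected field at the new type, then re-index -->
<field name="description" type="text_nostem" indexed="true" stored="true"/>
```

After restarting and re-indexing, running the same query with debugQuery=on (or checking /admin/analysis.jsp) should show the parsed query as description:represion.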
Re: Problem with Query Parser
Thanks, Ahmet. The analysis page definitely shows the English Porter filter as the killer ;)

Regards,
German

On Sun, Oct 18, 2009 at 7:30 AM, AHMET ARSLAN iori...@yahoo.com wrote:
[...]
Re: Problem with Query Parser
Another way to do multi-lingual indexing is to have a separate field for each language. Solr/Lucene have custom processing for some languages.

On Sun, Oct 18, 2009 at 12:25 PM, Germán Biozzoli germanbiozz...@gmail.com wrote:
[...]

-- 
Lance Norskog
goks...@gmail.com
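One possible shape for this in schema.xml, assuming the indexing client knows each document's language and fills the matching field. The field and type names (`description_es`, `text_es`, and so on) are hypothetical. solr.SnowballPorterFilterFactory ships with Solr 1.4 and takes a `language` attribute; Snowball includes Spanish, Portuguese and English stemmers. A single `<analyzer>` without a `type` applies to both indexing and querying.

```xml
<!-- Hypothetical per-language types: only the stemmer language differs -->
<fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
  </analyzer>
</fieldType>
<fieldType name="text_pt" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Portuguese"/>
  </analyzer>
</fieldType>
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>

<!-- One field per language; the client decides which one to populate -->
<field name="description_es" type="text_es" indexed="true" stored="true"/>
<field name="description_pt" type="text_pt" indexed="true" stored="true"/>
<field name="description_en" type="text_en" indexed="true" stored="true"/>
```

A query can then target the field for the user's language (for example q=description_es:represion), or search several of them at once through the dismax handler's qf parameter.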
Seattle / NW Hadoop, Lucene, Apache Cloud Stack Meetup, Wed Oct 28 6:45pm
Greetings,

(You're receiving this e-mail because you're on a distribution list (DL) or I think you'd be interested.)

It's time for another Hadoop/Lucene/Apache Cloud Stack meetup! This month it'll be on Wednesday the 28th, at 6:45 pm.

A *huge* thanks to everyone who showed up last month, and to Facebook for sending someone awesome to speak about Hive. We learned quite a bit!

For October, we will have someone speaking about Cascading and how it helps with workflow abstraction on top of MapReduce. Very useful stuff to know.

We've had great attendance in the past few months; let's keep it up! I'm always amazed by the things I learn from everyone.

We're at the University of Washington, Allen Computer Science Center (not Electrical Engineering).
Map: http://www.washington.edu/home/maps/?CSE
Room: 303, or the Entry level. If there are changes, signs will be posted.

More info: The meetup runs about 2 hours (and there's usually food). We'll have two in-depth talks, then several 5-minute lightning talks, followed by discussion and 'social time'.

Let me know if you're interested in speaking or attending. We'd like to focus on education, so feel free to ask questions.

Contact: Bradford Stephens, 904-415-3009, bradfordsteph...@gmail.com

-- 
http://www.drawntoscaleconsulting.com - Scalability, Hadoop, HBase, and Distributed Lucene Consulting
http://www.roadtofailure.com - The Fringes of Scalability, Social Media, and Computer Science
Re: Solr 1.4 release candidate
FYI, the latest nightly includes more Lucene bug fixes targeted toward Lucene 2.9.1. The (current) full list is here:
http://svn.apache.org/viewvc/lucene/java/branches/lucene_2_9/CHANGES.txt?view=markup&pathrev=826563

-Yonik
http://www.lucidimagination.com

On Wed, Oct 14, 2009 at 10:01 AM, Yonik Seeley yo...@lucidimagination.com wrote:
> Folks, we've been in code freeze since Monday, and a test release candidate was created yesterday. However, it already had to be updated last night due to a serious bug found in Lucene.
>
> For now you can use the latest nightly build to get any recent changes like this:
> http://people.apache.org/builds/lucene/solr/nightly/
>
> We'll probably release the final bits next week, so in the meantime, download the latest nightly build and give it a spin!
>
> -Yonik
> http://www.lucidimagination.com
Re: Boosting of words
Hi Arslan,

Yes, I am using Solr as an input to Carrot. Yes, I am using org.carrot2.source.solr.SolrDocumentSource just to cluster search results. Currently we are focusing on Solr search results only; in the future we will focus on clustered search results. For now I am using Solr 1.3.

Regards,
Bhaskar

--- On Sat, 10/17/09, AHMET ARSLAN iori...@yahoo.com wrote:
From: AHMET ARSLAN iori...@yahoo.com
Subject: Re: Boosting of words
To: solr-user@lucene.apache.org
Date: Saturday, October 17, 2009, 1:55 PM

> I am using Solr 1.3. I access Solr through Carrot and use Java.

What do you mean by accessing Solr through Carrot? Are you using Solr as an input to Carrot, i.e. using org.carrot2.source.solr.SolrDocumentSource just to cluster search results? Can we say that you are interested in clustered search results rather than the search results themselves? If so, Solr 1.4 will have Grant Ingersoll's ClusteringComponent [1], which uses Carrot2 to cluster search results.

[1] http://wiki.apache.org/solr/ClusteringComponent
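For reference, a rough sketch of how the ClusteringComponent might be wired into solrconfig.xml after moving to 1.4, based on the wiki page above. It assumes the clustering contrib and the Carrot2 jars are on Solr's classpath; the field names `title` and `text` passed to carrot.title and carrot.snippet are placeholders for your own schema, and the exact parameter set should be checked against the wiki for your release.

```xml
<!-- Clustering engine backed by Carrot2's Lingo algorithm -->
<searchComponent name="clusteringComponent"
                 class="org.apache.solr.handler.clustering.ClusteringComponent">
  <lst name="engine">
    <str name="name">default</str>
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
  </lst>
</searchComponent>

<!-- Search handler that runs the clustering component after the normal query -->
<requestHandler name="/clustering" class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="clustering">true</bool>
    <str name="clustering.engine">default</str>
    <bool name="clustering.results">true</bool>
    <!-- Placeholder field names: use your own title and body fields -->
    <str name="carrot.title">title</str>
    <str name="carrot.snippet">text</str>
  </lst>
  <arr name="last-components">
    <str>clusteringComponent</str>
  </arr>
</requestHandler>
```

With this in place, a request to /clustering?q=... should return a clusters section alongside the regular result list, so a separate Carrot2 client reading from SolrDocumentSource would no longer be needed for that use case.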