Re: Problem with Query Parser

2009-10-18 Thread AHMET ARSLAN

 Hi everybody,
 
 I have a simple but (for me) annoying problem. I'm a happy user of Solr
 1.4 with a small collection of documents. Today one of the users
 reported that a query returns documents that are not pertinent to the
 expression. I have Spanish, Portuguese and English text in the
 collection. Using the Solr administration interface I found that
 she was right: if I search for the Spanish term "represion", the match
 is made on the word root only, that is, it returns every document
 containing the term "repres". Using the admin debug search I found this:
 
 
 <lst name="debug">
 <str name="rawquerystring">description:represion</str>
 <str name="querystring">description:represion</str>
 <str name="parsedquery">description:repres</str>
 <str name="parsedquery_toString">description:repres</str>
 
 The "ion" part of the term was deleted by the query parser.
 My first question is: I don't know where I should look to
 correct this, in schema.xml or in solrconfig.xml.

 The only thing that looks suspicious to me is the EnglishPorter.

Yes, you are right. The "ion" part of the term was deleted by it. You can
verify this using the /admin/analysis.jsp page. It will show you which
TokenFilterFactory removes it.

 I've deleted it from the configuration but nothing changes. Should
 I reindex the collection to see the changes?

Yes, re-indexing is necessary.

 Should I also delete it from the index section?

You should remove the English Porter filter from both the query and index
analyzers.

 What will I lose by deleting the English Porter filter?

You will lose stemming functionality. But since you have Spanish, Portuguese
and English documents, using the English Porter stemmer for all of them is
not meaningful.
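
As a rough illustration, a field type in schema.xml with the stemmer removed
from both analyzers might look like the sketch below. The type name
"text_nostem", the choice of tokenizer and the field attributes are
assumptions, not taken from the original configuration; only the
"description" field name comes from the debug output above.

```xml
<!-- schema.xml sketch; "text_nostem" is a hypothetical type name -->
<fieldType name="text_nostem" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- removed here: <filter class="solr.EnglishPorterFilterFactory"/> -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- removed here as well, so query terms match the indexed terms -->
  </analyzer>
</fieldType>

<!-- field attributes are assumed; adjust to the existing definition -->
<field name="description" type="text_nostem" indexed="true" stored="true"/>
```

Terms already in the index were stemmed when they were written, which is
why the change only shows up after re-indexing.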






Re: Problem with Query Parser

2009-10-18 Thread Germán Biozzoli
Thanks Ahmet. Using the analysis page, the English Porter filter definitely
turns out to be the killer ;)
Regards
German

On Sun, Oct 18, 2009 at 7:30 AM, AHMET ARSLAN iori...@yahoo.com wrote:


Re: Problem with Query Parser

2009-10-18 Thread Lance Norskog
Another way to do multilingual indexing is to have a separate field
for each language. Solr/Lucene has custom processing for some
languages.
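
A minimal sketch of that approach in schema.xml, assuming the Snowball
stemmer filter is used for each language. The type names (text_es, text_pt,
text_en), the field names (description_es and so on) and the field
attributes are hypothetical; the application would have to route each
document's text to the field for its language.

```xml
<!-- schema.xml sketch: one field type and one field per language -->
<fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
  </analyzer>
</fieldType>

<fieldType name="text_pt" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Portuguese"/>
  </analyzer>
</fieldType>

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>

<!-- hypothetical field names; each holds text in a single language -->
<field name="description_es" type="text_es" indexed="true" stored="true"/>
<field name="description_pt" type="text_pt" indexed="true" stored="true"/>
<field name="description_en" type="text_en" indexed="true" stored="true"/>
```

A single analyzer element without a type attribute applies to both indexing
and querying, so each language is stemmed the same way on both sides.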

On Sun, Oct 18, 2009 at 12:25 PM, Germán Biozzoli
germanbiozz...@gmail.com wrote:

-- 
Lance Norskog
goks...@gmail.com


Seattle / NW Hadoop, Lucene, Apache Cloud Stack Meetup, Wed Oct 28 6:45pm

2009-10-18 Thread Bradford Stephens
Greetings,

(You're receiving this e-mail because you're on a DL or I think you'd
be interested)

It's time for another Hadoop/Lucene/Apache Cloud stack meetup! This
month it'll be on Wednesday, the 28th, at 6:45 pm.

A *huge* thanks to everyone who showed up last month, and to Facebook
for sending someone awesome to speak about Hive. We learned quite a
bit!

For October, we will have someone speaking about Cascading, and how it
helps workflow abstraction with MapReduce. Very useful stuff to know.

We've had great attendance in the past few months; let's keep it up!
I'm always amazed by the things I learn from everyone.

We're at the University of Washington, Allen Computer Science Center
(not Electrical Engineering)

Map: http://www.washington.edu/home/maps/?CSE

Room: 303 -or- the Entry level. If there are changes, signs will be posted.

More Info:

The meetup is about 2 hours (and there's usually food): we'll have two
in-depth talks, and then several lightning talks of 5 minutes. We'll
then have discussion and 'social time'. Let me know if you're
interested in speaking or attending. We'd like to focus on education,
so feel free to ask questions.

Contact: Bradford Stephens, 904-415-3009, bradfordsteph...@gmail.com

-- 
http://www.drawntoscaleconsulting.com - Scalability, Hadoop, HBase,
and Distributed Lucene Consulting

http://www.roadtofailure.com -- The Fringes of Scalability, Social
Media, and Computer Science


Re: Solr 1.4 release candidate

2009-10-18 Thread Yonik Seeley
FYI, the latest nightly includes more lucene bug fixes targeted toward
Lucene 2.9.1
The (current) full list is here:
http://svn.apache.org/viewvc/lucene/java/branches/lucene_2_9/CHANGES.txt?view=markup&pathrev=826563

-Yonik
http://www.lucidimagination.com


On Wed, Oct 14, 2009 at 10:01 AM, Yonik Seeley
yo...@lucidimagination.com wrote:
 Folks, we've been in code freeze since Monday and a test release
 candidate was created yesterday. However, it already had to be updated
 last night due to a serious bug found in Lucene.

 For now you can use the latest nightly build to get any recent changes
 like this:
 http://people.apache.org/builds/lucene/solr/nightly/

 We'll probably release the final bits next week, so in the meantime,
 download the latest nightly build and give it a spin!

 -Yonik
 http://www.lucidimagination.com



Re: Boosting of words

2009-10-18 Thread bhaskar chandrasekar
 
Hi Arslan,
 
Yes, I am using Solr as an input to Carrot.
Yes, I am using org.carrot2.source.solr.SolrDocumentSource just to cluster 
search results.
Currently we are focusing on Solr search results only.
In the future we will focus on clustered search results.
For now I am using Solr 1.3.
 
Regards
Bhaskar
--- On Sat, 10/17/09, AHMET ARSLAN iori...@yahoo.com wrote:


From: AHMET ARSLAN iori...@yahoo.com
Subject: Re: Boosting of words
To: solr-user@lucene.apache.org
Date: Saturday, October 17, 2009, 1:55 PM


 I am using Solr 1.3.
 I access Solr through carrot and use Java.

What do you mean by accessing Solr through Carrot?
Are you using Solr as an input to Carrot, using 
org.carrot2.source.solr.SolrDocumentSource just to cluster search results?
Can we say that you are interested in clustered search results rather than 
the search results themselves? If yes, Solr 1.4 will have Grant Ingersoll's 
ClusteringComponent [1], which uses Carrot2 to cluster search results.

[1] http://wiki.apache.org/solr/ClusteringComponent