Re: Problem with Query Parser

2009-10-18 Thread Lance Norskog
Another way to do multilingual indexing is to have a separate field
for each language. Solr/Lucene has custom processing (language-specific
analysis such as stemming) for some languages.




-- 
Lance Norskog
goks...@gmail.com
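
A minimal sketch of the separate-field-per-language idea described above, assuming the application already knows each document's language. The field names (description_es, description_pt, description_en) and the unstemmed fallback field are hypothetical and would have to match fields actually declared in schema.xml, each with its own analyzer.

```python
# Hypothetical per-language field names; they must exist in schema.xml.
LANGUAGE_FIELDS = {
    "es": "description_es",
    "pt": "description_pt",
    "en": "description_en",
}
FALLBACK_FIELD = "description"  # assumed catch-all field without stemming


def build_doc(doc_id, text, lang):
    """Put the text into the field that matches the document's language."""
    field = LANGUAGE_FIELDS.get(lang, FALLBACK_FIELD)
    return {"id": doc_id, field: text}


def build_query(term, langs=("es", "pt", "en")):
    """Search the same term across the chosen language fields."""
    return " OR ".join("%s:%s" % (LANGUAGE_FIELDS[lang], term) for lang in langs)


if __name__ == "__main__":
    print(build_doc("1", "represion", "es"))
    print(build_query("represion"))
```

With this layout each field can carry a stemmer for its own language, so an English stemmer is never applied to Spanish text.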


Re: Problem with Query Parser

2009-10-18 Thread Germán Biozzoli
Thanks Ahmet. Using the analysis page, the English Porter filter
definitely appears as the killer ;)
Regards
German



Re: Problem with Query Parser

2009-10-18 Thread AHMET ARSLAN

> Hi everybody
>
> I have a simple but (for me) annoying problem. I'm a happy user of Solr
> 1.4 with a small collection of documents. Today one of the users
> reported that a query returns documents that are not pertinent to the
> expression. I have Spanish, Portuguese and English text inside the
> collection. Using the Solr administration interface I found that
> she was right: if I search for the Spanish term "represion", the search
> matches only the word root, I mean it returns every document with the
> term "repres". Using the admin debug search I found this:
>
> <str name="rawquerystring">description:represion</str>
> <str name="querystring">description:represion</str>
> <str name="parsedquery">description:repres</str>
> <str name="parsedquery_toString">description:repres</str>
>
> The "ion" part of the term was deleted by the query parser. My first
> question is: I don't know where I should look to correct this, in
> schema.xml or in solrconfig.xml.

> The only thing that looks suspicious to me is the EnglishPorter.

Yes, you are right: the "ion" part of the term was deleted by it. You can
verify this using the /admin/analysis.jsp page. It will tell you which
TokenFilterFactory removes it.

> I've deleted it from the configuration but nothing changes. Should
> I reindex the collection to see the changes?

Yes, a re-index is necessary.

> Should I also delete it from the index section?

You should remove the English Porter filter from both the query and the
index analyzer.

> What will I lose by deleting the English Porter filter?

You will lose stemming functionality. But since you have Spanish,
Portuguese and English documents, using the English Porter stemmer for
all the documents is not meaningful.
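
To make the stemming effect concrete, here is a simplified sketch of a single rule from the classic Porter algorithm: the suffix "ion" is removed when the remaining stem ends in "s" or "t" and has a measure greater than 1. This is only an illustration under that assumption; the filter Solr actually runs implements the full algorithm with many more rules, and the function names below are made up for this example.

```python
def is_consonant(word, i):
    """Porter's definition: not a, e, i, o, u, and not a 'y' that follows a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Count vowel-to-consonant transitions, the 'm' in [C](VC)^m[V]."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m


def strip_ion(word):
    """Apply only the 'ion' rule; every other Porter step is left out."""
    if word.endswith("ion"):
        stem = word[:-3]
        if stem and stem[-1] in "st" and measure(stem) > 1:
            return stem
    return word


if __name__ == "__main__":
    print(strip_ion("represion"))  # the Spanish term from this thread
    print(strip_ion("adoption"))   # an English word hit by the same rule
```

The rule has no notion of language, which is why a Spanish word that happens to end in "sion" is cut the same way an English word would be.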






Problem with Query Parser

2009-10-17 Thread Germán Biozzoli
Hi everybody

I have a simple but (for me) annoying problem. I'm a happy user of Solr
1.4 with a small collection of documents. Today one of the users
reported that a query returns documents that are not pertinent to the
expression. I have Spanish, Portuguese and English text inside the
collection. Using the Solr administration interface I found that
she was right: if I search for the Spanish term "represion", the search
matches only the word root, I mean it returns every document with the
term "repres". Using the admin debug search I found this:

<str name="rawquerystring">description:represion</str>
<str name="querystring">description:represion</str>
<str name="parsedquery">description:repres</str>
<str name="parsedquery_toString">description:repres</str>

The "ion" part of the term was deleted by the query parser. My first
question is: I don't know where I should look to correct this, in
schema.xml or in solrconfig.xml.

In the schema, description is



and text is:


  







  

  








  



The only thing that looks suspicious to me is the EnglishPorter. I've
deleted it from the configuration but nothing changes. Should I reindex
the collection to see the changes? Should I also delete it from the index
section? What will I lose by deleting the English Porter filter?

Thanks a lot for the help
German


Re: Problem with Query Parser?

2009-06-16 Thread Avlesh Singh
Thanks Yonik!

Cheers
Avlesh



Re: Problem with Query Parser?

2009-06-16 Thread Yonik Seeley
On Tue, Jun 16, 2009 at 8:33 AM, Avlesh Singh wrote:
> Can someone explain this?
> +myField:"\*" +city:Mumbai gives me all results for +city:Mumbai
>
> myField is a regular text field and "*" is not a stopword.

* and other non-alphanumeric characters are probably being dropped by
WordDelimiterFilter.

-Yonik
http://www.lucidimagination.com


Re: Problem with Query Parser?

2009-06-16 Thread Avlesh Singh
Can someone explain this?
+myField:"\*" +city:Mumbai gives me all results for +city:Mumbai

myField is a regular text field and "*" is not a stopword.

Cheers
Avlesh



Re: Problem with Query Parser?

2009-06-15 Thread Yonik Seeley
On Tue, Jun 16, 2009 at 12:28 AM, Avlesh Singh wrote:
>>
>> Probably the analyzer removed the "$", leaving an empty term and causing
>> the clause to be removed altogether.
>>
>
> I predicted this behavior while writing the mail yesterday, Yonik.
> Does it sound logical and intuitive?

It's intuitive in some circumstances, and not in others.  It's
certainly not intuitive in this particular case.  I think there's
another JIRA issue already open for this somewhere.

-Yonik
http://www.lucidimagination.com


Re: Problem with Query Parser?

2009-06-15 Thread Avlesh Singh
>
> Maybe you can use this method directly or at least mimic it in your
> application:
> ./src/solrj/org/apache/solr/client/solrj/util/ClientUtils.java:  public
> static String escapeQueryChars(String s)
>

That does not help either, Otis.
(+myField:"$" +city:Mumbai) at best could get converted into (+myField:"\\$"
+city:Mumbai)
The output remains the same: all results rather than the expected no results.

Cheers
Avlesh



Re: Problem with Query Parser?

2009-06-15 Thread Avlesh Singh
And here's the debug info:
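<str name="rawquerystring">+myField:"$" +city:Mumbai</str>
<str name="querystring">+myField:"$" +city:Mumbai</str>
<str name="parsedquery">+city:Mumbai</str>
<str name="parsedquery_toString">+city:Mumbai</str>
<str name="QParser">OldLuceneQParser</str>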

I found this unintuitive. "No results" rather than "All results" was the
expected behavior.

Cheers
Avlesh



Re: Problem with Query Parser?

2009-06-15 Thread Avlesh Singh
>
> Probably the analyzer removed the "$", leaving an empty term and causing
> the clause to be removed altogether.
>

I predicted this behavior while writing the mail yesterday, Yonik.
Does it sound logical and intuitive?

Cheers
Avlesh



Re: Problem with Query Parser?

2009-06-15 Thread Yonik Seeley
On Mon, Jun 15, 2009 at 11:53 PM, Avlesh Singh wrote:
> How does one explain this?
> +myField:"$" gives zero results
> +myField:"$" +city:"Mumbai" gives results for city:"Mumbai"

Probably the analyzer removed the "$", leaving an empty term and
causing the clause to be removed altogether.

-Yonik
http://www.lucidimagination.com
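
The behaviour described above can be modelled with a toy sketch. This is not Solr's parser code: the analyzer below is a stand-in that keeps only lowercase alphanumeric runs, and the assumption that myField is analyzed while city is not is taken from the examples in this thread. It shows how a clause whose text analyzes to zero terms disappears, leaving only the remaining required clause.

```python
import re

# Assumption for this sketch: only myField goes through a text analyzer.
ANALYZED_FIELDS = {"myField"}


def analyze(field, text):
    """Toy analyzer: analyzed fields keep alphanumeric runs, others keep the raw value."""
    if field in ANALYZED_FIELDS:
        return re.findall(r"[a-z0-9]+", text.lower())
    return [text]


def parse(clauses):
    """Turn (field, text) required clauses into parsed clauses, dropping empty ones."""
    parsed = []
    for field, text in clauses:
        terms = analyze(field, text)
        if terms:  # a clause with no terms left is removed altogether
            parsed.append((field, terms))
    return parsed


def to_string(parsed):
    """Render the surviving clauses the way the debug output shows them."""
    return " ".join("+%s:%s" % (field, " ".join(terms)) for field, terms in parsed)


if __name__ == "__main__":
    print(repr(to_string(parse([("myField", "$")]))))
    print(repr(to_string(parse([("myField", "$"), ("city", "Mumbai")]))))
```

In this model the first query ends up with no clauses at all, so nothing matches, while the second query is reduced to the city clause alone, which is the same shape as the parsedquery reported in the debug output.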


Re: Problem with Query Parser?

2009-06-15 Thread Avlesh Singh
How does one explain this?
+myField:"$" gives zero results
+myField:"$" +city:"Mumbai" gives results for city:"Mumbai"

Cheers
Avlesh



Re: Problem with Query Parser?

2009-06-15 Thread Otis Gospodnetic

Hi,

It looks like the query parser is doing its job of removing certain characters 
from the query string.

Maybe you can use this method directly or at least mimic it in your application:

./src/solrj/org/apache/solr/client/solrj/util/ClientUtils.java:  public static 
String escapeQueryChars(String s) {


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
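
For applications that cannot call the SolrJ method directly, a rough way to mimic it could look like the sketch below. The set of escaped characters is an assumption based on the Lucene query syntax and should be checked against the ClientUtils source for the Solr version in use; the function name is simply a Python rendering of the Java one.

```python
# Assumed set of query-syntax characters; verify against ClientUtils.java.
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&;')


def escape_query_chars(s):
    """Prefix query-syntax characters and whitespace with a backslash."""
    out = []
    for ch in s:
        if ch in SPECIAL_CHARS or ch.isspace():
            out.append("\\")
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(escape_query_chars("a*b"))
    print(escape_query_chars("$"))
```

Under this assumed character set "$" is left unchanged, and escaping in general only protects a character from the query parser; the field's analyzer can still drop it afterwards.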






Problem with Query Parser?

2009-06-15 Thread Avlesh Singh
I noticed a strange behavior of the query parser for the following query on
my index.
+(category_name:"$" product_name:"$" brand_name:"$") +is_available:1
The fields category_name, product_name and brand_name are of type "text" and
is_available is a "string" field, storing 0 or 1 for each doc in the index.

When I perform the query: *+(category_name:"$" product_name:"$"
brand_name:"$")*, I get no results (which is as expected).
However, when I perform the query: *+(category_name:"$" product_name:"$"
brand_name:"$") +is_available:1*, I get results for all is_available=1. This
is unexpected and undesired: the first half of the query is simply ignored.

I have noticed this behaviour for pretty much all the special characters: $,
^, * etc. I am using the default text field analyzer.
Am I missing something, or is this a known bug in Solr?

Cheers
Avlesh