Re: Porter Stem filter and employing

2019-03-07 Thread Erick Erickson
The easiest way to think about it is that the “mm” parameter is a sliding scale
between pure OR and pure AND: at mm=0%, any clause that matches returns the doc,
and at mm=100%, all clauses must match for the doc to be returned.
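
As a rough illustration of that sliding scale, here is a minimal Python sketch. It assumes the percentage form of mm only, with the computed count rounded down; the function names are hypothetical, and negative values and conditional mm expressions are deliberately ignored. It is a simplification for reasoning about results, not Solr's implementation.

```python
def required_matches(num_optional_clauses: int, mm_percent: int) -> int:
    """Minimum number of optional clauses that must match for a percentage mm."""
    # The percentage is applied to the clause count and rounded down.
    required = (num_optional_clauses * mm_percent) // 100
    # Clamp so the result never exceeds the number of clauses or drops below 0.
    return max(0, min(required, num_optional_clauses))


def doc_matches(matched_clauses: int, num_optional_clauses: int, mm_percent: int) -> bool:
    # Even at mm=0%, a pure OR query still needs at least one matching clause.
    needed = max(1, required_matches(num_optional_clauses, mm_percent))
    return matched_clauses >= needed


# A two-term query such as "employing carer" where only one term matches the doc.
print(doc_matches(1, 2, 0))    # True: pure OR, one matching clause is enough
print(doc_matches(1, 2, 100))  # False: pure AND, both clauses must match
```

With mm=100%, a single term that fails to match (for example because it is searched against a field without stemming) is enough to exclude the document.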

But no, I don’t know of any other explanation pages for that parameter.

Best,
Erick

> On Mar 7, 2019, at 1:37 AM, Marisol Redondo wrote:
>
> Following Erick's idea, I started to look at fields and queries other than
> the title field itself, and I started using the normal request handler
> (/select) and adding parameters to see if any of the parameters in my query
> causes this problem.
> I discovered that in my customized RequestHandler I'm using
> defType=edismax and mm=100% (and other params). When I remove the mm param,
> I get the documents.
>
> I have been looking for information about this parameter and I've only
> found one page in the Solr guide:
> https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html
> Is there any other documentation that can help me understand how this
> parameter works? I don't want to break all the searches by removing it.
>
> Thanks for all your help



Re: Porter Stem filter and employing

2019-03-07 Thread Marisol Redondo
Following Erick's idea, I started to look at fields and queries other than
the title field itself, and I started using the normal request handler
(/select) and adding parameters to see if any of the parameters in my query
causes this problem.
I discovered that in my customized RequestHandler I'm using
defType=edismax and mm=100% (and other params). When I remove the mm param,
I get the documents.

I have been looking for information about this parameter and I've only
found one page in the Solr guide:
https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html
Is there any other documentation that can help me understand how this
parameter works? I don't want to break all the searches by removing it.

Thanks for all your help


On Mon, 4 Mar 2019 at 17:11, Erick Erickson  wrote:

> First, if you _changed_ the analysis chain without re-indexing all
> documents, that could account for it.
>
> Second, the analysis page is a little tricky. It _assumes_ that the words
> you put in the boxes have been parsed into the field you select. So let’s
> say you have this field “title” that has stemming turned on. Let’s further
> assume your default search field is “text” (this is configured in
> solrconfig.xml, the “df” parameter in your request handler).
>
> Now, if your search is “q=employ”, the actual search will be against your
> default field, as though you had entered “q=text:employ”. This is a common
> problem. Adding "&debug=query" to the search and looking at the resulting
> parsed_query.toString() will show you the actual result of the query
> parsing and may help.
>
> Best,
> Erick
>


Re: Porter Stem filter and employing

2019-03-04 Thread Erick Erickson
First, if you _changed_ the analysis chain without re-indexing all documents, 
that could account for it.

Second, the analysis page is a little tricky. It _assumes_ that the words you 
put in the boxes have been parsed into the field you select. So let’s say you 
have this field “title” that has stemming turned on. Let’s further assume your 
default search field is “text” (this is configured in solrconfig.xml, the “df” 
parameter in your request handler).

Now, if your search is “q=employ”, the actual search will be against your
default field, as though you had entered “q=text:employ”. This is a common
problem. Adding "&debug=query" to the search and looking at the resulting
parsed_query.toString() will show you the actual result of the query
parsing and may help.
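
A minimal sketch of building such a request in Python, assuming the debug option meant here is Solr's debug=query parameter. The host, the core name "mycore", and the helper name are hypothetical placeholders, and the snippet only constructs the URL rather than sending it.

```python
from urllib.parse import urlencode


def build_debug_url(base_url, query, default_field=None):
    """Build a /select URL that asks Solr to include query-parsing debug output."""
    params = {"q": query, "debug": "query"}
    if default_field is not None:
        # Override the handler's default search field for this request only.
        params["df"] = default_field
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and core name; adjust to your installation.
url = build_debug_url("http://localhost:8983/solr/mycore", "employ", default_field="title")
print(url)
```

Comparing the parsed query in the response with and without the df override shows which field an unqualified term is actually searched against.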

Best,
Erick

> On Mar 4, 2019, at 3:13 AM, Marisol Redondo wrote:
>
> Thank you for the answer and for pointing me to this solution. But I've
> already used this filter for index analysis and I'm not getting any results,
> so I don't understand why.
> If I use the Analysis tool, I'm gettin
> So maybe the problem is something else? But I don't see what the problem can
> be, because when using the Analysis tool I got the same result for index and
> query (the input to this filter was employing carer):
> 
> *PSF (Index)*
>
> text            emploi                carer
> raw_bytes       [65 6d 70 6c 6f 69]   [63 61 72 65 72]
> start           0                     12
> end             9                     17
> positionLength  1                     1
> type
> position        1                     3
> keyword         FALSE                 FALSE
>
> *PSF (query)*
>
> text            emploi                carer
> raw_bytes       [65 6d 70 6c 6f 69]   [63 61 72 65 72]
> start           0                     12
> end             9                     17
> positionLength  1                     1
> type
> position        1                     3
> keyword         FALSE                 FALSE
>



Re: Porter Stem filter and employing

2019-03-04 Thread Marisol Redondo
Thank you for the answer and for pointing me to this solution. But I've
already used this filter for index analysis and I'm not getting any results,
so I don't understand why.
If I use the Analysis tool, I'm gettin
So maybe the problem is something else? But I don't see what the problem can
be, because when using the Analysis tool I got the same result for index and
query (the input to this filter was employing carer):

*PSF (Index)*

text            emploi                carer
raw_bytes       [65 6d 70 6c 6f 69]   [63 61 72 65 72]
start           0                     12
end             9                     17
positionLength  1                     1
type
position        1                     3
keyword         FALSE                 FALSE

*PSF (query)*

text            emploi                carer
raw_bytes       [65 6d 70 6c 6f 69]   [63 61 72 65 72]
start           0                     12
end             9                     17
positionLength  1                     1
type
position        1                     3
keyword         FALSE                 FALSE


On Fri, 1 Mar 2019 at 15:51, Shawn Heisey  wrote:

> On 3/1/2019 4:38 AM, Marisol Redondo wrote:
> > When using the PorterStemFilter, I saw that the word "employing" is
> > changed to "emploi" and my document is not found in the query to solr
> > because of that.
> >
> > This also happens with other words that end in -ying, such as annoying or
> > deploying.
> >
> > Is there any patch for this filter or should I create a new Jira issue?
>
>
> When you are using a stemming filter, you will need to use the same
> filter on both the query analysis and the index analysis, so that
> similar words are stemmed to the same root in both cases, leading to
> matches.
>
> If the other steps in your analysis chain are changing the words so that
> the stemming filter cannot recognize the word, that might also cause
> problems.
>
> Thanks,
> Shawn
>


Re: Porter Stem filter and employing

2019-03-01 Thread Shawn Heisey

On 3/1/2019 4:38 AM, Marisol Redondo wrote:

> When using the PorterStemFilter, I saw that the word "employing" is changed
> to "emploi" and my document is not found in the query to solr because of
> that.
>
> This also happens with other words that end in -ying, such as annoying or
> deploying.
>
> Is there any patch for this filter or should I create a new Jira issue?



When you are using a stemming filter, you will need to use the same 
filter on both the query analysis and the index analysis, so that 
similar words are stemmed to the same root in both cases, leading to 
matches.
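
To illustrate why both sides need the same stemming, here is a toy Python sketch. The two rules in toy_stem are loosely modeled on Porter-style steps (strip a trailing -ing, then turn a final y into i when the rest of the word contains a vowel). They are an assumption for demonstration only, not the PorterStemFilter's actual code, and they cover just the words discussed in this thread.

```python
VOWELS = set("aeiou")


def has_vowel(s):
    return any(ch in VOWELS for ch in s)


def toy_stem(word):
    """Toy stemmer: handles only the -ing and final-y cases."""
    w = word.lower()
    # Strip "ing" when what remains still contains a vowel.
    if w.endswith("ing") and has_vowel(w[:-3]):
        w = w[:-3]
    # Turn a final "y" into "i" when the rest of the word contains a vowel.
    if w.endswith("y") and has_vowel(w[:-1]):
        w = w[:-1] + "i"
    return w


# Index-time analysis: the stored terms are the stemmed forms.
indexed_terms = {toy_stem(t) for t in "employing carer".split()}

# Query-time analysis with the same stemmer finds the term...
print(toy_stem("employing") in indexed_terms)
# ...while the raw, unstemmed query term does not.
print("employing" in indexed_terms)
```

The stem "emploi" is not a real word, but that does not matter as long as index and query analysis both produce it.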


If the other steps in your analysis chain are changing the words so that 
the stemming filter cannot recognize the word, that might also cause 
problems.


Thanks,
Shawn


Porter Stem filter and employing

2019-03-01 Thread Marisol Redondo
Hi.

When using the PorterStemFilter, I saw that the word "employing" is changed
to "emploi" and my document is not found in the query to solr because of
that.

This also happens with other words that end in -ying, such as annoying or
deploying.

Is there any patch for this filter or should I create a new Jira issue?

Thanks