Re: Protwords in solr spellchecker

2015-07-10 Thread davidphilip cherian
Hi Kamal,

Not necessarily. You can have different filters applied at index time and
query time (note that the order in which filters are defined matters). You
could just add the stop filter at query time.
Define your own custom field type (similar to 'text_en' in schema.xml) and
perhaps use the standard or whitespace tokenizer followed by the stop filter
at query time.

Tip: Use the analysis tool available on the Solr admin page to better
understand the analysis chain of each field type.
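
To make the index-time versus query-time distinction concrete, here is a toy
model in plain Java. It is not Solr's API: the class name, method names and the
one-word stop list are assumptions made up for illustration. In Solr itself the
two chains would be declared in schema.xml as separate index and query
analyzers on the field type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of asymmetric analysis; hypothetical names, not Solr classes.
class AnalysisChainSketch {
    // Stands in for the contents of stopwords.txt (assumed example word).
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("sex"));

    // Index-time chain: whitespace tokenizer only, no stop filter,
    // so the data already in the index does not change.
    static List<String> analyzeForIndex(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Query-time chain: same tokenizer, followed by the stop filter.
    static List<String> analyzeForQuery(String text) {
        List<String> tokens = analyzeForIndex(text);
        tokens.removeAll(STOP_WORDS);
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyzeForIndex("sex education")); // [sex, education]
        System.out.println(analyzeForQuery("sex education")); // [education]
    }
}
```

Because only the query-side chain changes in this model, nothing that was
written at index time needs to be rebuilt.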

HTH





Re: Protwords in solr spellchecker

2015-07-10 Thread Kamal Kishore Aggarwal
Hi David,

This is a good suggestion. But if I add these *adult* keywords to the
stopwords.txt file, it will require re-indexing the data related to these
keywords.

How can I see the change instantly? Is there any other approach you can
suggest?







Re: Protwords in solr spellchecker

2015-07-10 Thread Alessandro Benedetti
So let's analyse the situation from the spellchecking point of view.
First of all, suppose we follow David's suggestion and add the StopFilter,
with our configured bad words, to the query-time analysis.

*Starting scenario*
- we have the protected words in our index, and we still want them to be
there

Let's go through the different kinds of spellcheckers available and where
each one takes its suggestions from:

*Index Based Spellchecker*
The suggestions will come from an auxiliary index.

*Direct Spellchecker*
The suggestions will come from the current index.

*File Based Spellchecker*
It takes its spelling suggestions from an external file, so we can curate
this file to contain only good words, and we are fine.
But I guess you would like to use a blacklist, whereas in this case we would
be maintaining a whitelist.

*Query Time*
At query time *the query is analysed* and a token stream is produced.
Then, depending on the implementation, a different lookup is triggered.
In the case of the Direct Spellchecker, if I remember well:
for each token, an FST with all the supported inflections is generated and
intersected with the index FST (based on the field), and the suggestion is
returned.

Unfortunately, a proper *query-time analysis will not help.*
When we analyse the query, we have the misspelled word sexe, which is not
going to be recognised as a bad word.
Then the inflections are calculated, the FST is built, and the intersection
will actually produce the feared suggestion sex.
This is because the word is in the index.
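
The failure mode described above can be sketched in plain Java. This is only
an illustration under stated assumptions: a brute-force edit-distance scan over
a tiny, made-up term list stands in for the FST intersection, and the class,
method and term names are hypothetical rather than anything from Lucene or
Solr.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: why a query-time stop filter does not stop the suggestion.
class SpellcheckLeakSketch {
    // Query-time stop words (assumed example).
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("sex"));
    // Terms already present in the index (assumed example data).
    static final List<String> INDEX_TERMS = Arrays.asList("sex", "seven", "solr");

    // Classic Levenshtein distance, computed row by row.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = cur;
        }
        return prev[b.length()];
    }

    // The stop filter sees the query token, not the candidate suggestions.
    static List<String> suggest(String queryToken, int maxEdits) {
        List<String> suggestions = new ArrayList<>();
        if (STOP_WORDS.contains(queryToken)) {
            return suggestions; // token removed by query-time analysis
        }
        for (String term : INDEX_TERMS) {
            int distance = editDistance(queryToken, term);
            if (distance > 0 && distance <= maxEdits) {
                suggestions.add(term);
            }
        }
        return suggestions;
    }

    public static void main(String[] args) {
        // "sexe" is not a stop word, so it passes the filter and the
        // indexed term "sex" comes back as a suggestion.
        System.out.println(suggest("sexe", 1)); // [sex]
    }
}
```

The stop filter only ever inspects what the user typed, so in this model it
fires for the exact word sex but never for the misspelling sexe.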

If we can't modify the index, the *Direct Spellchecker is not an option*, if
my understanding is correct.

Let's see if the Index Based Spellchecker can help...
Unfortunately, in this case too, the auxiliary index is built from the
analysed form of the original field.

If you really cannot re-index the content, I would suggest an implementation
based on a concept similar to the AnalyzingSuggester in Solr.

Happy to clarify any further questions.









-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


RE: Protwords in solr spellchecker

2015-07-10 Thread Dyer, James
Kamal,

Given the constraint that you cannot re-index the data, your best bet might be 
to simply filter out the suggestions at the application level, or maybe even 
have a proxy do it.

Another possible option: you might be able to extend DirectSolrSpellChecker 
and override #getSuggestions(), calling the superclass method and then 
post-filtering your stop words out of the response.  You'll want to request a 
few more terms so you're more likely to get results even if a term or two get 
filtered out.  You can specify your custom spell checker in solrconfig.xml.
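
As a rough sketch of the post-filtering idea, whether it is done in the 
application, in a proxy, or inside an overridden spell checker, the plain Java 
below drops blocked words from a list of raw suggestions and trims the result 
to the number actually wanted. The class, method and sample words are 
assumptions for illustration only, and no Solr classes are used. Requesting 
extra candidates would presumably be done through the spellcheck.count request 
parameter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: post-filter spell suggestions against a block list.
class SuggestionPostFilterSketch {
    // 'blocked' is expected to hold lower-cased words.
    // 'raw' should contain more candidates than 'wanted', so that some
    // survive even if a term or two get filtered out.
    static List<String> filterSuggestions(List<String> raw, Set<String> blocked, int wanted) {
        List<String> kept = new ArrayList<>();
        for (String suggestion : raw) {
            if (kept.size() == wanted) {
                break;
            }
            if (!blocked.contains(suggestion.toLowerCase(Locale.ROOT))) {
                kept.add(suggestion);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> blocked = new HashSet<>(Arrays.asList("sex"));
        // Pretend these came back from the spell checker for "sexe".
        List<String> raw = Arrays.asList("sex", "exe", "seer");
        System.out.println(filterSuggestions(raw, blocked, 2)); // [exe, seer]
    }
}
```

This approach leaves the index and the schema untouched, so a change to the 
block list takes effect on the next request.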

James Dyer
Ingram Content Group



Re: Protwords in solr spellchecker

2015-07-09 Thread davidphilip cherian
The best bet is to use solr.StopFilterFactory.
Add all such words to stopwords.txt and add this filter to your
analyzer.

Reference links:
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.StopFilterFactory
https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-StopFilter

HTH


On Thu, Jul 9, 2015 at 11:50 AM, Kamal Kishore Aggarwal 
kkroyal@gmail.com wrote:

 Hi Team,

 I am currently working with Java 1.7 and Solr 4.8.1 on Tomcat 7. Is there
 any feature by which I can prevent certain words from appearing in spell
 suggestions?

 For example: if somebody searches for sexe, I do not want to show them sex as
 the spell suggestion from Solr. How can I stop these types of keywords from
 being shown in suggestions?

 Any help is appreciated.


 Regards
 Kamal Kishore
 Solr Beginner