Re: Solr suggest is related to second letter, not to initial letter

2015-02-18 Thread Volkan Altan
Yes, I did. But it doesn’t work.

New example:

TSTLookup

doc 1 : shoe adidas 2 hiking
doc 2 : galaxy samsung s5 phone
doc 3 : shakeology sample packets

http://localhost:8983/solr/solr/suggest?q=samsung+hi

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="samsung">
        <int name="numFound">2</int>
        <int name="startOffset">0</int>
        <int name="endOffset">7</int>
        <arr name="suggestion">
          <str>samsung s5</str>
          <str>samsung s5 phone</str>
        </arr>
      </lst>
      <lst name="hi">
        <int name="numFound">1</int>
        <int name="startOffset">8</int>
        <int name="endOffset">10</int>
        <arr name="suggestion">
          <str>hiking</str>
        </arr>
      </lst>
      <lst name="collation">
        <str name="collationQuery">(samsung s5) hiking</str>
        <int name="hits">0</int>
        <lst name="misspellingsAndCorrections">
          <str name="samsung">samsung s5</str>
          <str name="hi">hiking</str>
        </lst>
      </lst>
      <lst name="collation">
        <str name="collationQuery">(samsung s5 phone) hiking</str>
        <int name="hits">0</int>
        <lst name="misspellingsAndCorrections">
          <str name="samsung">samsung s5 phone</str>
          <str name="hi">hiking</str>
        </lst>
      </lst>
      <lst name="collation">
        <str name="collationQuery">samsung hiking</str>
        <int name="hits">0</int>
        <lst name="misspellingsAndCorrections">
          <str name="samsung">samsung</str>
          <str name="hi">hiking</str>
        </lst>
      </lst>
    </lst>
  </lst>
</response>
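
The startOffset and endOffset values in the response above appear to be character positions in the decoded q parameter, with the end offset exclusive. A minimal Python sketch of that reading follows; the helper name term_at is hypothetical and only illustrates the arithmetic, it is not part of Solr.

```python
def term_at(query: str, start_offset: int, end_offset: int) -> str:
    """Return the slice of the query that an offset pair points at (end exclusive)."""
    return query[start_offset:end_offset]


# Decoded form of q=samsung+hi ("+" stands for a space in the URL).
query = "samsung hi"

print(term_at(query, 0, 7))    # offsets reported for the first suggestion block
print(term_at(query, 8, 10))   # offsets reported for the second suggestion block
```

Under this assumption each suggestion block is tied to one whitespace-separated term of the input, which is consistent with the two terms being completed separately.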

<field name="suggestions" type="suggest_term" indexed="true" multiValued="true"
       stored="false" omitNorms="true"/>
<fieldType name="suggest_term" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-PunctuationToSpace.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
            splitOnNumerics="0" preserveOriginal="1"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.TurkishLowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2"
            maxShingleSize="4" outputUnigrams="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-PunctuationToSpace.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
            splitOnNumerics="0" preserveOriginal="0"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.TurkishLowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2"
            maxShingleSize="4" outputUnigrams="true"/>
    <filter class="solr.ApostropheFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">suggestions</str>  <!-- the indexed field to derive suggestions from -->
    <float name="threshold">0.1</float>
    <str name="buildOnCommit">true</str>
  </lst>
  <str name="queryAnalyzerFieldType">suggest_term</str>
</searchComponent>
<!-- auto-complete -->
<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.build">false</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollations">10</str>
    <str name="spellcheck.maxCollationTries">100</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

---

FreeTextLookupFactory

doc 1 : shoe adidas 2 hiking
doc 2 : galaxy samsung s5 phone
doc 3 : shakeology sample packets

http://localhost:8983/solr/solr/suggest?q=samsung+hi

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">3</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="samsung">
        <int name="numFound">9</int>
        <int name="startOffset">0</int>
        <int name="endOffset">7</int>
        <arr name="suggestion">
          <str>samsung s5</str>
          <str>samsung s5 phone</str>
          <str>samsung s5 phone galaxy</str>
          <str>samsung s5 phone samsung</str>
          <str>samsung s5 phone samsung samsungs5phone</str>
          <str>samsung s5 phone samsung samsungs5phone s5</str>
          <str>samsung s5 

Re: Solr suggest is related to second letter, not to initial letter

2015-02-18 Thread Michael Sokolov


On 02/17/2015 03:46 AM, Volkan Altan wrote:

> First of all, thank you for your answer.

You're welcome - thanks for sending a more complete example of your 
problem and expected behavior.


> I don’t want to use KeywordTokenizer, because I want to get a result 
> whenever the words typed by the user occur together in any document. 
> I just don’t want “q=galaxy + samsung” to appear, because it is an 
> inappropriate suggestion and it doesn’t work.
>
> Many thanks in advance!

Did you try the other suggestions in my earlier reply?

-Mike



Re: Solr suggest is related to second letter, not to initial letter

2015-02-17 Thread Volkan Altan
First of all, thank you for your answer.

Example documents:
doc 1 suggest_field: galaxy samsung s5 phone
doc 2 suggest_field: shoe adidas 2 hiking


Example URL:
http://localhost:8983/solr/solr/suggest?q=galaxy+s

The result I expect is shown below. “galaxy shoe” is not supposed to appear, 
but unfortunately it currently does.


<lst name="collation">
  <str name="collationQuery">galaxy samsung</str>
  <int name="hits">0</int>
  <lst name="misspellingsAndCorrections">
    <str name="galaxy">galaxy</str>
    <str name="samsung">samsung</str>
  </lst>
</lst>
<lst name="collation">
  <str name="collationQuery">galaxy s5</str>
  <int name="hits">0</int>
  <lst name="misspellingsAndCorrections">
    <str name="galaxy">galaxy</str>
    <str name="s5">s5</str>
  </lst>
</lst>


I don’t want to use KeywordTokenizer, because I want to get a result 
whenever the words typed by the user occur together in any document. 
I just don’t want “q=galaxy + samsung” to appear, because it is an 
inappropriate suggestion and it doesn’t work.

Many thanks in advance!


My settings:

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">suggestions</str>
    <float name="threshold">0.1</float>
    <str name="buildOnCommit">true</str>
  </lst>
  <str name="queryAnalyzerFieldType">suggest_term</str>
</searchComponent>
<!-- auto-complete -->
<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.build">false</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollations">10</str>
    <str name="spellcheck.maxCollationTries">100</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>


<fieldType name="suggest_term" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-PunctuationToSpace.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TurkishLowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-PunctuationToSpace.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ApostropheFilterFactory"/>
    <filter class="solr.TurkishLowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>



Re: Solr suggest is related to second letter, not to initial letter

2015-02-15 Thread Michael Sokolov
StandardTokenizer splits your text into tokens, and the suggester 
suggests tokens independently.  It sounds as if you want the suggestions 
to be based on the entire text (not just the current word), and that 
only adjacent words in the original should appear as suggestions.  
Assuming that's what you are after (it's a little hard to tell from your 
e-mail -- you might want to clarify by providing a few examples of how 
you *do* want it to work instead of just examples of how you *don't* 
want it to work), you have a couple of choices:


1) Don't use StandardTokenizer; use KeywordTokenizer instead. This will 
preserve the entire original text and suggest complete texts rather 
than words.
2) Consider using a shingle filter along with StandardTokenizer, 
so that your tokens include multi-word shingles.
3) Use a suggester with better support for a statistical language model, 
like this one: 
http://blog.mikemccandless.com/2014/01/finding-long-tail-suggestions-using.html. 
To do this you will probably need to do some Java programming, since 
it isn't well integrated into Solr.
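
One way to picture choices 1 and 2 is as different dictionaries that the whole typed text is prefix-matched against. The Python sketch below is only an illustration under assumptions: it does not use Solr, the function names are hypothetical, the two documents are borrowed from the examples elsewhere in this thread, and plain whitespace splitting stands in for the real analyzer chain.

```python
DOCS = ["galaxy samsung s5 phone", "shoe adidas 2 hiking"]


def whole_text_entries(docs):
    """Choice 1 (KeywordTokenizer-like): each full text is one dictionary entry."""
    return set(docs)


def shingle_entries(docs, max_size=4):
    """Choice 2 (shingle-like): every run of up to max_size adjacent words is an entry."""
    entries = set()
    for doc in docs:
        words = doc.split()
        for start in range(len(words)):
            for size in range(1, max_size + 1):
                if start + size <= len(words):
                    entries.add(" ".join(words[start:start + size]))
    return entries


def suggest(typed, entries):
    """Prefix-match the whole typed text instead of completing each word separately."""
    return sorted(entry for entry in entries if entry.startswith(typed))


print(suggest("galaxy s", whole_text_entries(DOCS)))
print(suggest("galaxy s", shingle_entries(DOCS)))
print(suggest("samsung s", whole_text_entries(DOCS)))
print(suggest("samsung s", shingle_entries(DOCS)))
print(suggest("adidas s", shingle_entries(DOCS)))
```

In this sketch neither dictionary can produce "galaxy shoe" or "adidas samsung", because every entry is built from words that are adjacent in a single document. The whole-text dictionary returns nothing for "samsung s", since no document starts with those words, while the shingle dictionary still offers "samsung s5" and "samsung s5 phone".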


-Mike


Re: Solr suggest is related to second letter, not to initial letter

2015-02-14 Thread Volkan Altan
Any idea?





Solr suggest is related to second letter, not to initial letter

2015-02-12 Thread Volkan Altan
Hello Everyone,

What I want from the Solr suggester is that the suggestions for the second 
word, which is typed after the first word, are related to that first word. 
Instead, the second word is completed independently of the first, just as 
the first word is completed on its own.


Example:
http://localhost:8983/solr/solr/suggest?q=facet_suggest_data:"adidas+s"
http://localhost:8983/solr/vitringez/suggest?q=facet_suggest_data:%22adidas+s%22

adidas s

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">4</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="s">
        <int name="numFound">1</int>
        <int name="startOffset">27</int>
        <int name="endOffset">28</int>
        <arr name="suggestion">
          <str>samsung</str>
        </arr>
      </lst>
      <lst name="collation">
        <str name="collationQuery">facet_suggest_data:adidas samsung</str>
        <int name="hits">0</int>
        <lst name="misspellingsAndCorrections">
          <str name="adidas">adidas</str>
          <str name="s">samsung</str>
        </lst>
      </lst>
    </lst>
  </lst>
</response>


The terms “adidas” and “samsung” occur in separate documents. There is no 
document that contains both of them.
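
The behaviour described above can be reproduced outside Solr with a small Python sketch. It is only an illustration under assumptions: the function names are hypothetical, the two documents are borrowed from the examples given elsewhere in this thread rather than from the real index, and whitespace splitting stands in for the analyzer.

```python
import itertools

DOCS = [
    "galaxy samsung s5 phone",
    "shoe adidas 2 hiking",
]


def complete(prefix, docs=DOCS):
    """Return every indexed word starting with the prefix, ignoring neighbouring words."""
    words = {word for doc in docs for word in doc.split()}
    return sorted(word for word in words if word.startswith(prefix))


def collate(query, docs=DOCS):
    """Complete each typed word on its own, then count documents containing all of them."""
    per_word = [complete(word, docs) for word in query.split()]
    results = []
    for combo in itertools.product(*per_word):
        hits = sum(all(term in doc.split() for term in combo) for doc in docs)
        results.append((" ".join(combo), hits))
    return results


for collation, hits in collate("adidas s"):
    print(collation, hits)
```

With these two documents the sketch offers "adidas samsung" with 0 hits, because "s" is completed without looking at "adidas". Only "adidas shoe" is backed by a document.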

How can I solve that problem?  



schema.xml

<fieldType name="suggestions_type" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ApostropheFilterFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ApostropheFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<field name="facet_suggest_data" type="suggestions_type" indexed="true"
       multiValued="true" stored="false" omitNorms="true"/>


Best