Re: Solr's suggester results

2019-06-19 Thread ppunet
Here is my problem statement; I would really appreciate your feedback.

1. Thousands of PDFs with a large amount of content are indexed in
Solr.
2. I am using AnalyzingInfixSuggester for the suggestions.

Q. The SuggestComponent returns the entire content of the field in
the suggestions. How can I make the suggester return only part of
the content of the field instead of the entire content, which in my
scenario is quite long?


Thanks in advance.

PD





Re: Solr's suggester results

2015-06-17 Thread Zheng Lin Edwin Yeo
I'm using the FreeTextLookupFactory in my implementation now.

Yes, now it can suggest part of the field from the middle of the content.

I read that this implementation is able to consider the previous tokens
when making the suggestions. However, when I try to enter a search phrase,
it seems that it is only considering the last token and not any of the
previous tokens.

For example, when I search for
http://localhost:8983/edm/collection1/suggest?suggest.q=trouble free, it is
giving me suggestions based on the word 'free' only, and not 'trouble free'.
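
The request above contains a raw space in `suggest.q`. Before suspecting the suggester itself, it may be worth ruling out the client mangling the query. The following is a minimal sketch, assuming the host and path quoted in the message; `build_suggest_url` is a hypothetical helper name, not part of Solr.

```python
from urllib.parse import urlencode

# Base URL copied from the message above; adjust host and core for your setup.
BASE_URL = "http://localhost:8983/edm/collection1/suggest"


def build_suggest_url(query, base_url=BASE_URL):
    """Return the suggest URL with the query percent-encoded (hypothetical helper)."""
    # urlencode turns the space into '+', which is decoded back to a space
    # when the query string is parsed as form data.
    return base_url + "?" + urlencode({"suggest.q": query})


if __name__ == "__main__":
    print(build_suggest_url("trouble free"))
```

If the encoded request still returns suggestions for 'free' only, the cause is more likely in the suggester configuration than in the URL.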

This is my configuration:

In solrconfig.xml:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">

    <str name="lookupImpl">FreeTextLookupFactory</str>
    <str name="indexPath">suggester_freetext_dir</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">Suggestion</str>
    <str name="suggestFreeTextAnalyzerFieldType">suggestType</str>
    <str name="ngrams">5</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="wt">json</str>
    <str name="indent">true</str>

    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.dictionary">mySuggester</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

In schema.xml

<fieldType name="suggestType" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[^a-zA-Z0-9]" replacement=" " />
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="5"
            outputUnigrams="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" />
  </analyzer>
</fieldType>

Have I configured anything wrongly? I've set ngrams to 5; does that
mean it is supposed to consider up to the previous 5 tokens entered?
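
As far as I understand the Lucene documentation for the free-text suggester, an n-gram model of order N predicts from at most the last N-1 tokens before the one being completed, and backs off to shorter contexts when the longer one has no match in the model. The toy sketch below only illustrates that idea; it is not Solr's implementation, and `split_context`, `suggest` and the tuple-based `model` are hypothetical names made up for this example.

```python
def split_context(typed, ngrams):
    """Split typed text into (context tokens, prefix being completed)."""
    tokens = typed.lower().split()
    if not tokens:
        return [], ""
    prefix = tokens[-1]
    # Keep at most ngrams - 1 tokens in front of the last one.
    start = max(0, len(tokens) - ngrams)
    return tokens[start:-1], prefix


def suggest(model, typed, ngrams):
    """Look up completions in a toy model (a set of token tuples)."""
    context, prefix = split_context(typed, ngrams)
    # Try the longest context first, then back off to shorter ones.
    for start in range(len(context) + 1):
        lead = context[start:]
        hits = sorted(
            gram for gram in model
            if len(gram) == len(lead) + 1
            and list(gram[:-1]) == lead
            and gram[-1].startswith(prefix)
        )
        if hits:
            return [" ".join(gram) for gram in hits]
    return []


if __name__ == "__main__":
    with_bigram = {("trouble", "free"), ("free",), ("freedom",)}
    without_bigram = {("free",), ("freedom",)}
    print(suggest(with_bigram, "trouble fre", 5))
    print(suggest(without_bigram, "trouble fre", 5))
```

In this toy model, suggestions that ignore 'trouble' appear whenever the two-token gram is missing from the model, so it may be worth checking what the analysis chain actually feeds into the suggester at build time.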


Regards,
Edwin



Re: Solr's suggester results

2015-06-17 Thread Alessandro Benedetti
Edwin,
Spellcheck is one thing; the Suggester is another.

If you need to provide auto-suggestions to your users, the suggester is the
right thing to use.
But I really doubt it is useful to select the entire content as the
suggester field.
It is going to be quite expensive.

In that case I would again really suggest you take a look at the article
I quoted and the general Solr documentation.

It is possible to suggest part of the field.
You can use the FreeText suggester with a proper analysis chain selected.

Cheers


Re: Solr's suggester results

2015-06-16 Thread Erick Erickson
The suggesters are built to return whole fields. You _might_
be able to add multiple fragments to a multiValued
entry and get fragments, I haven't tried that though
and I suspect that actually you'd get the same thing.

This is an XY problem IMO. Please describe exactly what
you're trying to accomplish, with examples rather than
continue to pursue this path. It sounds like you want
spellcheck or similar. The _point_ behind the
suggesters is that they handle multiple-word suggestions
by returning the whole field. So putting long text fields
into them is not going to work.
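
The multiValued idea mentioned above could be tried by splitting long text into short fragments before indexing. This is a minimal sketch of the splitting step only, untested against Solr as noted above; `to_fragments` and the `max_words` default are assumptions made for illustration.

```python
import re


def to_fragments(text, max_words=8):
    """Split text into sentence pieces of at most max_words words each."""
    # Break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    fragments = []
    for sentence in sentences:
        words = sentence.split()
        for i in range(0, len(words), max_words):
            fragments.append(" ".join(words[i:i + max_words]))
    return fragments


if __name__ == "__main__":
    sample = ("This is a testing rich text documents to test "
              "the uploading of files to Solr")
    for fragment in to_fragments(sample, max_words=5):
        print(fragment)
```

Each fragment would then be sent as one value of a multiValued field that the suggester reads from. Whether the suggester then returns fragments rather than whole documents would need to be verified.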

Best,
Erick



Re: Solr's suggester results

2015-06-16 Thread Zheng Lin Edwin Yeo
The long content comes from indexing PDF files. As some PDF files
have a lot of words in the content, it leads to the *UTF8 encoding is
longer than the max length 32766* error.

I think the problem is that the content size of the PDF file exceeds 32766
characters?
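
Note that the error message refers to the UTF-8 encoded length of a single indexed term, so the limit counts bytes rather than characters, and it is usually hit when a whole field value ends up as one term. A small sketch of the difference follows; the constant is taken from the error text, and the helper names are hypothetical.

```python
MAX_TERM_BYTES = 32766  # limit quoted in the error message


def utf8_length(text):
    """Return the number of bytes text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def exceeds_term_limit(term, limit=MAX_TERM_BYTES):
    """Return True if a single term would be longer than the limit."""
    return utf8_length(term) > limit


if __name__ == "__main__":
    ascii_term = "a" * 32766      # one byte per character
    accented_term = "é" * 16384   # two bytes per character in UTF-8
    print(len(ascii_term), exceeds_term_limit(ascii_term))
    print(len(accented_term), exceeds_term_limit(accented_term))
```

A term with far fewer than 32766 characters can therefore still trigger the error if it contains multi-byte characters.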

What I'm trying to accomplish is to index documents of any
size (even those with very large content) and build the suggester from
them. Also, when I do a search, it shouldn't return whole fields,
but just a portion of the sentence.



Regards,
Edwin



Re: Solr's suggester results

2015-06-16 Thread Erick Erickson
Have you looked at spellchecker? Because that sounds much more like
what you're asking about than suggester.

Spell checking is more what you're asking for, have you even looked at that
after it was suggested?

bq: Also, when I do a search, it shouldn't be returning whole fields,
but just to return a portion of the sentence

This is what highlighting is built for.
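
For illustration, a highlighting request can be assembled with Solr's standard `hl`, `hl.fl`, `hl.snippets` and `hl.fragsize` parameters. This is a minimal sketch; the base URL, the field name `content` and the helper name `build_highlight_url` are assumptions, not taken from the thread.

```python
from urllib.parse import urlencode


def build_highlight_url(base_url, query, field, snippets=1, fragsize=100):
    """Build a select URL that asks for highlighted snippets of one field."""
    params = {
        "q": query,
        "hl": "true",             # switch highlighting on
        "hl.fl": field,           # field to take snippets from
        "hl.snippets": snippets,  # maximum snippets per document
        "hl.fragsize": fragsize,  # approximate snippet size in characters
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical core and field names; replace them with your own.
    print(build_highlight_url(
        "http://localhost:8983/solr/collection1/select", "rich text", "content"))
```

The response then carries a highlighting section with short fragments around the matched terms instead of the whole stored field.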

Really, I recommend you take the time to do some familiarization with the
whole search space and Solr. The excellent book here:

http://www.amazon.com/Solr-Action-Trey-Grainger/dp/1617291021/ref=sr_1_1?ie=UTF8&qid=1434513284&sr=8-1&keywords=apache+solr&pebp=1434513287267&perid=0YRK508J0HJ1N3BAX20E

will give you the grounding you need to get the most out of Solr.

Best,
Erick


Re: Solr's suggester results

2015-06-16 Thread Zheng Lin Edwin Yeo
Yes, I've looked at that before, but I was told that newer versions of
Solr have their own suggester and no longer need the spellchecker?

So it's not necessary to use the spellchecker inside the suggester anymore?

Regards,
Edwin



Re: Solr's suggester results

2015-06-16 Thread Alessandro Benedetti
Inline:

2015-06-16 4:43 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:

 Thanks Benedetti,

 I've changed to the AnalyzingInfixLookup approach, and it is able to start
 searching from the middle of the field.

 However, is it possible to make the suggester show only part of the
 content of the field (like 2 or 3 fields after), instead of the entire
 content/sentence, which can be quite long?


I assume you wrote "fields" where you meant "tokens".
The answer is yes. I already said that in my previous mail; I invite you to
read the answers and the linked documentation carefully!

Regarding the excessive size of the tokens: this is weird. What are you
trying to autocomplete?
I really doubt it would be useful for a user to see super long autocompleted
terms.

Cheers



 Regards,
 Edwin



 On 15 June 2015 at 17:33, Alessandro Benedetti benedetti.ale...@gmail.com
 
 wrote:

  ehehe Edwin, I think you should read again the document I linked some
  time ago:

  http://lucidworks.com/blog/solr-suggester/

  The suggester you used is not meant to provide infix suggestions.
  The fuzzy suggester works on a fuzzy basis, matching the *starting*
  terms of the field content.

  What you are looking for is actually one of the infix suggesters,
  for example the AnalyzingInfixLookup approach.

  When working with suggesters, it is important first to make a distinction:

  1) Returning the full content of the field (AnalyzingInfix or Fuzzy)

  2) Returning token(s) (FreeText suggester)

  The second distinction is:

  1) Infix suggestions (from the middle of the field content)
  2) Classic suggester (from the beginning of the field content)

  Once that is clear, working with suggesters is quite simple.
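
The second distinction above can be illustrated with a toy matcher. This is only a sketch of the matching idea, not how the Solr lookups are implemented; `prefix_match` and `infix_match` are hypothetical names.

```python
def prefix_match(field_value, typed):
    """Classic behaviour: match only at the beginning of the field content."""
    return field_value.lower().startswith(typed.lower())


def infix_match(field_value, typed):
    """Infix behaviour: match at the beginning of any token in the field."""
    typed = typed.lower()
    return any(token.startswith(typed) for token in field_value.lower().split())


if __name__ == "__main__":
    doc = ("This is a testing rich text documents to test "
           "the uploading of files to Solr")
    for typed in ("t", "rich"):
        print(typed, prefix_match(doc, typed), infix_match(doc, typed))
```

With the sample document from the original question, 't' matches in both modes, while 'rich' matches only in the infix mode, which is consistent with the behaviour reported for the fuzzy suggester.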
 
  Cheers
 
  2015-06-15 9:28 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:
 
   I've indexed a rich-text document with the following content:

   This is a testing rich text documents to test the uploading of files to
   Solr


   When I tried to use the suggester, it returned the entire content of the
   field once I entered suggest?q=t. However, when I tried to search for
   q='rich', I didn't get any results returned.
  
   This is my current configuration for the suggester:
   searchComponent name=suggest class=solr.SuggestComponent
 lst name=suggester
   str name=namemySuggester/str
   str name=lookupImplFuzzyLookupFactory/str
   str name=dictionaryImplDocumentDictionaryFactory/str
   str name=fieldSuggestion/str
   str name=suggestAnalyzerFieldTypesuggestType/str
   str name=buildOnStartuptrue/str
   str name=buildOnCommitfalse/str
 /lst
   /searchComponent
  
   requestHandler name=/suggest class=solr.SearchHandler
  startup=lazy 
 lst name=defaults
   str name=wtjson/str
   str name=indenttrue/str
  
   str name=suggesttrue/str
   str name=suggest.count10/str
   str name=suggest.dictionarymySuggester/str
 /lst
 arr name=components
   strsuggest/str
 /arr
   /requestHandler
  
   Is it possible to allow the suggester to return something even from the
   middle of the sentence, and also not to return the entire sentence if
 the
   sentence. Perhaps it should just suggest the next 2 or 3 fields, and to
   return more fields as the users type.
  
   For example,
   When user type 'this', it should return 'This is a testing'
   When user type 'this is a testing', it should return 'This is a testing
   rich text documents'.
  
  
   Regards,
   Edwin
  
 
 
 
  --
  --
 
  Benedetti Alessandro
  Visiting card : http://about.me/alessandro_benedetti
 
  Tyger, tyger burning bright
  In the forests of the night,
  What immortal hand or eye
  Could frame thy fearful symmetry?
 
  William Blake - Songs of Experience -1794 England
 




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: Solr's suggester results

2015-06-15 Thread Alessandro Benedetti
Hehe Edwin, I think you should reread the document I linked some time ago:

http://lucidworks.com/blog/solr-suggester/

The suggester you used is not meant to provide infix suggestions.
The fuzzy suggester matches on a fuzzy basis against the *starting* terms
of the field content.

What you are looking for is actually one of the infix suggesters,
for example the AnalyzingInfixLookup approach.

When working with suggesters, it is important to first make a distinction:

1) Returning the full content of the field (AnalyzingInfix or Fuzzy)

2) Returning token(s) (Free Text suggester)

Then the second distinction is:

1) Infix suggestions (from the middle of the field content)
2) Classic suggester (from the beginning of the field content)

With that clarified, it will be quite simple to work with suggesters.
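
The two distinctions above can be written down as a small lookup table. This
is only an illustrative Python sketch: LookupTraits and choose_lookup are
hypothetical names, the factory names are the lookupImpl values used in this
thread, and the Free Text entry is marked as matching mid-field based on what
is reported elsewhere in this thread rather than on anything stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LookupTraits:
    returns_full_field: bool  # True: whole field content; False: token(s)
    matches_mid_field: bool   # True: infix; False: only from the beginning


# Classification as described in this thread (not an exhaustive list).
LOOKUPS = {
    "FuzzyLookupFactory": LookupTraits(True, False),
    "AnalyzingInfixLookupFactory": LookupTraits(True, True),
    # Mid-field matching for Free Text is taken from a later report
    # in this thread; treat it as an assumption.
    "FreeTextLookupFactory": LookupTraits(False, True),
}


def choose_lookup(returns_full_field: bool, matches_mid_field: bool) -> list:
    """Return the lookup implementations that have both requested traits."""
    wanted = LookupTraits(returns_full_field, matches_mid_field)
    return [name for name, traits in LOOKUPS.items() if traits == wanted]


if __name__ == "__main__":
    # Full field content, matched from the middle of the field.
    print(choose_lookup(returns_full_field=True, matches_mid_field=True))
    # Token(s) only, matched from the middle of the field.
    print(choose_lookup(returns_full_field=False, matches_mid_field=True))
```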

Cheers

2015-06-15 9:28 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:

 I've indexed a rich-text document with the following content:

 This is a testing rich text documents to test the uploading of files to
 Solr


 When I tried the suggester, it returned the entire content of the field
 as soon as I entered suggest?q=t. However, when I searched for
 q='rich', I didn't get any results.

 This is my current configuration for the suggester:
 <searchComponent name="suggest" class="solr.SuggestComponent">
   <lst name="suggester">
     <str name="name">mySuggester</str>
     <str name="lookupImpl">FuzzyLookupFactory</str>
     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
     <str name="field">Suggestion</str>
     <str name="suggestAnalyzerFieldType">suggestType</str>
     <str name="buildOnStartup">true</str>
     <str name="buildOnCommit">false</str>
   </lst>
 </searchComponent>

 <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
   <lst name="defaults">
     <str name="wt">json</str>
     <str name="indent">true</str>

     <str name="suggest">true</str>
     <str name="suggest.count">10</str>
     <str name="suggest.dictionary">mySuggester</str>
   </lst>
   <arr name="components">
     <str>suggest</str>
   </arr>
 </requestHandler>

 Is it possible to have the suggester return matches from the middle of
 the sentence, and also not return the entire sentence if the sentence is
 long? Perhaps it could suggest just the next 2 or 3 fields, and return
 more fields as the user types.

 For example,
 When the user types 'this', it should return 'This is a testing'
 When the user types 'this is a testing', it should return 'This is a testing
 rich text documents'.


 Regards,
 Edwin






Re: Solr's suggester results

2015-06-15 Thread Zheng Lin Edwin Yeo
Thanks Benedetti,

I've changed to the AnalyzingInfixLookup approach, and it is now able to
match from the middle of the field.

However, is it possible to make the suggester show only part of the
content of the field (like the next 2 or 3 fields), instead of the entire
content/sentence, which can be quite long?


Regards,
Edwin






Re: Solr's suggester results

2015-06-15 Thread Zheng Lin Edwin Yeo
Also, is there a way to overcome the long content problem?

I get this error after indexing large rich-text documents and trying to
build the suggester.

{
  "responseHeader":{
    "status":500,
    "QTime":47},
  "error":{
    "msg":"Document contains at least one immense term in
field=\"exacttext\" (whose UTF8 encoding is longer than the max length
32766), all of which were skipped.  Please correct the analyzer to not
produce such terms.  The prefix of the first immense term is: '[32, 10, 32,
10, 32, 32, 10, 32, 32, 10, 32, 32, 10, 32, 32, 10, 32, 32, 10, 32, 32, 10,
32, 32, 10, 32, 32, 10, 32, 32]...', original message: bytes can be at most
32766 in length; got 139402",
*trace:java.lang.IllegalArgumentException: Document contains at
least one immense term in field=\exacttext\ (whose UTF8 encoding is
longer than the max length 32766), all of which were skipped.  Please
correct the analyzer to not produce such terms.  The prefix of the first
immense term is: '[32, 10, 32, 10, 32, 32, 10, 32, 32, 10, 32, 32, 10, 32,
32, 10, 32, 32, 10, 32, 32, 10, 32, 32, 10, 32, 32, 10, 32, 32]...',
original message: bytes can be at most 32766 in length; got 139402\r\n\tat
org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:667)\r\n\tat
org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:344)\r\n\tat
org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:300)\r\n\tat
org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:232)\r\n\tat
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:458)\r\n\tat
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1350)\r\n\tat
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1138)\r\n\tat
org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.add(AnalyzingInfixSuggester.java:381)\r\n\tat
org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester.build(AnalyzingInfixSuggester.java:310)\r\n\tat
org.apache.lucene.search.suggest.Lookup.build(Lookup.java:193)\r\n\tat
org.apache.solr.spelling.suggest.SolrSuggester.build(SolrSuggester.java:163)\r\n\tat
org.apache.solr.handler.component.SuggestComponent.prepare(SuggestComponent.java:179)\r\n\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:196)\r\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)\r\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)\r\n\tat
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829)\r\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446)\r\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)\r\n\tat
org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:82)\r\n\tat
org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:294)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)\r\n\tat
org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:82)\r\n\tat
org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:294)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)\r\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)\r\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)\r\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)\r\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)\r\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)\r\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)\r\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)\r\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)\r\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)\r\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)\r\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)\r\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:368)\r\n\tat
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)\r\n\tat
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)\r\n\tat
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)\r\n\tat