mapreduce job using soirj 5

2015-06-16 Thread adfel70
Hi,

We recently started testing Solr 5. Our indexer creates a MapReduce job that
uses SolrJ 5 to index documents to our SolrCloud. Until now we used Solr
4.10.3 with SolrJ 4.8.0. Our Hadoop distribution is Cloudera 5.

The problem is that SolrJ 5 uses httpclient-4.3.1 while Hadoop ships with
httpclient-4.2.5, and that causes jar hell for us: the Hadoop jars are loaded
first, and SolrJ uses the CloseableHttpClient class, which exists in 4.3.1 but
not in 4.2.5.

Has anyone encountered this? Is there a solution, or a workaround?

Right now we are replacing the jar physically on each data node.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/mapreduce-job-using-soirj-5-tp4212199.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to create concatenated token

2015-06-16 Thread Aman Tandon
Hi,

I have a requirement to create a concatenated token out of all the tokens
produced by the last stage of my analyzer chain.

*Suppose my analyzer chain is:*

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" catenateAll="1" splitOnNumerics="1" preserveOriginal="1"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
<filter class="solr.PorterStemmerFilterFactory"/>
I want to create a plugin that adds a concatenated token alongside the last
token.

e.g. Solr training

*Porter:*        solr  train
 Position:       1     2

*Concatenated:*  solr  train  solrtrain
 Position:       1     2      2

Please help me out: how do I create a custom filter for this requirement?

With Regards
Aman Tandon
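A custom filter like the one asked for here would normally extend Lucene's TokenFilter; as a rough sketch of just the intended behavior (plain Python standing in for the filter logic; the function name and token layout are illustrative, not Solr API):

```python
def concat_filter(tokens):
    """Simulate a token filter that appends one concatenated token.

    `tokens` is a list of (term, position) pairs as they leave the last
    stage of the analyzer chain. The concatenated token is emitted at the
    position of the last token, so the original tokens are kept as-is.
    """
    out = list(tokens)
    if tokens:
        concatenated = "".join(term for term, _ in tokens)
        out.append((concatenated, tokens[-1][1]))  # same position as last token
    return out

# "Solr training" after whitespace tokenizing + Porter stemming:
print(concat_filter([("solr", 1), ("train", 2)]))
# -> [('solr', 1), ('train', 2), ('solrtrain', 2)]
```

In a real plugin this logic would live in the filter's incrementToken() method, paired with a factory class referenced from the schema.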


Re: solr/lucene index merge and optimize performance improvement

2015-06-16 Thread Toke Eskildsen
Shenghua(Daniel) Wan wansheng...@gmail.com wrote:
 Actually, I am currently interested in how to boost merging/optimizing
 performance of single solr instance.

We have the same challenge (we build static 900GB shards one at a time and the 
final optimization takes 8 hours with only 1 CPU core at 100%). I know that 
there is code for detecting SSDs, which should make merging faster (by running 
more merges in parallel?), but I am afraid that optimize (a single merge) is 
always single threaded.

It seems to me that at least some of the different files making up a segment 
could be created in parallel, but I do not know how hard it would be to do so.

- Toke Eskildsen
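For context, the heart of a merge is a sequential k-way merge of sorted term/posting streams, which is part of why a single big merge is hard to spread across CPU cores; a toy model (not Lucene's actual merge code, and the segment data is made up):

```python
import heapq

def merge_segments(segments):
    """Toy model of merging sorted postings from several segments.

    Each segment is a sorted list of (term, doc_id) pairs; the output is
    one sorted stream, produced strictly sequentially -- mirroring why an
    optimize down to a single segment tends to peg one CPU core.
    """
    return list(heapq.merge(*segments))

seg1 = [("apache", 1), ("solr", 3)]
seg2 = [("lucene", 2), ("solr", 5)]
print(merge_segments([seg1, seg2]))
# -> [('apache', 1), ('lucene', 2), ('solr', 3), ('solr', 5)]
```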


Re: How to create concatenated token

2015-06-16 Thread Alessandro Benedetti
Can I ask why you need to concatenate the tokens? Maybe we can find a
better solution than concatenating all the tokens into one single big token.
I find it difficult to understand the reasons behind tokenising, token
filtering and then un-tokenizing again :)
It would be great if you could explain a little more about what you would
like to do!


Cheers

2015-06-16 13:26 GMT+01:00 Aman Tandon amantandon...@gmail.com:

 Hi,

 I have a requirement to create the concatenated token of all the tokens
 created from the last item of my analyzer chain.

 *Suppose my analyzer chain is:*

 <tokenizer class="solr.WhitespaceTokenizerFactory"/>
 <filter class="solr.WordDelimiterFilterFactory" catenateAll="1" splitOnNumerics="1" preserveOriginal="1"/>
 <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
 <filter class="solr.PorterStemmerFilterFactory"/>
 I want to create a concatenated token plugin to add at concatenated token
 along with the last token.

 e.g. Solr training

 *Porter:*        solr  train
  Position:       1     2

 *Concatenated:*  solr  train  solrtrain
  Position:       1     2      2

 Please help me out. How to create custom filter for this requirement.

 With Regards
 Aman Tandon




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Highlight in Velocity UI on Google Chrome

2015-06-16 Thread Sznajder ForMailingList
Hi,

I was testing the highlight feature and played with the techproducts
example.
It appears that the highlighting works on Mozilla Firefox, but not on
Google Chrome.

For your information

Benjamin


Re: Do we need to add docValues=true to _version_ field in schema.xml?

2015-06-16 Thread Erick Erickson
Did you look in the example schema files? None of them have
_version_ set as docValues.

Best,
Erick

On Tue, Jun 16, 2015 at 1:44 AM, forest_soup tanglin0...@gmail.com wrote:
 For the _version_ field in schema.xml, do we need to set it to
 docValues="true"?

   <field name="_version_" type="long" indexed="true" stored="true"/>

 As we noticed, there is a FieldCache entry for _version_ in the Solr stats:
 http://lucene.472066.n3.nabble.com/file/n4212123/IMAGE%245A8381797719FDA9.jpg



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Do-we-need-to-add-docValues-true-to-version-field-in-schema-xml-tp4212123.html
 Sent from the Solr - User mailing list archive at Nabble.com.
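For background on why a FieldCache entry shows up at all: without docValues, Solr un-inverts the indexed field at search time to obtain a doc-to-value mapping, whereas docValues persists that forward mapping at index time. A toy illustration (made-up values, not Lucene internals):

```python
# Inverted index: term -> sorted doc ids (what indexed="true" gives you).
inverted = {"5": [0], "7": [1, 2]}

def uninvert(index):
    """Build a doc -> value map from an inverted index, roughly the way
    the FieldCache does lazily at search time (costly on big indexes)."""
    doc_values = {}
    for term, doc_ids in index.items():
        for doc_id in doc_ids:
            doc_values[doc_id] = term
    return doc_values

# docValues would store this forward mapping on disk at index time,
# so no runtime un-inversion (and no FieldCache entry) is needed.
print(uninvert(inverted))  # -> {0: '5', 1: '7', 2: '7'}
```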


Re: How to create concatenated token

2015-06-16 Thread Aman Tandon

 e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training)


Typo correction:
e.g. Intent for solr training: fq=id:(234 456 545) title:(solr training)

With Regards
Aman Tandon

On Tue, Jun 16, 2015 at 8:13 PM, Aman Tandon amantandon...@gmail.com
wrote:

 We have some business logic to search the user query against user intent or
 to find the exact matching products.

 e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training)

 As you can see it is a phrase query, so it will take more time than a single
 stemmed-token query. There are also 5-7 word phrase queries. So we want to
 reduce the search time by implementing this feature.

 With Regards
 Aman Tandon

 On Tue, Jun 16, 2015 at 6:42 PM, Alessandro Benedetti 
 benedetti.ale...@gmail.com wrote:

 Can I ask you why you need to concatenate the tokens ? Maybe we can find a
 better solution to concat all the tokens in one single big token .
 I find it difficult to understand the reasons behind tokenising, token
 filtering and then un-tokenizing again :)
 It would be great if you explain a little bit better what you would like
 to
 do !


 Cheers

 2015-06-16 13:26 GMT+01:00 Aman Tandon amantandon...@gmail.com:

  Hi,
 
  I have a requirement to create the concatenated token of all the tokens
  created from the last item of my analyzer chain.
 
  *Suppose my analyzer chain is:*

  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" catenateAll="1" splitOnNumerics="1" preserveOriginal="1"/>
  <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  <filter class="solr.PorterStemmerFilterFactory"/>
  I want to create a concatenated token plugin to add at concatenated
 token
  along with the last token.
 
  e.g. Solr training
 
  *Porter:*        solr  train
   Position:       1     2

  *Concatenated:*  solr  train  solrtrain
   Position:       1     2      2
 
  Please help me out. How to create custom filter for this requirement.
 
  With Regards
  Aman Tandon
 



 --
 --

 Benedetti Alessandro
 Visiting card : http://about.me/alessandro_benedetti

 Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?

 William Blake - Songs of Experience -1794 England





Re: Solr's suggester results

2015-06-16 Thread Erick Erickson
The suggesters are built to return whole fields. You _might_
be able to add multiple fragments to a multiValued
entry and get fragments back; I haven't tried that, though,
and I suspect you'd actually get the same thing.

This is an XY problem IMO. Please describe exactly what
you're trying to accomplish, with examples, rather than
continuing to pursue this path. It sounds like you want
spellcheck or similar. The _point_ of the
suggesters is that they handle multiple-word suggestions
by returning the whole field. So putting long text fields
into them is not going to work.

Best,
Erick

On Tue, Jun 16, 2015 at 1:46 AM, Alessandro Benedetti
benedetti.ale...@gmail.com wrote:
 in line :

 2015-06-16 4:43 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:

 Thanks Benedetti,

 I've changed to the AnalyzingInfixLookup approach, and it is able to start
 searching from the middle of the field.

 However, is it possible to make the suggester show only part of the
 content of the field (like 2 or 3 fields after), instead of the entire
 content/sentence, which can be quite long?


 I assume you use fields in the place of tokens.
 The answer is yes; I already said that in my previous mail. I invite you to
 read the answers and the linked documentation carefully!

 Regarding the excessive size of the tokens: this is weird. What are you
 trying to autocomplete?
 I really doubt it would be useful for a user to see super-long autocompleted
 terms.

 Cheers



 Regards,
 Edwin



 On 15 June 2015 at 17:33, Alessandro Benedetti benedetti.ale...@gmail.com
 
 wrote:

  ehehe Edwin, I think you should read again the document I linked time
 ago :
 
  http://lucidworks.com/blog/solr-suggester/
 
  The suggester you used is not meant to provide infix suggestions.
  The fuzzy suggester is working on a fuzzy basis , with the *starting*
 terms
  of a field content.
 
  What you are looking for is actually one of the Infix Suggesters.
  For example the AnalyzingInfixLookup approach.
 
  When working with Suggesters is important first to make a distinction :
 
  1) Returning the full content of the field ( analysisInfix or Fuzzy)
 
  2) Returning token(s) ( Free Text Suggester)
 
  Then the second difference is :
 
  1) Infix suggestions ( from the middle of the field content)
  2) Classic suggester ( from the beginning of the field content)
 
  Clarified that, will be quite simple to work with suggesters.
 
  Cheers
 
  2015-06-15 9:28 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:
 
   I've indexed a rich-text documents with the following content:
  
   This is a testing rich text documents to test the uploading of files to
   Solr
  
  
   When I tried to use the suggestion, it return me the entire field in
 the
   content once I enter suggest?q=t. However, when I tried to search for
   q='rich', I don't get any results returned.
  
   This is my current configuration for the suggester:
    <searchComponent name="suggest" class="solr.SuggestComponent">
      <lst name="suggester">
        <str name="name">mySuggester</str>
        <str name="lookupImpl">FuzzyLookupFactory</str>
        <str name="dictionaryImpl">DocumentDictionaryFactory</str>
        <str name="field">Suggestion</str>
        <str name="suggestAnalyzerFieldType">suggestType</str>
        <str name="buildOnStartup">true</str>
        <str name="buildOnCommit">false</str>
      </lst>
    </searchComponent>

    <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
      <lst name="defaults">
        <str name="wt">json</str>
        <str name="indent">true</str>
        <str name="suggest">true</str>
        <str name="suggest.count">10</str>
        <str name="suggest.dictionary">mySuggester</str>
      </lst>
      <arr name="components">
        <str>suggest</str>
      </arr>
    </requestHandler>
  
   Is it possible to allow the suggester to return something even from the
   middle of the sentence, and also not to return the entire sentence if the
   sentence is long? Perhaps it should just suggest the next 2 or 3 fields,
   and return more fields as the user types.
  
   For example,
   When user type 'this', it should return 'This is a testing'
   When user type 'this is a testing', it should return 'This is a testing
   rich text documents'.
  
  
   Regards,
   Edwin
  
 
 
 
  --
  --
 
  Benedetti Alessandro
  Visiting card : http://about.me/alessandro_benedetti
 
  Tyger, tyger burning bright
  In the forests of the night,
  What immortal hand or eye
  Could frame thy fearful symmetry?
 
  William Blake - Songs of Experience -1794 England
 




 --
 --

 Benedetti Alessandro
 Visiting card : http://about.me/alessandro_benedetti

 Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?

 William Blake - Songs of Experience -1794 England
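The full-field vs. token distinction this thread keeps returning to can be sketched with a toy model (illustrative Python, not the actual Lucene lookup implementations; document content taken from the example above):

```python
def infix_suggest(docs, prefix):
    """Full-field, infix-style lookup: return the WHOLE stored field when
    any word in it starts with the typed prefix (roughly what
    AnalyzingInfixLookup does)."""
    prefix = prefix.lower()
    return [d for d in docs
            if any(w.lower().startswith(prefix) for w in d.split())]

def prefix_suggest(docs, prefix):
    """Classic, beginning-of-field lookup: the field itself must start
    with the prefix (roughly the non-infix lookups)."""
    prefix = prefix.lower()
    return [d for d in docs if d.lower().startswith(prefix)]

docs = ["This is a testing rich text document"]
print(infix_suggest(docs, "rich"))   # whole field returned, matched mid-field
print(prefix_suggest(docs, "rich"))  # no match: field does not start with "rich"
```

This mirrors why the FuzzyLookupFactory configuration above returns nothing for q=rich but returns the entire field for q=t.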


Re: phrase matches returning near matches

2015-06-16 Thread Alistair Young
yep seems that’s the answer. The highlighting is done separately by the
rails app, so I’ll look into proper solr highlighting.

thanks a lot for the use of your ears, much improved understanding!

cheers,

Alistair

-- 
mov eax,1
mov ebx,0
int 80h




On 16/06/2015 16:33, Erick Erickson erickerick...@gmail.com wrote:

Hmmm. First, highlighting should work here. If you have it configured
to work  on the dc.description field.

As to whether the phrase management changes is near enough, I
pretty much guarantee it is. This is where the admin/analysis page can
answer this type of question authoritatively since it's based exactly
on your particular analysis chain.

Best,
Erick

On Tue, Jun 16, 2015 at 8:25 AM, Alistair Young
alistair.yo...@uhi.ac.uk wrote:
 yes prolly not a bug. The highlighting is on but nothing is highlighted.
 Perhaps this text is triggering it?

 'consider the impacts of land management changes’

 that would seem reasonable. It’s not a direct match so no highlighting
 (the highlighting does work on a direct match) but 'management changes’
 must be near enough ‘manage change’ to trigger a result.

 Alistair

 --
 mov eax,1
 mov ebx,0
 int 80h




 On 16/06/2015 16:18, Erick Erickson erickerick...@gmail.com wrote:

I agree with Allesandro the behavior you're describing
is _not_ correct at all given your description. So either

1 There's something interesting about your configuration
  that doesn't seem important that you haven't told us,
  although what it could be is a mystery to me  too ;)

2 it's matching on something else. Note that the
 phrase has been stemmed, so something in there
 besides management might stem to manag and/or
something other than changes might stem to chang
and the two of _them_ happen to be next to each
other. are managers changing? for instance. Or
even something less likely. Perhaps turn on
highlighting and see if it pops out?


3 you've uncovered a bug. Although I suspect others
would have reported it and the unit tests would have
barfed all over the place.

One other thing you can do. Go to the admin/analysis
page and turn on the verbose check box. Put
management is undergoing many changes
in both the query and index boxes. The result (it's
kind of hard to read I'll admit) will include the position
of each token after all the analysis is done. Phrase
queries (without slop) should only be matching adjacent
positions. So the question is whether the position info
looks correct

Best,
Erick

On Tue, Jun 16, 2015 at 4:40 AM, Alessandro Benedetti
benedetti.ale...@gmail.com wrote:
 According to your debug you are using a default Lucene Query Parser.
 This surprise me as i would expect with that query a match with
distance 0
 between the 2 terms .

 Are you sure nothing else is that field that matches the phrase query
?

 From the documentation

 Lucene supports finding words are a within a specific distance away. To do
 a proximity search use the tilde, "~", symbol at the end of a Phrase. For
 example to search for a "apache" and "jakarta" within 10 words of each
 other in a document use the search:

 "jakarta apache"~10


 Cheers


 2015-06-16 11:33 GMT+01:00 Alistair Young alistair.yo...@uhi.ac.uk:

 it¹s a useful behaviour. I¹d just like to understand where it¹s
deciding
 the document is relevant. debug output is:

 <lst name="debug">
   <str name="rawquerystring">dc.description:"manage change"</str>
   <str name="querystring">dc.description:"manage change"</str>
   <str name="parsedquery">PhraseQuery(dc.description:"manag chang")</str>
   <str name="parsedquery_toString">dc.description:"manag chang"</str>
   <lst name="explain">
     <str name="tst:test">
 1.2008798 = (MATCH) weight(dc.description:"manag chang" in 221) [DefaultSimilarity], result of:
   1.2008798 = fieldWeight in 221, product of:
     1.0 = tf(freq=1.0), with freq of:
       1.0 = phraseFreq=1.0
     9.6070385 = idf(), sum of:
       4.0365543 = idf(docFreq=101, maxDocs=2125)
       5.5704846 = idf(docFreq=21, maxDocs=2125)
     0.125 = fieldNorm(doc=221)
     </str>
   </lst>
   <str name="QParser">LuceneQParser</str>
   <lst name="timing">
     <double name="time">41.0</double>
     <lst name="prepare">
       <double name="time">3.0</double>
       <lst name="query"><double name="time">0.0</double></lst>
       <lst name="facet"><double name="time">0.0</double></lst>
       <lst name="mlt"><double name="time">0.0</double></lst>
       <lst name="highlight"><double name="time">0.0</double></lst>
       <lst name="stats"><double name="time">0.0</double></lst>
       <lst name="debug"><double name="time">0.0</double></lst>
     </lst>
     <lst name="process">
       <double name="time">35.0</double>
       <lst name="query"><double name="time">0.0</double></lst>
       <lst name="facet"><double name="time">0.0</double></lst>
       <lst name="mlt"><double name="time">0.0</double></lst>
       <lst name="highlight"><double name="time">0.0</double></lst>
       <lst name="stats">

Re: phrase matches returning near matches

2015-06-16 Thread Erick Erickson
I agree with Alessandro: the behavior you're describing
is _not_ correct at all given your description. So either

1) There's something interesting about your configuration
   that doesn't seem important that you haven't told us,
   although what it could be is a mystery to me too ;)

2) It's matching on something else. Note that the
   phrase has been stemmed, so something in there
   besides "management" might stem to "manag" and/or
   something other than "changes" might stem to "chang",
   and the two of _them_ happen to be next to each
   other. "are managers changing?" for instance. Or
   even something less likely. Perhaps turn on
   highlighting and see if it pops out?

3) You've uncovered a bug. Although I suspect others
   would have reported it and the unit tests would have
   barfed all over the place.

One other thing you can do: go to the admin/analysis
page and turn on the "verbose" check box. Put
"management is undergoing many changes"
in both the query and index boxes. The result (it's
kind of hard to read, I'll admit) will include the position
of each token after all the analysis is done. Phrase
queries (without slop) should only match adjacent
positions. So the question is whether the position info
looks correct.

Best,
Erick

On Tue, Jun 16, 2015 at 4:40 AM, Alessandro Benedetti
benedetti.ale...@gmail.com wrote:
 According to your debug output, you are using the default Lucene query parser.
 This surprises me, as I would expect that query to produce a match only with
 distance 0 between the 2 terms.

 Are you sure nothing else in that field matches the phrase query?

 From the documentation

 Lucene supports finding words are a within a specific distance away. To do
 a proximity search use the tilde, "~", symbol at the end of a Phrase. For
 example to search for a "apache" and "jakarta" within 10 words of each
 other in a document use the search:

 "jakarta apache"~10


 Cheers


 2015-06-16 11:33 GMT+01:00 Alistair Young alistair.yo...@uhi.ac.uk:

 it¹s a useful behaviour. I¹d just like to understand where it¹s deciding
 the document is relevant. debug output is:

 <lst name="debug">
   <str name="rawquerystring">dc.description:"manage change"</str>
   <str name="querystring">dc.description:"manage change"</str>
   <str name="parsedquery">PhraseQuery(dc.description:"manag chang")</str>
   <str name="parsedquery_toString">dc.description:"manag chang"</str>
   <lst name="explain">
     <str name="tst:test">
 1.2008798 = (MATCH) weight(dc.description:"manag chang" in 221) [DefaultSimilarity], result of:
   1.2008798 = fieldWeight in 221, product of:
     1.0 = tf(freq=1.0), with freq of:
       1.0 = phraseFreq=1.0
     9.6070385 = idf(), sum of:
       4.0365543 = idf(docFreq=101, maxDocs=2125)
       5.5704846 = idf(docFreq=21, maxDocs=2125)
     0.125 = fieldNorm(doc=221)
     </str>
   </lst>
   <str name="QParser">LuceneQParser</str>
   <lst name="timing">
     <double name="time">41.0</double>
     <lst name="prepare">
       <double name="time">3.0</double>
       <lst name="query"><double name="time">0.0</double></lst>
       <lst name="facet"><double name="time">0.0</double></lst>
       <lst name="mlt"><double name="time">0.0</double></lst>
       <lst name="highlight"><double name="time">0.0</double></lst>
       <lst name="stats"><double name="time">0.0</double></lst>
       <lst name="debug"><double name="time">0.0</double></lst>
     </lst>
     <lst name="process">
       <double name="time">35.0</double>
       <lst name="query"><double name="time">0.0</double></lst>
       <lst name="facet"><double name="time">0.0</double></lst>
       <lst name="mlt"><double name="time">0.0</double></lst>
       <lst name="highlight"><double name="time">0.0</double></lst>
       <lst name="stats"><double name="time">0.0</double></lst>
       <lst name="debug"><double name="time">35.0</double></lst>
     </lst>
   </lst>
 </lst>


 thanks,

 Alistair

 --
 mov eax,1
 mov ebx,0
 int 80h




 On 16/06/2015 11:26, Alessandro Benedetti benedetti.ale...@gmail.com
 wrote:

 Can you show us how the query is parsed ?
 You didn't tell us nothing about the query parser you are using.
 Enable the debugQuery=true will show you how the query is parsed and this
 will be quite useful for us.
 
 
 Cheers
 
 2015-06-16 11:22 GMT+01:00 Alistair Young alistair.yo...@uhi.ac.uk:
 
  Hiya,
 
  I've been looking for documentation that would point to where I could
  modify or explain why 'near neighbours' are returned from a phrase
 search.
  If I search for:
 
  manage change
 
  I get back a document that contains this will help in your management
 of
  lots more words... changes. It's relevant but I'd like to understand
 why
  solr is returning it. Is it a combination of fuzzy/slop? The distance
  between the two variations of the two words in the document is quite
 large.
 
  thanks,
 
  Alistair
 
  --
  mov eax,1
  mov ebx,0
  int 80h
 
 
 
 
 --
 --
 
 Benedetti Alessandro
 Visiting card : 
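The behavior Erick describes (a slop-0 phrase query matching only where the stemmed terms occupy adjacent positions) can be checked with a toy model; the stems here are written by hand, and none of this is Solr code. The explain output quoted in this thread is also just a product: fieldWeight = tf * idf * fieldNorm.

```python
def positions(tokens, wanted):
    """1-based positions of a term in a token stream."""
    return [i for i, t in enumerate(tokens, start=1) if t == wanted]

# Hand-stemmed field content: "... land management changes ..."
field = ["consider", "the", "impact", "of", "land", "manag", "chang"]
query = ["manag", "chang"]  # "manage change" after Porter stemming

# A slop-0 phrase query matches only if the terms sit at adjacent positions.
adjacent = any(p + 1 in positions(field, query[1])
               for p in positions(field, query[0]))
print(adjacent)  # -> True: "manag" at position 6, "chang" at position 7

# The explain score in the debug output is a straight product:
assert abs(1.0 * 9.6070385 * 0.125 - 1.2008798) < 1e-6
```

So "management changes" is not merely "near enough" to "manage change": after stemming, the two stems are literally adjacent, which is why the phrase query matches.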

Re: phrase matches returning near matches

2015-06-16 Thread Erick Erickson
Hmmm. First, highlighting should work here, if you have it configured
to work on the dc.description field.

As to whether the phrase "management changes" is "near" enough, I
pretty much guarantee it is. This is where the admin/analysis page can
answer this type of question authoritatively, since it's based exactly
on your particular analysis chain.

Best,
Erick

On Tue, Jun 16, 2015 at 8:25 AM, Alistair Young
alistair.yo...@uhi.ac.uk wrote:
 yes prolly not a bug. The highlighting is on but nothing is highlighted.
 Perhaps this text is triggering it?

 'consider the impacts of land management changes’

 that would seem reasonable. It’s not a direct match so no highlighting
 (the highlighting does work on a direct match) but 'management changes’
 must be near enough ‘manage change’ to trigger a result.

 Alistair

 --
 mov eax,1
 mov ebx,0
 int 80h




 On 16/06/2015 16:18, Erick Erickson erickerick...@gmail.com wrote:

I agree with Allesandro the behavior you're describing
is _not_ correct at all given your description. So either

1 There's something interesting about your configuration
  that doesn't seem important that you haven't told us,
  although what it could be is a mystery to me  too ;)

2 it's matching on something else. Note that the
 phrase has been stemmed, so something in there
 besides management might stem to manag and/or
something other than changes might stem to chang
and the two of _them_ happen to be next to each
other. are managers changing? for instance. Or
even something less likely. Perhaps turn on
highlighting and see if it pops out?


3 you've uncovered a bug. Although I suspect others
would have reported it and the unit tests would have
barfed all over the place.

One other thing you can do. Go to the admin/analysis
page and turn on the verbose check box. Put
management is undergoing many changes
in both the query and index boxes. The result (it's
kind of hard to read I'll admit) will include the position
of each token after all the analysis is done. Phrase
queries (without slop) should only be matching adjacent
positions. So the question is whether the position info
looks correct

Best,
Erick

On Tue, Jun 16, 2015 at 4:40 AM, Alessandro Benedetti
benedetti.ale...@gmail.com wrote:
 According to your debug you are using a default Lucene Query Parser.
 This surprise me as i would expect with that query a match with
distance 0
 between the 2 terms .

 Are you sure nothing else is that field that matches the phrase query ?

 From the documentation

 Lucene supports finding words are a within a specific distance away. To do
 a proximity search use the tilde, "~", symbol at the end of a Phrase. For
 example to search for a "apache" and "jakarta" within 10 words of each
 other in a document use the search:

 "jakarta apache"~10


 Cheers


 2015-06-16 11:33 GMT+01:00 Alistair Young alistair.yo...@uhi.ac.uk:

 it¹s a useful behaviour. I¹d just like to understand where it¹s
deciding
 the document is relevant. debug output is:

 <lst name="debug">
   <str name="rawquerystring">dc.description:"manage change"</str>
   <str name="querystring">dc.description:"manage change"</str>
   <str name="parsedquery">PhraseQuery(dc.description:"manag chang")</str>
   <str name="parsedquery_toString">dc.description:"manag chang"</str>
   <lst name="explain">
     <str name="tst:test">
 1.2008798 = (MATCH) weight(dc.description:"manag chang" in 221) [DefaultSimilarity], result of:
   1.2008798 = fieldWeight in 221, product of:
     1.0 = tf(freq=1.0), with freq of:
       1.0 = phraseFreq=1.0
     9.6070385 = idf(), sum of:
       4.0365543 = idf(docFreq=101, maxDocs=2125)
       5.5704846 = idf(docFreq=21, maxDocs=2125)
     0.125 = fieldNorm(doc=221)
     </str>
   </lst>
   <str name="QParser">LuceneQParser</str>
   <lst name="timing">
     <double name="time">41.0</double>
     <lst name="prepare">
       <double name="time">3.0</double>
       <lst name="query"><double name="time">0.0</double></lst>
       <lst name="facet"><double name="time">0.0</double></lst>
       <lst name="mlt"><double name="time">0.0</double></lst>
       <lst name="highlight"><double name="time">0.0</double></lst>
       <lst name="stats"><double name="time">0.0</double></lst>
       <lst name="debug"><double name="time">0.0</double></lst>
     </lst>
     <lst name="process">
       <double name="time">35.0</double>
       <lst name="query"><double name="time">0.0</double></lst>
       <lst name="facet"><double name="time">0.0</double></lst>
       <lst name="mlt"><double name="time">0.0</double></lst>
       <lst name="highlight"><double name="time">0.0</double></lst>
       <lst name="stats"><double name="time">0.0</double></lst>
       <lst name="debug"><double name="time">35.0</double></lst>
     </lst>
   </lst>
 </lst>


 thanks,

 Alistair

 --
 mov eax,1
 mov ebx,0
 int 80h




 On 16/06/2015 11:26, Alessandro Benedetti
benedetti.ale...@gmail.com
 wrote:

 Can you show us how the query is parsed ?
 You 

Re: mapreduce job using soirj 5

2015-06-16 Thread Erick Erickson
Sounds like a question better asked in one of the Cloudera support
forums, 'cause all I can do is guess ;).

I suppose, theoretically, that you could check out the Solr5
code and substitute the httpclient-4.2.5.jar in the build system,
recompile and go, but that's totally a guess based on zero knowledge
of whether compiling Solr with an earlier httpclient would even work.
Frankly, though, that sounds like more work than distributing the older
jar to the data nodes.

Best,
Erick

On Tue, Jun 16, 2015 at 7:23 AM, adfel70 adfe...@gmail.com wrote:
 Hi,

 We recently started testing solr 5, our indexer creates mapreduce job that
 uses solrj5 to index documents to our SolrCloud. Until now, we used solr
 4.10.3 with solrj 4.8.0. Our hadoop dist is cloudera 5.

 The problem is, solrj5 is using httpclient-4.3.1 while hadoop is installed
 with httpclient-4.2.5
 and that causing us jar-hell because hadoop jars are being loaded first and
 solrj is using closeablehttpclient class which is in 4.3.1 but not in 4.2.5

 Does anyone encounter that? and have a solution? or a workaround?

 Right now we are replacing the jar physically in each data node





 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/mapreduce-job-using-soirj-5-tp4212199.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: phrase matches returning near matches

2015-06-16 Thread Alistair Young
yes prolly not a bug. The highlighting is on but nothing is highlighted.
Perhaps this text is triggering it?

'consider the impacts of land management changes’

that would seem reasonable. It’s not a direct match so no highlighting
(the highlighting does work on a direct match) but 'management changes’
must be near enough ‘manage change’ to trigger a result.

Alistair

-- 
mov eax,1
mov ebx,0
int 80h




On 16/06/2015 16:18, Erick Erickson erickerick...@gmail.com wrote:

I agree with Allesandro the behavior you're describing
is _not_ correct at all given your description. So either

1 There's something interesting about your configuration
  that doesn't seem important that you haven't told us,
  although what it could be is a mystery to me  too ;)

2 it's matching on something else. Note that the
 phrase has been stemmed, so something in there
 besides management might stem to manag and/or
something other than changes might stem to chang
and the two of _them_ happen to be next to each
other. are managers changing? for instance. Or
even something less likely. Perhaps turn on
highlighting and see if it pops out?


3 you've uncovered a bug. Although I suspect others
would have reported it and the unit tests would have
barfed all over the place.

One other thing you can do. Go to the admin/analysis
page and turn on the verbose check box. Put
management is undergoing many changes
in both the query and index boxes. The result (it's
kind of hard to read I'll admit) will include the position
of each token after all the analysis is done. Phrase
queries (without slop) should only be matching adjacent
positions. So the question is whether the position info
looks correct

Best,
Erick

On Tue, Jun 16, 2015 at 4:40 AM, Alessandro Benedetti
benedetti.ale...@gmail.com wrote:
 According to your debug you are using a default Lucene Query Parser.
 This surprise me as i would expect with that query a match with
distance 0
 between the 2 terms .

 Are you sure nothing else is that field that matches the phrase query ?

 From the documentation

 Lucene supports finding words are a within a specific distance away. To do
 a proximity search use the tilde, "~", symbol at the end of a Phrase. For
 example to search for a "apache" and "jakarta" within 10 words of each
 other in a document use the search:

 "jakarta apache"~10


 Cheers


 2015-06-16 11:33 GMT+01:00 Alistair Young alistair.yo...@uhi.ac.uk:

 it¹s a useful behaviour. I¹d just like to understand where it¹s
deciding
 the document is relevant. debug output is:

 <lst name="debug">
   <str name="rawquerystring">dc.description:"manage change"</str>
   <str name="querystring">dc.description:"manage change"</str>
   <str name="parsedquery">PhraseQuery(dc.description:"manag chang")</str>
   <str name="parsedquery_toString">dc.description:"manag chang"</str>
   <lst name="explain">
     <str name="tst:test">
 1.2008798 = (MATCH) weight(dc.description:"manag chang" in 221) [DefaultSimilarity], result of:
   1.2008798 = fieldWeight in 221, product of:
     1.0 = tf(freq=1.0), with freq of:
       1.0 = phraseFreq=1.0
     9.6070385 = idf(), sum of:
       4.0365543 = idf(docFreq=101, maxDocs=2125)
       5.5704846 = idf(docFreq=21, maxDocs=2125)
     0.125 = fieldNorm(doc=221)
     </str>
   </lst>
   <str name="QParser">LuceneQParser</str>
   <lst name="timing">
     <double name="time">41.0</double>
     <lst name="prepare">
       <double name="time">3.0</double>
       <lst name="query"><double name="time">0.0</double></lst>
       <lst name="facet"><double name="time">0.0</double></lst>
       <lst name="mlt"><double name="time">0.0</double></lst>
       <lst name="highlight"><double name="time">0.0</double></lst>
       <lst name="stats"><double name="time">0.0</double></lst>
       <lst name="debug"><double name="time">0.0</double></lst>
     </lst>
     <lst name="process">
       <double name="time">35.0</double>
       <lst name="query"><double name="time">0.0</double></lst>
       <lst name="facet"><double name="time">0.0</double></lst>
       <lst name="mlt"><double name="time">0.0</double></lst>
       <lst name="highlight"><double name="time">0.0</double></lst>
       <lst name="stats"><double name="time">0.0</double></lst>
       <lst name="debug"><double name="time">35.0</double></lst>
     </lst>
   </lst>
 </lst>


 thanks,

 Alistair

 --
 mov eax,1
 mov ebx,0
 int 80h




 On 16/06/2015 11:26, Alessandro Benedetti
benedetti.ale...@gmail.com
 wrote:

 Can you show us how the query is parsed ?
 You didn't tell us nothing about the query parser you are using.
 Enable the debugQuery=true will show you how the query is parsed and
this
 will be quite useful for us.
 
 
 Cheers
 
 2015-06-16 11:22 GMT+01:00 Alistair Young alistair.yo...@uhi.ac.uk:
 
  Hiya,
 
  I've been looking for documentation that would point to where I
could
  modify or explain why 'near neighbours' are returned from a phrase
 search.
  If I search for:
 
  manage change
 
  I 

TikaEntityProcessor Not Finding My Files

2015-06-16 Thread Paden
Hi, there's a guy who's already asked a question similar to this and I'm
basically going off what he did here. It's exactly what I'm doing which is
taking a file path from a database and using TikaEntityProcessor to analyze
the document. The link to his question is here. 

http://lucene.472066.n3.nabble.com/TikaEntityProcessor-not-working-td856965.html#a3524905

His problem was version issues with Tika, but that question is about five
years old, so I'm not sure if it's still an issue with the current version of
Tika or if I'm missing something extremely obvious (which is possible; I'm
extremely new to Solr). This is my data configuration.
TextContentURL is the filepath!

<dataConfig>
  <dataSource name="ds-db" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/EDMS_Metadata" user="root"
              password="**" />
  <dataSource name="ds-file" type="BinFileDataSource"/>

  <document name="doc1">
    <entity name="db-data" dataSource="ds-db" query="select TextContentURL as
'id',ID,Title,AuthorCreator from MasterIndex">
      <field column="TextContentURL" name="id" />
      <field column="Title" name="title" />
    </entity>
    <entity name="file" dataSource="ds-file" processor="TikaEntityProcessor"
            url="${db-data.TextContentURL}" format="text">
      <field column="text" name="text" />
    </entity>
  </document>
</dataConfig>

I'd like to note that when I delete the second entity and just run the
database draw it works fine. I can run and query and I get this output when
I run a faceted search

 "response": {
    "numFound": 283,
    "start": 0,
    "docs": [
      {
        "id": "/home/paden/Documents/LWP_Files/BIGDATA/6220106.pdf",
        "title": "ENGINEERING INITIATION",
      },

This means that it is pulling the document filepath JUST FINE. The id is the
correct filepath. But when I re-add the second entity it logs errors saying
it can't find the file? Am I missing something obvious? 
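
One thing worth checking, hedged since it has not been verified against this setup: the SQL above aliases TextContentURL to 'id', and the Tika entity is a sibling rather than a child of the db entity, so ${db-data.TextContentURL} may resolve to an empty string — which would explain the path resolving to the bare server directory in the errors below. A sketch of a nested arrangement, reusing the names from the config above:

<document name="doc1">
  <entity name="db-data" dataSource="ds-db"
          query="select TextContentURL, ID, Title, AuthorCreator from MasterIndex">
    <field column="TextContentURL" name="id" />
    <field column="Title" name="title" />
    <!-- nested entity: runs once per parent row, so the variable is in scope -->
    <entity name="file" dataSource="ds-file" processor="TikaEntityProcessor"
            url="${db-data.TextContentURL}" format="text">
      <field column="text" name="text" />
    </entity>
  </entity>
</document>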



--
View this message in context: 
http://lucene.472066.n3.nabble.com/TikaEntityProcessor-Not-Finding-My-Files-tp4212241.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to create concatenated token

2015-06-16 Thread Aman Tandon
We have some business logic to search the user query against user intent or
to find the exact matching products.

e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training)

As we can see, it is a phrase query, so it will take more time than a single
stemmed-token query. There are also 5-7 word phrase queries, so we want to
reduce the search time by implementing this feature.

With Regards
Aman Tandon

On Tue, Jun 16, 2015 at 6:42 PM, Alessandro Benedetti 
benedetti.ale...@gmail.com wrote:

 Can I ask you why you need to concatenate the tokens ? Maybe we can find a
 better solution to concat all the tokens in one single big token .
 I find it difficult to understand the reasons behind tokenising, token
 filtering and then un-tokenizing again :)
 It would be great if you explain a little bit better what you would like to
 do !


 Cheers

 2015-06-16 13:26 GMT+01:00 Aman Tandon amantandon...@gmail.com:

  Hi,
 
  I have a requirement to create a concatenated token of all the tokens
  produced at the last stage of my analyzer chain.

  Suppose my analyzer chain is:

  <tokenizer class="solr.WhitespaceTokenizerFactory" />
  <filter class="solr.WordDelimiterFilterFactory" catenateAll="1" splitOnNumerics="1" preserveOriginal="1"/>
  <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front" />
  <filter class="solr.PorterStemmerFilterFactory"/>

  I want to create a concatenated-token plugin that adds the concatenated
  token along with the last token.

  e.g. "Solr training"

  Porter:        solr   train
  Position:      1      2

  Concatenated:  solr   train
                        solrtrain
  Position:      1      2

  Please help me out: how do I create a custom filter for this requirement?
 
  With Regards
  Aman Tandon
 



 --
 --

 Benedetti Alessandro
 Visiting card : http://about.me/alessandro_benedetti

 Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?

 William Blake - Songs of Experience -1794 England



Re: mapreduce job using soirj 5

2015-06-16 Thread Shawn Heisey
On 6/16/2015 9:24 AM, Erick Erickson wrote:
 Sounds like a question better asked in one of the Cloudera support
 forums, 'cause all I can do is guess ;).

 I suppose, theoretically, that you could check out the Solr5
 code and substitute the httpclient-4.2.5.jar in the build system,
 recompile and go, but that's totally a guess based on zero knowledge
 of whether compiling Solr with an earlier httpclient would even work.
 Frankly, though, that sounds like more work than distributing the older
 jar to the data nodes.

 Best,
 Erick

 On Tue, Jun 16, 2015 at 7:23 AM, adfel70 adfe...@gmail.com wrote:
 Hi,

 We recently started testing solr 5, our indexer creates mapreduce job that
 uses solrj5 to index documents to our SolrCloud. Until now, we used solr
 4.10.3 with solrj 4.8.0. Our hadoop dist is cloudera 5.

 The problem is, solrj5 is using httpclient-4.3.1 while hadoop is installed
 with httpclient-4.2.5

In addition to what Erick said:  When I upgraded the build system in
Solr from HttpClient 4.2 to 4.3, no code changes were required.  It
worked immediately, and all tests passed.  It is likely that you can
simply use HttpClient 4.3.1 everywhere and hadoop will work properly. 
This is one of Apache's design goals for software libraries.  It's not
always possible to achieve it, but it is something we always try to do.

Thanks,
Shawn



Re: solr/lucene index merge and optimize performance improvement

2015-06-16 Thread Shenghua(Daniel) Wan
Hi, Toke,
Did you try MapReduce with solr? I think it should be a good fit for your
use case.

On Tue, Jun 16, 2015 at 5:02 AM, Toke Eskildsen t...@statsbiblioteket.dk
wrote:

 Shenghua(Daniel) Wan wansheng...@gmail.com wrote:
  Actually, I am currently interested in how to boost merging/optimizing
  performance of single solr instance.

 We have the same challenge (we build static 900GB shards one at a time and
 the final optimization takes 8 hours with only 1 CPU core at 100%). I know
 that there is code for detecting SSDs, which should make merging faster (by
 running more merges in parallel?), but I am afraid that optimize (a single
 merge) is always single threaded.

 It seems to me that at least some of the different files making up a
 segment could be created in parallel, but I do not know how hard it would
 be to do so.

 - Toke Eskildsen




-- 

Regards,
Shenghua (Daniel) Wan


Re: phrase matches returning near matches

2015-06-16 Thread Terry Rhodes
This might be an issue with your stemmer: "management" being stemmed to 
"manage" and "changes" being stemmed to "change", so the terms match. You 
can use the Solr admin UI to test your indexing and query analysis 
chains to see if this is happening.
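
The same check can also be scripted against the field-analysis handler that backs the admin UI screen; a hedged example request (the collection name here is an assumption, the field name is taken from the thread):

```
http://localhost:8983/solr/collection1/analysis/field?analysis.fieldname=dc.description&analysis.fieldvalue=management%20changes&analysis.query=manage%20change&wt=json
```

The response shows the token stream after each stage of the index and query chains, so a stemmer collapsing both variants to the same root is easy to spot.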



On 6/16/2015 3:22 AM, Alistair Young wrote:

Hiya,

I've been looking for documentation that would point to where I could modify or 
explain why 'near neighbours' are returned from a phrase search. If I search 
for:

manage change

I get back a document that contains "this will help in your management of" ... lots more 
words ... "changes". It's relevant but I'd like to understand why Solr is returning it. 
Is it a combination of fuzzy/slop? The distance between the two variations of the two words in 
the document is quite large.

thanks,

Alistair

--
mov eax,1
mov ebx,0
int 80h





Re: TikaEntityProcessor Not Finding My Files

2015-06-16 Thread Paden
I thought it might be useful to list the logging errors as well. Here they
are. There are just three. 


WARN   FileDataSource   FileDataSource.basePath is empty. Resolving to:
/home/paden/Downloads/solr-5.1.0/server/.

ERRORDocBuilder

 Exception while processing: file document : SolrInputDocument(fields:
[]):org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.RuntimeException: java.io.FileNotFoundException: Could not find
file: (resolved to: /home/paden/Downloads/solr-5.1.0/server/.

ERROR  DataImporter

Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.RuntimeException: java.io.FileNotFoundException: Could not find
file: (resolved to: /home/paden/Downloads/solr-5.1.0/server/.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/TikaEntityProcessor-Not-Finding-My-Files-tp4212241p4212252.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Do we need to add docValues=true to _version_ field in schema.xml?

2015-06-16 Thread Chris Hostetter
: For the _version_ field in the schema.xml, do we need to set it be
: docValues=true?

you *can* add docValues, but it is not required.

There is an open discussion about whether we should add docValues to 
the _version_ field (or even switch completely to indexed=false) in this 
jira...

https://issues.apache.org/jira/browse/SOLR-6337

...if you try it out and find it works better for you, please post a 
comment with your experiences and any anecdotal performance impacts you 
notice.  (Real-world use cases/observations are always helpful.)



-Hoss
http://www.lucidworks.com/


Re: mapreduce job using soirj 5

2015-06-16 Thread Shenghua(Daniel) Wan
Hadoop has a switch that lets you use your own jar rather than the one Hadoop
ships with.
Google for HADOOP_OPTS.
Good luck.
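
(A hedged pointer for the archive: in stock Hadoop 2.x the usual knobs for making user jars win the classpath race are the HADOOP_USER_CLASSPATH_FIRST environment variable and the per-job property sketched below; whether CDH5 honors them has not been verified here.)

<!-- mapred-site.xml, or passed per job with -D -->
<property>
  <name>mapreduce.job.user.classpath.first</name>
  <value>true</value>
</property>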

On Tue, Jun 16, 2015 at 7:23 AM, adfel70 adfe...@gmail.com wrote:

 Hi,

 We recently started testing solr 5, our indexer creates mapreduce job that
 uses solrj5 to index documents to our SolrCloud. Until now, we used solr
 4.10.3 with solrj 4.8.0. Our hadoop dist is cloudera 5.

 The problem is, solrj5 is using httpclient-4.3.1 while hadoop is installed
 with httpclient-4.2.5
 and that causing us jar-hell because hadoop jars are being loaded first and
 solrj is using closeablehttpclient class which is in 4.3.1 but not in 4.2.5

 Does anyone encounter that? and have a solution? or a workaround?

 Right now we are replacing the jar physically in each data node





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/mapreduce-job-using-soirj-5-tp4212199.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 

Regards,
Shenghua (Daniel) Wan


Re: Facet on same field in different ways

2015-06-16 Thread Chris Hostetter

: Have you tried this syntax ?
: 
: facet=truefacet.field={!ex=st key=terms facet.limit=5
: facet.prefix=ap}query_termsfacet.field={!key=terms2
: facet.limit=1}query_termsrows=0facet.mincount=1
: 
: This seems the proper syntax, I found it here :

yeah, local params are supported for specifying facet options like this.  
Apparently it never got documented, but I've added a comment to the 
Faceting page with a techproducts example anyone can try with Solr out of the 
box...

https://cwiki.apache.org/confluence/display/solr/Faceting?focusedCommentId=58851733#comment-58851733
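
Against the stock techproducts example, a request along these lines facets the same field twice with different options (the field name cat comes from that example; the keys are arbitrary):

```
http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&facet=true&facet.field={!key=top_cats facet.limit=3}cat&facet.field={!key=all_cats}cat
```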




-Hoss
http://www.lucidworks.com/


Re: Highlight in Velocity UI on Google Chrome

2015-06-16 Thread Upayavira
I think it makes it bold on bold, which won't be particularly visible.

On Tue, Jun 16, 2015, at 06:52 AM, Sznajder ForMailingList wrote:
 Hi,
 
 I was testing the highlight feature and played with the techproducts
 example.
 It appears that the highlighting works on Mozilla Firefox, but not on
 Google Chrome.
 
 For your information
 
 Benjamin


Re: Facet on same field in different ways

2015-06-16 Thread Phanindra R
Thanks guys. The syntax  facet.field={!key=abc
facet.limit=10}facetFieldName works.

On Tue, Jun 16, 2015 at 11:22 AM, Chris Hostetter hossman_luc...@fucit.org
wrote:


 : Have you tried this syntax ?
 :
 : facet=truefacet.field={!ex=st key=terms facet.limit=5
 : facet.prefix=ap}query_termsfacet.field={!key=terms2
 : facet.limit=1}query_termsrows=0facet.mincount=1
 :
 : This seems the proper syntax, I found it here :

  yeah, local params are supported for specifying facet options like this.
  Apparently it never got documented, but I've added a comment to the
  Faceting page with a techproducts example anyone can try with Solr out of the
  box...


 https://cwiki.apache.org/confluence/display/solr/Faceting?focusedCommentId=58851733#comment-58851733




 -Hoss
 http://www.lucidworks.com/



Re: How to create concatenated token

2015-06-16 Thread Aman Tandon
Hi,

Any guesses how I could achieve this behaviour?

With Regards
Aman Tandon

On Tue, Jun 16, 2015 at 8:15 PM, Aman Tandon amantandon...@gmail.com
wrote:

 e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training)


 typo error
 e.g. Intent for solr training: fq=id:(234 456 545) title:(solr training)

 With Regards
 Aman Tandon

 On Tue, Jun 16, 2015 at 8:13 PM, Aman Tandon amantandon...@gmail.com
 wrote:

 We has some business logic to search the user query in user intent or
 finding the exact matching products.

 e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training)

 As we can see it is phrase query so it will took more time than the
 single stemmed token query. There are also 5-7 words phrase query. So we
 want to reduce the search time by implementing this feature.

 With Regards
 Aman Tandon

 On Tue, Jun 16, 2015 at 6:42 PM, Alessandro Benedetti 
 benedetti.ale...@gmail.com wrote:

 Can I ask you why you need to concatenate the tokens ? Maybe we can find
 a
 better solution to concat all the tokens in one single big token .
 I find it difficult to understand the reasons behind tokenising, token
 filtering and then un-tokenizing again :)
 It would be great if you explain a little bit better what you would like
 to
 do !


 Cheers

 2015-06-16 13:26 GMT+01:00 Aman Tandon amantandon...@gmail.com:

  Hi,
 
  I have a requirement to create the concatenated token of all the tokens
  created from the last item of my analyzer chain.
 
  *Suppose my analyzer chain is :*
 
 
 
 
 
  * tokenizer class=solr.WhitespaceTokenizerFactory /  filter
  class=solr.WordDelimiterFilterFactory catenateAll=1
 splitOnNumerics=1
  preserveOriginal=1/filter class=solr.EdgeNGramFilterFactory
  minGramSize=2 maxGramSize=15 side=front /filter
  class=solr.PorterStemmerFilterFactory/*
  I want to create a concatenated token plugin to add at concatenated
 token
  along with the last token.
 
  e.g. Solr training
 
  *Porter:-*  solr  train
Position 1 2
 
  *Concatenated :-*   solr  train
 solrtrain
 Position 1  2
 
  Please help me out. How to create custom filter for this requirement.
 
  With Regards
  Aman Tandon
 



 --
 --

 Benedetti Alessandro
 Visiting card : http://about.me/alessandro_benedetti

 Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?

 William Blake - Songs of Experience -1794 England






Re: Solr's suggester results

2015-06-16 Thread Zheng Lin Edwin Yeo
The long content is from when I tried to index PDF files. As some PDF files
have a lot of words in the content, it will lead to the *UTF8 encoding is
longer than the max length 32766* error.

I think the problem is that the content size of the PDF file exceeds 32766
characters?

I'm trying to be able to index documents that can be of any
size (even those with very large contents), and build the suggester from
there. Also, when I do a search, it shouldn't return whole fields,
but just a portion of the sentence.



Regards,
Edwin


On 16 June 2015 at 23:02, Erick Erickson erickerick...@gmail.com wrote:

 The suggesters are built to return whole fields. You _might_
 be able to add multiple fragments to a multiValued
 entry and get fragments, I haven't tried that though
 and I suspect that actually you'd get the same thing..

 This is an XY problem IMO. Please describe exactly what
 you're trying to accomplish, with examples rather than
 continue to pursue this path. It sounds like you want
 spellcheck or similar. The _point_ behind the
 suggesters is that they handle multiple-word suggestions
 by returning he whole field. So putting long text fields
 into them is not going to work.

 Best,
 Erick

 On Tue, Jun 16, 2015 at 1:46 AM, Alessandro Benedetti
 benedetti.ale...@gmail.com wrote:
  in line :
 
  2015-06-16 4:43 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:
 
  Thanks Benedetti,
 
  I've change to the AnalyzingInfixLookup approach, and it is able to
 start
  searching from the middle of the field.
 
  However, is it possible to make the suggester to show only part of the
  content of the field (like 2 or 3 fields after), instead of the entire
  content/sentence, which can be quite long?
 
 
  I assume you use fields in the place of tokens.
  The answer is yes, I already said that in my previous mail, I invite you
 to
  read carefully the answers and the documentation linked !
 
  Related the excessive dimensions of tokens. This is weird, what are you
  trying to autocomplete ?
  I really doubt would be useful for a user to see super long auto
 completed
  terms.
 
  Cheers
 
 
 
  Regards,
  Edwin
 
 
 
  On 15 June 2015 at 17:33, Alessandro Benedetti 
 benedetti.ale...@gmail.com
  
  wrote:
 
   ehehe Edwin, I think you should read again the document I linked time
  ago :
  
   http://lucidworks.com/blog/solr-suggester/
  
   The suggester you used is not meant to provide infix suggestions.
   The fuzzy suggester is working on a fuzzy basis , with the *starting*
  terms
   of a field content.
  
   What you are looking for is actually one of the Infix Suggesters.
   For example the AnalyzingInfixLookup approach.
  
   When working with Suggesters is important first to make a distinction
 :
  
   1) Returning the full content of the field ( analysisInfix or Fuzzy)
  
   2) Returning token(s) ( Free Text Suggester)
  
   Then the second difference is :
  
   1) Infix suggestions ( from the middle of the field content)
   2) Classic suggester ( from the beginning of the field content)
  
   Clarified that, will be quite simple to work with suggesters.
  
   Cheers
  
   2015-06-15 9:28 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:
  
I've indexed a rich-text documents with the following content:
   
This is a testing rich text documents to test the uploading of
 files to
Solr
   
   
When I tried to use the suggestion, it return me the entire field in
  the
content once I enter suggest?q=t. However, when I tried to search
 for
q='rich', I don't get any results returned.
   
This is my current configuration for the suggester:
     <searchComponent name="suggest" class="solr.SuggestComponent">
       <lst name="suggester">
         <str name="name">mySuggester</str>
         <str name="lookupImpl">FuzzyLookupFactory</str>
         <str name="dictionaryImpl">DocumentDictionaryFactory</str>
         <str name="field">Suggestion</str>
         <str name="suggestAnalyzerFieldType">suggestType</str>
         <str name="buildOnStartup">true</str>
         <str name="buildOnCommit">false</str>
       </lst>
     </searchComponent>

     <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
       <lst name="defaults">
         <str name="wt">json</str>
         <str name="indent">true</str>
         <str name="suggest">true</str>
         <str name="suggest.count">10</str>
         <str name="suggest.dictionary">mySuggester</str>
       </lst>
       <arr name="components">
         <str>suggest</str>
       </arr>
     </requestHandler>
   
Is it possible to allow the suggester to return something even from
 the
middle of the sentence, and also not to return the entire sentence
 if
  the
sentence. Perhaps it should just suggest the next 2 or 3 fields,
 and to
return more fields as the users type.
   
For example,
When user type 'this', it should return 'This is a testing'
When user type 'this is a testing', it should return 'This is a
 testing
rich text documents'.
   
   
Regards,
Edwin
   
  
  
  
   --
   --
  
   Benedetti 

Re: Solr's suggester results

2015-06-16 Thread Erick Erickson
Have you looked at spellchecker? Because that sounds much more like
what you're asking about than suggester.

Spell checking is more what you're asking for, have you even looked at that
after it was suggested?

bq: Also, when I do a search, it shouldn't be returning whole fields,
but just to return a portion of the sentence

This is what highlighting is built for.

Really, I recommend you take the time to do some familiarization with the
whole search space and Solr. The excellent book here:

http://www.amazon.com/Solr-Action-Trey-Grainger/dp/1617291021/ref=sr_1_1?ie=UTF8qid=1434513284sr=8-1keywords=apache+solrpebp=1434513287267perid=0YRK508J0HJ1N3BAX20E

will give you the grounding you need to get the most out of Solr.

Best,
Erick

On Tue, Jun 16, 2015 at 8:27 PM, Zheng Lin Edwin Yeo
edwinye...@gmail.com wrote:
 The long content is from when I tried to index PDF files. As some PDF files
 has alot of words in the content, it will lead to the *UTF8 encoding is
 longer than the max length 32766 error.*

 I think the problem is the content size of the PDF file exceed 32766
 characters?

 I'm trying to accomplish to be able to index documents that can be of any
 size (even those with very large contents), and build the suggester from
 there. Also, when I do a search, it shouldn't be returning whole fields,
 but just to return a portion of the sentence.



 Regards,
 Edwin


 On 16 June 2015 at 23:02, Erick Erickson erickerick...@gmail.com wrote:

 The suggesters are built to return whole fields. You _might_
 be able to add multiple fragments to a multiValued
 entry and get fragments, I haven't tried that though
 and I suspect that actually you'd get the same thing..

 This is an XY problem IMO. Please describe exactly what
 you're trying to accomplish, with examples rather than
 continue to pursue this path. It sounds like you want
 spellcheck or similar. The _point_ behind the
 suggesters is that they handle multiple-word suggestions
 by returning he whole field. So putting long text fields
 into them is not going to work.

 Best,
 Erick

 On Tue, Jun 16, 2015 at 1:46 AM, Alessandro Benedetti
 benedetti.ale...@gmail.com wrote:
  in line :
 
  2015-06-16 4:43 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:
 
  Thanks Benedetti,
 
  I've change to the AnalyzingInfixLookup approach, and it is able to
 start
  searching from the middle of the field.
 
  However, is it possible to make the suggester to show only part of the
  content of the field (like 2 or 3 fields after), instead of the entire
  content/sentence, which can be quite long?
 
 
  I assume you use fields in the place of tokens.
  The answer is yes, I already said that in my previous mail, I invite you
 to
  read carefully the answers and the documentation linked !
 
  Related the excessive dimensions of tokens. This is weird, what are you
  trying to autocomplete ?
  I really doubt would be useful for a user to see super long auto
 completed
  terms.
 
  Cheers
 
 
 
  Regards,
  Edwin
 
 
 
  On 15 June 2015 at 17:33, Alessandro Benedetti 
 benedetti.ale...@gmail.com
  
  wrote:
 
   ehehe Edwin, I think you should read again the document I linked time
  ago :
  
   http://lucidworks.com/blog/solr-suggester/
  
   The suggester you used is not meant to provide infix suggestions.
   The fuzzy suggester is working on a fuzzy basis , with the *starting*
  terms
   of a field content.
  
   What you are looking for is actually one of the Infix Suggesters.
   For example the AnalyzingInfixLookup approach.
  
   When working with Suggesters is important first to make a distinction
 :
  
   1) Returning the full content of the field ( analysisInfix or Fuzzy)
  
   2) Returning token(s) ( Free Text Suggester)
  
   Then the second difference is :
  
   1) Infix suggestions ( from the middle of the field content)
   2) Classic suggester ( from the beginning of the field content)
  
   Clarified that, will be quite simple to work with suggesters.
  
   Cheers
  
   2015-06-15 9:28 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:
  
I've indexed a rich-text documents with the following content:
   
This is a testing rich text documents to test the uploading of
 files to
Solr
   
   
When I tried to use the suggestion, it return me the entire field in
  the
content once I enter suggest?q=t. However, when I tried to search
 for
q='rich', I don't get any results returned.
   
This is my current configuration for the suggester:
searchComponent name=suggest class=solr.SuggestComponent
  lst name=suggester
str name=namemySuggester/str
str name=lookupImplFuzzyLookupFactory/str
str name=dictionaryImplDocumentDictionaryFactory/str
str name=fieldSuggestion/str
str name=suggestAnalyzerFieldTypesuggestType/str
str name=buildOnStartuptrue/str
str name=buildOnCommitfalse/str
  /lst
/searchComponent
   
requestHandler name=/suggest class=solr.SearchHandler
   startup=lazy 
  lst 

Re: Solr's suggester results

2015-06-16 Thread Zheng Lin Edwin Yeo
Yes, I've looked at that before, but I was told that the newer version of
Solr has its own suggester and does not need to use the spellchecker anymore?

So it's not necessary to use the spellchecker inside the suggester anymore?

Regards,
Edwin
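
For reference, a hedged sketch of the infix variant discussed earlier in the thread: it keeps the field and analyzer names from the config quoted below and only swaps the lookup implementation, so suggestions can start in the middle of the field rather than only at the beginning.

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <!-- infix lookup: matches can start mid-field -->
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">Suggestion</str>
    <str name="suggestAnalyzerFieldType">suggestType</str>
    <str name="buildOnStartup">true</str>
  </lst>
</searchComponent>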


On 17 June 2015 at 11:56, Erick Erickson erickerick...@gmail.com wrote:

 Have you looked at spellchecker? Because that sound much more like
 what you're asking about than suggester.

 Spell checking is more what you're asking for, have you even looked at that
 after it was suggested?

 bq: Also, when I do a search, it shouldn't be returning whole fields,
 but just to return a portion of the sentence

 This is what highlighting is built for.

 Really, I recommend you take the time to do some familiarization with the
 whole search space and Solr. The excellent book here:


 http://www.amazon.com/Solr-Action-Trey-Grainger/dp/1617291021/ref=sr_1_1?ie=UTF8qid=1434513284sr=8-1keywords=apache+solrpebp=1434513287267perid=0YRK508J0HJ1N3BAX20E

 will give you the grounding you need to get the most out of Solr.

 Best,
 Erick

 On Tue, Jun 16, 2015 at 8:27 PM, Zheng Lin Edwin Yeo
 edwinye...@gmail.com wrote:
  The long content is from when I tried to index PDF files. As some PDF
 files
  has alot of words in the content, it will lead to the *UTF8 encoding is
  longer than the max length 32766 error.*
 
  I think the problem is the content size of the PDF file exceed 32766
  characters?
 
  I'm trying to accomplish to be able to index documents that can be of any
  size (even those with very large contents), and build the suggester from
  there. Also, when I do a search, it shouldn't be returning whole fields,
  but just to return a portion of the sentence.
 
 
 
  Regards,
  Edwin
 
 
  On 16 June 2015 at 23:02, Erick Erickson erickerick...@gmail.com
 wrote:
 
  The suggesters are built to return whole fields. You _might_
  be able to add multiple fragments to a multiValued
  entry and get fragments, I haven't tried that though
  and I suspect that actually you'd get the same thing..
 
  This is an XY problem IMO. Please describe exactly what
  you're trying to accomplish, with examples rather than
  continue to pursue this path. It sounds like you want
  spellcheck or similar. The _point_ behind the
  suggesters is that they handle multiple-word suggestions
  by returning he whole field. So putting long text fields
  into them is not going to work.
 
  Best,
  Erick
 
  On Tue, Jun 16, 2015 at 1:46 AM, Alessandro Benedetti
  benedetti.ale...@gmail.com wrote:
   in line :
  
   2015-06-16 4:43 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:
  
   Thanks Benedetti,
  
   I've change to the AnalyzingInfixLookup approach, and it is able to
  start
   searching from the middle of the field.
  
   However, is it possible to make the suggester to show only part of
 the
   content of the field (like 2 or 3 fields after), instead of the
 entire
   content/sentence, which can be quite long?
  
  
   I assume you use fields in the place of tokens.
   The answer is yes, I already said that in my previous mail, I invite
 you
  to
   read carefully the answers and the documentation linked !
  
   Related the excessive dimensions of tokens. This is weird, what are
 you
   trying to autocomplete ?
   I really doubt would be useful for a user to see super long auto
  completed
   terms.
  
   Cheers
  
  
  
   Regards,
   Edwin
  
  
  
   On 15 June 2015 at 17:33, Alessandro Benedetti 
  benedetti.ale...@gmail.com
   
   wrote:
  
ehehe Edwin, I think you should read again the document I linked
 time
   ago :
   
http://lucidworks.com/blog/solr-suggester/
   
The suggester you used is not meant to provide infix suggestions.
The fuzzy suggester is working on a fuzzy basis , with the
 *starting*
   terms
of a field content.
   
What you are looking for is actually one of the Infix Suggesters.
For example the AnalyzingInfixLookup approach.
   
When working with Suggesters is important first to make a
 distinction
  :
   
1) Returning the full content of the field ( analysisInfix or
 Fuzzy)
   
2) Returning token(s) ( Free Text Suggester)
   
Then the second difference is :
   
1) Infix suggestions ( from the middle of the field content)
2) Classic suggester ( from the beginning of the field content)
   
Clarified that, will be quite simple to work with suggesters.
   
Cheers
   
2015-06-15 9:28 GMT+01:00 Zheng Lin Edwin Yeo 
 edwinye...@gmail.com:
   
 I've indexed a rich-text documents with the following content:

 This is a testing rich text documents to test the uploading of
  files to
 Solr


 When I tried to use the suggestion, it return me the entire
 field in
   the
 content once I enter suggest?q=t. However, when I tried to search
  for
 q='rich', I don't get any results returned.

 This is my current configuration for the suggester:
 searchComponent name=suggest 

Joins with comma separated values

2015-06-16 Thread Advait Suhas Pandit
Hi,

We have some master data and some content data. Master data would be things 
like userid, name, email id etc.
Our content data, for example, is a blog.
The blog has certain fields which are comma-separated ids that point to the 
master data.
E.g. userids of people who have commented on a particular blog can be found in 
the blog table in a comma-separated field of userids. Similarly, userids of 
people who have liked the blog can be found in a comma-separated field of 
userids.

How do I join this comma-separated list of userids with the master data so that 
I can get the other details of the user such as name, email, picture etc.?

Thanks,
Advait
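
No answer appears in this digest, but for context, one common pattern — sketched here with all field names assumed, not taken from the poster's schema — is to split the comma-separated column into a multiValued indexed field at index time, then use Solr's join query parser:

<!-- index-time: tokenize the comma-separated ids into separate terms -->
<fieldType name="commaDelimited" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern=",\s*"/>
  </analyzer>
</fieldType>

A query such as q={!join from=liked_by_userids to=userid}id:blog123 would then return the user documents (name, email, picture) for everyone who liked that blog, since the join parser matches the indexed terms of the from field against the to field.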



Re: How to create concatenated token

2015-06-16 Thread Aman Tandon
Hi Erick,

Thank you so much. It will be helpful for me to learn how to save the state
of a token; I had no idea how to save the state of previous tokens, which is
why it was difficult to generate a concatenated token at the end.

So is there anything I should read to learn more about it?

With Regards
Aman Tandon
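
If the SOLR-7193 filter referenced below gets built and packaged, wiring it in would be a one-line addition at the end of the analyzer chain; the factory class and package here are hypothetical, not a released Solr class:

<!-- hypothetical factory built from the SOLR-7193 patch -->
<filter class="com.example.ConcatenateFilterFactory" tokenSeparator=""/>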

On Wed, Jun 17, 2015 at 9:20 AM, Erick Erickson erickerick...@gmail.com
wrote:

 I really question the premise, but have a look at:
 https://issues.apache.org/jira/browse/SOLR-7193

 Note that this is not committed and I haven't reviewed
 it so I don't have anything to say about that. And you'd
 have to implement it as a custom Filter.

 Best,
 Erick

 On Tue, Jun 16, 2015 at 5:55 PM, Aman Tandon amantandon...@gmail.com
 wrote:
  Hi,
 
  Any guesses, how could I achieve this behaviour.
 
  With Regards
  Aman Tandon
 
  On Tue, Jun 16, 2015 at 8:15 PM, Aman Tandon amantandon...@gmail.com
  wrote:
 
  e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr
 training)
 
 
  typo error
  e.g. Intent for solr training: fq=id:(234 456 545) title:(solr
 training)
 
  With Regards
  Aman Tandon
 
  On Tue, Jun 16, 2015 at 8:13 PM, Aman Tandon amantandon...@gmail.com
  wrote:
 
  We has some business logic to search the user query in user intent or
  finding the exact matching products.
 
  e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr
 training)
 
  As we can see it is phrase query so it will took more time than the
  single stemmed token query. There are also 5-7 words phrase query. So
 we
  want to reduce the search time by implementing this feature.
 
  With Regards
  Aman Tandon
 
  On Tue, Jun 16, 2015 at 6:42 PM, Alessandro Benedetti 
  benedetti.ale...@gmail.com wrote:
 
  Can I ask you why you need to concatenate the tokens ? Maybe we can
 find
  a
  better solution to concat all the tokens in one single big token .
  I find it difficult to understand the reasons behind tokenising, token
  filtering and then un-tokenizing again :)
  It would be great if you explain a little bit better what you would
 like
  to
  do !
 
 
  Cheers
 
  2015-06-16 13:26 GMT+01:00 Aman Tandon amantandon...@gmail.com:
 
   Hi,
  
   I have a requirement to create the concatenated token of all the
 tokens
   created from the last item of my analyzer chain.
  
   *Suppose my analyzer chain is :*
  
  
  
  
  
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.WordDelimiterFilterFactory" catenateAll="1" splitOnNumerics="1" preserveOriginal="1"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front" />
    <filter class="solr.PorterStemmerFilterFactory"/>
   I want to create a concatenated token plugin to add at concatenated
  token
   along with the last token.
  
   e.g. Solr training
  
   *Porter:-*  solr  train
 Position 1 2
  
   *Concatenated :-*   solr  train
  solrtrain
  Position 1  2
  
   Please help me out. How to create custom filter for this
 requirement.
  
   With Regards
   Aman Tandon
  
 
 
 
  --
  --
 
  Benedetti Alessandro
  Visiting card : http://about.me/alessandro_benedetti
 
  Tyger, tyger burning bright
  In the forests of the night,
  What immortal hand or eye
  Could frame thy fearful symmetry?
 
  William Blake - Songs of Experience -1794 England
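The state-saving idea discussed above can be sketched outside Lucene: buffer each term as the stream is consumed, then emit one extra concatenated token stacked on the last position. This is only an illustrative Python model (the `concatenate_filter` name is hypothetical, not Solr code; a real implementation would be a Java TokenFilter, e.g. using captureState/restoreState, as in the SOLR-7193 patch Erick points at):

```python
def concatenate_filter(tokens):
    """tokens: (term, position_increment) pairs from the upstream filter
    chain. Yields every token unchanged, then one extra token that is the
    concatenation of all terms, stacked on the last position (increment 0)."""
    buffered = []
    for term, pos_inc in tokens:
        buffered.append(term)  # save the state of every token seen so far
        yield term, pos_inc
    if buffered:
        # "solr", "train" -> "solrtrain", at the same position as "train"
        yield "".join(buffered), 0

out = list(concatenate_filter([("solr", 1), ("train", 1)]))
# out == [("solr", 1), ("train", 1), ("solrtrain", 0)]
```

With the concatenated token sharing the last token's position, the phrase "solr training" and the single term "solrtrain" both match at query time.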
 
 
 
 



Re: How to create concatenated token

2015-06-16 Thread Erick Erickson
I really question the premise, but have a look at:
https://issues.apache.org/jira/browse/SOLR-7193

Note that this is not committed and I haven't reviewed
it so I don't have anything to say about that. And you'd
have to implement it as a custom Filter.

Best,
Erick

On Tue, Jun 16, 2015 at 5:55 PM, Aman Tandon amantandon...@gmail.com wrote:
 Hi,

 Any guesses, how could I achieve this behaviour.

 With Regards
 Aman Tandon

 On Tue, Jun 16, 2015 at 8:15 PM, Aman Tandon amantandon...@gmail.com
 wrote:

 e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training)


 typo error
 e.g. Intent for solr training: fq=id:(234 456 545) title:(solr training)

 With Regards
 Aman Tandon

 On Tue, Jun 16, 2015 at 8:13 PM, Aman Tandon amantandon...@gmail.com
 wrote:

 We have some business logic to search the user query against user intents or
 to find the exact matching products.

 e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training)

 As we can see, it is a phrase query, so it takes more time than a
 single stemmed-token query. There are also 5-7 word phrase queries, so we
 want to reduce the search time by implementing this feature.

 With Regards
 Aman Tandon

 On Tue, Jun 16, 2015 at 6:42 PM, Alessandro Benedetti 
 benedetti.ale...@gmail.com wrote:

 Can I ask you why you need to concatenate the tokens ? Maybe we can find
 a
 better solution to concat all the tokens in one single big token .
 I find it difficult to understand the reasons behind tokenising, token
 filtering and then un-tokenizing again :)
 It would be great if you explain a little bit better what you would like
 to
 do !


 Cheers

 2015-06-16 13:26 GMT+01:00 Aman Tandon amantandon...@gmail.com:

  Hi,
 
  I have a requirement to create the concatenated token of all the tokens
  created from the last item of my analyzer chain.
 
  *Suppose my analyzer chain is :*
 
 
 
 
 
   <tokenizer class="solr.WhitespaceTokenizerFactory" />
   <filter class="solr.WordDelimiterFilterFactory" catenateAll="1" splitOnNumerics="1" preserveOriginal="1"/>
   <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front" />
   <filter class="solr.PorterStemmerFilterFactory"/>
  I want to create a concatenated token plugin to add at concatenated
 token
  along with the last token.
 
  e.g. Solr training
 
  *Porter:-*  solr  train
Position 1 2
 
  *Concatenated :-*   solr  train
 solrtrain
 Position 1  2
 
  Please help me out. How to create custom filter for this requirement.
 
  With Regards
  Aman Tandon
 



 --
 --

 Benedetti Alessandro
 Visiting card : http://about.me/alessandro_benedetti

 Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?

 William Blake - Songs of Experience -1794 England






Re: Raw lucene query for a given solr query

2015-06-16 Thread Chris Hostetter

: You can get raw query (and other debug information) with debug=true
: paramter.

more specifically -- if you are writting a custom SearchComponent, and 
want to access the underlying Query object produced by the parsers that 
SolrIndexSearcher has executed, you can do so the same way the debug 
component does...

https://svn.apache.org/viewvc/lucene/dev/branches/branch_5x/solr/core/src/java/org/apache/solr/handler/component/DebugComponent.java?view=markup#l98

:  Hi,
: 
:   We have a few custom solrcloud components that act as value sources inside
:  solrcloud for boosting items in the index.  I want to get the final raw
:  lucene query used by solr for querying the index (for debugging purposes).
: 
:  Is it possible to get that information?
: 
:  Kindly advise
: 
:  Thanks,
:  Nitin
: 
: 

-Hoss
http://www.lucidworks.com/


Re: solr/lucene index merge and optimize performance improvement

2015-06-16 Thread Shenghua(Daniel) Wan
I think your advice on future incremental updates is very useful. I will
keep an eye on that.

Actually, I am currently interested in how to boost the merging/optimizing
performance of a single Solr instance.
Parallelism at the MapReduce level does not help merging/optimizing much,
unless Solr/Lucene internally parallelizes the merge itself, e.g. with
threads.

Specifically, I am talking about the parameters in
//  ((TieredMergePolicy) mergePolicy).setMaxMergeAtOnceExplicit(1);
//  ((TieredMergePolicy) mergePolicy).setMaxMergeAtOnce(1);
//  ((TieredMergePolicy) mergePolicy).setSegmentsPerTier(1);
https://github.com/apache/lucene-solr/blob/trunk/solr/contrib/map-reduce/src/java/org/apache/solr/hadoop/TreeMergeOutputFormat.java#L119-121
Do you know how they affect merging/optimizing performance, or do you
know of any docs about them?
I tried uncommenting them, and the performance improved. I am
considering tuning the parameters further.

As you mentioned, IndexWriter.forceMerge does exist in line 153 of
https://github.com/apache/lucene-solr/blob/trunk/solr/contrib/map-reduce/src/java/org/apache/solr/hadoop/TreeMergeOutputFormat.java#L153

I am very grateful for your advice. Thanks a lot.
​

On Mon, Jun 15, 2015 at 10:39 PM, Erick Erickson erickerick...@gmail.com
wrote:

 Ah, OK. For very slowly changing indexes optimize can makes sense.

 Do note, though, that if you incrementally index after the full build, and
 especially if you update documents, you're laying a trap for the future.
 Let's
 say you optimize down to a single segment. The default TieredMergePolicy
 tries to merge similar size segments. But now you have one huge segment
 and docs will be marked as deleted from that segment, but not cleaned up
 until that segment is merged, which won't happen for a long time since it
 is so much bigger (I'm assuming) than the segments the incremental indexing
 will create.

 Now, the percentage of deleted documents weighs quite heavily in the
 decision
 what segments to merge, so it might not matter. It's just something to
 be aware of.
 Surely benchmarking is in order as you indicated.

 The Lucene-level IndexWriter.forceMerge method seems to be what you need
 though, although if you're working over HDFS I'm in unfamiliar territory.
 But
 the constructors to IndexWriter take a Directory, and the HdfsDirectory
 extends BaseDirectory which extends Directory so if you can set up
 an HdfsDIrectory it should just work. I haven't personally tried it
 though.

 I saw something recently where optimization helped considerably in a
 sharded situation where the rows parameter was 400 (10 shards). My
 belief is that what was really happening was that the first-pass of a
 distributed search was getting slowed by disk seeks across multiple
 smaller segments. I'm waiting for SOLR-6810 which should impact that
 problem. Don't know if it applies to your situation or not though.

 HTH,
 Erick


 On Mon, Jun 15, 2015 at 8:30 PM, Shenghua(Daniel) Wan
 wansheng...@gmail.com wrote:
  Hi, Erick,
  First thanks for sharing the ideas. I am further giving more context here
  accordingly.
 
  1. why optimize? I have done some experiments to compare the query
 response
  time, and there is some difference. In addition, the searcher will be
  customer-facing. I think any performance boost will be worthwhile unless
  the indexing will be more frequent. However, more benchmark will be
  necessary to quantize the margin.
 
  2. Why embedded solr server? I adopted the idea from Mark Miller's
  map-reduce indexing and build on top of its original contribution to
 Solr.
  It launches an embedded solr server at the end of reducer stages.
 Basically
  a solr instance is brought up and fed with documents. Then the index is
  generated at each reducer. Then the indexes are merged, and optimized if
  desired.
 
  Thanks.
 
  On Mon, Jun 15, 2015 at 5:06 PM, Erick Erickson erickerick...@gmail.com
 
  wrote:
 
  The first question is why you're optimizing at all. It's not recommended
  unless you can demonstrate that an optimized index is giving you enough
  of a performance boost to be worth the effort.
 
  And why are you using embedded solr server? That's kind of unusual
  so I wonder if you've gone down a wrong path somewhere. In other
  words this feels like an XY problem, you're specifically asking about
  a task without explaining the problem you're trying to solve, there may
  be better alternatives.
 
  Best,
  Erick
 
  On Mon, Jun 15, 2015 at 4:56 PM, Shenghua(Daniel) Wan
  wansheng...@gmail.com wrote:
   Hi,
   Do you have any suggestions to improve the performance for merging and
   optimizing index?
   I have been using embedded solr server to merge and optimize the
 index. I
   am looking for the right parameters to tune. My use case have about
 300
   fields plus 250 copyfields, and moderate doc size (about 65K each doc
   averagely)
  
   https://wiki.apache.org/solr/MergingSolrIndexes does not help 
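As a rough intuition for the knobs discussed in this thread (a simplified model, not TieredMergePolicy's actual algorithm): collapsing N segments M at a time takes on the order of ceil(log_M N) merge rounds, so raising maxMergeAtOnce/segmentsPerTier trades fewer, larger merge passes against bigger in-flight merges:

```python
def merge_passes(num_segments, max_merge_at_once):
    """Toy model: how many merge rounds it takes to collapse num_segments
    into one, if each merge combines at most max_merge_at_once segments."""
    passes = 0
    while num_segments > 1:
        num_segments = -(-num_segments // max_merge_at_once)  # ceil division
        passes += 1
    return passes

print(merge_passes(100, 10))   # 2 rounds: 100 -> 10 -> 1
print(merge_passes(100, 100))  # 1 round:  100 -> 1
```

Note that in Lucene, maxMergeAtOnceExplicit is the bound that applies to explicit forceMerge/optimize calls; each round rewrites the data it touches, so fewer rounds generally means less total I/O during the final optimize.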

Do we need to add docValues=true to _version_ field in schema.xml?

2015-06-16 Thread forest_soup
For the _version_ field in the schema.xml, do we need to set
docValues=true on it?
   <field name="_version_" type="long" indexed="true" stored="true"/>

As we noticed, there are FieldCache entries for _version_ in the Solr stats:
http://lucene.472066.n3.nabble.com/file/n4212123/IMAGE%245A8381797719FDA9.jpg 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Do-we-need-to-add-docValues-true-to-version-field-in-schema-xml-tp4212123.html
Sent from the Solr - User mailing list archive at Nabble.com.
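For reference, the schema.xml change being asked about would look like the sketch below (note: adding docValues to an existing field requires reindexing, and docValues data is read from disk-backed structures instead of populating the FieldCache):

```xml
<field name="_version_" type="long" indexed="true" stored="true" docValues="true"/>
```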


Re: Raw lucene query for a given solr query

2015-06-16 Thread Tomoko Uchida
Hi,

You can get the raw query (and other debug information) with the debug=true
parameter.

Regards,
Tomoko

2015-06-16 8:10 GMT+09:00 KNitin nitin.t...@gmail.com:

 Hi,

  We have a few custom solrcloud components that act as value sources inside
 solrcloud for boosting items in the index.  I want to get the final raw
 lucene query used by solr for querying the index (for debugging purposes).

 Is it possible to get that information?

 Kindly advise

 Thanks,
 Nitin



Re: Solr's suggester results

2015-06-16 Thread Alessandro Benedetti
in line :

2015-06-16 4:43 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:

 Thanks Benedetti,

 I've change to the AnalyzingInfixLookup approach, and it is able to start
 searching from the middle of the field.

 However, is it possible to make the suggester to show only part of the
 content of the field (like 2 or 3 fields after), instead of the entire
 content/sentence, which can be quite long?


I assume you mean "fields" in the place of "tokens".
The answer is yes; I already said that in my previous mail. I invite you to
read the answers and the linked documentation carefully!

Regarding the excessive length of the tokens: this is weird. What are you
trying to autocomplete?
I really doubt it would be useful for a user to see super-long autocompleted
terms.

Cheers



 Regards,
 Edwin



 On 15 June 2015 at 17:33, Alessandro Benedetti benedetti.ale...@gmail.com
 
 wrote:

  ehehe Edwin, I think you should read again the document I linked time
 ago :
 
  http://lucidworks.com/blog/solr-suggester/
 
  The suggester you used is not meant to provide infix suggestions.
  The fuzzy suggester is working on a fuzzy basis , with the *starting*
 terms
  of a field content.
 
  What you are looking for is actually one of the Infix Suggesters.
  For example the AnalyzingInfixLookup approach.
 
  When working with Suggesters is important first to make a distinction :
 
  1) Returning the full content of the field ( analysisInfix or Fuzzy)
 
  2) Returning token(s) ( Free Text Suggester)
 
  Then the second difference is :
 
  1) Infix suggestions ( from the middle of the field content)
  2) Classic suggester ( from the beginning of the field content)
 
  Clarified that, will be quite simple to work with suggesters.
 
  Cheers
 
  2015-06-15 9:28 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:
 
   I've indexed a rich-text documents with the following content:
  
   This is a testing rich text documents to test the uploading of files to
   Solr
  
  
   When I tried to use the suggestion, it return me the entire field in
 the
   content once I enter suggest?q=t. However, when I tried to search for
   q='rich', I don't get any results returned.
  
   This is my current configuration for the suggester:
    <searchComponent name="suggest" class="solr.SuggestComponent">
      <lst name="suggester">
        <str name="name">mySuggester</str>
        <str name="lookupImpl">FuzzyLookupFactory</str>
        <str name="dictionaryImpl">DocumentDictionaryFactory</str>
        <str name="field">Suggestion</str>
        <str name="suggestAnalyzerFieldType">suggestType</str>
        <str name="buildOnStartup">true</str>
        <str name="buildOnCommit">false</str>
      </lst>
    </searchComponent>

    <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
      <lst name="defaults">
        <str name="wt">json</str>
        <str name="indent">true</str>
        <str name="suggest">true</str>
        <str name="suggest.count">10</str>
        <str name="suggest.dictionary">mySuggester</str>
      </lst>
      <arr name="components">
        <str>suggest</str>
      </arr>
    </requestHandler>
  
   Is it possible to allow the suggester to return something even from the
   middle of the sentence, and also not to return the entire sentence if
 the
   sentence. Perhaps it should just suggest the next 2 or 3 fields, and to
   return more fields as the users type.
  
   For example,
   When user type 'this', it should return 'This is a testing'
   When user type 'this is a testing', it should return 'This is a testing
   rich text documents'.
  
  
   Regards,
   Edwin
  
 
 
 
  --
  --
 
  Benedetti Alessandro
  Visiting card : http://about.me/alessandro_benedetti
 
  Tyger, tyger burning bright
  In the forests of the night,
  What immortal hand or eye
  Could frame thy fearful symmetry?
 
  William Blake - Songs of Experience -1794 England
 




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England
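The two distinctions above (full field content vs. tokens, infix vs. prefix) map directly to the lookupImpl in solrconfig.xml. A sketch, reusing the field and analyzer names from the config quoted earlier in this thread, of an infix full-content suggester next to a token-oriented FreeText one:

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <!-- 1) returns the full field content, matching from the middle (infix) -->
  <lst name="suggester">
    <str name="name">infixSuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">Suggestion</str>
    <str name="suggestAnalyzerFieldType">suggestType</str>
  </lst>
  <!-- 2) returns token(s): predicts the next word(s) instead of whole sentences -->
  <lst name="suggester">
    <str name="name">freeTextSuggester</str>
    <str name="lookupImpl">FreeTextLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">Suggestion</str>
    <str name="ngrams">3</str>
    <str name="suggestFreeTextAnalyzerFieldType">suggestType</str>
  </lst>
</searchComponent>
```

The FreeText suggester is the one that avoids returning a whole long sentence: with ngrams=3 it suggests continuations based on the last few words typed.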


Re: Phrase query get converted to SpanNear with slop 1 instead of 0

2015-06-16 Thread ariya bala
Ok. Thank you Chris.
It is a custom Query parser.
I will check my Query parser on where it inject the slop 1.

On Tue, Jun 16, 2015 at 3:26 AM, Chris Hostetter hossman_luc...@fucit.org
wrote:


 : I encounter this peculiar case with solr 4.10.2 where the parsed query
 : doesnt seem to be logical.
 :
 : PHRASE23("reduce workforce") ==>
 : SpanNearQuery(spanNear([spanNear([Contents:reduce,
 : Contents:workforce], 1, true)], 23, true))

 1) that does not appear to be a parser syntax of any parser that comes
 with Solr (that i know of) so it's possible that whatever custom parser
 you are using has a bug in it.

 2) IIRC, with span queries (which unlike PhraseQueries explicitly support
 both in-order, and out of order nearness) a slop of 0 is going to
 require that the 2 spans overlap and occupy the exact same position -- a
 span of 1 means that they differ by a single position.



 -Hoss
 http://www.lucidworks.com/




-- 
*Ariya *


Re: Facet on same field in different ways

2015-06-16 Thread Alessandro Benedetti
Hi Phanindra,
Have you tried this syntax ?

facet=true&facet.field={!ex=st key=terms facet.limit=5
facet.prefix=ap}query_terms&facet.field={!key=terms2
facet.limit=1}query_terms&rows=0&facet.mincount=1

This seems the proper syntax, I found it here :
https://issues.apache.org/jira/browse/SOLR-4717

Is this solving your problem ?

Cheers

2015-06-16 0:05 GMT+01:00 Phanindra R phani...@gmail.com:

 Hi guys,
Is there a way to facet on same field in *different ways?* For
 example, using a different facet.prefix. Here are the details

 facet.field={!key=myKey}myField&facet.prefix=p   ==> works
 facet.field={!key=myKey}myField&f.myField.facet.prefix=p   ==> works
 facet.field={!key=myKey}myField&f.myKey.facet.prefix=p   ==> *doesn't work
  (ref: Solr-1351)*

 In addition, when I try *f.myKey.facet.range.gap=2.0.* it actually doesn't
 recognize it and throws the error: Missing required parameter:
 f.myField.facet.range.gap (or default: facet.range.gap)

 I'm using Solr 4.10

 Ref: https://issues.apache.org/jira/browse/SOLR-1351

 Thanks




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: Phrase query get converted to SpanNear with slop 1 instead of 0

2015-06-16 Thread Alessandro Benedetti
Hi Ariya,
I think Hossman explained that the slop of 1 is fine in your use case :)
That is, assuming span queries were what you were expecting!

Cheers

2015-06-16 10:13 GMT+01:00 ariya bala ariya...@gmail.com:

 Ok. Thank you Chris.
 It is a custom Query parser.
 I will check my Query parser on where it inject the slop 1.

 On Tue, Jun 16, 2015 at 3:26 AM, Chris Hostetter hossman_luc...@fucit.org
 
 wrote:

 
  : I encounter this peculiar case with solr 4.10.2 where the parsed query
  : doesnt seem to be logical.
  :
  : PHRASE23("reduce workforce") ==>
  : SpanNearQuery(spanNear([spanNear([Contents:reduce,
  : Contents:workforce], 1, true)], 23, true))
 
  1) that does not appear to be a parser syntax of any parser that comes
  with Solr (that i know of) so it's possible that whatever custom parser
  you are using has a bug in it.
 
  2) IIRC, with span queries (which unlike PhraseQueries explicitly support
  both in-order, and out of order nearness) a slop of 0 is going to
  require that the 2 spans overlap and occupy the exact same position --
 a
  span of 1 means that they differ by a single position.
 
 
 
  -Hoss
  http://www.lucidworks.com/




 --
 *Ariya *




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


phrase matches returning near matches

2015-06-16 Thread Alistair Young
Hiya,

I've been looking for documentation that would explain (and let me modify)
why 'near neighbours' are returned from a phrase search. If I search for:

"manage change"

I get back a document that contains "this will help in your management of
...lots more words... changes". It's relevant, but I'd like to understand why
Solr is returning it. Is it a combination of fuzzy/slop? The distance between
the two variations of the two words in the document is quite large.

thanks,

Alistair

--
mov eax,1
mov ebx,0
int 80h


Re: phrase matches returning near matches

2015-06-16 Thread Alessandro Benedetti
Can you show us how the query is parsed?
You didn't tell us anything about the query parser you are using.
Enabling debugQuery=true will show you how the query is parsed, and this
will be quite useful for us.


Cheers

2015-06-16 11:22 GMT+01:00 Alistair Young alistair.yo...@uhi.ac.uk:

 Hiya,

 I've been looking for documentation that would point to where I could
 modify or explain why 'near neighbours' are returned from a phrase search.
 If I search for:

 manage change

 I get back a document that contains this will help in your management of
 lots more words... changes. It's relevant but I'd like to understand why
 solr is returning it. Is it a combination of fuzzy/slop? The distance
 between the two variations of the two words in the document is quite large.

 thanks,

 Alistair

 --
 mov eax,1
 mov ebx,0
 int 80h




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


What contribute to a Solr core's FieldCache entry_count?

2015-06-16 Thread forest_soup
For the fieldCache, what determines the entries_count? 

Does each search request containing a sort on a non-docValues field
contribute one entry to the entries_count?

For example, will search A ( q=owner:1&sort=maildate asc ) and search B (
q=owner:2&sort=maildate asc ) contribute 2 field cache entries?

I have a collection containing only one core, and there is only one doc
within it, so why are there so many Lucene fieldCache entries?

http://lucene.472066.n3.nabble.com/file/n4212148/%244FA9F550C60D3BA2.jpg 
http://lucene.472066.n3.nabble.com/file/n4212148/Untitled.png 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/What-contribute-to-a-Solr-core-s-FieldCache-entry-count-tp4212148.html
Sent from the Solr - User mailing list archive at Nabble.com.
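A toy model of the usual answer to this question (assumption: Lucene's FieldCache is keyed per segment reader and sort field, not per query, so two searches sorting on the same field should share one entry, while each distinct sort field and each segment adds one):

```python
field_cache = {}

def sort_search(segment, field):
    """Simulate the uninverted-field entry a sort on a non-docValues
    field populates."""
    key = (segment, field)  # cache key: segment reader + field, not the query
    if key not in field_cache:
        field_cache[key] = f"uninverted '{field}' for {segment}"  # expensive build
    return field_cache[key]

sort_search("seg_0", "maildate")  # q=owner:1&sort=maildate asc
sort_search("seg_0", "maildate")  # q=owner:2&sort=maildate asc (cache hit)
print(len(field_cache))           # 1
```

Under this model, many entries for a tiny one-doc core would point at many distinct fields being sorted/faceted on (or internal fields like _version_), rather than at many queries.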


Re: phrase matches returning near matches

2015-06-16 Thread Alistair Young
it's a useful behaviour. I'd just like to understand where it's deciding
the document is relevant. debug output is:

<lst name="debug">
  <str name="rawquerystring">dc.description:"manage change"</str>
  <str name="querystring">dc.description:"manage change"</str>
  <str name="parsedquery">PhraseQuery(dc.description:"manag chang")</str>
  <str name="parsedquery_toString">dc.description:"manag chang"</str>
  <lst name="explain">
    <str name="tst:test">
1.2008798 = (MATCH) weight(dc.description:"manag chang" in 221)
[DefaultSimilarity], result of:
  1.2008798 = fieldWeight in 221, product of:
    1.0 = tf(freq=1.0), with freq of:
      1.0 = phraseFreq=1.0
    9.6070385 = idf(), sum of:
      4.0365543 = idf(docFreq=101, maxDocs=2125)
      5.5704846 = idf(docFreq=21, maxDocs=2125)
    0.125 = fieldNorm(doc=221)
    </str>
  </lst>
  <str name="QParser">LuceneQParser</str>
  <lst name="timing">
    <double name="time">41.0</double>
    <lst name="prepare">
      <double name="time">3.0</double>
      <lst name="query"><double name="time">0.0</double></lst>
      <lst name="facet"><double name="time">0.0</double></lst>
      <lst name="mlt"><double name="time">0.0</double></lst>
      <lst name="highlight"><double name="time">0.0</double></lst>
      <lst name="stats"><double name="time">0.0</double></lst>
      <lst name="debug"><double name="time">0.0</double></lst>
    </lst>
    <lst name="process">
      <double name="time">35.0</double>
      <lst name="query"><double name="time">0.0</double></lst>
      <lst name="facet"><double name="time">0.0</double></lst>
      <lst name="mlt"><double name="time">0.0</double></lst>
      <lst name="highlight"><double name="time">0.0</double></lst>
      <lst name="stats"><double name="time">0.0</double></lst>
      <lst name="debug"><double name="time">35.0</double></lst>
    </lst>
  </lst>
</lst>


thanks,

Alistair

-- 
mov eax,1
mov ebx,0
int 80h




On 16/06/2015 11:26, Alessandro Benedetti benedetti.ale...@gmail.com
wrote:

Can you show us how the query is parsed ?
You didn't tell us nothing about the query parser you are using.
Enable the debugQuery=true will show you how the query is parsed and this
will be quite useful for us.


Cheers

2015-06-16 11:22 GMT+01:00 Alistair Young alistair.yo...@uhi.ac.uk:

 Hiya,

 I've been looking for documentation that would point to where I could
 modify or explain why 'near neighbours' are returned from a phrase
search.
 If I search for:

 manage change

 I get back a document that contains this will help in your management
of
 lots more words... changes. It's relevant but I'd like to understand
why
 solr is returning it. Is it a combination of fuzzy/slop? The distance
 between the two variations of the two words in the document is quite
large.

 thanks,

 Alistair

 --
 mov eax,1
 mov ebx,0
 int 80h




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England



Re: phrase matches returning near matches

2015-06-16 Thread Alessandro Benedetti
According to your debug output, you are using the default Lucene query parser.
This surprises me, as I would expect that query to match only with a distance
of 0 between the 2 terms.

Are you sure nothing else in that field matches the phrase query?

From the documentation:

Lucene supports finding words within a specific distance of each other. To do
a proximity search, use the tilde, ~, symbol at the end of a phrase. For
example, to search for "apache" and "jakarta" within 10 words of each
other in a document, use the search:

"jakarta apache"~10


Cheers


2015-06-16 11:33 GMT+01:00 Alistair Young alistair.yo...@uhi.ac.uk:

 it's a useful behaviour. I'd just like to understand where it's deciding
 the document is relevant. debug output is:

 <lst name="debug">
   <str name="rawquerystring">dc.description:"manage change"</str>
   <str name="querystring">dc.description:"manage change"</str>
   <str name="parsedquery">PhraseQuery(dc.description:"manag chang")</str>
   <str name="parsedquery_toString">dc.description:"manag chang"</str>
   <lst name="explain">
     <str name="tst:test">
 1.2008798 = (MATCH) weight(dc.description:"manag chang" in 221)
 [DefaultSimilarity], result of:
   1.2008798 = fieldWeight in 221, product of:
     1.0 = tf(freq=1.0), with freq of:
       1.0 = phraseFreq=1.0
     9.6070385 = idf(), sum of:
       4.0365543 = idf(docFreq=101, maxDocs=2125)
       5.5704846 = idf(docFreq=21, maxDocs=2125)
     0.125 = fieldNorm(doc=221)
     </str>
   </lst>
   <str name="QParser">LuceneQParser</str>
   <lst name="timing">
     <double name="time">41.0</double>
     <lst name="prepare">
       <double name="time">3.0</double>
       <lst name="query"><double name="time">0.0</double></lst>
       <lst name="facet"><double name="time">0.0</double></lst>
       <lst name="mlt"><double name="time">0.0</double></lst>
       <lst name="highlight"><double name="time">0.0</double></lst>
       <lst name="stats"><double name="time">0.0</double></lst>
       <lst name="debug"><double name="time">0.0</double></lst>
     </lst>
     <lst name="process">
       <double name="time">35.0</double>
       <lst name="query"><double name="time">0.0</double></lst>
       <lst name="facet"><double name="time">0.0</double></lst>
       <lst name="mlt"><double name="time">0.0</double></lst>
       <lst name="highlight"><double name="time">0.0</double></lst>
       <lst name="stats"><double name="time">0.0</double></lst>
       <lst name="debug"><double name="time">35.0</double></lst>
     </lst>
   </lst>
 </lst>


 thanks,

 Alistair

 --
 mov eax,1
 mov ebx,0
 int 80h




 On 16/06/2015 11:26, Alessandro Benedetti benedetti.ale...@gmail.com
 wrote:

 Can you show us how the query is parsed ?
 You didn't tell us nothing about the query parser you are using.
 Enable the debugQuery=true will show you how the query is parsed and this
 will be quite useful for us.
 
 
 Cheers
 
 2015-06-16 11:22 GMT+01:00 Alistair Young alistair.yo...@uhi.ac.uk:
 
  Hiya,
 
  I've been looking for documentation that would point to where I could
  modify or explain why 'near neighbours' are returned from a phrase
 search.
  If I search for:
 
  manage change
 
  I get back a document that contains this will help in your management
 of
  lots more words... changes. It's relevant but I'd like to understand
 why
  solr is returning it. Is it a combination of fuzzy/slop? The distance
  between the two variations of the two words in the document is quite
 large.
 
  thanks,
 
  Alistair
 
  --
  mov eax,1
  mov ebx,0
  int 80h
 
 
 
 
 --
 --
 
 Benedetti Alessandro
 Visiting card : http://about.me/alessandro_benedetti
 
 Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?
 
 William Blake - Songs of Experience -1794 England




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England
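The behaviour under discussion can be illustrated outside Lucene (the `phrase_match` helper below is hypothetical; real matching is PhraseQuery's, but the slop-0 adjacency rule is the same). The Porter stemmer maps management/managing/manage to "manag" and change/changes to "chang", so phraseFreq=1.0 in the explain output means those two stems really are adjacent somewhere in the field:

```python
def phrase_match(doc_terms, phrase_terms):
    """True if phrase_terms occur as adjacent stems (slop 0) in doc_terms."""
    n = len(phrase_terms)
    return any(doc_terms[i:i + n] == phrase_terms
               for i in range(len(doc_terms) - n + 1))

doc = ["this", "will", "help", "manag", "chang", "quickli"]  # stemmed tokens
print(phrase_match(doc, ["manag", "chang"]))    # True: stems adjacent
print(phrase_match(doc, ["manag", "quickli"]))  # False: one position apart
```

In other words, the match likely comes from a spot like "managing change(s)" elsewhere in the document, not from the far-apart "management ... changes" pair the user noticed.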