Re: Joins with comma separated values

2015-06-16 Thread Upayavira
You can potentially just use a text_general field, in which case your
comma-separated words will effectively behave as a multi-valued field. I
believe that will work.

As to how you want to use joins, that isn't possible. They are pseudo
joins, not full joins. They will not be able to include data from the
joined field in the result.
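For completeness, what a pseudo join *can* do is restrict one set of documents
by another. A sketch (field names hypothetical): assuming user docs with an
`id` field and blog docs with a multi-valued `commentedBy` field, this returns
the user documents for everyone who commented on blog 123 -- but the matched
blog's own fields are not carried into the results:

```
q={!join from=commentedBy to=id}blogId:123
```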

Upayavira

On Jun 6, Advait Suhas Pandit wrote:
> Hi,
> 
> We have some master data and some content data. Master data would be
> things like userid, name, email id etc.
> Our content data for example is a blog.
> The blog has certain fields which are comma separated ids that point to
> the master data.
> E.g. UserIDs of people who have commented on a particular blog can be
> found in the blog "table" in a comma separated field of userids.
> Similarly userids of people who have liked the blog can be found in a
> comma separated field of userids.
> 
> How do I join this comma separated list of userids with the master data
> so that I can get the other details of the user such as name, email,
> picture etc?
> 
> Thanks,
> Advait
> 


Re: Solr's suggester results

2015-06-16 Thread Zheng Lin Edwin Yeo
Yes, I've looked at that before, but I was told that the newer versions of
Solr have their own suggester and no longer need to use the spellchecker?

So it's not necessary to use the spellchecker inside the suggester anymore?

Regards,
Edwin


On 17 June 2015 at 11:56, Erick Erickson  wrote:

> Have you looked at spellchecker? Because that sounds much more like
> what you're asking about than suggester.
>
> Spell checking is more what you're asking for; have you looked at it
> after it was suggested?
>
> bq: Also, when I do a search, it shouldn't be returning whole fields,
> but just to return a portion of the sentence
>
> This is what highlighting is built for.
>
> Really, I recommend you take the time to do some familiarization with the
> whole search space and Solr. The excellent book here:
>
>
> http://www.amazon.com/Solr-Action-Trey-Grainger/dp/1617291021/ref=sr_1_1?ie=UTF8&qid=1434513284&sr=8-1&keywords=apache+solr&pebp=1434513287267&perid=0YRK508J0HJ1N3BAX20E
>
> will give you the grounding you need to get the most out of Solr.
>
> Best,
> Erick
>
> On Tue, Jun 16, 2015 at 8:27 PM, Zheng Lin Edwin Yeo
>  wrote:
> > The long content is from when I tried to index PDF files. As some PDF
> > files have a lot of words in the content, it leads to the *UTF8 encoding
> > is longer than the max length 32766* error.
> >
> > I think the problem is the content size of the PDF file exceed 32766
> > characters?
> >
> > What I'm trying to accomplish is to be able to index documents of any
> > size (even those with very large contents), and build the suggester from
> > there. Also, when I do a search, it shouldn't return whole fields, but
> > just a portion of the sentence.
> >
> >
> >
> > Regards,
> > Edwin
> >
> >
> > On 16 June 2015 at 23:02, Erick Erickson 
> wrote:
> >
> >> The suggesters are built to return whole fields. You _might_
> >> be able to add multiple fragments to a multiValued
> >> entry and get fragments, I haven't tried that though
> >> and I suspect that actually you'd get the same thing..
> >>
> >> This is an XY problem IMO. Please describe exactly what
> >> you're trying to accomplish, with examples rather than
> >> continue to pursue this path. It sounds like you want
> >> spellcheck or similar. The _point_ behind the
> >> suggesters is that they handle multiple-word suggestions
> >> by returning the whole field. So putting long text fields
> >> into them is not going to work.
> >>
> >> Best,
> >> Erick
> >>
> >> On Tue, Jun 16, 2015 at 1:46 AM, Alessandro Benedetti
> >>  wrote:
> >> > in line :
> >> >
> >> > 2015-06-16 4:43 GMT+01:00 Zheng Lin Edwin Yeo :
> >> >
> >> >> Thanks Benedetti,
> >> >>
> >> >> I've changed to the AnalyzingInfixLookup approach, and it is able to
> >> start
> >> >> searching from the middle of the field.
> >> >>
> >> >> However, is it possible to make the suggester show only part of the
> >> >> content of the field (like 2 or 3 fields after), instead of the entire
> >> >> content/sentence, which can be quite long?
> >> >>
> >> >
> >> > I assume you use "fields" in the place of tokens.
> >> > The answer is yes, I already said that in my previous mail, I invite
> you
> >> to
> >> > read carefully the answers and the documentation linked !
> >> >
> >> > Related the excessive dimensions of tokens. This is weird, what are
> you
> >> > trying to autocomplete ?
> >> > I really doubt would be useful for a user to see super long auto
> >> completed
> >> > terms.
> >> >
> >> > Cheers
> >> >
> >> >>
> >> >>
> >> >> Regards,
> >> >> Edwin
> >> >>
> >> >>
> >> >>
> >> >> On 15 June 2015 at 17:33, Alessandro Benedetti <
> >> benedetti.ale...@gmail.com
> >> >> >
> >> >> wrote:
> >> >>
> >> >> > ehehe Edwin, I think you should read again the document I linked
> time
> >> >> ago :
> >> >> >
> >> >> > http://lucidworks.com/blog/solr-suggester/
> >> >> >
> >> >> > The suggester you used is not meant to provide infix suggestions.
> >> >> > The fuzzy suggester is working on a fuzzy basis , with the
> *starting*
> >> >> terms
> >> >> > of a field content.
> >> >> >
> >> >> > What you are looking for is actually one of the Infix Suggesters.
> >> >> > For example the AnalyzingInfixLookup approach.
> >> >> >
> >> >> > When working with Suggesters is important first to make a
> distinction
> >> :
> >> >> >
> >> >> > 1) Returning the full content of the field ( analysisInfix or
> Fuzzy)
> >> >> >
> >> >> > 2) Returning token(s) ( Free Text Suggester)
> >> >> >
> >> >> > Then the second difference is :
> >> >> >
> >> >> > 1) Infix suggestions ( from the "middle" of the field content)
> >> >> > 2) Classic suggester ( from the beginning of the field content)
> >> >> >
> >> >> > Clarified that, will be quite simple to work with suggesters.
> >> >> >
> >> >> > Cheers
> >> >> >
> >> >> > 2015-06-15 9:28 GMT+01:00 Zheng Lin Edwin Yeo <
> edwinye...@gmail.com>:
> >> >> >
> >> >> > > I've indexed a rich-text documents with the following content:
> >> >> > >
> >> >> > > Th

Re: How to create concatenated token

2015-06-16 Thread Aman Tandon
Hi Erick,

Thank you so much; it will be helpful for me to learn how to save the state
of a token. I had no idea how to save the state of previous tokens, which
made it difficult to emit a concatenated token at the end.

So, is there anything I should read to learn more about it?
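The capture-and-replay idea can be sketched outside of Lucene. Here is a toy
Python model (not the real TokenFilter/AttributeSource API) of a filter that
saves the state of every token it passes through, then emits one extra
concatenated token at the end:

```python
def concatenate_filter(tokens):
    """Toy model of a concatenating token filter: save the state of each
    token seen, pass each one through unchanged, then emit one extra
    concatenated token at the end of the stream."""
    saved = []                        # saved state of previous tokens
    for tok in tokens:
        saved.append(tok)
        yield tok                     # emit the original token unchanged
    if saved:
        yield "".join(saved)          # the final concatenated token

print(list(concatenate_filter(["solr", "train"])))
# ['solr', 'train', 'solrtrain']
```

A real Lucene filter does the same bookkeeping with captureState()/restoreState(),
emitting the extra token once the upstream incrementToken() returns false.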

With Regards
Aman Tandon

On Wed, Jun 17, 2015 at 9:20 AM, Erick Erickson 
wrote:

> I really question the premise, but have a look at:
> https://issues.apache.org/jira/browse/SOLR-7193
>
> Note that this is not committed and I haven't reviewed
> it so I don't have anything to say about that. And you'd
> have to implement it as a custom Filter.
>
> Best,
> Erick
>
> On Tue, Jun 16, 2015 at 5:55 PM, Aman Tandon 
> wrote:
> > Hi,
> >
> > Any guesses, how could I achieve this behaviour.
> >
> > With Regards
> > Aman Tandon
> >
> > On Tue, Jun 16, 2015 at 8:15 PM, Aman Tandon 
> > wrote:
> >
> >> e.g. Intent for solr training: fq=id: 234, 456, 545 title("solr
> training")
> >>
> >>
> >> typo error
> >> e.g. Intent for solr training: fq=id:(234 456 545) title:("solr
> training")
> >>
> >> With Regards
> >> Aman Tandon
> >>
> >> On Tue, Jun 16, 2015 at 8:13 PM, Aman Tandon 
> >> wrote:
> >>
> >>> We have some business logic to search the user query in "user intent" or
> >>> "finding the exact matching products".
> >>>
> >>> e.g. Intent for solr training: fq=id: 234, 456, 545 title("solr
> training")
> >>>
> >>> As you can see, it is a phrase query, so it will take more time than a
> >>> single stemmed-token query. There are also 5-7 word phrase queries. So
> >>> we want to reduce the search time by implementing this feature.
> >>>
> >>> With Regards
> >>> Aman Tandon
> >>>
> >>> On Tue, Jun 16, 2015 at 6:42 PM, Alessandro Benedetti <
> >>> benedetti.ale...@gmail.com> wrote:
> >>>
>  Can I ask you why you need to concatenate the tokens ? Maybe we can
> find
>  a
>  better solution to concat all the tokens in one single big token .
>  I find it difficult to understand the reasons behind tokenising, token
>  filtering and then un-tokenizing again :)
>  It would be great if you explain a little bit better what you would
> like
>  to
>  do !
> 
> 
>  Cheers
> 
>  2015-06-16 13:26 GMT+01:00 Aman Tandon :
> 
>  > Hi,
>  >
>  > I have a requirement to create the concatenated token of all the
> tokens
>  > created from the last item of my analyzer chain.
>  >
>  > *Suppose my analyzer chain is :*
>  >
>  >
>  >
>  >
>  >
>  > *<filter class="solr.WordDelimiterFilterFactory" catenateAll="1"
>  > splitOnNumerics="1" preserveOriginal="1"/>
>  > <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
>  > maxGramSize="15" side="front"/>
>  > <filter class="solr.PorterStemmerFilterFactory"/>*
>  > I want to create a concatenated token plugin to add at concatenated
>  token
>  > along with the last token.
>  >
>  > e.g. Solr training
>  >
>  > *Porter:-*  "solr"  "train"
>  >   Position 1 2
>  >
>  > *Concatenated :-*   "solr"  "train"
>  >"solrtrain"
>  >Position 1  2
>  >
>  > Please help me out. How to create custom filter for this
> requirement.
>  >
>  > With Regards
>  > Aman Tandon
>  >
> 
> 
> 
>  --
>  --
> 
>  Benedetti Alessandro
>  Visiting card : http://about.me/alessandro_benedetti
> 
>  "Tyger, tyger burning bright
>  In the forests of the night,
>  What immortal hand or eye
>  Could frame thy fearful symmetry?"
> 
>  William Blake - Songs of Experience -1794 England
> 
> >>>
> >>>
> >>
>


Joins with comma separated values

2015-06-16 Thread Advait Suhas Pandit
Hi,

We have some master data and some content data. Master data would be things 
like userid, name, email id etc.
Our content data for example is a blog.
The blog has certain fields which are comma separated ids that point to the 
master data.
E.g. UserIDs of people who have commented on a particular blog can be found in 
the blog "table" in a comma separated field of userids. Similarly userids of 
people who have liked the blog can be found in a comma separated field of 
userids.

How do I join this comma separated list of userids with the master data so that 
I can get the other details of the user such as name, email, picture etc?

Thanks,
Advait



Re: Solr's suggester results

2015-06-16 Thread Erick Erickson
Have you looked at spellchecker? Because that sounds much more like
what you're asking about than suggester.

Spell checking is more what you're asking for; have you looked at it
after it was suggested?

bq: Also, when I do a search, it shouldn't be returning whole fields,
but just to return a portion of the sentence

This is what highlighting is built for.
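A minimal example of the highlighting approach (field name assumed to be
`content`):

```
/select?q=content:rich&hl=true&hl.fl=content&hl.snippets=1&hl.fragsize=100
```

The response then carries a `highlighting` section with just the matching
fragments, rather than the whole stored field.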

Really, I recommend you take the time to do some familiarization with the
whole search space and Solr. The excellent book here:

http://www.amazon.com/Solr-Action-Trey-Grainger/dp/1617291021/ref=sr_1_1?ie=UTF8&qid=1434513284&sr=8-1&keywords=apache+solr&pebp=1434513287267&perid=0YRK508J0HJ1N3BAX20E

will give you the grounding you need to get the most out of Solr.

Best,
Erick

On Tue, Jun 16, 2015 at 8:27 PM, Zheng Lin Edwin Yeo
 wrote:
> The long content is from when I tried to index PDF files. As some PDF files
> have a lot of words in the content, it leads to the *UTF8 encoding is
> longer than the max length 32766* error.
>
> I think the problem is the content size of the PDF file exceed 32766
> characters?
>
> What I'm trying to accomplish is to be able to index documents of any
> size (even those with very large contents), and build the suggester from
> there. Also, when I do a search, it shouldn't return whole fields, but
> just a portion of the sentence.
>
>
>
> Regards,
> Edwin
>
>
> On 16 June 2015 at 23:02, Erick Erickson  wrote:
>
>> The suggesters are built to return whole fields. You _might_
>> be able to add multiple fragments to a multiValued
>> entry and get fragments, I haven't tried that though
>> and I suspect that actually you'd get the same thing..
>>
>> This is an XY problem IMO. Please describe exactly what
>> you're trying to accomplish, with examples rather than
>> continue to pursue this path. It sounds like you want
>> spellcheck or similar. The _point_ behind the
>> suggesters is that they handle multiple-word suggestions
>> by returning the whole field. So putting long text fields
>> into them is not going to work.
>>
>> Best,
>> Erick
>>
>> On Tue, Jun 16, 2015 at 1:46 AM, Alessandro Benedetti
>>  wrote:
>> > in line :
>> >
>> > 2015-06-16 4:43 GMT+01:00 Zheng Lin Edwin Yeo :
>> >
>> >> Thanks Benedetti,
>> >>
>> >> I've changed to the AnalyzingInfixLookup approach, and it is able to
>> start
>> >> searching from the middle of the field.
>> >>
>> >> However, is it possible to make the suggester show only part of the
>> >> content of the field (like 2 or 3 fields after), instead of the entire
>> >> content/sentence, which can be quite long?
>> >>
>> >
>> > I assume you use "fields" in the place of tokens.
>> > The answer is yes, I already said that in my previous mail, I invite you
>> to
>> > read carefully the answers and the documentation linked !
>> >
>> > Related the excessive dimensions of tokens. This is weird, what are you
>> > trying to autocomplete ?
>> > I really doubt would be useful for a user to see super long auto
>> completed
>> > terms.
>> >
>> > Cheers
>> >
>> >>
>> >>
>> >> Regards,
>> >> Edwin
>> >>
>> >>
>> >>
>> >> On 15 June 2015 at 17:33, Alessandro Benedetti <
>> benedetti.ale...@gmail.com
>> >> >
>> >> wrote:
>> >>
>> >> > ehehe Edwin, I think you should read again the document I linked time
>> >> ago :
>> >> >
>> >> > http://lucidworks.com/blog/solr-suggester/
>> >> >
>> >> > The suggester you used is not meant to provide infix suggestions.
>> >> > The fuzzy suggester is working on a fuzzy basis , with the *starting*
>> >> terms
>> >> > of a field content.
>> >> >
>> >> > What you are looking for is actually one of the Infix Suggesters.
>> >> > For example the AnalyzingInfixLookup approach.
>> >> >
>> >> > When working with Suggesters is important first to make a distinction
>> :
>> >> >
>> >> > 1) Returning the full content of the field ( analysisInfix or Fuzzy)
>> >> >
>> >> > 2) Returning token(s) ( Free Text Suggester)
>> >> >
>> >> > Then the second difference is :
>> >> >
>> >> > 1) Infix suggestions ( from the "middle" of the field content)
>> >> > 2) Classic suggester ( from the beginning of the field content)
>> >> >
>> >> > Clarified that, will be quite simple to work with suggesters.
>> >> >
>> >> > Cheers
>> >> >
>> >> > 2015-06-15 9:28 GMT+01:00 Zheng Lin Edwin Yeo :
>> >> >
>> >> > > I've indexed a rich-text documents with the following content:
>> >> > >
>> >> > > This is a testing rich text documents to test the uploading of
>> files to
>> >> > > Solr
>> >> > >
>> >> > >
>> >> > > When I tried to use the suggestion, it return me the entire field in
>> >> the
>> >> > > content once I enter suggest?q=t. However, when I tried to search
>> for
>> >> > > q='rich', I don't get any results returned.
>> >> > >
>> >> > > This is my current configuration for the suggester:
>> >> > > 
>> >> > >   
>> >> > > mySuggester
>> >> > > FuzzyLookupFactory
>> >> > > DocumentDictionaryFactory
>> >> > > Suggestion
>> >> > > suggestType
>> >> > > true
>> >> > > false
>> >> > 

Re: How to create concatenated token

2015-06-16 Thread Erick Erickson
I really question the premise, but have a look at:
https://issues.apache.org/jira/browse/SOLR-7193

Note that this is not committed and I haven't reviewed
it so I don't have anything to say about that. And you'd
have to implement it as a custom Filter.

Best,
Erick

On Tue, Jun 16, 2015 at 5:55 PM, Aman Tandon  wrote:
> Hi,
>
> Any guesses, how could I achieve this behaviour.
>
> With Regards
> Aman Tandon
>
> On Tue, Jun 16, 2015 at 8:15 PM, Aman Tandon 
> wrote:
>
>> e.g. Intent for solr training: fq=id: 234, 456, 545 title("solr training")
>>
>>
>> typo error
>> e.g. Intent for solr training: fq=id:(234 456 545) title:("solr training")
>>
>> With Regards
>> Aman Tandon
>>
>> On Tue, Jun 16, 2015 at 8:13 PM, Aman Tandon 
>> wrote:
>>
> >>> We have some business logic to search the user query in "user intent" or
> >>> "finding the exact matching products".
>>>
>>> e.g. Intent for solr training: fq=id: 234, 456, 545 title("solr training")
>>>
> >>> As you can see, it is a phrase query, so it will take more time than a
> >>> single stemmed-token query. There are also 5-7 word phrase queries. So we
> >>> want to reduce the search time by implementing this feature.
>>>
>>> With Regards
>>> Aman Tandon
>>>
>>> On Tue, Jun 16, 2015 at 6:42 PM, Alessandro Benedetti <
>>> benedetti.ale...@gmail.com> wrote:
>>>
 Can I ask you why you need to concatenate the tokens ? Maybe we can find
 a
 better solution to concat all the tokens in one single big token .
 I find it difficult to understand the reasons behind tokenising, token
 filtering and then un-tokenizing again :)
 It would be great if you explain a little bit better what you would like
 to
 do !


 Cheers

 2015-06-16 13:26 GMT+01:00 Aman Tandon :

 > Hi,
 >
 > I have a requirement to create the concatenated token of all the tokens
 > created from the last item of my analyzer chain.
 >
 > *Suppose my analyzer chain is :*
 >
 >
 >
 >
 >
 > *<filter class="solr.WordDelimiterFilterFactory" catenateAll="1"
 > splitOnNumerics="1" preserveOriginal="1"/>
 > <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
 > maxGramSize="15" side="front"/>
 > <filter class="solr.PorterStemmerFilterFactory"/>*
 > I want to create a concatenated token plugin to add at concatenated
 token
 > along with the last token.
 >
 > e.g. Solr training
 >
 > *Porter:-*  "solr"  "train"
 >   Position 1 2
 >
 > *Concatenated :-*   "solr"  "train"
 >"solrtrain"
 >Position 1  2
 >
 > Please help me out. How to create custom filter for this requirement.
 >
 > With Regards
 > Aman Tandon
 >



 --
 --

 Benedetti Alessandro
 Visiting card : http://about.me/alessandro_benedetti

 "Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?"

 William Blake - Songs of Experience -1794 England

>>>
>>>
>>


Re: Solr's suggester results

2015-06-16 Thread Zheng Lin Edwin Yeo
The long content is from when I tried to index PDF files. As some PDF files
have a lot of words in the content, it leads to the *UTF8 encoding is
longer than the max length 32766* error.

I think the problem is that the content size of the PDF file exceeds 32766
characters?

What I'm trying to accomplish is to be able to index documents of any
size (even those with very large contents), and build the suggester from
there. Also, when I do a search, it shouldn't return whole fields, but
just a portion of the sentence.
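For context (an observation, not stated in this thread): the 32766 limit is
Lucene's cap on the UTF-8 *byte* length of a single indexed term, not on the
document as a whole -- it typically bites when an entire PDF body goes into an
untokenized string field, whereas a tokenized text field indexes each word as
its own small term. A quick sketch of the check:

```python
MAX_TERM_BYTES = 32766  # Lucene's per-term limit for an indexed term

def fits_as_single_term(text: str) -> bool:
    # The limit counts UTF-8 bytes, not characters, so multi-byte
    # characters hit it sooner than ASCII does.
    return len(text.encode("utf-8")) <= MAX_TERM_BYTES

print(fits_as_single_term("a" * 32766))       # True: exactly at the limit
print(fits_as_single_term("a" * 32767))       # False: one byte over
print(fits_as_single_term("\u20ac" * 11000))  # False: 3 bytes per char
```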



Regards,
Edwin


On 16 June 2015 at 23:02, Erick Erickson  wrote:

> The suggesters are built to return whole fields. You _might_
> be able to add multiple fragments to a multiValued
> entry and get fragments, I haven't tried that though
> and I suspect that actually you'd get the same thing..
>
> This is an XY problem IMO. Please describe exactly what
> you're trying to accomplish, with examples rather than
> continue to pursue this path. It sounds like you want
> spellcheck or similar. The _point_ behind the
> suggesters is that they handle multiple-word suggestions
> by returning the whole field. So putting long text fields
> into them is not going to work.
>
> Best,
> Erick
>
> On Tue, Jun 16, 2015 at 1:46 AM, Alessandro Benedetti
>  wrote:
> > in line :
> >
> > 2015-06-16 4:43 GMT+01:00 Zheng Lin Edwin Yeo :
> >
> >> Thanks Benedetti,
> >>
> >> I've changed to the AnalyzingInfixLookup approach, and it is able to
> start
> >> searching from the middle of the field.
> >>
> >> However, is it possible to make the suggester show only part of the
> >> content of the field (like 2 or 3 fields after), instead of the entire
> >> content/sentence, which can be quite long?
> >>
> >
> > I assume you use "fields" in the place of tokens.
> > The answer is yes, I already said that in my previous mail, I invite you
> to
> > read carefully the answers and the documentation linked !
> >
> > Related the excessive dimensions of tokens. This is weird, what are you
> > trying to autocomplete ?
> > I really doubt would be useful for a user to see super long auto
> completed
> > terms.
> >
> > Cheers
> >
> >>
> >>
> >> Regards,
> >> Edwin
> >>
> >>
> >>
> >> On 15 June 2015 at 17:33, Alessandro Benedetti <
> benedetti.ale...@gmail.com
> >> >
> >> wrote:
> >>
> >> > ehehe Edwin, I think you should read again the document I linked time
> >> ago :
> >> >
> >> > http://lucidworks.com/blog/solr-suggester/
> >> >
> >> > The suggester you used is not meant to provide infix suggestions.
> >> > The fuzzy suggester is working on a fuzzy basis , with the *starting*
> >> terms
> >> > of a field content.
> >> >
> >> > What you are looking for is actually one of the Infix Suggesters.
> >> > For example the AnalyzingInfixLookup approach.
> >> >
> >> > When working with Suggesters is important first to make a distinction
> :
> >> >
> >> > 1) Returning the full content of the field ( analysisInfix or Fuzzy)
> >> >
> >> > 2) Returning token(s) ( Free Text Suggester)
> >> >
> >> > Then the second difference is :
> >> >
> >> > 1) Infix suggestions ( from the "middle" of the field content)
> >> > 2) Classic suggester ( from the beginning of the field content)
> >> >
> >> > Clarified that, will be quite simple to work with suggesters.
> >> >
> >> > Cheers
> >> >
> >> > 2015-06-15 9:28 GMT+01:00 Zheng Lin Edwin Yeo :
> >> >
> >> > > I've indexed a rich-text documents with the following content:
> >> > >
> >> > > This is a testing rich text documents to test the uploading of
> files to
> >> > > Solr
> >> > >
> >> > >
> >> > > When I tried to use the suggestion, it return me the entire field in
> >> the
> >> > > content once I enter suggest?q=t. However, when I tried to search
> for
> >> > > q='rich', I don't get any results returned.
> >> > >
> >> > > This is my current configuration for the suggester:
> >> > > <searchComponent name="suggest" class="solr.SuggestComponent">
> >> > >   <lst name="suggester">
> >> > >     <str name="name">mySuggester</str>
> >> > >     <str name="lookupImpl">FuzzyLookupFactory</str>
> >> > >     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
> >> > >     <str name="field">Suggestion</str>
> >> > >     <str name="suggestAnalyzerFieldType">suggestType</str>
> >> > >     <str name="buildOnStartup">true</str>
> >> > >     <str name="buildOnCommit">false</str>
> >> > >   </lst>
> >> > > </searchComponent>
> >> > >
> >> > > <requestHandler name="/suggest" class="solr.SearchHandler"
> >> > > startup="lazy" >
> >> > >   <lst name="defaults">
> >> > >     <str name="wt">json</str>
> >> > >     <str name="indent">true</str>
> >> > >
> >> > >     <str name="suggest">true</str>
> >> > >     <str name="suggest.count">10</str>
> >> > >     <str name="suggest.dictionary">mySuggester</str>
> >> > >   </lst>
> >> > >   <arr name="components">
> >> > >     <str>suggest</str>
> >> > >   </arr>
> >> > > </requestHandler>
> >> > >
> >> > > Is it possible to allow the suggester to return something even from
> >> > > the middle of the sentence, and also not to return the entire
> >> > > sentence if the sentence is long. Perhaps it should just suggest the
> >> > > next 2 or 3 fields, and return more fields as the users type.
> >> > >
> >> > > For example,
> >> > > When user type 'this', it should return 'This is a testing'
> >> > > When user type 'this is a testing', it should return 'This is a
> testing
> >> > > rich text documents'.
> >> > >
> >> > >
> >> > > Regards,
> >> > > Edwin
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > --
> >> >
> >> > Benedetti Alessandro
> >> > Visiting card : http://about.me/a

Re: How to create concatenated token

2015-06-16 Thread Aman Tandon
Hi,

Any guesses, how could I achieve this behaviour.

With Regards
Aman Tandon

On Tue, Jun 16, 2015 at 8:15 PM, Aman Tandon 
wrote:

> e.g. Intent for solr training: fq=id: 234, 456, 545 title("solr training")
>
>
> typo error
> e.g. Intent for solr training: fq=id:(234 456 545) title:("solr training")
>
> With Regards
> Aman Tandon
>
> On Tue, Jun 16, 2015 at 8:13 PM, Aman Tandon 
> wrote:
>
>> We have some business logic to search the user query in "user intent" or
>> "finding the exact matching products".
>>
>> e.g. Intent for solr training: fq=id: 234, 456, 545 title("solr training")
>>
>> As you can see, it is a phrase query, so it will take more time than a
>> single stemmed-token query. There are also 5-7 word phrase queries. So we
>> want to reduce the search time by implementing this feature.
>>
>> With Regards
>> Aman Tandon
>>
>> On Tue, Jun 16, 2015 at 6:42 PM, Alessandro Benedetti <
>> benedetti.ale...@gmail.com> wrote:
>>
>>> Can I ask you why you need to concatenate the tokens ? Maybe we can find
>>> a
>>> better solution to concat all the tokens in one single big token .
>>> I find it difficult to understand the reasons behind tokenising, token
>>> filtering and then un-tokenizing again :)
>>> It would be great if you explain a little bit better what you would like
>>> to
>>> do !
>>>
>>>
>>> Cheers
>>>
>>> 2015-06-16 13:26 GMT+01:00 Aman Tandon :
>>>
>>> > Hi,
>>> >
>>> > I have a requirement to create the concatenated token of all the tokens
>>> > created from the last item of my analyzer chain.
>>> >
>>> > *Suppose my analyzer chain is :*
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > *<filter class="solr.WordDelimiterFilterFactory" catenateAll="1"
>>> > splitOnNumerics="1" preserveOriginal="1"/>
>>> > <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
>>> > maxGramSize="15" side="front"/>
>>> > <filter class="solr.PorterStemmerFilterFactory"/>*
>>> > I want to create a concatenated token plugin to add at concatenated
>>> token
>>> > along with the last token.
>>> >
>>> > e.g. Solr training
>>> >
>>> > *Porter:-*  "solr"  "train"
>>> >   Position 1 2
>>> >
>>> > *Concatenated :-*   "solr"  "train"
>>> >"solrtrain"
>>> >Position 1  2
>>> >
>>> > Please help me out. How to create custom filter for this requirement.
>>> >
>>> > With Regards
>>> > Aman Tandon
>>> >
>>>
>>>
>>>
>>> --
>>> --
>>>
>>> Benedetti Alessandro
>>> Visiting card : http://about.me/alessandro_benedetti
>>>
>>> "Tyger, tyger burning bright
>>> In the forests of the night,
>>> What immortal hand or eye
>>> Could frame thy fearful symmetry?"
>>>
>>> William Blake - Songs of Experience -1794 England
>>>
>>
>>
>


Re: Raw lucene query for a given solr query

2015-06-16 Thread Chris Hostetter

: You can get the raw query (and other debug information) with the debug=true
: parameter.

more specifically -- if you are writing a custom SearchComponent and
want to access the underlying Query object produced by the parsers that
SolrIndexSearcher has executed, you can do so the same way the debug
component does...

https://svn.apache.org/viewvc/lucene/dev/branches/branch_5x/solr/core/src/java/org/apache/solr/handler/component/DebugComponent.java?view=markup#l98
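For a quick look without writing a component, the debug output itself shows the
final Lucene query, e.g.:

```
/select?q=foo&debug=query
```

The `debug` section of the response then includes `parsedquery` and
`parsedquery_toString`, which are the rewritten Lucene query.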

: > Hi,
: >
: >  We have a few custom solrcloud components that act as value sources inside
: > solrcloud for boosting items in the index.  I want to get the final raw
: > lucene query used by solr for querying the index (for debugging purposes).
: >
: > Is it possible to get that information?
: >
: > Kindly advise
: >
: > Thanks,
: > Nitin
: >
: 

-Hoss
http://www.lucidworks.com/


Re: Facet on same field in different ways

2015-06-16 Thread Phanindra R
Thanks guys. The syntax  "facet.field={!key=abc
facet.limit=10}facetFieldName" works.
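Spelled out, a two-facets-on-one-field request then looks like this (field
name hypothetical):

```
facet=true
&facet.field={!key=top10 facet.limit=10}category
&facet.field={!key=top3 facet.limit=3}category
```

Each `key` becomes a separate entry in the facet response, so the same field
can be faceted with different options in one request.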

On Tue, Jun 16, 2015 at 11:22 AM, Chris Hostetter 
wrote:

>
> : Have you tried this syntax ?
> :
> : &facet=true&facet.field={!ex=st key=terms facet.limit=5
> : facet.prefix=ap}query_terms&facet.field={!key=terms2
> : facet.limit=1}query_terms&rows=0&facet.mincount=1
> :
> : This seems the proper syntax, I found it here :
>
> yeah, local params are supported for specifying facet "options" like this.
> Apparently it never got documented, but I've added a comment to the
> Faceting page with a techproducts example anyone can try with Solr out of
> the box...
>
>
> https://cwiki.apache.org/confluence/display/solr/Faceting?focusedCommentId=58851733#comment-58851733
>
>
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: Highlight in Velocity UI on Google Chrome

2015-06-16 Thread Upayavira
I think it makes it bold on bold, which won't be particularly visible.

On Tue, Jun 16, 2015, at 06:52 AM, Sznajder ForMailingList wrote:
> Hi,
> 
> I was testing the highlight feature and played with the techproducts
> example.
> It appears that the highlighting works on Mozilla Firefox, but not on
> Google Chrome.
> 
> For your information
> 
> Benjamin


Re: Facet on same field in different ways

2015-06-16 Thread Chris Hostetter

: Have you tried this syntax ?
: 
: &facet=true&facet.field={!ex=st key=terms facet.limit=5
: facet.prefix=ap}query_terms&facet.field={!key=terms2
: facet.limit=1}query_terms&rows=0&facet.mincount=1
: 
: This seems the proper syntax, I found it here :

yeah, local params are supported for specifying facet "options" like this.
Apparently it never got documented, but I've added a comment to the
Faceting page with a techproducts example anyone can try with Solr out of
the box...

https://cwiki.apache.org/confluence/display/solr/Faceting?focusedCommentId=58851733#comment-58851733




-Hoss
http://www.lucidworks.com/


Re: Do we need to add docValues="true" to "_version_" field in schema.xml?

2015-06-16 Thread Chris Hostetter
: For the "_version_" field in the schema.xml, do we need to set it to be
: docValues="true"?

you *can* add docValues, but it is not required.
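For reference, adding it would look something like this (the type name may
differ in your schema):

```
<field name="_version_" type="long" indexed="true" stored="true"
       docValues="true"/>
```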

There is an open discussion about whether we should add docValues to 
the _version_ field (or even switch completely to indexed="false") in this 
jira...

https://issues.apache.org/jira/browse/SOLR-6337

...if you try it out and find it works better for you, please post a 
comment with your experiences and any anecdotal performance impacts you 
notice.  (real world use cases/observations are always helpful)



-Hoss
http://www.lucidworks.com/


Re: mapreduce job using soirj 5

2015-06-16 Thread Shenghua(Daniel) Wan
Hadoop has a switch that lets you use your jar rather than the one Hadoop
carries.
Google for HADOOP_OPTS.
Good luck.
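For anyone searching the archives later: the settings that usually matter here
are the classpath-first switches rather than HADOOP_OPTS itself (verify
against your Hadoop/CDH version; the jar name below is a placeholder):

```
# Make task JVMs prefer the job's jars over Hadoop's bundled ones
export HADOOP_USER_CLASSPATH_FIRST=true

# Or per job, for MapReduce tasks:
hadoop jar my-indexer.jar -Dmapreduce.job.user.classpath.first=true ...
```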

On Tue, Jun 16, 2015 at 7:23 AM, adfel70  wrote:

> Hi,
>
> We recently started testing solr 5, our indexer creates mapreduce job that
> uses solrj5 to index documents to our SolrCloud. Until now, we used solr
> 4.10.3 with solrj 4.8.0. Our hadoop dist is cloudera 5.
>
> The problem is, solrj5 is using httpclient-4.3.1 while hadoop is installed
> with httpclient-4.2.5
> and that causing us jar-hell because hadoop jars are being loaded first and
> solrj is using closeablehttpclient class which is in 4.3.1 but not in 4.2.5
>
> Does anyone encounter that? and have a solution? or a workaround?
>
> Right now we are replacing the jar physically in each data node
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/mapreduce-job-using-soirj-5-tp4212199.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 

Regards,
Shenghua (Daniel) Wan


Re: phrase matches returning near matches

2015-06-16 Thread Terry Rhodes
This might be an issue with your stemmer: if "management" is stemmed to 
"manage" and "changes" is stemmed to "change", then the terms match. You 
can use the Solr admin UI to test your indexing and query analysis 
chains to see if this is happening.
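A drastically simplified illustration of the collision -- this toy
suffix-stripper is not Porter or any real Solr stemmer, it only shows why two
different surface forms can land on the same indexed term:

```python
def toy_stem(word: str) -> str:
    """Toy single-pass suffix stripper, longest suffix first.
    Illustration only; real analyzers use Porter/KStem."""
    for suffix in ("ement", "ment", "ing", "es", "s", "e"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# "management" and "manage" collapse to the same term, as do
# "changes" and "change" -- so the phrase terms match the document terms.
print(toy_stem("management"), toy_stem("manage"))  # manag manag
print(toy_stem("changes"), toy_stem("change"))     # chang chang
```

Since both the query and the indexed text pass through the same analysis
chain, "manage change" and "management of changes" end up sharing terms.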



On 6/16/2015 3:22 AM, Alistair Young wrote:

Hiya,

I've been looking for documentation that would explain, or let me change,
why 'near neighbours' are returned from a phrase search. If I search for:

"manage change"

I get back a document that contains "this will help in your management of
changes". It's relevant, but I'd like to understand why Solr is returning it.
Is it a combination of fuzzy/slop? The distance between the two variations
of the two words in the document is quite large.

thanks,

Alistair

--
mov eax,1
mov ebx,0
int 80h





Re: solr/lucene index merge and optimize performance improvement

2015-06-16 Thread Shenghua(Daniel) Wan
Hi, Toke,
Did you try MapReduce with solr? I think it should be a good fit for your
use case.

On Tue, Jun 16, 2015 at 5:02 AM, Toke Eskildsen 
wrote:

> Shenghua(Daniel) Wan  wrote:
> > Actually, I am currently interested in how to boost merging/optimizing
> > performance of single solr instance.
>
> We have the same challenge (we build static 900GB shards one at a time and
> the final optimization takes 8 hours with only 1 CPU core at 100%). I know
> that there is code for detecting SSDs, which should make merging faster (by
> running more merges in parallel?), but I am afraid that optimize (a single
> merge) is always single threaded.
>
> It seems to me that at least some of the different files making up a
> segment could be created in parallel, but I do not know how hard it would
> be to do so.
>
> - Toke Eskildsen
>



-- 

Regards,
Shenghua (Daniel) Wan
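Since a single forced merge appears to be single-threaded, one pragmatic workaround is to run optimizes for several independent cores/shards concurrently from the client side. A rough sketch (the URL shape follows the stock /update handler; `send` is a stand-in for an actual HTTP call):

```python
# Sketch: issue optimize (forced merge) requests against several
# independent cores concurrently, since each individual optimize only
# uses one thread. `send` is pluggable; in practice it would perform an
# HTTP request, e.g. via urllib.request.urlopen.
from concurrent.futures import ThreadPoolExecutor

def optimize_all(base_url, cores, send, max_workers=4):
    urls = [f"{base_url}/{core}/update?optimize=true&maxSegments=1"
            for core in cores]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, urls))

# Stub sender for illustration; swap in a real HTTP client:
urls = optimize_all("http://localhost:8983/solr", ["shard1", "shard2"],
                    send=lambda u: u, max_workers=2)
print(urls)
```

This only helps when there are multiple shards to merge; it does nothing for the final single-shard optimize Toke describes.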


Re: TikaEntityProcessor Not Finding My Files

2015-06-16 Thread Paden
I thought it might be useful to list the logging errors as well. Here they
are. There are just three. 


WARN   FileDataSource  FileDataSource.basePath is empty. Resolving to:
/home/paden/Downloads/solr-5.1.0/server/.

ERROR  DocBuilder

 Exception while processing: file document : SolrInputDocument(fields:
[]):org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.RuntimeException: java.io.FileNotFoundException: Could not find
file: (resolved to: /home/paden/Downloads/solr-5.1.0/server/.

ERROR  DataImporter

Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.RuntimeException: java.io.FileNotFoundException: Could not find
file: (resolved to: /home/paden/Downloads/solr-5.1.0/server/.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/TikaEntityProcessor-Not-Finding-My-Files-tp4212241p4212252.html
Sent from the Solr - User mailing list archive at Nabble.com.
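The warning hints at the resolution logic: with an empty basePath, relative names resolve against the server's working directory, and here the file name itself also looks empty (only the directory survives in the error). A simplified sketch of that resolution (not the actual DataImportHandler code):

```python
# Simplified sketch of how a DIH-style file data source might resolve a
# file name against a basePath (the real Solr code differs).
import os

def resolve(base_path, name):
    if os.path.isabs(name):
        return os.path.normpath(name)  # absolute paths are used as-is
    # An empty basePath falls back to the process working directory,
    # which is why the log shows ".../solr-5.1.0/server/."
    base = base_path or os.getcwd()
    return os.path.normpath(os.path.join(base, name))

print(resolve("", "/home/paden/Documents/LWP_Files/BIGDATA/6220106.pdf"))
print(resolve("/data", "6220106.pdf"))  # /data/6220106.pdf
```

If the database already returns absolute paths, an empty or missing name reaching the inner entity (rather than the basePath itself) would produce exactly this error.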


TikaEntityProcessor Not Finding My Files

2015-06-16 Thread Paden
Hi, there's a guy who's already asked a question similar to this and I'm
basically going off what he did here. It's exactly what I'm doing which is
taking a file path from a database and using TikaEntityProcessor to analyze
the document. The link to his question is here. 

http://lucene.472066.n3.nabble.com/TikaEntityProcessor-not-working-td856965.html#a3524905

  

His problem was version issues with Tika, but that thread is about five years
old, so I'm not sure if it's still an issue with the current version of Tika
or if I'm missing something extremely obvious (which is possible; I'm
extremely new to Solr). This is my data configuration.
TextContentURL is the filepath!

(data-config.xml snippet: the XML markup was stripped by the list archive)

I'd like to note that when I delete the second entity and just run the
database pull, it works fine. I can run a query, and I get this output when
I run a faceted search:

 "response": {
"numFound": 283,
"start": 0,
"docs": [
  {
"id": "/home/paden/Documents/LWP_Files/BIGDATA/6220106.pdf",
"title": "ENGINEERING INITIATION",
  },

This means that it is pulling the document filepath JUST FINE; the id is the
correct filepath. But when I re-add the second entity, it logs errors saying
it can't find the file. Am I missing something obvious? 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/TikaEntityProcessor-Not-Finding-My-Files-tp4212241.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: mapreduce job using soirj 5

2015-06-16 Thread Shawn Heisey
On 6/16/2015 9:24 AM, Erick Erickson wrote:
> Sounds like a question better asked in one of the Cloudera support
> forums, 'cause all I can do is guess ;).
>
> I suppose, theoretically, that you could check out the Solr5
> code and substitute the httpclient-4.2.5.jar in the build system,
> recompile and go, but that's totally a guess based on zero knowledge
> of whether compiling Solr with an earlier httpclient would even work.
> Frankly, though, that sounds like more work than distributing the older
> jar to the data nodes.
>
> Best,
> Erick
>
> On Tue, Jun 16, 2015 at 7:23 AM, adfel70  wrote:
>> Hi,
>>
>> We recently started testing solr 5, our indexer creates mapreduce job that
>> uses solrj5 to index documents to our SolrCloud. Until now, we used solr
>> 4.10.3 with solrj 4.8.0. Our hadoop dist is cloudera 5.
>>
>> The problem is, solrj5 is using httpclient-4.3.1 while hadoop is installed
>> with httpclient-4.2.5

In addition to what Erick said: when I upgraded the build system in
Solr from HttpClient 4.2 to 4.3, no code changes were required.  It
worked immediately, and all tests passed.  It is likely that you can
simply use HttpClient 4.3.1 everywhere and hadoop will work properly. 
This is one of Apache's design goals for software libraries.  It's not
always possible to achieve it, but it is something we always try to do.

Thanks,
Shawn



Re: phrase matches returning near matches

2015-06-16 Thread Alistair Young
yep seems that’s the answer. The highlighting is done separately by the
rails app, so I’ll look into proper solr highlighting.

thanks a lot for the use of your ears, much improved understanding!

cheers,

Alistair

-- 
mov eax,1
mov ebx,0
int 80h




On 16/06/2015 16:33, "Erick Erickson"  wrote:

>Hmmm. First, highlighting should work here. If you have it configured
>to work  on the dc.description field.
>
>As to whether the phrase "management changes" is near enough, I
>pretty much guarantee it is. This is where the admin/analysis page can
>answer this type of question authoritatively since it's based exactly
>on your particular analysis chain.
>
>Best,
>Erick
>
>On Tue, Jun 16, 2015 at 8:25 AM, Alistair Young
> wrote:
>> yes prolly not a bug. The highlighting is on but nothing is highlighted.
>> Perhaps this text is triggering it?
>>
>> 'consider the impacts of land management changes’
>>
>> that would seem reasonable. It’s not a direct match so no highlighting
>> (the highlighting does work on a direct match) but 'management changes’
>> must be near enough ‘manage change’ to trigger a result.
>>
>> Alistair
>>
>> --
>> mov eax,1
>> mov ebx,0
>> int 80h
>>
>>
>>
>>
>> On 16/06/2015 16:18, "Erick Erickson"  wrote:
>>
>>>I agree with Allesandro the behavior you're describing
>>>is _not_ correct at all given your description. So either
>>>
>>>1> There's something "interesting" about your configuration
>>>  that doesn't seem important that you haven't told us,
>>>  although what it could be is a mystery to me  too ;)
>>>
>>>2> it's matching on something else. Note that the
>>> phrase has been stemmed, so something in there
>>> besides management might stem to manag and/or
>>>something other than changes might stem to chang
>>>and the two of _them_ happen to be next to each
>>>other. "are managers changing?" for instance. Or
>>>even something less likely. Perhaps turn on
>>>highlighting and see if it pops out?
>>>
>>>
>>>3> you've uncovered a bug. Although I suspect others
>>>would have reported it and the unit tests would have
>>>barfed all over the place.
>>>
>>>One other thing you can do. Go to the admin/analysis
>>>page and turn on the "verbose" check box. Put
>>>management is undergoing many changes
>>>in both the query and index boxes. The result (it's
>>>kind of hard to read I'll admit) will include the position
>>>of each token after all the analysis is done. Phrase
>>>queries (without slop) should only be matching adjacent
>>>positions. So the question is whether the position info
>>>"looks correct"
>>>
>>>Best,
>>>Erick
>>>
>>>On Tue, Jun 16, 2015 at 4:40 AM, Alessandro Benedetti
>>> wrote:
 According to your debug you are using a default Lucene Query Parser.
 This surprise me as i would expect with that query a match with
distance 0
 between the 2 terms .

 Are you sure nothing else is that field that matches the phrase query
?

 From the documentation

 "Lucene supports finding words are a within a specific distance away.
To do
 a proximity search use the tilde, "~", symbol at the end of a Phrase.
For
 example to search for a "apache" and "jakarta" within 10 words of each
 other in a document use the search:

 "jakarta apache"~10 "


 Cheers


 2015-06-16 11:33 GMT+01:00 Alistair Young :

> it¹s a useful behaviour. I¹d just like to understand where it¹s
>deciding
> the document is relevant. debug output is:
>
> 
>   dc.description:"manage change"
>   dc.description:"manage change"
>   PhraseQuery(dc.description:"manag
>chang")
>   dc.description:"manag chang"
>   
> 
> 1.2008798 = (MATCH) weight(dc.description:"manag chang" in 221)
> [DefaultSimilarity], result of:
>   1.2008798 = fieldWeight in 221, product of:
> 1.0 = tf(freq=1.0), with freq of:
>   1.0 = phraseFreq=1.0
> 9.6070385 = idf(), sum of:
>   4.0365543 = idf(docFreq=101, maxDocs=2125)
>   5.5704846 = idf(docFreq=21, maxDocs=2125)
> 0.125 = fieldNorm(doc=221)
> 
>   
>   LuceneQParser
>   
> 41.0
> 
>   3.0
>   
> 0.0
>   
>   
> 0.0
>   
>   
> 0.0
>   
>   
> 0.0
>   
>   
> 0.0
>   
>   
> 0.0
>   
> 
> 
>   35.0
>   
> 0.0
>   
>   
> 0.0
>   
>   
> 0.0
>   
>   
> 0.0
>   
>   
> 0.0
>   
>   
> 35.0
>   
> 
>   
> 
>
>
> thanks,
>
> Alistair
>
> --
> mov eax,1
> mov ebx,0
> int 80h
>
>
>
>
> On 16/06/2015 1

Re: phrase matches returning near matches

2015-06-16 Thread Erick Erickson
Hmmm. First, highlighting should work here. If you have it configured
to work  on the dc.description field.

As to whether the phrase "management changes" is near enough, I
pretty much guarantee it is. This is where the admin/analysis page can
answer this type of question authoritatively since it's based exactly
on your particular analysis chain.

Best,
Erick

On Tue, Jun 16, 2015 at 8:25 AM, Alistair Young
 wrote:
> yes prolly not a bug. The highlighting is on but nothing is highlighted.
> Perhaps this text is triggering it?
>
> 'consider the impacts of land management changes’
>
> that would seem reasonable. It’s not a direct match so no highlighting
> (the highlighting does work on a direct match) but 'management changes’
> must be near enough ‘manage change’ to trigger a result.
>
> Alistair
>
> --
> mov eax,1
> mov ebx,0
> int 80h
>
>
>
>
> On 16/06/2015 16:18, "Erick Erickson"  wrote:
>
>>I agree with Allesandro the behavior you're describing
>>is _not_ correct at all given your description. So either
>>
>>1> There's something "interesting" about your configuration
>>  that doesn't seem important that you haven't told us,
>>  although what it could be is a mystery to me  too ;)
>>
>>2> it's matching on something else. Note that the
>> phrase has been stemmed, so something in there
>> besides management might stem to manag and/or
>>something other than changes might stem to chang
>>and the two of _them_ happen to be next to each
>>other. "are managers changing?" for instance. Or
>>even something less likely. Perhaps turn on
>>highlighting and see if it pops out?
>>
>>
>>3> you've uncovered a bug. Although I suspect others
>>would have reported it and the unit tests would have
>>barfed all over the place.
>>
>>One other thing you can do. Go to the admin/analysis
>>page and turn on the "verbose" check box. Put
>>management is undergoing many changes
>>in both the query and index boxes. The result (it's
>>kind of hard to read I'll admit) will include the position
>>of each token after all the analysis is done. Phrase
>>queries (without slop) should only be matching adjacent
>>positions. So the question is whether the position info
>>"looks correct"
>>
>>Best,
>>Erick
>>
>>On Tue, Jun 16, 2015 at 4:40 AM, Alessandro Benedetti
>> wrote:
>>> According to your debug you are using a default Lucene Query Parser.
>>> This surprise me as i would expect with that query a match with
>>>distance 0
>>> between the 2 terms .
>>>
>>> Are you sure nothing else is that field that matches the phrase query ?
>>>
>>> From the documentation
>>>
>>> "Lucene supports finding words are a within a specific distance away.
>>>To do
>>> a proximity search use the tilde, "~", symbol at the end of a Phrase.
>>>For
>>> example to search for a "apache" and "jakarta" within 10 words of each
>>> other in a document use the search:
>>>
>>> "jakarta apache"~10 "
>>>
>>>
>>> Cheers
>>>
>>>
>>> 2015-06-16 11:33 GMT+01:00 Alistair Young :
>>>
 it¹s a useful behaviour. I¹d just like to understand where it¹s
deciding
 the document is relevant. debug output is:

 
   dc.description:"manage change"
   dc.description:"manage change"
   PhraseQuery(dc.description:"manag
chang")
   dc.description:"manag chang"
   
 
 1.2008798 = (MATCH) weight(dc.description:"manag chang" in 221)
 [DefaultSimilarity], result of:
   1.2008798 = fieldWeight in 221, product of:
 1.0 = tf(freq=1.0), with freq of:
   1.0 = phraseFreq=1.0
 9.6070385 = idf(), sum of:
   4.0365543 = idf(docFreq=101, maxDocs=2125)
   5.5704846 = idf(docFreq=21, maxDocs=2125)
 0.125 = fieldNorm(doc=221)
 
   
   LuceneQParser
   
 41.0
 
   3.0
   
 0.0
   
   
 0.0
   
   
 0.0
   
   
 0.0
   
   
 0.0
   
   
 0.0
   
 
 
   35.0
   
 0.0
   
   
 0.0
   
   
 0.0
   
   
 0.0
   
   
 0.0
   
   
 35.0
   
 
   
 


 thanks,

 Alistair

 --
 mov eax,1
 mov ebx,0
 int 80h




 On 16/06/2015 11:26, "Alessandro Benedetti"

 wrote:

 >Can you show us how the query is parsed ?
 >You didn't tell us nothing about the query parser you are using.
 >Enable the debugQuery=true will show you how the query is parsed and
this
 >will be quite useful for us.
 >
 >
 >Cheers
 >
 >2015-06-16 11:22 GMT+01:00 Alistair Young :
 >
 >> Hiya,
 >>
 >> I've been looking for documentation that would point to where I
coul

Re: phrase matches returning near matches

2015-06-16 Thread Alistair Young
yes prolly not a bug. The highlighting is on but nothing is highlighted.
Perhaps this text is triggering it?

'consider the impacts of land management changes’

that would seem reasonable. It’s not a direct match so no highlighting
(the highlighting does work on a direct match) but 'management changes’
must be near enough ‘manage change’ to trigger a result.

Alistair

-- 
mov eax,1
mov ebx,0
int 80h




On 16/06/2015 16:18, "Erick Erickson"  wrote:

>I agree with Allesandro the behavior you're describing
>is _not_ correct at all given your description. So either
>
>1> There's something "interesting" about your configuration
>  that doesn't seem important that you haven't told us,
>  although what it could be is a mystery to me  too ;)
>
>2> it's matching on something else. Note that the
> phrase has been stemmed, so something in there
> besides management might stem to manag and/or
>something other than changes might stem to chang
>and the two of _them_ happen to be next to each
>other. "are managers changing?" for instance. Or
>even something less likely. Perhaps turn on
>highlighting and see if it pops out?
>
>
>3> you've uncovered a bug. Although I suspect others
>would have reported it and the unit tests would have
>barfed all over the place.
>
>One other thing you can do. Go to the admin/analysis
>page and turn on the "verbose" check box. Put
>management is undergoing many changes
>in both the query and index boxes. The result (it's
>kind of hard to read I'll admit) will include the position
>of each token after all the analysis is done. Phrase
>queries (without slop) should only be matching adjacent
>positions. So the question is whether the position info
>"looks correct"
>
>Best,
>Erick
>
>On Tue, Jun 16, 2015 at 4:40 AM, Alessandro Benedetti
> wrote:
>> According to your debug you are using a default Lucene Query Parser.
>> This surprise me as i would expect with that query a match with
>>distance 0
>> between the 2 terms .
>>
>> Are you sure nothing else is that field that matches the phrase query ?
>>
>> From the documentation
>>
>> "Lucene supports finding words are a within a specific distance away.
>>To do
>> a proximity search use the tilde, "~", symbol at the end of a Phrase.
>>For
>> example to search for a "apache" and "jakarta" within 10 words of each
>> other in a document use the search:
>>
>> "jakarta apache"~10 "
>>
>>
>> Cheers
>>
>>
>> 2015-06-16 11:33 GMT+01:00 Alistair Young :
>>
>>> it¹s a useful behaviour. I¹d just like to understand where it¹s
>>>deciding
>>> the document is relevant. debug output is:
>>>
>>> 
>>>   dc.description:"manage change"
>>>   dc.description:"manage change"
>>>   PhraseQuery(dc.description:"manag
>>>chang")
>>>   dc.description:"manag chang"
>>>   
>>> 
>>> 1.2008798 = (MATCH) weight(dc.description:"manag chang" in 221)
>>> [DefaultSimilarity], result of:
>>>   1.2008798 = fieldWeight in 221, product of:
>>> 1.0 = tf(freq=1.0), with freq of:
>>>   1.0 = phraseFreq=1.0
>>> 9.6070385 = idf(), sum of:
>>>   4.0365543 = idf(docFreq=101, maxDocs=2125)
>>>   5.5704846 = idf(docFreq=21, maxDocs=2125)
>>> 0.125 = fieldNorm(doc=221)
>>> 
>>>   
>>>   LuceneQParser
>>>   
>>> 41.0
>>> 
>>>   3.0
>>>   
>>> 0.0
>>>   
>>>   
>>> 0.0
>>>   
>>>   
>>> 0.0
>>>   
>>>   
>>> 0.0
>>>   
>>>   
>>> 0.0
>>>   
>>>   
>>> 0.0
>>>   
>>> 
>>> 
>>>   35.0
>>>   
>>> 0.0
>>>   
>>>   
>>> 0.0
>>>   
>>>   
>>> 0.0
>>>   
>>>   
>>> 0.0
>>>   
>>>   
>>> 0.0
>>>   
>>>   
>>> 35.0
>>>   
>>> 
>>>   
>>> 
>>>
>>>
>>> thanks,
>>>
>>> Alistair
>>>
>>> --
>>> mov eax,1
>>> mov ebx,0
>>> int 80h
>>>
>>>
>>>
>>>
>>> On 16/06/2015 11:26, "Alessandro Benedetti"
>>>
>>> wrote:
>>>
>>> >Can you show us how the query is parsed ?
>>> >You didn't tell us nothing about the query parser you are using.
>>> >Enable the debugQuery=true will show you how the query is parsed and
>>>this
>>> >will be quite useful for us.
>>> >
>>> >
>>> >Cheers
>>> >
>>> >2015-06-16 11:22 GMT+01:00 Alistair Young :
>>> >
>>> >> Hiya,
>>> >>
>>> >> I've been looking for documentation that would point to where I
>>>could
>>> >> modify or explain why 'near neighbours' are returned from a phrase
>>> >>search.
>>> >> If I search for:
>>> >>
>>> >> "manage change"
>>> >>
>>> >> I get back a document that contains "this will help in your
>>>management
>>> >>of
>>> >>  changes". It's relevant but I'd like to
>>>understand
>>> >>why
>>> >> solr is returning it. Is it a combination of fuzzy/slop? The
>>>distance
>>> >> between the two variations of the two words in the document is quite
>>> >>large.
>>> >>
>>> >> thanks,
>>> >>
>>> >> Alistair
>>> >>
>>> >> --
>>> >> mov eax,1
>>> >> mov ebx,0
>>> >> int 80h
>>> >>
>>> >
>>> >
>>

Re: mapreduce job using soirj 5

2015-06-16 Thread Erick Erickson
Sounds like a question better asked in one of the Cloudera support
forums, 'cause all I can do is guess ;).

I suppose, theoretically, that you could check out the Solr5
code and substitute the httpclient-4.2.5.jar in the build system,
recompile and go, but that's totally a guess based on zero knowledge
of whether compiling Solr with an earlier httpclient would even work.
Frankly, though, that sounds like more work than distributing the older
jar to the data nodes.

Best,
Erick

On Tue, Jun 16, 2015 at 7:23 AM, adfel70  wrote:
> Hi,
>
> We recently started testing solr 5, our indexer creates mapreduce job that
> uses solrj5 to index documents to our SolrCloud. Until now, we used solr
> 4.10.3 with solrj 4.8.0. Our hadoop dist is cloudera 5.
>
> The problem is, solrj5 is using httpclient-4.3.1 while hadoop is installed
> with httpclient-4.2.5
> and that causing us jar-hell because hadoop jars are being loaded first and
> solrj is using closeablehttpclient class which is in 4.3.1 but not in 4.2.5
>
> Does anyone encounter that? and have a solution? or a workaround?
>
> Right now we are replacing the jar physically in each data node
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/mapreduce-job-using-soirj-5-tp4212199.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: phrase matches returning near matches

2015-06-16 Thread Erick Erickson
I agree with Allesandro the behavior you're describing
is _not_ correct at all given your description. So either

1> There's something "interesting" about your configuration
   that doesn't seem important that you haven't told us,
   although what it could be is a mystery to me too ;)

2> It's matching on something else. Note that the
   phrase has been stemmed, so something in there
   besides management might stem to manag and/or
   something other than changes might stem to chang,
   and the two of _them_ happen to be next to each
   other. "are managers changing?" for instance. Or
   even something less likely. Perhaps turn on
   highlighting and see if it pops out?


3> you've uncovered a bug. Although I suspect others
would have reported it and the unit tests would have
barfed all over the place.

One other thing you can do. Go to the admin/analysis
page and turn on the "verbose" check box. Put
management is undergoing many changes
in both the query and index boxes. The result (it's
kind of hard to read I'll admit) will include the position
of each token after all the analysis is done. Phrase
queries (without slop) should only be matching adjacent
positions. So the question is whether the position info
"looks correct"

Best,
Erick

On Tue, Jun 16, 2015 at 4:40 AM, Alessandro Benedetti
 wrote:
> According to your debug you are using the default Lucene query parser.
> This surprises me, as I would expect that query to match with distance 0
> between the 2 terms.
>
> Are you sure nothing else in that field matches the phrase query?
>
> From the documentation
>
> "Lucene supports finding words are a within a specific distance away. To do
> a proximity search use the tilde, "~", symbol at the end of a Phrase. For
> example to search for a "apache" and "jakarta" within 10 words of each
> other in a document use the search:
>
> "jakarta apache"~10 "
>
>
> Cheers
>
>
> 2015-06-16 11:33 GMT+01:00 Alistair Young :
>
>> it¹s a useful behaviour. I¹d just like to understand where it¹s deciding
>> the document is relevant. debug output is:
>>
>> 
>>   dc.description:"manage change"
>>   dc.description:"manage change"
>>   PhraseQuery(dc.description:"manag chang")
>>   dc.description:"manag chang"
>>   
>> 
>> 1.2008798 = (MATCH) weight(dc.description:"manag chang" in 221)
>> [DefaultSimilarity], result of:
>>   1.2008798 = fieldWeight in 221, product of:
>> 1.0 = tf(freq=1.0), with freq of:
>>   1.0 = phraseFreq=1.0
>> 9.6070385 = idf(), sum of:
>>   4.0365543 = idf(docFreq=101, maxDocs=2125)
>>   5.5704846 = idf(docFreq=21, maxDocs=2125)
>> 0.125 = fieldNorm(doc=221)
>> 
>>   
>>   LuceneQParser
>>   
>> 41.0
>> 
>>   3.0
>>   
>> 0.0
>>   
>>   
>> 0.0
>>   
>>   
>> 0.0
>>   
>>   
>> 0.0
>>   
>>   
>> 0.0
>>   
>>   
>> 0.0
>>   
>> 
>> 
>>   35.0
>>   
>> 0.0
>>   
>>   
>> 0.0
>>   
>>   
>> 0.0
>>   
>>   
>> 0.0
>>   
>>   
>> 0.0
>>   
>>   
>> 35.0
>>   
>> 
>>   
>> 
>>
>>
>> thanks,
>>
>> Alistair
>>
>> --
>> mov eax,1
>> mov ebx,0
>> int 80h
>>
>>
>>
>>
>> On 16/06/2015 11:26, "Alessandro Benedetti" 
>> wrote:
>>
>> >Can you show us how the query is parsed ?
>> >You didn't tell us nothing about the query parser you are using.
>> >Enable the debugQuery=true will show you how the query is parsed and this
>> >will be quite useful for us.
>> >
>> >
>> >Cheers
>> >
>> >2015-06-16 11:22 GMT+01:00 Alistair Young :
>> >
>> >> Hiya,
>> >>
>> >> I've been looking for documentation that would point to where I could
>> >> modify or explain why 'near neighbours' are returned from a phrase
>> >>search.
>> >> If I search for:
>> >>
>> >> "manage change"
>> >>
>> >> I get back a document that contains "this will help in your management
>> >>of
>> >>  changes". It's relevant but I'd like to understand
>> >>why
>> >> solr is returning it. Is it a combination of fuzzy/slop? The distance
>> >> between the two variations of the two words in the document is quite
>> >>large.
>> >>
>> >> thanks,
>> >>
>> >> Alistair
>> >>
>> >> --
>> >> mov eax,1
>> >> mov ebx,0
>> >> int 80h
>> >>
>> >
>> >
>> >
>> >--
>> >--
>> >
>> >Benedetti Alessandro
>> >Visiting card : http://about.me/alessandro_benedetti
>> >
>> >"Tyger, tyger burning bright
>> >In the forests of the night,
>> >What immortal hand or eye
>> >Could frame thy fearful symmetry?"
>> >
>> >William Blake - Songs of Experience -1794 England
>>
>>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
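The adjacency rule Erick describes can be sketched with a toy positional index (slop 0; real Lucene phrase scoring is more involved than this):

```python
# Toy positional index: a no-slop phrase query matches only when the
# terms occur at consecutive positions in the field.
def positions(tokens):
    index = {}
    for pos, tok in enumerate(tokens):
        index.setdefault(tok, []).append(pos)
    return index

def phrase_match(index, terms):
    return any(
        all(start + i in index.get(term, []) for i, term in enumerate(terms))
        for start in index.get(terms[0], [])
    )

doc = positions(["impact", "of", "land", "manag", "chang"])  # stemmed field
print(phrase_match(doc, ["manag", "chang"]))  # True: positions 3 and 4
print(phrase_match(doc, ["land", "chang"]))   # False: positions 2 and 4
```

This is why "land management changes" matches the stemmed phrase "manag chang": the two stems land at adjacent positions.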


Re: Do we need to add docValues="true" to "_version_" field in schema.xml?

2015-06-16 Thread Erick Erickson
Did you look in the example schema files? None of them have
_version_ set as docValues.

Best,
Erick

On Tue, Jun 16, 2015 at 1:44 AM, forest_soup  wrote:
> For the "_version_" field in the schema.xml, do we need to set it be
> docValues="true"?
>
>
> As we noticed there are FieldCache for "_version_" in the solr stats:
> 
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Do-we-need-to-add-docValues-true-to-version-field-in-schema-xml-tp4212123.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr's suggester results

2015-06-16 Thread Erick Erickson
The suggesters are built to return whole fields. You _might_
be able to add multiple fragments to a multiValued
entry and get fragments back; I haven't tried that, though,
and I suspect you'd actually get the same thing.

This is an XY problem IMO. Please describe exactly what
you're trying to accomplish, with examples, rather than
continuing to pursue this path. It sounds like you want
spellcheck or similar. The _point_ behind the
suggesters is that they handle multiple-word suggestions
by returning the whole field. So putting long text fields
into them is not going to work.

Best,
Erick

On Tue, Jun 16, 2015 at 1:46 AM, Alessandro Benedetti
 wrote:
> Replies inline:
>
> 2015-06-16 4:43 GMT+01:00 Zheng Lin Edwin Yeo :
>
>> Thanks Benedetti,
>>
>> I've change to the AnalyzingInfixLookup approach, and it is able to start
>> searching from the middle of the field.
>>
>> However, is it possible to make the suggester to show only part of the
>> content of the field (like 2 or 3 fields after), instead of the entire
>> content/sentence, which can be quite long?
>>
>
> I assume you mean "words" in place of tokens.
> The answer is yes; I already said that in my previous mail. I invite you to
> read the answers and the linked documentation carefully!
>
> Regarding the excessive length of the tokens: this is weird. What are you
> trying to autocomplete?
> I really doubt it would be useful for a user to see super long autocompleted
> terms.
>
> Cheers
>
>>
>>
>> Regards,
>> Edwin
>>
>>
>>
>> On 15 June 2015 at 17:33, Alessandro Benedetti > >
>> wrote:
>>
>> > ehehe Edwin, I think you should read again the document I linked time
>> ago :
>> >
>> > http://lucidworks.com/blog/solr-suggester/
>> >
>> > The suggester you used is not meant to provide infix suggestions.
>> > The fuzzy suggester is working on a fuzzy basis , with the *starting*
>> terms
>> > of a field content.
>> >
>> > What you are looking for is actually one of the Infix Suggesters.
>> > For example the AnalyzingInfixLookup approach.
>> >
>> > When working with Suggesters is important first to make a distinction :
>> >
>> > 1) Returning the full content of the field ( analysisInfix or Fuzzy)
>> >
>> > 2) Returning token(s) ( Free Text Suggester)
>> >
>> > Then the second difference is :
>> >
>> > 1) Infix suggestions ( from the "middle" of the field content)
>> > 2) Classic suggester ( from the beginning of the field content)
>> >
>> > Clarified that, will be quite simple to work with suggesters.
>> >
>> > Cheers
>> >
>> > 2015-06-15 9:28 GMT+01:00 Zheng Lin Edwin Yeo :
>> >
>> > > I've indexed a rich-text documents with the following content:
>> > >
>> > > This is a testing rich text documents to test the uploading of files to
>> > > Solr
>> > >
>> > >
>> > > When I tried to use the suggestion, it return me the entire field in
>> the
>> > > content once I enter suggest?q=t. However, when I tried to search for
>> > > q='rich', I don't get any results returned.
>> > >
>> > > This is my current configuration for the suggester:
>> > > 
>> > >   
>> > > mySuggester
>> > > FuzzyLookupFactory
>> > > DocumentDictionaryFactory
>> > > Suggestion
>> > > suggestType
>> > > true
>> > > false
>> > >   
>> > > 
>> > >
>> > > > > startup="lazy" >
>> > >   
>> > > json
>> > > true
>> > >
>> > > true
>> > > 10
>> > > mySuggester
>> > >   
>> > >   
>> > > suggest
>> > >   
>> > > 
>> > >
>> > > Is it possible to allow the suggester to return something even from the
>> > > middle of the sentence, and also not to return the entire sentence if
>> the
>> > > sentence. Perhaps it should just suggest the next 2 or 3 fields, and to
>> > > return more fields as the users type.
>> > >
>> > > For example,
>> > > When user type 'this', it should return 'This is a testing'
>> > > When user type 'this is a testing', it should return 'This is a testing
>> > > rich text documents'.
>> > >
>> > >
>> > > Regards,
>> > > Edwin
>> > >
>> >
>> >
>> >
>> > --
>> > --
>> >
>> > Benedetti Alessandro
>> > Visiting card : http://about.me/alessandro_benedetti
>> >
>> > "Tyger, tyger burning bright
>> > In the forests of the night,
>> > What immortal hand or eye
>> > Could frame thy fearful symmetry?"
>> >
>> > William Blake - Songs of Experience -1794 England
>> >
>>
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
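Since the infix suggesters return the whole stored field value, trimming to "the next two or three words" has to happen client-side. A minimal sketch with illustrative names only (this is not a Solr API):

```python
# Client-side trim of a suggester payload: keep the text up to the last
# query word plus a few following words, instead of the whole field.
def trim_suggestion(query, suggestion, extra_words=3):
    q_last = query.lower().split()[-1]
    s_tokens = suggestion.split()
    lowered = [t.lower() for t in s_tokens]
    end = lowered.index(q_last) + 1 if q_last in lowered else 0
    return " ".join(s_tokens[:end + extra_words])

full = "This is a testing rich text documents to test the uploading of files"
print(trim_suggestion("this", full))
# -> This is a testing
print(trim_suggestion("this is a testing", full))
# -> This is a testing rich text documents
```

This matches Edwin's two examples: the suggestion grows as the user types, without ever showing the entire long field.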


Re: How to create concatenated token

2015-06-16 Thread Aman Tandon
>
> e.g. Intent for solr training: fq=id: 234, 456, 545 title("solr training")


Typo correction:
e.g. Intent for solr training: fq=id:(234 456 545) title:("solr training")

With Regards
Aman Tandon

On Tue, Jun 16, 2015 at 8:13 PM, Aman Tandon 
wrote:

> We have some business logic to search the user query in "user intent" or
> "finding the exact matching products".
>
> e.g. Intent for solr training: fq=id: 234, 456, 545 title("solr training")
>
> As we can see it is a phrase query, so it will take more time than a single
> stemmed-token query. There are also 5-7 word phrase queries. So we want to
> reduce the search time by implementing this feature.
>
> With Regards
> Aman Tandon
>
> On Tue, Jun 16, 2015 at 6:42 PM, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
>
>> Can I ask you why you need to concatenate the tokens ? Maybe we can find a
>> better solution to concat all the tokens in one single big token .
>> I find it difficult to understand the reasons behind tokenising, token
>> filtering and then un-tokenizing again :)
>> It would be great if you explain a little bit better what you would like
>> to
>> do !
>>
>>
>> Cheers
>>
>> 2015-06-16 13:26 GMT+01:00 Aman Tandon :
>>
>> > Hi,
>> >
>> > I have a requirement to create the concatenated token of all the tokens
>> > created from the last item of my analyzer chain.
>> >
>> > *Suppose my analyzer chain is :*
>> >
>> >
>> >
>> >
>> >
>> > *   > > class="solr.WordDelimiterFilterFactory" catenateAll="1"
>> splitOnNumerics="1"
>> > preserveOriginal="1"/>> > minGramSize="2" maxGramSize="15" side="front" />> > class="solr.PorterStemmerFilterFactory"/>*
>> > I want to create a concatenated token plugin to add at concatenated
>> token
>> > along with the last token.
>> >
>> > e.g. Solr training
>> >
>> > *Porter:-*  "solr"  "train"
>> >   Position 1 2
>> >
>> > *Concatenated :-*   "solr"  "train"
>> >"solrtrain"
>> >Position 1  2
>> >
>> > Please help me out. How to create custom filter for this requirement.
>> >
>> > With Regards
>> > Aman Tandon
>> >
>>
>>
>>
>> --
>> --
>>
>> Benedetti Alessandro
>> Visiting card : http://about.me/alessandro_benedetti
>>
>> "Tyger, tyger burning bright
>> In the forests of the night,
>> What immortal hand or eye
>> Could frame thy fearful symmetry?"
>>
>> William Blake - Songs of Experience -1794 England
>>
>
>
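The token stream Aman describes (the original stemmed tokens plus one concatenated token) would be implemented as a Lucene TokenFilter in Java; as a language-neutral model of the desired output (a sketch of the stream transformation, not the Lucene API):

```python
# Model of the desired filter output: pass tokens through, then append
# one extra token that concatenates them all.
def concat_filter(tokens):
    out = list(tokens)
    if len(out) > 1:
        out.append("".join(out))  # e.g. "solr", "train" -> "solrtrain"
    return out

print(concat_filter(["solr", "train"]))  # ['solr', 'train', 'solrtrain']
```

A real implementation would buffer tokens in incrementToken() and emit the concatenated token once the upstream stream is exhausted, so the single-token query can replace the slower phrase query.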


Re: How to create concatenated token

2015-06-16 Thread Aman Tandon
We have some business logic to search the user query in "user intent" or
"finding the exact matching products".

e.g. Intent for solr training: fq=id: 234, 456, 545 title("solr training")

As we can see it is phrase query so it will took more time than the single
stemmed token query. There are also 5-7 words phrase query. So we want to
reduce the search time by implementing this feature.

With Regards
Aman Tandon

On Tue, Jun 16, 2015 at 6:42 PM, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> Can I ask you why you need to concatenate the tokens ? Maybe we can find a
> better solution to concat all the tokens in one single big token .
> I find it difficult to understand the reasons behind tokenising, token
> filtering and then un-tokenizing again :)
> It would be great if you explain a little bit better what you would like to
> do !
>
>
> Cheers
>
> 2015-06-16 13:26 GMT+01:00 Aman Tandon :
>
> > Hi,
> >
> > I have a requirement to create the concatenated token of all the tokens
> > created from the last item of my analyzer chain.
> >
> > *Suppose my analyzer chain is :*
> >
> >
> >
> >
> >
> > *<filter class="solr.WordDelimiterFilterFactory" catenateAll="1"
> > splitOnNumerics="1" preserveOriginal="1"/> <filter
> > class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"
> > side="front"/> <filter class="solr.PorterStemmerFilterFactory"/>*
> > I want to create a concatenated token plugin to add at concatenated token
> > along with the last token.
> >
> > e.g. Solr training
> >
> > *Porter:-*  "solr"  "train"
> >   Position 1 2
> >
> > *Concatenated :-*   "solr"  "train"
> >"solrtrain"
> >Position 1  2
> >
> > Please help me out. How to create custom filter for this requirement.
> >
> > With Regards
> > Aman Tandon
> >
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


mapreduce job using soirj 5

2015-06-16 Thread adfel70
Hi, 

We recently started testing Solr 5. Our indexer creates a mapreduce job that
uses solrj 5 to index documents into our SolrCloud. Until now, we used Solr
4.10.3 with solrj 4.8.0. Our Hadoop distribution is Cloudera 5.

The problem is that solrj 5 uses httpclient-4.3.1 while Hadoop is installed
with httpclient-4.2.5, and that causes jar-hell for us: the Hadoop jars are
loaded first, and solrj uses the CloseableHttpClient class, which exists in
4.3.1 but not in 4.2.5.

Has anyone encountered this? Is there a solution, or a workaround?

Right now we are replacing the jar physically on each data node.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/mapreduce-job-using-soirj-5-tp4212199.html
Sent from the Solr - User mailing list archive at Nabble.com.


Highlight in Velocity UI on Google Chrome

2015-06-16 Thread Sznajder ForMailingList
Hi,

I was testing the highlight feature and played with the techproducts
example.
It appears that the highlighting works on Mozilla Firefox, but not on
Google Chrome.

For your information

Benjamin


Re: How to create concatenated token

2015-06-16 Thread Alessandro Benedetti
Can I ask why you need to concatenate the tokens? Maybe we can find a
better solution than concatenating all the tokens into one single big token.
I find it difficult to understand the reasons behind tokenising, token
filtering and then un-tokenizing again :)
It would be great if you could explain a little better what you would like to
do!


Cheers

2015-06-16 13:26 GMT+01:00 Aman Tandon :

> Hi,
>
> I have a requirement to create the concatenated token of all the tokens
> created from the last item of my analyzer chain.
>
> *Suppose my analyzer chain is :*
>
>
>
>
>
> *<filter class="solr.WordDelimiterFilterFactory" catenateAll="1"
> splitOnNumerics="1" preserveOriginal="1"/> <filter
> class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"
> side="front"/> <filter class="solr.PorterStemmerFilterFactory"/>*
> I want to create a concatenated token plugin to add at concatenated token
> along with the last token.
>
> e.g. Solr training
>
> *Porter:-*  "solr"  "train"
>   Position 1 2
>
> *Concatenated :-*   "solr"  "train"
>"solrtrain"
>Position 1  2
>
> Please help me out. How to create custom filter for this requirement.
>
> With Regards
> Aman Tandon
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


How to create concatenated token

2015-06-16 Thread Aman Tandon
Hi,

I have a requirement to create the concatenated token of all the tokens
created from the last item of my analyzer chain.

*Suppose my analyzer chain is :*





*<filter class="solr.WordDelimiterFilterFactory" catenateAll="1"
splitOnNumerics="1" preserveOriginal="1"/> <filter
class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"
side="front"/> <filter class="solr.PorterStemmerFilterFactory"/>*
I want to create a concatenated token plugin to add at concatenated token
along with the last token.

e.g. Solr training

*Porter:-*  "solr"  "train"
  Position 1 2

*Concatenated :-*   "solr"  "train"
   "solrtrain"
   Position 1  2

Please help me out. How to create custom filter for this requirement.

With Regards
Aman Tandon
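
For what it's worth, the requested stream transformation can be modelled in
a few lines. A real implementation would be a Java TokenFilter plus a
TokenFilterFactory registered in the schema; this sketch only illustrates
the intended behaviour, and `concatenate_all` is an illustrative name, not
any Solr API:

```python
def concatenate_all(tokens):
    """Emit every token unchanged, then append one extra token that is
    the concatenation of the whole stream (the behaviour requested above)."""
    if not tokens:
        return []
    return tokens + ["".join(tokens)]

# After stemming, "Solr training" analyzes to ["solr", "train"];
# the proposed filter would add "solrtrain" at the next position.
print(concatenate_all(["solr", "train"]))
```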


Re: solr/lucene index merge and optimize performance improvement

2015-06-16 Thread Toke Eskildsen
Shenghua(Daniel) Wan  wrote:
> Actually, I am currently interested in how to boost merging/optimizing
> performance of single solr instance.

We have the same challenge (we build static 900GB shards one at a time and the 
final optimization takes 8 hours with only 1 CPU core at 100%). I know that 
there is code for detecting SSDs, which should make merging faster (by running 
more merges in parallel?), but I am afraid that optimize (a single merge) is 
always single threaded.

It seems to me that at least some of the different files making up a segment 
could be created in parallel, but I do not know how hard it would be to do so.

- Toke Eskildsen


Re: phrase matches returning near matches

2015-06-16 Thread Alessandro Benedetti
According to your debug output, you are using the default Lucene query
parser.
This surprises me, as I would expect that query to allow only matches with
distance 0 between the two terms.

Are you sure there is nothing else in that field that matches the phrase
query?

From the documentation:

"Lucene supports finding words within a specific distance away. To do
a proximity search use the tilde, "~", symbol at the end of a Phrase. For
example to search for a "apache" and "jakarta" within 10 words of each
other in a document use the search:

"jakarta apache"~10 "


Cheers


2015-06-16 11:33 GMT+01:00 Alistair Young :

> it's a useful behaviour. I'd just like to understand where it's deciding
> the document is relevant. debug output is:
>
> 
>   dc.description:"manage change"
>   dc.description:"manage change"
>   PhraseQuery(dc.description:"manag chang")
>   dc.description:"manag chang"
>   
> 
> 1.2008798 = (MATCH) weight(dc.description:"manag chang" in 221)
> [DefaultSimilarity], result of:
>   1.2008798 = fieldWeight in 221, product of:
> 1.0 = tf(freq=1.0), with freq of:
>   1.0 = phraseFreq=1.0
> 9.6070385 = idf(), sum of:
>   4.0365543 = idf(docFreq=101, maxDocs=2125)
>   5.5704846 = idf(docFreq=21, maxDocs=2125)
> 0.125 = fieldNorm(doc=221)
> 
>   
>   LuceneQParser
>   
> 41.0
> 
>   3.0
>   
> 0.0
>   
>   
> 0.0
>   
>   
> 0.0
>   
>   
> 0.0
>   
>   
> 0.0
>   
>   
> 0.0
>   
> 
> 
>   35.0
>   
> 0.0
>   
>   
> 0.0
>   
>   
> 0.0
>   
>   
> 0.0
>   
>   
> 0.0
>   
>   
> 35.0
>   
> 
>   
> 
>
>
> thanks,
>
> Alistair
>
> --
> mov eax,1
> mov ebx,0
> int 80h
>
>
>
>
> On 16/06/2015 11:26, "Alessandro Benedetti" 
> wrote:
>
> >Can you show us how the query is parsed?
> >You didn't tell us anything about the query parser you are using.
> >Enabling debugQuery=true will show you how the query is parsed, and that
> >will be quite useful for us.
> >
> >
> >Cheers
> >
> >2015-06-16 11:22 GMT+01:00 Alistair Young :
> >
> >> Hiya,
> >>
> >> I've been looking for documentation that would point to where I could
> >> modify or explain why 'near neighbours' are returned from a phrase
> >>search.
> >> If I search for:
> >>
> >> "manage change"
> >>
> >> I get back a document that contains "this will help in your management
> >>of
> >>  changes". It's relevant but I'd like to understand
> >>why
> >> solr is returning it. Is it a combination of fuzzy/slop? The distance
> >> between the two variations of the two words in the document is quite
> >>large.
> >>
> >> thanks,
> >>
> >> Alistair
> >>
> >> --
> >> mov eax,1
> >> mov ebx,0
> >> int 80h
> >>
> >
> >
> >
> >--
> >--
> >
> >Benedetti Alessandro
> >Visiting card : http://about.me/alessandro_benedetti
> >
> >"Tyger, tyger burning bright
> >In the forests of the night,
> >What immortal hand or eye
> >Could frame thy fearful symmetry?"
> >
> >William Blake - Songs of Experience -1794 England
>
>


-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: phrase matches returning near matches

2015-06-16 Thread Alistair Young
it's a useful behaviour. I'd just like to understand where it's deciding
the document is relevant. debug output is:


<lst name="debug">
  <str name="rawquerystring">dc.description:"manage change"</str>
  <str name="querystring">dc.description:"manage change"</str>
  <str name="parsedquery">PhraseQuery(dc.description:"manag chang")</str>
  <str name="parsedquery_toString">dc.description:"manag chang"</str>
  <lst name="explain">
    <str>
1.2008798 = (MATCH) weight(dc.description:"manag chang" in 221)
[DefaultSimilarity], result of:
  1.2008798 = fieldWeight in 221, product of:
    1.0 = tf(freq=1.0), with freq of:
      1.0 = phraseFreq=1.0
    9.6070385 = idf(), sum of:
      4.0365543 = idf(docFreq=101, maxDocs=2125)
      5.5704846 = idf(docFreq=21, maxDocs=2125)
    0.125 = fieldNorm(doc=221)
    </str>
  </lst>
  <str name="QParser">LuceneQParser</str>
  <lst name="timing">
    <double name="time">41.0</double>
    <lst name="prepare">
      <double name="time">3.0</double>
      <lst name="org.apache.solr.handler.component.QueryComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.FacetComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.HighlightComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.StatsComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.DebugComponent">
        <double name="time">0.0</double>
      </lst>
    </lst>
    <lst name="process">
      <double name="time">35.0</double>
      <lst name="org.apache.solr.handler.component.QueryComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.FacetComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.HighlightComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.StatsComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.DebugComponent">
        <double name="time">35.0</double>
      </lst>
    </lst>
  </lst>
</lst>

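The explain output above can be re-derived by hand: with DefaultSimilarity,
fieldWeight is tf x idf x fieldNorm, with idf summed over the phrase terms.
The numbers below are copied from the debug output rather than recomputed
from any index:

```python
import math

tf = 1.0                      # phraseFreq = 1.0
idf = 4.0365543 + 5.5704846   # idf("manag") + idf("chang"), summed for the phrase
field_norm = 0.125            # encoded norm stored for doc 221

score = tf * idf * field_norm
print(score)  # close to the 1.2008798 reported as fieldWeight

# Each per-term idf follows DefaultSimilarity: 1 + ln(maxDocs / (docFreq + 1))
print(1 + math.log(2125 / (101 + 1)))  # close to 4.0365543
print(1 + math.log(2125 / (21 + 1)))   # close to 5.5704846
```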

thanks,

Alistair

-- 
mov eax,1
mov ebx,0
int 80h




On 16/06/2015 11:26, "Alessandro Benedetti" 
wrote:

>Can you show us how the query is parsed?
>You didn't tell us anything about the query parser you are using.
>Enabling debugQuery=true will show you how the query is parsed, and that
>will be quite useful for us.
>
>
>Cheers
>
>2015-06-16 11:22 GMT+01:00 Alistair Young :
>
>> Hiya,
>>
>> I've been looking for documentation that would point to where I could
>> modify or explain why 'near neighbours' are returned from a phrase
>>search.
>> If I search for:
>>
>> "manage change"
>>
>> I get back a document that contains "this will help in your management
>>of
>>  changes". It's relevant but I'd like to understand
>>why
>> solr is returning it. Is it a combination of fuzzy/slop? The distance
>> between the two variations of the two words in the document is quite
>>large.
>>
>> thanks,
>>
>> Alistair
>>
>> --
>> mov eax,1
>> mov ebx,0
>> int 80h
>>
>
>
>
>-- 
>--
>
>Benedetti Alessandro
>Visiting card : http://about.me/alessandro_benedetti
>
>"Tyger, tyger burning bright
>In the forests of the night,
>What immortal hand or eye
>Could frame thy fearful symmetry?"
>
>William Blake - Songs of Experience -1794 England



Re: phrase matches returning near matches

2015-06-16 Thread Alessandro Benedetti
Can you show us how the query is parsed?
You didn't tell us anything about the query parser you are using.
Enabling debugQuery=true will show you how the query is parsed, and that
will be quite useful for us.


Cheers

2015-06-16 11:22 GMT+01:00 Alistair Young :

> Hiya,
>
> I've been looking for documentation that would point to where I could
> modify or explain why 'near neighbours' are returned from a phrase search.
> If I search for:
>
> "manage change"
>
> I get back a document that contains "this will help in your management of
>  changes". It's relevant but I'd like to understand why
> solr is returning it. Is it a combination of fuzzy/slop? The distance
> between the two variations of the two words in the document is quite large.
>
> thanks,
>
> Alistair
>
> --
> mov eax,1
> mov ebx,0
> int 80h
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


phrase matches returning near matches

2015-06-16 Thread Alistair Young
Hiya,

I've been looking for documentation that would point to where I could modify or 
explain why 'near neighbours' are returned from a phrase search. If I search 
for:

"manage change"

I get back a document that contains "this will help in your management of  changes". It's relevant but I'd like to understand why solr is 
returning it. Is it a combination of fuzzy/slop? The distance between the two 
variations of the two words in the document is quite large.

thanks,

Alistair

--
mov eax,1
mov ebx,0
int 80h


What contributes to a Solr core's FieldCache entry_count?

2015-06-16 Thread forest_soup
For the fieldCache, what determines the entries_count? 

Does each search request containing a sort on a non-docValues field
contribute one entry to the entries_count?

For example, will search A ( q=owner:1&sort=maildate asc ) and search B (
q=owner:2&sort=maildate asc ) contribute 2 field cache entries?

I have a collection containing only one core, and there is only one doc
within it, so why is there so much Lucene fieldCache usage?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/What-contribute-to-a-Solr-core-s-FieldCache-entry-count-tp4212148.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Phrase query get converted to SpanNear with slop 1 instead of 0

2015-06-16 Thread Alessandro Benedetti
Hi Ariya,
I think Hossman explained that the slop of 1 is fine in your use case :)
That is, of course, assuming span queries are what you were expecting!

Cheers

2015-06-16 10:13 GMT+01:00 ariya bala :

> Ok. Thank you Chris.
> It is a custom query parser.
> I will check my query parser to see where it injects the slop of 1.
>
> On Tue, Jun 16, 2015 at 3:26 AM, Chris Hostetter  >
> wrote:
>
> >
> > : I encounter this peculiar case with solr 4.10.2 where the parsed query
> > : doesnt seem to be logical.
> > :
> > : PHRASE23("reduce workforce") ==>
> > : SpanNearQuery(spanNear([spanNear([Contents:reduceä,
> > : Contents:workforceä], 1, true)], 23, true))
> >
> > 1) that does not appear to be a parser syntax of any parser that comes
> > with Solr (that i know of) so it's possible that whatever custom parser
> > you are using has a bug in it.
> >
> > 2) IIRC, with span queries (which unlike PhraseQueries explicitly support
> > both in-order, and out of order nearness) a slop of "0" is going to
> > require that the 2 spans "overlap" and occupy the exact same position --
> a
> > span of 1 means that they differ by a single position.
> >
> >
> >
> > -Hoss
> > http://www.lucidworks.com/
>
>
>
>
> --
> *Ariya *
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Facet on same field in different ways

2015-06-16 Thread Alessandro Benedetti
Hi Phanindra,
Have you tried this syntax ?

&facet=true&facet.field={!ex=st key=terms facet.limit=5
facet.prefix=ap}query_terms&facet.field={!key=terms2
facet.limit=1}query_terms&rows=0&facet.mincount=1

This seems to be the proper syntax; I found it here:
https://issues.apache.org/jira/browse/SOLR-4717

Is this solving your problem ?

Cheers

2015-06-16 0:05 GMT+01:00 Phanindra R :

> Hi guys,
>Is there a way to facet on same field in *different ways?* For
> example, using a different facet.prefix. Here are the details
>
> facet.field={!key=myKey}myField&facet.prefix=p   ==> works
> facet.field={!key=myKey}myField&f.myField.facet.prefix=p   ==> works
> facet.field={!key=myKey}myField&f.myKey.facet.prefix=p   ==>* doesn't work
>  (ref: Solr-1351)*
>
> In addition, when I try *f.myKey.facet.range.gap=2.0.* it actually doesn't
> recognize it and throws the error: "Missing required parameter:
> f.myField.facet.range.gap (or default: facet.range.gap)"
>
> I'm using Solr 4.10
>
> Ref: https://issues.apache.org/jira/browse/SOLR-1351
>
> Thanks
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
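
The localparams syntax from SOLR-4717 can be assembled into a request like
this sketch, where the base URL, core name, and field name are illustrative:

```python
from urllib.parse import urlencode

params = [
    ("q", "*:*"),
    ("rows", "0"),
    ("facet", "true"),
    ("facet.mincount", "1"),
    # The same field faceted twice under different output keys; each
    # localparams block carries its own facet.limit / facet.prefix.
    ("facet.field", "{!ex=st key=terms facet.limit=5 facet.prefix=ap}query_terms"),
    ("facet.field", "{!key=terms2 facet.limit=1}query_terms"),
]
url = "http://localhost:8983/solr/collection1/select?" + urlencode(params)
print(url)
```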


Re: Phrase query get converted to SpanNear with slop 1 instead of 0

2015-06-16 Thread ariya bala
Ok. Thank you Chris.
It is a custom query parser.
I will check my query parser to see where it injects the slop of 1.

On Tue, Jun 16, 2015 at 3:26 AM, Chris Hostetter 
wrote:

>
> : I encounter this peculiar case with solr 4.10.2 where the parsed query
> : doesnt seem to be logical.
> :
> : PHRASE23("reduce workforce") ==>
> : SpanNearQuery(spanNear([spanNear([Contents:reduceä,
> : Contents:workforceä], 1, true)], 23, true))
>
> 1) that does not appear to be a parser syntax of any parser that comes
> with Solr (that i know of) so it's possible that whatever custom parser
> you are using has a bug in it.
>
> 2) IIRC, with span queries (which unlike PhraseQueries explicitly support
> both in-order, and out of order nearness) a slop of "0" is going to
> require that the 2 spans "overlap" and occupy the exact same position -- a
> span of 1 means that they differ by a single position.
>
>
>
> -Hoss
> http://www.lucidworks.com/




-- 
*Ariya *
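
Hossman's description of span slop (0 requires the same position, 1 allows
terms one position apart) can be sanity-checked with a toy model. This only
illustrates the position arithmetic; it is not Lucene's actual
NearSpansOrdered logic:

```python
def span_near_ordered(pos_a, pos_b, slop):
    """Toy in-order model: does some occurrence of term B start at most
    `slop` positions after an occurrence of term A?"""
    return any(0 <= pb - pa <= slop for pa in pos_a for pb in pos_b)

# "reduce" at position 7, "workforce" at position 8: adjacent terms.
print(span_near_ordered([7], [8], slop=1))  # adjacent terms fit within slop 1
print(span_near_ordered([7], [8], slop=0))  # slop 0 needs the same position
```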


Re: Solr's suggester results

2015-06-16 Thread Alessandro Benedetti
in line :

2015-06-16 4:43 GMT+01:00 Zheng Lin Edwin Yeo :

> Thanks Benedetti,
>
> I've change to the AnalyzingInfixLookup approach, and it is able to start
> searching from the middle of the field.
>
> However, is it possible to make the suggester to show only part of the
> content of the field (like 2 or 3 fields after), instead of the entire
> content/sentence, which can be quite long?
>

I assume you mean tokens where you write "fields".
The answer is yes, as I already said in my previous mail; I invite you to
read the answers and the linked documentation carefully!

Regarding the excessive length of the suggestions: this is weird. What are
you trying to autocomplete?
I really doubt it would be useful for a user to see very long autocompleted
terms.

Cheers

>
>
> Regards,
> Edwin
>
>
>
> On 15 June 2015 at 17:33, Alessandro Benedetti  >
> wrote:
>
> > ehehe Edwin, I think you should read again the document I linked time
> ago :
> >
> > http://lucidworks.com/blog/solr-suggester/
> >
> > The suggester you used is not meant to provide infix suggestions.
> > The fuzzy suggester is working on a fuzzy basis , with the *starting*
> terms
> > of a field content.
> >
> > What you are looking for is actually one of the Infix Suggesters.
> > For example the AnalyzingInfixLookup approach.
> >
> > When working with Suggesters is important first to make a distinction :
> >
> > 1) Returning the full content of the field ( analysisInfix or Fuzzy)
> >
> > 2) Returning token(s) ( Free Text Suggester)
> >
> > Then the second difference is :
> >
> > 1) Infix suggestions ( from the "middle" of the field content)
> > 2) Classic suggester ( from the beginning of the field content)
> >
> > Clarified that, will be quite simple to work with suggesters.
> >
> > Cheers
> >
> > 2015-06-15 9:28 GMT+01:00 Zheng Lin Edwin Yeo :
> >
> > > I've indexed a rich-text documents with the following content:
> > >
> > > This is a testing rich text documents to test the uploading of files to
> > > Solr
> > >
> > >
> > > When I tried to use the suggestion, it return me the entire field in
> the
> > > content once I enter suggest?q=t. However, when I tried to search for
> > > q='rich', I don't get any results returned.
> > >
> > > This is my current configuration for the suggester:
> > > <searchComponent name="suggest" class="solr.SuggestComponent">
> > >   <lst name="suggester">
> > >     <str name="name">mySuggester</str>
> > >     <str name="lookupImpl">FuzzyLookupFactory</str>
> > >     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
> > >     <str name="field">Suggestion</str>
> > >     <str name="suggestAnalyzerFieldType">suggestType</str>
> > >     <str name="buildOnStartup">true</str>
> > >     <str name="buildOnCommit">false</str>
> > >   </lst>
> > > </searchComponent>
> > >
> > > <requestHandler name="/suggest" class="solr.SearchHandler"
> > > startup="lazy">
> > >   <lst name="defaults">
> > >     <str name="wt">json</str>
> > >     <str name="indent">true</str>
> > >
> > >     <str name="suggest">true</str>
> > >     <str name="suggest.count">10</str>
> > >     <str name="suggest.dictionary">mySuggester</str>
> > >   </lst>
> > >   <arr name="components">
> > >     <str>suggest</str>
> > >   </arr>
> > > </requestHandler>
> > >
> > > Is it possible to allow the suggester to return something even from the
> > > middle of the sentence, and also not to return the entire sentence if the
> > > sentence is too long? Perhaps it should just suggest the next 2 or 3
> > > fields, and return more fields as the user types.
> > >
> > > For example,
> > > When user type 'this', it should return 'This is a testing'
> > > When user type 'this is a testing', it should return 'This is a testing
> > > rich text documents'.
> > >
> > >
> > > Regards,
> > > Edwin
> > >
> >
> >
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
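
With an infix lookup such as AnalyzingInfixLookupFactory in place, the
request side looks like this sketch, where the host, core, and dictionary
name are assumptions matching the configuration quoted earlier in the
thread:

```python
from urllib.parse import urlencode

params = {
    "suggest": "true",
    "suggest.dictionary": "mySuggester",
    "suggest.q": "rich",  # an infix lookup can match this mid-field term
    "wt": "json",
}
url = "http://localhost:8983/solr/collection1/suggest?" + urlencode(params)
print(url)
```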


Do we need to add docValues="true" to "_version_" field in schema.xml?

2015-06-16 Thread forest_soup
For the "_version_" field in schema.xml, do we need to set it to be
docValues="true"?
   <field name="_version_" type="long" indexed="true" stored="true"/>

As we noticed, there are FieldCache entries for "_version_" in the Solr
stats:



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Do-we-need-to-add-docValues-true-to-version-field-in-schema-xml-tp4212123.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Raw lucene query for a given solr query

2015-06-16 Thread Tomoko Uchida
Hi,

You can get the raw query (and other debug information) with the debug=true
parameter.

Regards,
Tomoko

2015-06-16 8:10 GMT+09:00 KNitin :

> Hi,
>
>  We have a few custom solrcloud components that act as value sources inside
> solrcloud for boosting items in the index.  I want to get the final raw
> lucene query used by solr for querying the index (for debugging purposes).
>
> Is it possible to get that information?
>
> Kindly advise
>
> Thanks,
> Nitin
>