Read payload in Solr

2016-02-24 Thread Andrea Roggerone
Hi all,
I am indexing the payload in Solr as advised in
https://lucidworks.com/blog/2014/06/13/end-to-end-payload-example-in-solr/
and I am also able to search for it.
What I want to do now is read the payload inside my custom Solr function
to do some calculations. However, I can only see methods to get the
FieldValues, which I obviously want to avoid: I already have the payload, and
reading it from the postings list should perform better.
Could you please point me to a resource that explains how to read the
payload, or share a code snippet? Thanks!!!
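
For illustration only, and not Solr code: assuming the payloads were indexed with the float encoder as in the linked post, each payload should be four bytes holding a big-endian IEEE 754 float. In a custom function those bytes would come from the postings of the matching term; in this sketch they are simply passed in. The helper names are hypothetical.

```python
import struct


def encode_float_payload(value: float) -> bytes:
    """Pack a float into 4 big-endian bytes (assumed payload layout)."""
    return struct.pack(">f", value)


def decode_float_payload(payload: bytes, offset: int = 0) -> float:
    """Read a 4-byte big-endian float starting at `offset` in the payload."""
    (value,) = struct.unpack_from(">f", payload, offset)
    return value


if __name__ == "__main__":
    raw = encode_float_payload(5.0)
    print(raw.hex(), decode_float_payload(raw))
```

If the index used a different encoder (integer or identity), the decoding step would have to change accordingly.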


Are fieldCache and/or DocValues used by Function Queries

2016-02-11 Thread Andrea Roggerone
Hi,
I need to evaluate the performance of different boosting solutions and I
can't find any relevant documentation about it. Are fieldCache and/or
DocValues used by Function Queries?


Does bf for eDismax use DocValue or FieldCache?

2016-02-10 Thread Andrea Roggerone
Hi,
I need to boost documents at runtime according to a set of roles and
related ids. For instance I would have the fields:
ceo:1234-abcd-5678-poiu
tl:-abcd-5678-abc

and a set of boosts to apply at runtime, for instance
ceo = 10
tl = 5

I don't want to do any complex operation with the weights and I am happy to
boost by the value of the most relevant role, which in the previous case
would be ceo.
Since I use eDismax parser, the syntax I'd like to use would be:

bf=if(termfreq(ceo,"85a09bd5-2ff2-464c-9bc5-33a38a7f1234"),3,if(termfreq(tl,"85a09bd5-2ff2-464c-9bc5-33a38a7123456"),2,1))


however I am worried about performance.
My questions are:
- Are FieldCache and DocValues used for the bf parameter?
- Is termfreq computed every time, or is an existing value simply read?
- What impact would increasing the number of clauses (for instance adding
more nested ifs) have on performance?
- My alternative would be to use payloads. Is that a better option, and
why?

Thanks!!
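
As a side note on the nested-if syntax above, the expression can be generated rather than written by hand. This is a minimal client-side sketch in Python; build_role_boost is a hypothetical helper, and it only builds the string, it says nothing about how Solr evaluates it.

```python
def build_role_boost(clauses, default=1):
    """Build a nested if(termfreq(...)) expression for the bf parameter.

    `clauses` is a list of (field, term, boost) tuples, most relevant
    role first; `default` is the value used when no role matches.
    """
    expr = str(default)
    # Wrap from the innermost (least relevant) clause outwards.
    for field, term, boost in reversed(clauses):
        expr = f'if(termfreq({field},"{term}"),{boost},{expr})'
    return expr


if __name__ == "__main__":
    bf = build_role_boost([
        ("ceo", "85a09bd5-2ff2-464c-9bc5-33a38a7f1234", 3),
        ("tl", "85a09bd5-2ff2-464c-9bc5-33a38a7123456", 2),
    ])
    print("bf=" + bf)
```

Each extra role adds one more nesting level, so the expression grows linearly with the number of roles.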


Boost query at search time according set of roles with least performance impact

2015-11-06 Thread Andrea Roggerone
Hi all,
I am working on a mechanism that applies additional boosts to documents
according to the role covered by the author. For instance we have

CEO|5 Architect|3 Developer|1 TeamLeader|2

keeping in mind that an author could cover multiple roles (e.g. for a
design document, a Team Leader could be also a Developer).

I am aware that it is possible to implement a function that leverages
payloads; however, the weights need to be configurable, so I can't store
them in the payload at index time.
Passing all the weights at query time is not an option as we have more than
20 roles, and query readability and performance would be heavily affected.

Do we have any "out of the box mechanism" in Solr to implement the
described behavior? If not, what other options do we have?
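
To make the intended behavior concrete, here is a small sketch in plain Python (not a Solr mechanism). It parses a weight configuration in the Role|weight form shown above and, as an assumption, boosts by the highest weight among the author's roles. Both function names are hypothetical.

```python
def parse_role_weights(spec):
    """Parse 'CEO|5 Architect|3 ...' into a {role: weight} dict."""
    weights = {}
    for item in spec.split():
        role, weight = item.split("|")
        weights[role] = float(weight)
    return weights


def author_boost(roles, weights, default=1.0):
    """Return the highest configured weight among the author's roles.

    Taking the maximum is an assumption; summing would be another option.
    Unknown roles are ignored, and `default` is used if none match.
    """
    return max((weights[r] for r in roles if r in weights), default=default)


if __name__ == "__main__":
    weights = parse_role_weights("CEO|5 Architect|3 Developer|1 TeamLeader|2")
    print(author_boost(["TeamLeader", "Developer"], weights))
```

Because the weights live in configuration rather than in the index, changing them would not require reindexing.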


Re: How to show some documents ahead of others

2015-10-08 Thread Andrea Roggerone
Hi guys,
I don't think that sorting is a good solution in this case as it doesn't
allow any meaningful customization. I believe that the suggested
QueryElevationComponent is one viable alternative. Another would be to
boost a particular field at query time, for instance paid. That would
allow you to assign different boosts to different values using a
function.

On Thu, Oct 8, 2015 at 1:48 PM, Upayavira  wrote:

> Or just have a field in your index -
>
> paid: true/false
>
> Then sort=paid desc, score desc
>
> (you may need to sort paid asc, not sure which way a boolean would sort)
>
> Question is whether you want to show ALL paid posts, or just a set of
> them. For the latter you could use result grouping on the paid field.
>
> Upayavira
>
> On Thu, Oct 8, 2015, at 01:34 PM, NutchDev wrote:
> > Hi Christian,
> >
> > You can take a look at Solr's QueryElevationComponent
> > <https://wiki.apache.org/solr/QueryElevationComponent>.
> >
> > It will allow you to configure the top results for a given query
> > regardless of the normal Lucene scoring. You can also specify a list of
> > documents to exclude from the results for a particular query.
> >
> >
> >
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/How-to-show-some-documents-ahead-of-others-tp4233481p4233490.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: How to show some documents ahead of others

2015-10-08 Thread Andrea Roggerone
Sure. Let's say that as Upayavira was saying you have in your index:

"paid: true/false
Then sort=paid desc, score desc"

In that case, documents with paid=true come up first, ordered by score.
After that you decide that you want to add a set of offers:
Offer 1: cost 1000 euros
Offer 2: cost 100 euros
Offer 3: cost 10 euros
and you expect user 1 (who pays more) to appear before users 2 and 3. In
that case the true/false field won't be enough, as you have no way to
sort users so that offer 1 comes before offer 2.

Let's say, for the sake of argument, that you decide to replace "paid" with a
numeric value: paid=1, 2 or 3. This solution would work better, until you
decide to improve relevancy... at that point the new solution wouldn't suit
you anymore.
So "as it doesn't allow any meaningful customization" meant that such a
solution is too rigid. Hope it makes sense.
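
A toy comparison in plain Python (not Solr) may make the difference clearer. The documents, the numeric paid tiers (higher means a more expensive offer) and the boost values are all made up for the example. Sorting on paid ignores relevance across tiers, while a multiplicative boost lets score and paid tier trade off.

```python
# Hypothetical documents: `paid` is an assumed tier (0 = not paid),
# `score` stands in for the relevance score.
DOCS = [
    {"id": "a", "paid": 0, "score": 9.0},
    {"id": "b", "paid": 1, "score": 2.0},
    {"id": "c", "paid": 3, "score": 1.0},
    {"id": "d", "paid": 2, "score": 8.0},
]

# Hypothetical boost per tier; these are the tunable part.
BOOSTS = {0: 1.0, 1: 1.2, 2: 1.5, 3: 2.0}


def rank_by_sort(docs):
    """Equivalent of sort=paid desc, score desc."""
    ordered = sorted(docs, key=lambda d: (-d["paid"], -d["score"]))
    return [d["id"] for d in ordered]


def rank_by_boost(docs, boosts):
    """Order by relevance score multiplied by a per-tier boost."""
    ordered = sorted(docs, key=lambda d: -(d["score"] * boosts.get(d["paid"], 1.0)))
    return [d["id"] for d in ordered]


if __name__ == "__main__":
    print(rank_by_sort(DOCS))
    print(rank_by_boost(DOCS, BOOSTS))
```

With sorting, the barely relevant document "c" always wins because it has the top tier; with boosting, a highly relevant unpaid document can still outrank a weakly relevant paid one, and the balance is adjusted by changing the boost values.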



On Thu, Oct 8, 2015 at 3:39 PM, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> Is it possible to understand better this : "as it doesn't
> allow any meaningful customization " ?
>
> Cheers


Re: Reverse query?

2015-10-02 Thread Andrea Roggerone
Hi Remi,
The question is not really clear; could you explain a little better
what you need? Reading your email, I understand that you want to get
documents containing all the search terms typed. For instance, if you search
for "Mad Max", you want to get documents containing both Mad and Max. If
that's what you need, you can use a phrase query like:

"Mad Max"~2

where enclosing your keywords in double quotes means that you want
both Mad and Max, close to each other, and the optional parameter ~2 is an
example of slop, i.e. how many position moves are tolerated between the
terms. If you need more info you can look for Phrase Query in
https://wiki.apache.org/solr/SolrRelevancyFAQ
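
For completeness, a minimal sketch of building such a phrase query string on the client side, in Python. The phrase_query helper and the "title" field name are hypothetical; the only part taken from the message above is the "..."~N syntax.

```python
def phrase_query(phrase, slop=0, field=None):
    """Build a quoted phrase query, optionally with slop and a field prefix."""
    # Escape backslashes and double quotes so the phrase stays one quoted unit.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    query = f'"{escaped}"'
    if slop > 0:
        query += f"~{slop}"
    if field:
        query = f"{field}:{query}"
    return query


if __name__ == "__main__":
    print(phrase_query("Mad Max", slop=2))
    print(phrase_query("Mad Max", field="title"))
```

A slop of 0 (the default here) asks for the words to be adjacent and in order; larger values loosen that.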

On Fri, Oct 2, 2015 at 2:33 PM, remi tassing  wrote:

> Hi,
> I have medium-low experience on Solr and I have a question I couldn't quite
> solve yet.
>
> Typically we have quite short query strings (a couple of words) and the
> search is done through a set of bigger documents. What if the logic is
> turned a little bit around. I have a document and I need to find out what
> strings appear in the document. A string here could be a person name
> (including space for example) or a location...which are indexed in Solr.
>
> A concrete example, we take this text from Wikipedia (Mad Max):
> "Mad Max is a 1979 Australian dystopian action film directed by George
> Miller. Written by Miller and James McCausland from a story by Miller and
> producer Byron Kennedy, it tells a story of societal breakdown, murder, and
> vengeance. The film, starring the then-little-known Mel Gibson, was released
> internationally in 1980. It became a top-grossing Australian film, while
> holding the record in the Guinness Book of Records for decades as the most
> profitable film ever created, and has been credited for further opening the
> global market to Australian New Wave films."
>
> I would like it to match "Mad Max" but not "Mad" or "Max" separately, and
> "George Miller", "global market" ...
>
> I've tried the KeywordTokenizer but it didn't work. I suppose it's OK at
> index time but not at query time (in this specific case).
>
> I had a look at Luwak but it's not what I'm looking for
> (http://www.flax.co.uk/blog/2013/12/06/introducing-luwak-a-library-for-high-performance-stored-queries/)
>
> The typical name search doesn't seem to work either,
> https://dzone.com/articles/tips-name-search-solr
>
> I was thinking this problem must have already been solved...or?
>
> Remi
>


Re: Reverse query?

2015-10-02 Thread Andrea Roggerone
Hi, the phrase query format would be:
"Mad Max"~2
The * characters were added by the mail aggregator around the text in bold
for some reason. They weren't wildcards.

On Friday, October 2, 2015, Roman Chyla <roman.ch...@gmail.com> wrote:

> I'd like to offer another option:
>
> you say you want to match long query into a document - but maybe you
> won't know whether to pick "Mad Max" or "Max is" (not mentioning the
> performance hit of "*mad max*" search - or is it not the case
> anymore?). Take a look at the NGram tokenizer (say size of 2; or
> bigger). What it does, it splits the input into overlapping segments
> of 'X' words (words, not characters - however, characters work too -
> just pick bigger N)
>
> mad max
> max 1979
> 1979 australian
>
> i'd recommend placing stopfilter before the ngram
>
>  - then for the long query string of "Hey Mad Max is 1979" you
> would search "hey mad" OR "mad max" OR "max 1979"... (perhaps the query
> tokenizer could be convinced to do the search for you automatically). And
> voila, the more overlapping segments there are, the higher the result
> ranks.
>
> hth,
>
> roman
>
>
>
> On Fri, Oct 2, 2015 at 12:03 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> > The admin/analysis page is your friend here, find it and use it ;)
> > Note you have to select a core on the admin UI screen before you can
> > see the choice.
> >
> > Because apart from the other comments, KeywordTokenizer is a red flag.
> > It does NOT break anything up into tokens, so if your doc contains:
> > Mad Max is a 1979 Australian
> > as the whole field, the _only_ match you'll ever get is if you search
> > exactly
> > "Mad Max is a 1979 Australian"
> > Not Mad, not mad, not Max, exactly all 6 words separated by exactly one
> > space.
> >
> > Andrea's suggestion is the one you want, but be sure you use one of
> > the tokenizing analysis chains, perhaps start with text_en (in the
> > stock distro). Be sure to completely remove your node/data directory
> > (as in rm -rf data) after you make the change.
> >
> > And really, explore the admin/analysis page; it's where a LOT of these
> > kinds of problems find solutions ;)
> >
> > Best,
> > Erick
> >
> > On Fri, Oct 2, 2015 at 7:57 AM, Ravi Solr <ravis...@gmail.com> wrote:
> >> Hello Remi,
> >> I am assuming the field where you store the data is analyzed.
> >> The field definition might help us answer your question better. If you
> >> are using the edismax handler for your search requests, I believe you can
> >> achieve your goal by setting "mm" to 100% and the phrase slop "ps" and
> >> query slop "qs" parameters to zero. I think that will force exact matches.
> >>
> >> Thanks
> >>
> >> Ravi Kiran Bhaskar
> >>
> >> On Fri, Oct 2, 2015 at 9:48 AM, Andrea Roggerone <
> >> andrearoggerone.o...@gmail.com> wrote:
> >>
> >>> Hi Remy,
> >>> The question is not really clear, could you explain a little bit better
> >>> what you need? Reading your email I understand that you want to get
> >>> documents containing all the search terms typed. For instance if you
> search
> >>> for "Mad Max", you wanna get documents containing both Mad and Max. If
> >>> that's your need, you can use a phrase query like:
> >>>
> >>> *"*Mad Max*"~2*
> >>>
> >>> where enclosing your keywords between double quotes means that you
> want to
> >>> get both Mad and Max and the optional parameter ~2 is an example of
> *slop*.
> >>> If you need more info you can look for *Phrase Query* in
> >>> https://wiki.apache.org/solr/SolrRelevancyFAQ
> >>>
> >>> On Fri, Oct 2, 2015 at 2:33 PM, remi tassing <tassingr...@gmail.com>
> >>> wrote:
> >>>
> >>> > Hi,
> >>> > I have medium-low experience on Solr and I have a question I couldn't
> >>> quite
> >>> > solve yet.
> >>> >
> >>> > Typically we have quite short query strings (a couple of words) and
> the
> >>> > search is done through a set of bigger documents. What if the logic
> is
> >>> > turned a little bit around. I have a document and I need to find out
> what
> >>> &g