Re: Suggestions about writing / extending QueryParsers

2014-03-09 Thread Tommaso Teofili
Hi Tim,

2014-03-07 15:20 GMT+01:00 Allison, Timothy B. talli...@mitre.org:

  Tommaso,

   Ah, now I see.  If you want to add new operators, you'll have to modify
 the javacc files.  For the SpanQueryParser, I added a handful of new
 operators and chose to go with regexes instead of javacc...not sure that was
 the right decision, but given my lack of knowledge of javacc, it was
 expedient.  If you have time or already know javacc, it shouldn't be
 difficult.


Thanks, I've used javacc in the past, but I'm definitely not experienced
with it; I'll see what fits best.


As for the no-brainer on the Solr side, yes, it shouldn't be a problem.
 However, as of now the basic queryparser is a copy and paste job between
 Lucene and Solr, so you'll just have to redo your code in Solr unless you
 do something smarter.


Uh, OK, that seems like something to fix, though; I don't know if there are
specific reasons for copy-pasting instead of reusing...


If you'd be willing to wait for LUCENE-5205 to be brought into Lucene,
 I'd consider adding this functionality into the SpanQueryParser as a later
 step.


cool, thanks Tim, that'd be really nice.
Thanks,
Tommaso




   Cheers,



  Tim





Re: Suggestions about writing / extending QueryParsers

2014-03-09 Thread Tommaso Teofili
Thanks Jack for the reference, I didn't know about it.
Regards,
Tommaso





Re: Suggestions about writing / extending QueryParsers

2014-03-07 Thread Tommaso Teofili
Thanks Tim and Upayavira for your replies.

I still need to decide what the final syntax could be; generally speaking,
however, the ideal would be to extend the current Lucene syntax with a new
expression that triggers the creation of a more like this query, with
something like +title:foo +"text for similar docs"%2, where the phrase
between quotes generates a MoreLikeThisQuery on that text if it's followed
by the % character (and the number 2 may control the MLT configuration,
e.g. min document freq == min term freq = 2), similarly to what is done for
proximity search (not sure about using %, it's just a syntax example).
I guess then I'd need to extend the classic query parser, as per Tim's
suggestions, and I'd assume that if this goes into the classic qp it should
be a no-brainer on the Solr side.
Does it sound correct / feasible?
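To make the proposed %2 knob concrete: MoreLikeThis-style term selection keeps only those terms of the input text that occur at least minTermFreq times in it and appear in at least minDocFreq indexed documents, then ranks the survivors by tf*idf. The following is a simplified, self-contained model under stated assumptions (whitespace tokenization, a tiny in-memory corpus, a classic-style idf of 1 + ln(numDocs / (docFreq + 1)), and the hypothetical class MltTermSelection). Lucene's MoreLikeThis exposes the corresponding thresholds through setMinTermFreq and setMinDocFreq, but reads its statistics from the index rather than from a list of strings.

```java
import java.util.*;

// Simplified model of MoreLikeThis-style "interesting term" selection.
// Hypothetical illustration only: not Lucene's implementation.
public class MltTermSelection {

    // minTermFreq: minimum occurrences of a term in queryText.
    // minDocFreq:  minimum number of corpus documents containing the term.
    static List<String> interestingTerms(String queryText, List<String> corpus,
                                         int minTermFreq, int minDocFreq) {
        // Term frequencies in the input text (lowercased, whitespace-tokenized).
        Map<String, Integer> tf = new HashMap<>();
        for (String t : queryText.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) tf.merge(t, 1, Integer::sum);
        }
        // Document frequencies: each document counts a term at most once.
        Map<String, Integer> df = new HashMap<>();
        for (String doc : corpus) {
            Set<String> seen = new HashSet<>(
                    Arrays.asList(doc.toLowerCase(Locale.ROOT).split("\\s+")));
            for (String t : seen) df.merge(t, 1, Integer::sum);
        }
        int numDocs = corpus.size();
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int docFreq = df.getOrDefault(e.getKey(), 0);
            // Apply both thresholds before scoring.
            if (e.getValue() < minTermFreq || docFreq < minDocFreq) continue;
            double idf = 1.0 + Math.log(numDocs / (double) (docFreq + 1));
            scores.put(e.getKey(), e.getValue() * idf);
        }
        List<String> result = new ArrayList<>(scores.keySet());
        // Highest tf*idf first; ties broken alphabetically for stable output.
        result.sort(Comparator.comparingDouble((String t) -> scores.get(t))
                .reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return result;
    }

    public static void main(String[] args) {
        List<String> corpus = List.of(
                "lucene query parser syntax",
                "solr query handler",
                "lucene similar docs",
                "parser grammar javacc");
        String text = "lucene lucene query query parser docs docs";
        System.out.println(interestingTerms(text, corpus, 2, 2)); // [lucene, query]
    }
}
```

With both thresholds at 2, parser is dropped because it occurs only once in the text, and docs is dropped because only one document contains it.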

Regards,
Tommaso





RE: Suggestions about writing / extending QueryParsers

2014-03-07 Thread Allison, Timothy B.
Tommaso,
  Ah, now I see.  If you want to add new operators, you'll have to modify the 
javacc files.  For the SpanQueryParser, I added a handful of new operators and 
chose to go with regexes instead of javacc...not sure that was the right 
decision, but given my lack of knowledge of javacc, it was expedient.  If you 
have time or already know javacc, it shouldn't be difficult.
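For the regex route, a minimal sketch in the spirit of what Tim describes (not his SpanQueryParser code) might pre-scan the query for the hypothetical "phrase"%N clause from Tommaso's proposal, pull each match out, and leave the rest for an unmodified parser. MltClauseExtractor and its nested types are invented for illustration, and the pattern deliberately ignores escaped quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-processing pass: extract "phrase"%N clauses from a query
// string and return the remainder for a regular (unmodified) parser.
public class MltClauseExtractor {

    // Optional +/- prefix, a quoted phrase without embedded quotes,
    // then '%' and an integer parameter.
    private static final Pattern MLT_CLAUSE =
            Pattern.compile("([+-]?)\"([^\"]*)\"%(\\d+)");

    // One extracted clause: occurrence prefix ("+", "-" or ""), phrase, parameter.
    static final class MltClause {
        final String occur;
        final String text;
        final int param;

        MltClause(String occur, String text, int param) {
            this.occur = occur;
            this.text = text;
            this.param = param;
        }

        @Override
        public String toString() {
            return occur + "MLT(\"" + text + "\", " + param + ")";
        }
    }

    // Extracted clauses plus the query text left over for the regular parser.
    static final class Result {
        final List<MltClause> clauses;
        final String remainder;

        Result(List<MltClause> clauses, String remainder) {
            this.clauses = clauses;
            this.remainder = remainder;
        }
    }

    static Result extract(String query) {
        List<MltClause> clauses = new ArrayList<>();
        StringBuilder rest = new StringBuilder();
        Matcher m = MLT_CLAUSE.matcher(query);
        while (m.find()) {
            clauses.add(new MltClause(m.group(1), m.group(2),
                    Integer.parseInt(m.group(3))));
            m.appendReplacement(rest, ""); // drop the clause from the remainder
        }
        m.appendTail(rest);
        // Collapse the whitespace left behind by removed clauses.
        String remainder = rest.toString().trim().replaceAll("\\s+", " ");
        return new Result(clauses, remainder);
    }

    public static void main(String[] args) {
        Result r = extract("+title:foo +\"text for similar docs\"%2");
        System.out.println(r.clauses);   // [+MLT("text for similar docs", 2)]
        System.out.println(r.remainder); // +title:foo
    }
}
```

Each extracted clause would still have to be turned into a MoreLikeThisQuery and combined with the parsed remainder, which is not shown here. The trade-off is the one Tim mentions: the javacc grammar stays untouched, but the regex has to be kept consistent with the grammar's own quoting rules (an ordinary proximity clause such as "foo bar"~2 is left alone because it uses ~ rather than %).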
  As for the no-brainer on the Solr side, yes, it shouldn't be a problem.  However, as 
of now the basic queryparser is a copy and paste job between Lucene and Solr, 
so you'll just have to redo your code in Solr unless you do something 
smarter.
  If you'd be willing to wait for LUCENE-5205 to be brought into Lucene, I'd 
consider adding this functionality into the SpanQueryParser as a later step.

  Cheers,

 Tim




Re: Suggestions about writing / extending QueryParsers

2014-03-07 Thread Jack Krupansky
For reference, the LucidWorks Search query parser has two MLT features:

1. Like terms – does MLT on a list of terms.

For example:

like:(Four score and seven years ago our fathers brought forth)

See:
http://docs.lucidworks.com/display/lweug/Like+Term+Keyword+Option

This is effectively an OR operator on the terms.
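To illustrate the "effectively an OR" remark, here is a minimal string-level sketch, assuming a hypothetical LikeTermsRewriter that rewrites each like:(...) clause into an explicit disjunction in classic Lucene syntax before normal parsing. This is not how LucidWorks Search implements the feature; a real MLT clause would also weight or prune the terms rather than keep all of them.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative rewrite of like:(...) clauses into explicit OR groups in
// classic Lucene query syntax. Class and method names are hypothetical.
public class LikeTermsRewriter {

    // like:( ... ) whose body contains no nested parentheses.
    private static final Pattern LIKE = Pattern.compile("like:\\(([^()]*)\\)");

    static String rewrite(String query) {
        Matcher m = LIKE.matcher(query);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String[] terms = m.group(1).trim().split("\\s+");
            String disjunction = "(" + String.join(" OR ", terms) + ")";
            // quoteReplacement keeps any '$' or '\' in the terms literal.
            m.appendReplacement(out, Matcher.quoteReplacement(disjunction));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite("like:(Four score and seven years ago)"));
        // (Four OR score OR and OR seven OR years OR ago)
    }
}
```

Lowercase words such as "and" are harmless because the classic parser only treats uppercase AND, OR and NOT as operators; escaping those uppercase forms inside the list is left out of this sketch.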

2. Like document – does MLT on a Solr document, given its id:

For example:

Washington like:"http://cnn.com" -"New York"

See:
http://docs.lucidworks.com/display/lweug/Like+Document+Term+Keyword+Option

-- Jack Krupansky


Suggestions about writing / extending QueryParsers

2014-03-06 Thread Tommaso Teofili
Hi all,

I'm thinking about writing/extending a QueryParser for MLT queries. I've
never really looked into that code much, and while I'm doing so now, I'm
wondering if anyone has suggestions on how to start with such a topic.
Should I write a new grammar for that? Or can I just extend an existing
grammar / class?

Thanks in advance,
Tommaso


RE: Suggestions about writing / extending QueryParsers

2014-03-06 Thread Allison, Timothy B.
Hi Tommaso,

  It will depend on how different your target syntax will be.  If you extend 
the classic parser (or QueryParserBase), there is a fair amount of overhead 
and extras that you might not want or need.  On the other hand, the query 
syntax and the methods will be familiar to the Lucene community, and there is a 
large number of test cases already built for you.  On the third hand, if you 
need to modify the low-level parsing stuff, you'll have to be familiar with 
javacc.

  There's the flexible family that should allow for easy modifications, and 
the xml family could offer an easy interface between a custom lexer and a 
parser.   The SimpleQueryParser offers a model of building something fairly 
simple and yet very elegant from scratch.

  In deciding where to start, another consideration might include how easy it 
will be to integrate at the Solr level.  Make sure to include field-based hooks 
for processing multiterms, prefix and range queries.
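One way to read the advice about field-based hooks, sketched with hypothetical names (FieldHooks, FieldHooksDemo) and plain strings instead of Lucene Query objects: before building a prefix or range clause, the parser looks up a per-field handler and falls back to defaults. The classic parser's own extension points have a similar shape, e.g. the protected getPrefixQuery(String field, String termStr) and getRangeQuery(...) methods in QueryParserBase, which a subclass can override and branch on the field name. Multiterm handling would follow the same pattern and is omitted for brevity.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical per-field hook registry: each field may customize how prefix
// and range clauses are built; unregistered fields use the defaults.
public class FieldHooksDemo {

    interface FieldHooks {
        default String prefix(String field, String term) {
            return field + ":" + term + "*";
        }

        default String range(String field, String lower, String upper,
                             boolean lowerInclusive, boolean upperInclusive) {
            return field + ":" + (lowerInclusive ? "[" : "{") + lower + " TO " + upper
                    + (upperInclusive ? "]" : "}");
        }
    }

    private final Map<String, FieldHooks> perField = new HashMap<>();
    private final FieldHooks defaults = new FieldHooks() {};

    void register(String field, FieldHooks hooks) {
        perField.put(field, hooks);
    }

    // The parser would call this whenever it is about to build a clause.
    FieldHooks hooksFor(String field) {
        return perField.getOrDefault(field, defaults);
    }

    public static void main(String[] args) {
        FieldHooksDemo parser = new FieldHooksDemo();
        // Example override: lowercase prefixes for a field indexed in lowercase.
        parser.register("title", new FieldHooks() {
            @Override
            public String prefix(String field, String term) {
                return field + ":" + term.toLowerCase(Locale.ROOT) + "*";
            }
        });
        System.out.println(parser.hooksFor("title").prefix("title", "Luc")); // title:luc*
        System.out.println(parser.hooksFor("body").prefix("body", "Luc"));   // body:Luc*
        System.out.println(parser.hooksFor("year").range("year", "2010", "2014", true, false));
        // year:[2010 TO 2014}
    }
}
```

Keeping the hooks keyed by field is what makes later Solr integration easier, since Solr's per-field analysis choices can be plugged in without touching the grammar.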

  For LUCENE-5205, I eventually chose to subclass QueryParserBase, and I had to 
override a fair amount of code because every terminal had to be a SpanQuery; 
most of the queryparser infrastructure is built for traditional queries.

  So, what features do you want to add for mlt?  What capabilities do you need?

  Cheers,

  Tim





Re: Suggestions about writing / extending QueryParsers

2014-03-06 Thread Upayavira
Tommaso,



Do say more about what you're thinking of. I'm currently getting my dev
environment up to look into enhancing the MoreLikeThisHandler to be
able to handle function query boosts. This should be eminently possible
from my initial research. However, if you're thinking of something more
powerful, perhaps we can work together.



Upayavira




