Re: Suggestions about writing / extending QueryParsers
Hi Tim,

2014-03-07 15:20 GMT+01:00 Allison, Timothy B. talli...@mitre.org:

> Tommaso,
>
> Ah, now I see. If you want to add new operators, you'll have to modify the javacc files. For the SpanQueryParser, I added a handful of new operators and chose to go with regexes instead of javacc... not sure that was the right decision, but given my lack of knowledge of javacc, it was expedient. If you have time or already know javacc, it shouldn't be difficult.

Thanks, I've used javacc in the past, but I'm definitely not experienced with it; I'll see what fits best.

> As for the no-brainer on the Solr side, yes, it shouldn't be a problem. However, as of now the basic queryparser is a copy and paste job between Lucene and Solr, so you'll just have to redo your code in Solr unless you do something smarter.

Uh, OK, that seems to be something to fix though; I don't know if there are specific reasons for copy-pasting instead of reusing...

> If you'd be willing to wait for LUCENE-5205 to be brought into Lucene, I'd consider adding this functionality into the SpanQueryParser as a later step.

Cool, thanks Tim, that'd be really nice.

Thanks,
Tommaso
Re: Suggestions about writing / extending QueryParsers
Thanks Jack for the reference, I didn't know it.

Regards,
Tommaso

2014-03-08 1:25 GMT+01:00 Jack Krupansky j...@basetechnology.com:

> For reference, the LucidWorks Search query parser has two MLT features:
>
> 1. Like terms - does MLT on a list of terms. For example:
>
>    like:(Four score and seven years ago our fathers brought forth)
>
>    See: http://docs.lucidworks.com/display/lweug/Like+Term+Keyword+Option
>
>    This is effectively an OR operator on the terms.
>
> 2. Like document - does MLT on a Solr document, given its id. For example:
>
>    Washington like:http://cnn.com; -New York
>
>    See: http://docs.lucidworks.com/display/lweug/Like+Document+Term+Keyword+Option
>
> -- Jack Krupansky
Re: Suggestions about writing / extending QueryParsers
Thanks Tim and Upayavira for your replies.

I still need to decide what the final syntax could be. Generally speaking, though, the ideal would be to extend the current Lucene syntax with a new expression that triggers the creation of a more-like-this query, with something like

+title:foo +"text for similar docs"%2

where the phrase between quotes will generate a MoreLikeThisQuery on that text if it's followed by the % character (and the number 2 may control the MLT configuration, e.g. min document freq == min term freq = 2), similarly to what is done for proximity search (not sure about using %, it's just a syntax example).

I guess then I'd need to extend the classic query parser, as per Tim's suggestions, and I'd assume that if this goes into the classic qp it should be a no-brainer on the Solr side.

Does it sound correct / feasible?

Regards,
Tommaso

2014-03-06 15:08 GMT+01:00 Upayavira u...@odoko.co.uk:

> Tommaso,
>
> Do say more about what you're thinking of. I'm currently getting my dev environment up to look into enhancing the MoreLikeThisHandler to be able to handle function query boosts. This should be eminently possible from my initial research. However, if you're thinking of something more powerful, perhaps we can work together.
>
> Upayavira
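As a rough illustration of the syntax proposed in this message (a quoted phrase followed by % and a number), the Java sketch below pulls such clauses out of a query string with a regular expression, so that the rest of the query could still be handed to an existing parser. This is only a sketch under assumptions: the class and method names (MltClauseExtractor, extract, remainder) are hypothetical and not part of Lucene, the % marker is just the example character from the message, and a real implementation would build a MoreLikeThisQuery from each clause instead of returning plain values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not a Lucene class. */
final class MltClauseExtractor {

    /** One "phrase"%N clause found in a query string. */
    static final class MltClause {
        final String occur;    // "+", "-" or "" as written before the phrase
        final String likeText; // text between the quotes
        final int minFreq;     // the number after '%'

        MltClause(String occur, String likeText, int minFreq) {
            this.occur = occur;
            this.likeText = likeText;
            this.minFreq = minFreq;
        }
    }

    // Optional +/- prefix, a quoted phrase, then '%' and one to four digits.
    // The lookahead rejects longer digit runs instead of matching part of them.
    private static final Pattern MLT_CLAUSE =
            Pattern.compile("([+-]?)\"([^\"]+)\"%(\\d{1,4})(?!\\d)");

    /** Collects every "phrase"%N clause in the query, in order of appearance. */
    static List<MltClause> extract(String query) {
        List<MltClause> clauses = new ArrayList<>();
        Matcher m = MLT_CLAUSE.matcher(query);
        while (m.find()) {
            clauses.add(new MltClause(m.group(1), m.group(2), Integer.parseInt(m.group(3))));
        }
        return clauses;
    }

    /** Returns what is left of the query once the MLT clauses are removed. */
    static String remainder(String query) {
        return MLT_CLAUSE.matcher(query).replaceAll(" ").trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String query = "+title:foo +\"text for similar docs\"%2";
        for (MltClause c : extract(query)) {
            System.out.println(c.occur + " like=\"" + c.likeText + "\" minFreq=" + c.minFreq);
        }
        System.out.println("rest: " + remainder(query));
    }
}
```

Pre-processing the string this way avoids touching the javacc grammar, at the cost of not understanding nesting or escaping; an ordinary proximity query such as "foo bar"~2 is left alone because the pattern requires the % character.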
RE: Suggestions about writing / extending QueryParsers
Tommaso,

Ah, now I see. If you want to add new operators, you'll have to modify the javacc files. For the SpanQueryParser, I added a handful of new operators and chose to go with regexes instead of javacc... not sure that was the right decision, but given my lack of knowledge of javacc, it was expedient. If you have time or already know javacc, it shouldn't be difficult.

As for the no-brainer on the Solr side, yes, it shouldn't be a problem. However, as of now the basic queryparser is a copy and paste job between Lucene and Solr, so you'll just have to redo your code in Solr unless you do something smarter.

If you'd be willing to wait for LUCENE-5205 to be brought into Lucene, I'd consider adding this functionality into the SpanQueryParser as a later step.

Cheers,
Tim

From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com]
Sent: Friday, March 07, 2014 3:17 AM
To: dev@lucene.apache.org
Subject: Re: Suggestions about writing / extending QueryParsers

Thanks Tim and Upayavira for your replies.

I still need to decide what the final syntax could be. Generally speaking, though, the ideal would be to extend the current Lucene syntax with a new expression that triggers the creation of a more-like-this query, with something like

+title:foo +"text for similar docs"%2

where the phrase between quotes will generate a MoreLikeThisQuery on that text if it's followed by the % character (and the number 2 may control the MLT configuration, e.g. min document freq == min term freq = 2), similarly to what is done for proximity search (not sure about using %, it's just a syntax example).

I guess then I'd need to extend the classic query parser, as per Tim's suggestions, and I'd assume that if this goes into the classic qp it should be a no-brainer on the Solr side.

Does it sound correct / feasible?

Regards,
Tommaso
Re: Suggestions about writing / extending QueryParsers
For reference, the LucidWorks Search query parser has two MLT features:

1. Like terms – does MLT on a list of terms. For example:

   like:(Four score and seven years ago our fathers brought forth)

   See: http://docs.lucidworks.com/display/lweug/Like+Term+Keyword+Option

   This is effectively an OR operator on the terms.

2. Like document – does MLT on a Solr document, given its id. For example:

   Washington like:http://cnn.com; -New York

   See: http://docs.lucidworks.com/display/lweug/Like+Document+Term+Keyword+Option

-- Jack Krupansky

From: Tommaso Teofili
Sent: Thursday, March 6, 2014 6:23 AM
To: dev@lucene.apache.org
Subject: Suggestions about writing / extending QueryParsers

Hi all,

I'm thinking about writing/extending a QueryParser for MLT queries. I've never really looked into that code much; now that I'm doing so, I'm wondering whether anyone has suggestions on how to start with such a topic. Should I write a new grammar for that? Or can I just extend an existing grammar / class?

Thanks in advance,
Tommaso
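To make the remark that like:(...) is "effectively an OR operator on the terms" concrete, here is a toy Java sketch that rewrites a like:(...) group into an explicit OR group at the string level. It is an assumption-laden illustration only: LikeTermsRewriter and rewrite are hypothetical names, this is not how the LucidWorks parser is implemented, and a real MLT feature would normally select and weight terms and build a query object, not a string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration; this is not the LucidWorks implementation. */
final class LikeTermsRewriter {

    // like:( ... ) with no nested parentheses inside the group.
    private static final Pattern LIKE_TERMS = Pattern.compile("like:\\(([^()]*)\\)");

    /** Replaces each like:(a b c) group with (a OR b OR c); other text is untouched. */
    static String rewrite(String query) {
        Matcher m = LIKE_TERMS.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(1).trim();
            // An empty like:() group simply disappears from the query.
            String ored = body.isEmpty()
                    ? ""
                    : "(" + String.join(" OR ", body.split("\\s+")) + ")";
            m.appendReplacement(out, Matcher.quoteReplacement(ored));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite("like:(Four score and seven years ago)"));
    }
}
```

The sketch treats every whitespace-separated token as a term, so it would mishandle tokens that are themselves operators in the target syntax; building the boolean query programmatically avoids that problem.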
Suggestions about writing / extending QueryParsers
Hi all,

I'm thinking about writing/extending a QueryParser for MLT queries. I've never really looked into that code much; now that I'm doing so, I'm wondering whether anyone has suggestions on how to start with such a topic. Should I write a new grammar for that? Or can I just extend an existing grammar / class?

Thanks in advance,
Tommaso
RE: Suggestions about writing / extending QueryParsers
Hi Tommaso,

It will depend on how different your target syntax will be. If you extend the classic parser (or QueryParserBase), there is a fair amount of overhead and extras that you might not want or need. On the other hand, the query syntax and the methods will be familiar to the Lucene community, and there is a large number of test cases already built for you. On the third hand, if you need to modify the low-level parsing stuff, you'll have to be familiar with javacc.

There's the flexible family, which should allow for easy modifications, and the xml family could offer an easy interface between a custom lexer and a parser. The SimpleQueryParser offers a model of building something fairly simple and yet very elegant from scratch.

In deciding where to start, another consideration might be how easy it will be to integrate at the Solr level. Make sure to include field-based hooks for processing multiterms, prefix and range queries.

For LUCENE-5205, I eventually chose to subclass QueryParserBase, and I had to override a fair amount of code because every terminal had to be a SpanQuery; most of the queryparser infrastructure is built for traditional queries.

So, what features do you want to add for mlt? What capabilities do you need?

Cheers,
Tim

From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com]
Sent: Thursday, March 06, 2014 6:23 AM
To: dev@lucene.apache.org
Subject: Suggestions about writing / extending QueryParsers

Hi all,

I'm thinking about writing/extending a QueryParser for MLT queries. I've never really looked into that code much; now that I'm doing so, I'm wondering whether anyone has suggestions on how to start with such a topic. Should I write a new grammar for that? Or can I just extend an existing grammar / class?

Thanks in advance,
Tommaso
Re: Suggestions about writing / extending QueryParsers
Tommaso,

Do say more about what you're thinking of. I'm currently getting my dev environment up to look into enhancing the MoreLikeThisHandler to be able to handle function query boosts. This should be eminently possible from my initial research. However, if you're thinking of something more powerful, perhaps we can work together.

Upayavira

On Thu, Mar 6, 2014, at 11:23 AM, Tommaso Teofili wrote:

> Hi all,
>
> I'm thinking about writing/extending a QueryParser for MLT queries. I've never really looked into that code much; now that I'm doing so, I'm wondering whether anyone has suggestions on how to start with such a topic. Should I write a new grammar for that? Or can I just extend an existing grammar / class?
>
> Thanks in advance,
> Tommaso