RE: 6.6.2 Release
Sounds good. Thank you!

From: Ishan Chattopadhyaya [mailto:ichattopadhy...@gmail.com]
Sent: Friday, October 13, 2017 5:25 PM
To: dev@lucene.apache.org
Subject: Re: 6.6.2 Release

> Any chance we could get SOLR-11450 in? I understand if the answer is no.

Currently, I want to have this release out as soon as possible so as to mitigate the risk exposure of the security vulnerability. Since this is not committed yet, I'd vote for leaving this out and possibly having it included in a later release, if needed.

+1 to SOLR-11297.

On Sat, Oct 14, 2017 at 2:32 AM, David Smiley <david.w.smi...@gmail.com> wrote:

Suggested criteria for bug-fix release issues:
* fixes a bug :-) and doesn't harm backwards-compatibility in the process
* helps users upgrade to later versions
* documentation

+1 to SOLR-11297

I'm not sure on SOLR-11450. Seems it might introduce a back-compat issue?

On Fri, Oct 13, 2017 at 4:40 PM Erick Erickson <erickerick...@gmail.com> wrote:

I'd also like to get SOLR-11297 in if there are no objections. Ditto if the answer is no. It's quite a safe fix though.

On Fri, Oct 13, 2017 at 1:26 PM, Allison, Timothy B. <talli...@mitre.org> wrote:

Any chance we could get SOLR-11450 in? I understand if the answer is no.

Thank you!

From: Ishan Chattopadhyaya [mailto:ichattopadhy...@gmail.com]
Sent: Friday, October 13, 2017 4:23 PM
To: dev@lucene.apache.org
Subject: 6.6.2 Release

Hi,

In light of [0], we need a 6.6.2 release as soon as possible. I'd like to volunteer to RM for this release, unless someone else wants to do so or has an objection.

Regards,
Ishan

[0] - https://lucene.apache.org/solr/news.html#12-october-2017-please-secure-your-apache-solr-servers-since-a-zero-day-exploit-has-been-reported-on-a-public-mailing-list

--
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book: http://www.solrenterprisesearchserver.com
RE: 6.6.2 Release
Any chance we could get SOLR-11450 in? I understand if the answer is no. Thank you! From: Ishan Chattopadhyaya [mailto:ichattopadhy...@gmail.com] Sent: Friday, October 13, 2017 4:23 PM To: dev@lucene.apache.org Subject: 6.6.2 Release Hi, In light of [0], we need a 6.6.2 release as soon as possible. I'd like to volunteer to RM for this release, unless someone else wants to do so or has an objection. Regards, Ishan [0] - https://lucene.apache.org/solr/news.html#12-october-2017-please-secure-your-apache-solr-servers-since-a-zero-day-exploit-has-been-reported-on-a-public-mailing-list
RE: GSOC2017: Call to Solr and Tika/Nutch/Camel/NiFi/Zeppelin/etc mentors
Alex,

I'm more than happy to chip in on the Tika side. Thank you for leading this effort.

Cheers,
Tim

-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Sunday, March 26, 2017 9:09 PM
To: dev@lucene.apache.org
Subject: Re: GSOC2017: Call to Solr and Tika/Nutch/Camel/NiFi/Zeppelin/etc mentors

Sounds good. Let's see what happens.

Regards,
Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced

On 26 March 2017 at 07:27, Dmitry Kan wrote:
> Hi Alexandre,
>
> Forwarded your call to luke's google group:
> https://groups.google.com/forum/#!topic/luke-discuss/rmZo7R3gDdc
>
> There might be a potential for solr/lucene/luke projects, like adding
> capability to open solr/lucene index in luke from a remote server:
> https://github.com/DmitryKey/luke/issues/68
>
> Good luck with SOC!
>
> Regards,
> Dmitry
> --
> Dmitry Kan
> Luke Toolbox: http://github.com/DmitryKey/luke
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
>
> On 17 March 2017 at 16:25, Alexandre Rafalovitch wrote:
>>
>> I am mentoring in this year's Google Summer of Code (first timer!). I
>> know there is a couple of us from Solr project, but I also noticed
>> some mentors from upstream/downstream projects.
>>
>> I am proposing that we put at least a couple of integration projects
>> to improve/upgrade Solr integration with both upstream (Tika) and
>> downstream (Nutch/Camel/NiFi) projects. And maybe even propose some
>> new projects, such as Solr backend engine for Zeppelin (so we could
>> have a Python Notebook-like interface to Solr, not just via JDBC
>> bridge, we do already).
>>
>> I am not sure whose JIRAs they should go into, but the project idea
>> tag spans all ASF projects, so it is more important to figure out
>> mentor-level agreement first.
>>
>> In any case, to push this forward, if we get several students all
>> working on Solr, I could run a Solr bootcamp class for students,
>> mentors, and whoever else in the sister communities wants to
>> participate and get more familiar with Solr. We could also run a
>> parallel mini-list (or Gitter room or whatever) where multiple new
>> implementors of Solr integrations can hang out together and progress
>> together.
>>
>> Regards,
>>    Alex.
>> P.s. I am also working on redoing Solr examples (starting from DIH
>> ones at: SOLR-10311). If anybody has comments on what kind of
>> examples would make integrations easier, I am very receptive.
>> P.p.s. Feel free to forward this to the other mailing lists for other
>> relevant sister Apache communities.
>>
>> http://www.solr-start.com/ - Resources for Solr users, new and
>> experienced
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
Apache Tika's public regression corpus
All, I recently blogged about some of the work we're doing with a large scale regression corpus to make Tika, POI and PDFBox more robust and to identify regressions before release. If you'd like to chip in with recommendations, requests or Hadoop/Spark clusters (why not shoot for the stars), please do! http://openpreservation.org/blog/2016/10/04/apache-tikas-regression-corpus-tika-1302/ Many thanks, again, to Rackspace for our vm and to Common Crawl and govdocs1 for most of our files! Cheers, Tim
RE: [jira] [Commented] (SOLR-8017) solr.PointType can't deal with coordination in format like (0.9504547, 1.0, 1.0890503)
>> so that means that using tika metadata indexing with schemaless mode is, well, useless ?

Yes.

> I know of nobody using "schemaless" for production for the simple reason that it makes the best guess it can based on the _first_ time it sees a particular field. There's absolutely no way to guarantee that that doc is representative of all docs.

> And if you want to really get weird, some programs allow custom attributes.

Agreed. It makes no sense to go schemaless with Tika's metadata.

> In the Tika case you've also got the problem that there's no universal metadata definition. What's "author" in one type of doc might be "editor" in another. Or "most_recent_edit" might be "last_edited" and even if these are dates the format won't necessarily be the same.

We do try to normalize across file formats to Dublin Core when possible -- dc:creator, dc:created. We also try to normalize date formats for those metadata items that we know are dates (dc:created, etc.). If you find issues with normalization or can recommend areas for improvement, please do!
RE: Lucene ancient greek normalization
ICU looks promising:

Μῆνιν ἄειδε, θεὰ, Πηληϊάδεω Ἀχιλλῆος - 1.μηνιν 2.αειδε 3.θεα 4.πηληιαδεω 5.αχιλληοσ

-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Friday, November 21, 2014 3:08 PM
To: dev@lucene.apache.org
Subject: Re: Lucene ancient greek normalization

Are you sure that's not something that's already addressed by the ICU Filter? http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/icu/ICUTransformFilterFactory.html

If you follow the links to what's possible, the page talks about Greek, though not ancient: http://userguide.icu-project.org/transforms/general#TOC-Greek

There was also some discussion on: https://issues.apache.org/jira/browse/LUCENE-1343

Regards,
Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On 21 November 2014 14:14, paolo anghileri paolo.anghil...@codegeneration.it wrote:

For development purposes I need the ability in Lucene to normalize ancient Greek characters for all the cases of grammatical details such as accents, diacritics and so on. My need is to retrieve ancient Greek words with accents and other grammatical details by the input of the string without accents. For example, the input of οργανον (organon) should also retrieve Ὄργανον.

I am not a Lucene committer and I am new to this, so my question is about the best practice to implement this in Lucene, and possibly submit a commit proposal to the Apache Lucene project management committee.

I have made some searches and found this file in Lucene-solr: It contains normalization for some chars. My thought would be to add extra normalization here, including all Unicode ancient Greek chars with all grammatical details. I already have all the Unicode values for those chars, so it should not be difficult for me to include them. If my understanding is correct, this should add to Lucene the features described above.

As I am new to this, my needs are:
To be sure that this is the correct place in Lucene for doing normalization
How to post a commit proposal

Any help appreciated

Kind regards
Paolo
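The folding Paolo describes (accented input and unaccented input meeting at the same indexed form) can be sketched with the JDK alone. This is not the ICU filter mentioned above and not Lucene code; it is a rough approximation that assumes decomposing to NFD, dropping combining marks, lowercasing and mapping final sigma to medial sigma is enough to reproduce the tokens Tim lists. The class and method names are made up for illustration, and the Greek strings are written as Unicode escapes only so the file compiles regardless of source encoding.

```java
import java.text.Normalizer;
import java.util.Locale;

public class GreekFoldSketch {

    /** Hypothetical helper: strips Greek diacritics and lowercases, using only the JDK. */
    static String fold(String input) {
        // NFD splits each accented letter into a base letter plus combining marks.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Drop the combining marks (accents, breathings, diaeresis, iota subscript).
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        // Lowercase, then map final sigma (U+03C2) to medial sigma (U+03C3).
        return stripped.toLowerCase(Locale.ROOT).replace('\u03C2', '\u03C3');
    }

    public static void main(String[] args) {
        // Organon written with smooth breathing and acute accent on the capital omicron.
        System.out.println(fold("\u1F4C\u03C1\u03B3\u03B1\u03BD\u03BF\u03BD"));
    }
}
```

In a real index this step would live in the analysis chain, so that the same folding is applied at both index and query time; the ICU filters discussed in the thread are the maintained way to do that.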
RE: Bug in AnalyzingQueryParser Pattern
Thank you, Dennis: https://issues.apache.org/jira/i#browse/LUCENE-5839

From: Dennis Walter [mailto:dennis.wal...@gmail.com]
Sent: Sunday, July 20, 2014 2:52 PM
To: dev@lucene.apache.org
Subject: Bug in AnalyzingQueryParser Pattern

Hi there,

While reading the source code of AnalyzingQueryParser to understand what it does, I think I found a bug in the regular expression used to detect wildcards. It is defined as

// gobble escaped chars or find a wildcard character
private final Pattern wildcardPattern = Pattern.compile("(\\.)|([?*]+)");

The first group will match a literal dot (.), while its intention seems to be to match a backslash and a single character. So the expression should instead be

"(\\\\.)|([?*]+)"

Best Regards
Dennis
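The difference between the two expressions is easy to check with the JDK's regex classes alone. The sketch below is not Lucene code: the class name, the helper and the sample query are assumptions made for illustration. It counts how many matches come from group 2, that is, how many runs of wildcard characters each pattern treats as unescaped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WildcardPatternDemo {

    // As quoted in the report: the regex is (\.)|([?*]+), so group 1 matches a literal dot.
    static final Pattern REPORTED = Pattern.compile("(\\.)|([?*]+)");

    // As proposed: the regex is (\\.)|([?*]+), so group 1 matches a backslash plus any one character.
    static final Pattern PROPOSED = Pattern.compile("(\\\\.)|([?*]+)");

    /** Counts the matches in which group 2 (a run of unescaped wildcards) participated. */
    static int countWildcardRuns(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        int count = 0;
        while (m.find()) {
            if (m.group(2) != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String query = "foo\\*bar*"; // the characters foo\*bar*: one escaped star, one real wildcard
        System.out.println("reported pattern: " + countWildcardRuns(REPORTED, query));
        System.out.println("proposed pattern: " + countWildcardRuns(PROPOSED, query));
    }
}
```

With the reported pattern the escaped star is counted as a wildcard as well; with the proposed pattern the backslash and the star are consumed together by group 1, so only the trailing star is counted.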
ensuring codec can index offsets in test framework
This is similar to David Smiley's question on Feb 16th, but SuppressCodecs would be too broad of a solution, I think. I'm using LuceneTestCase's newIndexWriterConfig, and I have a test that requires IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS. The test passes quite often (famous last words), but I occasionally get an UnsupportedOperationException: this codec cannot index offsets. Is there a way to have LuceneTestCase randomly select a codec (with particular subcomponents/configurations) that supports indexing offsets? The codec that fails: codec=Lucene46: {f1:MockVariableIntBlock(baseBlockSize=71)}, docValues:{}, ... Most often, however Lucene46 does not fail.
RE: ensuring codec can index offsets in test framework
Perfect. As always, thank you.

From: Robert Muir [rcm...@gmail.com]
Sent: Wednesday, March 19, 2014 10:29 AM
To: dev@lucene.apache.org
Subject: Re: ensuring codec can index offsets in test framework

for now you can use an assume, there is a helper in LuceneTestCase:

String pf = TestUtil.getPostingsFormat("dummy");
boolean supportsOffsets = !doesntSupportOffsets.contains(pf);

another option is to suppress the codecs that don't support it (anything using Sep layout).

This is annoying though, maybe we should remove Sep layout? Realistically it was the precursor to the block layout that Lucene41 introduced, which was a big change, but i am unsure if it's really helping us anymore, because it just falls behind on things and i don't think it has any interesting qualities for real use or that would be useful in testing, either.

On Wed, Mar 19, 2014 at 8:53 AM, Allison, Timothy B. talli...@mitre.org wrote:

This is similar to David Smiley's question on Feb 16th, but SuppressCodecs would be too broad of a solution, I think.

I'm using LuceneTestCase's newIndexWriterConfig, and I have a test that requires IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS. The test passes quite often (famous last words), but I occasionally get an UnsupportedOperationException: this codec cannot index offsets.

Is there a way to have LuceneTestCase randomly select a codec (with particular subcomponents/configurations) that supports indexing offsets?

The codec that fails: codec=Lucene46: {f1:MockVariableIntBlock(baseBlockSize=71)}, docValues:{}, ...

Most often, however, Lucene46 does not fail.
RE: Suggestions about writing / extending QueryParsers
Tommaso,

Ah, now I see. If you want to add new operators, you'll have to modify the javacc files. For the SpanQueryParser, I added a handful of new operators and chose to go with regexes instead of javacc...not sure that was the right decision, but given my lack of knowledge of javacc, it was expedient. If you have time or already know javacc, it shouldn't be difficult.

As for it being a no-brainer on the Solr side, yes, it shouldn't be a problem. However, as of now the basic queryparser is a copy and paste job between Lucene and Solr, so you'll just have to redo your code in Solr unless you do something smarter.

If you'd be willing to wait for LUCENE-5205 to be brought into Lucene, I'd consider adding this functionality into the SpanQueryParser as a later step.

Cheers,
Tim

From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com]
Sent: Friday, March 07, 2014 3:17 AM
To: dev@lucene.apache.org
Subject: Re: Suggestions about writing / extending QueryParsers

Thanks Tim and Upayavira for your replies.

I still need to decide what the final syntax could be, however generally speaking the ideal would be that I am able to extend the current Lucene syntax with a new expression which will trigger the creation of a more like this query with something like

+title:foo +"text for similar docs"%2

where the phrase between quotes will generate a MoreLikeThisQuery on that text if it's followed by the % character (and the number 2 may control the MLT configuration, e.g. min document freq == min term freq = 2), similarly to what it's done for proximity search (not sure about using %, it's just a syntax example).

I guess then I'd need to extend the classic query parser, as per Tim's suggestions, and I'd assume that if this goes into the classic qp it should be a no brainer on the Solr side. Does it sound correct / feasible?

Regards,
Tommaso

2014-03-06 15:08 GMT+01:00 Upayavira u...@odoko.co.uk:

Tommaso,

Do say more about what you're thinking of. I'm currently getting my dev environment up to look into enhancing the MoreLikeThisHandler to be able to handle function query boosts. This should be eminently possible from my initial research. However, if you're thinking of something more powerful, perhaps we can work together.

Upayavira

On Thu, Mar 6, 2014, at 11:23 AM, Tommaso Teofili wrote:

Hi all,

I'm thinking about writing/extending a QueryParser for MLT queries; I've never really looked into that code too much, while I'm doing that now, I'm wondering if anyone has suggestions on how to start with such a topic. Should I write a new grammar for that? Or can I just extend an existing grammar / class?

Thanks in advance,
Tommaso
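Before touching a javacc grammar, the proposed syntax (a quoted phrase followed by % and a number) can be prototyped with a regex, in the spirit of the regex route Tim mentions. The sketch below uses only the JDK. The class, the holder type, the field names and the digit limit are all assumptions for illustration; nothing here is part of the Lucene classic query parser, and the thread itself treats the % operator as a placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MltSyntaxSketch {

    // Hypothetical syntax from the thread: "some phrase"%N (at most 9 digits to keep parseInt safe).
    static final Pattern MLT_CLAUSE = Pattern.compile("\"([^\"]+)\"%(\\d{1,9})");

    /** Illustrative holder for one recognized clause. */
    static final class MltClause {
        final String likeText; // text a more-like-this query would be built from
        final int minFreq;     // the number after %, read here as a minimum frequency

        MltClause(String likeText, int minFreq) {
            this.likeText = likeText;
            this.minFreq = minFreq;
        }
    }

    /** Finds every "phrase"%N clause in the raw query string. */
    static List<MltClause> extractMltClauses(String query) {
        List<MltClause> clauses = new ArrayList<>();
        Matcher m = MLT_CLAUSE.matcher(query);
        while (m.find()) {
            clauses.add(new MltClause(m.group(1), Integer.parseInt(m.group(2))));
        }
        return clauses;
    }

    public static void main(String[] args) {
        for (MltClause c : extractMltClauses("+title:foo +\"text for similar docs\"%2")) {
            System.out.println(c.likeText + " -> minFreq=" + c.minFreq);
        }
    }
}
```

A pre-pass like this only identifies the clauses. Turning each one into a MoreLikeThisQuery and combining it with the rest of the parsed query is the part that would still need the parser changes discussed above.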
RE: Suggestions about writing / extending QueryParsers
Hi Tommaso,

It will depend on how different your target syntax will be. If you extend the classic parser (or QueryParserBase), there is a fair amount of overhead and extras that you might not want or need. On the other hand, the query syntax and the methods will be familiar to the Lucene community, and there is a large number of test cases already built for you. On the third hand, if you need to modify the low level parsing stuff, you'll have to be familiar with javacc.

There's the flexible family that should allow for easy modifications, and the xml family could offer an easy interface between a custom lexer and a parser. The SimpleQueryParser offers a model of building something fairly simple and yet very elegant from scratch.

In deciding where to start, another consideration might include how easy it will be to integrate at the Solr level. Make sure to include field-based hooks for processing multiterms, prefix and range queries.

For LUCENE-5205, I eventually chose to subclass QueryParserBase, and I had to override a fair amount of code because every terminal had to be a SpanQuery - most of the queryparser infrastructure is built for traditional queries.

So, what features do you want to add for mlt? What capabilities do you need?

Cheers,
Tim

From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com]
Sent: Thursday, March 06, 2014 6:23 AM
To: dev@lucene.apache.org
Subject: Suggestions about writing / extending QueryParsers

Hi all,

I'm thinking about writing/extending a QueryParser for MLT queries; I've never really looked into that code too much, while I'm doing that now, I'm wondering if anyone has suggestions on how to start with such a topic. Should I write a new grammar for that? Or can I just extend an existing grammar / class?

Thanks in advance,
Tommaso
RE: Span Not Queries
Dotting i's and crossing t's on javadocs. Have to push eta to end of this week.

From: Gopal Agarwal [mailto:gopal.agarw...@gmail.com]
Sent: Monday, January 27, 2014 4:31 AM
To: dev@lucene.apache.org
Subject: Re: Span Not Queries

Hi, Any news on this?

On Fri, Jan 17, 2014 at 1:54 AM, Gopal Agarwal gopal.agarw...@gmail.com wrote:

Sounds perfect. Hopefully one of the committers picks this up and adds this to 4.7. Will keep checking the updates...

On Fri, Jan 17, 2014 at 1:17 AM, Allison, Timothy B. talli...@mitre.org wrote:

And don't forget analysis! :)

The code is non-trivial, and it will take a generous committer to help me get it into shape for committing. Once I push my mods to jira (end of next week), you should be able to compile it and run it at least for dev/testing to confirm that it meets your needs.

From: Gopal Agarwal [mailto:gopal.agarw...@gmail.com]
Sent: Thursday, January 16, 2014 1:21 PM
To: dev@lucene.apache.org
Subject: Re: Span Not Queries

Thanks Tim. This exactly fits my requirements of recursion, SpanNot and ComplexParser combination with Boolean Parser.

Since I would end up doing the exact same changes to my QueryParserBase class, I would be locked with the current version of SOLR for the foreseeable future. Can you comment on when a release would be possible if it gets reviewed by next week?

On Thu, Jan 16, 2014 at 11:06 PM, Allison, Timothy B. talli...@mitre.org wrote:

Apologies for the self-promotion...LUCENE-5205 and its Solr cousin (SOLR-5410) might help. I'm hoping to post updates to both by the end of next week. Then, if a committer would be willing to review and add these to Lucene/Solr, you should be good to go.

Take a look at the description for LUCENE-5205 and see if that capability will meet your needs.

Thank you.

Best,
Tim

From: Gopal Agarwal [mailto:gopal.agarw...@gmail.com]
Sent: Thursday, January 16, 2014 4:10 AM
To: dev@lucene.apache.org
Subject: Fwd: Span Not Queries

Please help me out with earlier query. In short:
1. Can we change the QueryParser.jj file to identify the SpanNot query as a boolean clause?
2. Can we use ComplexPhraseQuery Parser to support SpanOR and SpanNOT queries also?

For further explanation, following are the examples.

On Tue, Oct 15, 2013 at 11:27 PM, Ankit Kumar ankitthemight...@gmail.com wrote:

I have a business use case in which I need to use Span Not and other ordered proximity queries, and they can be nested up to any level: a Boolean inside an ordered query, or an ordered query inside a Boolean. Currently I am thinking of changing the QueryParser.jj file to identify the SpanNot query and use the Complex Phrase Query Parser of Lucene for parsing complex queries. Can you suggest a better way of achieving this?

Following is the list of additions that I need to make in SOLR.

1. Span NOT Operator

2. Adding Recursive and Range Proximity
Recursive Proximity is a proximity query within a proximity query. Ex: income tax~5 statement ~4
The recursion can be up to any level.
Range Proximity: Currently we can only define a single number as the range; we want to define an interval as the range. Ex: profit income~3,5, United America~-5,4

3. Complex Queries
A complex query is a query formed with a combination of Boolean operators or proximity queries or range queries or any possible combination of these. Ex: (income AND tax) statement~4 income tax~4 (statement OR period) ~3 ( income SPAN NOT income tax ) source ~3,5

Can anyone suggest some way of achieving these 3 functionalities in SOLR?

On Tue, Oct 15, 2013 at 10:15 PM, Jack Krupansky j...@basetechnology.com wrote:

Nope. But the LucidWorks Search product query parser does support SpanNot if you use their BEFORE, AFTER, and NEAR span operators. See: http://docs.lucidworks.com/display/lweug/Proximity+Operations

For example: George BEFORE:2 Bush NOT H to match George anything Bush, but not George H. W. Bush.

What is your specific use case?

-- Jack Krupansky

-----Original Message-----
From: Ankit Kumar
Sent: Tuesday, October 15, 2013 3:58 AM
To: solr-u...@lucene.apache.org
Subject: Span Not Queries

I need to add Span Not queries in Solr. There's a parser, the Surround Query Parser; I went through this ( http://lucene.472066.n3.nabble.com/Surround-query-parser-not-working-td4075066.html ) to discover that the surround query parser does not analyze text.

Does DisMaxQueryParser support SpanNot queries?
RE: Span Not Queries
Apologies for the self-promotion...LUCENE-5205 and its Solr cousin (SOLR-5410) might help. I'm hoping to post updates to both by the end of next week. Then, if a committer would be willing to review and add these to Lucene/Solr, you should be good to go. Take a look at the description for LUCENE-5205 and see if that capability will meet your needs. Thank you. Best, Tim
RE: Span Not Queries
And don't forget analysis! :) The code is non-trivial, and it will take a generous committer to help me get it into shape for committing. Once I push my mods to jira (end of next week), you should be able to compile it and run it at least for dev/testing to confirm that it meets your needs.
dangers of limiting tokenizers/disabling assertions in MockTokenizer?
All, I realize that we should be consuming all tokens from a stream. I'd like to wrap a client's Analyzer with LimitTokenCountAnalyzer with consume=false. For the analyzers that I've used, this has caused no problems. When I use MockTokenizer, I run into this assertion error: end() called before incrementToken(). The comment in MockTokenizer reads: // some tokenizers, such as limiting tokenizers, call end() before incrementToken() returns false. // these tests should disable this check (in general you should consume the entire stream) Disabling assertions gives me pause as does disobeying the workflow (http://lucene.apache.org/core/4_5_1/core/index.html). I assume from the warnings that there are Analyzers and use cases that will fail unless the stream is entirely consumed. Is there a safe way to wrap a client Analyzer and only read x number of tokens? Should I allow the client to decide whether or not to consume? Thank you! Best, Tim