RE: 6.6.2 Release

2017-10-13 Thread Allison, Timothy B.
Sounds good.  Thank you!

From: Ishan Chattopadhyaya [mailto:ichattopadhy...@gmail.com]
Sent: Friday, October 13, 2017 5:25 PM
To: dev@lucene.apache.org
Subject: Re: 6.6.2 Release

> Any chance we could get SOLR-11450 in?  I understand if the answer is no. 
Currently, I want to have this release out as soon as possible so as to 
mitigate the risk exposure of the security vulnerability. Since this is not 
committed yet, I'd vote for leaving this out and possibly having it included in 
a later release, if needed.
+1 to SOLR-11297.


On Sat, Oct 14, 2017 at 2:32 AM, David Smiley <david.w.smi...@gmail.com> wrote:
Suggested criteria for bug-fix release issues:
* fixes a bug :-) and doesn't harm backwards-compatibility in the process
* helps users upgrade to later versions
* documentation

+1 to SOLR-11297

I'm not sure on SOLR-11450.  Seems it might introduce a back-compat issue?

On Fri, Oct 13, 2017 at 4:40 PM Erick Erickson <erickerick...@gmail.com> wrote:
I'd also like to get SOLR-11297 in if there are no objections. Ditto if the 
answer is no.

It's quite a safe fix though.



On Fri, Oct 13, 2017 at 1:26 PM, Allison, Timothy B. <talli...@mitre.org> wrote:
Any chance we could get SOLR-11450 in?  I understand if the answer is no. 

Thank you!

From: Ishan Chattopadhyaya [mailto:ichattopadhy...@gmail.com]
Sent: Friday, October 13, 2017 4:23 PM
To: dev@lucene.apache.org
Subject: 6.6.2 Release

Hi,
In light of [0], we need a 6.6.2 release as soon as possible.
I'd like to volunteer to RM for this release, unless someone else wants to do 
so or has an objection.
Regards,
Ishan


[0] - 
https://lucene.apache.org/solr/news.html#12-october-2017-please-secure-your-apache-solr-servers-since-a-zero-day-exploit-has-been-reported-on-a-public-mailing-list

--
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book: 
http://www.solrenterprisesearchserver.com



RE: GSOC2017: Call to Solr and Tika/Nutch/Camel/NiFi/Zeppelin/etc mentors

2017-03-27 Thread Allison, Timothy B.
Alex,
  I'm more than happy to chip in on the Tika side.  Thank you for leading this 
effort.

   Cheers,

   Tim

-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Sunday, March 26, 2017 9:09 PM
To: dev@lucene.apache.org
Subject: Re: GSOC2017: Call to Solr and Tika/Nutch/Camel/NiFi/Zeppelin/etc 
mentors

Sounds good. Let's see what happens.

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 26 March 2017 at 07:27, Dmitry Kan  wrote:
> Hi Alexandre,
>
> Forwarded your call to luke's google group:
> https://groups.google.com/forum/#!topic/luke-discuss/rmZo7R3gDdc
>
> There might be a potential for solr/lucene/luke projects, like adding 
> capability to open solr/lucene index in luke from a remote server:
> https://github.com/DmitryKey/luke/issues/68
>
> Good luck with SOC!
>
> Regards,
> Dmitry
> --
> Dmitry Kan
> Luke Toolbox: http://github.com/DmitryKey/luke
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
>
> On 17 March 2017 at 16:25, Alexandre Rafalovitch  wrote:
>>
>> I am mentoring in this year's Google Summer of Code (first timer!). I 
>> know there are a couple of us from the Solr project, but I also noticed 
>> some mentors from upstream/downstream projects.
>>
>> I am proposing that we put forward at least a couple of integration projects 
>> to improve/upgrade Solr integration with both upstream (Tika) and 
>> downstream (Nutch/Camel/NiFi) projects. And maybe even propose some 
>> new projects, such as a Solr backend engine for Zeppelin (so we could 
>> have a Python Notebook-like interface to Solr, not just via the JDBC 
>> bridge as we do already).
>>
>> I am not sure whose JIRAs they should go into, but the project idea 
>> tag spans all ASF projects, so it is more important to figure out 
>> mentor-level agreement first.
>>
>> In any case, to push this forward, if we get several students all 
>> working on Solr, I could run a Solr bootcamp class for students, 
>> mentors, and whoever else in the sister communities wants to 
>> participate and get more familiar with Solr. We could also run a 
>> parallel mini-list (or Gitter room or whatever) where multiple new 
>> implementors of Solr integrations can hang out together and progress 
>> together.
>>
>>
>> Regards,
>>Alex.
>> P.s. I am also working on redoing Solr examples (starting from DIH 
>> ones at: SOLR-10311). If anybody has comments on what kind of 
>> examples would make integrations easier, I am very receptive.
>> P.p.s. Feel free to forward this to the other mailing lists for other 
>> relevant sister Apache communities.
>> 
>> http://www.solr-start.com/ - Resources for Solr users, new and 
>> experienced
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For 
>> additional commands, e-mail: dev-h...@lucene.apache.org
>>
>
>




Apache Tika's public regression corpus

2016-10-05 Thread Allison, Timothy B.
All,

I recently blogged about some of the work we're doing with a large-scale 
regression corpus to make Tika, POI and PDFBox more robust and to identify 
regressions before release.  If you'd like to chip in with recommendations, 
requests or Hadoop/Spark clusters (why not shoot for the stars), please do!

  
http://openpreservation.org/blog/2016/10/04/apache-tikas-regression-corpus-tika-1302/

Many thanks, again, to Rackspace for our VM and to Common Crawl and govdocs1 
for most of our files!

Cheers,

 Tim




RE: [jira] [Commented] (SOLR-8017) solr.PointType can't deal with coordination in format like (0.9504547, 1.0, 1.0890503)

2016-05-02 Thread Allison, Timothy B.
>> so that means that using tika metadata indexing with schemaless mode 
>> is, well, useless?
Yes. 

> I know of nobody using "schemaless" for production for the simple reason that 
> it makes the best guess it can based on the _first_ time it sees a particular 
> field. There's absolutely no way to guarantee that that doc is representative 
> of all docs.
> And if you want to really get weird, some programs allow custom attributes.

Agreed. It makes no sense to go schemaless with Tika's metadata.

> In the Tika case you've also got the problem that there's no universal 
> metadata definition. What's "author" in one type of doc might be "editor" in 
> another. Or "most_recent_edit" might be "last_edited" and even if these are 
> dates the format won't necessarily be the same.

We do try to normalize across file formats to Dublin Core when possible -- 
dc:creator, dc:created.  We also try to normalize date formats for those 
metadata items that we know are dates (dc:created, etc.).  If you find issues 
with normalization or can recommend areas for improvement, please do!
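To make the date point concrete, here is a minimal sketch of normalizing a few differently formatted date strings to a single ISO-8601 UTC form with java.time. The class name, the particular input formats, and the choice to read offset-less values as UTC are all assumptions for illustration; this is not Tika's implementation.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

final class DateNormalizer {
    // Hypothetical input formats that carry an explicit offset or zone.
    private static final List<DateTimeFormatter> WITH_OFFSET = List.of(
            DateTimeFormatter.ISO_OFFSET_DATE_TIME,   // 2008-06-03T13:05:30+02:00
            DateTimeFormatter.RFC_1123_DATE_TIME);    // Tue, 3 Jun 2008 11:05:30 GMT

    // Hypothetical input formats without an offset.
    private static final List<DateTimeFormatter> WITHOUT_OFFSET = List.of(
            DateTimeFormatter.ISO_LOCAL_DATE_TIME,                           // 2008-06-03T11:05:30
            DateTimeFormatter.ofPattern("yyyy:MM:dd HH:mm:ss", Locale.ROOT)); // EXIF-style

    /** Returns the value as an ISO-8601 UTC timestamp, or empty if no known format matches. */
    static Optional<String> normalize(String raw) {
        for (DateTimeFormatter f : WITH_OFFSET) {
            try {
                return Optional.of(toUtcString(OffsetDateTime.parse(raw, f)));
            } catch (DateTimeParseException ignored) {
                // not this format; try the next one
            }
        }
        for (DateTimeFormatter f : WITHOUT_OFFSET) {
            try {
                // Assumption: values without an offset are interpreted as UTC.
                LocalDateTime ldt = LocalDateTime.parse(raw, f);
                return Optional.of(toUtcString(ldt.atOffset(ZoneOffset.UTC)));
            } catch (DateTimeParseException ignored) {
                // not this format; try the next one
            }
        }
        return Optional.empty();
    }

    private static String toUtcString(OffsetDateTime odt) {
        return odt.withOffsetSameInstant(ZoneOffset.UTC)
                  .format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }
}
```

With these formats, 2008-06-03T13:05:30+02:00, Tue, 3 Jun 2008 11:05:30 GMT and 2008:06:03 11:05:30 should all come out as 2008-06-03T11:05:30Z; anything unrecognized comes back empty rather than being guessed at, which matters for the same reason schemaless guessing does.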






RE: Lucene ancient greek normalization

2014-11-21 Thread Allison, Timothy B.
ICU looks promising:

Μῆνιν ἄειδε, θεὰ, Πηληϊάδεω Ἀχιλλῆος -

1. μηνιν
2. αειδε
3. θεα
4. πηληιαδεω
5. αχιλληοσ
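For anyone who wants similar accent-insensitive matching without pulling in ICU, a rough JDK-only approximation is: decompose to NFD, drop the nonspacing marks (accents, breathings, diaeresis), lowercase, and map final sigma to medial sigma, as the ICU output above does for αχιλληοσ. This is a sketch under those assumptions, not the ICU filter itself; GreekFolder is a hypothetical name, and ICU's folding covers many more cases than this.

```java
import java.text.Normalizer;
import java.util.Locale;

final class GreekFolder {
    /**
     * Rough accent-insensitive folding for (Ancient) Greek using only the JDK:
     * decompose to NFD, drop nonspacing marks (accents, breathings, diaeresis),
     * lowercase, then map final sigma to medial sigma.
     */
    static String fold(String token) {
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        String stripped = decomposed.replaceAll("\\p{Mn}+", "");
        // U+03C2 (final sigma) becomes U+03C3 (medial sigma), as case folding does
        return stripped.toLowerCase(Locale.ROOT).replace('\u03C2', '\u03C3');
    }

    public static void main(String[] args) {
        // Assumes the source file is compiled as UTF-8.
        for (String word : "Μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλλῆος".split(" ")) {
            System.out.println(fold(word));
        }
    }
}
```

Run on the same line of Homer, this should print the five tokens listed above, and fold("Ὄργανον") and fold("οργανον") both return οργανον, which is the kind of matching asked for in the quoted message.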

-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Friday, November 21, 2014 3:08 PM
To: dev@lucene.apache.org
Subject: Re: Lucene ancient greek normalization

Are you sure that's not something that's already addressed by the ICU
Filter? 
http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/icu/ICUTransformFilterFactory.html

If you follow the links to what's possible, the page talks about
Greek, though not ancient:
http://userguide.icu-project.org/transforms/general#TOC-Greek

There was also some discussion on:
https://issues.apache.org/jira/browse/LUCENE-1343

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 21 November 2014 14:14, paolo anghileri
paolo.anghil...@codegeneration.it wrote:
 For development purposes, I need the ability in Lucene to normalize Ancient
 Greek characters for all grammatical details such as accents,
 diacritics and so on.

 I need to retrieve Ancient Greek words with accents and other
 grammatical details when the input string has no accents.

 For example, the input οργανον (organon) should also retrieve Ὄργανον.


 I am not a Lucene committer and I am new to this, so my question is about the
 best practice for implementing this in Lucene, and possibly submitting a commit
 proposal to the Lucene project management committee.

 I have made some searches and found this file in Lucene-Solr:


 It contains normalization for some characters.
 My thought would be to add extra normalization here, covering all Unicode
 Ancient Greek characters with all grammatical details.
 I already have all the Unicode values for those characters, so it should not be
 difficult for me to include them.

 If my understanding is correct, this should add to Lucene the features
 described above.


 As I am new to this, I need to know:

  * whether this is the correct place in Lucene for doing normalization
  * how to post a commit proposal


 Any help appreciated

 Kind regards

 Paolo




RE: Bug in AnalyzingQueryParser Pattern

2014-07-21 Thread Allison, Timothy B.
Thank you, Dennis:

https://issues.apache.org/jira/i#browse/LUCENE-5839

From: Dennis Walter [mailto:dennis.wal...@gmail.com]
Sent: Sunday, July 20, 2014 2:52 PM
To: dev@lucene.apache.org
Subject: Bug in AnalyzingQueryParser Pattern


Hi there,

While reading the source code of AnalyzingQueryParser to understand what it 
does, I think I found a bug in the regular expression used to detect wildcards. 
It is defined as

  // gobble escaped chars or find a wildcard character
  private final Pattern wildcardPattern = Pattern.compile("(\\.)|([?*]+)");

The first group will match a literal dot (.), while its intention seems to be 
to match a backslash and a single character. So the expression should instead 
be "(\\\\.)|([?*]+)".
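A quick way to see the difference is to run both patterns against the query text foo\*bar (an escaped asterisk) and against a.b. This is a standalone demo of the two regexes, not the AnalyzingQueryParser code; WildcardPatternDemo and firstMatchIsEscape are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WildcardPatternDemo {
    // Java literal "(\\.)" is the regex (\.): group 1 matches a literal '.' only.
    static final Pattern ORIGINAL = Pattern.compile("(\\.)|([?*]+)");
    // Java literal "(\\\\.)" is the regex (\\.): group 1 matches a backslash plus any one char.
    static final Pattern PROPOSED = Pattern.compile("(\\\\.)|([?*]+)");

    /** True if the first match in the input was captured by group 1, the "escaped char" branch. */
    static boolean firstMatchIsEscape(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() && m.group(1) != null;
    }

    public static void main(String[] args) {
        String escapedStar = "foo\\*bar"; // the query text foo\*bar
        System.out.println(firstMatchIsEscape(ORIGINAL, escapedStar)); // false: '*' seen as a wildcard
        System.out.println(firstMatchIsEscape(PROPOSED, escapedStar)); // true: "\*" gobbled as an escape
    }
}
```

Conversely, on a.b the original pattern reports a spurious "escape" (the dot), while the proposed one finds nothing.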

Best Regards
Dennis



ensuring codec can index offsets in test framework

2014-03-19 Thread Allison, Timothy B.
This is similar to David Smiley's question on Feb 16th, but SuppressCodecs 
would be too broad a solution, I think.



I'm using LuceneTestCase's newIndexWriterConfig, and I have a test that 
requires IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS.  The test 
passes quite often (famous last words), but I occasionally get an 
UnsupportedOperationException: this codec cannot index offsets.



Is there a way to have LuceneTestCase randomly select a codec (with particular 
subcomponents/configurations) that supports indexing offsets?



The codec that fails:



codec=Lucene46: {f1:MockVariableIntBlock(baseBlockSize=71)}, docValues:{}, ...



Most often, however, Lucene46 does not fail.


RE: ensuring codec can index offsets in test framework

2014-03-19 Thread Allison, Timothy B.
Perfect.  As always, thank you.


From: Robert Muir [rcm...@gmail.com]
Sent: Wednesday, March 19, 2014 10:29 AM
To: dev@lucene.apache.org
Subject: Re: ensuring codec can index offsets in test framework

For now you can use an assume; there is a helper in LuceneTestCase:

String pf = TestUtil.getPostingsFormat("dummy");
boolean supportsOffsets = !doesntSupportOffsets.contains(pf);

another option is to suppress the codecs that don't support it
(anything using Sep layout).

This is annoying, though; maybe we should remove the Sep layout?
Realistically, it was the precursor to the block layout that Lucene41
introduced, which was a big change, but I am unsure if it's really
helping us anymore, because it just falls behind on things and I don't
think it has any interesting qualities for real use or that would be
useful in testing, either.




RE: Suggestions about writing / extending QueryParsers

2014-03-07 Thread Allison, Timothy B.
Tommaso,
  Ah, now I see.  If you want to add new operators, you'll have to modify the 
javacc files.  For the SpanQueryParser, I added a handful of new operators and 
chose to go with regexes instead of javacc... not sure that was the right 
decision, but given my lack of knowledge of javacc, it was expedient.  If you 
have time or already know javacc, it shouldn't be difficult.
  As for it being a no-brainer on the Solr side, yes, it shouldn't be a problem.  
However, as of now the basic query parser is a copy-and-paste job between Lucene 
and Solr, so you'll just have to redo your code in Solr... unless you do something 
smarter.
  If you'd be willing to wait for LUCENE-5205 to be brought into Lucene, I'd 
consider adding this functionality into the SpanQueryParser as a later step.
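As a rough illustration of the regex route, here is a minimal sketch that pulls the proposed "phrase"%N clauses out of a query string such as +title:foo +"text for similar docs"%2. The syntax is only the example floated in this thread, the class and field names are hypothetical, and a real parser would still have to hand the remaining clauses to the classic parser and build a MoreLikeThisQuery from each extracted phrase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MltClauseExtractor {
    // Hypothetical syntax from this thread: a quoted phrase immediately followed by %N.
    // Escaped quotes inside the phrase are not handled in this sketch.
    private static final Pattern MLT_CLAUSE = Pattern.compile("\"([^\"]*)\"%(\\d+)");

    /** Hypothetical holder for what an MLT query builder would need. */
    static final class MltClause {
        final String likeText;
        final int minFreq; // e.g. used for both min doc freq and min term freq
        MltClause(String likeText, int minFreq) {
            this.likeText = likeText;
            this.minFreq = minFreq;
        }
    }

    /** Returns every "phrase"%N clause found in the query, in order of appearance. */
    static List<MltClause> extract(String query) {
        List<MltClause> clauses = new ArrayList<>();
        Matcher m = MLT_CLAUSE.matcher(query);
        while (m.find()) {
            clauses.add(new MltClause(m.group(1), Integer.parseInt(m.group(2))));
        }
        return clauses;
    }
}
```

An ordinary proximity phrase such as "income tax"~5 is left alone, since it is not followed by %.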

  Cheers,

 Tim

From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com]
Sent: Friday, March 07, 2014 3:17 AM
To: dev@lucene.apache.org
Subject: Re: Suggestions about writing / extending QueryParsers

Thanks Tim and Upayavira for your replies.

I still need to decide what the final syntax could be; however, generally 
speaking, the ideal would be to extend the current Lucene syntax 
with a new expression which will trigger the creation of a more like this query, 
with something like +title:foo +"text for similar docs"%2, where the phrase 
between quotes will generate a MoreLikeThisQuery on that text if it's followed 
by the % character (and the number 2 may control the MLT configuration, e.g. 
min document freq == min term freq = 2), similarly to what's done for 
proximity search (not sure about using %, it's just a syntax example).
I guess then I'd need to extend the classic query parser, as per Tim's 
suggestions, and I'd assume that if this goes into the classic qp it should be a 
no-brainer on the Solr side.
Does it sound correct / feasible?

Regards,
Tommaso
2014-03-06 15:08 GMT+01:00 Upayavira <u...@odoko.co.uk>:
Tommaso,

Do say more about what you're thinking of. I'm currently getting my dev 
environment up to look into enhancing the MoreLikeThisHandler to be able to handle 
function query boosts. This should be eminently possible from my initial 
research. However, if you're thinking of something more powerful, perhaps we 
can work together.

Upayavira


On Thu, Mar 6, 2014, at 11:23 AM, Tommaso Teofili wrote:
Hi all,

I'm thinking about writing/extending a QueryParser for MLT queries. I've never 
really looked into that code much; now that I'm doing so, I'm wondering 
if anyone has suggestions on how to get started with such a topic.
Should I write a new grammar for that? Or can I just extend an existing 
grammar / class?

Thanks in advance,
Tommaso



RE: Suggestions about writing / extending QueryParsers

2014-03-06 Thread Allison, Timothy B.
Hi Tommaso,

  It will depend on how different your target syntax will be.  If you extend 
the classic parser (or QueryParserBase), there is a fair amount of overhead 
and extras that you might not want or need.  On the other hand, the query 
syntax and the methods will be familiar to the Lucene community, and there is a 
large number of test cases already built for you.  On the third hand, if you 
need to modify the low-level parsing, you'll have to be familiar with 
javacc.

  There's the flexible family, which should allow for easy modifications, and 
the XML family, which could offer an easy interface between a custom lexer and a 
parser.  The SimpleQueryParser offers a model of building something fairly 
simple and yet very elegant from scratch.

  In deciding where to start, another consideration might include how easy it 
will be to integrate at the Solr level.  Make sure to include field-based hooks 
for processing multiterms, prefix and range queries.

  For LUCENE-5205, I eventually chose to subclass QueryParserBase, and I had to 
override  a fair amount of code because every terminal had to be a SpanQuery - 
most of the queryparser infrastructure is built for traditional queries.

  So, what features do you want to add for mlt?  What capabilities do you need?

  Cheers,

  Tim





RE: Span Not Queries

2014-01-29 Thread Allison, Timothy B.
Dotting i's and crossing t's on javadocs.  Have to push the ETA to the end of this week.

From: Gopal Agarwal [mailto:gopal.agarw...@gmail.com]
Sent: Monday, January 27, 2014 4:31 AM
To: dev@lucene.apache.org
Subject: Re: Span Not Queries

Hi,

Any news on this?

On Fri, Jan 17, 2014 at 1:54 AM, Gopal Agarwal <gopal.agarw...@gmail.com> wrote:
Sounds perfect. Hopefully one of the committers picks this up and adds it to 
4.7.

Will keep checking the updates...

On Fri, Jan 17, 2014 at 1:17 AM, Allison, Timothy B. <talli...@mitre.org> wrote:
And don't forget analysis! :)

The code is non-trivial, and it will take a generous committer to help me get 
it into shape for committing.  Once I push my mods to jira (end of next week), 
you should be able to compile it and run it at least for dev/testing to confirm 
that it meets your needs.

From: Gopal Agarwal [mailto:gopal.agarw...@gmail.com]
Sent: Thursday, January 16, 2014 1:21 PM
To: dev@lucene.apache.org
Subject: Re: Span Not Queries

Thanks Tim. This exactly fits my requirements of recursion, SpanNot and 
ComplexParser combination with Boolean Parser.

Since I would end up making the exact same changes to my QueryParserBase class, 
I would be locked into the current version of SOLR for the foreseeable future.

Can you comment on when it might be released if it gets reviewed by next 
week?


On Thu, Jan 16, 2014 at 11:06 PM, Allison, Timothy B. <talli...@mitre.org> wrote:
Apologies for the self-promotion...LUCENE-5205 and its Solr cousin (SOLR-5410) 
might help.  I'm hoping to post updates to both by the end of next week.  Then, 
if a committer would be willing to review and add these to Lucene/Solr, you 
should be good to go.

Take a look at the description for LUCENE-5205 and see if that capability will 
meet your needs.  Thank you.

  Best,

 Tim

From: Gopal Agarwal [mailto:gopal.agarw...@gmail.com]
Sent: Thursday, January 16, 2014 4:10 AM
To: dev@lucene.apache.org
Subject: Fwd: Span Not Queries

Please help me out with earlier query.

In short:
1. Can we change the QueryParser.jj file to identify the SpanNot query as a 
boolean clause?

2. Can we use ComplexPhraseQuery Parser to support SpanOR and SpanNOT queries 
also?

For further explanation, following are the examples.

On Tue, Oct 15, 2013 at 11:27 PM, Ankit Kumar <ankitthemight...@gmail.com> wrote:
*I have a business use case in which I need to use Span Not and
other ordered proximity queries. And they can be nested up to any level:
a Boolean inside an ordered query or an ordered query inside a Boolean.
Currently I am thinking of changing the QueryParser.jj file to identify
the SpanNot query and use Lucene's Complex Phrase Query Parser for parsing
complex queries. Can you suggest a better way of achieving this?*

*Following is the list of additions that I need to make in SOLR.*

*1. Span NOT Operator*

2. Adding Recursive and Range Proximity

  *Recursive Proximity* is a proximity query within a proximity query

Ex:income tax~5   statement ~4  The recursion can be up to any
level.

*Range Proximity*: Currently we can only define a single number as the range; we
want an interval as a range.

Ex: profit income~3,5,  United America~-5,4



3. Complex Queries

A complex query is a query formed with a combination of Boolean operators
or proximity queries or range queries or any possible combination of these.

Ex:(income AND tax) statement~4

   income tax~4  (statement OR period) ~3

  ( income SPAN NOT  income tax ) source ~3,5

 Can anyone suggest a way of achieving these 3 functionalities in SOLR?
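For the SPAN NOT part, the core semantics are simple to model: keep each match of the include query unless it overlaps a match of the exclude query. Below is a toy, position-based sketch of that idea for the income SPAN NOT income tax example; it is not Lucene's SpanNotQuery API (which can also take pre/post distances), and all names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class SpanNotSketch {
    /** Half-open token span [start, end). */
    static final class Span {
        final int start;
        final int end;
        Span(int start, int end) { this.start = start; this.end = end; }
    }

    /** All spans where the phrase occurs as consecutive tokens. */
    static List<Span> phraseSpans(String[] tokens, String... phrase) {
        List<Span> spans = new ArrayList<>();
        outer:
        for (int i = 0; i + phrase.length <= tokens.length; i++) {
            for (int j = 0; j < phrase.length; j++) {
                if (!tokens[i + j].equals(phrase[j])) continue outer;
            }
            spans.add(new Span(i, i + phrase.length));
        }
        return spans;
    }

    /** Keeps include spans that do not overlap any exclude span (no pre/post distance). */
    static List<Span> spanNot(List<Span> include, List<Span> exclude) {
        List<Span> kept = new ArrayList<>();
        for (Span in : include) {
            boolean overlaps = false;
            for (Span ex : exclude) {
                if (in.start < ex.end && ex.start < in.end) { overlaps = true; break; }
            }
            if (!overlaps) kept.add(in);
        }
        return kept;
    }
}
```

On the tokens income tax is due on gross income, only the second income (position 6) survives, because the first one sits inside the excluded income tax span. Nesting then amounts to letting either side be the output of another span operation.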


On Tue, Oct 15, 2013 at 10:15 PM, Jack Krupansky <j...@basetechnology.com> wrote:

 Nope. But the LucidWorks Search product query parser does support SpanNot
 if you use their BEFORE, AFTER, and NEAR span operators.

 See:
 http://docs.lucidworks.com/display/lweug/Proximity+Operations

 For example: George BEFORE:2 Bush NOT H to match George anything Bush,
 but not George H. W. Bush.

 What is your specific use case?

 -- Jack Krupansky

 -Original Message- From: Ankit Kumar
 Sent: Tuesday, October 15, 2013 3:58 AM
 To: solr-u...@lucene.apache.org
 Subject: Span Not Queries


 I need to add Span Not queries in Solr. There's a parser, the Surround Query
 Parser; I went through this (
 http://lucene.472066.n3.nabble.com/Surround-query-parser-not-working-td4075066.html
 )
 to discover that the surround query parser does not analyze text.

 Does DisMaxQueryParser support SpanNot queries?










dangers of limiting tokenizers/disabling assertions in MockTokenizer?

2013-11-01 Thread Allison, Timothy B.
All,
  I realize that we should be consuming all tokens from a stream.  I'd like to 
wrap a client's Analyzer with LimitTokenCountAnalyzer with consume=false. For 
the analyzers that I've used, this has caused no problems.  When I use 
MockTokenizer, I run into this assertion error: end() called before 
incrementToken().  The comment in MockTokenizer reads:

// some tokenizers, such as limiting tokenizers, call end() before incrementToken() returns false.
// these tests should disable this check (in general you should consume the entire stream)

 Disabling assertions gives me pause, as does disobeying the workflow 
(http://lucene.apache.org/core/4_5_1/core/index.html).  I assume from the 
warnings that there are Analyzers and use cases that will fail unless the 
stream is entirely consumed.

  Is there a safe way to wrap a client Analyzer and only read x number of 
tokens?  Should I allow the client to decide whether or not to consume?
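One way to think about the trade-off is with a toy model: the documented workflow calls end() only after incrementToken() has returned false, so a limiter either drains the rest of the stream (paying to analyze it) or breaks the contract that MockTokenizer checks. The sketch below models that with hypothetical ToyTokenStream and LimitedReader classes; it is not Lucene's TokenStream API, and the consumeAll flag only mirrors the spirit of LimitTokenCountAnalyzer's consume option.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Toy stand-in for a token stream; NOT Lucene's TokenStream API. */
final class ToyTokenStream {
    private final Iterator<String> tokens;
    private boolean exhausted = false;
    private String current;

    ToyTokenStream(List<String> tokens) { this.tokens = tokens.iterator(); }

    boolean incrementToken() {
        if (tokens.hasNext()) { current = tokens.next(); return true; }
        exhausted = true;
        return false;
    }

    String term() { return current; }

    /** Mirrors the MockTokenizer-style check: end() before exhaustion is a workflow violation. */
    void end() {
        if (!exhausted) {
            throw new IllegalStateException("end() called before incrementToken() returned false");
        }
    }
}

final class LimitedReader {
    /** Reads at most maxTokens; if consumeAll, drains the remainder so that end() is legal. */
    static List<String> read(ToyTokenStream ts, int maxTokens, boolean consumeAll) {
        List<String> out = new ArrayList<>();
        while (out.size() < maxTokens && ts.incrementToken()) {
            out.add(ts.term());
        }
        if (consumeAll) {
            while (ts.incrementToken()) {
                // discard: only exhausting the stream matters here
            }
        }
        ts.end();
        return out;
    }
}
```

With consumeAll set, reading 2 of a, b, c returns [a, b] and ends cleanly; without it, end() fires while c is still pending, which is exactly the case the MockTokenizer assertion is there to catch. A stream that is shorter than the limit is fine either way.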

  Thank you!

 Best,

  Tim