from:"Modassar Ather \(JIRA\)"

[jira] [Commented] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2014-03-06 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923586#comment-13923586
 ] 

Modassar Ather commented on LUCENE-5205:


Hi,

Phrases with stop words in them are not getting searched whereas a phrase 
without it gets searched using SpanQueryParser/ComplexPhraseQueryParser.
E.g. calculator for evaluating is not showing any result for an indexed 
document whereas a phrase without stop word like evaluating mathematical is 
getting searched.

The similar search works fine with classic parser which uses PhraseQuery as it 
includes the position of terms for all the terms including stop words in the 
query created.
Kindly provide your inputs.

Thanks,
Modassar

 [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to 
 classic QueryParser
 ---

 Key: LUCENE-5205
 URL: https://issues.apache.org/jira/browse/LUCENE-5205
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Tim Allison
  Labels: patch
 Fix For: 4.7

 Attachments: LUCENE-5205-cleanup-tests.patch, 
 LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
 LUCENE-5205_dateTestReInitPkgPrvt.patch, LUCENE-5205_smallTestMods.patch, 
 LUCENE_5205.patch, SpanQueryParser_v1.patch.gz, patch.txt


 This parser extends QueryParserBase and includes functionality from:
 * Classic QueryParser: most of its syntax
 * SurroundQueryParser: recursive parsing for near and not clauses.
 * ComplexPhraseQueryParser: can handle near queries that include multiterms 
 (wildcard, fuzzy, regex, prefix),
 * AnalyzingQueryParser: has an option to analyze multiterms.
 At a high level, there's a first pass BooleanQuery/field parser and then a 
 span query parser handles all terminal nodes and phrases.
 Same as classic syntax:
 * term: test 
 * fuzzy: roam~0.8, roam~2
 * wildcard: te?t, test*, t*st
 * regex: /\[mb\]oat/
 * phrase: jakarta apache
 * phrase with slop: jakarta apache~3
 * default or clause: jakarta apache
 * grouping or clause: (jakarta apache)
 * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
 * multiple fields: title:lucene author:hatcher
  
 Main additions in SpanQueryParser syntax vs. classic syntax:
 * Can require in order for phrases with slop with the \~ operator: 
 jakarta apache\~3
 * Can specify not near: fever bieber!\~3,10 ::
 find fever but not if bieber appears within 3 words before or 10 
 words after it.
 * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
 apache\]~3 lucene\]\~4 :: 
 find jakarta within 3 words of apache, and that hit has to be within 
 four words before lucene
 * Can also use \[\] for single level phrasal queries instead of  as in: 
 \[jakarta apache\]
 * Can use or grouping clauses in phrasal queries: apache (lucene solr)\~3 
 :: find apache and then either lucene or solr within three words.
 * Can use multiterms in phrasal queries: jakarta\~1 ap*che\~2
 * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
 /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like jakarta within two 
 words of ap*che and that hit has to be within ten words of something like 
 solr or that lucene regex.
 * Can require at least x number of hits at boolean level: apache AND (lucene 
 solr tika)~2
 * Can use negative only query: -jakarta :: Find all docs that don't contain 
 jakarta
 * Can use an edit distance  2 for fuzzy query via SlowFuzzyQuery (beware of 
 potential performance issues!).
 Trivial additions:
 * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
 prefix =2)
 * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
 =2: (jakarta~1 (OSA) vs jakarta~1(Levenshtein)
 This parser can be very useful for concordance tasks (see also LUCENE-5317 
 and LUCENE-5318) and for analytical search.  
 Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
 Most of the documentation is in the javadoc for SpanQueryParser.
 Any and all feedback is welcome.  Thank you.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Created] (SOLR-6156) Exception while using group with timeAllowed on SolrCloud.

2014-06-10 Thread Modassar Ather (JIRA)

Modassar Ather created SOLR-6156:


 Summary: Exception while using group with timeAllowed on SolrCloud.
 Key: SOLR-6156
 URL: https://issues.apache.org/jira/browse/SOLR-6156
 Project: Solr
  Issue Type: Bug
Reporter: Modassar Ather


Following exception is thrown when using grouping with timeAllowed. Solr 
version used is 4.8.0.
SEVERE: 
null:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: 
java.lang.NullPointerException
at 
org.apache.lucene.search.TimeLimitingCollector.setNextReader(TimeLimitingCollector.java:158)
at 
org.apache.lucene.search.MultiCollector.setNextReader(MultiCollector.java:113)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:612)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
at 
org.apache.solr.search.grouping.CommandHandler.searchWithTimeLimiter(CommandHandler.java:219)
at 
org.apache.solr.search.grouping.CommandHandler.execute(CommandHandler.java:156)
at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:338)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-6156) Exception while using group with timeAllowed on SolrCloud.

2014-06-10 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-6156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026297#comment-14026297
 ] 

Modassar Ather commented on SOLR-6156:
--

Hi [~cpoerschke]

With timeAllowed following query is used without rows parameter which caused 
exception as in description of this ticket.
group=truegroup.field=FIELD_NAME

 Exception while using group with timeAllowed on SolrCloud.
 --

 Key: SOLR-6156
 URL: https://issues.apache.org/jira/browse/SOLR-6156
 Project: Solr
  Issue Type: Bug
Reporter: Modassar Ather

 Following exception is thrown when using grouping with timeAllowed. Solr 
 version used is 4.8.0.
 SEVERE: 
 null:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: 
 java.lang.NullPointerException
 at 
 org.apache.lucene.search.TimeLimitingCollector.setNextReader(TimeLimitingCollector.java:158)
 at 
 org.apache.lucene.search.MultiCollector.setNextReader(MultiCollector.java:113)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:612)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
 at 
 org.apache.solr.search.grouping.CommandHandler.searchWithTimeLimiter(CommandHandler.java:219)
 at 
 org.apache.solr.search.grouping.CommandHandler.execute(CommandHandler.java:156)
 at 
 org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:338)
 at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2014-11-18 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216113#comment-14216113
 ] 

Modassar Ather commented on LUCENE-5205:


I am trying following queries and facing an issue for which need your 
suggestions. The environment is 4 shard cluster with embedded zookeeper on one 
of them.

q=field:(SEARCH TOOLS PROVIDER  CONSULTING COMPANY) Gets transformed to 
following:
+spanNear([field:search, field:tools, field:provider, field:, field:consulting, 
field:company], 0, true)

field:(SEARCH TOOL'S PROVIDER'S AND CONSULTING COMPANY) Gets transformed to 
following:
+spanNear([field:search, spanNear([field:s, field:provider], 0, true), field:s, 
field:and, field:consulting, field:company], 0, true)

field:(SEARCH TOOL'S SOLUTION PROVIDER TECHNOLOGY CO., LTD.) Gets stuck and 
does not return. We have set query timeAllowed to 5 minutes but it seems that 
it is not reaching here and continues.
During debug I found that it gets stuck at m.find(), Line 154 of SpanQueryLexer 
after it has created token for double quotes and term SEARCH.

Whereas the above query without (') gets transformed to following
field:(SEARCH TOOLS SOLUTION PROVIDER TECHNOLOGY CO., LTD.) = 
+spanNear([field:search, field:tools, field:solution, field:provider, 
field:technology, field:co, field:ltd], 0, true)

Need your help in understanding if I am not using the query properly or it can 
be an issue.

 [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to 
 classic QueryParser
 ---

 Key: LUCENE-5205
 URL: https://issues.apache.org/jira/browse/LUCENE-5205
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Tim Allison
  Labels: patch
 Fix For: 4.9

 Attachments: LUCENE-5205-cleanup-tests.patch, 
 LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
 LUCENE-5205_dateTestReInitPkgPrvt.patch, 
 LUCENE-5205_improve_stop_word_handling.patch, 
 LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
 SpanQueryParser_v1.patch.gz, patch.txt


 This parser extends QueryParserBase and includes functionality from:
 * Classic QueryParser: most of its syntax
 * SurroundQueryParser: recursive parsing for near and not clauses.
 * ComplexPhraseQueryParser: can handle near queries that include multiterms 
 (wildcard, fuzzy, regex, prefix),
 * AnalyzingQueryParser: has an option to analyze multiterms.
 At a high level, there's a first pass BooleanQuery/field parser and then a 
 span query parser handles all terminal nodes and phrases.
 Same as classic syntax:
 * term: test 
 * fuzzy: roam~0.8, roam~2
 * wildcard: te?t, test*, t*st
 * regex: /\[mb\]oat/
 * phrase: jakarta apache
 * phrase with slop: jakarta apache~3
 * default or clause: jakarta apache
 * grouping or clause: (jakarta apache)
 * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
 * multiple fields: title:lucene author:hatcher
  
 Main additions in SpanQueryParser syntax vs. classic syntax:
 * Can require in order for phrases with slop with the \~ operator: 
 jakarta apache\~3
 * Can specify not near: fever bieber!\~3,10 ::
 find fever but not if bieber appears within 3 words before or 10 
 words after it.
 * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
 apache\]~3 lucene\]\~4 :: 
 find jakarta within 3 words of apache, and that hit has to be within 
 four words before lucene
 * Can also use \[\] for single level phrasal queries instead of  as in: 
 \[jakarta apache\]
 * Can use or grouping clauses in phrasal queries: apache (lucene solr)\~3 
 :: find apache and then either lucene or solr within three words.
 * Can use multiterms in phrasal queries: jakarta\~1 ap*che\~2
 * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
 /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like jakarta within two 
 words of ap*che and that hit has to be within ten words of something like 
 solr or that lucene regex.
 * Can require at least x number of hits at boolean level: apache AND (lucene 
 solr tika)~2
 * Can use negative only query: -jakarta :: Find all docs that don't contain 
 jakarta
 * Can use an edit distance  2 for fuzzy query via SlowFuzzyQuery (beware of 
 potential performance issues!).
 Trivial additions:
 * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
 prefix =2)
 * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
 =2: (jakarta~1 (OSA) vs jakarta~1(Levenshtein)
 This parser can be very useful for concordance tasks (see also LUCENE-5317 
 and LUCENE-5318) and for analytical search.  
 Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
 Most of the

[jira] [Comment Edited] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2014-11-18 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216113#comment-14216113
 ] 

Modassar Ather edited comment on LUCENE-5205 at 11/18/14 11:59 AM:
---

I am trying following queries and facing an issue for which need your 
suggestions. The environment is 4 shard cluster with embedded zookeeper on one 
of them.

q=field: (SEARCH TOOLS PROVIDER  CONSULTING COMPANY) Gets transformed to 
following:
+spanNear([field:search, field:tools, field:provider, field:, field:consulting, 
field:company], 0, true)

field: (SEARCH TOOL'S PROVIDER'S AND CONSULTING COMPANY) Gets transformed to 
following:
+spanNear([field:search, spanNear([field:s, field:provider], 0, true), field:s, 
field:and, field:consulting, field:company], 0, true)

field: (SEARCH TOOL'S SOLUTION PROVIDER TECHNOLOGY CO., LTD.) Gets stuck and 
does not return. We have set query timeAllowed to 5 minutes but it seems that 
it is not reaching here and continues.
During debug I found that it gets stuck at m.find(), Line 154 of SpanQueryLexer 
after it has created token for double quotes and term SEARCH.

Whereas the above query without (') gets transformed to following
field: (SEARCH TOOLS SOLUTION PROVIDER TECHNOLOGY CO., LTD.) = 
+spanNear([field:search, field:tools, field:solution, field:provider, 
field:technology, field:co, field:ltd], 0, true)

Need your help in understanding if I am not using the query properly or it can 
be an issue.


was (Author: modassar):
I am trying following queries and facing an issue for which need your 
suggestions. The environment is 4 shard cluster with embedded zookeeper on one 
of them.

q=field:(SEARCH TOOLS PROVIDER  CONSULTING COMPANY) Gets transformed to 
following:
+spanNear([field:search, field:tools, field:provider, field:, field:consulting, 
field:company], 0, true)

field:(SEARCH TOOL'S PROVIDER'S AND CONSULTING COMPANY) Gets transformed to 
following:
+spanNear([field:search, spanNear([field:s, field:provider], 0, true), field:s, 
field:and, field:consulting, field:company], 0, true)

field:(SEARCH TOOL'S SOLUTION PROVIDER TECHNOLOGY CO., LTD.) Gets stuck and 
does not return. We have set query timeAllowed to 5 minutes but it seems that 
it is not reaching here and continues.
During debug I found that it gets stuck at m.find(), Line 154 of SpanQueryLexer 
after it has created token for double quotes and term SEARCH.

Whereas the above query without (') gets transformed to following
field:(SEARCH TOOLS SOLUTION PROVIDER TECHNOLOGY CO., LTD.) = 
+spanNear([field:search, field:tools, field:solution, field:provider, 
field:technology, field:co, field:ltd], 0, true)

Need your help in understanding if I am not using the query properly or it can 
be an issue.

 [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to 
 classic QueryParser
 ---

 Key: LUCENE-5205
 URL: https://issues.apache.org/jira/browse/LUCENE-5205
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Tim Allison
  Labels: patch
 Fix For: 4.9

 Attachments: LUCENE-5205-cleanup-tests.patch, 
 LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
 LUCENE-5205_dateTestReInitPkgPrvt.patch, 
 LUCENE-5205_improve_stop_word_handling.patch, 
 LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
 SpanQueryParser_v1.patch.gz, patch.txt


 This parser extends QueryParserBase and includes functionality from:
 * Classic QueryParser: most of its syntax
 * SurroundQueryParser: recursive parsing for near and not clauses.
 * ComplexPhraseQueryParser: can handle near queries that include multiterms 
 (wildcard, fuzzy, regex, prefix),
 * AnalyzingQueryParser: has an option to analyze multiterms.
 At a high level, there's a first pass BooleanQuery/field parser and then a 
 span query parser handles all terminal nodes and phrases.
 Same as classic syntax:
 * term: test 
 * fuzzy: roam~0.8, roam~2
 * wildcard: te?t, test*, t*st
 * regex: /\[mb\]oat/
 * phrase: jakarta apache
 * phrase with slop: jakarta apache~3
 * default or clause: jakarta apache
 * grouping or clause: (jakarta apache)
 * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
 * multiple fields: title:lucene author:hatcher
  
 Main additions in SpanQueryParser syntax vs. classic syntax:
 * Can require in order for phrases with slop with the \~ operator: 
 jakarta apache\~3
 * Can specify not near: fever bieber!\~3,10 ::
 find fever but not if bieber appears within 3 words before or 10 
 words after it.
 * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
 apache\]~3 lucene\]\~4 :: 
 find jakarta within 3 words of apache, and

[jira] [Comment Edited] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2014-11-18 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216113#comment-14216113
 ] 

Modassar Ather edited comment on LUCENE-5205 at 11/18/14 12:01 PM:
---

I am trying following queries and facing an issue for which need your 
suggestions. The environment is 4 shard cluster with embedded zookeeper on one 
of them.

q=field: (SEARCH TOOLS PROVIDER  CONSULTING COMPANY) Gets transformed to 
following:
+spanNear([field:search, field:tools, field:provider, field:, field:consulting, 
field:company], 0, true)

field: (SEARCH TOOL'S PROVIDER'S AND CONSULTING COMPANY) Gets transformed to 
following:
+spanNear([field:search, spanNear([field:s, field:provider], 0, true), field:s, 
field:and, field:consulting, field:company], 0, true)

field: (SEARCH TOOL'S SOLUTION PROVIDER TECHNOLOGY CO., LTD.) Gets stuck and 
does not return. We have set query timeAllowed to 5 minutes but it seems that 
it is not reaching here and continues.
During debug I found that it gets stuck at m.find(), Line 154 of SpanQueryLexer 
after it has created token for double quotes and term SEARCH.

Whereas the above query without (') gets transformed to following
field: (SEARCH TOOLS SOLUTION PROVIDER TECHNOLOGY CO., LTD.) = 
+spanNear([field:search, field:tools, field:solution, field:provider, 
field:technology, field:co, field:ltd], 0, true)

Need your help in understanding if I am not using the query properly or it can 
be an issue.
NOTE: A space between the field: and query is added to avoid transformation to 
smileys.


was (Author: modassar):
I am trying following queries and facing an issue for which need your 
suggestions. The environment is 4 shard cluster with embedded zookeeper on one 
of them.

q=field: (SEARCH TOOLS PROVIDER  CONSULTING COMPANY) Gets transformed to 
following:
+spanNear([field:search, field:tools, field:provider, field:, field:consulting, 
field:company], 0, true)

field: (SEARCH TOOL'S PROVIDER'S AND CONSULTING COMPANY) Gets transformed to 
following:
+spanNear([field:search, spanNear([field:s, field:provider], 0, true), field:s, 
field:and, field:consulting, field:company], 0, true)

field: (SEARCH TOOL'S SOLUTION PROVIDER TECHNOLOGY CO., LTD.) Gets stuck and 
does not return. We have set query timeAllowed to 5 minutes but it seems that 
it is not reaching here and continues.
During debug I found that it gets stuck at m.find(), Line 154 of SpanQueryLexer 
after it has created token for double quotes and term SEARCH.

Whereas the above query without (') gets transformed to following
field: (SEARCH TOOLS SOLUTION PROVIDER TECHNOLOGY CO., LTD.) = 
+spanNear([field:search, field:tools, field:solution, field:provider, 
field:technology, field:co, field:ltd], 0, true)

Need your help in understanding if I am not using the query properly or it can 
be an issue.

 [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to 
 classic QueryParser
 ---

 Key: LUCENE-5205
 URL: https://issues.apache.org/jira/browse/LUCENE-5205
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Tim Allison
  Labels: patch
 Fix For: 4.9

 Attachments: LUCENE-5205-cleanup-tests.patch, 
 LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
 LUCENE-5205_dateTestReInitPkgPrvt.patch, 
 LUCENE-5205_improve_stop_word_handling.patch, 
 LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
 SpanQueryParser_v1.patch.gz, patch.txt


 This parser extends QueryParserBase and includes functionality from:
 * Classic QueryParser: most of its syntax
 * SurroundQueryParser: recursive parsing for near and not clauses.
 * ComplexPhraseQueryParser: can handle near queries that include multiterms 
 (wildcard, fuzzy, regex, prefix),
 * AnalyzingQueryParser: has an option to analyze multiterms.
 At a high level, there's a first pass BooleanQuery/field parser and then a 
 span query parser handles all terminal nodes and phrases.
 Same as classic syntax:
 * term: test 
 * fuzzy: roam~0.8, roam~2
 * wildcard: te?t, test*, t*st
 * regex: /\[mb\]oat/
 * phrase: jakarta apache
 * phrase with slop: jakarta apache~3
 * default or clause: jakarta apache
 * grouping or clause: (jakarta apache)
 * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
 * multiple fields: title:lucene author:hatcher
  
 Main additions in SpanQueryParser syntax vs. classic syntax:
 * Can require in order for phrases with slop with the \~ operator: 
 jakarta apache\~3
 * Can specify not near: fever bieber!\~3,10 ::
 find fever but not if bieber appears within 3 words before or 10 
 words after it.
 * Fully recursive phrasal queries with \[ and \]; as in:

[jira] [Commented] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2014-11-18 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14216353#comment-14216353
 ] 

Modassar Ather commented on LUCENE-5205:


Thanks [~talli...@apache.org] for your response. I am using it from lucene5205 
branch(http://svn.apache.org/repos/asf/lucene/dev/branches/lucene5205/) 
integrated as patch to latest Lucene core jar.


 [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to 
 classic QueryParser
 ---

 Key: LUCENE-5205
 URL: https://issues.apache.org/jira/browse/LUCENE-5205
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Tim Allison
  Labels: patch
 Fix For: 4.9

 Attachments: LUCENE-5205-cleanup-tests.patch, 
 LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
 LUCENE-5205_dateTestReInitPkgPrvt.patch, 
 LUCENE-5205_improve_stop_word_handling.patch, 
 LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
 SpanQueryParser_v1.patch.gz, patch.txt


 This parser extends QueryParserBase and includes functionality from:
 * Classic QueryParser: most of its syntax
 * SurroundQueryParser: recursive parsing for near and not clauses.
 * ComplexPhraseQueryParser: can handle near queries that include multiterms 
 (wildcard, fuzzy, regex, prefix),
 * AnalyzingQueryParser: has an option to analyze multiterms.
 At a high level, there's a first pass BooleanQuery/field parser and then a 
 span query parser handles all terminal nodes and phrases.
 Same as classic syntax:
 * term: test 
 * fuzzy: roam~0.8, roam~2
 * wildcard: te?t, test*, t*st
 * regex: /\[mb\]oat/
 * phrase: jakarta apache
 * phrase with slop: jakarta apache~3
 * default or clause: jakarta apache
 * grouping or clause: (jakarta apache)
 * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
 * multiple fields: title:lucene author:hatcher
  
 Main additions in SpanQueryParser syntax vs. classic syntax:
 * Can require in order for phrases with slop with the \~ operator: 
 jakarta apache\~3
 * Can specify not near: fever bieber!\~3,10 ::
 find fever but not if bieber appears within 3 words before or 10 
 words after it.
 * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
 apache\]~3 lucene\]\~4 :: 
 find jakarta within 3 words of apache, and that hit has to be within 
 four words before lucene
 * Can also use \[\] for single level phrasal queries instead of  as in: 
 \[jakarta apache\]
 * Can use or grouping clauses in phrasal queries: apache (lucene solr)\~3 
 :: find apache and then either lucene or solr within three words.
 * Can use multiterms in phrasal queries: jakarta\~1 ap*che\~2
 * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
 /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like jakarta within two 
 words of ap*che and that hit has to be within ten words of something like 
 solr or that lucene regex.
 * Can require at least x number of hits at boolean level: apache AND (lucene 
 solr tika)~2
 * Can use negative only query: -jakarta :: Find all docs that don't contain 
 jakarta
 * Can use an edit distance  2 for fuzzy query via SlowFuzzyQuery (beware of 
 potential performance issues!).
 Trivial additions:
 * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
 prefix =2)
 * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
 =2: (jakarta~1 (OSA) vs jakarta~1(Levenshtein)
 This parser can be very useful for concordance tasks (see also LUCENE-5317 
 and LUCENE-5318) and for analytical search.  
 Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
 Most of the documentation is in the javadoc for SpanQueryParser.
 Any and all feedback is welcome.  Thank you.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2014-11-19 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217708#comment-14217708
 ] 

Modassar Ather commented on LUCENE-5205:


Thanks [~talli...@apache.org] for your response. I am using the SpanQuryParser 
and fix for query hanging issue from your github site as provided in your 
comment.
I tried using WhiteSpaceTokenizer. Will check with StandardAnalyzer too.

With WhiteSpaceTokenizer:
q=field: (SEARCH TOOLS PROVIDER  CONSULTING COMPANY) still gets transformed 
to following:
+spanNear([field:search, field:tools, field:provider, field:, field:consulting, 
field:company], 0, true)

 [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to 
 classic QueryParser
 ---

 Key: LUCENE-5205
 URL: https://issues.apache.org/jira/browse/LUCENE-5205
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Tim Allison
  Labels: patch
 Fix For: 4.9

 Attachments: LUCENE-5205-cleanup-tests.patch, 
 LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
 LUCENE-5205_dateTestReInitPkgPrvt.patch, 
 LUCENE-5205_improve_stop_word_handling.patch, 
 LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
 SpanQueryParser_v1.patch.gz, patch.txt


 This parser extends QueryParserBase and includes functionality from:
 * Classic QueryParser: most of its syntax
 * SurroundQueryParser: recursive parsing for near and not clauses.
 * ComplexPhraseQueryParser: can handle near queries that include multiterms 
 (wildcard, fuzzy, regex, prefix),
 * AnalyzingQueryParser: has an option to analyze multiterms.
 At a high level, there's a first pass BooleanQuery/field parser and then a 
 span query parser handles all terminal nodes and phrases.
 Same as classic syntax:
 * term: test 
 * fuzzy: roam~0.8, roam~2
 * wildcard: te?t, test*, t*st
 * regex: /\[mb\]oat/
 * phrase: jakarta apache
 * phrase with slop: jakarta apache~3
 * default or clause: jakarta apache
 * grouping or clause: (jakarta apache)
 * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
 * multiple fields: title:lucene author:hatcher
  
 Main additions in SpanQueryParser syntax vs. classic syntax:
 * Can require in order for phrases with slop with the \~ operator: 
 jakarta apache\~3
 * Can specify not near: fever bieber!\~3,10 ::
 find fever but not if bieber appears within 3 words before or 10 
 words after it.
 * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
 apache\]~3 lucene\]\~4 :: 
 find jakarta within 3 words of apache, and that hit has to be within 
 four words before lucene
 * Can also use \[\] for single level phrasal queries instead of  as in: 
 \[jakarta apache\]
 * Can use or grouping clauses in phrasal queries: apache (lucene solr)\~3 
 :: find apache and then either lucene or solr within three words.
 * Can use multiterms in phrasal queries: jakarta\~1 ap*che\~2
 * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
 /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like jakarta within two 
 words of ap*che and that hit has to be within ten words of something like 
 solr or that lucene regex.
 * Can require at least x number of hits at boolean level: apache AND (lucene 
 solr tika)~2
 * Can use negative only query: -jakarta :: Find all docs that don't contain 
 jakarta
 * Can use an edit distance  2 for fuzzy query via SlowFuzzyQuery (beware of 
 potential performance issues!).
 Trivial additions:
 * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
 prefix =2)
 * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
 =2: (jakarta~1 (OSA) vs jakarta~1(Levenshtein)
 This parser can be very useful for concordance tasks (see also LUCENE-5317 
 and LUCENE-5318) and for analytical search.  
 Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
 Most of the documentation is in the javadoc for SpanQueryParser.
 Any and all feedback is welcome.  Thank you.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Comment Edited] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2014-11-19 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217708#comment-14217708
 ] 

Modassar Ather edited comment on LUCENE-5205 at 11/19/14 10:45 AM:
---

Thanks [~talli...@apache.org] for your response. I am using the SpanQuryParser 
and fix for query hanging issue from your github site as provided in your 
comment.
I am using WhiteSpaceTokenizer.

With WhiteSpaceTokenizer:
q=field: (SEARCH TOOLS PROVIDER  CONSULTING COMPANY) still gets transformed 
to following:
+spanNear([field:search, field:tools, field:provider, field:, field:consulting, 
field:company], 0, true)

I am trying to find the possible cause of the removal of '' in my config.


was (Author: modassar):
Thanks [~talli...@apache.org] for your response. I am using the SpanQuryParser 
and fix for query hanging issue from your github site as provided in your 
comment.
I tried using WhiteSpaceTokenizer. Will check with StandardAnalyzer too.

With WhiteSpaceTokenizer:
q=field: (SEARCH TOOLS PROVIDER  CONSULTING COMPANY) still gets transformed 
to following:
+spanNear([field:search, field:tools, field:provider, field:, field:consulting, 
field:company], 0, true)

 [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to 
 classic QueryParser
 ---

 Key: LUCENE-5205
 URL: https://issues.apache.org/jira/browse/LUCENE-5205
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Tim Allison
  Labels: patch
 Fix For: 4.9

 Attachments: LUCENE-5205-cleanup-tests.patch, 
 LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
 LUCENE-5205_dateTestReInitPkgPrvt.patch, 
 LUCENE-5205_improve_stop_word_handling.patch, 
 LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
 SpanQueryParser_v1.patch.gz, patch.txt


 This parser extends QueryParserBase and includes functionality from:
 * Classic QueryParser: most of its syntax
 * SurroundQueryParser: recursive parsing for near and not clauses.
 * ComplexPhraseQueryParser: can handle near queries that include multiterms 
 (wildcard, fuzzy, regex, prefix),
 * AnalyzingQueryParser: has an option to analyze multiterms.
 At a high level, there's a first pass BooleanQuery/field parser and then a 
 span query parser handles all terminal nodes and phrases.
 Same as classic syntax:
 * term: test 
 * fuzzy: roam~0.8, roam~2
 * wildcard: te?t, test*, t*st
 * regex: /\[mb\]oat/
 * phrase: jakarta apache
 * phrase with slop: jakarta apache~3
 * default or clause: jakarta apache
 * grouping or clause: (jakarta apache)
 * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
 * multiple fields: title:lucene author:hatcher
  
 Main additions in SpanQueryParser syntax vs. classic syntax:
 * Can require in order for phrases with slop with the \~ operator: 
 jakarta apache\~3
 * Can specify not near: fever bieber!\~3,10 ::
 find fever but not if bieber appears within 3 words before or 10 
 words after it.
 * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
 apache\]~3 lucene\]\~4 :: 
 find jakarta within 3 words of apache, and that hit has to be within 
 four words before lucene
 * Can also use \[\] for single level phrasal queries instead of  as in: 
 \[jakarta apache\]
 * Can use or grouping clauses in phrasal queries: apache (lucene solr)\~3 
 :: find apache and then either lucene or solr within three words.
 * Can use multiterms in phrasal queries: jakarta\~1 ap*che\~2
 * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
 /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like jakarta within two 
 words of ap*che and that hit has to be within ten words of something like 
 solr or that lucene regex.
 * Can require at least x number of hits at boolean level: apache AND (lucene 
 solr tika)~2
 * Can use negative only query: -jakarta :: Find all docs that don't contain 
 jakarta
 * Can use an edit distance  2 for fuzzy query via SlowFuzzyQuery (beware of 
 potential performance issues!).
 Trivial additions:
 * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
 prefix =2)
 * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
 =2: (jakarta~1 (OSA) vs jakarta~1(Levenshtein)
 This parser can be very useful for concordance tasks (see also LUCENE-5317 
 and LUCENE-5318) and for analytical search.  
 Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
 Most of the documentation is in the javadoc for SpanQueryParser.
 Any and all feedback is welcome.  Thank you.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2014-11-19 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217817#comment-14217817
 ] 

Modassar Ather commented on LUCENE-5205:


It is solr.PatternReplaceFilterFactory in my analyzer chain which is replacing 
 with blank. Thanks for sharing the above details.

 [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to 
 classic QueryParser
 ---

 Key: LUCENE-5205
 URL: https://issues.apache.org/jira/browse/LUCENE-5205
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Tim Allison
  Labels: patch
 Fix For: 4.9

 Attachments: LUCENE-5205-cleanup-tests.patch, 
 LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
 LUCENE-5205_dateTestReInitPkgPrvt.patch, 
 LUCENE-5205_improve_stop_word_handling.patch, 
 LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
 SpanQueryParser_v1.patch.gz, patch.txt


 This parser extends QueryParserBase and includes functionality from:
 * Classic QueryParser: most of its syntax
 * SurroundQueryParser: recursive parsing for near and not clauses.
 * ComplexPhraseQueryParser: can handle near queries that include multiterms 
 (wildcard, fuzzy, regex, prefix),
 * AnalyzingQueryParser: has an option to analyze multiterms.
 At a high level, there's a first pass BooleanQuery/field parser and then a 
 span query parser handles all terminal nodes and phrases.
 Same as classic syntax:
 * term: test 
 * fuzzy: roam~0.8, roam~2
 * wildcard: te?t, test*, t*st
 * regex: /\[mb\]oat/
 * phrase: jakarta apache
 * phrase with slop: jakarta apache~3
 * default or clause: jakarta apache
 * grouping or clause: (jakarta apache)
 * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
 * multiple fields: title:lucene author:hatcher
  
 Main additions in SpanQueryParser syntax vs. classic syntax:
 * Can require in order for phrases with slop with the \~ operator: 
 jakarta apache\~3
 * Can specify not near: fever bieber!\~3,10 ::
 find fever but not if bieber appears within 3 words before or 10 
 words after it.
 * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
 apache\]~3 lucene\]\~4 :: 
 find jakarta within 3 words of apache, and that hit has to be within 
 four words before lucene
 * Can also use \[\] for single level phrasal queries instead of  as in: 
 \[jakarta apache\]
 * Can use or grouping clauses in phrasal queries: apache (lucene solr)\~3 
 :: find apache and then either lucene or solr within three words.
 * Can use multiterms in phrasal queries: jakarta\~1 ap*che\~2
 * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
 /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like jakarta within two 
 words of ap*che and that hit has to be within ten words of something like 
 solr or that lucene regex.
 * Can require at least x number of hits at boolean level: apache AND (lucene 
 solr tika)~2
 * Can use negative only query: -jakarta :: Find all docs that don't contain 
 jakarta
 * Can use an edit distance  2 for fuzzy query via SlowFuzzyQuery (beware of 
 potential performance issues!).
 Trivial additions:
 * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
 prefix =2)
 * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
 =2: (jakarta~1 (OSA) vs jakarta~1(Levenshtein)
 This parser can be very useful for concordance tasks (see also LUCENE-5317 
 and LUCENE-5318) and for analytical search.  
 Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
 Most of the documentation is in the javadoc for SpanQueryParser.
 Any and all feedback is welcome.  Thank you.
 Until this is added to the Lucene project, I've added a standalone 
 lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2014-11-20 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220574#comment-14220574
 ] 

Modassar Ather commented on LUCENE-5205:


Sorry for replying little late [~talli...@apache.org].
bq. Ah, ok, so to confirm, no further action is required from me on the  issue?
As of now I see no issue with usage of  in the query as it is getting removed 
in my analyzer's chain.

bq. Are you ok with single quotes becoming operators? Can you see a way of 
improving that behavior?
We have been using double quotes with square brackets for phrase and nested 
phrase queries respectively. Have not used single quotes for the same.

 [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to 
 classic QueryParser
 ---

 Key: LUCENE-5205
 URL: https://issues.apache.org/jira/browse/LUCENE-5205
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Tim Allison
  Labels: patch
 Attachments: LUCENE-5205-cleanup-tests.patch, 
 LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
 LUCENE-5205_dateTestReInitPkgPrvt.patch, 
 LUCENE-5205_improve_stop_word_handling.patch, 
 LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
 SpanQueryParser_v1.patch.gz, patch.txt


 This parser extends QueryParserBase and includes functionality from:
 * Classic QueryParser: most of its syntax
 * SurroundQueryParser: recursive parsing for near and not clauses.
 * ComplexPhraseQueryParser: can handle near queries that include multiterms 
 (wildcard, fuzzy, regex, prefix),
 * AnalyzingQueryParser: has an option to analyze multiterms.
 At a high level, there's a first pass BooleanQuery/field parser and then a 
 span query parser handles all terminal nodes and phrases.
 Same as classic syntax:
 * term: test 
 * fuzzy: roam~0.8, roam~2
 * wildcard: te?t, test*, t*st
 * regex: /\[mb\]oat/
 * phrase: jakarta apache
 * phrase with slop: jakarta apache~3
 * default or clause: jakarta apache
 * grouping or clause: (jakarta apache)
 * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
 * multiple fields: title:lucene author:hatcher
  
 Main additions in SpanQueryParser syntax vs. classic syntax:
 * Can require in order for phrases with slop with the \~ operator: 
 jakarta apache\~3
 * Can specify not near: fever bieber!\~3,10 ::
 find fever but not if bieber appears within 3 words before or 10 
 words after it.
 * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
 apache\]~3 lucene\]\~4 :: 
 find jakarta within 3 words of apache, and that hit has to be within 
 four words before lucene
 * Can also use \[\] for single level phrasal queries instead of  as in: 
 \[jakarta apache\]
 * Can use or grouping clauses in phrasal queries: apache (lucene solr)\~3 
 :: find apache and then either lucene or solr within three words.
 * Can use multiterms in phrasal queries: jakarta\~1 ap*che\~2
 * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
 /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like jakarta within two 
 words of ap*che and that hit has to be within ten words of something like 
 solr or that lucene regex.
 * Can require at least x number of hits at boolean level: apache AND (lucene 
 solr tika)~2
 * Can use negative only query: -jakarta :: Find all docs that don't contain 
 jakarta
 * Can use an edit distance  2 for fuzzy query via SlowFuzzyQuery (beware of 
 potential performance issues!).
 Trivial additions:
 * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
 prefix =2)
 * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
 =2: (jakarta~1 (OSA) vs jakarta~1(Levenshtein)
 This parser can be very useful for concordance tasks (see also LUCENE-5317 
 and LUCENE-5318) and for analytical search.  
 Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
 Most of the documentation is in the javadoc for SpanQueryParser.
 Any and all feedback is welcome.  Thank you.
 Until this is added to the Lucene project, I've added a standalone 
 lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2014-12-02 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231355#comment-14231355
 ] 

Modassar Ather commented on LUCENE-5205:



A query like following is throwing Exception.
Query: field: (term1* term2*) term3*~5 AND (field: (term4 OR term5 OR 
term6...termN)) where N  a couple of thousand e.g 6000-8000.
Exception:
Exception in thread main java.lang.StackOverflowError
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4554)
at java.util.regex.Pattern$Loop.match(Pattern.java:4683)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4615)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4466)
at java.util.regex.Pattern$CharProperty.match(Pattern.java:3694)
at java.util.regex.Pattern$Branch.match(Pattern.java:4502)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4556)
at java.util.regex.Pattern$Loop.match(Pattern.java:4683)

Kindly let me know if I am using query in a wrong way.
NOTE: A space between the field: and query is added to avoid transformation to 
smileys.

 SpanQueryParser with recursion, analysis and syntax very similar to classic 
 QueryParser
 ---

 Key: LUCENE-5205
 URL: https://issues.apache.org/jira/browse/LUCENE-5205
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Tim Allison
  Labels: patch
 Attachments: LUCENE-5205-cleanup-tests.patch, 
 LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
 LUCENE-5205_dateTestReInitPkgPrvt.patch, 
 LUCENE-5205_improve_stop_word_handling.patch, 
 LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
 SpanQueryParser_v1.patch.gz, patch.txt


 This parser extends QueryParserBase and includes functionality from:
 * Classic QueryParser: most of its syntax
 * SurroundQueryParser: recursive parsing for near and not clauses.
 * ComplexPhraseQueryParser: can handle near queries that include multiterms 
 (wildcard, fuzzy, regex, prefix),
 * AnalyzingQueryParser: has an option to analyze multiterms.
 At a high level, there's a first pass BooleanQuery/field parser and then a 
 span query parser handles all terminal nodes and phrases.
 Same as classic syntax:
 * term: test 
 * fuzzy: roam~0.8, roam~2
 * wildcard: te?t, test*, t*st
 * regex: /\[mb\]oat/
 * phrase: jakarta apache
 * phrase with slop: jakarta apache~3
 * default or clause: jakarta apache
 * grouping or clause: (jakarta apache)
 * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
 * multiple fields: title:lucene author:hatcher
  
 Main additions in SpanQueryParser syntax vs. classic syntax:
 * Can require in order for phrases with slop with the \~ operator: 
 jakarta apache\~3
 * Can specify not near: fever bieber!\~3,10 ::
 find fever but not if bieber appears within 3 words before or 10 
 words after it.
 * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
 apache\]~3 lucene\]\~4 :: 
 find jakarta within 3 words of apache, and that hit has to be within 
 four words before lucene
 * Can also use \[\] for single level phrasal queries instead of  as in: 
 \[jakarta apache\]
 * Can use or grouping clauses in phrasal queries: apache (lucene solr)\~3 
 :: find apache and then either lucene or solr within three words.
 * Can use multiterms in phrasal queries: jakarta\~1 ap*che\~2
 * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
 /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like jakarta within two 
 words of ap*che and that hit has to be within ten words of something like 
 solr or that lucene regex.
 * Can require at least x number of hits at boolean level: apache AND (lucene 
 solr tika)~2
 * Can use negative only query: -jakarta :: Find all docs that don't contain 
 jakarta
 * Can use an edit distance  2 for fuzzy query via SlowFuzzyQuery (beware of 
 potential performance issues!).
 Trivial additions:
 * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
 prefix =2)
 * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
 =2: (jakarta~1 (OSA) vs jakarta~1(Levenshtein)
 This parser can be very useful for concordance tasks (see also LUCENE-5317 
 and LUCENE-5318) and for analytical search.  
 Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
 Most of the documentation is in the javadoc for SpanQueryParser.
 Any and all feedback is welcome.  Thank you.
 Until this is added to the Lucene project, I've added a standalone 
 lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2014-12-02 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232537#comment-14232537
 ] 

Modassar Ather commented on LUCENE-5205:


[~talli...@apache.org], yes the query have around 7500 OR'ed terms. 

 SpanQueryParser with recursion, analysis and syntax very similar to classic 
 QueryParser
 ---

 Key: LUCENE-5205
 URL: https://issues.apache.org/jira/browse/LUCENE-5205
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Tim Allison
  Labels: patch
 Attachments: LUCENE-5205-cleanup-tests.patch, 
 LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
 LUCENE-5205_dateTestReInitPkgPrvt.patch, 
 LUCENE-5205_improve_stop_word_handling.patch, 
 LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
 SpanQueryParser_v1.patch.gz, patch.txt


 This parser extends QueryParserBase and includes functionality from:
 * Classic QueryParser: most of its syntax
 * SurroundQueryParser: recursive parsing for near and not clauses.
 * ComplexPhraseQueryParser: can handle near queries that include multiterms 
 (wildcard, fuzzy, regex, prefix),
 * AnalyzingQueryParser: has an option to analyze multiterms.
 At a high level, there's a first pass BooleanQuery/field parser and then a 
 span query parser handles all terminal nodes and phrases.
 Same as classic syntax:
 * term: test 
 * fuzzy: roam~0.8, roam~2
 * wildcard: te?t, test*, t*st
 * regex: /\[mb\]oat/
 * phrase: jakarta apache
 * phrase with slop: jakarta apache~3
 * default or clause: jakarta apache
 * grouping or clause: (jakarta apache)
 * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
 * multiple fields: title:lucene author:hatcher
  
 Main additions in SpanQueryParser syntax vs. classic syntax:
 * Can require in order for phrases with slop with the \~ operator: 
 jakarta apache\~3
 * Can specify not near: fever bieber!\~3,10 ::
 find fever but not if bieber appears within 3 words before or 10 
 words after it.
 * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
 apache\]~3 lucene\]\~4 :: 
 find jakarta within 3 words of apache, and that hit has to be within 
 four words before lucene
 * Can also use \[\] for single level phrasal queries instead of  as in: 
 \[jakarta apache\]
 * Can use or grouping clauses in phrasal queries: apache (lucene solr)\~3 
 :: find apache and then either lucene or solr within three words.
 * Can use multiterms in phrasal queries: jakarta\~1 ap*che\~2
 * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
 /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like jakarta within two 
 words of ap*che and that hit has to be within ten words of something like 
 solr or that lucene regex.
 * Can require at least x number of hits at boolean level: apache AND (lucene 
 solr tika)~2
 * Can use negative only query: -jakarta :: Find all docs that don't contain 
 jakarta
 * Can use an edit distance  2 for fuzzy query via SlowFuzzyQuery (beware of 
 potential performance issues!).
 Trivial additions:
 * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
 prefix =2)
 * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
 =2: (jakarta~1 (OSA) vs jakarta~1(Levenshtein)
 This parser can be very useful for concordance tasks (see also LUCENE-5317 
 and LUCENE-5318) and for analytical search.  
 Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
 Most of the documentation is in the javadoc for SpanQueryParser.
 Any and all feedback is welcome.  Thank you.
 Until this is added to the Lucene project, I've added a standalone 
 lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4647) Grouping is broken on docvalues-only fields

2014-07-15 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-4647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061877#comment-14061877
 ] 

Modassar Ather commented on SOLR-4647:
--

Hi,

I am also seeing this issue while doing grouping on docValues enabled field. I 
checked createField(...) method of FieldType which returns null if the field is 
not indexed and stored. 
Kindly provide inputs if any of the indexe/stored needs to be set to true while 
creating a docValue field or this is an issue.

Thanks,
Modassar


 Grouping is broken on docvalues-only fields
 ---

 Key: SOLR-4647
 URL: https://issues.apache.org/jira/browse/SOLR-4647
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.2
Reporter: Adrien Grand
  Labels: newdev

 There are a few places where grouping uses 
 FieldType.toObject(SchemaField.createField(String, float)) to translate a 
 String field value to an Object. The problem is that createField returns null 
 when the field is neither stored nor indexed, even if it has doc values.
 An option to fix it could be to use the ValueSource instead to resolve the 
 Object value (similarily to NumericFacets).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Comment Edited] (SOLR-4647) Grouping is broken on docvalues-only fields

2014-07-15 Thread Modassar Ather (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-4647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061877#comment-14061877
]

Modassar Ather edited comment on SOLR-4647 at 7/15/14 9:40 AM:
---

Hi,

I am also seeing this issue while doing grouping on docValues enabled field. I
checked createField(...) method of FieldType which returns null if the field is
not indexed and stored.
And when the returned field (which is null) gets passed to the
fieldType.toObject(...) method from finish() method of Grouping.java it causes
the NullPointerException.
Kindly provide inputs if any of the indexe/stored needs to be set to true while
creating a docValue field or this is an issue.

Thanks,
Modassar

was (Author: modassar):
Hi,

Thanks,
Modassar

Grouping is broken on docvalues-only fields
---

Key: SOLR-4647
URL: https://issues.apache.org/jira/browse/SOLR-4647
Project: Solr
Issue Type: Bug
Affects Versions: 4.2
Reporter: Adrien Grand
Labels: newdev

There are a few places where grouping uses
FieldType.toObject(SchemaField.createField(String, float)) to translate a
String field value to an Object. The problem is that createField returns null
when the field is neither stored nor indexed, even if it has doc values.
An option to fix it could be to use the ValueSource instead to resolve the
Object value (similarily to NumericFacets).

--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-6156) Exception while using group with timeAllowed on SolrCloud.

2014-07-16 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-6156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063390#comment-14063390
 ] 

Modassar Ather commented on SOLR-6156:
--

While debugging found that the create() method under SearchGroupsFieldCommand 
class return an empty list of collectors if topNGroups  0 and if 
includeGroupCount is false.
And if the collectors list is empty CommandHandler class invokes 
searchWithTimeLimiter(query, filter, null);. null is passed as collectors.

With the null collector passed an instance of TimeLimitingCollector is created 
in searchWithTimeLimiter(...) method of CommandHandler class if 
queryCommand.getTimeAllowed()  0.

Code snippet from CommandHandler. Here in the code below the collector passed 
to TimeLimitingCollector is null.
if (queryCommand.getTimeAllowed()  0 ) {
  collector = new TimeLimitingCollector(collector, 
TimeLimitingCollector.getGlobalCounter(), queryCommand.getTimeAllowed());
}

The NullPointerException occurrs when the TimeLimitingCollector instance 
created with null collector invokes setNextReader(...)

Not sure if topNGroups should be greater than 0 or includeGroupCount should be 
true when using timeAllowed or it is an issue. Kindly provide inputs.
Also let me know if the collectors list is null and timeAllowed0 which 
collector can be used to create the instance of TimeLimitingCollector.

Thanks,
Modassar

 Exception while using group with timeAllowed on SolrCloud.
 --

 Key: SOLR-6156
 URL: https://issues.apache.org/jira/browse/SOLR-6156
 Project: Solr
  Issue Type: Bug
Reporter: Modassar Ather

 Following exception is thrown when using grouping with timeAllowed. Solr 
 version used is 4.8.0.
 SEVERE: 
 null:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: 
 java.lang.NullPointerException
 at 
 org.apache.lucene.search.TimeLimitingCollector.setNextReader(TimeLimitingCollector.java:158)
 at 
 org.apache.lucene.search.MultiCollector.setNextReader(MultiCollector.java:113)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:612)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
 at 
 org.apache.solr.search.grouping.CommandHandler.searchWithTimeLimiter(CommandHandler.java:219)
 at 
 org.apache.solr.search.grouping.CommandHandler.execute(CommandHandler.java:156)
 at 
 org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:338)
 at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-06-18 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591683#comment-14591683
 ] 

Modassar Ather commented on LUCENE-5205:


While debugging I found following.

{noformat}fl:[term1 term2]~1 [term3 term4]~1 is getting parsed as following:
spanNear([spanNear([fl:term1, fl:term2], 1, false), spanNear([fl:term3, 
fl:term4], 1, false)], 0, true){noformat}

{noformat}fl:([term1 term2]~1 [term3 term4]~1)  is getting parsed as 
following:
spanNear([spanOr([spanNear([fl:term1, fl:term2], 1, false), spanNear([fl:term3, 
fl:term4], 1, false)])], 0, true){noformat}

The second query is causing the exception as mentioned in my previous comments. 
As per my understanding it is the final SpanNear which gets an SpanOr which is 
only one span where as the SpanNear expects two subSpans.
The internal SpanNear gets this test passed as they have two sub-spans but the 
final SpanNear is failing.  {noformat}e.g. spanNear([fl:term1, fl:term2], 1, 
false){noformat}

Please let me know your suggestions.


 SpanQueryParser with recursion, analysis and syntax very similar to classic 
 QueryParser
 ---

 Key: LUCENE-5205
 URL: https://issues.apache.org/jira/browse/LUCENE-5205
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Tim Allison
  Labels: patch
 Attachments: LUCENE-5205-cleanup-tests.patch, 
 LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
 LUCENE-5205_dateTestReInitPkgPrvt.patch, 
 LUCENE-5205_improve_stop_word_handling.patch, 
 LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
 SpanQueryParser_v1.patch.gz, patch.txt


 This parser extends QueryParserBase and includes functionality from:
 * Classic QueryParser: most of its syntax
 * SurroundQueryParser: recursive parsing for near and not clauses.
 * ComplexPhraseQueryParser: can handle near queries that include multiterms 
 (wildcard, fuzzy, regex, prefix),
 * AnalyzingQueryParser: has an option to analyze multiterms.
 At a high level, there's a first pass BooleanQuery/field parser and then a 
 span query parser handles all terminal nodes and phrases.
 Same as classic syntax:
 * term: test 
 * fuzzy: roam~0.8, roam~2
 * wildcard: te?t, test*, t*st
 * regex: /\[mb\]oat/
 * phrase: jakarta apache
 * phrase with slop: jakarta apache~3
 * default or clause: jakarta apache
 * grouping or clause: (jakarta apache)
 * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
 * multiple fields: title:lucene author:hatcher
  
 Main additions in SpanQueryParser syntax vs. classic syntax:
 * Can require in order for phrases with slop with the \~ operator: 
 jakarta apache\~3
 * Can specify not near: fever bieber!\~3,10 ::
 find fever but not if bieber appears within 3 words before or 10 
 words after it.
 * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
 apache\]~3 lucene\]\~4 :: 
 find jakarta within 3 words of apache, and that hit has to be within 
 four words before lucene
 * Can also use \[\] for single level phrasal queries instead of  as in: 
 \[jakarta apache\]
 * Can use or grouping clauses in phrasal queries: apache (lucene solr)\~3 
 :: find apache and then either lucene or solr within three words.
 * Can use multiterms in phrasal queries: jakarta\~1 ap*che\~2
 * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
 /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like jakarta within two 
 words of ap*che and that hit has to be within ten words of something like 
 solr or that lucene regex.
 * Can require at least x number of hits at boolean level: apache AND (lucene 
 solr tika)~2
 * Can use negative only query: -jakarta :: Find all docs that don't contain 
 jakarta
 * Can use an edit distance  2 for fuzzy query via SlowFuzzyQuery (beware of 
 potential performance issues!).
 Trivial additions:
 * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
 prefix =2)
 * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
 =2: (jakarta~1 (OSA) vs jakarta~1(Levenshtein)
 This parser can be very useful for concordance tasks (see also LUCENE-5317 
 and LUCENE-5318) and for analytical search.  
 Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
 Most of the documentation is in the javadoc for SpanQueryParser.
 Any and all feedback is welcome.  Thank you.
 Until this is added to the Lucene project, I've added a standalone 
 lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-

[jira] [Commented] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-06-22 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595442#comment-14595442
 ] 

Modassar Ather commented on LUCENE-5205:


Adding one more test which will cause the issue.
term1 any stop word E.g. plug in

 SpanQueryParser with recursion, analysis and syntax very similar to classic 
 QueryParser
 ---

 Key: LUCENE-5205
 URL: https://issues.apache.org/jira/browse/LUCENE-5205
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Tim Allison
  Labels: patch
 Attachments: LUCENE-5205-cleanup-tests.patch, 
 LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
 LUCENE-5205_dateTestReInitPkgPrvt.patch, 
 LUCENE-5205_improve_stop_word_handling.patch, 
 LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
 SpanQueryParser_v1.patch.gz, patch.txt


 This parser extends QueryParserBase and includes functionality from:
 * Classic QueryParser: most of its syntax
 * SurroundQueryParser: recursive parsing for near and not clauses.
 * ComplexPhraseQueryParser: can handle near queries that include multiterms 
 (wildcard, fuzzy, regex, prefix),
 * AnalyzingQueryParser: has an option to analyze multiterms.
 At a high level, there's a first pass BooleanQuery/field parser and then a 
 span query parser handles all terminal nodes and phrases.
 Same as classic syntax:
 * term: test 
 * fuzzy: roam~0.8, roam~2
 * wildcard: te?t, test*, t*st
 * regex: /\[mb\]oat/
 * phrase: jakarta apache
 * phrase with slop: jakarta apache~3
 * default or clause: jakarta apache
 * grouping or clause: (jakarta apache)
 * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
 * multiple fields: title:lucene author:hatcher
  
 Main additions in SpanQueryParser syntax vs. classic syntax:
 * Can require in order for phrases with slop with the \~ operator: 
 jakarta apache\~3
 * Can specify not near: fever bieber!\~3,10 ::
 find fever but not if bieber appears within 3 words before or 10 
 words after it.
 * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
 apache\]~3 lucene\]\~4 :: 
 find jakarta within 3 words of apache, and that hit has to be within 
 four words before lucene
 * Can also use \[\] for single level phrasal queries instead of  as in: 
 \[jakarta apache\]
 * Can use or grouping clauses in phrasal queries: apache (lucene solr)\~3 
 :: find apache and then either lucene or solr within three words.
 * Can use multiterms in phrasal queries: jakarta\~1 ap*che\~2
 * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
 /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like jakarta within two 
 words of ap*che and that hit has to be within ten words of something like 
 solr or that lucene regex.
 * Can require at least x number of hits at boolean level: apache AND (lucene 
 solr tika)~2
 * Can use negative only query: -jakarta :: Find all docs that don't contain 
 jakarta
 * Can use an edit distance  2 for fuzzy query via SlowFuzzyQuery (beware of 
 potential performance issues!).
 Trivial additions:
 * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
 prefix =2)
 * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
 =2: (jakarta~1 (OSA) vs jakarta~1(Levenshtein)
 This parser can be very useful for concordance tasks (see also LUCENE-5317 
 and LUCENE-5318) and for analytical search.  
 Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
 Most of the documentation is in the javadoc for SpanQueryParser.
 Any and all feedback is welcome.  Thank you.
 Until this is added to the Lucene project, I've added a standalone 
 lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-06-21 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595323#comment-14595323
 ] 

Modassar Ather commented on LUCENE-5205:


I also had to do few changes during the migration from 4.x to 5.x in 
SpanQueryParser related code. I am looking forward for the fix.
Thanks for your help.

 SpanQueryParser with recursion, analysis and syntax very similar to classic 
 QueryParser
 ---

 Key: LUCENE-5205
 URL: https://issues.apache.org/jira/browse/LUCENE-5205
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Tim Allison
  Labels: patch
 Attachments: LUCENE-5205-cleanup-tests.patch, 
 LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
 LUCENE-5205_dateTestReInitPkgPrvt.patch, 
 LUCENE-5205_improve_stop_word_handling.patch, 
 LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
 SpanQueryParser_v1.patch.gz, patch.txt


 This parser extends QueryParserBase and includes functionality from:
 * Classic QueryParser: most of its syntax
 * SurroundQueryParser: recursive parsing for near and not clauses.
 * ComplexPhraseQueryParser: can handle near queries that include multiterms 
 (wildcard, fuzzy, regex, prefix),
 * AnalyzingQueryParser: has an option to analyze multiterms.
 At a high level, there's a first pass BooleanQuery/field parser and then a 
 span query parser handles all terminal nodes and phrases.
 Same as classic syntax:
 * term: test 
 * fuzzy: roam~0.8, roam~2
 * wildcard: te?t, test*, t*st
 * regex: /\[mb\]oat/
 * phrase: jakarta apache
 * phrase with slop: jakarta apache~3
 * default or clause: jakarta apache
 * grouping or clause: (jakarta apache)
 * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
 * multiple fields: title:lucene author:hatcher
  
 Main additions in SpanQueryParser syntax vs. classic syntax:
 * Can require in order for phrases with slop with the \~ operator: 
 jakarta apache\~3
 * Can specify not near: fever bieber!\~3,10 ::
 find fever but not if bieber appears within 3 words before or 10 
 words after it.
 * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
 apache\]~3 lucene\]\~4 :: 
 find jakarta within 3 words of apache, and that hit has to be within 
 four words before lucene
 * Can also use \[\] for single level phrasal queries instead of  as in: 
 \[jakarta apache\]
 * Can use or grouping clauses in phrasal queries: apache (lucene solr)\~3 
 :: find apache and then either lucene or solr within three words.
 * Can use multiterms in phrasal queries: jakarta\~1 ap*che\~2
 * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
 /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like jakarta within two 
 words of ap*che and that hit has to be within ten words of something like 
 solr or that lucene regex.
 * Can require at least x number of hits at boolean level: apache AND (lucene 
 solr tika)~2
 * Can use negative only query: -jakarta :: Find all docs that don't contain 
 jakarta
 * Can use an edit distance  2 for fuzzy query via SlowFuzzyQuery (beware of 
 potential performance issues!).
 Trivial additions:
 * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
 prefix =2)
 * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
 =2: (jakarta~1 (OSA) vs jakarta~1(Levenshtein)
 This parser can be very useful for concordance tasks (see also LUCENE-5317 
 and LUCENE-5318) and for analytical search.  
 Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
 Most of the documentation is in the javadoc for SpanQueryParser.
 Any and all feedback is welcome.  Thank you.
 Until this is added to the Lucene project, I've added a standalone 
 lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-06-25 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14600957#comment-14600957
 ] 

Modassar Ather commented on LUCENE-5205:


Thanks [~talli...@mitre.org]
Will integrate and start using it.

 SpanQueryParser with recursion, analysis and syntax very similar to classic 
 QueryParser
 ---

 Key: LUCENE-5205
 URL: https://issues.apache.org/jira/browse/LUCENE-5205
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Tim Allison
  Labels: patch
 Attachments: LUCENE-5205-cleanup-tests.patch, 
 LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
 LUCENE-5205_dateTestReInitPkgPrvt.patch, 
 LUCENE-5205_improve_stop_word_handling.patch, 
 LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
 SpanQueryParser_v1.patch.gz, patch.txt


 This parser extends QueryParserBase and includes functionality from:
 * Classic QueryParser: most of its syntax
 * SurroundQueryParser: recursive parsing for near and not clauses.
 * ComplexPhraseQueryParser: can handle near queries that include multiterms 
 (wildcard, fuzzy, regex, prefix),
 * AnalyzingQueryParser: has an option to analyze multiterms.
 At a high level, there's a first pass BooleanQuery/field parser and then a 
 span query parser handles all terminal nodes and phrases.
 Same as classic syntax:
 * term: test 
 * fuzzy: roam~0.8, roam~2
 * wildcard: te?t, test*, t*st
 * regex: /\[mb\]oat/
 * phrase: jakarta apache
 * phrase with slop: jakarta apache~3
 * default or clause: jakarta apache
 * grouping or clause: (jakarta apache)
 * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
 * multiple fields: title:lucene author:hatcher
  
 Main additions in SpanQueryParser syntax vs. classic syntax:
 * Can require in order for phrases with slop with the \~ operator: 
 jakarta apache\~3
 * Can specify not near: fever bieber!\~3,10 ::
 find fever but not if bieber appears within 3 words before or 10 
 words after it.
 * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
 apache\]~3 lucene\]\~4 :: 
 find jakarta within 3 words of apache, and that hit has to be within 
 four words before lucene
 * Can also use \[\] for single level phrasal queries instead of  as in: 
 \[jakarta apache\]
 * Can use or grouping clauses in phrasal queries: apache (lucene solr)\~3 
 :: find apache and then either lucene or solr within three words.
 * Can use multiterms in phrasal queries: jakarta\~1 ap*che\~2
 * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
 /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like jakarta within two 
 words of ap*che and that hit has to be within ten words of something like 
 solr or that lucene regex.
 * Can require at least x number of hits at boolean level: apache AND (lucene 
 solr tika)~2
 * Can use negative only query: -jakarta :: Find all docs that don't contain 
 jakarta
 * Can use an edit distance  2 for fuzzy query via SlowFuzzyQuery (beware of 
 potential performance issues!).
 Trivial additions:
 * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
 prefix =2)
 * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
 =2: (jakarta~1 (OSA) vs jakarta~1(Levenshtein)
 This parser can be very useful for concordance tasks (see also LUCENE-5317 
 and LUCENE-5318) and for analytical search.  
 Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
 Most of the documentation is in the javadoc for SpanQueryParser.
 Any and all feedback is welcome.  Thank you.
 Until this is added to the Lucene project, I've added a standalone 
 lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-06-19 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593237#comment-14593237
 ] 

Modassar Ather commented on LUCENE-5205:


Hi [~talli...@mitre.org]
Did you get a chance to look at the issue? Please look into it and let me know 
your inputs.


 SpanQueryParser with recursion, analysis and syntax very similar to classic 
 QueryParser
 ---

 Key: LUCENE-5205
 URL: https://issues.apache.org/jira/browse/LUCENE-5205
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Tim Allison
  Labels: patch
 Attachments: LUCENE-5205-cleanup-tests.patch, 
 LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
 LUCENE-5205_dateTestReInitPkgPrvt.patch, 
 LUCENE-5205_improve_stop_word_handling.patch, 
 LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
 SpanQueryParser_v1.patch.gz, patch.txt


 This parser extends QueryParserBase and includes functionality from:
 * Classic QueryParser: most of its syntax
 * SurroundQueryParser: recursive parsing for near and not clauses.
 * ComplexPhraseQueryParser: can handle near queries that include multiterms 
 (wildcard, fuzzy, regex, prefix),
 * AnalyzingQueryParser: has an option to analyze multiterms.
 At a high level, there's a first pass BooleanQuery/field parser and then a 
 span query parser handles all terminal nodes and phrases.
 Same as classic syntax:
 * term: test 
 * fuzzy: roam~0.8, roam~2
 * wildcard: te?t, test*, t*st
 * regex: /\[mb\]oat/
 * phrase: jakarta apache
 * phrase with slop: jakarta apache~3
 * default or clause: jakarta apache
 * grouping or clause: (jakarta apache)
 * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
 * multiple fields: title:lucene author:hatcher
  
 Main additions in SpanQueryParser syntax vs. classic syntax:
 * Can require in order for phrases with slop with the \~ operator: 
 jakarta apache\~3
 * Can specify not near: fever bieber!\~3,10 ::
 find fever but not if bieber appears within 3 words before or 10 
 words after it.
 * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
 apache\]~3 lucene\]\~4 :: 
 find jakarta within 3 words of apache, and that hit has to be within 
 four words before lucene
 * Can also use \[\] for single level phrasal queries instead of  as in: 
 \[jakarta apache\]
 * Can use or grouping clauses in phrasal queries: apache (lucene solr)\~3 
 :: find apache and then either lucene or solr within three words.
 * Can use multiterms in phrasal queries: jakarta\~1 ap*che\~2
 * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
 /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like jakarta within two 
 words of ap*che and that hit has to be within ten words of something like 
 solr or that lucene regex.
 * Can require at least x number of hits at boolean level: apache AND (lucene 
 solr tika)~2
 * Can use negative only query: -jakarta :: Find all docs that don't contain 
 jakarta
 * Can use an edit distance  2 for fuzzy query via SlowFuzzyQuery (beware of 
 potential performance issues!).
 Trivial additions:
 * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
 prefix =2)
 * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
 =2: (jakarta~1 (OSA) vs jakarta~1(Levenshtein)
 This parser can be very useful for concordance tasks (see also LUCENE-5317 
 and LUCENE-5318) and for analytical search.  
 Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
 Most of the documentation is in the javadoc for SpanQueryParser.
 Any and all feedback is welcome.  Thank you.
 Until this is added to the Lucene project, I've added a standalone 
 lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-06-22 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597157#comment-14597157
 ] 

Modassar Ather commented on LUCENE-5205:


Hi [~talli...@mitre.org] 
Did you mean the fix for the failing query with exception (Less than 2 
subSpans.size() : 1) ?

 SpanQueryParser with recursion, analysis and syntax very similar to classic 
 QueryParser
 ---

 Key: LUCENE-5205
 URL: https://issues.apache.org/jira/browse/LUCENE-5205
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Tim Allison
  Labels: patch
 Attachments: LUCENE-5205-cleanup-tests.patch, 
 LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
 LUCENE-5205_dateTestReInitPkgPrvt.patch, 
 LUCENE-5205_improve_stop_word_handling.patch, 
 LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
 SpanQueryParser_v1.patch.gz, patch.txt


 This parser extends QueryParserBase and includes functionality from:
 * Classic QueryParser: most of its syntax
 * SurroundQueryParser: recursive parsing for near and not clauses.
 * ComplexPhraseQueryParser: can handle near queries that include multiterms 
 (wildcard, fuzzy, regex, prefix),
 * AnalyzingQueryParser: has an option to analyze multiterms.
 At a high level, there's a first pass BooleanQuery/field parser and then a 
 span query parser handles all terminal nodes and phrases.
 Same as classic syntax:
 * term: test 
 * fuzzy: roam~0.8, roam~2
 * wildcard: te?t, test*, t*st
 * regex: /\[mb\]oat/
 * phrase: jakarta apache
 * phrase with slop: jakarta apache~3
 * default or clause: jakarta apache
 * grouping or clause: (jakarta apache)
 * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
 * multiple fields: title:lucene author:hatcher
  
 Main additions in SpanQueryParser syntax vs. classic syntax:
 * Can require in order for phrases with slop with the \~ operator: 
 jakarta apache\~3
 * Can specify not near: fever bieber!\~3,10 ::
 find fever but not if bieber appears within 3 words before or 10 
 words after it.
 * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
 apache\]~3 lucene\]\~4 :: 
 find jakarta within 3 words of apache, and that hit has to be within 
 four words before lucene
 * Can also use \[\] for single level phrasal queries instead of  as in: 
 \[jakarta apache\]
 * Can use or grouping clauses in phrasal queries: apache (lucene solr)\~3 
 :: find apache and then either lucene or solr within three words.
 * Can use multiterms in phrasal queries: jakarta\~1 ap*che\~2
 * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
 /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like jakarta within two 
 words of ap*che and that hit has to be within ten words of something like 
 solr or that lucene regex.
 * Can require at least x number of hits at boolean level: apache AND (lucene 
 solr tika)~2
 * Can use negative only query: -jakarta :: Find all docs that don't contain 
 jakarta
 * Can use an edit distance  2 for fuzzy query via SlowFuzzyQuery (beware of 
 potential performance issues!).
 Trivial additions:
 * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
 prefix =2)
 * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
 =2: (jakarta~1 (OSA) vs jakarta~1(Levenshtein)
 This parser can be very useful for concordance tasks (see also LUCENE-5317 
 and LUCENE-5318) and for analytical search.  
 Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
 Most of the documentation is in the javadoc for SpanQueryParser.
 Any and all feedback is welcome.  Thank you.
 Until this is added to the Lucene project, I've added a standalone 
 lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-06-18 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591367#comment-14591367
 ] 

Modassar Ather commented on LUCENE-5205:


Hi [~talli...@mitre.org]
I migrated to Lucene/Solr 5.2.0 from Lucene/Solr 5.1.0. Lucene/Solr 5.2.0 has 
many changes related to Spans.
While testing the SpanQueryParser I found a query like below is causing 
exception.
{noformat} Query: ft:([term1 term2]~1 [term3 term4]~1) {noformat}
{noformat} Exception: msg: Error from server at 
http:/host:8983/solr/collection: Less than 2 subSpans.size():1, {noformat}
The exception is triggering from 
org.apache.lucene.search.spans.ConjunctionSpans line number 38.
Kindly provide your suggestions.

 SpanQueryParser with recursion, analysis and syntax very similar to classic 
 QueryParser
 ---

 Key: LUCENE-5205
 URL: https://issues.apache.org/jira/browse/LUCENE-5205
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Tim Allison
  Labels: patch
 Attachments: LUCENE-5205-cleanup-tests.patch, 
 LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
 LUCENE-5205_dateTestReInitPkgPrvt.patch, 
 LUCENE-5205_improve_stop_word_handling.patch, 
 LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
 SpanQueryParser_v1.patch.gz, patch.txt


 This parser extends QueryParserBase and includes functionality from:
 * Classic QueryParser: most of its syntax
 * SurroundQueryParser: recursive parsing for near and not clauses.
 * ComplexPhraseQueryParser: can handle near queries that include multiterms 
 (wildcard, fuzzy, regex, prefix),
 * AnalyzingQueryParser: has an option to analyze multiterms.
 At a high level, there's a first pass BooleanQuery/field parser and then a 
 span query parser handles all terminal nodes and phrases.
 Same as classic syntax:
 * term: test 
 * fuzzy: roam~0.8, roam~2
 * wildcard: te?t, test*, t*st
 * regex: /\[mb\]oat/
 * phrase: jakarta apache
 * phrase with slop: jakarta apache~3
 * default or clause: jakarta apache
 * grouping or clause: (jakarta apache)
 * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
 * multiple fields: title:lucene author:hatcher
  
 Main additions in SpanQueryParser syntax vs. classic syntax:
 * Can require in order for phrases with slop with the \~ operator: 
 jakarta apache\~3
 * Can specify not near: fever bieber!\~3,10 ::
 find fever but not if bieber appears within 3 words before or 10 
 words after it.
 * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
 apache\]~3 lucene\]\~4 :: 
 find jakarta within 3 words of apache, and that hit has to be within 
 four words before lucene
 * Can also use \[\] for single level phrasal queries instead of  as in: 
 \[jakarta apache\]
 * Can use or grouping clauses in phrasal queries: apache (lucene solr)\~3 
 :: find apache and then either lucene or solr within three words.
 * Can use multiterms in phrasal queries: jakarta\~1 ap*che\~2
 * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
 /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like jakarta within two 
 words of ap*che and that hit has to be within ten words of something like 
 solr or that lucene regex.
 * Can require at least x number of hits at boolean level: apache AND (lucene 
 solr tika)~2
 * Can use negative only query: -jakarta :: Find all docs that don't contain 
 jakarta
 * Can use an edit distance  2 for fuzzy query via SlowFuzzyQuery (beware of 
 potential performance issues!).
 Trivial additions:
 * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
 prefix =2)
 * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
 =2: (jakarta~1 (OSA) vs jakarta~1(Levenshtein)
 This parser can be very useful for concordance tasks (see also LUCENE-5317 
 and LUCENE-5318) and for analytical search.  
 Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
 Most of the documentation is in the javadoc for SpanQueryParser.
 Any and all feedback is welcome.  Thank you.
 Until this is added to the Lucene project, I've added a standalone 
 lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-06-18 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591616#comment-14591616
 ] 

Modassar Ather commented on LUCENE-5205:


Adding one more query for better clarification.
{noformat}[(term1 [term2 term3]~1 [term term]~1) (term4 [term5 term6]~1 
term7)]~2 (term8 [term9 term10])~4{noformat}

 SpanQueryParser with recursion, analysis and syntax very similar to classic 
 QueryParser
 ---

 Key: LUCENE-5205
 URL: https://issues.apache.org/jira/browse/LUCENE-5205
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Tim Allison
  Labels: patch
 Attachments: LUCENE-5205-cleanup-tests.patch, 
 LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
 LUCENE-5205_dateTestReInitPkgPrvt.patch, 
 LUCENE-5205_improve_stop_word_handling.patch, 
 LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
 SpanQueryParser_v1.patch.gz, patch.txt


 This parser extends QueryParserBase and includes functionality from:
 * Classic QueryParser: most of its syntax
 * SurroundQueryParser: recursive parsing for near and not clauses.
 * ComplexPhraseQueryParser: can handle near queries that include multiterms 
 (wildcard, fuzzy, regex, prefix),
 * AnalyzingQueryParser: has an option to analyze multiterms.
 At a high level, there's a first pass BooleanQuery/field parser and then a 
 span query parser handles all terminal nodes and phrases.
 Same as classic syntax:
 * term: test 
 * fuzzy: roam~0.8, roam~2
 * wildcard: te?t, test*, t*st
 * regex: /\[mb\]oat/
 * phrase: jakarta apache
 * phrase with slop: jakarta apache~3
 * default or clause: jakarta apache
 * grouping or clause: (jakarta apache)
 * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
 * multiple fields: title:lucene author:hatcher
  
 Main additions in SpanQueryParser syntax vs. classic syntax:
 * Can require in order for phrases with slop with the \~ operator: 
 jakarta apache\~3
 * Can specify not near: fever bieber!\~3,10 ::
 find fever but not if bieber appears within 3 words before or 10 
 words after it.
 * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
 apache\]~3 lucene\]\~4 :: 
 find jakarta within 3 words of apache, and that hit has to be within 
 four words before lucene
 * Can also use \[\] for single level phrasal queries instead of  as in: 
 \[jakarta apache\]
 * Can use or grouping clauses in phrasal queries: apache (lucene solr)\~3 
 :: find apache and then either lucene or solr within three words.
 * Can use multiterms in phrasal queries: jakarta\~1 ap*che\~2
 * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
 /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like jakarta within two 
 words of ap*che and that hit has to be within ten words of something like 
 solr or that lucene regex.
 * Can require at least x number of hits at boolean level: apache AND (lucene 
 solr tika)~2
 * Can use negative only query: -jakarta :: Find all docs that don't contain 
 jakarta
 * Can use an edit distance  2 for fuzzy query via SlowFuzzyQuery (beware of 
 potential performance issues!).
 Trivial additions:
 * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
 prefix =2)
 * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
 =2: (jakarta~1 (OSA) vs jakarta~1(Levenshtein)
 This parser can be very useful for concordance tasks (see also LUCENE-5317 
 and LUCENE-5318) and for analytical search.  
 Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
 Most of the documentation is in the javadoc for SpanQueryParser.
 Any and all feedback is welcome.  Thank you.
 Until this is added to the Lucene project, I've added a standalone 
 lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Comment Edited] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-06-18 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591616#comment-14591616
 ] 

Modassar Ather edited comment on LUCENE-5205 at 6/18/15 10:43 AM:
--

Adding one more query for better clarification.
{noformat}[(term1 [term2 term3]~1 [term4 term5]~1) (term6 [term7 term8]~1 
term9)]~2 (term10 [term11 term12])~4{noformat}


was (Author: modassar):
Adding one more query for better clarification.
{noformat}[(term1 [term2 term3]~1 [term term]~1) (term4 [term5 term6]~1 
term7)]~2 (term8 [term9 term10])~4{noformat}

 SpanQueryParser with recursion, analysis and syntax very similar to classic 
 QueryParser
 ---

 Key: LUCENE-5205
 URL: https://issues.apache.org/jira/browse/LUCENE-5205
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Tim Allison
  Labels: patch
 Attachments: LUCENE-5205-cleanup-tests.patch, 
 LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
 LUCENE-5205_dateTestReInitPkgPrvt.patch, 
 LUCENE-5205_improve_stop_word_handling.patch, 
 LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
 SpanQueryParser_v1.patch.gz, patch.txt


 This parser extends QueryParserBase and includes functionality from:
 * Classic QueryParser: most of its syntax
 * SurroundQueryParser: recursive parsing for near and not clauses.
 * ComplexPhraseQueryParser: can handle near queries that include multiterms 
 (wildcard, fuzzy, regex, prefix),
 * AnalyzingQueryParser: has an option to analyze multiterms.
 At a high level, there's a first pass BooleanQuery/field parser and then a 
 span query parser handles all terminal nodes and phrases.
 Same as classic syntax:
 * term: test 
 * fuzzy: roam~0.8, roam~2
 * wildcard: te?t, test*, t*st
 * regex: /\[mb\]oat/
 * phrase: jakarta apache
 * phrase with slop: jakarta apache~3
 * default or clause: jakarta apache
 * grouping or clause: (jakarta apache)
 * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
 * multiple fields: title:lucene author:hatcher
  
 Main additions in SpanQueryParser syntax vs. classic syntax:
 * Can require in order for phrases with slop with the \~ operator: 
 jakarta apache\~3
 * Can specify not near: fever bieber!\~3,10 ::
 find fever but not if bieber appears within 3 words before or 10 
 words after it.
 * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
 apache\]~3 lucene\]\~4 :: 
 find jakarta within 3 words of apache, and that hit has to be within 
 four words before lucene
 * Can also use \[\] for single level phrasal queries instead of  as in: 
 \[jakarta apache\]
 * Can use or grouping clauses in phrasal queries: apache (lucene solr)\~3 
 :: find apache and then either lucene or solr within three words.
 * Can use multiterms in phrasal queries: jakarta\~1 ap*che\~2
 * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
 /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like jakarta within two 
 words of ap*che and that hit has to be within ten words of something like 
 solr or that lucene regex.
 * Can require at least x number of hits at boolean level: apache AND (lucene 
 solr tika)~2
 * Can use negative only query: -jakarta :: Find all docs that don't contain 
 jakarta
 * Can use an edit distance  2 for fuzzy query via SlowFuzzyQuery (beware of 
 potential performance issues!).
 Trivial additions:
 * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
 prefix =2)
 * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
 =2: (jakarta~1 (OSA) vs jakarta~1(Levenshtein)
 This parser can be very useful for concordance tasks (see also LUCENE-5317 
 and LUCENE-5318) and for analytical search.  
 Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
 Most of the documentation is in the javadoc for SpanQueryParser.
 Any and all feedback is welcome.  Thank you.
 Until this is added to the Lucene project, I've added a standalone 
 lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Comment Edited] (SOLR-7954) Exception while using {!cardinality=1.0}.

2015-08-21 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706525#comment-14706525
 ] 

Modassar Ather edited comment on SOLR-7954 at 8/21/15 10:35 AM:


Following test method can be used to add data using which the exception can be 
reproduced. Please do the necessary changes.
Change ZKHOST:ZKPORT to point to zkhost available and COLECTION to the 
available collection.
{noformat}
public void index() throws SolrServerException, IOException {
CloudSolrClient s = new CloudSolrClient(ZKHOST:ZKPORT);
int count = 0;
s.setDefaultCollection(COLECTION);
ListSolrInputDocument documents = new ArrayList();
for (int i = 1; i = 100; i++) {
SolrInputDocument doc = new SolrInputDocument();
doc.addField(field1, i);
doc.addField(colid, val!+i+!-+ref+i);
doc.addField(field, DATA+(12345+i));
documents.add(doc);
if((documents.size() % 1) == 0){
count = count + 1;
s.add(documents);
System.out.println(System.currentTimeMillis() +  - Indexed 
document #  + NumberFormat.getInstance().format(count));
documents = new ArrayList();
}
}


System.out.println(Comitting.);
s.commit(true, true);

System.out.println(Optimizing.);
s.optimize(true, true, 1);
s.close();
System.out.println(Done.);
}
{noformat}


was (Author: modassar):
Following test method can be used to add data using which the exception can be 
reproduced. Please do the necessary changes.
Change ZKHOST:ZKPORT to point to zkhost available and COLECTION to the 
available collection.
{noformat}
public void index() throws SolrServerException, IOException {
CloudSolrClient s = new CloudSolrClient(ZKHOST:ZKPORT);
int count = 0;
s.setDefaultCollection(COLECTION);
ListSolrInputDocument documents = new ArrayList();
for (int i = 1; i = 100; i++) {
SolrInputDocument doc = new SolrInputDocument();
doc.addField(field1, i);
doc.addField(colid, val!+i+!-+ref+i);
doc.addField(field, DATA+(12345+i));
documents.add(doc);
if((documents.size() % 1) == 0){
count = count + 1;
s.add(documents);
System.out.println(System.currentTimeMillis() +  - Indexed 
document #  + NumberFormat.getInstance().format(count));
documents = new ArrayList();
}
}


System.out.println(Comitting.);
s.commit(true, true);

System.out.println(Optimizing.);
s.close();
System.out.println(Done.);
}
{noformat}

 Exception while using {!cardinality=1.0}.
 -

 Key: SOLR-7954
 URL: https://issues.apache.org/jira/browse/SOLR-7954
 Project: Solr
  Issue Type: Bug
Affects Versions: 5.2.1
 Environment: SolrCloud 4 node cluster.
 Ubuntu 12.04
 OS Type 64 bit
Reporter: Modassar Ather

 Following exception is thrown for the query : 
 bq. q=field:querystats=truestats.field={!cardinality=1.0}field.
 The exception is not seen once the cardinality is set to 0.9 or less.
 The field is docValues enabled and indexed=false. The same exception I tried 
 to reproduce on non docValues field but could not.
 ERROR - 2015-08-11 12:24:00.222; [core] org.apache.solr.common.SolrException; 
 null:java.lang.ArrayIndexOutOfBoundsException: 3
 at 
 net.agkn.hll.serialization.BigEndianAscendingWordSerializer.writeWord(BigEndianAscendingWordSerializer.java:152)
 at net.agkn.hll.util.BitVector.getRegisterContents(BitVector.java:247)
 at net.agkn.hll.HLL.toBytes(HLL.java:917)
 at net.agkn.hll.HLL.toBytes(HLL.java:869)
 at 
 org.apache.solr.handler.component.AbstractStatsValues.getStatsValues(StatsValuesFactory.java:348)
 at 
 org.apache.solr.handler.component.StatsComponent.convertToResponse(StatsComponent.java:151)
 at 
 org.apache.solr.handler.component.StatsComponent.process(StatsComponent.java:62)
 at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:255)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
 at

[jira] [Commented] (SOLR-7954) Exception while using {!cardinality=1.0}.

2015-08-21 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706525#comment-14706525
 ] 

Modassar Ather commented on SOLR-7954:
--

Following test method can be used to add data using which the exception can be 
reproduced. Please do the necessary changes.
Change ZKHOST:ZKPORT to point to zkhost available and COLECTION to the 
available collection.
{noformat}
public void index() throws SolrServerException, IOException {
CloudSolrClient s = new CloudSolrClient(ZKHOST:ZKPORT);
int count = 0;
s.setDefaultCollection(COLECTION);
ListSolrInputDocument documents = new ArrayList();
for (int i = 1; i = 100; i++) {
SolrInputDocument doc = new SolrInputDocument();
doc.addField(field1, i);
doc.addField(colid, val!+i+!-+ref+i);
doc.addField(field, DATA+(12345+i));
documents.add(doc);
if((documents.size() % 1) == 0){
count = count + 1;
s.add(documents);
System.out.println(System.currentTimeMillis() +  - Indexed 
document #  + NumberFormat.getInstance().format(count));
documents = new ArrayList();
}
}


System.out.println(Comitting.);
s.commit(true, true);

System.out.println(Optimizing.);
s.close();
System.out.println(Done.);
}
{noformat}

 Exception while using {!cardinality=1.0}.
 -

 Key: SOLR-7954
 URL: https://issues.apache.org/jira/browse/SOLR-7954
 Project: Solr
  Issue Type: Bug
Affects Versions: 5.2.1
 Environment: SolrCloud 4 node cluster.
 Ubuntu 12.04
 OS Type 64 bit
Reporter: Modassar Ather

 Following exception is thrown for the query : 
 bq. q=field:querystats=truestats.field={!cardinality=1.0}field.
 The exception is not seen once the cardinality is set to 0.9 or less.
 The field is docValues enabled and indexed=false. The same exception I tried 
 to reproduce on non docValues field but could not.
 ERROR - 2015-08-11 12:24:00.222; [core] org.apache.solr.common.SolrException; 
 null:java.lang.ArrayIndexOutOfBoundsException: 3
 at 
 net.agkn.hll.serialization.BigEndianAscendingWordSerializer.writeWord(BigEndianAscendingWordSerializer.java:152)
 at net.agkn.hll.util.BitVector.getRegisterContents(BitVector.java:247)
 at net.agkn.hll.HLL.toBytes(HLL.java:917)
 at net.agkn.hll.HLL.toBytes(HLL.java:869)
 at 
 org.apache.solr.handler.component.AbstractStatsValues.getStatsValues(StatsValuesFactory.java:348)
 at 
 org.apache.solr.handler.component.StatsComponent.convertToResponse(StatsComponent.java:151)
 at 
 org.apache.solr.handler.component.StatsComponent.process(StatsComponent.java:62)
 at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:255)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
 at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
 at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
 at 
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
 at 
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
 at 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
 at 
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
 at 
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
 at 
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
 at 
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
 at 
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
 at 
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
 at 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
 at 
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
 at 
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
 at

[jira] [Created] (SOLR-7954) Exception while using {!cardinality=1.0}.

2015-08-21 Thread Modassar Ather (JIRA)

Modassar Ather created SOLR-7954:


 Summary: Exception while using {!cardinality=1.0}.
 Key: SOLR-7954
 URL: https://issues.apache.org/jira/browse/SOLR-7954
 Project: Solr
  Issue Type: Bug
Affects Versions: 5.2.1
Reporter: Modassar Ather


Following exception is thrown for the query : 
bq. q=field:querystats=truestats.field={!cardinality=1.0}field.
The exception is not seen once the cardinality is set to 0.9 or less.
The field is docValues enabled and indexed=false. The same exception I tried to 
reproduce on non docValues field but could not.

ERROR - 2015-08-11 12:24:00.222; [core] org.apache.solr.common.SolrException; 
null:java.lang.ArrayIndexOutOfBoundsException: 3
at 
net.agkn.hll.serialization.BigEndianAscendingWordSerializer.writeWord(BigEndianAscendingWordSerializer.java:152)
at net.agkn.hll.util.BitVector.getRegisterContents(BitVector.java:247)
at net.agkn.hll.HLL.toBytes(HLL.java:917)
at net.agkn.hll.HLL.toBytes(HLL.java:869)
at 
org.apache.solr.handler.component.AbstractStatsValues.getStatsValues(StatsValuesFactory.java:348)
at 
org.apache.solr.handler.component.StatsComponent.convertToResponse(StatsComponent.java:151)
at 
org.apache.solr.handler.component.StatsComponent.process(StatsComponent.java:62)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:255)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:497)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at 
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-7954) Exception while using {!cardinality=1.0}.

2015-08-21 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706280#comment-14706280
 ] 

Modassar Ather commented on SOLR-7954:
--

The schema used is as follows.

?xml version=1.0 encoding=UTF-8 ?
schema name=collection version=1.5

 types
fieldType name=string class=solr.StrField sortMissingLast=true 
stored=false omitNorms=true/
fieldType name=string_dv class=solr.StrField 
sortMissingLast=true stored=false indexed=false docValues=true/
fieldType name=long class=solr.TrieLongField precisionStep=0 
omitNorms=true positionIncrementGap=0 stored=false/
/types

fields
field name=field1  type=string stored=true  /
field name=field   type=string_dv  multiValued=true /
field name=_version_   type=long   stored=true  /
field name=colid   type=string stored=true  /
/fields
uniqueKeycolid/uniqueKey
/schema

 Exception while using {!cardinality=1.0}.
 -

 Key: SOLR-7954
 URL: https://issues.apache.org/jira/browse/SOLR-7954
 Project: Solr
  Issue Type: Bug
Affects Versions: 5.2.1
Reporter: Modassar Ather

 Following exception is thrown for the query : 
 bq. q=field:querystats=truestats.field={!cardinality=1.0}field.
 The exception is not seen once the cardinality is set to 0.9 or less.
 The field is docValues enabled and indexed=false. The same exception I tried 
 to reproduce on non docValues field but could not.
 ERROR - 2015-08-11 12:24:00.222; [core] org.apache.solr.common.SolrException; 
 null:java.lang.ArrayIndexOutOfBoundsException: 3
 at 
 net.agkn.hll.serialization.BigEndianAscendingWordSerializer.writeWord(BigEndianAscendingWordSerializer.java:152)
 at net.agkn.hll.util.BitVector.getRegisterContents(BitVector.java:247)
 at net.agkn.hll.HLL.toBytes(HLL.java:917)
 at net.agkn.hll.HLL.toBytes(HLL.java:869)
 at 
 org.apache.solr.handler.component.AbstractStatsValues.getStatsValues(StatsValuesFactory.java:348)
 at 
 org.apache.solr.handler.component.StatsComponent.convertToResponse(StatsComponent.java:151)
 at 
 org.apache.solr.handler.component.StatsComponent.process(StatsComponent.java:62)
 at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:255)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
 at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
 at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
 at 
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
 at 
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
 at 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
 at 
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
 at 
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
 at 
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
 at 
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
 at 
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
 at 
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
 at 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
 at 
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
 at 
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
 at 
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
 at org.eclipse.jetty.server.Server.handle(Server.java:497)
 at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
 at 
 org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
 at 
 org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
 at 
 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
 at 
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
 at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To

[jira] [Updated] (SOLR-7954) Exception while using {!cardinality=1.0}.

2015-08-21 Thread Modassar Ather (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Modassar Ather updated SOLR-7954:
-
Environment: 
SolrCloud 4 node cluster.
Ubuntu 12.04
OS Type 64 bit

 Exception while using {!cardinality=1.0}.
 -

 Key: SOLR-7954
 URL: https://issues.apache.org/jira/browse/SOLR-7954
 Project: Solr
  Issue Type: Bug
Affects Versions: 5.2.1
 Environment: SolrCloud 4 node cluster.
 Ubuntu 12.04
 OS Type 64 bit
Reporter: Modassar Ather

 Following exception is thrown for the query : 
 bq. q=field:querystats=truestats.field={!cardinality=1.0}field.
 The exception is not seen once the cardinality is set to 0.9 or less.
 The field is docValues enabled and indexed=false. The same exception I tried 
 to reproduce on non docValues field but could not.
 ERROR - 2015-08-11 12:24:00.222; [core] org.apache.solr.common.SolrException; 
 null:java.lang.ArrayIndexOutOfBoundsException: 3
 at 
 net.agkn.hll.serialization.BigEndianAscendingWordSerializer.writeWord(BigEndianAscendingWordSerializer.java:152)
 at net.agkn.hll.util.BitVector.getRegisterContents(BitVector.java:247)
 at net.agkn.hll.HLL.toBytes(HLL.java:917)
 at net.agkn.hll.HLL.toBytes(HLL.java:869)
 at 
 org.apache.solr.handler.component.AbstractStatsValues.getStatsValues(StatsValuesFactory.java:348)
 at 
 org.apache.solr.handler.component.StatsComponent.convertToResponse(StatsComponent.java:151)
 at 
 org.apache.solr.handler.component.StatsComponent.process(StatsComponent.java:62)
 at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:255)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
 at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
 at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
 at 
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
 at 
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
 at 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
 at 
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
 at 
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
 at 
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
 at 
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
 at 
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
 at 
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
 at 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
 at 
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
 at 
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
 at 
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
 at org.eclipse.jetty.server.Server.handle(Server.java:497)
 at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
 at 
 org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
 at 
 org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
 at 
 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
 at 
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
 at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-7954) Exception while using {!cardinality=1.0}.

2015-08-23 Thread Modassar Ather (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Modassar Ather updated SOLR-7954:
-
Description: 
Following exception is thrown for the query : 
bq. q=pn:*stats=truestats.field={!cardinality=1.0}field.
The exception is not seen once the cardinality is set to 0.9 or less.
The field is docValues enabled and indexed=false. The same exception I tried to 
reproduce on non docValues field but could not.

ERROR - 2015-08-11 12:24:00.222; [core] org.apache.solr.common.SolrException; 
null:java.lang.ArrayIndexOutOfBoundsException: 3
at 
net.agkn.hll.serialization.BigEndianAscendingWordSerializer.writeWord(BigEndianAscendingWordSerializer.java:152)
at net.agkn.hll.util.BitVector.getRegisterContents(BitVector.java:247)
at net.agkn.hll.HLL.toBytes(HLL.java:917)
at net.agkn.hll.HLL.toBytes(HLL.java:869)
at 
org.apache.solr.handler.component.AbstractStatsValues.getStatsValues(StatsValuesFactory.java:348)
at 
org.apache.solr.handler.component.StatsComponent.convertToResponse(StatsComponent.java:151)
at 
org.apache.solr.handler.component.StatsComponent.process(StatsComponent.java:62)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:255)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:497)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at 
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)

  was:
Following exception is thrown for the query : 
bq. q=field:querystats=truestats.field={!cardinality=1.0}field.
The exception is not seen once the cardinality is set to 0.9 or less.
The field is docValues enabled and indexed=false. The same exception I tried to 
reproduce on non docValues field but could not.

ERROR - 2015-08-11 12:24:00.222; [core] org.apache.solr.common.SolrException; 
null:java.lang.ArrayIndexOutOfBoundsException: 3
at 
net.agkn.hll.serialization.BigEndianAscendingWordSerializer.writeWord(BigEndianAscendingWordSerializer.java:152)
at net.agkn.hll.util.BitVector.getRegisterContents(BitVector.java:247)
at net.agkn.hll.HLL.toBytes(HLL.java:917)
at net.agkn.hll.HLL.toBytes(HLL.java:869)
at 
org.apache.solr.handler.component.AbstractStatsValues.getStatsValues(StatsValuesFactory.java:348)
at 
org.apache.solr.handler.component.StatsComponent.convertToResponse(StatsComponent.java:151)
at 
org.apache.solr.handler.component.StatsComponent.process(StatsComponent.java:62)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:255)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
at

[jira] [Updated] (SOLR-7954) Exception while using {!cardinality=1.0}.

2015-08-23 Thread Modassar Ather (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Modassar Ather updated SOLR-7954:
-
Description: 
Following exception is thrown for the query : 
bq. q=field1:*stats=truestats.field={!cardinality=1.0}field.
The exception is not seen once the cardinality is set to 0.9 or less.
The field is docValues enabled and indexed=false. The same exception I tried to 
reproduce on non docValues field but could not.

ERROR - 2015-08-11 12:24:00.222; [core] org.apache.solr.common.SolrException; 
null:java.lang.ArrayIndexOutOfBoundsException: 3
at 
net.agkn.hll.serialization.BigEndianAscendingWordSerializer.writeWord(BigEndianAscendingWordSerializer.java:152)
at net.agkn.hll.util.BitVector.getRegisterContents(BitVector.java:247)
at net.agkn.hll.HLL.toBytes(HLL.java:917)
at net.agkn.hll.HLL.toBytes(HLL.java:869)
at 
org.apache.solr.handler.component.AbstractStatsValues.getStatsValues(StatsValuesFactory.java:348)
at 
org.apache.solr.handler.component.StatsComponent.convertToResponse(StatsComponent.java:151)
at 
org.apache.solr.handler.component.StatsComponent.process(StatsComponent.java:62)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:255)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:497)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at 
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)

  was:
Following exception is thrown for the query : 
bq. q=pn:*stats=truestats.field={!cardinality=1.0}field.
The exception is not seen once the cardinality is set to 0.9 or less.
The field is docValues enabled and indexed=false. The same exception I tried to 
reproduce on non docValues field but could not.

ERROR - 2015-08-11 12:24:00.222; [core] org.apache.solr.common.SolrException; 
null:java.lang.ArrayIndexOutOfBoundsException: 3
at 
net.agkn.hll.serialization.BigEndianAscendingWordSerializer.writeWord(BigEndianAscendingWordSerializer.java:152)
at net.agkn.hll.util.BitVector.getRegisterContents(BitVector.java:247)
at net.agkn.hll.HLL.toBytes(HLL.java:917)
at net.agkn.hll.HLL.toBytes(HLL.java:869)
at 
org.apache.solr.handler.component.AbstractStatsValues.getStatsValues(StatsValuesFactory.java:348)
at 
org.apache.solr.handler.component.StatsComponent.convertToResponse(StatsComponent.java:151)
at 
org.apache.solr.handler.component.StatsComponent.process(StatsComponent.java:62)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:255)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
at

[jira] [Commented] (SOLR-7954) Exception while using {!cardinality=1.0}.

2015-08-24 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708909#comment-14708909
 ] 

Modassar Ather commented on SOLR-7954:
--

I tested following schema with the same data in field and field2. Both 
reproduced the problem. Then I tried to find if it is value in cardinality 
which is causing the issue. I tried with 10 to 12 document and both the 
field returned cardinality but after increasing it to around 15 it caused 
the exception.
{noformat}
?xml version=1.0 encoding=UTF-8 ?
schema name=collection version=1.5

types
fieldType name=string class=solr.StrField sortMissingLast=true 
stored=false omitNorms=true/
fieldType name=string_dv class=solr.StrField sortMissingLast=true 
stored=false indexed=false docValues=true/
fieldType name=long class=solr.TrieLongField precisionStep=0 
omitNorms=true positionIncrementGap=0 stored=false/
/types

fields
field name=field type=string_dv   multiValued=true /
field name=field1   type=string stored=true /
field name=field2   type=string multiValued=true /
field name=versiontype=long   stored=true /
field name=colidtype=string  stored=true /
/fields
uniqueKeycolid/uniqueKey
/schema
{noformat}

Following is the method to index data.
{noformat}
public void index() throws SolrServerException, IOException {
CloudSolrClient s = new CloudSolrClient(ZKHOST:ZKPORT);
int count = 0;
s.setDefaultCollection(COLECTION);
ListSolrInputDocument documents = new ArrayList();
for (int i = 1; i = 100; i++) {
SolrInputDocument doc = new SolrInputDocument();
doc.addField(field1, i);
doc.addField(colid, val!+i+!-+ref+i);
doc.addField(field, DATA+(12345+i));
doc.addField(field2, DATA+(12345+i));
documents.add(doc);
if((documents.size() % 1) == 0){
count = count + 1;
s.add(documents);
System.out.println(System.currentTimeMillis() +  - Indexed 
document #  + NumberFormat.getInstance().format(count));
documents = new ArrayList();
}
}


System.out.println(Comitting.);
s.commit(true, true);

System.out.println(Optimizing.);
s.optimize(true, true, 1);
s.close();
System.out.println(Done.);
}
{noformat}

 Exception while using {!cardinality=1.0}.
 -

 Key: SOLR-7954
 URL: https://issues.apache.org/jira/browse/SOLR-7954
 Project: Solr
  Issue Type: Bug
Affects Versions: 5.2.1
 Environment: SolrCloud 4 node cluster.
 Ubuntu 12.04
 OS Type 64 bit
Reporter: Modassar Ather
Assignee: Hoss Man

 Following exception is thrown for the query : 
 bq. q=field1:*stats=truestats.field={!cardinality=1.0}field.
 The exception is not seen once the cardinality is set to 0.9 or less.
 The field is docValues enabled and indexed=false. The same exception I tried 
 to reproduce on non docValues field but could not.
 ERROR - 2015-08-11 12:24:00.222; [core] org.apache.solr.common.SolrException; 
 null:java.lang.ArrayIndexOutOfBoundsException: 3
 at 
 net.agkn.hll.serialization.BigEndianAscendingWordSerializer.writeWord(BigEndianAscendingWordSerializer.java:152)
 at net.agkn.hll.util.BitVector.getRegisterContents(BitVector.java:247)
 at net.agkn.hll.HLL.toBytes(HLL.java:917)
 at net.agkn.hll.HLL.toBytes(HLL.java:869)
 at 
 org.apache.solr.handler.component.AbstractStatsValues.getStatsValues(StatsValuesFactory.java:348)
 at 
 org.apache.solr.handler.component.StatsComponent.convertToResponse(StatsComponent.java:151)
 at 
 org.apache.solr.handler.component.StatsComponent.process(StatsComponent.java:62)
 at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:255)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
 at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
 at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
 at 
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
 at 
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
 at

[jira] [Commented] (SOLR-7954) ArrayIndexOutOfBoundsException from distributed HLL serialization logic when using using stats.field={!cardinality=1.0} in a distributed query

2015-08-24 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710599#comment-14710599
 ] 

Modassar Ather commented on SOLR-7954:
--

To add to the summary and description.

I changed the {noformat}doc.addField(colid, val!+i+!-+ref+i);{noformat} 
to {noformat}doc.addField(colid, val+i+!-+ref+i);{noformat}
The documents got distributed to all the nodes. I indexed 1 million documents 
and was able to reproduce the issue. All the shards had around 20 documents 
each.
Later I indexed 40 documents on which I could not reproduce it. All the 
shards had around 10 documents each.
There are 4 shards with no replica on my test environment.

 ArrayIndexOutOfBoundsException from distributed HLL serialization logic when 
 using using stats.field={!cardinality=1.0} in a distributed query
 --

 Key: SOLR-7954
 URL: https://issues.apache.org/jira/browse/SOLR-7954
 Project: Solr
  Issue Type: Bug
Affects Versions: 5.2.1
 Environment: SolrCloud 4 node cluster.
 Ubuntu 12.04
 OS Type 64 bit
Reporter: Modassar Ather
Assignee: Hoss Man
 Attachments: SOLR-7954.patch


 User reports indicate that using {{stats.field=\{!cardinality=1.0\}foo}} on a 
 field that has extremely high cardinality on a single shard (example: 150K 
 unique values) can lead to ArrayIndexOutOfBoundsException: 3 on the shard 
 during serialization of the HLL values.
 using cardinality=0.9 (or lower) doesn't produce the same symptoms, 
 suggesting the problem is specific to large log2m and regwidth values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-09-08 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734335#comment-14734335
 ] 

Modassar Ather commented on LUCENE-5205:


Hi [~talli...@mitre.org]

There is a document with following content in it which is indexed and stored.
{noformat}about 2% growth{noformat}

If following query is searched and the matched terms are highlighted then all 
the three terms of the document is highlighting.
Query: {noformat}"(growth* [term 2]) (about*)"~2{noformat}
Highlighted text : {noformat}about 2% 
growth{noformat}

I tried to debug and found that scorer.getFieldWeightedSpanTerms() has entries 
for all the terms in it at PositionSpan (0, 2).
Please help me understand 
* If it is an issue?
* Why "term" and "2" is present in the span terms although it should not match 
a document? 
* Why 2% is getting highlighted?

Regards,
Modassar

> SpanQueryParser with recursion, analysis and syntax very similar to classic 
> QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.
> Until this is added to the Lucene project, I've added a standalone 
> lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
>  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail:

[jira] [Commented] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-09-10 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740137#comment-14740137
 ] 

Modassar Ather commented on LUCENE-5205:


A fix for this issue will be a great help. Please provide your comments.

> SpanQueryParser with recursion, analysis and syntax very similar to classic 
> QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.
> Until this is added to the Lucene project, I've added a standalone 
> lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
>  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-09-10 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14738250#comment-14738250
 ] 

Modassar Ather commented on LUCENE-5205:


Please let me know if a new bug needs to be opened for this issue?

> SpanQueryParser with recursion, analysis and syntax very similar to classic 
> QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.
> Until this is added to the Lucene project, I've added a standalone 
> lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
>  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-09-14 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742980#comment-14742980
 ] 

Modassar Ather commented on LUCENE-5205:


Got the point. I think LUCENE-6796 will take care of this issue.

> SpanQueryParser with recursion, analysis and syntax very similar to classic 
> QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.
> Until this is added to the Lucene project, I've added a standalone 
> lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
>  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-10-05 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942956#comment-14942956
 ] 

Modassar Ather commented on LUCENE-5205:


Hi [~talli...@mitre.org]

The patch of this feature is failing to build with compilation error with 
Lucene/Solr-5.3.1.
I am trying to resolve it locally. public void fillBytesRef() method from 
TermToBytesRefAttribute.java has been removed.
Kindly look into it.

Thanks,
Modassar

> SpanQueryParser with recursion, analysis and syntax very similar to classic 
> QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.
> Until this is added to the Lucene project, I've added a standalone 
> lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
>  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-12-02 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037358#comment-15037358
 ] 

Modassar Ather commented on LUCENE-5205:


Hi [~talli...@mitre.org]

A prefix in double quotes is getting eaten up by the analyzer. E.g. f:"term*" 
is getting parsed to (+f:refasten). What I understand the wildcard queries are 
not analyzed.
I noticed that the same works fine within a phrase. E.g. f:"term1 term*". 
Please let me know if it is an issue or it should not used the way it is.

Thanks,
Modassar

> SpanQueryParser with recursion, analysis and syntax very similar to classic 
> QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.
> Until this is added to the Lucene project, I've added a standalone 
> lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
>  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Comment Edited] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-12-02 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037358#comment-15037358
 ] 

Modassar Ather edited comment on LUCENE-5205 at 12/3/15 6:41 AM:
-

Hi [~talli...@mitre.org]

A prefix in double quotes is getting eaten up by the analyzer. E.g. f:"term*" 
is getting parsed to (+f:term). What I understand the wildcard queries are not 
analyzed.
I noticed that the same works fine within a phrase. E.g. f:"term1 term*". 
Please let me know if it is an issue or it should not be used the way it is.

Thanks,
Modassar


was (Author: modassar):
Hi [~talli...@mitre.org]

A prefix in double quotes is getting eaten up by the analyzer. E.g. f:"term*" 
is getting parsed to (+f:refasten). What I understand the wildcard queries are 
not analyzed.
I noticed that the same works fine within a phrase. E.g. f:"term1 term*". 
Please let me know if it is an issue or it should not used the way it is.

Thanks,
Modassar

> SpanQueryParser with recursion, analysis and syntax very similar to classic 
> QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.
> Until this is added to the Lucene project, I've added a standalone 
> lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
>  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-12-02 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037358#comment-15037358
 ] 

Modassar Ather edited comment on LUCENE-5205 at 12/3/15 6:43 AM:
-

Hi [~talli...@mitre.org]

Wildard (*) in double quotes is getting eaten up by the analyzer. E.g. 
f:"term*" is getting parsed to (+f:term). What I understand the wildcard 
queries are not analyzed.
I noticed that the same works fine within a phrase. E.g. f:"term1 term*". 
Please let me know if it is an issue or it should not be used the way it is in 
the example i.e a single wildcard term within double quotes.

Thanks,
Modassar


was (Author: modassar):
Hi [~talli...@mitre.org]

A prefix in double quotes is getting eaten up by the analyzer. E.g. f:"term*" 
is getting parsed to (+f:term). What I understand the wildcard queries are not 
analyzed.
I noticed that the same works fine within a phrase. E.g. f:"term1 term*". 
Please let me know if it is an issue or it should not be used the way it is.

Thanks,
Modassar

> SpanQueryParser with recursion, analysis and syntax very similar to classic 
> QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.
> Until this is added to the Lucene project, I've added a standalone 
> lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
>  on [github|https://github.com/tballison/lucene-addons].



--

[jira] [Comment Edited] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-12-02 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037358#comment-15037358
 ] 

Modassar Ather edited comment on LUCENE-5205 at 12/3/15 6:44 AM:
-

Hi [~talli...@mitre.org]

Wildard * in double quotes is getting eaten up by the analyzer. E.g. f:"term*" 
is getting parsed to (+f:term). What I understand the wildcard queries are not 
analyzed.
I noticed that the same works fine within a phrase. E.g. f:"term1 term*". 
Please let me know if it is an issue or it should not be used the way it is in 
the example i.e a single wildcard term within double quotes.

Thanks,
Modassar


was (Author: modassar):
Hi [~talli...@mitre.org]

Wildard (*) in double quotes is getting eaten up by the analyzer. E.g. 
f:"term*" is getting parsed to (+f:term). What I understand the wildcard 
queries are not analyzed.
I noticed that the same works fine within a phrase. E.g. f:"term1 term*". 
Please let me know if it is an issue or it should not be used the way it is in 
the example i.e a single wildcard term within double quotes.

Thanks,
Modassar

> SpanQueryParser with recursion, analysis and syntax very similar to classic 
> QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.
> Until this is added to the Lucene project, I've added a standalone 
> lucene-addons repo (with jars compiled for the latest stable build of Lucene)

[jira] [Commented] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-12-06 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044377#comment-15044377
 ] 

Modassar Ather commented on LUCENE-5205:


Hi [~talli...@mitre.org]

Kindly share any update on the fix if any.

Thanks,
Modassar

> SpanQueryParser with recursion, analysis and syntax very similar to classic 
> QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.
> Until this is added to the Lucene project, I've added a standalone 
> lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
>  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-12-03 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038040#comment-15038040
 ] 

Modassar Ather commented on LUCENE-5205:


Thanks. Currently I am using 5.2.1.

> SpanQueryParser with recursion, analysis and syntax very similar to classic 
> QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.
> Until this is added to the Lucene project, I've added a standalone 
> lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
>  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-12-03 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038045#comment-15038045
 ] 

Modassar Ather commented on LUCENE-5205:


Thanks for the quick response. Currently I am using 5.2.1 and will not be going 
back to older version.

> SpanQueryParser with recursion, analysis and syntax very similar to classic 
> QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.
> Until this is added to the Lucene project, I've added a standalone 
> lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
>  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-12-03 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038048#comment-15038048
 ] 

Modassar Ather commented on LUCENE-5205:


Thanks for the quick response. Currently I am using 5.2.1 and will not be going 
back to older version.

> SpanQueryParser with recursion, analysis and syntax very similar to classic 
> QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.
> Until this is added to the Lucene project, I've added a standalone 
> lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
>  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-12-03 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038100#comment-15038100
 ] 

Modassar Ather commented on LUCENE-5205:


Sorry for posting the same message more than one time. Some how the add button 
was not responding may be due to slow network which caused this.

> SpanQueryParser with recursion, analysis and syntax very similar to classic 
> QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.
> Until this is added to the Lucene project, I've added a standalone 
> lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
>  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-12-03 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038041#comment-15038041
 ] 

Modassar Ather commented on LUCENE-5205:


Thanks for the quick response. Currently I am using 5.2.1 and will not be going 
back on the version.

> SpanQueryParser with recursion, analysis and syntax very similar to classic 
> QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.
> Until this is added to the Lucene project, I've added a standalone 
> lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
>  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-12-03 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038043#comment-15038043
 ] 

Modassar Ather commented on LUCENE-5205:


Thanks for the quick response. Currently I am using 5.2.1 and will not be going 
back to older version.

> SpanQueryParser with recursion, analysis and syntax very similar to classic 
> QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.
> Until this is added to the Lucene project, I've added a standalone 
> lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
>  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Issue Comment Deleted] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-12-03 Thread Modassar Ather (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Modassar Ather updated LUCENE-5205:
---
Comment: was deleted

(was: Thanks for the quick response. Currently I am using 5.2.1 and will not be 
going back to older version.)

> SpanQueryParser with recursion, analysis and syntax very similar to classic 
> QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.
> Until this is added to the Lucene project, I've added a standalone 
> lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
>  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Issue Comment Deleted] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-12-03 Thread Modassar Ather (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Modassar Ather updated LUCENE-5205:
---
Comment: was deleted

(was: Thanks. Currently I am using 5.2.1.)

> SpanQueryParser with recursion, analysis and syntax very similar to classic 
> QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.
> Until this is added to the Lucene project, I've added a standalone 
> lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
>  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Issue Comment Deleted] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-12-03 Thread Modassar Ather (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Modassar Ather updated LUCENE-5205:
---
Comment: was deleted

(was: Thanks for the quick response. Currently I am using 5.2.1 and will not be 
going back to older version.)

> SpanQueryParser with recursion, analysis and syntax very similar to classic 
> QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.
> Until this is added to the Lucene project, I've added a standalone 
> lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
>  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-12-03 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038049#comment-15038049
 ] 

Modassar Ather commented on LUCENE-5205:


Thanks for the quick response. Currently I am using 5.2.1 and will not be going 
back to older version.

> SpanQueryParser with recursion, analysis and syntax very similar to classic 
> QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.
> Until this is added to the Lucene project, I've added a standalone 
> lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
>  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-12-03 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038046#comment-15038046
 ] 

Modassar Ather commented on LUCENE-5205:


Thanks for the quick response. Currently I am using 5.2.1 and will not be going 
back to older version.

> SpanQueryParser with recursion, analysis and syntax very similar to classic 
> QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.
> Until this is added to the Lucene project, I've added a standalone 
> lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
>  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-12-03 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038042#comment-15038042
 ] 

Modassar Ather commented on LUCENE-5205:


Thanks for the quick response. Currently I am using 5.2.1 and will not be going 
back to older version.

> SpanQueryParser with recursion, analysis and syntax very similar to classic 
> QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.
> Until this is added to the Lucene project, I've added a standalone 
> lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
>  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Issue Comment Deleted] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-12-03 Thread Modassar Ather (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Modassar Ather updated LUCENE-5205:
---
Comment: was deleted

(was: Thanks for the quick response. Currently I am using 5.2.1 and will not be 
going back to older version.)

> SpanQueryParser with recursion, analysis and syntax very similar to classic 
> QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.
> Until this is added to the Lucene project, I've added a standalone 
> lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
>  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Issue Comment Deleted] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-12-03 Thread Modassar Ather (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Modassar Ather updated LUCENE-5205:
---
Comment: was deleted

(was: Thanks for the quick response. Currently I am using 5.2.1 and will not be 
going back to older version.)

> SpanQueryParser with recursion, analysis and syntax very similar to classic 
> QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.
> Until this is added to the Lucene project, I've added a standalone 
> lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
>  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Issue Comment Deleted] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-12-03 Thread Modassar Ather (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Modassar Ather updated LUCENE-5205:
---
Comment: was deleted

(was: Thanks for the quick response. Currently I am using 5.2.1 and will not be 
going back on the version.)

> SpanQueryParser with recursion, analysis and syntax very similar to classic 
> QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.
> Until this is added to the Lucene project, I've added a standalone 
> lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
>  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Issue Comment Deleted] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-12-03 Thread Modassar Ather (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Modassar Ather updated LUCENE-5205:
---
Comment: was deleted

(was: Thanks for the quick response. Currently I am using 5.2.1 and will not be 
going back to older version.)

> SpanQueryParser with recursion, analysis and syntax very similar to classic 
> QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.
> Until this is added to the Lucene project, I've added a standalone 
> lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
>  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-12-03 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038025#comment-15038025
 ] 

Modassar Ather commented on LUCENE-5205:


Thanks for your response. Just thought of checking if you are planning to add 
the fix to previous Solr/Lucene version too. I think a patch will be helpful.

> SpanQueryParser with recursion, analysis and syntax very similar to classic 
> QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.
> Until this is added to the Lucene project, I've added a standalone 
> lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
>  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-7954) ArrayIndexOutOfBoundsException from distributed HLL serialization logic when using using stats.field={!cardinality=1.0} in a distributed query

2016-01-07 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15087105#comment-15087105
 ] 

Modassar Ather commented on SOLR-7954:
--

{noformat}q=fl1:net*=fl=50=true={!cardinality=1.0}fl{noformat}
  
Above query is returning cardinality around 15 million. It is taking around 4 
minutes. Similar response time is seen with different queries which yields high 
cardinality. Kindly note that the cardinality=1.0 is the desired goal.
Here in the above example the fl1 is a text field whereas fl is a docValue 
enabled non-stroed, non-indexed field.
Kindly let me know if such response time is expected or I am missing something 
about this feature in my query.

> ArrayIndexOutOfBoundsException from distributed HLL serialization logic when 
> using using stats.field={!cardinality=1.0} in a distributed query
> --
>
> Key: SOLR-7954
> URL: https://issues.apache.org/jira/browse/SOLR-7954
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.2.1
> Environment: SolrCloud 4 node cluster.
> Ubuntu 12.04
> OS Type 64 bit
>Reporter: Modassar Ather
>Assignee: Hoss Man
> Fix For: 5.4, Trunk
>
> Attachments: SOLR-7954.patch, SOLR-7954.patch, SOLR-7954.patch
>
>
> User reports indicate that using {{stats.field=\{!cardinality=1.0\}foo}} on a 
> field that has extremely high cardinality on a single shard (example: 150K 
> unique values) can lead to "ArrayIndexOutOfBoundsException: 3" on the shard 
> during serialization of the HLL values.
> using "cardinality=0.9" (or lower) doesn't produce the same symptoms, 
> suggesting the problem is specific to large log2m and regwidth values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Comment Edited] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-12-01 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033625#comment-15033625
 ] 

Modassar Ather edited comment on LUCENE-5205 at 12/1/15 12:43 PM:
--

Hi [~talli...@mitre.org]

Kindly let me know how to escape special character in SpanQueryParser. From the 
discussion above it seems that single quote is used. I thinks I am missing some 
understanding about it as normally \ is used to escape special character.
e.g "understanding (span query)"  
To escape brackets in above query what is the right character. I want to match 
the phrase where brackets are not considered as special character.

Thanks,
Modassar


was (Author: modassar):
Hi [~talli...@mitre.org]

Kindly let me know how to escape special character in SpanQueryParser. From the 
discussion above it seems that single quote is used. I thinks I am missing some 
understanding about it as normally \ is used to escape special character.
e.g "understanding (span query)"  
To escape brackets in above query what is the right character. I want a match 
the phrase where brackets are not considered as special character.

Thanks,
Modassar

> SpanQueryParser with recursion, analysis and syntax very similar to classic 
> QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.
> Until this is added to the Lucene project, I've added a standalone 
> lucene-addons

[jira] [Commented] (LUCENE-5205) SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2015-12-01 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033625#comment-15033625
 ] 

Modassar Ather commented on LUCENE-5205:


Hi [~talli...@mitre.org]

Kindly let me know how to escape special character in SpanQueryParser. From the 
discussion above it seems that single quote is used. I thinks I am missing some 
understanding about it as normally \ is used to escape special character.
e.g "understanding (span query)"  
To escape brackets in above query what is the right character. I want a match 
the phrase where brackets are not considered as special character.

Thanks,
Modassar

> SpanQueryParser with recursion, analysis and syntax very similar to classic 
> QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specifiy Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.
> Until this is added to the Lucene project, I've added a standalone 
> lucene-addons repo (with jars compiled for the latest stable build of Lucene) 
>  on [github|https://github.com/tballison/lucene-addons].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-6796) Some terms incorrectly highlighted in complex SpanQuery

2015-11-25 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15028171#comment-15028171
 ] 

Modassar Ather commented on LUCENE-6796:


Hi David,

Please let me know your plan for the fix of this issue.

Thanks,
Modassar

> Some terms incorrectly highlighted in complex SpanQuery
> ---
>
> Key: LUCENE-6796
> URL: https://issues.apache.org/jira/browse/LUCENE-6796
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/highlighter
>Affects Versions: 5.3
>Reporter: Tim Allison
>Assignee: David Smiley
>Priority: Trivial
> Attachments: LUCENE-6796-testcase.patch
>
>
> [~modassar] initially raised this on LUCENE-5205.  I'm opening this as a 
> separate issue.
> If a SpanNear is within a SpanOr, it looks like the child terms within the 
> SpanNear query are getting highlighted even if there is no match on that 
> SpanNear query...in some special cases.  Specifically, in the format of the 
> parser in LUCENE-5205 {{"(b [c z]) d\"~2"}}, which is equivalent to: find "b" 
> or the phrase "c z" within two words of "d" either direction
> This affects trunk. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8853) edismax - Ignores OR in between terms.

2016-03-18 Thread Modassar Ather (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200980#comment-15200980
 ] 

Modassar Ather commented on SOLR-8853:
--

Please help me understand how to achieve (A OR B) where any of the term or both 
of the term will cause a match.

> edismax - Ignores OR in between terms.
> --
>
> Key: SOLR-8853
> URL: https://issues.apache.org/jira/browse/SOLR-8853
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Affects Versions: 5.5
>Reporter: Modassar Ather
>
> edismax parser is ignoring OR in between terms. The default operator is set 
> to AND. No value for mm is set in solr configuration.
> E.g. {noformat}fl:(java OR book){noformat} is parsed to {noformat}"+((fl:java 
> fl:book)~2)"{noformat}
> When the query has explicit OR then why the ~2 (I think due to mm being set 
> to 100%) is present in the parsed query?
> How can I achieve following?
> "+((fl:java fl:book))"
> Please help in understanding if it is intentional or is there a way to 
> achieve the desired search or is it a bug?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Created] (SOLR-8853) edismax - Ignores OR in between terms.

2016-03-15 Thread Modassar Ather (JIRA)

Modassar Ather created SOLR-8853:


 Summary: edismax - Ignores OR in between terms.
 Key: SOLR-8853
 URL: https://issues.apache.org/jira/browse/SOLR-8853
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 5.5
Reporter: Modassar Ather


edismax parser is ignoring OR in between terms. The default operator is set to 
AND. No value for mm is set in solr configuration.
E.g. {noformat}fl:(java OR book){noformat} is parsed to {noformat}"+((fl:java 
fl:book)~2)"{noformat}
When the query has explicit OR then why the ~2 (I think due to mm being set to 
100%) is present in the parsed query?

How can I achieve following?
"+((fl:java fl:book))"

Please help in understanding if it is intentional or is there a way to achieve 
the desired search or is it a bug?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

67 matches

Mail list logo