[jira] [Commented] (LUCENE-7202) Come up with a comprehensive proposal for naming spatial modules and technologies
[ https://issues.apache.org/jira/browse/LUCENE-7202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239406#comment-15239406 ] Jack Krupansky commented on LUCENE-7202:

Morton seems like more of a codec-level issue than an API - you still have k dimensions of coordinates, but they are simply encoded into a single number for each k-dimensional point. Maybe the implementation name finds its way into the API, but the first issue should be what is logically being modeled - what kind of points: lat-lon, geospatial, or what. Presumably any kind of k-dimensional space can be Morton-encoded.

XYZ? That's fine for math-style axes, for things like 3-D CAD models and 3-D printing, but seems inappropriate for a coordinate system intended to model points on the surface of a sphere, like the locations of places around the globe. To me, "Geo" seems to be an accepted reference to modeling "geographical" locations on the globe/planet.

How you model things like the location of a satellite or the space station is another matter. Geosynchronous satellites simply have an elevation/altitude above a surface point. Non-geosynchronous satellites have an orbit rather than a location per se, although we can speak of their location (surface plus elevation/altitude) at any given/specified moment in time. Ditto for aircraft, which have a flight path and only a momentary location at some altitude (although a helicopter can maintain a location for a longer moment.)

Besides geospatial surface points and 3-D CAD-style modeling, which real-world use cases are these modules intended to cover? IOW, how should real-world users relate to them and choose from them?
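As a concrete illustration of the encoding being discussed (a plain-Java sketch, not Lucene code), a 2-D Morton/Z-order code interleaves the bits of the two coordinates into one number; the same trick generalizes to k dimensions:

```java
// Interleave the bits of x and y (bit i of x -> bit 2i, bit i of y ->
// bit 2i+1) to produce a single Z-order (Morton) code.
public class Morton2D {
    // spread the low 32 bits of v so they occupy the even bit positions
    private static long spread(long v) {
        v &= 0xFFFFFFFFL;
        v = (v | (v << 16)) & 0x0000FFFF0000FFFFL;
        v = (v | (v << 8))  & 0x00FF00FF00FF00FFL;
        v = (v | (v << 4))  & 0x0F0F0F0F0F0F0F0FL;
        v = (v | (v << 2))  & 0x3333333333333333L;
        v = (v | (v << 1))  & 0x5555555555555555L;
        return v;
    }

    public static long encode(int x, int y) {
        return spread(x) | (spread(y) << 1);
    }
}
```

Whatever naming is chosen, this makes the point above concrete: the k coordinates survive intact, they are just packed into one sortable number, so the encoding is an index-format detail rather than something the API's naming needs to expose.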
> Come up with a comprehensive proposal for naming spatial modules and > technologies > - > > Key: LUCENE-7202 > URL: https://issues.apache.org/jira/browse/LUCENE-7202 > Project: Lucene - Core > Issue Type: Task > Components: modules/sandbox, modules/spatial, modules/spatial3d >Affects Versions: master >Reporter: Karl Wright > > There are three different spatial implementations circulating at the moment, > and nobody seems happy with the naming of them. For each implementation > strategy, we need both a module name and a descriptive technology name that > we can use to distinguish one from the other. I would expect the following > people to have an interest in this process: [~rcmuir], [~dsmiley], > [~mikemccand], etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8396) Investigate PointField to replace NumericField types
[ https://issues.apache.org/jira/browse/SOLR-8396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212559#comment-15212559 ] Jack Krupansky commented on SOLR-8396: -- My apologies as I am still only very slowly coming up to speed on this "New Math For Lucene" stuff. It feels like there are three distinct issues in play:
1. Desire to use the latest and greatest Lucene numeric field types. Granted, they are now called IntPoint, FloatPoint, DoublePoint, etc., but functionally they are still simply int, float, and double values - no semantic difference, just the class names and then some method name changes for indexing (?) and query. My feeling is that we should preserve the legacy type names even if Lucene insists on calling them "points." Keep user schema files unchanged.
2. Desire to work with existing data - and existing schema files. Mixed metaphors: cans of worms and nested Russian dolls.
3. Desire to auto-upgrade existing Solr index data to new "points" for better performance, reduced storage, reduced memory, reduced heap.
Some points:
1. Personally, I think it would be worth the effort to see if the Lucene guys can stick to the old names for IntField, et al., even if the implementation is different under the hood.
2. Maybe there will be a need to be able to open an existing numeric field, discover that it is a legacy numeric field (trie), and then under the hood use some wrapper to maintain the new API for the old format. IOW, switch Solr to using the new API, even for legacy numeric fields.
3. Seems like there is some need to investigate the possibility of a NumericFieldUpgrader to rewrite a trie field as a point field. Seems like a necessary job for the Lucene guys for existing Lucene indexes, even if Solr wasn't in the picture.
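For readers also coming up to speed: the underlying format change is that point fields index numerics as fixed-width sortable bytes rather than trie-encoded precision-step terms. A rough plain-Java sketch of the idea (not Lucene's actual code):

```java
// Sketch: point fields index an int as fixed-width, big-endian,
// sign-flipped bytes so that unsigned lexicographic byte order matches
// numeric order - which is what the BKD tree relies on for range queries.
public class SortableBytes {
    public static byte[] encodeInt(int v) {
        int s = v ^ 0x80000000; // flip the sign bit
        return new byte[] {
            (byte) (s >>> 24), (byte) (s >>> 16),
            (byte) (s >>> 8),  (byte) s
        };
    }

    // unsigned lexicographic comparison of two encoded values
    public static int compare(byte[] a, byte[] b) {
        for (int i = 0; i < 4; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return 0;
    }
}
```

That difference in on-disk representation is why a wrapper or upgrader would be needed: a trie field and a point field holding the same int values are not byte-compatible.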
> Investigate PointField to replace NumericField types > > > Key: SOLR-8396 > URL: https://issues.apache.org/jira/browse/SOLR-8396 > Project: Solr > Issue Type: Improvement >Reporter: Ishan Chattopadhyaya > Attachments: SOLR-8396.patch, SOLR-8396.patch > > > In LUCENE-6917, [~mikemccand] mentioned that DimensionalValues are better > than NumericFields in most respects. We should explore the benefits of using > it in Solr and hence, if appropriate, switch over to using them.
[jira] [Commented] (SOLR-8176) Model distributed graph traversals with Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208763#comment-15208763 ] Jack Krupansky commented on SOLR-8176: -- To what extent can the graph traversal be parallelized for the data on a single node? The eternal question with Solr is how much data you can put on a node before you need to shard, or how big each shard can be. I'm curious how graph traversal affects that calculation. Also, how merge policy and segment size should be configured so that segments can be traversed in parallel. If there were some more ideal way to organize the nodes in segments, maybe people could pack a lot more data on fat nodes to reduce the inter-node delays. Alternatively, maybe having more nodes means more of the operations can be done in parallel without conflicting on local machine resources. Interesting tradeoffs. > Model distributed graph traversals with Streaming Expressions > - > > Key: SOLR-8176 > URL: https://issues.apache.org/jira/browse/SOLR-8176 > Project: Solr > Issue Type: New Feature > Components: clients - java, SolrCloud, SolrJ >Affects Versions: master >Reporter: Joel Bernstein > Labels: Graph > Fix For: master > > > I think it would be useful to model a few *distributed graph traversal* use > cases with Solr's *Streaming Expression* language. This ticket will explore > different approaches with a goal of implementing two or three common graph > traversal use cases.
[jira] [Commented] (SOLR-8844) Date math silently ignored for date strings
[ https://issues.apache.org/jira/browse/SOLR-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193332#comment-15193332 ] Jack Krupansky commented on SOLR-8844: -- 1. No field is specified for the fq parameter here. What is df? 2. Do any matching date/time values occur as literal strings in the default search field? > Date math silently ignored for date strings > --- > > Key: SOLR-8844 > URL: https://issues.apache.org/jira/browse/SOLR-8844 > Project: Solr > Issue Type: Bug >Affects Versions: 5.5 >Reporter: Markus Jelsma >Priority: Minor > Fix For: 6.1 > > > Consider the following query, ordered by date ascending: {code} > http://localhost:8983/solr/logs/select?q=*:*&fq=[2011-05-26T08:15:36Z%2B3DAY%20TO%20NOW/DAY]&sort=time%20asc > {code} > Should not have a result set where the first entry has > 2011-05-26T08:15:36Z for the time field. > It appears date math is just ignored, while i would expect it to work or > throw an error.
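The likely fix on the user side is to attach the range to the date field explicitly. A sketch (the `time` field and `logs` collection names come from the issue; the helper itself is hypothetical) of building the field-qualified filter query with proper URL encoding:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class DateMathQuery {
    // Attach the field name so the range (and its date math) applies to
    // the "time" field rather than being parsed against the default field.
    public static String buildFq() {
        return "time:[2011-05-26T08:15:36Z+3DAY TO NOW/DAY]";
    }

    public static String buildUrl() {
        // URL-encode so '+' in the date math survives as %2B instead of
        // being decoded as a space by the servlet container.
        String fq = URLEncoder.encode(buildFq(), StandardCharsets.UTF_8);
        return "http://localhost:8983/solr/logs/select?q=*:*&fq=" + fq
                + "&sort=time%20asc";
    }
}
```

Without the `time:` prefix, the bracketed expression is evaluated against df, which would explain date math appearing to be silently ignored.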
[jira] [Commented] (SOLR-8831) allow _version_ field to be retrievable via docValues
[ https://issues.apache.org/jira/browse/SOLR-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15191504#comment-15191504 ] Jack Krupansky commented on SOLR-8831: -- Now that docValues is supported for _version_, the question arises as to which is preferred (faster, less memory), stored or docValues. IOW, which should be the default. I presume it should be docValues, but I have no real clue. Also, the doc for Atomic Update has this example as a Power Tip, that has BOTH stored and docValues set: {code} {code} See: https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents Should that be changed to stored="false"? Or, is there actually some additional hidden benefit to stored="true" AND docValues="true"? > allow _version_ field to be retrievable via docValues > - > > Key: SOLR-8831 > URL: https://issues.apache.org/jira/browse/SOLR-8831 > Project: Solr > Issue Type: Improvement >Reporter: Yonik Seeley > Fix For: 6.0 > > Attachments: SOLR-8831.patch > > > Right now, one is prohibited from having an unstored _version_ field, even if > docValues are enabled.
[jira] [Comment Edited] (SOLR-8831) allow _version_ field to be unstored
[ https://issues.apache.org/jira/browse/SOLR-8831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15191075#comment-15191075 ] Jack Krupansky edited comment on SOLR-8831 at 3/11/16 3:49 PM:

Can we come up with a nice clean term for "stored or docValues are enabled"? I mean, the issue title here is misleading, as the description then indicates - "if docValues are enabled." So, it should be "allow _version_ field to be unstored if docValues are enabled." Traditional database nomenclature is no help here since the concept of non-stored data is meaningless in a true database.

Personally, I'd be happier if Solr hid a lot of the byzantine complexity of Lucene, including this odd distinction between stored and docValues. I mean, to me they are just two different implementations of the logical concept of storing data for later retrieval - how the data is stored rather than whether it is stored.

I'll offer two suggested simple terms to be used at the Solr level even if Lucene insists on remaining byzantine: "xstored" or "retrievable", both meaning that the field attributes make it possible for Solr to retrieve data after indexing, either because the field is stored or has docValues enabled. This is not a proposal for a feature, but simply terminology to be used to talk about fields which are... "either stored or have docValues enabled." (If I wanted a feature, it might be to have a new attribute like retrieval_storage="{by_field|by_document|none}" or... stored="{yes|no|docValues|fieldValues}".) I'm not proposing any feature here since that would be out of the scope of the issue, but since this issue needs doc, I am just proposing new terminology for that doc.

Again, to summarize more briefly, I am proposing that the terminology of "retrievable" be used to refer to fields that are either stored or have docValues enabled.

> allow _version_ field to be unstored > > > Key: SOLR-8831 > URL: https://issues.apache.org/jira/browse/SOLR-8831 > Project: Solr > Issue Type: Improvement >Reporter: Yonik Seeley > Attachments: SOLR-8831.patch > > > Right now, one is prohibited from having an unstored _version_ field, even if > docValues are enabled.
[jira] [Commented] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND
[ https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15188158#comment-15188158 ] Jack Krupansky commented on SOLR-8812: -- The difference in the generated query appears to be the "))~2", which indicates a BooleanQuery with a minShouldMatch of 2, which means that both OR/SHOULD terms MUST match, effectively turning SHOULD/OR into MUST/AND. I'm guessing it was this 5.5 change: SOLR-2649: {code} * SOLR-2649: MM ignored in edismax queries with operators. (Greg Pendlebury, Jan Høydahl et. al. via Erick Erickson) {code} I think q.op=AND simply sets MM=100%, effectively overriding the explicit OR. > ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND > > > Key: SOLR-8812 > URL: https://issues.apache.org/jira/browse/SOLR-8812 > Project: Solr > Issue Type: Bug > Components: query parsers >Affects Versions: 5.5 >Reporter: Ryan Steinberg > > The edismax parser ignores Boolean OR in queries when q.op=AND. This behavior > is new to Solr 5.5.0 and an unexpected major change. > Example: > "q": "id:12345 OR zz", > "defType": "edismax", > "q.op": "AND", > where "12345" is a known document ID and "zz" is a string NOT present > in my data > Version 5.5.0 produces zero results: > "rawquerystring": "id:12345 OR zz", > "querystring": "id:12345 OR zz", > "parsedquery": "(+((id:12345 > DisjunctionMaxQuery((text:zz)))~2))/no_coord", > "parsedquery_toString": "+((id:12345 (text:zz))~2)", > "explain": {}, > "QParser": "ExtendedDismaxQParser" > Version 5.4.0 produces one result as expected > "rawquerystring": "id:12345 OR zz", > "querystring": "id:12345 OR zz", > "parsedquery": "(+(id:12345 > DisjunctionMaxQuery((text:zz))))/no_coord", > "parsedquery_toString": "+(id:12345 (text:zz))" > "explain": {}, > "QParser": "ExtendedDismaxQParser"
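A toy model (plain Java, not Lucene code) of why the "~2" makes the OR behave like an AND - a BooleanQuery matches only when at least minimumNumberShouldMatch of its SHOULD clauses hit:

```java
// Toy model of BooleanQuery's minimumNumberShouldMatch semantics:
// a document matches when the number of SHOULD clauses it satisfies
// is at least the minShouldMatch ("~N") value.
public class MinShouldMatch {
    public static boolean matches(boolean[] shouldClauseHits, int minShouldMatch) {
        int hits = 0;
        for (boolean hit : shouldClauseHits) {
            if (hit) hits++;
        }
        return hits >= minShouldMatch;
    }
}
```

In the reported query there are two SHOULD clauses, id:12345 and text:zz; with mm forced to 100% (i.e. minShouldMatch=2) a document matching only id:12345 no longer qualifies, which is exactly the zero-result behavior observed in 5.5.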
[jira] [Commented] (SOLR-8740) use docValues by default
[ https://issues.apache.org/jira/browse/SOLR-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185336#comment-15185336 ] Jack Krupansky commented on SOLR-8740: -- My apologies for any unnecessary noise I may have caused here. I just think that every single docValues issue raised for Solr should endeavor to make the lives of Solr users a lot easier, not more complicated and even more confusing. As things stand, docValues is more of an expert-only feature. The mere fact that we can't make docValues uniformly the default illustrates that in spades. > use docValues by default > > > Key: SOLR-8740 > URL: https://issues.apache.org/jira/browse/SOLR-8740 > Project: Solr > Issue Type: Improvement >Affects Versions: master >Reporter: Yonik Seeley > Fix For: master > > > We should consider switching to docValues for most of our non-text fields. > This may be a better default since it is more NRT friendly and acts to avoid > OOM errors due to large field cache or UnInvertedField entries.
[jira] [Commented] (SOLR-8740) use docValues by default
[ https://issues.apache.org/jira/browse/SOLR-8740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183177#comment-15183177 ] Jack Krupansky commented on SOLR-8740: -- And default to docValuesFormat="Memory" as well, or is that already the default when docValues="true" is set? Personally, I still find the whole docValues vs. Stored fields narrative extremely confusing. I've never been able to figure out why Lucene still needs Stored fields (other than for tokenized text fields) if docValues is so much better. In any case, with this Jira in place, there should be clear doc as to what scenarios, if any, stored="true" might have any utility for non-tokenized/text fields. > use docValues by default > > > Key: SOLR-8740 > URL: https://issues.apache.org/jira/browse/SOLR-8740 > Project: Solr > Issue Type: Improvement >Affects Versions: master >Reporter: Yonik Seeley > Fix For: master > > > We should consider switching to docValues for most of our non-text fields. > This may be a better default since it is more NRT friendly and acts to avoid > OOM errors due to large field cache or UnInvertedField entries.
[jira] [Commented] (SOLR-3744) Solr LuceneQParser only handles pure negative queries at the top-level query, but not within parenthesized sub-queries
[ https://issues.apache.org/jira/browse/SOLR-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176543#comment-15176543 ] Jack Krupansky commented on SOLR-3744: -- Personally, I think the proper fix is in Lucene BooleanQuery itself - if no positive clauses are present, a MatchAllDocsQuery should be added as a MUST clause. For example, currently if you have only one clause and it is MUST_NOT, BQ explicitly rewrites to MatchNoDocsQuery. See: https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java Any objection from the core Lucene committers? What does ES do when no positive clauses are present in a subquery? > Solr LuceneQParser only handles pure negative queries at the top-level query, > but not within parenthesized sub-queries > -- > > Key: SOLR-3744 > URL: https://issues.apache.org/jira/browse/SOLR-3744 > Project: Solr > Issue Type: Bug > Components: query parsers >Affects Versions: 3.6.1, 4.0-BETA >Reporter: Jack Krupansky > > The SolrQuerySyntax wiki says that pure negative queries are supported ("Pure > negative queries (all clauses prohibited) are allowed"), which is true at the > top-level query, but not for sub-queries enclosed within parentheses. > See: > http://wiki.apache.org/solr/SolrQuerySyntax > Some queries that will not evaluate properly: > test AND (-fox) > test (-fox) > test OR (abc OR (-fox)) > test (-fox) > Sub-queries combined with the "AND" and "OR" keyword operators also fail to > evaluate properly. For example, > test OR -fox > -fox OR test > Note that all of these queries are supported properly by the edismax query > parser.
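A sketch of the proposed rewrite's effect (plain Java over doc-id sets, not actual Lucene code): adding an implicit match-all MUST clause turns a pure-negative Boolean query into "all docs minus the excluded ones":

```java
import java.util.HashSet;
import java.util.Set;

// Toy evaluator: a pure-negative query over doc ids, after the proposed
// rewrite of adding MatchAllDocsQuery as a MUST clause alongside the
// original MUST_NOT clause.
public class PureNegative {
    public static Set<Integer> evaluate(Set<Integer> allDocs, Set<Integer> prohibited) {
        Set<Integer> result = new HashSet<>(allDocs); // implicit match-all MUST
        result.removeAll(prohibited);                 // apply MUST_NOT
        return result;
    }
}
```

Without the implicit MUST clause there is nothing for the MUST_NOT to subtract from, which is why a parenthesized (-fox) currently matches nothing.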
[jira] [Commented] (SOLR-3744) Solr LuceneQParser only handles pure negative queries at the top-level query, but not within parenthesized sub-queries
[ https://issues.apache.org/jira/browse/SOLR-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176525#comment-15176525 ] Jack Krupansky commented on SOLR-3744: -- Long ago... I'll try to remember. My vague recollection is that Solr simply fixed the top-level and inherited the nested behavior from Lucene, but now that Solr has its own copy of the basic query parser it should be fixable in the base query parser. But... I don't recall where I had tracked that down to. > Solr LuceneQParser only handles pure negative queries at the top-level query, > but not within parenthesized sub-queries > -- > > Key: SOLR-3744 > URL: https://issues.apache.org/jira/browse/SOLR-3744 > Project: Solr > Issue Type: Bug > Components: query parsers >Affects Versions: 3.6.1, 4.0-BETA >Reporter: Jack Krupansky > > The SolrQuerySyntax wiki says that pure negative queries are supported ("Pure > negative queries (all clauses prohibited) are allowed"), which is true at the > top-level query, but not for sub-queries enclosed within parentheses. > See: > http://wiki.apache.org/solr/SolrQuerySyntax > Some queries that will not evaluate properly: > test AND (-fox) > test (-fox) > test OR (abc OR (-fox)) > test (-fox) > Sub-queries combined with the "AND" and "OR" keyword operators also fail to > evaluate properly. For example, > test OR -fox > -fox OR test > Note that all of these queries are supported properly by the edismax query > parser.
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171096#comment-15171096 ] Jack Krupansky commented on SOLR-8110: -- bq. "safe"... "moderate"... "legacy" My only real nit is that it would be a shame if we couldn't say simply that people will be safe if they stick to Java identifier rules. That would mean $ and full Unicode. My point is that it makes learning Solr more intuitive since Java is more of a commonly-known entity - "Solr field names are Java identifiers", rather than encumber people with yet another set of rules to learn. Note that the current Solr code mostly uses isJavaIdentifierStart/isJavaIdentifierPart today, but disallowing $, probably due to parameter substitution. IOW, Unicode is there today. See: https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/StrParser.java https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/SolrReturnFields.java > Start enforcing field naming recomendations in next X.0 release? > > > Key: SOLR-8110 > URL: https://issues.apache.org/jira/browse/SOLR-8110 > Project: Solr > Issue Type: Improvement >Reporter: Hoss Man > Attachments: SOLR-8110.patch, SOLR-8110.patch > > > For a very long time now, Solr has made the following "recommendation" > regarding field naming conventions... > bq. field names should consist of alphanumeric or underscore characters only > and not start with a digit. This is not currently strictly enforced, but > other field names will not have first class support from all components and > back compatibility is not guaranteed. ... > I'm opening this issue to track discussion about if/how we should start > enforcing this as a rule instead (instead of just a "recommendation") in our > next/future X.0 (ie: major) release. 
> The goals of doing so being: > * simplify some existing code/apis that currently use hueristics to deal with > lists of field and produce strange errors when the huerstic fails (example: > ReturnFields.add) > * reduce confusion/pain for new users who might start out unaware of the > recommended conventions and then only later encountering a situation where > their field names are not supported by some feature and get frustrated > because they have to change their schema, reindex, update index/query client > expectations, etc...
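A sketch of what "Java identifier rules, minus $" validation could look like (a hypothetical helper for illustration, not existing Solr code, though StrParser and SolrReturnFields use the same Character methods):

```java
// Hypothetical validator: Java identifier rules (full Unicode letters),
// but disallowing '$' to avoid clashing with parameter substitution.
public class FieldName {
    public static boolean isValid(String name) {
        if (name == null || name.isEmpty()) return false;
        char first = name.charAt(0);
        if (first == '$' || !Character.isJavaIdentifierStart(first)) return false;
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '$' || !Character.isJavaIdentifierPart(c)) return false;
        }
        return true;
    }
}
```

Note this accepts Unicode letters (e.g. accented characters) for free, since Character.isJavaIdentifierStart/Part already implement the Java identifier rules; the sketch iterates chars and so ignores supplementary-plane code points for simplicity.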
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171092#comment-15171092 ] Jack Krupansky commented on SOLR-8110: -- bq. lucene expressions I was going to say that Lucene Expressions are basically JavaScript, but... they are sort-of based on JS, but really more of a conceptual rather than literal basis. Here's Lucene's grammar rule for VARIABLE: {code} VARIABLE: ID ARRAY* ( [.] ID ARRAY* )*; fragment ARRAY: [[] ( STRING | INTEGER ) [\]]; fragment ID: [_$a-zA-Z] [_$a-zA-Z0-9]*; fragment STRING : ['] ( '\\\'' | '\\\\' | ~[\\'] )*? ['] | ["] ( '\\"' | '\\\\' | ~[\\"] )*? ["] ; {code} See: https://github.com/apache/lucene-solr/blob/master/lucene/expressions/src/java/org/apache/lucene/expressions/js/Javascript.g4 No Unicode support, no random special characters, just $ and _, but apparently dot as well. An ID is: {code} ID: [_$a-zA-Z] [_$a-zA-Z0-9]* {code} And any number of IDs can be written with dots between them to represent a single VARIABLE token. JavaScript identifiers are defined in the ECMAScript spec: https://tc39.github.io/ecma262/#prod-IdentifierName Letters in Java/ECMAScript are Unicode as defined by the Unicode properties "ID_Start" and "ID_Continue". Java/ECMAScript support $ and _ in addition to letters. Identifier start and continue character types are defined by the Unicode UAX#31 Identifier spec: http://unicode.org/reports/tr31/ > Start enforcing field naming recomendations in next X.0 release? > > > Key: SOLR-8110 > URL: https://issues.apache.org/jira/browse/SOLR-8110 > Project: Solr > Issue Type: Improvement >Reporter: Hoss Man > Attachments: SOLR-8110.patch, SOLR-8110.patch > > > For a very long time now, Solr has made the following "recommendation" > regarding field naming conventions... > bq. field names should consist of alphanumeric or underscore characters only > and not start with a digit.
This is not currently strictly enforced, but > other field names will not have first class support from all components and > back compatibility is not guaranteed. ... > I'm opening this issue to track discussion about if/how we should start > enforcing this as a rule instead (instead of just a "recommendation") in our > next/future X.0 (ie: major) release. > The goals of doing so being: > * simplify some existing code/apis that currently use hueristics to deal with > lists of field and produce strange errors when the huerstic fails (example: > ReturnFields.add) > * reduce confusion/pain for new users who might start out unaware of the > recommended conventions and then only later encountering a situation where > their field names are not supported by some feature and get frustrated > because they have to change their schema, reindex, update index/query client > expectations, etc...
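The expressions ID rule quoted above can be checked with a plain Java regex (an illustration only; the actual lexer is ANTLR-generated from Javascript.g4):

```java
import java.util.regex.Pattern;

// The Lucene expressions ID rule: ASCII letters, digits, '_' and '$',
// not starting with a digit - notably no Unicode, unlike Java/ECMAScript.
public class ExpressionId {
    private static final Pattern ID = Pattern.compile("[_$a-zA-Z][_$a-zA-Z0-9]*");

    public static boolean isId(String s) {
        return ID.matcher(s).matches();
    }
}
```

This makes the practical gap concrete: a Unicode field name that is a perfectly legal Java/ECMAScript identifier would still be rejected by Lucene expressions.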
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170700#comment-15170700 ] Jack Krupansky commented on SOLR-8110: -- I can't recall any explicit statement on case sensitivity, although I would imagine that the existing "anything goes" model would default to case-sensitive. Personally, I would prefer case-insensitive. I can't recall a schema in which case-sensitive field names were used, while case mistakes are not uncommon. > Start enforcing field naming recomendations in next X.0 release? > > > Key: SOLR-8110 > URL: https://issues.apache.org/jira/browse/SOLR-8110 > Project: Solr > Issue Type: Improvement >Reporter: Hoss Man > Attachments: SOLR-8110.patch, SOLR-8110.patch > > > For a very long time now, Solr has made the following "recommendation" > regarding field naming conventions... > bq. field names should consist of alphanumeric or underscore characters only > and not start with a digit. This is not currently strictly enforced, but > other field names will not have first class support from all components and > back compatibility is not guaranteed. ... > I'm opening this issue to track discussion about if/how we should start > enforcing this as a rule instead (instead of just a "recommendation") in our > next/future X.0 (ie: major) release. > The goals of doing so being: > * simplify some existing code/apis that currently use hueristics to deal with > lists of field and produce strange errors when the huerstic fails (example: > ReturnFields.add) > * reduce confusion/pain for new users who might start out unaware of the > recommended conventions and then only later encountering a situation where > their field names are not supported by some feature and get frustrated > because they have to change their schema, reindex, update index/query client > expectations, etc...
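If case-insensitive names were adopted, the lookup side would be cheap; a minimal sketch (illustrative only, not Solr code) using the JDK's built-in case-insensitive comparator:

```java
import java.util.Map;
import java.util.TreeMap;

// Case-insensitive field lookup via String.CASE_INSENSITIVE_ORDER.
// Hypothetical sketch only; Solr's actual schema lookup differs.
public class CaseInsensitiveFields {
    public static void main(String[] args) {
        Map<String, String> fields = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        fields.put("createdAt", "date");
        // A query using the wrong case still finds the field.
        System.out.println(fields.get("createdat")); // date
        System.out.println(fields.get("CREATEDAT")); // date
    }
}
```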
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170698#comment-15170698 ] Jack Krupansky commented on SOLR-8110: -- The dollar sign is permitted in Java identifiers, including at the start. Per the Java Language Specification, "The "Java letters" include uppercase and lowercase ASCII Latin letters A-Z (\u0041-\u005a), and a-z (\u0061-\u007a), and, for historical reasons, the ASCII underscore (_, or \u005f) and dollar sign ($, or \u0024)." It goes on to say that "The $ character should be used only in mechanically generated source code or, rarely, to access pre-existing names on legacy systems." See: https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.8 If anything, I had been assuming that we were proposing a superset of Java identifiers (hyphen and dot as part of a name). I'm not positive whether the dollar sign might conflict with parameter substitution.
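The JLS rules cited above can be checked directly against the java.lang.Character helpers; a quick sketch (the class name is just for illustration):

```java
// Which characters count as Java identifier characters, per the JLS
// rules exposed by Character.isJavaIdentifierStart/Part.
public class IdentifierCharCheck {
    public static void main(String[] args) {
        // '$' and '_' are legal identifier starts, for historical reasons.
        System.out.println(Character.isJavaIdentifierStart('$')); // true
        System.out.println(Character.isJavaIdentifierStart('_')); // true
        // Digits may appear only after the first character.
        System.out.println(Character.isJavaIdentifierStart('7')); // false
        System.out.println(Character.isJavaIdentifierPart('7'));  // true
        // Hyphen and dot are not identifier characters at all.
        System.out.println(Character.isJavaIdentifierPart('-'));  // false
        System.out.println(Character.isJavaIdentifierPart('.'));  // false
    }
}
```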
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170155#comment-15170155 ] Jack Krupansky commented on SOLR-8110: -- I've accepted the fact that Solr will probably never need to support full infix expressions. If somebody wants to seriously propose full infix expressions, fine, but it seems like too much to worry about such vague possibilities now. Note that I am still a proponent of having quoted/escaped names which allow anything in a name, a la SQL.
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169830#comment-15169830 ] Jack Krupansky commented on SOLR-8110: -- Dot is a tough case. I can see reserving it for future expansion, but I can also see its utility as a pseudo-field delimiter, such as when data came from a SQL ETL operation that actually did use the dot in a compound field name. How about saying that dot is pseudo-reserved for compound field name references: if the decomposed field name has a well-defined meaning in some context (such as where there are contextual named structural entities like table or collection names), then so be it, but if it has no clear meaning in the context, then the full, dotted name is treated as a raw field name. So, at the level of the fl parameter, a dotted name would get parsed as a compound name and then treated as a simple field name.
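One way to read that "pseudo-reserved dot" proposal in code; everything here (class name, method, the entity-lookup step) is hypothetical illustration, not anything that exists in Solr:

```java
import java.util.Set;

// Hypothetical sketch: try to resolve a dotted name against known
// structural entities (e.g. collection names); if the prefix has no
// clear meaning, fall back to treating the whole dotted string as a
// raw field name.
public class DottedNameResolver {
    private final Set<String> knownEntities;

    public DottedNameResolver(Set<String> knownEntities) {
        this.knownEntities = knownEntities;
    }

    /** Returns the field-name portion to use for lookup. */
    public String resolve(String name) {
        int dot = name.indexOf('.');
        if (dot > 0 && knownEntities.contains(name.substring(0, dot))) {
            // Prefix has a well-defined meaning: treat as a compound name.
            return name.substring(dot + 1);
        }
        // No clear meaning in context: whole string is the field name.
        return name;
    }

    public static void main(String[] args) {
        DottedNameResolver r = new DottedNameResolver(Set.of("products"));
        System.out.println(r.resolve("products.price")); // price
        System.out.println(r.resolve("unit.price"));     // unit.price
    }
}
```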
[jira] [Commented] (SOLR-8713) New UI points to the wiki for Query Syntax instead of the Reference Guide
[ https://issues.apache.org/jira/browse/SOLR-8713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169703#comment-15169703 ] Jack Krupansky commented on SOLR-8713: -- Be careful, because Confluence has the working text of the NEXT release of Solr (6.0 right now), not the current release or even necessarily the release the admin UI is actually running. It would be nice if the Confluence doc were per-release in addition to the development version, but right now only the PDF is per-release, which is what the admin UI should point to. > New UI points to the wiki for Query Syntax instead of the Reference Guide > - > > Key: SOLR-8713 > URL: https://issues.apache.org/jira/browse/SOLR-8713 > Project: Solr > Issue Type: Bug > Components: UI >Affects Versions: master >Reporter: Tomás Fernández Löbbe >Priority: Trivial > Labels: newdev > > Old Admin UI points to > https://cwiki.apache.org/confluence/display/solr/Query+Syntax+and+Parsing but > the new one points to http://wiki.apache.org/solr/SolrQuerySyntax
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159113#comment-15159113 ] Jack Krupansky commented on SOLR-8110: -- 1. Since enforcement of naming conventions is a new concept, I would suggest making it optional in 6.x, preferably opt-out - most people can probably live with it without problems. Whether it would be a schema version trigger or a separate config/schema option can be debated. 2. Consider the concept of delimited identifiers as in SQL - enclose non-regular names in quotes. It is worth noting that highly irregular names are not supported in queries even today (most special characters will terminate the field name in most query parsers).
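A sketch of the SQL-style delimited-identifier idea mentioned in point 2 (pure illustration, not an existing Solr parser): double quotes delimit the name, and a doubled quote inside stands for a literal one.

```java
// Minimal SQL-style delimited-identifier reader: "a-b.c" -> a-b.c,
// with "" inside the quotes standing for a literal double quote.
// Hypothetical sketch; real query parsers would integrate this into
// their tokenizers.
public class DelimitedIdentifier {
    public static String unquote(String s) {
        if (s.length() < 2 || s.charAt(0) != '"' || s.charAt(s.length() - 1) != '"') {
            throw new IllegalArgumentException("not a delimited identifier: " + s);
        }
        return s.substring(1, s.length() - 1).replace("\"\"", "\"");
    }

    public static void main(String[] args) {
        System.out.println(unquote("\"a-b.c\""));          // a-b.c
        System.out.println(unquote("\"say \"\"hi\"\"\"")); // say "hi"
    }
}
```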
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156259#comment-15156259 ] Jack Krupansky commented on SOLR-8110: -- There is the issue of simple ASCII letters vs. Unicode letters. Java identifiers support arbitrary Unicode letters, which "allows programmers to use identifiers in their programs that are written in their native languages." See Character.isJavaIdentifierStart and Character.isJavaIdentifierPart.
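Validating a whole field name against those Java identifier rules is only a couple of lines; the method name here is mine, for illustration:

```java
// Validate a candidate field name using the Java identifier rules,
// which accept non-ASCII letters via Character.isJavaIdentifierStart/Part.
public class UnicodeNameCheck {
    static boolean isJavaStyleName(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        return s.chars().skip(1).allMatch(Character::isJavaIdentifierPart);
    }

    public static void main(String[] args) {
        System.out.println(isJavaStyleName("prix_café"));  // true: é is a letter
        System.out.println(isJavaStyleName("приложение")); // true: Cyrillic letters
        System.out.println(isJavaStyleName("first-name")); // false: hyphen
        System.out.println(isJavaStyleName("1st"));        // false: leading digit
    }
}
```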
[jira] [Commented] (SOLR-8621) solrconfig.xml: deprecate/replace <mergePolicy> with <mergePolicyFactory>
[ https://issues.apache.org/jira/browse/SOLR-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153085#comment-15153085 ] Jack Krupansky commented on SOLR-8621: -- Shouldn't the index config reference page still list <mergePolicy> but with a "Deprecated" notice? Ditto for <mergeFactor>. The Upgrading Solr ref page does give an example of how to migrate from MP to MPF (and for MF) - it would be nice to link to that from a deprecation notice on the index config page. See: https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig https://cwiki.apache.org/confluence/display/solr/Upgrading+Solr > solrconfig.xml: deprecate/replace <mergePolicy> with <mergePolicyFactory> > - > > Key: SOLR-8621 > URL: https://issues.apache.org/jira/browse/SOLR-8621 > Project: Solr > Issue Type: Task >Reporter: Christine Poerschke >Assignee: Christine Poerschke > Fix For: 5.5, master > > Attachments: SOLR-8621-example_contrib_configs.patch, > SOLR-8621-example_contrib_configs.patch, SOLR-8621.patch, > explicit-merge-auto-set.patch > > > *end-user benefits:* > * Lucene's UpgradeIndexMergePolicy can be configured in Solr > * Lucene's SortingMergePolicy can be configured in Solr (with SOLR-5730) > * customisability: arbitrary merge policies including wrapping/nested merge > policies can be created and configured > *roadmap:* > * solr 5.5 introduces <mergePolicyFactory> support > * solr 5.5 deprecates (but maintains) <mergePolicy> support > * SOLR-8668 in solr 6.0(\?) will remove <mergePolicy> support
[jira] [Comment Edited] (SOLR-7555) Display total space and available space in Admin
[ https://issues.apache.org/jira/browse/SOLR-7555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151147#comment-15151147 ] Jack Krupansky edited comment on SOLR-7555 at 2/17/16 8:52 PM: --- I recently noticed that quite a few of the Amazon EC2 instance types have two or more local SSD storage devices. Should Solr display "total space" across all available local devices or just for the storage device on which Solr appears to be configured? If the instance supports EBS only, I presume it would be the total for the EBS storage that the instance type supports. > Display total space and available space in Admin > > > Key: SOLR-7555 > URL: https://issues.apache.org/jira/browse/SOLR-7555 > Project: Solr > Issue Type: Improvement > Components: web gui >Affects Versions: 5.1 >Reporter: Eric Pugh >Assignee: Erik Hatcher >Priority: Minor > Fix For: 6.0 > > Attachments: DiskSpaceAwareDirectory.java, > SOLR-7555-display_disk_space.patch, SOLR-7555-display_disk_space_v2.patch, > SOLR-7555-display_disk_space_v3.patch, SOLR-7555-display_disk_space_v4.patch, > SOLR-7555-display_disk_space_v5.patch, SOLR-7555.patch, SOLR-7555.patch, > SOLR-7555.patch > > > Frequently I have access to the Solr Admin console, but not the underlying > server, and I'm curious how much space remains available. This little patch > exposes total Volume size as well as the usable space remaining: > !https://monosnap.com/file/VqlReekCFwpK6utI3lP18fbPqrGI4b.png! > I'm not sure if this is the best place to put this, as every shard will share > the same data, so maybe it should be on the top level Dashboard?
> Also not sure what to call the fields!
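For what it's worth, the per-device question maps directly onto the JDK API: java.io.File reports space for the partition holding a given path, so "which device" reduces to "which path you ask about". A minimal sketch (not the patch's actual code):

```java
import java.io.File;

// Report total and usable space for the partition containing a path,
// using the stock java.io.File calls a handler could expose per data dir.
public class DiskSpace {
    public static void main(String[] args) {
        File dir = new File(".");           // e.g. a core's data directory
        long total = dir.getTotalSpace();   // size of this partition only
        long usable = dir.getUsableSpace(); // space available to this JVM
        System.out.printf("total=%d usable=%d (%.1f%% free)%n",
                total, usable, 100.0 * usable / total);
    }
}
```

With multiple local SSDs, calling this once per mount point would give the per-device breakdown being discussed.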
[jira] [Commented] (SOLR-8110) Start enforcing field naming recomendations in next X.0 release?
[ https://issues.apache.org/jira/browse/SOLR-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151121#comment-15151121 ] Jack Krupansky commented on SOLR-8110: -- It would be nice to say that a "Solr identifier" had the same rules as a Java identifier, but Java allows dollar signs and excludes keywords and reserved terms like if, for, true, false, null. Hmmm... I don't know if many people would complain if Solr didn't allow those keywords as field names. The main exceptions to the current soft rule that I have run across are: 1. Dot for compound names. 2. Hyphen, which feels a little more natural than underscore unless you're truly thinking about Java code and imagining that you could write a minus sign for a subtraction operation. 3. An ISO date/time value for dynamic fields which want to be timestamped. An optional text keyword prefix and hyphen are common for these timestamped columns as well. 4. Spaces, but I think sensible people can accept that those are not permitted in names. The main difficulty I am aware of in Solr is the parsing of function queries, including (or especially) in the field list of the fl parameter.
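The current recommendation amounts to one regex (reading "alphanumeric" as ASCII here), and each exception case listed above fails it:

```java
import java.util.regex.Pattern;

// The recommended convention: alphanumeric or underscore characters
// only, not starting with a digit (ASCII reading).
public class RecommendedName {
    static final Pattern RECOMMENDED = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    public static void main(String[] args) {
        String[] names = {
            "price_usd",                 // follows the recommendation
            "table.field",               // dot: compound name
            "first-name",                // hyphen
            "log-2016-02-22T00:00:00Z",  // timestamped dynamic field
        };
        for (String n : names) {
            System.out.println(n + " -> " + RECOMMENDED.matcher(n).matches());
        }
    }
}
```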
[jira] [Commented] (SOLR-5730) make Lucene's SortingMergePolicy and EarlyTerminatingSortingCollector configurable in Solr
[ https://issues.apache.org/jira/browse/SOLR-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129131#comment-15129131 ] Jack Krupansky commented on SOLR-5730: -- Let me try again... again, my apologies for not commenting much earlier before things got a bit complicated. Let me see if I have this straight: 1. There are three related tickets: SOLR-4654, SOLR-5730, SOLR-8621. 2. There are three key features of interest: UpgradeIndexMergePolicy, SortingMergePolicy , and EarlyTerminatingSortingCollector. 3. The first ticket is kind of the umbrella. 4. The second ticket is focused on the second and third features. 5. The third ticket is the foundation for all three features. 6. The third ticket has some user impact and delivers some additional minor benefits, but enabling those other three features is its true purpose. 7. SortingMergePolicy and EarlyTerminatingSortingCollector are really two sides of a single feature, the index side and the query side of (in my words) "pre-sorted indexing". Now, I have only one remaining question area: Isn't the forceMerge method the only real benefit of UpgradeIndexMergePolicy? Is that purely for the Solr optimize option, or is there some intent to surface it for users some other way in Solr? Isn't it more of a one-time operation rather than something that should be in place for all merge operations? Or is it so cheap if not used that we should simply pre-configure it all the time? 
> make Lucene's SortingMergePolicy and EarlyTerminatingSortingCollector > configurable in Solr > -- > > Key: SOLR-5730 > URL: https://issues.apache.org/jira/browse/SOLR-5730 > Project: Solr > Issue Type: New Feature >Reporter: Christine Poerschke >Assignee: Christine Poerschke >Priority: Minor > Attachments: SOLR-5730-part1of2.patch, SOLR-5730-part1of2.patch, > SOLR-5730-part2of2.patch, SOLR-5730-part2of2.patch > > > *Example configuration (solrconfig.xml) - corresponding to latest attached > patch:* > {noformat} > > timestamp desc > > {noformat} > *Example configuration (solrconfig.xml) - corresponding to current > (work-in-progress master-solr-8621) SOLR-8621 efforts:* > {noformat} > - > + > + TieredMergePolicyFactory > + timestamp desc > + > {noformat} > *Example use (EarlyTerminatingSortingCollector):* > {noformat} > =timestamp+desc=true > {noformat}
[jira] [Commented] (SOLR-5730) make Lucene's SortingMergePolicy and EarlyTerminatingSortingCollector configurable in Solr
[ https://issues.apache.org/jira/browse/SOLR-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128303#comment-15128303 ] Jack Krupansky commented on SOLR-5730: -- Sorry for arriving so late to the party here, but I've gotten lost in all the back and forth... is there going to be a simple, easy-to-use XML element that lets the user simply enable sorted merging and specify a field list, as opposed to having to manually construct an elaborate Lucene-level set of wrapped merge policies? Sure, some experts will wish to fully configure every detail of a Lucene merge policy, but for non-expert users who just want to ensure that their index is pre-sorted to align with a query sort, the syntax should be... simple. If the user does construct some elaborate wrapped MP, then some sort of parameter substitution would be needed, but if the user uses the default solrconfig, which has no explicit MP, Solr should build that full, wrapped MP with just the sort field names substituted. In short, I just want to know whether this is intended to be a very easy-to-use feature (supposed to be the trademark of Solr) or some super-elaborate expert-only feature that we would be forced to recommend average users stay away from. Personally, my preference would be to focus on introducing a first-class Solr feature of a "preferred document order", which is effectively a composite primary key in database nomenclature. So, let's not forget that this is Solr we are talking about, not raw Lucene. I'd like to know that [~yo...@apache.org] and [~hossman] are explicitly on board with what is being proposed.
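For concreteness, a hedged sketch of what the factory-based wrapped-policy configuration under discussion might look like. The element, attribute, and class names follow the SOLR-8621 work-in-progress branch and are assumptions here; the final syntax may differ:

```xml
<!-- Hypothetical sketch only: names follow the SOLR-8621
     work-in-progress and may not match the final syntax. -->
<indexConfig>
  <mergePolicyFactory class="org.apache.solr.index.SortingMergePolicyFactory">
    <str name="sort">timestamp desc</str>
    <str name="wrapped.prefix">inner</str>
    <str name="inner.class">org.apache.solr.index.TieredMergePolicyFactory</str>
  </mergePolicyFactory>
</indexConfig>
```

If something like this became the "simple" form, the sort line alone would cover the non-expert case being argued for, with the wrapped inner policy defaulted.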
[jira] [Commented] (SOLR-8621) solrconfig.xml: deprecate/replace <mergePolicy> with <mergePolicyFactory>
[ https://issues.apache.org/jira/browse/SOLR-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127090#comment-15127090 ] Jack Krupansky commented on SOLR-8621: -- Will both of the existing elements be deprecated as well (in addition to being allowed within the new <mergePolicyFactory>)?
[jira] [Commented] (SOLR-8621) solrconfig.xml: deprecate/replace <mergePolicy> with <mergePolicyFactory>
[ https://issues.apache.org/jira/browse/SOLR-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127099#comment-15127099 ] Jack Krupansky commented on SOLR-8621: -- IIUC, the motivation here is to permit any number of merge policies to be configured, with the goal of supporting wrapping of merge policies. Okay, but what tells Solr which MP is the outer/default MP?
[jira] [Commented] (SOLR-8621) solrconfig.xml: deprecate/replace <mergePolicy> with <mergePolicyFactory>
[ https://issues.apache.org/jira/browse/SOLR-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15123590#comment-15123590 ] Jack Krupansky commented on SOLR-8621: -- Is this simply a rename of the XML element (from <mergePolicy> to <mergePolicyFactory>), or is there some other user-visible feature enhancement or change? Is the Fix Version 6.0?
[jira] [Commented] (LUCENE-6991) WordDelimiterFilter bug
[ https://issues.apache.org/jira/browse/LUCENE-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115405#comment-15115405 ] Jack Krupansky commented on LUCENE-6991: Does seem odd and wrong. I also notice that it is not generating terms for the single letters from the %-escapes: %3A, %2F. It also seems odd that the long token of catenated word parts does not include all of the word parts from the URL. It seems like a digit not preceded by a letter causes a break, while a digit preceded by a letter prevents a break. Since you are using the whitespace tokenizer, the WDF only sees one space-delimited term at a time. You might try your test with just the URL portion itself, both with and without the escaped quote, to see whether that affects anything. > WordDelimiterFilter bug > --- > > Key: LUCENE-6991 > URL: https://issues.apache.org/jira/browse/LUCENE-6991 > Project: Lucene - Core > Issue Type: Bug >Affects Versions: 4.10.4, 5.3.1 >Reporter: Pawel Rog >Priority: Minor > > I was preparing an analyzer which contains WordDelimiterFilter and I realized it > sometimes gives results different than expected. > I prepared a short test which shows the problem. I haven't used Lucene tests > for this but that doesn't matter for showing the bug.
> {code} > String urlIndexed = "144.214.37.14 - - [05/Jun/2013:08:39:27 +] \"GET > /products/key-phrase-extractor/ HTTP/1.1\"" + > " 200 3437 http://www.google.com/url?sa=t=j==s&; + > > "source=web=15=rja=0CEgQFjAEOAo=http%3A%2F%2Fwww.sematext.com%2Fproducts%2Fkey-" > + > > "phrase-extractor%2F=TPOuUbaWM-OKiQfGxIGYDw=AFQjCNGwYAFYg_M3EZnp2eEWJzdvRrVPrg" > + > "=oYitONI2EIZ0CQar7Ej8HA=bv.47380653,d.aGc\" \"Mozilla/5.0 > (X11; Ubuntu; Linux i686; rv:20.0) " + > "Gecko/20100101 Firefox/20.0\""; > List tokens1 = new ArrayList(); > List tokens2 = new ArrayList(); > WhitespaceAnalyzer analyzer = new WhitespaceAnalyzer(); > TokenStream tokenStream = analyzer.tokenStream("test", urlIndexed); > tokenStream = new WordDelimiterFilter(tokenStream, > WordDelimiterFilter.GENERATE_WORD_PARTS | > WordDelimiterFilter.CATENATE_WORDS | > WordDelimiterFilter.SPLIT_ON_CASE_CHANGE, > null); > CharTermAttribute charAttrib = > tokenStream.addAttribute(CharTermAttribute.class); > tokenStream.reset(); > while(tokenStream.incrementToken()) { > tokens1.add(charAttrib.toString()); > System.out.println(charAttrib.toString()); > } > tokenStream.end(); > tokenStream.close(); > urlIndexed = "144.214.37.14 - - [05/Jun/2013:08:39:27 +] \"GET > /products/key-phrase-extractor/ HTTP/1.1\"" + > " 200 3437 \"http://www.google.com/url?sa=t=j==s&; + > > "source=web=15=rja=0CEgQFjAEOAo=http%3A%2F%2Fwww.sematext.com%2Fproducts%2Fkey-" > + > > "phrase-extractor%2F=TPOuUbaWM-OKiQfGxIGYDw=AFQjCNGwYAFYg_M3EZnp2eEWJzdvRrVPrg" > + > "=oYitONI2EIZ0CQar7Ej8HA=bv.47380653,d.aGc\" \"Mozilla/5.0 (X11; > Ubuntu; Linux i686; rv:20.0) " + > "Gecko/20100101 Firefox/20.0\""; > System.out.println("\n\n\n\n"); > tokenStream = analyzer.tokenStream("test", urlIndexed); > tokenStream = new WordDelimiterFilter(tokenStream, > WordDelimiterFilter.GENERATE_WORD_PARTS | > WordDelimiterFilter.CATENATE_WORDS | > WordDelimiterFilter.SPLIT_ON_CASE_CHANGE, > null); > charAttrib = tokenStream.addAttribute(CharTermAttribute.class); > 
tokenStream.reset(); > while(tokenStream.incrementToken()) { > tokens2.add(charAttrib.toString()); > System.out.println(charAttrib.toString()); > } > tokenStream.end(); > tokenStream.close(); > assertEquals(Joiner.on(",").join(tokens1), Joiner.on(",").join(tokens2)); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8029) Modernize and standardize Solr APIs
[ https://issues.apache.org/jira/browse/SOLR-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109850#comment-15109850 ] Jack Krupansky commented on SOLR-8029: -- Is this likely to be in 6.0 or 6.1? +1 for 6.0, even if not absolutely 100% completely done. At least 6.0 can be billed as having a modern API, even if there might be some additional work required to get it fully rock solid/fully tested in 6.1. > Modernize and standardize Solr APIs > --- > > Key: SOLR-8029 > URL: https://issues.apache.org/jira/browse/SOLR-8029 > Project: Solr > Issue Type: Improvement >Affects Versions: Trunk >Reporter: Noble Paul >Assignee: Noble Paul > Labels: API, EaseOfUse > Fix For: Trunk > > Attachments: SOLR-8029.patch, SOLR-8029.patch, SOLR-8029.patch, > SOLR-8029.patch > > > Solr APIs have organically evolved and they are sometimes inconsistent with > each other or not in sync with the widely followed conventions of the HTTP > protocol. Trying to make incremental changes to make them modern is like > applying a band-aid. So, we have done a complete rethink of what the APIs > should be. The most notable aspects of the API are as follows: > The new set of APIs will be placed under a new path {{/solr2}}. The legacy > APIs will continue to work under the {{/solr}} path as they used to and they > will be eventually deprecated. > There are 4 types of requests in the new API > * {{/v2/<collection>/*}} : Hit a collection directly or manage > collections/shards/replicas > * {{/v2/<core>/*}} : Hit a core directly or manage cores > * {{/v2/cluster/*}} : Operations on cluster not pertaining to any collection > or core. e.g: security, overseer ops etc > This will be released as part of a major release. Check the link given below > for the full specification. 
Your comments are welcome > [Solr API version 2 Specification | http://bit.ly/1JYsBMQ] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3141) Deprecate OPTIMIZE command in Solr
[ https://issues.apache.org/jira/browse/SOLR-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15086474#comment-15086474 ] Jack Krupansky commented on SOLR-3141: -- bq. optimize() is rarely necessary anymore Well... I used to say that same thing, because I was under the impression that the common merge policies would automatically optimize segments over time, but over the past year there have been several email threads with users who had heavy update/delete usage patterns where the index size appeared to remain bloated due to deleted/updated documents. So... we need a revised story... and doc. What exactly should we be telling people who update/delete lots of docs frequently and still find that the index is bloated? Is there maybe some underlying bug or tuning of the delete/merge policy needed? Or... maybe people still need an explicit "force merge" command to effectively say "I just finished a large batch of document updates/deletes but I'm done now, so merge away." Personally, I would like to see a "start batch" mode, which signals that the user intends to make a lot of changes and Solr/Lucene should make no attempt to optimize or clean things up or update caches until the user signals "end of batch", at which time any appropriate merging or optimization or cache refreshing can occur. Not everybody will want to do this, but it still seems to be a semi-common use of Solr. > Deprecate OPTIMIZE command in Solr > -- > > Key: SOLR-3141 > URL: https://issues.apache.org/jira/browse/SOLR-3141 > Project: Solr > Issue Type: Improvement > Components: update >Affects Versions: 3.5 >Reporter: Jan Høydahl >Assignee: Jan Høydahl > Labels: force, optimize > Fix For: 4.9, Trunk > > Attachments: SOLR-3141.patch, SOLR-3141.patch, SOLR-3141.patch > > > Background: LUCENE-3454 renames optimize() as forceMerge(). Please read that > issue first. 
> Now that optimize() is rarely necessary anymore, and renamed in Lucene APIs, > what should be done with Solr's ancient optimize command? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
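For reference, the discussion above is about the Lucene-level rename of optimize() to forceMerge(). A hedged sketch of the renamed calls (real Lucene 5.x APIs; the in-memory directory and writer setup here are purely illustrative, not taken from Solr):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class ForceMergeSketch {
    public static void main(String[] args) throws Exception {
        Directory dir = new RAMDirectory(); // in-memory, for illustration only
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));

        // ... a large batch of document updates/deletes would happen here ...

        // The renamed call behind Solr's optimize command: merge the index
        // down to one segment, reclaiming space held by deleted documents.
        writer.forceMerge(1);

        // Lighter-weight alternative for the "index stays bloated" case:
        // only rewrites segments that contain deletions.
        writer.forceMergeDeletes();

        writer.close();
    }
}
```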
[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators
[ https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038585#comment-15038585 ] Jack Krupansky commented on SOLR-2649: -- The behavior of mm only applying to the top-level query is not documented at present. Even if having mm apply only to the top-level query is intended, it seems a separate matter as to how the q.op parameter applies. I've never seen any doc or discussion that suggested that the default operator should only apply to the top-level query. I haven't looked at the code lately, but it used to be that q.op was just used to set the internal mm value and then completely ignored in the sense that it was not passed down to the Lucene query parser to use as the Lucene default operator. IOW, the Lucene setDefaultOperator method was never called. See: https://lucene.apache.org/core/5_3_0/queryparser/org/apache/lucene/queryparser/classic/QueryParserBase.html#setDefaultOperator(org.apache.lucene.queryparser.classic.QueryParser.Operator) > MM ignored in edismax queries with operators > > > Key: SOLR-2649 > URL: https://issues.apache.org/jira/browse/SOLR-2649 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Magnus Bergmark >Assignee: Erick Erickson > Fix For: 4.9, Trunk > > Attachments: SOLR-2649-with-Qop.patch, SOLR-2649-with-Qop.patch, > SOLR-2649.diff, SOLR-2649.patch > > > Hypothetical scenario: > 1. User searches for "stocks oil gold" with MM set to "50%" > 2. User adds "-stockings" to the query: "stocks oil gold -stockings" > 3. User gets no hits since MM was ignored and all terms were AND-ed > together > The behavior seems to be intentional, although the reason why is never > explained: > // For correct lucene queries, turn off mm processing if there > // were explicit operators (except for AND). 
> boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; > (lines 232-234 taken from > tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java) > This makes edismax unsuitable as a replacement for dismax; mm is one of the > primary features of dismax. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
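To make the setDefaultOperator distinction above concrete, here is a hedged sketch using Lucene's public classic query parser API (this is not Solr's edismax code; the field name and analyzer choice are arbitrary):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;

public class DefaultOperatorSketch {
    public static void main(String[] args) throws Exception {
        QueryParser qp = new QueryParser("text", new StandardAnalyzer());

        // This is the call the comment refers to: it changes how bare terms
        // are combined, so "stocks oil gold" becomes a conjunction rather
        // than a disjunction. The claim above is that edismax used q.op only
        // to derive mm internally and never made this call.
        qp.setDefaultOperator(QueryParser.Operator.AND);

        Query q = qp.parse("stocks oil gold");
        System.out.println(q); // e.g. +text:stocks +text:oil +text:gold
    }
}
```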
[jira] [Commented] (LUCENE-6874) WhitespaceTokenizer should tokenize on NBSP
[ https://issues.apache.org/jira/browse/LUCENE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14988338#comment-14988338 ] Jack Krupansky commented on LUCENE-6874: Certainly Solr can update its example schemas to use whatever alternative tokenizer or option is decided on so that Solr users, many of whom are not Java developers, will no longer fall into this NBSP trap, but... that still feels like a less than desirable resolution. [~thetaphi], could you elaborate more specifically on the existing use case that you are trying to preserve? I mean, like in terms of a real-world example. Where do some of your NBSPs actually live in the wild? It seems to me that the vast majority of normal users would not be negatively impacted by having "white space" be defined using the Unicode model. I never objected to using the Java model, but that's because I had overlooked this nuance of NBSP. My concern for Solr users is that NBSP occurs somewhat commonly in HTML web pages - as a formatting technique more than an attempt at influencing tokenization. > WhitespaceTokenizer should tokenize on NBSP > --- > > Key: LUCENE-6874 > URL: https://issues.apache.org/jira/browse/LUCENE-6874 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Reporter: David Smiley >Priority: Minor > Attachments: LUCENE-6874-jflex.patch, LUCENE-6874.patch > > > WhitespaceTokenizer uses [Character.isWhitespace > |http://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isWhitespace-int-] > to decide what is whitespace. Here's a pertinent excerpt: > bq. It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or > PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', > '\u2007', '\u202F') > Perhaps Character.isWhitespace should have been called > isLineBreakableWhitespace? > I think WhitespaceTokenizer should tokenize on this. I am aware it's easy to > work around but why leave this trap in by default? 
[jira] [Commented] (LUCENE-6874) WhitespaceTokenizer should tokenize on NBSP
[ https://issues.apache.org/jira/browse/LUCENE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14988646#comment-14988646 ] Jack Krupansky commented on LUCENE-6874: bq. Because WST and WDF should really only be used as a last resort. Absolutely agreed. From a Solr user perspective we really need a much simpler model for semi-standard tokens out of the box, without the user having to scratch their head and resort to WST in the first (last) place. LOL - maybe if we could eliminate this need to resort to WST, we wouldn't have to fret as much about WST. bq. I generally suggest to my users to use ClassicTokenizer Personally, I've always refrained from recommending CT since I thought ST was supposed to replace it and that the email and URL support was considered an excess not worth keeping. I've considered CT as if it were deprecated (which it is not.) And, I never see anybody else recommending it on the user list. And, the fact that it can't handle slashes for product numbers is a deal killer. I'm not sure that I would argue in favor of resurrecting CT as a first-class recommendation, especially since it can't handle non-European languages, but... That said, I do think it is worth separately (from this Jira) considering a fresh, new tokenizer that starts with the goodness of ST and adds in an approximation of the reasons that people resort to WST. Whether that can be an option on ST or has to be a separate tokenizer would need to be debated. I'd prefer an option on ST, either to simply allow embedded special characters or to specify a list or regex of special characters to be allowed or excluded. People would still need to combine NewT with WDF, but at least the tokenization would be more explicit. Personally I would prefer to see an option for whether to retain or strip external punctuation vs. embedded special characters. 
Trailing periods and commas and colons and enclosing parentheses are just the kinds of things we had to resort to WDF for when using WST to retain embedded special characters. And if people really want to be ambitious, a totally new tokenizer that subsumed the good parts of WDF would make the lives of many Solr users much easier. > WhitespaceTokenizer should tokenize on NBSP > --- > > Key: LUCENE-6874 > URL: https://issues.apache.org/jira/browse/LUCENE-6874 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Reporter: David Smiley >Priority: Minor > Attachments: LUCENE-6874-jflex.patch, LUCENE-6874.patch > > > WhitespaceTokenizer uses [Character.isWhitespace > |http://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isWhitespace-int-] > to decide what is whitespace. Here's a pertinent excerpt: > bq. It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or > PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', > '\u2007', '\u202F') > Perhaps Character.isWhitespace should have been called > isLineBreakableWhitespace? > I think WhitespaceTokenizer should tokenize on this. I am aware it's easy to > work around but why leave this trap in by default? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6874) WhitespaceTokenizer should tokenize on NBSP
[ https://issues.apache.org/jira/browse/LUCENE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14988615#comment-14988615 ] Jack Krupansky commented on LUCENE-6874: Tika is the other (main?) approach to ingesting text from HTML web pages. I haven't checked exactly what it does on &nbsp;. Maybe [~dsmiley] could elaborate on which use case he was encountering that inspired this Jira issue. > WhitespaceTokenizer should tokenize on NBSP > --- > > Key: LUCENE-6874 > URL: https://issues.apache.org/jira/browse/LUCENE-6874 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Reporter: David Smiley >Priority: Minor > Attachments: LUCENE-6874-jflex.patch, LUCENE-6874.patch > > > WhitespaceTokenizer uses [Character.isWhitespace > |http://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isWhitespace-int-] > to decide what is whitespace. Here's a pertinent excerpt: > bq. It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or > PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', > '\u2007', '\u202F') > Perhaps Character.isWhitespace should have been called > isLineBreakableWhitespace? > I think WhitespaceTokenizer should tokenize on this. I am aware it's easy to > work around but why leave this trap in by default? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6874) WhitespaceTokenizer should tokenize on NBSP
[ https://issues.apache.org/jira/browse/LUCENE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985540#comment-14985540 ] Jack Krupansky commented on LUCENE-6874: +1 for using the Unicode definition of white space rather than the (odd) Java definition. From a Solr user perspective, the fact that Java is used for implementation under the hood should be irrelevant. That said, the Javadoc for WhitespaceTokenizer#isTokenChar does explicitly refer to isWhitespace already. The term "non-breaking white space" explicitly refers to line breaking and has no mention of tokens in either Unicode or traditional casual usage. From a Solr user perspective, there is like zero value to having NBSP from HTML web pages being treated as if it were not traditional white space. From a Solr user perspective, the primary use of whitespace tokenizer is to avoid the fact that standard tokenizer breaks on various special characters such as occur in product numbers. In short, the benefits to Solr users for NBSP being tokenized as white space seem to outweigh any minor use cases for treating it as non-white space. A compatibility mode can be provided if those minor use cases are considered truly worthwhile. > WhitespaceTokenizer should tokenize on NBSP > --- > > Key: LUCENE-6874 > URL: https://issues.apache.org/jira/browse/LUCENE-6874 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Reporter: David Smiley >Priority: Minor > > WhitespaceTokenizer uses [Character.isWhitespace > |http://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isWhitespace-int-] > to decide what is whitespace. Here's a pertinent excerpt: > bq. It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or > PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', > '\u2007', '\u202F') > Perhaps Character.isWhitespace should have been called > isLineBreakableWhitespace? > I think WhitespaceTokenizer should tokenize on this. 
I am aware it's easy to > work around but why leave this trap in by default? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
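The Java-vs-Unicode distinction argued above can be checked with nothing but the JDK. A small illustrative demo (no Lucene involved; the class name is arbitrary):

```java
// NBSP (U+00A0) is a Unicode space character, yet Character.isWhitespace -
// the method WhitespaceTokenizer delegates to - deliberately excludes
// non-breaking spaces, which is why the tokenizer does not split on it.
public class NbspWhitespaceDemo {
    public static void main(String[] args) {
        char nbsp = '\u00A0';
        System.out.println(Character.isSpaceChar(nbsp));  // true: Unicode SPACE_SEPARATOR
        System.out.println(Character.isWhitespace(nbsp)); // false: excluded as non-breaking
        System.out.println(Character.isWhitespace(' '));  // true: ordinary space
    }
}
```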
[jira] [Comment Edited] (LUCENE-6874) WhitespaceTokenizer should tokenize on NBSP
[ https://issues.apache.org/jira/browse/LUCENE-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985540#comment-14985540 ] Jack Krupansky edited comment on LUCENE-6874 at 11/2/15 5:34 PM: - +1 for using the Unicode definition of white space rather than the (odd) Java definition. From a Solr user perspective, the fact that Java is used for implementation under the hood should be irrelevant. That said, the Javadoc for WhitespaceTokenizer#isTokenChar does explicitly refer to isWhitespace already. The term "non-breaking white space" explicitly refers to line breaking and has no mention of tokens in either Unicode or traditional casual usage. From a Solr user perspective, there is like zero value to having NBSP from HTML web pages being treated as if it were not traditional white space. From a Solr user perspective, the primary use of whitespace tokenizer is to avoid the fact that standard tokenizer breaks on various special characters such as occur in product numbers. One of the ongoing problems in the Solr community is the sheer amount of time spent explaining nuances and gotchas, even if they do happen to be documented somewhere in the fine print - no sane user reads the fine print anyway. No Solr user actually uses WhitespaceTokenizer directly - they reference WhitespaceTokenizerFactory, and then having to drop down to Lucene and Java for doc is way too much to ask a typical Solr user. Our collective goal should be to minimize nuances and gotchas (IMHO.) In short, the benefits to Solr users for NBSP being tokenized as white space seem to outweigh any minor use cases for treating it as non-white space. A compatibility mode can be provided if those minor use cases are considered truly worthwhile. Ugh... there are plenty of other places in doc for other tokenizers and filters that refer to "whitespace" and need to address this same issue, either to treat NBSP as white space or doc the nuance/gotcha much more thoroughly and effectively. 
OTOH... an alternative view... having so many un/poorly-documented nuances and gotchas is money in the pockets of consultants and a great argument in favor of Solr users maximizing the employment of Solr consultants. was (Author: jkrupan): +1 for using the Unicode definition of white space rather than the (odd) Java definition. From a Solr user perspective, the fact that Java is used for implementation under the hood should be irrelevant. That said, the Javadoc for WhitespaceTokenizer#isTokenChar does explicitly refer to isWhitespace already. The term "non-breaking white space" explicitly refers to line breaking and has no mention of tokens in either Unicode or traditional casual usage. From a Solr user perspective, there is like zero value to having NBSP from HTML web pages being treated as if it were not traditional white space. From a Solr user perspective, the primary use of whitespace tokenizer is to avoid the fact that standard tokenizer breaks on various special characters such as occur in product numbers. In short, the benefits to Solr users for NBSP being tokenized as white space seem to outweigh any minor use cases for treating it as non-white space. A compatibility mode can be provided if those minor use cases are considered truly worthwhile. > WhitespaceTokenizer should tokenize on NBSP > --- > > Key: LUCENE-6874 > URL: https://issues.apache.org/jira/browse/LUCENE-6874 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Reporter: David Smiley >Priority: Minor > > WhitespaceTokenizer uses [Character.isWhitespace > |http://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isWhitespace-int-] > to decide what is whitespace. Here's a pertinent excerpt: > bq. It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or > PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', > '\u2007', '\u202F') > Perhaps Character.isWhitespace should have been called > isLineBreakableWhitespace? 
> I think WhitespaceTokenizer should tokenize on this. I am aware it's easy to > work around but why leave this trap in by default? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6842) No way to limit the fields cached in memory and leads to OOM when there are thousand of fields (thousands)
[ https://issues.apache.org/jira/browse/LUCENE-6842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963420#comment-14963420 ] Jack Krupansky commented on LUCENE-6842: Generally, Lucene has few hard limits, but the general guidance is that ultimately you will be limited by available system resources such as RAM and CPU. There may not be any hard limit to the number of fields, but that doesn't mean that you can safely assume that a large number of fields will always work for a limited amount of RAM and CPU. Exactly how much RAM and CPU you need will depend on your specific application, and you will have to test for it yourself - known as a proof of concept. Generally, people have resource problems based on the number of documents rather than the number of fields for each document. You haven't detailed how many documents you are indexing and how many of these fields are actually present in an average document. Who knows, maybe the number of fields is not the problem per se and it is the number of documents that is the cause of the resource issue, or a combination of the two. That said, I will defer to the more senior Lucene committers here, but personally I would suggest that "hundreds" or "low thousands" is a more practical recommended best-practice upper limit on the total number of fields in a Lucene index. Generally, "dozens" or at most "low hundreds" would be most recommended and the safest assumption. Sure, maybe 10,000 fields might actually work, but then the number of documents and operations and query complexity will also come into play. All of that said, I'm sure we are all intently curious why exactly you feel that you need so many fields. 
> No way to limit the fields cached in memory and leads to OOM when there are > thousand of fields (thousands) > -- > > Key: LUCENE-6842 > URL: https://issues.apache.org/jira/browse/LUCENE-6842 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 4.6.1 > Environment: Linux, openjdk 1.6.x >Reporter: Bala Kolla > Attachments: HistogramOfHeapUsage.png > > > I am opening this defect to get some guidance on how to handle a case of > server running out of memory and it seems like it's something to do how we > index. But want to know if there is anyway to reduce the impact of this on > memory usage before we look into the way of reducing the number of fields. > Basically we have many thousands of fields being indexed and it's causing a > large amount of memory being used (25GB) and eventually leading to > application to hang and force us to restart every few minutes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8160) Terms query parser should optionally do query analysis
[ https://issues.apache.org/jira/browse/SOLR-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955080#comment-14955080 ] Jack Krupansky commented on SOLR-8160: -- The doc is a bit misleading, for both the Term and Terms query parsers: bq. documents matching any of the specified values. This can be useful for generating filter queries from the external human readable terms returned by the faceting or terms components It should be explicit that these are indexed, already analyzed term values, not "external human readable terms" as the doc indicates. > Terms query parser should optionally do query analysis > --- > > Key: SOLR-8160 > URL: https://issues.apache.org/jira/browse/SOLR-8160 > Project: Solr > Issue Type: Improvement > Components: query parsers, search >Affects Versions: 5.3 >Reporter: Devansh Dhutia > > Field setup as > {code} > multiValued="false" required="false" /> > > > > > > > > > > > {code} > Value sent to cs field for indexing include: AA, BB > Following is observed > {code}={!terms f=cs}AA,BB{code} yields 0 results > {code}={!terms f=cs}aa,bb{code} yields 2 results > {code}=cs:(AA BB){code} yields 2 results > {code}=cs:(aa bb){code} yields 2 results > The first variant above should behave like the other 3 & obey query time > analysis -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-6301) Deprecate Filter
[ https://issues.apache.org/jira/browse/LUCENE-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14953084#comment-14953084 ] Jack Krupansky edited comment on LUCENE-6301 at 10/12/15 12:58 PM: --- I know this change has been in progress for awhile, but it just kind of sunk in for me finally, and now I'm wondering what the impact on Solr will be. I mean, wasn't Filter supposed to be a big performance win over a Query since it eliminates the performance impact of scoring? If that was the case, is Lucene providing some alternate method of achieving a similar performance improvement? I think it is, but... not stated quite so explicitly. An example of the expected migration would help a lot. I think the example should be in the Lucene Javadoc - "To filter documents without the performance overhead of scoring, use the following technique..." If I understand properly, one would simply wrap the query in a BooleanQuery with a single clause that uses BooleanQuery.Clause.FILTER and that would have exactly the same effect (and performance gain) as the old Filter class. Is that statement 100% accurate? If so, it would be good to make it explicit here in Jira, in the deprecation comment in the Filter class, and in BooleanQuery as well. Thanks! was (Author: jkrupan): I know this change has been in progress for awhile, but it just kind of sunk in for me finally, and now I'm wondering what the impact on Solr will be. I mean, wasn't Filter supposed to be a big performance win over a Query since it eliminates the performance impact of scoring? If that was the case, is Lucene providing some alternate method of achieving a similar performance improvement? I think it is, but... not stated quite so explicitly. An example of the expected migration would help a lot. I think the example should be in the Lucene Javadoc - "To filter documents without the performance overhead of scoring, use the following technique..." 
If I understand properly, one would simply wrap the query in a BooleanQuery with a single clause that uses BooleanQuery.Clause.FILTER and that would have exactly the same effect (and performance gain) as the old Filter class. Is that statement 100% accurate? If so, it would be good to make it explicit here in Jira, in the deprecation comment in the Filter class, and in BooleanQuery as well. Thanks! > Deprecate Filter > > > Key: LUCENE-6301 > URL: https://issues.apache.org/jira/browse/LUCENE-6301 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Assignee: Adrien Grand > Fix For: 5.2, Trunk > > Attachments: LUCENE-6301.patch, LUCENE-6301.patch > > > It will still take time to completely remove Filter, but I think we should > start deprecating it now to state our intention and encourage users to move > to queries as soon as possible? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6301) Deprecate Filter
[ https://issues.apache.org/jira/browse/LUCENE-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14953084#comment-14953084 ] Jack Krupansky commented on LUCENE-6301: I know this change has been in progress for awhile, but it just kind of sunk in for me finally, and now I'm wondering what the impact on Solr will be. I mean, wasn't Filter supposed to be a big performance win over a Query since it eliminates the performance impact of scoring? If that was the case, is Lucene providing some alternate method of achieving a similar performance improvement? I think it is, but... not stated quite so explicitly. An example of the expected migration would help a lot. I think the example should be in the Lucene Javadoc - "To filter documents without the performance overhead of scoring, use the following technique..." If I understand properly, one would simply wrap the query in a BooleanQuery with a single clause that uses BooleanQuery.Clause.FILTER and that would have exactly the same effect (and performance gain) as the old Filter class. Is that statement 100% accurate? If so, it would be good to make it explicit here in Jira, in the deprecation comment in the Filter class, and in BooleanQuery as well. Thanks! > Deprecate Filter > > > Key: LUCENE-6301 > URL: https://issues.apache.org/jira/browse/LUCENE-6301 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Assignee: Adrien Grand > Fix For: 5.2, Trunk > > Attachments: LUCENE-6301.patch, LUCENE-6301.patch > > > It will still take time to completely remove Filter, but I think we should > start deprecating it now to state our intention and encourage users to move > to queries as soon as possible? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
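The migration asked about above can be sketched with Lucene's actual class names. Note the constant lives on BooleanClause.Occur, not BooleanQuery.Clause; the field names and queries below are illustrative, not taken from the issue's patches:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class FilterMigrationSketch {
    public static void main(String[] args) {
        Query userQuery = new TermQuery(new Term("body", "lucene"));
        Query restriction = new TermQuery(new Term("status", "published"));

        // Old world: wrap "restriction" in a Filter (e.g. QueryWrapperFilter).
        // New world: add it as a FILTER clause - it must match, but it
        // contributes nothing to the score, so the scoring work the old
        // Filter avoided is still avoided.
        BooleanQuery bq = new BooleanQuery.Builder()
            .add(userQuery, BooleanClause.Occur.MUST)
            .add(restriction, BooleanClause.Occur.FILTER)
            .build();
        System.out.println(bq);
    }
}
```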
[jira] [Commented] (LUCENE-6301) Deprecate Filter
[ https://issues.apache.org/jira/browse/LUCENE-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14953174#comment-14953174 ] Jack Krupansky commented on LUCENE-6301: Thanks! LGTM. Now let's see if the Solr guys pick up on this. > Deprecate Filter > > > Key: LUCENE-6301 > URL: https://issues.apache.org/jira/browse/LUCENE-6301 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Assignee: Adrien Grand > Fix For: 5.2, Trunk > > Attachments: LUCENE-6301.patch, LUCENE-6301.patch > > > It will still take time to completely remove Filter, but I think we should > start deprecating it now to state our intention and encourage users to move > to queries as soon as possible? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6305) BooleanQuery.equals should ignore clause order
[ https://issues.apache.org/jira/browse/LUCENE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950255#comment-14950255 ] Jack Krupansky commented on LUCENE-6305: No objection, but it would be good for the javadoc for BQ and BQ.Builder to explicitly state the contract that the order in which clauses are added will not impact the results of the query, their order, or the performance of query execution - assuming those facts are all true. > BooleanQuery.equals should ignore clause order > -- > > Key: LUCENE-6305 > URL: https://issues.apache.org/jira/browse/LUCENE-6305 > Project: Lucene - Core > Issue Type: Bug >Reporter: Adrien Grand >Assignee: Adrien Grand >Priority: Minor > Attachments: LUCENE-6305.patch, LUCENE-6305.patch > > > BooleanQuery.equals is sensitive to the order in which clauses have been > added. So for instance "+A +B" would be considered different from "+B +A" > although it generates the same matches and scores. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
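[Editor's note] Order-insensitive equality is easy to misread as set equality. A self-contained sketch, with plain strings standing in for clauses (illustrative only, not Lucene's implementation), shows the multiset semantics such a contract would describe — "+A +B" equals "+B +A", but a duplicated clause still matters:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ClauseOrder {
    // Multiset comparison: order is ignored, but duplicate counts are not.
    static boolean clausesEqual(List<String> a, List<String> b) {
        if (a.size() != b.size()) return false;
        Map<String, Integer> counts = new HashMap<>();
        for (String c : a) counts.merge(c, 1, Integer::sum);
        for (String c : b) {
            // Going negative means b has a clause (or a duplicate) a lacks.
            if (counts.merge(c, -1, Integer::sum) < 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(clausesEqual(List.of("+A", "+B"), List.of("+B", "+A")));
        System.out.println(clausesEqual(List.of("+A", "+A"), List.of("+A", "+B")));
    }
}
```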
[jira] [Commented] (LUCENE-6664) Replace SynonymFilter with SynonymGraphFilter
[ https://issues.apache.org/jira/browse/LUCENE-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14942686#comment-14942686 ] Jack Krupansky commented on LUCENE-6664: Hey [~mikemccand], don't get discouraged, this was a very valuable exercise. I am a solid proponent of getting multi-term synonyms working in a full and robust manner, but I recognize that they just don't fit in cleanly with the existing flat token stream architecture. That's life. In any case, don't give up on this long-term effort. Maybe the best thing for now is to retain the traditional flat synonym filter for compatibility, fully add the new SynonymGraphFilter, and then add the optional ability to enable graph support in the main Lucene query parser. (Alas, Solr has its own fork of the Lucene query parser.) Support within phrase queries is the tricky part. It would also be good to address the issue of non-phrase terms being analyzed separately - the query parser should recognize that adjacent terms without operators are to be analyzed as a group so that multi-token synonyms can be recognized. > Replace SynonymFilter with SynonymGraphFilter > - > > Key: LUCENE-6664 > URL: https://issues.apache.org/jira/browse/LUCENE-6664 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Michael McCandless >Assignee: Michael McCandless > Attachments: LUCENE-6664.patch, LUCENE-6664.patch, LUCENE-6664.patch, > LUCENE-6664.patch, usa.png, usa_flat.png > > > Spinoff from LUCENE-6582. > I created a new SynonymGraphFilter (to replace the current buggy > SynonymFilter), that produces correct graphs (does no "graph > flattening" itself). I think this makes it simpler. > This means you must add the FlattenGraphFilter yourself, if you are > applying synonyms during indexing. 
> Index-time syn expansion is a necessarily "lossy" graph transformation > when multi-token (input or output) synonyms are applied, because the > index does not store {{posLength}}, so there will always be phrase > queries that should match but do not, and then phrase queries that > should not match but do. > http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html > goes into detail about this. > However, with this new SynonymGraphFilter, if instead you do synonym > expansion at query time (and don't do the flattening), and you use > TermAutomatonQuery (future: somehow integrated into a query parser), > or maybe just "enumerate all paths and make union of PhraseQuery", you > should get 100% correct matches (not sure about "proper" scoring > though...). > This new syn filter still cannot consume an arbitrary graph. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
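[Editor's note] The "enumerate all paths and make union of PhraseQuery" idea from the quoted description can be sketched with a toy graph model (node numbers and the `Edge` type are invented for illustration; this is not Lucene's TokenStream API). The example encodes "usa" as a synonym of "united states of america" between the same two graph nodes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SynonymPaths {
    // Toy token graph: node -> outgoing (token, targetNode) edges.
    record Edge(String token, int to) {}

    // Enumerate every token path from `from` to `end`; each resulting path
    // would become one PhraseQuery in the query-time union discussed above.
    static List<String> allPaths(Map<Integer, List<Edge>> graph, int from, int end) {
        List<String> out = new ArrayList<>();
        walk(graph, from, end, "", out);
        return out;
    }

    private static void walk(Map<Integer, List<Edge>> g, int node, int end,
                             String prefix, List<String> out) {
        if (node == end) {
            out.add(prefix.trim());
            return;
        }
        for (Edge e : g.getOrDefault(node, List.of())) {
            walk(g, e.to, end, prefix + " " + e.token, out);
        }
    }

    public static void main(String[] args) {
        Map<Integer, List<Edge>> graph = Map.of(
            0, List.of(new Edge("usa", 4), new Edge("united", 1)),
            1, List.of(new Edge("states", 2)),
            2, List.of(new Edge("of", 3)),
            3, List.of(new Edge("america", 4)));
        System.out.println(allPaths(graph, 0, 4));
    }
}
```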
[jira] [Commented] (LUCENE-6821) TermQuery's constructors should clone the incoming term
[ https://issues.apache.org/jira/browse/LUCENE-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14942310#comment-14942310 ] Jack Krupansky commented on LUCENE-6821: Won't this change risk increasing the amount of GC due to all these extra objects? Might it be advisable to have an alternative constructor that doesn't clone, so that users like Solr can exploit the fact that their code won't be making any further use of the input term? > TermQuery's constructors should clone the incoming term > --- > > Key: LUCENE-6821 > URL: https://issues.apache.org/jira/browse/LUCENE-6821 > Project: Lucene - Core > Issue Type: Task >Reporter: Adrien Grand >Priority: Minor > Attachments: LUCENE-6821.patch > > > This is a follow-up of LUCENE-6435: the bug stems from the fact that you can > build term queries out of shared BytesRef objects (such as the ones returned > by TermsEnum.next), which is a bit trappy. If TermQuery's constructors would > clone the incoming term, we wouldn't have this trap. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
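[Editor's note] The trap behind this issue — and the GC tradeoff of cloning — can be shown with a minimal stand-in for the shared-buffer situation. The `Ref` class below is a hypothetical stand-in for Lucene's BytesRef, not Lucene code:

```java
import java.util.Arrays;

public class CloneDemo {
    // Minimal stand-in for BytesRef: a mutable view over a byte[].
    static final class Ref {
        byte[] bytes;
        Ref(byte[] b) { bytes = b; }
        Ref deepCopy() { return new Ref(Arrays.copyOf(bytes, bytes.length)); }
    }

    public static void main(String[] args) {
        byte[] shared = {'a', 'b'};
        Ref aliased = new Ref(shared);            // trap: sees later mutations
        Ref cloned  = new Ref(shared).deepCopy(); // safe: defensive copy costs an allocation
        shared[0] = 'z';                          // buffer reuse, as TermsEnum.next() does
        System.out.println((char) aliased.bytes[0]); // the aliased "query" silently changed
        System.out.println((char) cloned.bytes[0]);  // the cloned one is unaffected
    }
}
```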
[jira] [Commented] (SOLR-7249) Solr engine misses null-values in OR null part for eDisMax parser
[ https://issues.apache.org/jira/browse/SOLR-7249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363383#comment-14363383 ] Jack Krupansky commented on SOLR-7249: -- It's best to pursue this type of issue on the Solr user list first. Have you added debugQuery=true to your request and looked at the parsed_query in the response? That shows how your query is actually interpreted. You wrote AND -area, but that probably should be NOT area or simply -area. Solr engine misses null-values in OR null part for eDisMax parser --- Key: SOLR-7249 URL: https://issues.apache.org/jira/browse/SOLR-7249 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 4.10.3 Environment: Windows 7 CentOS 6.6 Reporter: Arsen Li Solr engine misses null-values in OR null part for eDisMax parser For example, I have the following query: ((*:* AND -area:[* TO *]) OR area:[100 TO 300]) AND objectId:40105451 full query path visible in Solr Admin panel is select?q=((*%3A*+AND+-area%3A%5B*+TO+*%5D)+OR+area%3A%5B100+TO+300%5D)+AND+objectId%3A40105451&wt=json&indent=true so, it should return a record if area is between 100 and 300 or area is not declared. It works ok for the default parser, but when I check the edismax checkbox in the Solr admin panel - it returns nothing (area for objectId=40105451 is null). 
The request path is the following: select?q=((*%3A*+AND+-area%3A%5B*+TO+*%5D)+OR+area%3A%5B100+TO+300%5D)+AND+objectId%3A40105451&wt=json&indent=true&defType=edismax&stopwords=true&lowercaseOperators=true However, when I move the query from the q field to the q.alt field - it works ok; the query is select?wt=json&indent=true&defType=edismax&q.alt=((*%3A*+AND+-area%3A%5B*+TO+*%5D)+OR+area%3A%5B100+TO+300%5D)+AND+objectId%3A40105451&stopwords=true&lowercaseOperators=true Note: asterisks are not saved by the editor; refer to http://stackoverflow.com/questions/29059460/solr-misses-or-null-query-when-parsing-by-edismax-parser if more accurate syntax is needed -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5507) Admin UI - Refactoring using AngularJS
[ https://issues.apache.org/jira/browse/SOLR-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259657#comment-14259657 ] Jack Krupansky commented on SOLR-5507: -- bq. All I ask, though, is that you forgive the occasional burst of ebullient enthusiasm! No need for it to be forgiven... all ebullient enthusiasm is always welcome and encouraged. Admin UI - Refactoring using AngularJS -- Key: SOLR-5507 URL: https://issues.apache.org/jira/browse/SOLR-5507 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Stefan Matheis (steffkes) Priority: Minor Attachments: SOLR-5507.patch On the LSR in Dublin, i've talked again to [~upayavira] and this time we talked about Refactoring the existing UI - using AngularJS: providing (more, internal) structure and what not ; He already started working on the Refactoring, so this is more a 'tracking' issue about the progress he/we do there. Will extend this issue with a bit more context additional information, w/ thoughts about the possible integration in the existing UI and more (: -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6892) Make it possible to define update request processors as toplevel components
[ https://issues.apache.org/jira/browse/SOLR-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259665#comment-14259665 ] Jack Krupansky commented on SOLR-6892: -- Thanks for the description updates. Comments...
1. We need to be explicit about how and when the hard-wired processors are invoked - in particular, the run update processor. The log update processor is somewhat special in that it is not mandatory, but a lot of people are not explicitly aware of it, so if they leave it out, they will be wondering why they don't get logging of updates.
2. I suggest three parameters: pre.processors to specify processors before the default chain, post.processors to specify processors after the default chain (before or after run update and log update??), and processors to specify a processor list to completely replace the default chain.
3. Make log update be automatically added at the end unless a nolog processor is specified.
4. Make run update be automatically added at the end unless a norun processor is specified.
5. Discuss processor vs. processors - I prefer the latter since it is explicit, but maybe allow both since the singular/plural can be confusing.
6. Consider supporting both a single parameter with a csv list as well as multiple parameters each with a single value. I prefer having the choice. Having a separate parameter for each processor can be more explicit sometimes.
7. Consider a single-processor parameter with the option to specify the parameters for that processor. That would make it possible to invoke the various field mutating update processors, which would be especially cool and convenient. 
Make it possible to define update request processors as toplevel components Key: SOLR-6892 URL: https://issues.apache.org/jira/browse/SOLR-6892 Project: Solr Issue Type: Bug Reporter: Noble Paul Assignee: Noble Paul The current update processor chain is rather cumbersome and we should be able to use the updateprocessors without a chain. The scope of this ticket is * A new tag updateProcessor becomes a toplevel tag and it will be equivalent to the {{processor}} tag inside {{updateRequestProcessorChain}} . The only difference is that it should require a {{name}} attribute. The {{updateProcessorChain}} tag will continue to exist and it should be possible to define processor inside as well . It should also be possible to reference a named URP in a chain. * Any update request will be able to pass a param {{processor=a,b,c}} , where a,b,c are names of update processors. A just in time chain will be created with those URPs * Some in built update processors (wherever possible) will be predefined with standard names and can be directly used in requests * What happens when I say processor=a,b,c in a request? It will execute the default chain after the just-in-time chain {{a-b-c}} . * How to execute a different chain other than the default chain? the same old mechanism of update.chain=x means that the chain {{x}} will be applied after {{a,b,c}} * How to avoid the default processor chain from being executed ? There will be an implicit URP called {{STOP}} . send your request as processor=a,b,c,STOP. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
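[Editor's note] The pre/post/replace composition suggested in the comment above could work roughly as follows. All parameter and processor names here are invented for illustration; this is a sketch of the proposal, not Solr's API:

```java
import java.util.ArrayList;
import java.util.List;

public class ChainComposer {
    // Build the effective processor list from hypothetical request params
    // (pre.processors, post.processors, processors) plus the solrconfig
    // default chain. "nolog"/"norun" are the marker names proposed above.
    static List<String> effectiveChain(List<String> pre, List<String> post,
                                       List<String> replace, List<String> defaults) {
        List<String> chain = new ArrayList<>(pre);
        chain.addAll(replace.isEmpty() ? defaults : replace);
        chain.addAll(post);
        // Hard-wired tail: log and run are appended unless suppressed.
        if (!chain.remove("nolog")) chain.add("LogUpdateProcessor");
        if (!chain.remove("norun")) chain.add("RunUpdateProcessor");
        return chain;
    }

    public static void main(String[] args) {
        System.out.println(effectiveChain(
            List.of("dedupe"), List.of(), List.of(), List.of("defaultChain")));
    }
}
```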
[jira] [Commented] (SOLR-5507) Admin UI - Refactoring using AngularJS
[ https://issues.apache.org/jira/browse/SOLR-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259455#comment-14259455 ] Jack Krupansky commented on SOLR-5507: -- This issue has gotten confused. Please clarify the summary and description to inform readers whether the intention is:
1. Simply refactor the implementation to make the code more maintainable and extensible.
2. Add features to the existing UI to cater to advanced users.
3. Revamp the UI itself to cater to new and novice users.
4. Replace the existing UI or supplement it with two UI's, one for novices (guides them through processes) and one for experts (access more features more easily.)
IOW, what are the requirements here? I'm not opposed to any of the above, but the original issue summary and description seemed more focused on the internal implementation rather than the externals of a new UI. Admin UI - Refactoring using AngularJS -- Key: SOLR-5507 URL: https://issues.apache.org/jira/browse/SOLR-5507 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Stefan Matheis (steffkes) Priority: Minor Attachments: SOLR-5507.patch On the LSR in Dublin, i've talked again to [~upayavira] and this time we talked about Refactoring the existing UI - using AngularJS: providing (more, internal) structure and what not ; He already started working on the Refactoring, so this is more a 'tracking' issue about the progress he/we do there. Will extend this issue with a bit more context additional information, w/ thoughts about the possible integration in the existing UI and more (: -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6892) Make update processors toplevel components
[ https://issues.apache.org/jira/browse/SOLR-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259566#comment-14259566 ] Jack Krupansky commented on SOLR-6892: -- Issue type should be Improvement, not Bug, right? Make update processors toplevel components --- Key: SOLR-6892 URL: https://issues.apache.org/jira/browse/SOLR-6892 Project: Solr Issue Type: Bug Reporter: Noble Paul Assignee: Noble Paul The current update processor chain is rather cumbersome and we should be able to use the updateprocessors without a chain. The scope of this ticket is * updateProcessor tag becomes a toplevel tag and it will be equivalent to the processor tag inside updateRequestProcessorChain . The only difference is that it should require a {{name}} attribute * Any update request will be able to pass a param {{processor=a,b,c}} , where a,b,c are names of update processors. A just in time chain will be created with those update processors * Some in built update processors (wherever possible) will be predefined with standard names and can be directly used in requests -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6892) Make update processors toplevel components
[ https://issues.apache.org/jira/browse/SOLR-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259567#comment-14259567 ] Jack Krupansky commented on SOLR-6892: -- It might be instructive to look at how the search handler deals with search components and possibly consider rationalizing the two handlers so that there is a little more commonality in how lists of components/processors are specified. For example, consider a first, last, and full processor list. IOW, be able to specify a list of processors to apply before the solrconfig-specified list, after, or to completely replace the solrconfig-specified list of processors. Make update processors toplevel components --- Key: SOLR-6892 URL: https://issues.apache.org/jira/browse/SOLR-6892 Project: Solr Issue Type: Bug Reporter: Noble Paul Assignee: Noble Paul The current update processor chain is rather cumbersome and we should be able to use the updateprocessors without a chain. The scope of this ticket is * updateProcessor tag becomes a toplevel tag and it will be equivalent to the processor tag inside updateRequestProcessorChain . The only difference is that it should require a {{name}} attribute * Any update request will be able to pass a param {{processor=a,b,c}} , where a,b,c are names of update processors. A just in time chain will be created with those update processors * Some in built update processors (wherever possible) will be predefined with standard names and can be directly used in requests -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6852) SimplePostTool should no longer default to collection1
[ https://issues.apache.org/jira/browse/SOLR-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247435#comment-14247435 ] Jack Krupansky commented on SOLR-6852: -- Is this really for 5.0 only and not trunk/6.0 as well? SimplePostTool should no longer default to collection1 -- Key: SOLR-6852 URL: https://issues.apache.org/jira/browse/SOLR-6852 Project: Solr Issue Type: Improvement Reporter: Anshum Gupta Assignee: Anshum Gupta Fix For: 5.0 Attachments: SOLR-6852.patch, SOLR-6852.patch Solr no longer would be bootstrapped with collection1 and so it no longer makes sense for the SimplePostTool to default to collection1 either. Without an explicit collection/core/url value, the call should just fail fast. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4792) stop shipping a war in 5.0
[ https://issues.apache.org/jira/browse/SOLR-4792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229141#comment-14229141 ] Jack Krupansky commented on SOLR-4792: -- As I just noted on the Solr user list, it would be helpful if people could provide a reference to some existing server products that they are attempting to model Solr 5.0 after - or provide a rationale as to why no existing server products provide a model worthy of adopting for Solr. I mean, are we trying to reinvent the wheel here, or what?! So, which existing Apache server product is Solr 5.0 most closely trying to emulate in terms of overall operation as a server and web service? I'd request that the description of this Jira be redone to provide a clearer description of what Solr is expected to look like - from a Solr user perspective - once the infamous war is no longer shipped. I mean, the phrase "we are free to do anything we want" may mean something to some of the more elite devs here, but show a little sympathy to the rest of the Solr community! stop shipping a war in 5.0 -- Key: SOLR-4792 URL: https://issues.apache.org/jira/browse/SOLR-4792 Project: Solr Issue Type: Task Components: Build Reporter: Robert Muir Assignee: Mark Miller Fix For: 5.0, Trunk Attachments: SOLR-4792.patch see the vote on the developer list. This is the first step: if we stop shipping a war then we are free to do anything we want. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6079) PatternReplaceCharFilter crashes JVM with OutOfMemoryError
[ https://issues.apache.org/jira/browse/LUCENE-6079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227671#comment-14227671 ] Jack Krupansky commented on LUCENE-6079: But the pattern might in fact need the entire input, such as to match the end of the input with $. Still, it would be nice to have an optional chunked mode for cases such as this (assuming the pattern doesn't end with $), such as when the input is the full text of a multi-MB PDF file. I would suggest that such a mode be the default, with a reasonable chunk size such as 100K. There should also be an overlap size: when reading the next chunk, matching would restart within an overlap carried over from the end of the previous chunk, and a match would not be allowed to extend into the overlap area at the end of a chunk (unless it is the last chunk), so that matches can be made across chunk boundaries. Actually, it turns out that there was such a feature, with a maxBlockChars parameter, but it was deprecated long ago - no mention in CHANGES.TXT. But... it's still supported in the factory code, with only a TODO comment suggesting that a warning would be appropriate, but the actual Lucene filter constructor simply ignores this parameter. PatternReplaceCharFilter crashes JVM with OutOfMemoryError -- Key: LUCENE-6079 URL: https://issues.apache.org/jira/browse/LUCENE-6079 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Affects Versions: 4.10.2 Environment: Microsoft Windows, x86_64, 32 GB main memory Reporter: Alexander Veit Priority: Critical PatternReplaceCharFilter fills memory with input data until an OutOfMemoryError is thrown. 
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:569)
    at java.lang.StringBuilder.append(StringBuilder.java:190)
    at org.apache.lucene.analysis.pattern.PatternReplaceCharFilter.fill(PatternReplaceCharFilter.java:84)
    at org.apache.lucene.analysis.pattern.PatternReplaceCharFilter.read(PatternReplaceCharFilter.java:74)
    ...
PatternReplaceCharFilter should read data chunk-wise and pass the transformed output chunk-wise to the caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
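[Editor's note] The chunk-plus-overlap scheme described in the comment above can be sketched as follows. This is a hypothetical implementation unrelated to the deprecated maxBlockChars code; it assumes a match extends at most `overlap` characters back across a chunk boundary, and requires `chunk > overlap`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChunkedMatch {
    // Scan `input` in windows of `chunk` chars; each new window re-reads the
    // last `overlap` chars of the previous one so short matches can cross a
    // window boundary without buffering the whole input.
    static List<String> findAll(String input, Pattern p, int chunk, int overlap) {
        List<String> out = new ArrayList<>();
        int start = 0, emittedUpTo = 0;
        while (start < input.length()) {
            int end = Math.min(input.length(), start + chunk);
            boolean lastChunk = (end == input.length());
            Matcher m = p.matcher(input.substring(start, end));
            while (m.find()) {
                // Skip the re-found tail of a match already emitted.
                if (start + m.start() < emittedUpTo) continue;
                // Matches beginning in the trailing overlap zone are deferred
                // to the next window, which re-reads those characters.
                if (!lastChunk && m.start() >= chunk - overlap) continue;
                out.add(m.group());
                emittedUpTo = start + m.end();
            }
            start = lastChunk ? end : end - overlap;
        }
        return out;
    }

    public static void main(String[] args) {
        Pattern words = Pattern.compile("[a-z]+");
        System.out.println(findAll("aa bbbb cc dddd", words, 6, 3));
    }
}
```

Matches longer than the overlap can still be split, which mirrors why the real filter ended up buffering everything.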
[jira] [Commented] (SOLR-4587) Implement Saved Searches a la ElasticSearch Percolator
[ https://issues.apache.org/jira/browse/SOLR-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203509#comment-14203509 ] Jack Krupansky commented on SOLR-4587: -- bq. as long as the API remains the same -1 Just go with a contrib module ASAP, like even today's Luwak in 5.0, and let people get experience with an experimental API, and then debate what the final, non-contrib API should be, or maybe there might be real benefit with multiple modules with somewhat distinct APIs for different use cases. No need to presume that a one-size-fits-all API is necessarily best here. Implement Saved Searches a la ElasticSearch Percolator -- Key: SOLR-4587 URL: https://issues.apache.org/jira/browse/SOLR-4587 Project: Solr Issue Type: New Feature Components: SearchComponents - other, SolrCloud Reporter: Otis Gospodnetic Fix For: Trunk Use Lucene MemoryIndex for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4586) Increase default maxBooleanClauses
[ https://issues.apache.org/jira/browse/SOLR-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196260#comment-14196260 ] Jack Krupansky commented on SOLR-4586: -- [~yo...@apache.org], I think you just stumbled upon the single most compelling reason for releasing and attracting people to Solr 5.0 - No more Max Boolean Clauses! Increase default maxBooleanClauses -- Key: SOLR-4586 URL: https://issues.apache.org/jira/browse/SOLR-4586 Project: Solr Issue Type: Improvement Affects Versions: 4.2 Environment: 4.3-SNAPSHOT 1456767M - ncindex - 2013-03-15 13:11:50 Reporter: Shawn Heisey Attachments: SOLR-4586.patch, SOLR-4586.patch, SOLR-4586.patch, SOLR-4586.patch, SOLR-4586.patch, SOLR-4586_verify_maxClauses.patch In the #solr IRC channel, I mentioned the maxBooleanClauses limitation to someone asking a question about queries. Mark Miller told me that maxBooleanClauses no longer applies, that the limitation was removed from Lucene sometime in the 3.x series. The config still shows up in the example even in the just-released 4.2. Checking through the source code, I found that the config option is parsed and the value stored in objects, but does not actually seem to be used by anything. I removed every trace of it that I could find, and all tests still pass. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4586) Increase default maxBooleanClauses
[ https://issues.apache.org/jira/browse/SOLR-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196266#comment-14196266 ] Jack Krupansky commented on SOLR-4586: -- [~reparker], yeah, this is the known behavior - the first core loaded sets this setting and any subsequent core loads ignore any new setting. So, yes, you need the bounce to change it. Increase default maxBooleanClauses -- Key: SOLR-4586 URL: https://issues.apache.org/jira/browse/SOLR-4586 Project: Solr Issue Type: Improvement Affects Versions: 4.2 Environment: 4.3-SNAPSHOT 1456767M - ncindex - 2013-03-15 13:11:50 Reporter: Shawn Heisey Attachments: SOLR-4586.patch, SOLR-4586.patch, SOLR-4586.patch, SOLR-4586.patch, SOLR-4586.patch, SOLR-4586_verify_maxClauses.patch In the #solr IRC channel, I mentioned the maxBooleanClauses limitation to someone asking a question about queries. Mark Miller told me that maxBooleanClauses no longer applies, that the limitation was removed from Lucene sometime in the 3.x series. The config still shows up in the example even in the just-released 4.2. Checking through the source code, I found that the config option is parsed and the value stored in objects, but does not actually seem to be used by anything. I removed every trace of it that I could find, and all tests still pass. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5302) Analytics Component
[ https://issues.apache.org/jira/browse/SOLR-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191672#comment-14191672 ] Jack Krupansky commented on SOLR-5302: -- Fix version still says trunk only... but this will be in 5.0 (branch_5x), right? Analytics Component --- Key: SOLR-5302 URL: https://issues.apache.org/jira/browse/SOLR-5302 Project: Solr Issue Type: New Feature Reporter: Steven Bower Assignee: Erick Erickson Fix For: Trunk Attachments: SOLR-5302.patch, SOLR-5302.patch, SOLR-5302.patch, SOLR-5302.patch, SOLR-5302_contrib.patch, Search Analytics Component.pdf, Statistical Expressions.pdf, solr_analytics-2013.10.04-2.patch This ticket is to track a replacement for the StatsComponent. The AnalyticsComponent supports the following features: * All functionality of StatsComponent (SOLR-4499) * Field Faceting (SOLR-3435) ** Support for limit ** Sorting (bucket name or any stat in the bucket ** Support for offset * Range Faceting ** Supports all options of standard range faceting * Query Faceting (SOLR-2925) * Ability to use overall/field facet statistics as input to range/query faceting (ie calc min/max date and then facet over that range * Support for more complex aggregate/mapping operations (SOLR-1622) ** Aggregations: min, max, sum, sum-of-square, count, missing, stddev, mean, median, percentiles ** Operations: negation, abs, add, multiply, divide, power, log, date math, string reversal, string concat ** Easily pluggable framework to add additional operations * New / cleaner output format Outstanding Issues: * Multi-value field support for stats (supported for faceting) * Multi-shard support (may not be possible for some operations, eg median) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5992) Version should not be encoded as a String in the index
[ https://issues.apache.org/jira/browse/LUCENE-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160283#comment-14160283 ] Jack Krupansky commented on LUCENE-5992: What about versions of an index during the development process, like each time a change to the index format is committed? Such as the alpha and beta stages in 4.0? I'd be happier with four version ints: major, minor, patch, change. Although, in theory, we shouldn't be changing the index format in either minor or patch releases, bug fixes for indexing can be valid changes as well. Now, the question is whether change should reset to zero each time we branch, or should really just be an ever-increasing index format version number. The latter may make sense, but either is fine. The latter also makes sense given the potential for successive releases that don't introduce index incompatibilities. I lean towards the latter, but it still makes sense to defensively record which release wrote an index. Version should not be encoded as a String in the index -- Key: LUCENE-5992 URL: https://issues.apache.org/jira/browse/LUCENE-5992 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, Trunk Attachments: LUCENE-5992.patch The version is really just 3 (maybe 4) ints under-the-hood, but today we write it as a String which then requires spooky string tokenization/parsing when we open the index. I think it should be encoded directly as ints. In LUCENE-5952 I had tried to make this change, but it was controversial, and got booted. Then in LUCENE-5969, I tried again, but that issue has morphed (nicely!) into fixing all sorts of things *except* these three ints. Maybe 3rd time's a charm ;) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
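[Editor's note] The four-ints proposal from the comment above can be sketched directly: write the version components as fixed-width ints instead of tokenizing a string like "4.10.2" at read time. This is a hypothetical encoding for illustration, not the patch on the issue:

```java
import java.nio.ByteBuffer;

public class VersionInts {
    // Pack major/minor/patch/change as four big-endian ints (16 bytes).
    static byte[] encode(int major, int minor, int patch, int change) {
        return ByteBuffer.allocate(16)
                .putInt(major).putInt(minor).putInt(patch).putInt(change)
                .array();
    }

    // Read them back without any string parsing.
    static int[] decode(byte[] b) {
        ByteBuffer buf = ByteBuffer.wrap(b);
        return new int[] { buf.getInt(), buf.getInt(), buf.getInt(), buf.getInt() };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(decode(encode(5, 0, 0, 1))));
    }
}
```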
[jira] [Commented] (LUCENE-5989) Add BinaryField, to index a single binary token
[ https://issues.apache.org/jira/browse/LUCENE-5989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14159511#comment-14159511 ] Jack Krupansky commented on LUCENE-5989: bq. rename StringField to KeywordField, making it more obvious that this field isn't tokenized. Then a KeywordsField can take a String or BytesRef in ctors. Both Lucene and Solr are suffering from a conflation of the two concepts of treating an input stream as a single token (a keyword) and as a sequence of tokens (sequence of keywords). We have the KeywordTokenizer that does NOT tokenize the input stream into a sequence of keywords. The term keyword search is commonly used to describe the ability of search engines to find individual keywords in extended streams of text - a clear reference to keyword in a tokenized stream. So, I don't understand how it is claimed that renaming StringField to KeywordField is making anything obvious - it seems to me to be adding to the existing confusion rather than clarifying anything. I mean, the term keyword should be treated more as a synonym for token or term, NOT as a synonym for string or raw character sequence. I agree that we need a term for raw, uninterpreted character sequence, but it seems to me that string is a more obvious candidate than keyword. There has been some grumbling at the Solr level that KeywordTokenizer should be renamed to... something, anything, but just not KeywordTokenizer, which obviously implies that the input stream will be tokenized into a sequence of keywords, which it does not. In an effort to try to resolve this ongoing confusion, can somebody provide some historical background as to how KeywordTokenizer got its name, and how a subset of people continue to refer to an uninterpreted sequence of characters as a keyword rather than a string? I checked the Javadoc, Jira, and even the source code, but came up empty. 
In short, it is a real eye-opener to see a claim that the term keyword in any way makes it obvious that input is not tokenized!! Maybe we could fix this for 5.0 to have a cleaner set of terminology going forward. At a minimum, we should have some clarifying language in the Javadoc. And hopefully we can refrain from making the confusion/conflation worse by renaming StringField to KeywordField. bq. Then a KeywordsField can take a String Is that simply a typo or is the intent to have both a KeywordField (singular) and a KeywordsField (plural)? I presume it is a typo, but... maybe it's a Freudian slip and highlights this semantic difficulty that persists in the Lucene terminology (and hence infects Solr terminology as well.) Add BinaryField, to index a single binary token --- Key: LUCENE-5989 URL: https://issues.apache.org/jira/browse/LUCENE-5989 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, Trunk Attachments: LUCENE-5989.patch 5 years ago (LUCENE-1458) we enabled fully binary terms in the lowest levels of Lucene (the codec APIs) yet today, actually adding an arbitrary byte[] binary term during indexing is far from simple: you must make a custom Field with a custom TokenStream and a custom TermToBytesRefAttribute, as far as I know. This is supremely expert, I wonder if anyone out there has succeeded in doing so? I think we should make indexing a single byte[] as simple as indexing a single String. This is a pre-cursor for issues like LUCENE-5596 (encoding IPv6 address as byte[16]) and LUCENE-5879 (encoding native numeric values in their simple binary form). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
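The conflation the comment describes can be made concrete with a toy contrast in plain Java (this is not the Lucene analysis API): "keyword search" suggests splitting text into many keywords, while KeywordTokenizer-style behavior emits the whole input as a single token.

```java
import java.util.Arrays;
import java.util.List;

// Toy contrast between the two meanings of "keyword" that the comment says
// are being conflated. Plain Java only; not the actual Lucene analysis chain.
public class TokenizeDemo {
    // "Keyword search" sense: the input is tokenized into many keywords.
    static List<String> asKeywords(String input) {
        return Arrays.asList(input.trim().split("\\s+"));
    }

    // KeywordTokenizer sense: the entire input is emitted as one token.
    static List<String> asSingleToken(String input) {
        return List.of(input);
    }
}
```

For the input "quick brown fox", the first method yields three keywords while the second yields the one untokenized string — which is why the name KeywordTokenizer reads as the opposite of what the class does.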
[jira] [Commented] (SOLR-6568) Join Discovery Contrib
[ https://issues.apache.org/jira/browse/SOLR-6568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14150589#comment-14150589 ] Jack Krupansky commented on SOLR-6568: -- This sounds quite interesting, but... it's tagged as minor, so... what's the catch or limitation that prevents this from being a major? Does it work well or at all for indexes that are not 100% memory resident? What about SSD? Does it only work with integer join keys? Is that a restriction that could be relaxed? Or possibly have two parallel components, one that is super fast for integer keys and only reasonably fast for non-integer keys. Might it be possible to build an off-heap map from non-integer key to a temporary integer key? Join Discovery Contrib -- Key: SOLR-6568 URL: https://issues.apache.org/jira/browse/SOLR-6568 Project: Solr Issue Type: New Feature Reporter: Joel Bernstein Assignee: Joel Bernstein Priority: Minor Fix For: 5.0 This contribution was commissioned by the *NCBI* (National Center for Biotechnology Information). The Join Discovery Contrib is a set of Solr plugins that support large scale joins and join facets between Solr cores. There are two different Join implementations included in this contribution. Both implementations are designed to work with integer join keys. It is very common in large BioInformatic and Genomic databases to use integer primary and foreign keys. Integer keys allow Bioinformatic and Genomic search engines and discovery tools to perform complex operations on large data sets very efficiently. The Join Discovery Contrib provides features that will be applicable to anyone working with the freely available databases from the NCBI and likely a large number of other BioInformatic and Genomic databases. These features are not specific though to Bioinformatics and Genomics, they can be used in any datasets where integer keys are used to define the primary and foreign keys. 
What is included in this contrib: 1) A new JoinComponent. This component is used instead of the standard QueryComponent. It facilitates very large scale relational joins between two Solr indexes (cores). The join algorithm used in this component is known as a *parallel partitioned merge join*. This is an algorithm which partitions the results from both sides of the join and then sorts and merges the partitions in parallel. Below are some of its features: * Sub-second performance on very large joins. The parallel join algorithm is capable of sub-second performance on joins with tens of millions of records on both sides of the join. * The JoinComponent returns tuples with fields from both sides of the join. The initial release returns the primary keys from both sides of the join and the join key. * The tuples also include, and are ranked by, a combined score from both sides of the join. * Special purpose memory-mapped on-disk indexes to support \*:\* joins. This makes it possible to join an entire index with a sub-set of another index with sub-second performance. * Support for very fast one-to-one, one-to-many and many-to-many joins. Fast many-to-many joins make it possible to join between indexes on multi-value fields. 2) A new JoinFacetComponent. This component provides facets for both indexes involved in the join. 3) The BitSetJoinQParserPlugin. A very fast parallel filter join based on bitsets that supports infinite levels of nesting. It can be used as a filter query in combination with the JoinComponent or with the standard query component. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
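The "off-heap map from non-integer key to a temporary integer key" suggested in the comment amounts to dictionary encoding. A minimal on-heap sketch (hypothetical; the contrib itself supports only integer keys, and a real variant would presumably live off-heap as the comment proposes):

```java
import java.util.HashMap;
import java.util.Map;

// Dictionary encoding: give each distinct non-integer join key a dense
// temporary int id, so integer-only join machinery can be reused unchanged.
// On-heap HashMap for brevity; the comment suggests an off-heap variant.
public class KeyDictionary {
    private final Map<String, Integer> ids = new HashMap<>();

    public int idFor(String key) {
        Integer id = ids.get(key);
        if (id == null) {
            id = ids.size(); // next dense id: 0, 1, 2, ...
            ids.put(key, id);
        }
        return id;
    }
}
```

Because the ids are dense, both sides of a join can be partitioned and merged on ints exactly as with native integer keys, at the cost of one dictionary lookup per key.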
[jira] [Comment Edited] (SOLR-6568) Join Discovery Contrib
[ https://issues.apache.org/jira/browse/SOLR-6568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14150589#comment-14150589 ] Jack Krupansky edited comment on SOLR-6568 at 9/27/14 1:53 PM: --- This sounds quite interesting, but... it's tagged as minor, so... what's the catch or limitation that prevents this from being a major? Does it work well or at all for indexes that are not 100% memory resident? What about SSD? Does it only work with integer join keys? Is that a restriction that could be relaxed? Or possibly have two parallel components, one that is super fast for integer keys and only reasonably fast for non-integer keys. Might it be possible to build an off-heap map from non-integer key to a temporary integer key? was (Author: jkrupan): This sounds quite interesting, but... it's tagged as minor, so... what's the catch or limitation that prevents this from being a major? Does it well well or at all for indexes that are not 100% memory resident? What about SSD? Does it only work with integer join keys? Is that a restriction that could be relaxed? Or possibly have two parallel components, one that is super fast for integer keys and only reasonably fast for non-integer keys. Might it be possible to build an off-heap map from non-integer key to a temporary integer key? Join Discovery Contrib -- Key: SOLR-6568 URL: https://issues.apache.org/jira/browse/SOLR-6568 Project: Solr Issue Type: New Feature Reporter: Joel Bernstein Assignee: Joel Bernstein Priority: Minor Fix For: 5.0 This contribution was commissioned by the *NCBI* (National Center for Biotechnology Information). The Join Discovery Contrib is a set of Solr plugins that support large scale joins and join facets between Solr cores. There are two different Join implementations included in this contribution. Both implementations are designed to work with integer join keys. 
It is very common in large BioInformatic and Genomic databases to use integer primary and foreign keys. Integer keys allow Bioinformatic and Genomic search engines and discovery tools to perform complex operations on large data sets very efficiently. The Join Discovery Contrib provides features that will be applicable to anyone working with the freely available databases from the NCBI and likely a large number of other BioInformatic and Genomic databases. These features are not specific though to Bioinformatics and Genomics, they can be used in any datasets where integer keys are used to define the primary and foreign keys. What is included in this contrib: 1) A new JoinComponent. This component is used instead of the standard QueryComponent. It facilitates very large scale relational joins between two Solr indexes (cores). The join algorithm used in this component is known as a *parallel partitioned merge join*. This is an algorithm which partitions the results from both sides of the join and then sorts and merges the partitions in parallel. Below are some of its features: * Sub-second performance on very large joins. The parallel join algorithm is capable of sub-second performance on joins with tens of millions of records on both sides of the join. * The JoinComponent returns tuples with fields from both sides of the join. The initial release returns the primary keys from both sides of the join and the join key. * The tuples also include, and are ranked by, a combined score from both sides of the join. * Special purpose memory-mapped on-disk indexes to support \*:\* joins. This makes it possible to join an entire index with a sub-set of another index with sub-second performance. * Support for very fast one-to-one, one-to-many and many-to-many joins. Fast many-to-many joins make it possible to join between indexes on multi-value fields. 2) A new JoinFacetComponent. This component provides facets for both indexes involved in the join. 
3) The BitSetJoinQParserPlugin. A very fast parallel filter join based on bitsets that supports infinite levels of nesting. It can be used as a filter query in combination with the JoinComponent or with the standard query component. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6445) Allow flexible JSON input
[ https://issues.apache.org/jira/browse/SOLR-6445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14113800#comment-14113800 ] Jack Krupansky commented on SOLR-6445: -- +1 for violating the JSON standard! Okay, sure maybe we should have an option to require strict JSON, but it should default to false. Could we support unquoted simple name values as well? Like: {code} {id: my-key} {code} And if people strenuously object, maybe we just need to have a Solr JSON (SJSON or SON - Solr Object Notation) format with the relaxed rules. Allow flexible JSON input -- Key: SOLR-6445 URL: https://issues.apache.org/jira/browse/SOLR-6445 Project: Solr Issue Type: Improvement Reporter: Noble Paul Support single quotes and unquoted keys {code:javascript} //all the following must be valid and equivalent {id :mykey} {'id':'mykey'} {id: mykey} {code} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
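One way to support the relaxed forms in the issue description without a full custom parser is a pre-processing pass that rewrites them into strict JSON. The sketch below is a hypothetical toy: it assumes flat objects whose strings contain no embedded quotes, colons, or braces, and it would wrongly quote literals like true or numbers.

```java
import java.util.regex.Pattern;

// Toy pre-processor rewriting the issue's relaxed forms into strict JSON.
// Hypothetical sketch only: assumes flat objects with no embedded quotes,
// colons, or braces inside string values.
public class RelaxedJson {
    private static final Pattern BARE_KEY =
        Pattern.compile("([{,]\\s*)([A-Za-z_][\\w-]*)(\\s*:)");
    private static final Pattern BARE_VALUE =
        Pattern.compile("(:\\s*)([A-Za-z_][\\w-]*)");

    public static String toStrict(String relaxed) {
        String s = relaxed.replace('\'', '"');               // single -> double quotes
        s = BARE_KEY.matcher(s).replaceAll("$1\"$2\"$3");    // quote bare keys
        return BARE_VALUE.matcher(s).replaceAll("$1\"$2\""); // quote bare values
    }
}
```

All three forms from the issue ({id :mykey}, {'id':'mykey'}, {id: mykey}) normalize to the same strict object under this sketch, which is the equivalence the issue asks for.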
[jira] [Commented] (SOLR-3619) Rename 'example' dir to 'server' and pull examples into an 'examples' directory
[ https://issues.apache.org/jira/browse/SOLR-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090765#comment-14090765 ] Jack Krupansky commented on SOLR-3619: -- I do hope that people use Elasticsearch as a high-priority benchmark for whether standing up a tutorial or production instance of Solr is easy enough. I mean, I still hear plenty of chatter that Solr is too hard. Granted, a lot of that is just perception, but the final result of this issue should be that Solr has two SHORT web pages for those two use cases that clearly show that Solr is just as easy to stand up as Elasticsearch. Elasticsearch says "Installation is a snap." Solr needs to be able to do the same. Rename 'example' dir to 'server' and pull examples into an 'examples' directory --- Key: SOLR-3619 URL: https://issues.apache.org/jira/browse/SOLR-3619 Project: Solr Issue Type: Improvement Reporter: Mark Miller Assignee: Timothy Potter Fix For: 4.9, 5.0 Attachments: SOLR-3619.patch, SOLR-3619.patch, managed-schema, server-name-layout.png, solrconfig.xml -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6315) Remove SimpleOrderedMap
[ https://issues.apache.org/jira/browse/SOLR-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14083962#comment-14083962 ] Jack Krupansky commented on SOLR-6315: -- Is order a part of the contract for the usages of this class? I mean, the current Javadoc does explicitly say that repetition and null values are NOT a part of the contract, but it doesn't say that order, another feature of NamedList, is not important, while the name itself says Ordered. Kind of ambiguous, so a first order (Hah!) of business is to clarify whether maintaining order is a part of the contract, and then to validate that contract with actual usages. Switching to Map implies that order is no longer part of the contract, so it will be free to vary from release to release or between JVMs. Personally, I wish that Map were named UnorderedMap, or even UnstableOrderMap, to make the contract crystal clear. In fact it would be great to have the ordering of serialization of Map be a seeded random test framework parameter to catch cases where the code or test cases have become dependent on order of map serialization or any other non-contract behavior for that matter. Will this change have ANY behavior change that will be visible to Solr application developers or users? Remove SimpleOrderedMap --- Key: SOLR-6315 URL: https://issues.apache.org/jira/browse/SOLR-6315 Project: Solr Issue Type: Improvement Components: clients - java Reporter: Shai Erera Assignee: Shai Erera Attachments: SOLR-6315.patch As I described on SOLR-912, SimpleOrderedMap is a redundant and generally useless class, with confusing jdocs. We should remove it. I'll attach a patch shortly. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
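The seeded-random serialization idea in the comment can be sketched as a test helper: serialize a Map's entries in an order derived from the test framework's seed, so code that silently depends on iteration order fails reproducibly. A hypothetical illustration, not part of Solr's test framework:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical test helper: emit a Map's entries in a seeded-random order so
// tests that silently depend on serialization order fail reproducibly under
// a given seed, in the spirit of the comment's suggestion.
public class ShuffledSerializer {
    public static String serialize(Map<String, String> map, long seed) {
        List<String> keys = new ArrayList<>(map.keySet());
        Collections.sort(keys);                      // canonical order first
        Collections.shuffle(keys, new Random(seed)); // then a reproducible shuffle
        StringBuilder sb = new StringBuilder("{");
        for (String k : keys) {
            if (sb.length() > 1) sb.append(",");
            sb.append(k).append("=").append(map.get(k));
        }
        return sb.append("}").toString();
    }
}
```

Sorting before shuffling makes the output a pure function of (contents, seed), so a failing order-dependence bug can be replayed from the seed alone.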
[jira] [Commented] (LUCENE-5867) Add BooleanSimilarity
[ https://issues.apache.org/jira/browse/LUCENE-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14082778#comment-14082778 ] Jack Krupansky commented on LUCENE-5867: Would this be expected to result in any dramatic improvement in indexing or query performance, or a dramatic reduction in index size? Add BooleanSimilarity - Key: LUCENE-5867 URL: https://issues.apache.org/jira/browse/LUCENE-5867 Project: Lucene - Core Issue Type: New Feature Reporter: Robert Muir Attachments: LUCENE-5867.patch This can be used when the user doesn't want tf/idf scoring for some reason. The idea is that the score is just query_time_boost * index_time_boost, no queryNorm/IDF/TF/lengthNorm... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6103) Add DateRangeField
[ https://issues.apache.org/jira/browse/SOLR-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14082978#comment-14082978 ] Jack Krupansky commented on SOLR-6103: -- You might want to take a peek at the LucidWorks Search query parser support of date queries. It would be so nice to have comparable date support in Solr itself. It includes the ability to auto-expand a simple partial date/time term into a full range, as well as using partial date/time in explicit range queries. See: http://docs.lucidworks.com/display/lweug/Date+Queries Add DateRangeField -- Key: SOLR-6103 URL: https://issues.apache.org/jira/browse/SOLR-6103 Project: Solr Issue Type: New Feature Components: spatial Reporter: David Smiley Assignee: David Smiley Fix For: 5.0 Attachments: SOLR-6103.patch LUCENE-5648 introduced a date range index search capability in the spatial module. This issue is for a corresponding Solr FieldType to be named DateRangeField. LUCENE-5648 includes a parseCalendar(String) method that parses a superset of Solr's strict date format. It also parses partial dates (e.g.: 2014-10 has month specificity), and the trailing 'Z' is optional, and a leading +/- may be present (minus indicates BC era), and * means all-time. The proposed field type would use it to parse a string and also both ends of a range query, but furthermore it will also allow an arbitrary range query of the form {{calspec TO calspec}} such as: {noformat}2000 TO 2014-05-21T10{noformat} Which parses as the year 2000 thru 2014 May 21st 10am (GMT). I suggest this syntax because it is aligned with Lucene's range query syntax. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6103) Add DateRangeField
[ https://issues.apache.org/jira/browse/SOLR-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14083031#comment-14083031 ] Jack Krupansky commented on SOLR-6103: -- One nuance is for the end of the range - [2010 TO 2012] should expand the starting date to the beginning of that period, but expand the ending date to the end of that period (2012-12-31T23:59:59.999Z). And [2010 TO 2012} would expand the ending date to the beginning (rather than the ending) of the period (2012-01-01T00:00:00Z), with the exclusive flag set as well. Add DateRangeField -- Key: SOLR-6103 URL: https://issues.apache.org/jira/browse/SOLR-6103 Project: Solr Issue Type: New Feature Components: spatial Reporter: David Smiley Assignee: David Smiley Fix For: 5.0 Attachments: SOLR-6103.patch LUCENE-5648 introduced a date range index search capability in the spatial module. This issue is for a corresponding Solr FieldType to be named DateRangeField. LUCENE-5648 includes a parseCalendar(String) method that parses a superset of Solr's strict date format. It also parses partial dates (e.g.: 2014-10 has month specificity), and the trailing 'Z' is optional, and a leading +/- may be present (minus indicates BC era), and * means all-time. The proposed field type would use it to parse a string and also both ends of a range query, but furthermore it will also allow an arbitrary range query of the form {{calspec TO calspec}} such as: {noformat}2000 TO 2014-05-21T10{noformat} Which parses as the year 2000 thru 2014 May 21st 10am (GMT). I suggest this syntax because it is aligned with Lucene's range query syntax. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
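The endpoint rule in the comment — lower (and exclusive-upper) bounds snap to the start of the period, inclusive upper bounds snap to its last millisecond — can be sketched for year-granularity terms with java.time. This is a hypothetical illustration, not the DateRangeField implementation:

```java
import java.time.LocalDateTime;
import java.time.Year;

// Hypothetical endpoint expansion for a year-granularity range term, per the
// comment: [.. TO 2012] snaps to the year's last millisecond, while
// [.. TO 2012} (exclusive) snaps to the year's first instant.
public class YearEndpoint {
    static LocalDateTime startOf(int year) {
        return Year.of(year).atDay(1).atStartOfDay();
    }

    static LocalDateTime endOf(int year) {
        Year y = Year.of(year);
        // Last day of the year at millisecond resolution, matching the
        // 2012-12-31T23:59:59.999Z example above.
        return y.atDay(y.length()).atTime(23, 59, 59, 999_000_000);
    }
}
```

A month- or day-granularity term would expand the same way, just over a shorter period; only the inclusive/exclusive flag decides which of the two snap points applies.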
[jira] [Comment Edited] (SOLR-6103) Add DateRangeField
[ https://issues.apache.org/jira/browse/SOLR-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14083031#comment-14083031 ] Jack Krupansky edited comment on SOLR-6103 at 8/1/14 10:42 PM: --- One nuance is for the end of the range - [2010 TO 2012] should expand the starting date to the beginning of that period, but expand the ending date to the end of that period (2012-12-31T23:59:59.999Z). And [2010 TO 2012} would expand the ending date to the beginning (rather than the ending) of the period (2012-01-01T00:00:00Z), with the exclusive flag set as well. was (Author: jkrupan): Once nuance is for the end of the range - [2010 TO 2012] should expand the starting date to the beginning of that period, but expand the ending date to the end of that period (2012-12-31T23:59:59.999Z). And [2010 TO 2012} would expand the ending date to the beginning (rather than the ending) of the period (2012-01-01T00:00:00Z), with the exclusive flag set as well. Add DateRangeField -- Key: SOLR-6103 URL: https://issues.apache.org/jira/browse/SOLR-6103 Project: Solr Issue Type: New Feature Components: spatial Reporter: David Smiley Assignee: David Smiley Fix For: 5.0 Attachments: SOLR-6103.patch LUCENE-5648 introduced a date range index search capability in the spatial module. This issue is for a corresponding Solr FieldType to be named DateRangeField. LUCENE-5648 includes a parseCalendar(String) method that parses a superset of Solr's strict date format. It also parses partial dates (e.g.: 2014-10 has month specificity), and the trailing 'Z' is optional, and a leading +/- may be present (minus indicates BC era), and * means all-time. The proposed field type would use it to parse a string and also both ends of a range query, but furthermore it will also allow an arbitrary range query of the form {{calspec TO calspec}} such as: {noformat}2000 TO 2014-05-21T10{noformat} Which parses as the year 2000 thru 2014 May 21st 10am (GMT). 
I suggest this syntax because it is aligned with Lucene's range query syntax. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5859) Remove Version.java completely
[ https://issues.apache.org/jira/browse/LUCENE-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080778#comment-14080778 ] Jack Krupansky commented on LUCENE-5859: bq. users don't even understand how this versioning works anyway Anybody want to take a shot at a clear description that will make sense to the rest of us? I mean, don't normal users simply want precisely one thing - back compat with their existing index, like always, plus auto upgrade when that is sensible? Should a non-expert user EVER be setting the version explicitly? Some advanced or expert users want to create indexes for a specific release, but let's not confuse them with normal users. I concede that this may be an overly simplistic view, but I think we should start with where normal users should want to be, and at least elaborate in the language of normal users precisely what additional considerations they need to keep in mind and decisions they will have to make and what factors they will need to consider, with specific recommendations. And this is just Lucene. Solr... will it stay unchanged at the API level, or is this Lucene change going to ripple out to Solr users as well? Remove Version.java completely -- Key: LUCENE-5859 URL: https://issues.apache.org/jira/browse/LUCENE-5859 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 5.0 Attachments: LUCENE-5859_dead_code.patch This has always been a mess: analyzers are easy enough to make on your own, we don't need to take responsibility for the users analysis chain for 2 major releases. The code maintenance is horrible here. This creates a huge usability issue too, and as seen from numerous mailing list issues, users don't even understand how this versioning works anyway. I'm sure someone will whine if i try to remove these constants, but we can at least make no-arg ctors forwarding to VERSION_CURRENT so that people who don't care about back compat (e.g. 
just prototyping) don't have to deal with the horribly complex versioning system. If you want to make the argument that doing this is trappy (I heard this before), I think that's bogus, and I'll counter by trying to remove them. Either way, I'm personally not going to add any of this kind of back compat logic myself ever again. Updated: description of the issue updated as expected. We should remove this API completely. No one else on the planet has APIs that require a mandatory version parameter. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5849) Scary read past EOF in RAMDir
[ https://issues.apache.org/jira/browse/LUCENE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14079873#comment-14079873 ] Jack Krupansky commented on LUCENE-5849: Any sense of whether this is JVM-dependent? Or whether it is an issue for the JVM itself? Scary read past EOF in RAMDir --- Key: LUCENE-5849 URL: https://issues.apache.org/jira/browse/LUCENE-5849 Project: Lucene - Core Issue Type: Bug Reporter: Michael McCandless Attachments: TestBinaryDocIndex.java Nightly build hit this: http://builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/91095 And I'm able to repro at least once after beasting w/ the right JVM (1.7.0_55) and G1GC. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5843) IndexWriter should refuse to create an index with more than INT_MAX docs
[ https://issues.apache.org/jira/browse/LUCENE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14072300#comment-14072300 ] Jack Krupansky commented on LUCENE-5843: That Solr Jira has my comments as well, but I just want to reiterate that the actual limit should be more clearly documented. I filed a Jira for that quite a while ago - LUCENE-4104. And if this new issue resolves the problem, please mark my old LUCENE-4105 issue as a duplicate. IndexWriter should refuse to create an index with more than INT_MAX docs Key: LUCENE-5843 URL: https://issues.apache.org/jira/browse/LUCENE-5843 Project: Lucene - Core Issue Type: Bug Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.10 It's more and more common for users these days to create very large indices, e.g. indexing lines from log files, or packets on a network, etc., and it's not hard to accidentally exceed the maximum number of documents in one index. I think the limit is actually Integer.MAX_VALUE-1 docs, because we use that value as a sentinel during searching. I'm not sure what IW does today if you create a too-big index but it's probably horrible; it may succeed and then at search time you hit nasty exceptions when we overflow int. I think it should throw an IndexFullException instead. It'd be nice if we could do this on the very doc that when added would go over the limit, but I would also settle for just throwing at flush as well ... i.e. I think what's really important is that the index does not become unusable. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
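The guard the issue asks for is simple to sketch: reject an add that would push the document count past Integer.MAX_VALUE - 1 (since MAX_VALUE itself is used as a search-time sentinel). A hypothetical illustration using a plain IllegalStateException, not the IndexFullException the issue proposes:

```java
// Hypothetical guard in the spirit of LUCENE-5843: refuse an add that would
// push the doc count past the searchable limit (Integer.MAX_VALUE - 1,
// because MAX_VALUE is used as a sentinel during searching).
public class DocLimit {
    public static final int MAX_DOCS = Integer.MAX_VALUE - 1;

    public static void checkCanAdd(long currentDocCount, int toAdd) {
        // long arithmetic so the check itself cannot overflow int.
        if (currentDocCount + toAdd > MAX_DOCS) {
            throw new IllegalStateException(
                "index would exceed " + MAX_DOCS + " documents");
        }
    }
}
```

Checking per-add (rather than at flush) is what keeps the index usable: the failing document is rejected before anything is written.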
[jira] [Commented] (SOLR-6260) Rename DirectUpdateHandler2
[ https://issues.apache.org/jira/browse/SOLR-6260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069198#comment-14069198 ] Jack Krupansky commented on SOLR-6260: -- I noticed that the SolrCore code does in fact default the update handler class to DUH2/SUH if the class attribute is not specified, so maybe the upgrade instructions can simply be for users to remove the updateHandler class attribute, rather than for them to have to learn yet another internal name. And I would reiterate my proposal to remove the class attribute from the example solrconfig.xml files, for both 5.0 and 4.x. Either way, the patch should include changes to the Upgrading section of CHANGES.txt. Do those three things and then I'm an easy +1! Rename DirectUpdateHandler2 --- Key: SOLR-6260 URL: https://issues.apache.org/jira/browse/SOLR-6260 Project: Solr Issue Type: Improvement Affects Versions: 5.0 Reporter: Tomás Fernández Löbbe Assignee: Mark Miller Priority: Minor Attachments: SOLR-6260.patch, SOLR-6260.patch DirectUpdateHandler was removed, I think in Solr 4. DirectUpdateHandler2 should be renamed, at least remove that 2. I don't know really what direct means here. Maybe it could be renamed to DefaultUpdateHandler, or UpdateHandlerDefaultImpl, or other good suggestions -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-6260) Rename DirectUpdateHandler2
[ https://issues.apache.org/jira/browse/SOLR-6260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069198#comment-14069198 ] Jack Krupansky edited comment on SOLR-6260 at 7/21/14 8:20 PM: --- I noticed that the SolrCore code does in fact default the update handler class to DUH2/SUH if the class attribute is not specified, so maybe the upgrade instructions can simply be for users to remove the updateHandler class attribute, rather than for them to have to learn yet another internal name. And I would reiterate my proposal to remove the class attribute from the example solrconfig.xml files, for both 5.0 and 4.x. Either way, the patch should include changes to the Upgrading section of CHANGES.txt. Do those three things and then I'm an easy +1! was (Author: jkrupan): I noticed that the SolrCore code does in fact default the update handler class to DIH2/SIH if the class attribute is not specified, so maybe the upgrade instructions can simply be for users to remove the updateHandler class attribute, rather than for them to have to learn yet another internal name. And I would reiterate my proposal to remove the class attribute from the example solrconfig.xml files, for both 5.0 and 4.x. Either way, the patch should include changes to the Upgrading section of CHANGES.txt. Do those three things and then I'm an easy +1! Rename DirectUpdateHandler2 --- Key: SOLR-6260 URL: https://issues.apache.org/jira/browse/SOLR-6260 Project: Solr Issue Type: Improvement Affects Versions: 5.0 Reporter: Tomás Fernández Löbbe Assignee: Mark Miller Priority: Minor Attachments: SOLR-6260.patch, SOLR-6260.patch DirectUpdateHandler was removed, I think in Solr 4. DirectUpdateHandler2 should be renamed, at least remove that 2. I don't know really what direct means here. 
Maybe it could be renamed to DefaultUpdateHandler, or UpdateHandlerDefaultImpl, or other good suggestions -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
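Since SolrCore already defaults the update handler class, the upgrade note proposed above would amount to deleting a single attribute from solrconfig.xml. A rough before/after sketch (the child elements shown are just illustrative stock-config content):

```xml
<!-- Before: the internal class name leaks into user-facing config -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit><maxTime>15000</maxTime></autoCommit>
</updateHandler>

<!-- After: omit the class attribute and let SolrCore pick the default -->
<updateHandler>
  <autoCommit><maxTime>15000</maxTime></autoCommit>
</updateHandler>
```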
[jira] [Commented] (SOLR-3619) Rename 'example' dir to 'server' and pull examples into an 'examples' directory
[ https://issues.apache.org/jira/browse/SOLR-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14067950#comment-14067950 ] Jack Krupansky commented on SOLR-3619: -- bq. a database like MySQL But Solr isn't a database! (Nor is Elasticsearch.) I think part of the issue here is that there are two distinct use cases: single core and multi-core, or single collection and multiple collection. Solr is perfectly usable in single-core/collection mode - the user need not concern themselves with naming a collection. In that case, the fact that there is this extra level of abstraction called a collection and it is named collection1 is a bit of an annoyance and distraction, so the less annoying the better. Forcing the user to come up with a name and perform an extra step of naming that default collection adds no significant value for the single-core/collection use case, or the onboarding or introduction of new users to Solr as a simple but powerful search platform. Sure, once the user has decided that they indeed have the multi-core/collection use case, THEN they will want to name their cores/collections with real names. Sure, by all means make support for this use case as clean and convenient as possible. Why not simply give the user a choice, up front, and let them decide for themselves what use case they want? Whether that is a separate download or a separate startup command or a separate start directory seems like more of a detail than an architectural choice for de-supporting one useful use case. I would say leave the current example where it is, as it is, and have a separate, clean download for multi-collection server mode. I'm sure people deploying SolrCloud clusters in the cloud would appreciate the latter, without any burden of example and tutorial fluff. And maybe the use case distinction is simply SolrCloud vs. traditional Solr. 
And then for the new (5.0) SolrCloud server mode, we can have a little script for quick demo mode that is more like the current example/collection1 setup - or a separate example/introduction/tutorial download from the raw server download. In short, don't sacrifice the current simplicity, but do pursue the 5.0 server mode. Maybe if progress were made on the 5.0 Solr server, some of these details would just fall out or at least be more obvious and non-controversial. As it is, this is feeling a lot more like rearranging deck chairs on the Titanic than helping Solr to leapfrog to a whole new level in either server-ness or ease-of-use-ness. BTW, has any thought been given to including a packaging of the 5.0 Solr server as a Windows service? That might also help to clarify some of this packaging stuff. Rename 'example' dir to 'server' and pull examples into an 'examples' directory --- Key: SOLR-3619 URL: https://issues.apache.org/jira/browse/SOLR-3619 Project: Solr Issue Type: Improvement Reporter: Mark Miller Fix For: 4.9, 5.0 Attachments: SOLR-3619.patch, server-name-layout.png -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6260) Rename DirectUpdateHandler2
[ https://issues.apache.org/jira/browse/SOLR-6260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068068#comment-14068068 ] Jack Krupansky commented on SOLR-6260: -- Could we at least remove it from the example solrconfig in 5.0? Change the name as you see fit, and make it the default for the updateHandler class attribute? I mean, it always was kind of a wart to have to specify that kind of internal detail externally like that. Rename DirectUpdateHandler2 --- Key: SOLR-6260 URL: https://issues.apache.org/jira/browse/SOLR-6260 Project: Solr Issue Type: Improvement Affects Versions: 5.0 Reporter: Tomás Fernández Löbbe Priority: Minor Attachments: SOLR-6260.patch, SOLR-6260.patch DirectUpdateHandler was removed, I think in Solr 4. DirectUpdateHandler2 should be renamed, at least remove that 2. I don't know really what direct means here. Maybe it could be renamed to DefaultUpdateHandler, or UpdateHandlerDefaultImpl, or other good suggestions -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5746) solr.xml parsing of str vs int vs bool is brittle; fails silently; expects odd type for shareSchema
[ https://issues.apache.org/jira/browse/SOLR-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14059805#comment-14059805 ] Jack Krupansky commented on SOLR-5746: -- Will the changes for this issue result in a bump of the Solr schema version (to 1.6), so that if existing apps do happen to work (albeit maybe incorrectly) with the current version 1.5 schema processing, they will still work in Solr 4.10 (or whenever this ships)? I hope so. solr.xml parsing of str vs int vs bool is brittle; fails silently; expects odd type for shareSchema -- Key: SOLR-5746 URL: https://issues.apache.org/jira/browse/SOLR-5746 Project: Solr Issue Type: Bug Affects Versions: 4.3, 4.4, 4.5, 4.6 Reporter: Hoss Man Attachments: SOLR-5746.patch, SOLR-5746.patch A comment in the ref guide got me looking at ConfigSolrXml.java and noticing that the parsing of solr.xml options here is very brittle and confusing. In particular: * if a boolean option foo is expected along the lines of {{<bool name="foo">true</bool>}} it will silently ignore {{<str name="foo">true</str>}} * likewise for an int option {{<int name="bar">32</int>}} vs {{<str name="bar">32</str>}} ... this is inconsistent with the way solrconfig.xml is parsed. In solrconfig.xml, the xml nodes are parsed into a NamedList, and the above options will work in either form, but an invalid value such as {{<bool name="foo">NOT A BOOLEAN</bool>}} will generate an error earlier (when parsing config) than {{<str name="foo">NOT A BOOLEAN</str>}} (attempt to parse the string as a bool the first time the config value is needed) In addition, i notice this really confusing line... 
{code} propMap.put(CfgProp.SOLR_SHARESCHEMA, doSub("solr/str[@name='shareSchema']")); {code} shareSchema is used internally as a boolean option, but as written the parsing code will ignore it unless the user explicitly configures it as a {{<str/>}} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
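The solrconfig.xml-style leniency described above can be sketched in plain Java. This is stand-in code, not Solr's actual ConfigSolrXml: the idea is to coerce the node's text to the expected type regardless of whether the user wrote a bool, int, or str element, and to fail loudly on garbage instead of silently ignoring the setting.

```java
// Hypothetical sketch of type-lenient config parsing (not Solr code).
public class LenientConfig {
    // Coerce a raw node value to boolean whether it came from a
    // <bool> or a <str> element; reject garbage with a clear error.
    public static boolean asBool(String raw) {
        if ("true".equalsIgnoreCase(raw.trim())) return true;
        if ("false".equalsIgnoreCase(raw.trim())) return false;
        throw new IllegalArgumentException("Not a boolean: " + raw);
    }

    // Same idea for ints: <int> and <str> are interchangeable.
    public static int asInt(String raw) {
        return Integer.parseInt(raw.trim());
    }
}
```

With this approach a misconfigured value surfaces as an exception at config-parse time, which is the behavior the issue asks for.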
[jira] [Commented] (SOLR-247) Allow facet.field=* to facet on all fields (without knowing what they are)
[ https://issues.apache.org/jira/browse/SOLR-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058716#comment-14058716 ] Jack Krupansky commented on SOLR-247: - The earlier commentary clearly lays out that the primary concern is that it would be a performance nightmare, but... that does depend on your particular use case. Personally, I would say to go forward with adding this feature, but with a clear documentation caveat that this feature should be used with great care since it is likely to be extremely memory and performance intensive and more of a development testing tool than a production feature, although it could have value when wildcard patterns are crafted with care for a very limited number of fields. Allow facet.field=* to facet on all fields (without knowing what they are) -- Key: SOLR-247 URL: https://issues.apache.org/jira/browse/SOLR-247 Project: Solr Issue Type: Improvement Reporter: Ryan McKinley Priority: Minor Labels: beginners, newdev Attachments: SOLR-247-FacetAllFields.patch, SOLR-247.patch, SOLR-247.patch, SOLR-247.patch I don't know if this is a good idea to include -- it is potentially a bad idea to use it, but that can be ok. This came out of trying to use faceting for the LukeRequestHandler top term collecting. http://www.nabble.com/Luke-request-handler-issue-tf3762155.html -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
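The "wildcard patterns crafted with care for a very limited number of fields" idea amounts to expanding a glob against the schema's field names before faceting. A minimal illustration with an invented helper class (this is not how Solr implements it):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: expand a facet.field glob like "price_*"
// against the known field names, so faceting only touches matches.
public class FacetFieldExpander {
    public static List<String> expand(String glob, List<String> fields) {
        // Translate the glob into a regex: '*' matches any run of chars,
        // everything else is treated literally via \Q...\E quoting.
        Pattern p = Pattern.compile(Pattern.quote(glob).replace("*", "\\E.*\\Q"));
        List<String> out = new ArrayList<>();
        for (String f : fields) {
            if (p.matcher(f).matches()) out.add(f);
        }
        return out;
    }
}
```

A pattern like `price_*` would keep the expansion (and the memory cost) bounded, while a bare `*` expands to every field, which is exactly the performance nightmare the earlier comments warn about.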
[jira] [Commented] (LUCENE-3451) Remove special handling of pure negative Filters in BooleanFilter, disallow pure negative queries in BooleanQuery
[ https://issues.apache.org/jira/browse/LUCENE-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14051934#comment-14051934 ] Jack Krupansky commented on LUCENE-3451: [~yo...@apache.org] says: bq. The current handling of boolean queries with only prohibited clauses is not a bug, but working as designed, so this issue is about changing that behavior. Currently working applications will now start unexpectedly throwing exceptions... now that's trappy. The fact that a pure negative query, actually a sub-query within parentheses in the query parser, returns zero documents has been a MAJOR problem for Solr users. I've lost count of how many times it has come up on the user list and we tell users to work around the problem by manually inserting \*:\* after the left parenthesis. But I am interested in hearing why it is believed that it is working as designed and whether there are really applications that would intentionally write a list of negative clauses when the design is that they will simply be ignored and match no documents. If that kind of compatibility is really needed, I would say it can be accommodated with a config setting, rather than keeping the current behavior, which gives unexpected and bad results for so many other people. I would prefer to fix the problem by having BQ do the right thing: implicitly start with a MatchAllDocsQuery if only MUST_NOT clauses are present, but... if that is not possible, an exception would be much better. Alternatively, given the difficulty of doing almost anything with the various query parsers, the method that generates the BQ for the query parser (QueryParserBase.getBooleanQuery) should just check for pure negative clauses and then add the MADQ. If this is massively controversial, just add a config option to disable it. 
Remove special handling of pure negative Filters in BooleanFilter, disallow pure negative queries in BooleanQuery - Key: LUCENE-3451 URL: https://issues.apache.org/jira/browse/LUCENE-3451 Project: Lucene - Core Issue Type: Improvement Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 4.9, 5.0 Attachments: LUCENE-3451.patch, LUCENE-3451.patch, LUCENE-3451.patch, LUCENE-3451.patch, LUCENE-3451.patch We should at least in Lucene 4.0 remove the hack in BooleanFilter that allows pure negative Filter clauses. This is not supported by BooleanQuery and confuses users (I think that's the problem in LUCENE-3450). The hack is buggy, as it does not respect deleted documents and returns them in its DocIdSet. Also we should think about disallowing pure-negative Queries at all and throw UOE. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
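The implicit-MatchAllDocsQuery fix proposed in the comment boils down to one check at query-construction time. A toy sketch with stand-in types, not Lucene's actual BooleanClause API:

```java
import java.util.List;

// Hypothetical sketch of the pure-negative check a query parser
// could apply before building a BooleanQuery (not Lucene code).
public class PureNegativeFix {
    enum Occur { MUST, SHOULD, MUST_NOT }

    // Returns true when a MatchAllDocsQuery should be prepended:
    // all clauses are prohibited, so negations need a base to subtract from.
    public static boolean needsMatchAll(List<Occur> clauses) {
        if (clauses.isEmpty()) return false;
        for (Occur o : clauses) {
            if (o != Occur.MUST_NOT) return false;
        }
        return true;
    }
}
```

In the real code path this check would live in something like the parser's getBooleanQuery, so `(-a -b)` silently becomes `(*:* -a -b)` instead of matching nothing.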
[jira] [Commented] (LUCENE-3451) Remove special handling of pure negative Filters in BooleanFilter, disallow pure negative queries in BooleanQuery
[ https://issues.apache.org/jira/browse/LUCENE-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14051978#comment-14051978 ] Jack Krupansky commented on LUCENE-3451: Thanks, [~yo...@apache.org]. Although the (a -x) stop word case seems to argue even more strenuously for at least an exception if \*:\* can't be inserted. Besides, the stop word case is better handled by the Lucid approach of keeping all stop words (if they are indexed) if the sub-query terms are all stop words as in this case. So it would only be problematic for the case of non-indexed stop words, which is really an anti-pattern anyway these days. Remove special handling of pure negative Filters in BooleanFilter, disallow pure negative queries in BooleanQuery - Key: LUCENE-3451 URL: https://issues.apache.org/jira/browse/LUCENE-3451 Project: Lucene - Core Issue Type: Improvement Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 4.9, 5.0 Attachments: LUCENE-3451.patch, LUCENE-3451.patch, LUCENE-3451.patch, LUCENE-3451.patch, LUCENE-3451.patch We should at least in Lucene 4.0 remove the hack in BooleanFilter that allows pure negative Filter clauses. This is not supported by BooleanQuery and confuses users (I think that's the problem in LUCENE-3450). The hack is buggy, as it does not respect deleted documents and returns them in its DocIdSet. Also we should think about disallowing pure-negative Queries at all and throw UOE. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3451) Remove special handling of pure negative Filters in BooleanFilter, disallow pure negative queries in BooleanQuery
[ https://issues.apache.org/jira/browse/LUCENE-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14051980#comment-14051980 ] Jack Krupansky commented on LUCENE-3451: [~yo...@apache.org] says: bq. I personally think it would be fine to insert *:* for the user where appropriate. Ah! Since the divorce that gave Solr custody of its own copy of QueryParserBase, this change could be made there, right? I can file a Solr Jira for that (or just use one of the two open Solr issues related to pure-negative sub-queries), unless you want to do it. And then if the Solr people are happy over there, the Lucene guys can have their exception here and close this issue, and then everybody can live happily ever after, right? Remove special handling of pure negative Filters in BooleanFilter, disallow pure negative queries in BooleanQuery - Key: LUCENE-3451 URL: https://issues.apache.org/jira/browse/LUCENE-3451 Project: Lucene - Core Issue Type: Improvement Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 4.9, 5.0 Attachments: LUCENE-3451.patch, LUCENE-3451.patch, LUCENE-3451.patch, LUCENE-3451.patch, LUCENE-3451.patch We should at least in Lucene 4.0 remove the hack in BooleanFilter that allows pure negative Filter clauses. This is not supported by BooleanQuery and confuses users (I think that's the problem in LUCENE-3450). The hack is buggy, as it does not respect deleted documents and returns them in its DocIdSet. Also we should think about disallowing pure-negative Queries at all and throw UOE. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5791) QueryParserUtil, big query with wildcards - runs endlessly and produces heavy load
[ https://issues.apache.org/jira/browse/LUCENE-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046813#comment-14046813 ] Jack Krupansky commented on LUCENE-5791: At least consider clear Javadoc on limitations and performance, such as the need to keep wildcard patterns brief. Maybe consider a limit on how many wildcards can be used in a single wildcard query. Possibly configurable. Maybe consider a trim mode - if too many wildcards appear, simply trim trailing portions of the pattern to get under the limit. For example, this test case might get trimmed to abc*mno*xyz*. This would still match all of the intended matches, albeit also matching some unintended cases. Maybe a limit of three wildcards would be reasonable. Does ? have the same issue, or is it much more linear? Would ???*???*???*??? be as bad as abc*mno*xyz*pqr* ? Do adjacent ** get collapsed to a single * ? QueryParserUtil, big query with wildcards - runs endlessly and produces heavy load --- Key: LUCENE-5791 URL: https://issues.apache.org/jira/browse/LUCENE-5791 Project: Lucene - Core Issue Type: Bug Components: modules/queryparser Environment: Lucene 4.7.2 Java 6 Reporter: Clemens Wyss Attachments: afterdet.png The following testcase runs endlessly and produces VERY heavy load. ... String query = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut " + "labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et " + "ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. " + "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt " + "ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores " + "et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet"; query = query.replaceAll( "\\s+", "*" ); try { QueryParserUtil.parse( query, new String[] { "test" }, new Occur[] { Occur.MUST }, new KeywordAnalyzer() ); } catch ( Exception e ) { Assert.fail( e.getMessage() ); } ... I don't say this testcase makes sense, nevertheless the question remains whether this is a bug or a feature? 99% of the time the threaddump/stacktrace looks as follows: BasicOperations.determinize(Automaton) line: 680 Automaton.determinize() line: 759 SpecialOperations.getCommonSuffixBytesRef(Automaton) line: 165 CompiledAutomaton.<init>(Automaton, Boolean, boolean) line: 168 CompiledAutomaton.<init>(Automaton) line: 91 WildcardQuery(AutomatonQuery).<init>(Term, Automaton) line: 67 WildcardQuery.<init>(Term) line: 57 WildcardQueryNodeBuilder.build(QueryNode) line: 42 WildcardQueryNodeBuilder.build(QueryNode) line: 32 StandardQueryTreeBuilder(QueryTreeBuilder).processNode(QueryNode, QueryBuilder) line: 186 StandardQueryTreeBuilder(QueryTreeBuilder).process(QueryNode) line: 125 StandardQueryTreeBuilder(QueryTreeBuilder).build(QueryNode) line: 218 StandardQueryTreeBuilder.build(QueryNode) line: 82 StandardQueryTreeBuilder.build(QueryNode) line: 53 StandardQueryParser(QueryParserHelper).parse(String, String) line: 258 StandardQueryParser.parse(String, String) line: 168 QueryParserUtil.parse(String, String[], BooleanClause$Occur[], Analyzer) line: 119 IndexingTest.queryParserUtilLimit() line: 1450 -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
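The trim mode floated in the comment - collapse adjacent `**` runs first, then cap the pattern at N wildcards - could look like the following. This is a hypothetical helper, not Lucene code:

```java
// Hypothetical sketch of a wildcard "trim mode" (not Lucene code):
// limit a pattern to at most maxWildcards '*' characters.
public class WildcardTrimmer {
    public static String trim(String pattern, int maxWildcards) {
        String p = pattern.replaceAll("\\*+", "*"); // collapse "**" runs
        int seen = 0;
        for (int i = 0; i < p.length(); i++) {
            if (p.charAt(i) == '*' && ++seen == maxWildcards) {
                return p.substring(0, i + 1); // keep up through the Nth '*'
            }
        }
        return p;
    }
}
```

With a limit of three, `abc*mno*xyz*pqr*` trims to `abc*mno*xyz*`, which (as the comment notes) still matches every intended hit plus some unintended ones, while keeping the automaton small enough to determinize quickly.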
[jira] [Comment Edited] (LUCENE-5791) QueryParserUtil, big query with wildcards - runs endlessly and produces heavy load
[ https://issues.apache.org/jira/browse/LUCENE-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046813#comment-14046813 ] Jack Krupansky edited comment on LUCENE-5791 at 6/28/14 11:11 AM: -- At least consider clear Javadoc on limitations and performance, such as the need to keep wildcard patterns brief. Maybe consider a limit of how many wildcards can be used in a single wildcard query. Possibly configurable. Maybe consider a trim mode - if too many wildcards appear, simply trim trailing portions of the pattern to get under the limit. For example, this test case might get trimmed to abc*mno*xyz*. This would still match all of the intended matches, albeit also matching some unintended cases. Maybe a limit of three wildcards would be reasonable. Does ? have the same issue, or is it much more linear? Would ???*???*???*??? be as bad as abc*mno*xyz*pqr* ? Do adjacent ** get collapsed to a single * ? Fuzzy query has a very strict limit to assure that it is performant - I would think that these two query types should have the same performance goals. was (Author: jkrupan): At least consider clear Javadoc on limitations and performance, such as the need to keep wildcard patterns brief. Maybe consider a limit of how many wildcards can be used in a single wildcard query. Possibly configurable. Maybe consider a trim mode - if too many wildcards appear, simply trim trailing portions of the pattern to get under the limit. For example, this test case might get trimmed to abc*mno*xyz*. This would still match all of the intended matches, albeit also matching some unintended cases. Maybe a limit of three wildcards would be reasonable. Does ? have the same issue, or is it much more linear? Would ???*???*???*??? be as bad as abc*mno*xyz*pqr* ? Do adjacent ** get collapsed to a single * ? 
QueryParserUtil, big query with wildcards - runs endlessly and produces heavy load --- Key: LUCENE-5791 URL: https://issues.apache.org/jira/browse/LUCENE-5791 Project: Lucene - Core Issue Type: Bug Components: modules/queryparser Environment: Lucene 4.7.2 Java 6 Reporter: Clemens Wyss Attachments: afterdet.png The following testcase runs endlessly and produces VERY heavy load. ... String query = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut " + "labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et " + "ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. " + "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt " + "ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores " + "et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet"; query = query.replaceAll( "\\s+", "*" ); try { QueryParserUtil.parse( query, new String[] { "test" }, new Occur[] { Occur.MUST }, new KeywordAnalyzer() ); } catch ( Exception e ) { Assert.fail( e.getMessage() ); } ... I don't say this testcase makes sense, nevertheless the question remains whether this is a bug or a feature? 
99% of the time the threaddump/stacktrace looks as follows: BasicOperations.determinize(Automaton) line: 680 Automaton.determinize() line: 759 SpecialOperations.getCommonSuffixBytesRef(Automaton) line: 165 CompiledAutomaton.<init>(Automaton, Boolean, boolean) line: 168 CompiledAutomaton.<init>(Automaton) line: 91 WildcardQuery(AutomatonQuery).<init>(Term, Automaton) line: 67 WildcardQuery.<init>(Term) line: 57 WildcardQueryNodeBuilder.build(QueryNode) line: 42 WildcardQueryNodeBuilder.build(QueryNode) line: 32 StandardQueryTreeBuilder(QueryTreeBuilder).processNode(QueryNode, QueryBuilder) line: 186 StandardQueryTreeBuilder(QueryTreeBuilder).process(QueryNode) line: 125 StandardQueryTreeBuilder(QueryTreeBuilder).build(QueryNode) line: 218 StandardQueryTreeBuilder.build(QueryNode) line: 82 StandardQueryTreeBuilder.build(QueryNode) line: 53 StandardQueryParser(QueryParserHelper).parse(String, String) line: 258 StandardQueryParser.parse(String, String) line: 168 QueryParserUtil.parse(String, String[], BooleanClause$Occur[], Analyzer) line: 119 IndexingTest.queryParserUtilLimit() line: 1450 -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail:
[jira] [Commented] (LUCENE-5785) White space tokenizer has undocumented limit of 256 characters per token
[ https://issues.apache.org/jira/browse/LUCENE-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043454#comment-14043454 ] Jack Krupansky commented on LUCENE-5785: It is worth keeping in mind that a token isn't necessarily the same as a term. It may indeed be desirable to limit the length of terms in the Lucene index for tokenized fields, but all too often an initial token is further broken down using token filters (e.g., word delimiter filter) so that the final term(s) are much shorter than the initial token. So, 256 may be a reasonable limit for indexed terms, but not a great limit for initial tokenization in a complex analysis chain. Whether the default token length limit should be changed as part of this issue is open. Personally I'd prefer a more reasonable limit such as 4096. But as long as the limit can be upped using a tokenizer attribute, that should be enough for now. White space tokenizer has undocumented limit of 256 characters per token Key: LUCENE-5785 URL: https://issues.apache.org/jira/browse/LUCENE-5785 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 4.8.1 Reporter: Jack Krupansky Priority: Minor The white space tokenizer breaks tokens at 256 characters, which is a hard-wired limit of the character tokenizer abstract class. The limit of 256 is obviously fine for normal, natural language text, but excessively restrictive for semi-structured data. 1. Document the current limit in the Javadoc for the character tokenizer. Add a note to any derived tokenizers (such as the white space tokenizer) that token size is limited as per the character tokenizer. 2. Add the setMaxTokenLength method to the character tokenizer ala the standard tokenizer so that an application can control the limit. This should probably be added to the character tokenizer abstract class, and then other derived tokenizer classes can inherit it. 3. Disallow a token size limit of 0. 4. 
A limit of -1 would mean no limit. 5. Add a token limit mode method - skip (what the standard tokenizer does), break (current behavior of the white space tokenizer and its derived tokenizers), and trim (what I think a lot of people might expect.) 6. Not sure whether to change the current behavior of the character tokenizer (break mode) to fix it to match the standard tokenizer, or to be trim mode, which is my choice and likely to be what people might expect. 7. Add matching attributes to the tokenizer factories for Solr, including Solr XML javadoc. At a minimum, this issue should address the documentation problem. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
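The three proposed limit modes (skip, break, trim) are easy to state precisely in code. This sketch uses invented names throughout - no such mode method exists in Lucene's CharTokenizer today:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the proposed token-length limit modes.
public class TokenLimit {
    enum Mode { SKIP, BREAK, TRIM }

    public static List<String> apply(String token, int max, Mode mode) {
        List<String> out = new ArrayList<>();
        if (token.length() <= max) { out.add(token); return out; }
        switch (mode) {
            case SKIP:  // drop the over-long token entirely (StandardTokenizer-style)
                break;
            case BREAK: // emit max-length chunks (current CharTokenizer behavior)
                for (int i = 0; i < token.length(); i += max)
                    out.add(token.substring(i, Math.min(i + max, token.length())));
                break;
            case TRIM:  // keep only the first max characters
                out.add(token.substring(0, max));
                break;
        }
        return out;
    }
}
```

The break-vs-trim distinction matters in practice: break mode silently turns one over-long value into several bogus terms, while trim mode indexes a truncated but searchable prefix.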
[jira] [Commented] (LUCENE-5785) White space tokenizer has undocumented limit of 256 characters per token
[ https://issues.apache.org/jira/browse/LUCENE-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043670#comment-14043670 ] Jack Krupansky commented on LUCENE-5785: bq. Make the limit configurable for all tokenizers, and expose that config option in the Solr schema. I wouldn't mind having a Solr-only, core/schema-specific default setting. Not like max Boolean clause which was a Java static for Lucene and quite a mess in terms of the order cores were loaded. In short, leave the default as 256 in Lucene, but we could have Solr default to something much less restrictive, like 4096, and in addition to the tokenizer-specific attribute, the user could specify a global (for the core/schema) override. One key advantage of the schema-global override is that the user could leave the existing field types intact. White space tokenizer has undocumented limit of 256 characters per token Key: LUCENE-5785 URL: https://issues.apache.org/jira/browse/LUCENE-5785 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 4.8.1 Reporter: Jack Krupansky Priority: Minor The white space tokenizer breaks tokens at 256 characters, which is a hard-wired limit of the character tokenizer abstract class. The limit of 256 is obviously fine for normal, natural language text, but excessively restrictive for semi-structured data. 1. Document the current limit in the Javadoc for the character tokenizer. Add a note to any derived tokenizers (such as the white space tokenizer) that token size is limited as per the character tokenizer. 2. Add the setMaxTokenLength method to the character tokenizer ala the standard tokenizer so that an application can control the limit. This should probably be added to the character tokenizer abstract class, and then other derived tokenizer classes can inherit it. 3. Disallow a token size limit of 0. 4. A limit of -1 would mean no limit. 5. 
Add a token limit mode method - skip (what the standard tokenizer does), break (current behavior of the white space tokenizer and its derived tokenizers), and trim (what I think a lot of people might expect.) 6. Not sure whether to change the current behavior of the character tokenizer (break mode) to fix it to match the standard tokenizer, or to be trim mode, which is my choice and likely to be what people might expect. 7. Add matching attributes to the tokenizer factories for Solr, including Solr XML javadoc. At a minimum, this issue should address the documentation problem. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5785) White space tokenizer has undocumented limit of 256 characters per token
[ https://issues.apache.org/jira/browse/LUCENE-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041194#comment-14041194 ] Jack Krupansky commented on LUCENE-5785: The pattern tokenizer can be used as a workaround for the white space tokenizer since it doesn't have that hard-wired token length limit. White space tokenizer has undocumented limit of 256 characters per token Key: LUCENE-5785 URL: https://issues.apache.org/jira/browse/LUCENE-5785 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 4.8.1 Reporter: Jack Krupansky Priority: Minor The white space tokenizer breaks tokens at 256 characters, which is a hard-wired limit of the character tokenizer abstract class. The limit of 256 is obviously fine for normal, natural language text, but excessively restrictive for semi-structured data. 1. Document the current limit in the Javadoc for the character tokenizer. Add a note to any derived tokenizers (such as the white space tokenizer) that token size is limited as per the character tokenizer. 2. Add the setMaxTokenLength method to the character tokenizer ala the standard tokenizer so that an application can control the limit. This should probably be added to the character tokenizer abstract class, and then other derived tokenizer classes can inherit it. 3. Disallow a token size limit of 0. 4. A limit of -1 would mean no limit. 5. Add a token limit mode method - skip (what the standard tokenizer does), break (current behavior of the white space tokenizer and its derived tokenizers), and trim (what I think a lot of people might expect.) 6. Not sure whether to change the current behavior of the character tokenizer (break mode) to fix it to match the standard tokenizer, or to be trim mode, which is my choice and likely to be what people might expect. 7. Add matching attributes to the tokenizer factories for Solr, including Solr XML javadoc. 
At a minimum, this issue should address the documentation problem. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5785) White space tokenizer has undocumented limit of 256 characters per token
Jack Krupansky created LUCENE-5785: -- Summary: White space tokenizer has undocumented limit of 256 characters per token Key: LUCENE-5785 URL: https://issues.apache.org/jira/browse/LUCENE-5785 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 4.8.1 Reporter: Jack Krupansky Priority: Minor The white space tokenizer breaks tokens at 256 characters, which is a hard-wired limit of the character tokenizer abstract class. The limit of 256 is obviously fine for normal, natural language text, but excessively restrictive for semi-structured data. 1. Document the current limit in the Javadoc for the character tokenizer. Add a note to any derived tokenizers (such as the white space tokenizer) that token size is limited as per the character tokenizer. 2. Add the setMaxTokenLength method to the character tokenizer ala the standard tokenizer so that an application can control the limit. This should probably be added to the character tokenizer abstract class, and then other derived tokenizer classes can inherit it. 3. Disallow a token size limit of 0. 4. A limit of -1 would mean no limit. 5. Add a token limit mode method - skip (what the standard tokenizer does), break (current behavior of the white space tokenizer and its derived tokenizers), and trim (what I think a lot of people might expect.) 6. Not sure whether to change the current behavior of the character tokenizer (break mode) to fix it to match the standard tokenizer, or to be trim mode, which is my choice and likely to be what people might expect. 7. Add matching attributes to the tokenizer factories for Solr, including Solr XML javadoc. At a minimum, this issue should address the documentation problem. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6113) Edismax doesn't parse well the query uf (User Fields)
[ https://issues.apache.org/jira/browse/SOLR-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14009733#comment-14009733 ] Jack Krupansky commented on SOLR-6113: -- Better doc for the intended behavior would help, at least a little. At least we could point people to a clear description of what actually happens. Edismax doesn't parse well the query uf (User Fields) - Key: SOLR-6113 URL: https://issues.apache.org/jira/browse/SOLR-6113 Project: Solr Issue Type: Bug Components: query parsers Reporter: Liram Vardi It seems that the Edismax User Fields feature does not behave as expected. For instance, assuming the following query: _q=id:b* user:Anna Collins&defType=edismax&uf=* -user&rows=0_ The parsed query (taken from the query debug info) is: _+((id:b* (text:user) (text:anna collins))~1)_ I expect that because user was filtered out in uf (User Fields), the parsed query should not contain the user search part. In other words, the parsed query should look simply like this: _+id:b*_ This issue is affected by the patch on issue SOLR-2649: when changing the default OP of Edismax to AND, the query results change. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6065) Solr / IndexWriter should prevent you from adding docs if it creates an index to big to open
[ https://issues.apache.org/jira/browse/SOLR-6065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996549#comment-13996549 ] Jack Krupansky commented on SOLR-6065: -- As a historical note, I had filed LUCENE-4104 and LUCENE-4105, as well as SOLR-3504 and SOLR-3505, to both document and check against the per-index document limit in both Lucene and Solr. I think Lucene should check against the limit, and then Solr should respond to that condition. Two interesting use cases: 1. Deleted documents exist, so Solr should tell the user that optimize can resolve the problem. 2. No deleted documents exist, so Solr can only report that the document limit has been reached. As an afterthought, maybe we should have a configurable Solr parameter for maximum documents per shard, since anybody adding 2 billion documents to a shard is very likely to run into performance issues long before they get near the absolute maximum limit. I'd suggest a Solr configurable limit of, say, 250 million. Alternatively, this configurable limit could simply be a (noisy) warning, or maybe it could be configurable as either a hard error or a soft warning. Solr / IndexWriter should prevent you from adding docs if it creates an index to big to open Key: SOLR-6065 URL: https://issues.apache.org/jira/browse/SOLR-6065 Project: Solr Issue Type: Bug Reporter: Hoss Man yamazaki reported an error on solr-user where, on opening a new searcher, he got an IAE from BaseCompositeReader because the numDocs was greater than Integer.MAX_VALUE. I'm surprised that in a straightforward setup (ie: no AddIndex merging) IndexWriter will even let you add more docs than max int. We should investigate if this makes sense and either add logic in IndexWriter to prevent this from happening, or add logic to Solr's UpdateHandler to prevent things from getting that far.
ie: we should be failing to add too many documents and leaving the index usable -- not accepting the add and leaving the index in an unusable state. stack trace reported by user... {noformat} ERROR org.apache.solr.core.CoreContainer – Unable to create core: collection1 org.apache.solr.common.SolrException: Error opening new searcher at org.apache.solr.core.SolrCore.init(SolrCore.java:821) at org.apache.solr.core.SolrCore.init(SolrCore.java:618) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984) at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597) at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.solr.common.SolrException: Error opening new searcher at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1438) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1550) at org.apache.solr.core.SolrCore.init(SolrCore.java:796) ... 13 more Caused by: org.apache.solr.common.SolrException: Error opening Reader at org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:172) at org.apache.solr.search.SolrIndexSearcher.init(SolrIndexSearcher.java:183) at org.apache.solr.search.SolrIndexSearcher.init(SolrIndexSearcher.java:179) at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1414) ...
15 more Caused by: java.lang.IllegalArgumentException: Too many documents, composite IndexReaders cannot exceed 2147483647 at org.apache.lucene.index.BaseCompositeReader.init(BaseCompositeReader.java:77) at org.apache.lucene.index.DirectoryReader.init(DirectoryReader.java:368) at org.apache.lucene.index.StandardDirectoryReader.init(StandardDirectoryReader.java:42) at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:71) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783) at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52) at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:88) at
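The configurable per-shard cap suggested in the comment above could be sketched as follows. This is a hypothetical illustration: the function name, default cap, and the hard-error/soft-warning switch are all invented here (no such Solr setting exists in the issue); only the absolute Lucene limit of Integer.MAX_VALUE is real.

```python
# Hypothetical sketch of a configurable per-shard document cap with
# hard-error vs. soft-warning modes; all names are invented for illustration.
import warnings

LUCENE_MAX_DOCS = 2_147_483_647  # Integer.MAX_VALUE, the hard Lucene limit

def check_doc_limit(num_docs, max_shard_docs=250_000_000, hard=False):
    """Fail the add before the index becomes unopenable; optionally
    enforce a much lower configurable cap as an error or a warning."""
    if num_docs >= LUCENE_MAX_DOCS:
        # Always a hard failure: exceeding this leaves the index unopenable.
        raise RuntimeError("Too many documents: index would exceed the Lucene limit")
    if num_docs >= max_shard_docs:
        msg = f"Shard has {num_docs} docs, over the configured cap of {max_shard_docs}"
        if hard:
            raise RuntimeError(msg)
        warnings.warn(msg)  # soft mode: noisy warning, add still accepted
```

The point of the sketch is the ordering: the check happens before the add is accepted, so the failure mode is a rejected update rather than a corrupted, unopenable index.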
[jira] [Commented] (SOLR-6036) Can't create collection with replicationFactor=0
[ https://issues.apache.org/jira/browse/SOLR-6036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13986967#comment-13986967 ] Jack Krupansky commented on SOLR-6036: -- But I can sympathize - the term "copies of the data" is ambiguous and vague, unless you have seriously taken the mantra "there is no master!" to heart and etched it into your arms with acid. Maybe "instances of the data" would be a little less ambiguous. Can't create collection with replicationFactor=0 Key: SOLR-6036 URL: https://issues.apache.org/jira/browse/SOLR-6036 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.3.1, 4.8 Reporter: John Wong Priority: Trivial
solrcloud$ curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=collection&numShards=2&replicationFactor=0'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">60052</int></lst>
<str name="Operation createcollection caused exception:">org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: replicationFactor must be greater than or equal to 0</str>
<lst name="exception"><str name="msg">replicationFactor must be greater than or equal to 0</str><int name="rspCode">400</int></lst>
<lst name="error"><str name="msg">replicationFactor must be greater than or equal to 0</str><int name="code">400</int></lst>
</response>
I am using Solr 4.3.1, but I peeked into the source up to 4.8 and the problem still persists, though in 4.8 the exception message is now "must be greater than 0". The code snippet in OverseerCollectionProcessor.java: {code} if (repFactor <= 0) { throw new SolrException(ErrorCode.BAD_REQUEST, REPLICATION_FACTOR + " must be greater than 0"); } {code} I believe the <= should just be <, as it won't allow 0. It may have been legacy from when replicationFactor of 1 included the leader/master copy, whereas in Solr 4.x, replicationFactor is defined by additional replicas on top of the leader.
http://wiki.apache.org/solr/SolrCloud replicationFactor: The number of copies of each document (or, the number of physical replicas to be created for each logical shard of the collection.) A replicationFactor of 3 means that there will be 3 replicas (one of which is normally designated to be the leader) for each logical shard. NOTE: in Solr 4.0, replicationFactor was the number of *additional* copies as opposed to the total number of copies. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6003) JSON Update increment field with non-stored fields causes subtle problems
[ https://issues.apache.org/jira/browse/SOLR-6003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13981086#comment-13981086 ] Jack Krupansky commented on SOLR-6003: -- It sounds like a separate Jira should be filed for some of these broader discussions. This specific Jira should focus on the specific issue of increment for a non-stored field, and append to a non-stored multivalued field. Clearly this case should produce an exception, since it can't possibly do anything reasonable: it needs to access the previous value before applying the increment or append. JSON Update increment field with non-stored fields causes subtle problems - Key: SOLR-6003 URL: https://issues.apache.org/jira/browse/SOLR-6003 Project: Solr Issue Type: Bug Components: update Affects Versions: 4.7.1 Reporter: Kingston Duffie In our application we have large multi-field documents. We occasionally need to increment one of the numeric fields or add a value to a multi-value text field. This appears to work correctly using JSON update. But later we discovered that documents were disappearing from search results, and eventually found the documentation that indicates that to use field modification you must store all fields of the document. Perhaps you will argue that you need to impose this restriction -- which I would hope could be overcome, given the cost to us of having to store all fields. But in any case, it would be better for others if you could return an error if someone tries to update a field on documents with non-stored fields. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
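For reference, the atomic-update requests under discussion use the inc and add operators; a JSON update body along these lines (the field names here are just examples) increments a numeric field and appends to a multivalued field, which is exactly the case that requires the previous stored values:

```json
[
  {
    "id": "doc1",
    "view_count": { "inc": 1 },
    "tags": { "add": "new-tag" }
  }
]
```

To apply inc or add, Solr must reconstruct the existing document from its stored fields first; any non-stored field's old value is simply unavailable, hence the proposal that this produce an error rather than a silently lossy update.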
[jira] [Commented] (SOLR-5871) Ability to see the list of fields that matched the query with scores
[ https://issues.apache.org/jira/browse/SOLR-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967913#comment-13967913 ] Jack Krupansky commented on SOLR-5871: -- I've lost count of how many times users have requested this feature. The basic request is for an easy way to determine which fields matched which values for each document, as opposed to having to sift through the debug explanation. One technical difficulty is analysis - the results could report the analyzed field values which matched, which won't necessarily literally agree with the source terms due to case, stemming, synonyms, etc. Ability to see the list of fields that matched the query with scores Key: SOLR-5871 URL: https://issues.apache.org/jira/browse/SOLR-5871 Project: Solr Issue Type: Wish Reporter: Alexander S. Assignee: Erick Erickson Hello, I need the ability to tell users what content matched their query, this way:
| Name | Twitter Profile | Topics | Site Title | Site Description | Site content |
| John Doe | Yes | No | Yes | No | Yes |
| Jane Doe | No | Yes | No | No | Yes |
All these columns are indexed text fields and I need to know what content matched the query; it would also be cool to be able to show the score per field. As far as I know, right now there's no way to return this information when running a query request. Debug output is suitable for visual review but has lots of nesting levels and is hard to understand. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5936) Deprecate non-Trie-based numeric (and date) field types in 4.x and remove them from 5.0
[ https://issues.apache.org/jira/browse/SOLR-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13954438#comment-13954438 ] Jack Krupansky commented on SOLR-5936: -- As part of this cleanup, could somebody volunteer to create a plain-English summary of exactly what a trie field really is, what good it is, and why we can't live without them? I've read the code and, okay, there is a sequence of bit shifts and generation of extra terms, but in plain English, what's the point? I'm not asking for a recitation of the actual algorithm(s), but some intuitively accessible summary. I would note that the typical examples are for strings with prefixes rather than binary numbers. See: http://en.wikipedia.org/wiki/Trie And, is trie really the best solution for number types? Does it actually have real value for float and double values? And I would really like to see some plain, easily readable explanation of precision step. Again, especially for real numbers. And how should precision step be used for dates? I mean, other than assuring sort order, why bother with trie? Or more specifically, why does a Solr (or Lucene) user need to know that trie is used for the implementation? Specifically, for example, does it matter if a field has an evenly distributed range of numeric values with little repetition vs. numeric codes where there is a relatively small number of distinct values (e.g., 1-10, or scores of 0-100 or dates in years between 1970 and 2014) and relatively high cardinality? I mean, does trie do a uniformly great job for both of these extreme use cases, including for faceting? And if trie really is the best approach for numeric fields, why not just do all of this under the hood instead of polluting the field type names with trie? IOW, rename TrieIntField to IntField, etc. To me, trie just seems like unnecessary noise to average users. 
Deprecate non-Trie-based numeric (and date) field types in 4.x and remove them from 5.0 --- Key: SOLR-5936 URL: https://issues.apache.org/jira/browse/SOLR-5936 Project: Solr Issue Type: Task Components: Schema and Analysis Reporter: Steve Rowe Assignee: Steve Rowe Priority: Minor Fix For: 4.8, 5.0 Attachments: SOLR-5936.branch_4x.patch, SOLR-5936.branch_4x.patch We've been discouraging people from using non-Trie numeric/date field types for years, it's time we made it official. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
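As a rough plain-English answer to the question in the comment above: a trie numeric field indexes each number at several precisions, with precisionStep controlling how many low-order bits are shifted away at each level, so a range query can match a few coarse "bucket" terms in the middle of the range and only needs exact terms at the edges. A simplified sketch of that idea (the real Lucene encoding uses prefix-coded byte terms; only the bit-shifting intuition is kept here):

```python
# Simplified sketch of Lucene's trie / precisionStep idea. The real
# implementation prefix-codes each shifted value into byte terms; this
# keeps only the bit-shifting intuition.

def trie_terms(value, precision_step=8, bits=32):
    """Index one number as several terms: the exact value plus
    progressively coarser buckets with trailing bits shifted away."""
    return [(shift, value >> shift) for shift in range(0, bits, precision_step)]

# 1234 with precisionStep=8 yields the exact value plus three coarser
# buckets; a range query like [1000 TO 2000] can match whole buckets at
# the coarse levels instead of enumerating every exact value in between.
print(trie_terms(1234, 8))  # [(0, 1234), (8, 4), (16, 0), (24, 0)]
```

This also hints at the trade-off behind the question about value distributions: more terms are stored per value (smaller precisionStep means more levels), in exchange for visiting far fewer terms at range-query time; for low-cardinality fields the extra levels buy relatively little.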
[jira] [Commented] (SOLR-5896) Create and edit a CWiki page that describes UpdateRequestProcessors, especially FieldMutatingUpdateProcessors
[ https://issues.apache.org/jira/browse/SOLR-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943316#comment-13943316 ] Jack Krupansky commented on SOLR-5896: -- I have plenty of examples for these (and all other) update processors in my e-book: http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html Create and edit a CWiki page that describes UpdateRequestProcessors, especially FieldMutatingUpdateProcessors - Key: SOLR-5896 URL: https://issues.apache.org/jira/browse/SOLR-5896 Project: Solr Issue Type: Improvement Components: documentation Affects Versions: 4.8, 5.0 Reporter: Erick Erickson Assignee: Erick Erickson The capabilities here aren't really documented as a group anywhere I could see in the official pages; there are a couple of references to them but nothing that really draws attention to them. These need to be documented. Where does it make sense to put this? It doesn't really fit under Understanding Analyzers, Tokenizers, and Filters, except kinda, since they can be used to alter how data gets indexed - think of the Parse[Date|Int|Float..] factories. Straw-man: add child pages to Understanding Analyzers, Tokenizers, and Filters for What is an UpdateRequestProcessor, UpdateRequestProcessors, and probably something like How to configure your UpdateRequestProcessor. Or??? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5654) Create a synonym filter factory that is (re)configurable, and capable of reporting its configuration, via REST API
[ https://issues.apache.org/jira/browse/SOLR-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886574#comment-13886574 ] Jack Krupansky commented on SOLR-5654: -- Two reasonable and reliable use cases I have encountered: 1. Update or replace query-time synonyms - no risk for existing indexed data. 2. Add new index-time synonyms that will apply to new indexed documents - again, no expectation that they would apply to existing documents, but reindexing would of course apply them anyway. Create a synonym filter factory that is (re)configurable, and capable of reporting its configuration, via REST API -- Key: SOLR-5654 URL: https://issues.apache.org/jira/browse/SOLR-5654 Project: Solr Issue Type: Sub-task Components: Schema and Analysis Reporter: Steve Rowe A synonym filter factory could be (re)configurable via REST API by registering with the RESTManager described in SOLR-5653, and then responding to REST API calls to modify its init params and its synonyms resource file. Read-only (GET) REST API calls should also be provided, both for init params and the synonyms resource file. It should be possible to add/remove/modify one or more entries in the synonyms resource file. We should probably use JSON for the REST request body, as is done in the Schema REST API methods. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
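Since the issue proposes JSON request bodies as in the Schema REST API, one possible shape for an add/modify request body might look like the following. This is purely illustrative - the exact API shape is what this issue is meant to decide, and the structure and key names here (initArgs, managedMap) are assumptions:

```json
{
  "initArgs": { "ignoreCase": true },
  "managedMap": {
    "mad": ["angry", "upset"],
    "tv": ["television"]
  }
}
```

A GET on the same resource would then return both the init params and the current synonym mappings, matching the read-only half of the proposal.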
[jira] [Commented] (SOLR-5517) Treat POST with no Content-Type as application/x-www-form-urlencoded
[ https://issues.apache.org/jira/browse/SOLR-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13835838#comment-13835838 ] Jack Krupansky commented on SOLR-5517: -- What about curl commands? It is kind of an annoyance that you have to explicitly enter a Content-type. Treat POST with no Content-Type as application/x-www-form-urlencoded Key: SOLR-5517 URL: https://issues.apache.org/jira/browse/SOLR-5517 Project: Solr Issue Type: Improvement Reporter: Ryan Ernst Attachments: SOLR-5517.patch While the http spec states requests without a content-type should be treated as application/octet-stream, the html spec says instead that post requests without a content-type should be treated as a form (http://www.w3.org/MarkUp/html-spec/html-spec_8.html#SEC8.2.1). It would be nice to allow large search requests from html forms, and not have to rely on the browser to set the content type (since the spec says it doesn't have to). -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5401) In Solr's ResourceLoader, add a check for @Deprecated annotation in the plugin/analysis/... class loading code, so we print a warning in the log if a deprecated factory
[ https://issues.apache.org/jira/browse/SOLR-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808292#comment-13808292 ] Jack Krupansky commented on SOLR-5401: -- Solr has a logging admin API that will return recent log entries. For example: {code} curl "http://localhost:8983/solr/admin/logging?threshold=WARN&test&since=0&indent=true" {code} More examples and the API parameters are in the admin API section of my e-book that is currently in progress, but that isn't out yet. The source code is currently your best guide: org.apache.solr.handler.admin.LoggingHandler. In Solr's ResourceLoader, add a check for @Deprecated annotation in the plugin/analysis/... class loading code, so we print a warning in the log if a deprecated factory class is used -- Key: SOLR-5401 URL: https://issues.apache.org/jira/browse/SOLR-5401 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 3.6, 4.5 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 4.6, 5.0 Attachments: SOLR-5401.patch While changing an antique 3.6 schema.xml to Solr 4.5, I noticed that some factories were deprecated in 3.x and were no longer available in 4.x (e.g. solr._Language_PorterStemFilterFactory). If the user had gotten a notice earlier, this could have been prevented and the user would have upgraded sooner. In fact the factories were @Deprecated in 3.6, but the Solr loader does not print any warning. My proposal is to add some simple code to SolrResourceLoader so that it prints a warning about the deprecated class if any configuration setting loads a class with a @Deprecated annotation. So we can prevent that problem in the future. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5401) In Solr's ResourceLoader, add a check for @Deprecated annotation in the plugin/analysis/... class loading code, so we print a warning in the log if a deprecated factory
[ https://issues.apache.org/jira/browse/SOLR-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808551#comment-13808551 ] Jack Krupansky commented on SOLR-5401: -- I suspect that all the needed logic is sprinkled throughout the Solr logging API. Yes, probably way too much effort for this one test, but it would be good to have lots of other Solr features fully test their error and warning handling, so eventually this piece of test infrastructure would be valuable. In Solr's ResourceLoader, add a check for @Deprecated annotation in the plugin/analysis/... class loading code, so we print a warning in the log if a deprecated factory class is used -- Key: SOLR-5401 URL: https://issues.apache.org/jira/browse/SOLR-5401 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 3.6, 4.5 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 4.6, 5.0 Attachments: SOLR-5401.patch While changing an antique 3.6 schema.xml to Solr 4.5, I noticed that some factories were deprecated in 3.x and were no longer available in 4.x (e.g. solr._Language_PorterStemFilterFactory). If the user had gotten a notice earlier, this could have been prevented and the user would have upgraded sooner. In fact the factories were @Deprecated in 3.6, but the Solr loader does not print any warning. My proposal is to add some simple code to SolrResourceLoader so that it prints a warning about the deprecated class if any configuration setting loads a class with a @Deprecated annotation. So we can prevent that problem in the future. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org