[jira] [Commented] (SOLR-5936) Deprecate non-Trie-based numeric (and date) field types in 4.x and remove them from 5.0

2014-04-09 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964600#comment-13964600
 ] 

ASF subversion and git services commented on SOLR-5936:
---

Commit 1586106 from sar...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1586106 ]

SOLR-5936: Removed deprecated non-Trie-based numeric & date field types.

> Deprecate non-Trie-based numeric (and date) field types in 4.x and remove 
> them from 5.0
> ---
>
> Key: SOLR-5936
> URL: https://issues.apache.org/jira/browse/SOLR-5936
> Project: Solr
>  Issue Type: Task
>  Components: Schema and Analysis
>Reporter: Steve Rowe
>Assignee: Steve Rowe
>Priority: Minor
> Fix For: 4.8, 5.0
>
> Attachments: SOLR-5936.branch_4x.patch, SOLR-5936.branch_4x.patch, 
> SOLR-5936.branch_4x.patch, SOLR-5936.trunk.patch
>
>
> We've been discouraging people from using non-Trie numeric&date field types 
> for years, it's time we made it official.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5936) Deprecate non-Trie-based numeric (and date) field types in 4.x and remove them from 5.0

2014-03-31 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955381#comment-13955381
 ] 

Hoss Man commented on SOLR-5936:


bq. +1 to rename for 5.0

What exactly do you suggest renaming these Solr FieldTypes to?

If you are suggesting "TrieFooField -> FooField" then I am a _*HUGE*_ -1 to 
that idea.

It's one thing to say that things like the (text-based) IntField are 
deprecated, will not work in 5.0, and people have to reindex.  But if we _also_ 
rename TrieIntField to IntField, then people who are still using the 
(text-based) IntField in their schema.xml and attempt upgrading will get really 
weird, hard-to-understand errors.

If folks think Trie is a confusing word in the name and want to change that 
then fine -- I'm certainly open to the idea --  But we really should not re-use 
the name of an existing (deprecated/removed) field type in a way that isn't 
backcompat.



In any event, a lot of what's being discussed here in comments feels like it 
should really be tracked in discrete issues (these can all be dealt with 
independent of this issue, and each other):

* better javadocs for the trie numeric fields
* renaming the trie numeric fields
* simplifying configuration of the trie numeric fields

...let's please keep this issue focused on the deprecation & removal of the 
non-trie fields, and folks who care about these other ideas can file other 
JIRAs to track them.




[jira] [Commented] (SOLR-5936) Deprecate non-Trie-based numeric (and date) field types in 4.x and remove them from 5.0

2014-03-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954874#comment-13954874
 ] 

ASF subversion and git services commented on SOLR-5936:
---

Commit 1583226 from [~steve_rowe] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1583226 ]

SOLR-5936: don't use deprecated solr.IntField field type in Solr example tests 
(dynamic field '*_pi', using solr.IntField via the 'pint' field type, was 
removed from the main example schema)




[jira] [Commented] (SOLR-5936) Deprecate non-Trie-based numeric (and date) field types in 4.x and remove them from 5.0

2014-03-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954779#comment-13954779
 ] 

ASF subversion and git services commented on SOLR-5936:
---

Commit 1583179 from [~steve_rowe] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1583179 ]

SOLR-5936: Deprecate non-Trie-based numeric & date field types




[jira] [Commented] (SOLR-5936) Deprecate non-Trie-based numeric (and date) field types in 4.x and remove them from 5.0

2014-03-30 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954778#comment-13954778
 ] 

Steve Rowe commented on SOLR-5936:
--

I'm going to commit the 4.x patch now - the SOLR-5937 changes will depend on 
the main example schema changes here.




[jira] [Commented] (SOLR-5936) Deprecate non-Trie-based numeric (and date) field types in 4.x and remove them from 5.0

2014-03-30 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954722#comment-13954722
 ] 

Yonik Seeley commented on SOLR-5936:


+1 to commit last patch




[jira] [Commented] (SOLR-5936) Deprecate non-Trie-based numeric (and date) field types in 4.x and remove them from 5.0

2014-03-30 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954669#comment-13954669
 ] 

Yonik Seeley commented on SOLR-5936:


I think we can remove this from the example schema altogether?
{code}
+   

{code}

I had added it a long time ago for manual testing purposes.




[jira] [Commented] (SOLR-5936) Deprecate non-Trie-based numeric (and date) field types in 4.x and remove them from 5.0

2014-03-29 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954509#comment-13954509
 ] 

Steve Rowe commented on SOLR-5936:
--

bq. Could we take the pint, plong, pfloat and all that out of the example 
schema while we're at it? Maybe in trunk only? I think that trunk, at least, 
won't have to read indexes with these in it.

+1 - see the issue title :)




[jira] [Commented] (SOLR-5936) Deprecate non-Trie-based numeric (and date) field types in 4.x and remove them from 5.0

2014-03-29 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954508#comment-13954508
 ] 

Erick Erickson commented on SOLR-5936:
--

Could we take the pint, plong, pfloat and all that out of the example schema 
while we're at it? Maybe in trunk only? I think that trunk, at least, won't 
have to read indexes with these in it.




[jira] [Commented] (SOLR-5936) Deprecate non-Trie-based numeric (and date) field types in 4.x and remove them from 5.0

2014-03-29 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954476#comment-13954476
 ] 

Uwe Schindler commented on SOLR-5936:
-

Hi Jack,

bq. And if trie really is the best approach for numeric fields, why not just do 
all of this under the hood instead of polluting the field type names with 
"trie"? IOW, rename TrieIntField to IntField, etc.

This goes back to the introduction of these types in Lucene 2.9 / Solr 1.4. At 
that time everybody was using other field types, and names like IntField, 
SortableIntField, etc. were already taken. Because of that, the feature was 
introduced to Solr with a name based on the original donated code (by me). 
Shortly afterwards, Lucene renamed the field to "NumericField" and the query to 
"NumericRangeQuery". The term "trie" is no longer used in Lucene; only the term 
"precisionStep" remains (in the documentation) as a configurable flag for the 
number of additional terms. So "Trie(Int|Long|Float|Double|Date)Field" is just 
there for "backwards compatibility" with earlier indexes (in Solr 1.4) and now, 
because the name is baked in, there is no way to change it anymore.

+1 to rename for 5.0

bq. As part of this cleanup, could somebody volunteer to create a plain-English 
summary of exactly what a trie field really is, what good it is, and why we 
can't live without them? I've read the code and, okay, there is a sequence of 
bit shifts and generation of extra terms, but in plain English, what's the 
point?

See javadocs of NumericRangeQuery.

bq. Specifically, for example, does it matter if a field has an evenly 
distributed range of numeric values with little repetition vs. numeric codes 
where there is a relatively small number of distinct values (e.g., 1-10, or 
scores of 0-100 or dates in years between 1970 and 2014) and relatively high 
cardinality?

This does not matter because of the structure of the additional terms. The 
number of terms used for actual ranges is almost always close to the expected 
number (see javadocs of NumericRangeQuery, NRQ for short). It also does not 
matter whether it is a date, an int or a float. Internally, for trie, there are 
no floats or dates at all. Everything is mapped to sortable bits (meaning that 
if value_a < value_b, then also bits_of_value_a < bits_of_value_b). The size of 
the range has no real effect either. Lucene always matches approximately the 
same number of terms (a few hundred at maximum).
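
To make the "sortable bits" idea concrete, here is a minimal Python sketch, 
assuming IEEE 754 doubles. It is an illustration only, not Lucene's code; the 
function name is made up for this example, although the bit trick follows the 
approach that Lucene's NumericUtils is understood to use for doubles.

```python
import struct


def double_to_sortable_long(value):
    """Map a float to a signed 64-bit int whose order matches the float order.

    Hypothetical helper for illustration; not part of any Solr/Lucene API.
    """
    # Reinterpret the IEEE 754 double as a signed 64-bit integer.
    bits = struct.unpack(">q", struct.pack(">d", value))[0]
    # Negative floats compare in reverse order as raw bits, so flip every
    # bit except the sign bit for them.
    if bits < 0:
        bits ^= 0x7FFFFFFFFFFFFFFF
    return bits


# Floats listed in ascending order; their mapped values stay in ascending order.
values = [-1e10, -2.5, -0.0, 0.0, 1e-3, 1.0, 3.5, 1e10]
sortable = [double_to_sortable_long(v) for v in values]
```

Once every value is an integer with this property, dates, floats and ints can 
all go through the same prefix-term indexing.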

Simply said, you are indexing all numbers as bits, like strings of the form 
"10110110" (just compressed better), plus additional terms that strip some bits 
from the right (like "10110110", "101101", "1011", "10"). Ranges are then 
simplified so that the middle parts of the range are matched with shorter terms 
that match more documents. For that algorithm, the distribution of values is 
not that important. The index only grows by a small amount, because the shorter 
terms are rarer (approx. 12% more terms) but have large posting lists (many 
docs match). As those terms match many sequential docs, the posting lists are 
not that big (because of the delta encoding). So trie terms raise the index 
size by only a few percent, but make range queries extremely fast, because 
ranges can be matched with few terms hitting many documents.

bq. I mean, does trie do a uniformly great job for both of these extreme use 
cases, including for faceting?

It is not used for faceting. Faceting does not use the additional terms. For 
faceting, use DocValues instead of indexed fields. If you want to use Trie 
fields and don't want to search on them with ranges, you can switch off the 
additional terms by setting precisionStep to 0.

One last note from my side:
I agree with hiding the implementation details from the user. In my opinion, 
the user only needs 2 types of numerics: one with precisionStep=4 or 8 (I think 
the default in Solr is 8, although I disagree - e.g., Elasticsearch uses the 
Lucene default of 4) and another one with precisionStep=infinity (which is what 
0 would mean in Solr) for numerics that are only for sorting and don't need 
range queries.


[jira] [Commented] (SOLR-5936) Deprecate non-Trie-based numeric (and date) field types in 4.x and remove them from 5.0

2014-03-29 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954438#comment-13954438
 ] 

Jack Krupansky commented on SOLR-5936:
--

As part of this cleanup, could somebody volunteer to create a plain-English 
summary of exactly what a trie field really is, what good it is, and why we 
can't live without them? I've read the code and, okay, there is a sequence of 
bit shifts and generation of extra terms, but in plain English, what's the 
point?

I'm not asking for a recitation of the actual algorithm(s), but some 
intuitively accessible summary. I would note that the typical examples are for 
strings with prefixes rather than binary numbers.

See:
http://en.wikipedia.org/wiki/Trie

And, is trie really the best solution for number types? Does it actually have 
real value for float and double values?

And I would really like to see some plain, easily readable explanation of 
precision step. Again, especially for real numbers.

And how should precision step be used for dates?

I mean, other than assuring sort order, why bother with trie? Or more 
specifically, why does a Solr (or Lucene) user need to know that trie is used 
for the implementation?

Specifically, for example, does it matter if a field has an evenly distributed 
range of numeric values with little repetition vs. numeric codes where there is 
a relatively small number of distinct values (e.g., 1-10, or scores of 0-100 or 
dates in years between 1970 and 2014) and relatively high cardinality? I mean, 
does trie do a uniformly great job for both of these extreme use cases, 
including for faceting?

And if trie really is the best approach for numeric fields, why not just do all 
of this under the hood instead of polluting the field type names with "trie"? 
IOW, rename TrieIntField to IntField, etc.

To me, trie just seems like unnecessary noise to average users.

