[jira] [Issue Comment Edited] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

2011-05-05 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029412#comment-13029412
 ] 

Uwe Schindler edited comment on LUCENE-3065 at 5/5/11 4:22 PM:
---

Reverting the deletion of Mike's first comment (sorry):

{quote}
Patch against 3.x.

I moved the to/from byte[] methods from Solr's TrieField into Lucene's 
NumericUtils, and fixed FieldsWriter/Reader to use free bits in the field's 
flags to know if the field is Numeric, and which type.

I added a random test case to verify we now get the right NumericField back, 
when we stored NumericField during indexing.

Old indices are handled fine (you'll get a String-ified Field back like you did 
before).

Spookily, nothing failed in Solr... I assume there's somewhere in Solr that 
must now be fixed to handle the fact that a field can come back as 
NumericField?  Anyone know where...?
{quote}
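
The idea in the quoted comment (spare flag bits record the numeric type, and the value itself is stored as raw bytes) can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the class name, the constant names, and the bit positions are all assumptions, and ByteBuffer stands in for the NumericUtils to/from byte[] helpers.

```java
import java.nio.ByteBuffer;

/** Illustrative only: names and bit positions are assumptions, not the patch's. */
final class NumericFlagsSketch {
    // Hypothetical layout: the low three bits are assumed to be taken by
    // existing stored-field flags, so the numeric type goes in the next three.
    static final int NUMERIC_SHIFT = 3;
    static final int NUMERIC_MASK = 0x07 << NUMERIC_SHIFT;
    static final int TYPE_NONE = 0; // old indices: no numeric bits set
    static final int TYPE_INT = 1;
    static final int TYPE_LONG = 2;

    /** Returns flags with the numeric type packed into the spare bits. */
    static int withNumericType(int flags, int type) {
        return (flags & ~NUMERIC_MASK) | (type << NUMERIC_SHIFT);
    }

    /** Reads the numeric type back; TYPE_NONE means "treat as a plain field". */
    static int numericType(int flags) {
        return (flags & NUMERIC_MASK) >>> NUMERIC_SHIFT;
    }

    /** Fixed-width big-endian encoding of a long (8 bytes). */
    static byte[] longToBytes(long v) {
        return ByteBuffer.allocate(8).putLong(v).array();
    }

    static long bytesToLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }

    public static void main(String[] args) {
        int flags = withNumericType(0x1, TYPE_LONG);
        byte[] stored = longToBytes(-42L);
        System.out.println(numericType(flags) == TYPE_LONG); // true
        System.out.println(bytesToLong(stored));             // -42
    }
}
```

A flags value written by an older index has none of the numeric bits set, so numericType() returns TYPE_NONE and the reader can fall back to the old String-ified Field, which is the backwards-compatible behaviour the comment describes.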


  
 NumericField should be stored in binary format in index (matching Solr's 
 format)
 

 Key: LUCENE-3065
 URL: https://issues.apache.org/jira/browse/LUCENE-3065
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, 
 LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch


 (Spinoff of LUCENE-3001)
 Today when writing stored fields we don't record that the field was a 
 NumericField, and so at IndexReader time you get back an ordinary Field and 
 your number has turned into a string.  See 
 https://issues.apache.org/jira/browse/LUCENE-1701?focusedCommentId=12721972&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12721972
 We have spare bits already in stored fields, so, we should use one to record 
 that the field is numeric, and then encode the numeric field in Solr's 
 more-compact binary format.
 A nice side-effect is we fix the long standing issue that you don't get a 
 NumericField back when loading your document.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Edited] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

2011-05-04 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028707#comment-13028707
 ] 

Uwe Schindler edited comment on LUCENE-3065 at 5/4/11 11:06 AM:


This patch adds some refactoring: FieldSelectorResult has been an enum since 
3.0, so the (slow) chain of if-statements can be replaced by a fast switch.

It also adds some minor comments and a missing & 0xFF when casting byte to int.
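
For readers unfamiliar with the two points above, here is a small self-contained sketch. It is not taken from the patch: the class, the method names, and the Action enum (a stand-in for FieldSelectorResult) are made up for illustration. It shows why the & 0xFF mask matters when widening a byte to an int, and what switching on an enum looks like.

```java
final class ByteMaskSketch {
    /** Hypothetical stand-in for an enum such as FieldSelectorResult. */
    enum Action { LOAD, LAZY_LOAD, NO_LOAD }

    /** Big-endian decode of four bytes; the mask prevents sign extension. */
    static int toInt(byte[] b) {
        return ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16)
             | ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);
    }

    /** Same decode without the mask: wrong whenever a byte is >= 0x80. */
    static int toIntBuggy(byte[] b) {
        return (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3];
    }

    /** A switch on the enum replaces a chain of if-statements. */
    static String describe(Action a) {
        switch (a) {
            case LOAD:      return "load eagerly";
            case LAZY_LOAD: return "load lazily";
            default:        return "skip";
        }
    }

    public static void main(String[] args) {
        byte[] b = {0, 0, 0, (byte) 0x80};
        System.out.println(toInt(b));      // 128
        System.out.println(toIntBuggy(b)); // -128
        System.out.println(describe(Action.LAZY_LOAD)); // load lazily
    }
}
```

Without the mask, the byte 0x80 is widened to the int -128, and its sign bits overwrite everything ORed in from the higher-order bytes.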

  
 NumericField should be stored in binary format in index (matching Solr's 
 format)
 

 Key: LUCENE-3065
 URL: https://issues.apache.org/jira/browse/LUCENE-3065
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, 
 LUCENE-3065.patch





[jira] [Issue Comment Edited] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

2011-05-04 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028764#comment-13028764
 ] 

Uwe Schindler edited comment on LUCENE-3065 at 5/4/11 2:44 PM:
---

I added some javadocs to the Document class:
- getField() / getFields() are deprecated [we may change this in LUCENE-2310]

Some thoughts:
- Maybe we should make getField()/getFields() simply return null, or not 
include the field in the returned array, if it is not an instanceof Field? We 
can add to the documentation that lazy-loaded and numeric fields are not 
returned.
- I would also like to add a method Document.getNumericValue(s) that returns 
Number or Number[], like the NumericField one. Like getField() above, it could 
return null or an empty array if the field name has no numeric fields?

The CHANGES entry may also be extended; currently it is under bugs, so we 
should move it.
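
A rough sketch of what the proposed getNumericValue(s) accessors could look like. Everything here is hypothetical: the class, the nested stub types, and the method signatures only illustrate the "return null / empty array when there is no numeric field" behaviour suggested above, and are not Lucene's actual Document API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class NumericValuesSketch {
    /** Minimal stand-ins for stored fields; not Lucene's classes. */
    interface Fieldable { String name(); }

    static final class TextField implements Fieldable {
        private final String name;
        TextField(String name) { this.name = name; }
        public String name() { return name; }
    }

    static final class NumField implements Fieldable {
        private final String name;
        private final Number value;
        NumField(String name, Number value) { this.name = name; this.value = value; }
        public String name() { return name; }
        Number value() { return value; }
    }

    /** All numeric values stored under the name; empty array if none. */
    static Number[] getNumericValues(List<Fieldable> fields, String name) {
        List<Number> out = new ArrayList<>();
        for (Fieldable f : fields) {
            if (f.name().equals(name) && f instanceof NumField) {
                out.add(((NumField) f).value());
            }
        }
        return out.toArray(new Number[0]);
    }

    /** First numeric value stored under the name, or null if none. */
    static Number getNumericValue(List<Fieldable> fields, String name) {
        Number[] all = getNumericValues(fields, name);
        return all.length == 0 ? null : all[0];
    }

    public static void main(String[] args) {
        List<Fieldable> doc = Arrays.<Fieldable>asList(
            new TextField("title"), new NumField("price", 42), new NumField("price", 7L));
        System.out.println(getNumericValues(doc, "price").length); // 2
        System.out.println(getNumericValue(doc, "title"));         // null
    }
}
```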

  
 NumericField should be stored in binary format in index (matching Solr's 
 format)
 

 Key: LUCENE-3065
 URL: https://issues.apache.org/jira/browse/LUCENE-3065
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, 
 LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch, LUCENE-3065.patch





[jira] [Issue Comment Edited] (LUCENE-3065) NumericField should be stored in binary format in index (matching Solr's format)

2011-05-03 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028454#comment-13028454
 ] 

Uwe Schindler edited comment on LUCENE-3065 at 5/3/11 10:01 PM:


There is still a problem - first the good news:

- If the user calls Document.get(field), the returned string is as before, so 
there is no break at all. The reason is the implementation of 
NumericField.stringValue(): it returns what the user is used to from 3.0.
- If the user calls getFieldable(field), all is fine, too. The only change is 
that it could return a NumericField now. If the user simply calls 
stringValue(), all is identical to 3.0.

Problems start with:

- If the user calls Document.getField(name), it returns Field (internally it 
casts the getFieldable() result to Field). But NumericField does not subclass 
Field, so this throws a ClassCastException.

How to handle this?

- Maybe change those methods to return AbstractField, but that's a binary break 
and users will complain, because not everything works as expected.
- Make NumericField subclass Field (with Field made non-final) - that's a bad 
idea, because Field has too many methods / members that are out of scope.
- Deprecate Document.getField() and make it internally do an instanceof check; 
if it gets a NumericField, transform it into a backwards-compatible Field? 
This method is already broken: if you request lazy field loading, it also 
throws a ClassCastException (e.g. LUCENE-609).

Not sure how to proceed. Otherwise the patch looks fine. I think simply 
ignoring LazyField loading is fine, as numeric fields are a maximum of 8 bytes. 
Otherwise we would need LazyNumericField :(
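
To make the failure and the third option concrete, here is a self-contained sketch. The nested types are simplified stubs that only borrow the Lucene names, and castOnly() / backCompat() are hypothetical helpers, not methods of Document. The sketch assumes a backwards-compatible Field can be built from the numeric field's name and stringValue().

```java
final class GetFieldSketch {
    /** Simplified stubs; they only mirror the names of the Lucene types. */
    interface Fieldable {
        String name();
        String stringValue();
    }

    static class Field implements Fieldable {
        private final String name;
        private final String value;
        Field(String name, String value) { this.name = name; this.value = value; }
        public String name() { return name; }
        public String stringValue() { return value; }
    }

    /** Deliberately NOT a subclass of Field, as in the situation described. */
    static final class NumericField implements Fieldable {
        private final String name;
        private final Number value;
        NumericField(String name, Number value) { this.name = name; this.value = value; }
        public String name() { return name; }
        public String stringValue() { return value.toString(); }
    }

    /** Blind cast: throws ClassCastException for a NumericField. */
    static Field castOnly(Fieldable f) {
        return (Field) f;
    }

    /** instanceof check first; anything else is converted to a plain Field. */
    static Field backCompat(Fieldable f) {
        if (f == null || f instanceof Field) {
            return (Field) f;
        }
        return new Field(f.name(), f.stringValue());
    }

    public static void main(String[] args) {
        Fieldable stored = new NumericField("price", 42);
        try {
            castOnly(stored);
        } catch (ClassCastException e) {
            System.out.println("blind cast failed");
        }
        System.out.println(backCompat(stored).stringValue()); // 42
    }
}
```

The converted Field loses the numeric type, so callers that want the number would still need to go through getFieldable().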

  
 NumericField should be stored in binary format in index (matching Solr's 
 format)
 

 Key: LUCENE-3065
 URL: https://issues.apache.org/jira/browse/LUCENE-3065
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
Priority: Minor
 Fix For: 3.2, 4.0

 Attachments: LUCENE-3065.patch

