[JENKINS] Solr-3.x - Build # 451 - Failure

2011-08-30 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Solr-3.x/451/

1 tests failed.
REGRESSION:  org.apache.solr.core.TestJmxIntegration.testJmxOnCoreReload

Error Message:
Number of registered MBeans is not the same as info registry size expected:50 but was:51

Stack Trace:
junit.framework.AssertionFailedError: Number of registered MBeans is not the same as info registry size expected:50 but was:51
at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147)
at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
at org.apache.solr.core.TestJmxIntegration.testJmxOnCoreReload(TestJmxIntegration.java:137)




Build Log (for compile errors):
[...truncated 19353 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField

2011-08-30 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093483#comment-13093483
 ] 

Chris Male commented on LUCENE-3312:


I'm almost done getting an initial patch for this, just one issue remaining - 
IndexDocValues.  IndexDocValues can be both not indexed and not stored.  
Therefore when you retrieve the indexed fields and then the stored fields, you 
can miss some IndexDocValues.  It seems to me that we might need a 3rd 
interface to cover these fields? 
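
A minimal sketch of the gap described above, assuming a hypothetical SketchField class with three boolean flags (none of these names come from the field type branch). It collects everything reachable through an "indexed" pass and a "stored" pass and reports doc-values-only fields that neither pass sees:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a field; NOT the real Lucene API.
final class SketchField {
    final String name;
    final boolean indexed;
    final boolean stored;
    final boolean hasDocValues;

    SketchField(String name, boolean indexed, boolean stored, boolean hasDocValues) {
        this.name = name;
        this.indexed = indexed;
        this.stored = stored;
        this.hasDocValues = hasDocValues;
    }
}

final class MissedDocValuesDemo {
    // Names reachable by asking first for indexed fields, then for stored fields.
    static Set<String> reachable(List<SketchField> fields) {
        Set<String> seen = new LinkedHashSet<>();
        for (SketchField f : fields) {
            if (f.indexed) seen.add(f.name);
        }
        for (SketchField f : fields) {
            if (f.stored) seen.add(f.name);
        }
        return seen;
    }

    // Fields that carry doc values but appear in neither view.
    static List<String> missed(List<SketchField> fields) {
        Set<String> seen = reachable(fields);
        List<String> out = new ArrayList<>();
        for (SketchField f : fields) {
            if (f.hasDocValues && !seen.contains(f.name)) out.add(f.name);
        }
        return out;
    }

    // Example document: "popularity" has doc values only.
    static List<SketchField> sample() {
        return Arrays.asList(
            new SketchField("title", true, true, false),
            new SketchField("body", true, false, false),
            new SketchField("popularity", false, false, true));
    }
}
```

With the sample document, only the doc-values-only field is reported as missed, which is the case a third interface (or folding doc values into one of the two existing ones) would have to cover.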

 Break out StorableField from IndexableField
 ---

 Key: LUCENE-3312
 URL: https://issues.apache.org/jira/browse/LUCENE-3312
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
 Fix For: Field Type branch


 In the field type branch we have strongly decoupled
 Document/Field/FieldType impl from the indexer, by having only a
 narrow API (IndexableField) passed to IndexWriter.  This frees apps up
 to use their own documents instead of the user-space impls we provide
 in oal.document.
 Similarly, with LUCENE-3309, we've done the same thing on the
 doc/field retrieval side (from IndexReader), with the
 StoredFieldsVisitor.
 But, maybe we should break out StorableField from IndexableField,
 such that when you index a doc you provide two Iterables -- one for the
 IndexableFields and one for the StorableFields.  Either can be null.
 One downside is possible perf hit for fields that are both indexed and
 stored (ie, we visit them twice, lookup their name in a hash twice,
 etc.).  But the upside is a cleaner separation of concerns in API.
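
The two-Iterables idea in the description could look roughly like the sketch below. Everything here is hypothetical: SketchIndexable, SketchStorable, BothField and TwoIterableWriter are illustration names, and the real IndexableField/IndexWriter signatures on the branch may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical narrow interfaces; not the real Lucene signatures.
interface SketchIndexable {
    String name();
}

interface SketchStorable {
    String name();
    String stringValue();
}

// A field that is both indexed and stored implements both interfaces,
// so it is handed to the writer twice (the perf concern in the description).
final class BothField implements SketchIndexable, SketchStorable {
    private final String name;
    private final String value;

    BothField(String name, String value) {
        this.name = name;
        this.value = value;
    }

    @Override public String name() { return name; }
    @Override public String stringValue() { return value; }
}

final class TwoIterableWriter {
    final List<String> inverted = new ArrayList<>();
    final List<String> storedValues = new ArrayList<>();

    // Either argument may be null, meaning "nothing to index" or "nothing to store".
    void addDocument(Iterable<? extends SketchIndexable> indexable,
                     Iterable<? extends SketchStorable> storable) {
        if (indexable != null) {
            for (SketchIndexable f : indexable) inverted.add(f.name());
        }
        if (storable != null) {
            for (SketchStorable f : storable) storedValues.add(f.name() + "=" + f.stringValue());
        }
    }
}
```

Passing the same BothField instance in both iterables shows the double visit; passing null for the first argument stores a document without inverting anything.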

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira






[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField

2011-08-30 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093499#comment-13093499
 ] 

Simon Willnauer commented on LUCENE-3312:
-

bq. I'm almost done getting an initial patch for this, just one issue remaining 
- IndexDocValues. IndexDocValues can be both not indexed and not stored. 
Therefore when you retrieve the indexed fields and then the stored fields, you 
can miss some IndexDocValues. It seems to me that we might need a 3rd interface 
to cover these fields?

To me it appears that we need some clarification of what DocValues are. Actually, 
when you think about it, Stored Fields and DocValues have a lot in common. A 
Stored Field is basically a DocValues DerefVarBytes type, and maybe down the 
road we should think about merging those two types together. It would be nice to 
have only one typesafe API that can store whatever you want, and based on the 
codec Lucene would decide how to store it on disk, i.e. whether it is a 
multi-field container like Stored Fields are done today or whether the values 
are split apart like DocValues does it today.
For now we should try to differentiate between an InvertedField and a 
StoredField, i.e. everything which is not an InvertedField is a StoredField. The 
API could basically already reflect that DocValues and StoredFields are the 
same and simply specify a type like Store.Packed vs. Store.ColumnStride or 
something like that. If we do that we could also expose loading Packed Fields 
via PerDocValues and have one API for our users.
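
A rough sketch of what a single stored-value API with a Store.Packed vs. Store.ColumnStride switch might look like. The enum and class names below (SketchStore, SketchStoredField, SketchStoredWriter) are assumptions made for illustration; no such API exists in the source thread beyond the two type names being floated.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical storage layouts, mirroring the Store.Packed / Store.ColumnStride idea.
enum SketchStore { PACKED, COLUMN_STRIDE }

final class SketchStoredField {
    final String name;
    final String value;
    final SketchStore store;

    SketchStoredField(String name, String value, SketchStore store) {
        this.name = name;
        this.value = value;
        this.store = store;
    }
}

final class SketchStoredWriter {
    // PACKED: one record per document holding all of its packed fields together.
    final List<List<String>> packedRecords = new ArrayList<>();
    // COLUMN_STRIDE: one column per field name, values appended in document order.
    // (This sketch assumes every document supplies each column-stride field.)
    final Map<String, List<String>> columns = new LinkedHashMap<>();

    void addDocument(List<SketchStoredField> fields) {
        List<String> record = new ArrayList<>();
        for (SketchStoredField f : fields) {
            if (f.store == SketchStore.PACKED) {
                record.add(f.name + "=" + f.value);
            } else {
                columns.computeIfAbsent(f.name, k -> new ArrayList<>()).add(f.value);
            }
        }
        packedRecords.add(record);
    }
}
```

The caller uses one field class either way; only the layout flag decides whether the value lands in the per-document record or in the per-field column.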




[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField

2011-08-30 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093513#comment-13093513
 ] 

Chris Male commented on LUCENE-3312:


Just for clarification, Packed refers to the notion of a stored field? or am I 
lost?




[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField

2011-08-30 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093520#comment-13093520
 ] 

Simon Willnauer commented on LUCENE-3312:
-

bq. Just for clarification, Packed refers to the notion of a stored field? or 
am I lost?
Yes, since we pack all fields together into one location and store only the 
offset to the first field per document. I just used that term here to 
differentiate; sorry for the confusion.
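
To make the "packed" layout concrete, here is a toy sketch: all field values of all documents go into one byte stream, and only the start offset of each document's first field is kept. The class name PackedStoreSketch and the one-byte length prefix are assumptions for illustration; the actual on-disk stored fields format is different.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Toy packed layout; NOT the real Lucene stored fields format.
final class PackedStoreSketch {
    private final ByteArrayOutputStream data = new ByteArrayOutputStream();
    // One entry per document: offset of that document's first field in `data`.
    private final List<Integer> docStart = new ArrayList<>();

    // Appends all field values of one document back to back; returns the doc id.
    int addDocument(List<String> fieldValues) {
        docStart.add(data.size());
        for (String v : fieldValues) {
            byte[] bytes = v.getBytes(StandardCharsets.UTF_8);
            if (bytes.length > 255) {
                throw new IllegalArgumentException("sketch supports values up to 255 bytes");
            }
            data.write(bytes.length);            // one-byte length prefix
            data.write(bytes, 0, bytes.length);  // value bytes
        }
        return docStart.size() - 1;
    }

    // Reads a document by seeking to its start offset and scanning to the next document's start.
    List<String> document(int docId) {
        byte[] all = data.toByteArray();
        int pos = docStart.get(docId);
        int end = docId + 1 < docStart.size() ? docStart.get(docId + 1) : all.length;
        List<String> out = new ArrayList<>();
        while (pos < end) {
            int len = all[pos] & 0xFF;
            out.add(new String(all, pos + 1, len, StandardCharsets.UTF_8));
            pos += 1 + len;
        }
        return out;
    }
}
```

Loading one field of one document still means scanning that document's record, which is the contrast with a column-stride layout where each field has its own per-document sequence.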




[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField

2011-08-30 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093525#comment-13093525
 ] 

Chris Male commented on LUCENE-3312:


Given what you say about the similarities between stored fields and DocValues 
and the direction we seem to be heading, I think it's a good term to start using.




[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 357 - Failure

2011-08-30 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/357/

No tests ran.

Build Log (for compile errors):
[...truncated 10533 lines...]






[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField

2011-08-30 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093698#comment-13093698
 ] 

Chris Male commented on LUCENE-3312:


{quote}
Good question... I think the userland Field (oal.document) should
implement both IndexableField and StorableField? And then
oal.document.Document holds Field instances?
{quote}

Hm, I'm going round in circles on this.  For building and indexing a Document, 
having the class hold Field instances is the easiest and cleanest option.  
However, this then means we are once again providing Field instances in the 
Document returned by reader.document(), meaning we lose:

{quote}
So I consider this (these indexing details are no longer available
when you pull the document) a big benefit of cutting over to
StorableField. Ie, it's trappy today since it's buggy, so we'd be
removing that trap.
{quote}

Thoughts? :/




[jira] [Created] (LUCENE-3407) wrong stats/scoring from MemoryCodec

2011-08-30 Thread Robert Muir (JIRA)
wrong stats/scoring from MemoryCodec


 Key: LUCENE-3407
 URL: https://issues.apache.org/jira/browse/LUCENE-3407
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: flexscoring branch, 4.0
Reporter: Robert Muir
 Attachments: LUCENE-3407_test.patch

I hit some random failures in the flexscoring branch: weird because it's not a 
random test.

I noticed the test always failed with MemoryCodec, and wrote a specific test 
for it.

I haven't traced through it yet, but I think the likely issue is that 
MemoryCodec is somehow returning wrong stats here?




[jira] [Updated] (LUCENE-3407) wrong stats/scoring from MemoryCodec

2011-08-30 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3407:


Attachment: LUCENE-3407_test.patch

patch with a test, goes against the flex scoring branch.




[jira] [Commented] (LUCENE-3407) wrong stats/scoring from MemoryCodec

2011-08-30 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093718#comment-13093718
 ] 

Robert Muir commented on LUCENE-3407:
-

The issue only happens with omitTF; my guess is that it's returning something 
strange for totalTermFreq... will try to make a better test.




[jira] [Updated] (LUCENE-3407) wrong stats/scoring from MemoryCodec

2011-08-30 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3407:


Attachment: LUCENE-3407.patch

patch for trunk.




[jira] [Resolved] (LUCENE-3407) wrong stats/scoring from MemoryCodec

2011-08-30 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-3407.
-

Resolution: Fixed

I committed this: no problems with any other codecs.




[jira] [Commented] (LUCENE-3407) wrong stats/scoring from MemoryCodec

2011-08-30 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093791#comment-13093791
 ] 

Michael McCandless commented on LUCENE-3407:


Phew, nice catch!




[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField

2011-08-30 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093802#comment-13093802
 ] 

Chris Male commented on LUCENE-3312:


So one thought would be to have a different class being returned 
reader.document(), we could call it StoredDocument and it would only make 
access to StorableFields.  I like this idea since it gets people over the hump 
of piggybacking Field, but it is a bw compat break.  Any objections?
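
A minimal sketch of the StoredDocument idea, assuming hypothetical SketchStorableField, SketchStoredDocument and SketchReader types (only the name StoredDocument is proposed in the comment; the methods below are invented for illustration). The point is that the type returned on the retrieval side exposes stored values only, with no indexing details to misread:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical retrieval-side view of a field: a name and a stored value, nothing else.
interface SketchStorableField {
    String name();
    String stringValue();
}

// Hypothetical StoredDocument: holds only storable fields.
final class SketchStoredDocument {
    private final List<SketchStorableField> fields = new ArrayList<>();

    void add(final String name, final String value) {
        fields.add(new SketchStorableField() {
            @Override public String name() { return name; }
            @Override public String stringValue() { return value; }
        });
    }

    // Returns the first stored value for the given name, or null if absent.
    String get(String name) {
        for (SketchStorableField f : fields) {
            if (f.name().equals(name)) return f.stringValue();
        }
        return null;
    }

    List<SketchStorableField> getFields() {
        return Collections.unmodifiableList(fields);
    }
}

// Stand-in for the reader side; document() returns the narrow type, not an indexing Document.
final class SketchReader {
    private final List<SketchStoredDocument> docs = new ArrayList<>();

    int store(SketchStoredDocument doc) {
        docs.add(doc);
        return docs.size() - 1;
    }

    SketchStoredDocument document(int docId) {
        return docs.get(docId);
    }
}
```

Because SketchStoredDocument is a different type from whatever is passed to the writer, code that round-trips a retrieved document straight back into indexing no longer compiles, which is the backwards-compatibility break mentioned above.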




[jira] [Issue Comment Edited] (LUCENE-3312) Break out StorableField from IndexableField

2011-08-30 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093802#comment-13093802
 ] 

Chris Male edited comment on LUCENE-3312 at 8/30/11 3:17 PM:
-

So one thought would be to have a different class being returned by 
reader.document(), we could call it StoredDocument and it would only make 
access to StorableFields.  I like this idea since it gets people over the hump 
of piggybacking Field, but it is a bw compat break.  Any objections?

  was (Author: cmale):
So one thought would be to have a different class being returned 
reader.document(), we could call it StoredDocument and it would only make 
access to StorableFields.  I like this idea since it gets people over the hump 
of piggybacking Field, but it is a bw compat break.  Any objections?
  



[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField

2011-08-30 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093818#comment-13093818
 ] 

Michael McCandless commented on LUCENE-3312:


I agree DocValues seem both indexed and stored, but they are closer to
stored field so let's put them under StorableField.

And indeed we could impl stored fields as a DerefVarBytes doc values
field, but I think we should hold off on unifying this in the APIs we
are creating here?

Ie, StorableField should just have a .docValues() method and if that
returns non-null value the indexer will index those doc values
(likewise for .stringValue(), binaryValue(), etc.)?
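
The "non-null means present" convention suggested here might be sketched as follows. SketchValueField, SimpleValueField and SketchValueIndexer are hypothetical names, and a plain Long stands in for whatever type .docValues() would really return on the branch:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical StorableField-like interface: each accessor returns null when that kind of value is absent.
interface SketchValueField {
    String name();
    String stringValue();
    byte[] binaryValue();
    Long docValues(); // stand-in type; the real doc values type is not given in the thread
}

final class SimpleValueField implements SketchValueField {
    private final String name;
    private final String string;
    private final byte[] binary;
    private final Long docValues;

    SimpleValueField(String name, String string, byte[] binary, Long docValues) {
        this.name = name;
        this.string = string;
        this.binary = binary;
        this.docValues = docValues;
    }

    @Override public String name() { return name; }
    @Override public String stringValue() { return string; }
    @Override public byte[] binaryValue() { return binary; }
    @Override public Long docValues() { return docValues; }
}

final class SketchValueIndexer {
    // Records what the indexer would write, in the order it checks the accessors.
    final List<String> log = new ArrayList<>();

    void process(SketchValueField f) {
        if (f.stringValue() != null) log.add("stored-string:" + f.name());
        if (f.binaryValue() != null) log.add("stored-binary:" + f.name());
        if (f.docValues() != null) log.add("doc-values:" + f.name());
    }
}
```

A field with only doc values simply returns null from the other accessors, so no separate third interface is needed in this arrangement.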





[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-30 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093816#comment-13093816
 ] 

Robert Muir commented on LUCENE-2308:
-

Thanks Mike! This is really helpful!

While reviewing this/merging the flexscoring branch, I had a few ideas for 
improvements:
* I think FT should be immutable? Personally, I don't think we should enforce 
patterns like Freezable or Builder at this low level; instead I think 
FieldType should be a simple immutable class with a single ctor that takes the 
minimal stuff that we (core Lucene) need. It can still be concrete, but then 
you have to specify everything. Then, things like TextField/StringField are 
sugar APIs for common configurations. I don't like the idea of mutable 
FieldTypes that are reused across different fields because I am concerned that 
somehow the 'wrong configuration' will be applied accidentally.
* Along these lines, we can then remove the copy constructor, which also 
seems unnatural to Java users: since FieldType would then be immutable, there 
is no reason to ever copy it.
* I think BinaryField should be able to index as binary? This is a new feature 
in Lucene 4 but it's unfortunately really hard to do: there are a few 
approaches, but they are all difficult: custom 
tokenstream/AttributeImpls/AttributeFactory etc.
* In the future, a BinaryField like this could be a base impl for 
CollationField: due to historical reasons we expose this capability as an 
Analyzer but I think this isn't great: it's really an implementation detail. For 
example in Solr, it's a real FieldType that uses an analyzer behind the scenes. 
So in this sense I think it's more consistent with NumericField.
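
The immutable-FieldType suggestion in the first two bullets could be sketched like this. The class names and the particular four flags are assumptions for illustration, as are the flag values chosen for the text/string sugar; they are not taken from the patch.

```java
// Hypothetical immutable field type: every setting is fixed in the single constructor.
final class ImmutableFieldTypeSketch {
    final boolean indexed;
    final boolean stored;
    final boolean tokenized;
    final boolean omitNorms;

    ImmutableFieldTypeSketch(boolean indexed, boolean stored, boolean tokenized, boolean omitNorms) {
        this.indexed = indexed;
        this.stored = stored;
        this.tokenized = tokenized;
        this.omitNorms = omitNorms;
    }
}

// A field holds its value plus a reference to a (shareable, because immutable) type.
final class SketchFieldWithType {
    final String name;
    final String value;
    final ImmutableFieldTypeSketch type;

    SketchFieldWithType(String name, String value, ImmutableFieldTypeSketch type) {
        this.name = name;
        this.value = value;
        this.type = type;
    }
}

// Sugar for common configurations, in the spirit of TextField/StringField.
final class SketchSugar {
    // Assumed defaults: tokenized text with norms vs. a single untokenized term without norms.
    static final ImmutableFieldTypeSketch TEXT = new ImmutableFieldTypeSketch(true, false, true, false);
    static final ImmutableFieldTypeSketch STRING = new ImmutableFieldTypeSketch(true, false, false, true);

    static SketchFieldWithType textField(String name, String value) {
        return new SketchFieldWithType(name, value, TEXT);
    }

    static SketchFieldWithType stringField(String name, String value) {
        return new SketchFieldWithType(name, value, STRING);
    }
}
```

Since nothing can mutate TEXT or STRING after construction, sharing one instance across many fields cannot apply a changed configuration by accident, and a copy constructor has no use.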


 Separately specify a field's type
 -

 Key: LUCENE-2308
 URL: https://issues.apache.org/jira/browse/LUCENE-2308
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: LUCENE-2308-10.patch, LUCENE-2308-11.patch, 
 LUCENE-2308-12.patch, LUCENE-2308-13.patch, LUCENE-2308-14.patch, 
 LUCENE-2308-15.patch, LUCENE-2308-16.patch, LUCENE-2308-17.patch, 
 LUCENE-2308-18.patch, LUCENE-2308-19.patch, LUCENE-2308-2.patch, 
 LUCENE-2308-20.patch, LUCENE-2308-21.patch, LUCENE-2308-3.patch, 
 LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, 
 LUCENE-2308-7.patch, LUCENE-2308-8.patch, LUCENE-2308-9.patch, 
 LUCENE-2308-branch.patch, LUCENE-2308-final.patch, LUCENE-2308-ltc.patch, 
 LUCENE-2308-merge-1.patch, LUCENE-2308-merge-2.patch, 
 LUCENE-2308-merge-3.patch, LUCENE-2308.branchdiffs, 
 LUCENE-2308.branchdiffs.moved, LUCENE-2308.patch, LUCENE-2308.patch, 
 LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch


 This came up from discussions on IRC.  I'm summarizing here...
 Today when you make a Field to add to a document you can set things
 like index or not, stored or not, analyzed or not, details like omitTfAP,
 omitNorms, index term vectors (separately controlling
 offsets/positions), etc.
 I think we should factor these out into a new class (FieldType?).
 Then you could re-use this FieldType instance across multiple fields.
 The Field instance would still hold the actual value.
 We could then do per-field analyzers by adding a setAnalyzer on the
 FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise
 for per-field codecs (with flex), where we now have
 PerFieldCodecWrapper).
 This would NOT be a schema!  It's just refactoring what we already
 specify today.  EG it's not serialized into the index.
 This has been discussed before, and I know Michael Busch opened a more
 ambitious (I think?) issue.  I think this is a good first baby step.  We could
 consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold
 off on that for starters...
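
As a sketch of the per-field analyzer point in the description: if the analyzer hangs off the field's type, each field finds its analyzer through its own type and no name-to-analyzer wrapper is needed. All names below are hypothetical, and a plain Function from text to tokens stands in for a real Analyzer.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical field type carrying its own analyzer (a function from text to tokens).
final class AnalyzedTypeSketch {
    final boolean stored;
    final Function<String, List<String>> analyzer;

    AnalyzedTypeSketch(boolean stored, Function<String, List<String>> analyzer) {
        this.stored = stored;
        this.analyzer = analyzer;
    }
}

// The field keeps the value; the analysis configuration comes from the shared type.
final class TypedFieldSketch {
    final String name;
    final String value;
    final AnalyzedTypeSketch type;

    TypedFieldSketch(String name, String value, AnalyzedTypeSketch type) {
        this.name = name;
        this.value = value;
        this.type = type;
    }

    List<String> tokens() {
        return type.analyzer.apply(value);
    }
}

// Two toy analyzers used for illustration only.
final class AnalyzerSketches {
    // Splits on runs of whitespace.
    static List<String> whitespace(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Emits the whole value as a single token.
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }
}
```

One AnalyzedTypeSketch instance can be handed to any number of fields, which is the re-use across multiple fields that the description asks for.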




[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField

2011-08-30 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093825#comment-13093825
 ] 

Michael McCandless commented on LUCENE-3312:


bq. So one thought would be to have a different class being returned by 
reader.document(), we could call it StoredDocument and it would only make 
access to StorableFields.

+1

 Break out StorableField from IndexableField
 ---

 Key: LUCENE-3312
 URL: https://issues.apache.org/jira/browse/LUCENE-3312
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
 Fix For: Field Type branch


 In the field type branch we have strongly decoupled
 Document/Field/FieldType impl from the indexer, by having only a
 narrow API (IndexableField) passed to IndexWriter.  This frees apps up
 to use their own documents instead of the user-space impls we provide
 in oal.document.
 Similarly, with LUCENE-3309, we've done the same thing on the
 doc/field retrieval side (from IndexReader), with the
 StoredFieldsVisitor.
 But, maybe we should break out StorableField from IndexableField,
 such that when you index a doc you provide two Iterables -- one for the
 IndexableFields and one for the StorableFields.  Either can be null.
 One downside is a possible perf hit for fields that are both indexed and
 stored (ie, we visit them twice, look up their name in a hash twice,
 etc.).  But the upside is a cleaner separation of concerns in the API.
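
A minimal sketch of the two-Iterable idea, assuming hypothetical names (SketchIndexableField, SketchStorableField, SketchTextField, SketchWriter) that are not the actual Lucene API. A field that is both indexed and stored shows up in both Iterables, which is the double visit mentioned as the downside:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interfaces for illustration only; not the real Lucene API.
interface SketchIndexableField {
  String name();
}

interface SketchStorableField {
  String name();
  String storedValue();
}

// A field that is both indexed and stored implements both interfaces.
final class SketchTextField implements SketchIndexableField, SketchStorableField {
  private final String name;
  private final String value;

  SketchTextField(String name, String value) {
    this.name = name;
    this.value = value;
  }

  public String name() { return name; }
  public String storedValue() { return value; }
}

final class SketchWriter {
  // Records which field names were visited on each side.
  final List<String> indexedNames = new ArrayList<>();
  final List<String> storedNames = new ArrayList<>();

  // Either Iterable may be null, as proposed in the issue.
  void addDocument(Iterable<? extends SketchIndexableField> indexed,
                   Iterable<? extends SketchStorableField> stored) {
    if (indexed != null) {
      for (SketchIndexableField f : indexed) {
        indexedNames.add(f.name());
      }
    }
    if (stored != null) {
      for (SketchStorableField f : stored) {
        storedNames.add(f.name());
      }
    }
  }
}
```

Passing the same field object in both Iterables makes the cost visible: its name is read once per side.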




[jira] [Commented] (LUCENE-3206) FST package API refactoring

2011-08-30 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13093830#comment-13093830
 ] 

David Smiley commented on LUCENE-3206:
--

I'm looking forward to these FST API improvements. It's a bit obtuse for 
something that is basically a SortedMap.

 FST package API refactoring
 ---

 Key: LUCENE-3206
 URL: https://issues.apache.org/jira/browse/LUCENE-3206
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/FSTs
Affects Versions: 3.2
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Minor
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3206.patch


 The current API is still marked @experimental, so I think there's still time 
 to fiddle with it. I've been using the current API for some time and I do 
 have some ideas for improvement. This is a placeholder for these -- I'll post 
 a patch once I have a working proof of concept.




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-30 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13093856#comment-13093856
 ] 

Michael McCandless commented on LUCENE-2308:



bq.  I think FT should be immutable?
bq.  I don't like the idea of mutable FieldTypes that are reused across 
different fields because I am concerned that somehow the 'wrong configuration' 
will be applied accidentally.

This is why we have FT.freeze, and why an FT is frozen as soon as it's
used in a Field.  But I agree it'd be even better if we had true
immutability (all fields in FT are final).
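
A rough sketch of the freeze-on-use behaviour described above, assuming hypothetical class names (FreezableFieldType, FrozenOnUseField) rather than the real classes on the branch:

```java
// Hypothetical sketch of the freeze idea; not the actual Lucene FieldType.
final class FreezableFieldType {
  private boolean frozen;
  private boolean omitNorms;

  void setOmitNorms(boolean omitNorms) {
    checkNotFrozen();
    this.omitNorms = omitNorms;
  }

  boolean omitNorms() {
    return omitNorms;
  }

  // After this call every setter throws.
  void freeze() {
    frozen = true;
  }

  private void checkNotFrozen() {
    if (frozen) {
      throw new IllegalStateException("this FieldType is frozen and cannot be changed");
    }
  }
}

// Using a type in a field freezes it, so later edits cannot silently
// change the configuration of fields that already share it.
final class FrozenOnUseField {
  final String name;
  final String value;
  final FreezableFieldType type;

  FrozenOnUseField(String name, String value, FreezableFieldType type) {
    type.freeze();
    this.name = name;
    this.value = value;
    this.type = type;
  }
}
```

With true immutability the frozen flag and the runtime check would go away, since there would be no setters to guard.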

bq. I think FieldType should be a simple immutable class with a single ctor 
that takes the minimal stuff that we (core lucene) need.
bq. It can still be concrete, but then you have to specify everything. Then, 
things like TextField/StringField are sugar APIs for common configurations.

This is a neat idea!

Another plus is this is a single place where we can check consistency
of the settings (eg you cannot enable term vectors if indexed is
false).
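
As an illustration only, a single-ctor immutable type with that consistency check might look like the following. The class name and the exact set of parameters are assumptions, not the API on the branch:

```java
// Hypothetical sketch; the real FieldType carries more settings than this.
final class ImmutableFieldTypeSketch {
  final boolean indexed;
  final boolean stored;
  final boolean tokenized;
  final boolean omitNorms;
  final boolean storeTermVectors;

  ImmutableFieldTypeSketch(boolean indexed, boolean stored, boolean tokenized,
                           boolean omitNorms, boolean storeTermVectors) {
    // The one place where inconsistent combinations are rejected.
    if (storeTermVectors && !indexed) {
      throw new IllegalArgumentException(
          "cannot enable term vectors when indexed is false");
    }
    this.indexed = indexed;
    this.stored = stored;
    this.tokenized = tokenized;
    this.omitNorms = omitNorms;
    this.storeTermVectors = storeTermVectors;
  }
}
```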

So this would mean we'd have alternate ctors to the sugar classes for
the common cases, like maybe:
{noformat}
   new StringField(name, value)
   new StoredStringField(name, value)
{noformat}

StringField would always omitNorms, not tokenize, index DOCS_ONLY.

For TextField maybe:
{noformat}
   new TextField(name, value)
   new TextField(name, value, omitNorms)
   new TextField(name, value, omitNorms, indexTVPos, indexTVOffsets)
   new StoredTextField(name, value)
   new StoredTextField(name, value, omitNorms)
   new StoredTextField(name, value, omitNorms, indexTVPos, indexTVOffsets)
{noformat}

Expert usage would always have the out of invoking FT directly with
all options.  Even more expert usage can bypass the userspace
FieldType/Field/Document entirely and code directly to IndexableField
instead.

bq. I think BinaryField should be able to index as binary?

I agree!  Not sure on the details of how we'd do that though... Today
this field is only stored byte[].


 Separately specify a field's type
 -

 Key: LUCENE-2308
 URL: https://issues.apache.org/jira/browse/LUCENE-2308
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: LUCENE-2308-10.patch, LUCENE-2308-11.patch, 
 LUCENE-2308-12.patch, LUCENE-2308-13.patch, LUCENE-2308-14.patch, 
 LUCENE-2308-15.patch, LUCENE-2308-16.patch, LUCENE-2308-17.patch, 
 LUCENE-2308-18.patch, LUCENE-2308-19.patch, LUCENE-2308-2.patch, 
 LUCENE-2308-20.patch, LUCENE-2308-21.patch, LUCENE-2308-3.patch, 
 LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, 
 LUCENE-2308-7.patch, LUCENE-2308-8.patch, LUCENE-2308-9.patch, 
 LUCENE-2308-branch.patch, LUCENE-2308-final.patch, LUCENE-2308-ltc.patch, 
 LUCENE-2308-merge-1.patch, LUCENE-2308-merge-2.patch, 
 LUCENE-2308-merge-3.patch, LUCENE-2308.branchdiffs, 
 LUCENE-2308.branchdiffs.moved, LUCENE-2308.patch, LUCENE-2308.patch, 
 LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch





[JENKINS] Lucene-Solr-tests-only-trunk - Build # 10390 - Failure

2011-08-30 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/10390/

All tests passed

Build Log (for compile errors):
[...truncated 12938 lines...]






[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-30 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13093880#comment-13093880
 ] 

Chris Male commented on LUCENE-2308:


Much of what has been suggested here I'm looking at incorporating in 
LUCENE-3312, but perhaps it's best to do it in small steps. 

What I want to do is as follows:

- Change FieldType to an interface inside index.* and use it for the source of 
properties about an IndexableField.  It will be simple and immutable and won't 
enforce any creation techniques.
- Add a builder for FieldType to document.* which will create FieldType 
instances. 
- Add the syntactic sugar ctors suggested above which would use the builder to 
instantiate the FieldTypes they need.
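
The three steps could be sketched roughly as below. All names (FieldTypeView, FieldTypeBuilder, StringFieldSketch) are hypothetical and the property set is cut down to four flags for brevity:

```java
// Hypothetical names throughout; a sketch of the three steps, not the real API.

// Step 1: a simple read-only view of a field's properties.
interface FieldTypeView {
  boolean indexed();
  boolean stored();
  boolean tokenized();
  boolean omitNorms();
}

// Step 2: a builder that produces immutable FieldTypeView instances.
final class FieldTypeBuilder {
  private boolean indexed, stored, tokenized, omitNorms;

  FieldTypeBuilder indexed(boolean v) { indexed = v; return this; }
  FieldTypeBuilder stored(boolean v) { stored = v; return this; }
  FieldTypeBuilder tokenized(boolean v) { tokenized = v; return this; }
  FieldTypeBuilder omitNorms(boolean v) { omitNorms = v; return this; }

  FieldTypeView build() {
    // Copy into finals so later builder calls cannot affect the result.
    final boolean i = indexed, s = stored, t = tokenized, o = omitNorms;
    return new FieldTypeView() {
      public boolean indexed() { return i; }
      public boolean stored() { return s; }
      public boolean tokenized() { return t; }
      public boolean omitNorms() { return o; }
    };
  }
}

// Step 3: a sugar class for a common configuration (omit norms, not tokenized).
final class StringFieldSketch {
  // Built once and shared, so indexing does not create one type per field.
  static final FieldTypeView TYPE = new FieldTypeBuilder()
      .indexed(true).stored(false).tokenized(false).omitNorms(true).build();

  final String name;
  final String value;

  StringFieldSketch(String name, String value) {
    this.name = name;
    this.value = value;
  }

  FieldTypeView type() { return TYPE; }
}
```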

 Separately specify a field's type
 -

 Key: LUCENE-2308
 URL: https://issues.apache.org/jira/browse/LUCENE-2308
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: LUCENE-2308-10.patch, LUCENE-2308-11.patch, 
 LUCENE-2308-12.patch, LUCENE-2308-13.patch, LUCENE-2308-14.patch, 
 LUCENE-2308-15.patch, LUCENE-2308-16.patch, LUCENE-2308-17.patch, 
 LUCENE-2308-18.patch, LUCENE-2308-19.patch, LUCENE-2308-2.patch, 
 LUCENE-2308-20.patch, LUCENE-2308-21.patch, LUCENE-2308-3.patch, 
 LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, 
 LUCENE-2308-7.patch, LUCENE-2308-8.patch, LUCENE-2308-9.patch, 
 LUCENE-2308-branch.patch, LUCENE-2308-final.patch, LUCENE-2308-ltc.patch, 
 LUCENE-2308-merge-1.patch, LUCENE-2308-merge-2.patch, 
 LUCENE-2308-merge-3.patch, LUCENE-2308.branchdiffs, 
 LUCENE-2308.branchdiffs.moved, LUCENE-2308.patch, LUCENE-2308.patch, 
 LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch





[jira] [Updated] (LUCENE-2312) Search on IndexWriter's RAM Buffer

2011-08-30 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-2312:
-

Attachment: LUCENE-2312.patch

Here's a new patch that incrementally adds field cache and norms values, 
meaning that as documents are added / indexed, norms and field cache values are 
automatically created.  The field cache values are only added to if they have 
already been created.  

The field cache functionality needs to be completed for all types.

We probably need to get the indexing lock while the field cache value is 
initially being created (eg, the terms enumeration).

We're more or less feature complete now. 

 Search on IndexWriter's RAM Buffer
 --

 Key: LUCENE-2312
 URL: https://issues.apache.org/jira/browse/LUCENE-2312
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/search
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
Assignee: Michael Busch
 Fix For: Realtime Branch

 Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch, 
 LUCENE-2312.patch, LUCENE-2312.patch


 In order to offer users near realtime search, without incurring
 an indexing performance penalty, we can implement search on
 IndexWriter's RAM buffer. This is the buffer that is filled in
 RAM as documents are indexed. Currently the RAM buffer is
 flushed to the underlying directory (usually disk) before being
 made searchable. 
 Today's Lucene-based NRT systems must incur the cost of merging
 segments, which can slow indexing. 
 Michael Busch has good suggestions regarding how to handle deletes using max 
 doc ids.  
 https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
 The area that isn't fully fleshed out is the terms dictionary,
 which needs to be sorted prior to queries executing. Currently
 IW implements a specialized hash table. Michael B has a
 suggestion here: 
 https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915




[JENKINS] Lucene-Solr-tests-only-trunk - Build # 10391 - Still Failing

2011-08-30 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/10391/

No tests ran.

Build Log (for compile errors):
[...truncated 5361 lines...]






[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-30 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13093895#comment-13093895
 ] 

Robert Muir commented on LUCENE-2308:
-

{quote}
Add a builder for FieldType to document.* which will create FieldType instances.
{quote}

Can we avoid the builder API? I think we shouldn't invite accidental creation of 
lots of FieldType instances during indexing... why not just a single ctor in 
FieldType that takes all the parameters the base class cares about? Then it 
serves double-duty as the 'expert' FieldType anyway, and subclasses like TextField 
are just the sugar.

If someone wants to implement their *subclass* with a builder, they could still 
do this, but I don't think we should force the builder API on people with such 
a user-facing API: I think we should just keep it a very simple immutable class 
with one ctor.

Given the choice between builder and freezable though, I'll take freezable any 
day... but as I said before I think we should keep this even simpler.

 Separately specify a field's type
 -

 Key: LUCENE-2308
 URL: https://issues.apache.org/jira/browse/LUCENE-2308
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: LUCENE-2308-10.patch, LUCENE-2308-11.patch, 
 LUCENE-2308-12.patch, LUCENE-2308-13.patch, LUCENE-2308-14.patch, 
 LUCENE-2308-15.patch, LUCENE-2308-16.patch, LUCENE-2308-17.patch, 
 LUCENE-2308-18.patch, LUCENE-2308-19.patch, LUCENE-2308-2.patch, 
 LUCENE-2308-20.patch, LUCENE-2308-21.patch, LUCENE-2308-3.patch, 
 LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, 
 LUCENE-2308-7.patch, LUCENE-2308-8.patch, LUCENE-2308-9.patch, 
 LUCENE-2308-branch.patch, LUCENE-2308-final.patch, LUCENE-2308-ltc.patch, 
 LUCENE-2308-merge-1.patch, LUCENE-2308-merge-2.patch, 
 LUCENE-2308-merge-3.patch, LUCENE-2308.branchdiffs, 
 LUCENE-2308.branchdiffs.moved, LUCENE-2308.patch, LUCENE-2308.patch, 
 LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch





[jira] [Assigned] (LUCENE-3406) Add source distribution packaging targets that make a tarball from a local working copy

2011-08-30 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe reassigned LUCENE-3406:
---

Assignee: Steven Rowe

 Add source distribution packaging targets that make a tarball from a local 
 working copy
 ---

 Key: LUCENE-3406
 URL: https://issues.apache.org/jira/browse/LUCENE-3406
 Project: Lucene - Java
  Issue Type: Improvement
  Components: general/build
Affects Versions: 4.0
Reporter: Seung-Yeoul Yang
Assignee: Steven Rowe
Priority: Minor
  Labels: patch
 Fix For: 4.0

 Attachments: LUCENE-3406.patch, LUCENE-3406.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 I am adding back targets that were removed in 
 https://issues.apache.org/jira/browse/LUCENE-2973 that are used to create 
 source distribution packaging from a local working copy as new Ant targets.
 2 things to note about the patch:
 1) For package-local-src-tgz in solr/build.xml, I had to specify additional 
 directories under solr/ that have been added since LUCENE-2973.
 2) I couldn't get the package-tgz-local-src in lucene/build.xml to generate 
 the docs folder, which does get added by package-tgz-src. 
 The patch is against the trunk.




[jira] [Commented] (LUCENE-3407) wrong stats/scoring from MemoryCodec

2011-08-30 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13093928#comment-13093928
 ] 

Robert Muir commented on LUCENE-3407:
-

SimpleText had this issue too... I had not run the tests enough; we 
intentionally return SimpleText less often :)

 wrong stats/scoring from MemoryCodec
 

 Key: LUCENE-3407
 URL: https://issues.apache.org/jira/browse/LUCENE-3407
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: flexscoring branch, 4.0
Reporter: Robert Muir
 Attachments: LUCENE-3407.patch, LUCENE-3407_test.patch


 I hit some random failures in the flexscoring branch: weird because it's not a 
 random test.
 I noticed the test always failed with memorycodec, and wrote a specific test 
 for it.
 I haven't traced thru it yet, but I think the likely issue is that 
 memorycodec is somehow returning wrong stats here?




[jira] [Commented] (LUCENE-3406) Add source distribution packaging targets that make a tarball from a local working copy

2011-08-30 Thread Seung-Yeoul Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094089#comment-13094089
 ] 

Seung-Yeoul Yang commented on LUCENE-3406:
--

I've tested both targets locally, and they work fine.

Thanks Steve!

 Add source distribution packaging targets that make a tarball from a local 
 working copy
 ---

 Key: LUCENE-3406
 URL: https://issues.apache.org/jira/browse/LUCENE-3406
 Project: Lucene - Java
  Issue Type: Improvement
  Components: general/build
Affects Versions: 4.0
Reporter: Seung-Yeoul Yang
Assignee: Steven Rowe
Priority: Minor
  Labels: patch
 Fix For: 4.0

 Attachments: LUCENE-3406.patch, LUCENE-3406.patch

   Original Estimate: 24h
  Remaining Estimate: 24h




[jira] [Updated] (SOLR-2714) JsonLoader does not handle null field values

2011-08-30 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-2714:
---

  Priority: Minor  (was: Major)
Issue Type: Improvement  (was: Bug)
   Summary: JsonLoader does not handle null field values  (was: JsonLoader 
does not handle null fields)

 JsonLoader does not handle null field values
 

 Key: SOLR-2714
 URL: https://issues.apache.org/jira/browse/SOLR-2714
 Project: Solr
  Issue Type: Improvement
Affects Versions: 3.3
Reporter: Trygve Laugstøl
Priority: Minor
 Attachments: SOLR-2714.patch


 The parser in JsonLoader does not handle null fields when adding a document 
 over http+json.
 Given this document:
 {code}
 [{
   "timestamp":"2011-08-17T14:11:49.201Z",
   "correlationId":"N44YFGSQNC",
   "logType":"event",
   "short":"Invalidating session: 4zy6cvdtmvu1erlay0sn6rhz",
   "long":null
 }]
 {code}
 I'm getting a response code=400 and the error message "should finish doc 
 first" in the logs.
 It seems that JsonLoader is missing a case for JSONParser.NULL in the parser 
 event switch.
 * 
 https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/JsonLoader.java
 * 
 https://svn.apache.org/repos/asf/labs/noggit/src/main/java/org/apache/noggit/JSONParser.java
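
To illustrate the kind of gap described above, here is a self-contained sketch of an event switch with a NULL case. The Event enum, the Token class and the choice to skip null-valued fields are all assumptions made for the example; the attached patch may handle nulls differently, and the real parser reports events through its own constants:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NullAwareLoaderSketch {
  // Hypothetical event kinds standing in for the parser's event constants.
  enum Event { STRING, NUMBER, BOOLEAN, NULL, OBJECT_START, OBJECT_END }

  // A (field name, event, value) triple as a parser might report it.
  static final class Token {
    final String field;
    final Event event;
    final Object value;

    Token(String field, Event event, Object value) {
      this.field = field;
      this.event = event;
      this.value = value;
    }
  }

  static Map<String, Object> readFields(List<Token> tokens) {
    Map<String, Object> doc = new LinkedHashMap<>();
    for (Token t : tokens) {
      switch (t.event) {
        case STRING:
        case NUMBER:
        case BOOLEAN:
          doc.put(t.field, t.value);
          break;
        case NULL:
          // Without this case a null value would reach the default branch.
          // Here the field is simply skipped (an assumption of this sketch).
          break;
        case OBJECT_END:
          return doc;
        default:
          // Any event without a case of its own ends up here.
          throw new IllegalStateException("should finish doc first");
      }
    }
    return doc;
  }
}
```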




[jira] [Created] (LUCENE-3408) Remove unnecessary memory barriers in DWPT

2011-08-30 Thread Simon Willnauer (JIRA)
Remove unnecessary memory barriers in DWPT
--

 Key: LUCENE-3408
 URL: https://issues.apache.org/jira/browse/LUCENE-3408
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Priority: Minor
 Fix For: 4.0


Currently DWPT still uses AtomicLong to count the bytesUsed. Each write access 
issues an implicit memory barrier which is totally unnecessary since we are doing 
everything single threaded on that level. This might be very minor but we 
shouldn't issue unnecessary memory barriers causing processors to lock their 
instruction pipeline for no reason.




[jira] [Updated] (LUCENE-3408) Remove unnecessary memory barriers in DWPT

2011-08-30 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-3408:


Attachment: LUCENE-3408.patch

Here is a patch that replaces the AtomicLong with a simple counter class that 
provides threadsafe and serial implementations.
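
For readers without the patch at hand, a counter with both flavours could be sketched like this. The name SketchCounter and its method set are assumptions for illustration and are not taken from the attached patch:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a counter with serial and thread-safe variants.
abstract class SketchCounter {
  abstract long addAndGet(long delta);
  abstract long get();

  // Callers pick the variant once, where the threading model is known.
  static SketchCounter newCounter(boolean threadSafe) {
    return threadSafe ? new AtomicCounter() : new SerialCounter();
  }

  // Plain long: no memory barrier on writes, safe only for one thread.
  private static final class SerialCounter extends SketchCounter {
    private long count;

    long addAndGet(long delta) {
      return count += delta;
    }

    long get() {
      return count;
    }
  }

  // Backed by AtomicLong for callers that do share the counter.
  private static final class AtomicCounter extends SketchCounter {
    private final AtomicLong count = new AtomicLong();

    long addAndGet(long delta) {
      return count.addAndGet(delta);
    }

    long get() {
      return count.get();
    }
  }
}
```

Code that is known to run single threaded would ask for `newCounter(false)` and pay only for a plain field update.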

 Remove unnecessary memory barriers in DWPT
 --

 Key: LUCENE-3408
 URL: https://issues.apache.org/jira/browse/LUCENE-3408
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-3408.patch





[jira] [Assigned] (LUCENE-3408) Remove unnecessary memory barriers in DWPT

2011-08-30 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer reassigned LUCENE-3408:
---

Assignee: Simon Willnauer

 Remove unnecessary memory barriers in DWPT
 --

 Key: LUCENE-3408
 URL: https://issues.apache.org/jira/browse/LUCENE-3408
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-3408.patch





[jira] [Resolved] (SOLR-2714) JsonLoader does not handle null field values

2011-08-30 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-2714.


   Resolution: Fixed
Fix Version/s: 3.4

 JsonLoader does not handle null field values
 

 Key: SOLR-2714
 URL: https://issues.apache.org/jira/browse/SOLR-2714
 Project: Solr
  Issue Type: Improvement
Affects Versions: 3.3
Reporter: Trygve Laugstøl
Priority: Minor
 Fix For: 3.4

 Attachments: SOLR-2714.patch





Re: Final Evaluations results processed for LUCENE-1768: NumericRange support for new query parser

2011-08-30 Thread Vinicius Barros
Hi Uwe,

Yes, the patch is almost ready. I was waiting for your response to the question 
below from LUCENE-1768; once I have the answer, I will submit the patch.

I still have a few changes on 3x to submit, but I was wondering: is it 
necessary to deprecate a class in 3x if it's ONLY going to be removed in 4.0? 
Not sure if I understand how these things work yet.
 
Regards,
Vinicius Barros



From: Uwe Schindler u...@thetaphi.de
To: dev@lucene.apache.org; 'Vinicius Barros' 
viniciusbarros.g...@yahoo.com.br
Sent: Monday, August 29, 2011 7:37
Subject: RE: Final Evaluations results processed for LUCENE-1768: NumericRange 
support for new query parser


Hi Vinicius,
 
Thank you very much!
 
Will you submit the patch for upgrading trunk, too (after the 3.x changes)? 
Once this is finished, we will commit changes.txt entries and close the issue.
 
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
 
From:Vinicius Barros [mailto:viniciusbarros.g...@yahoo.com.br] 
Sent: Sunday, August 28, 2011 4:43 AM
To: dev@lucene.apache.org
Subject: Enc: Final Evaluations results processed for LUCENE-1768: NumericRange 
support for new query parser
 
Hi,
 
As you can see in the email below, I have successfully passed my GSoC 2011 
program. I am really glad right now when I look back and see how much I have 
learned and contributed, much more than I had expected.
 
Thanks everyone for giving me this chance, especially to Uwe and Adriano who 
helped me a lot along the project.
 
Thanks again,
Vinicius Barros
 
- Forwarded message -
From: no-re...@socghop.appspotmail.com no-re...@socghop.appspotmail.com
To: viniciusbarros.g...@yahoo.com.br
Cc: nor...@apache.org; u...@apache.org; uschind...@apache.org
Sent: Friday, August 26, 2011 15:10
Subject: Final Evaluations results processed for LUCENE-1768: NumericRange 
support for new query parser
Hi Vinicius Barros, 
We have processed the evaluation for your project named LUCENE-1768: 
NumericRange support for new query parser with Apache Software Foundation. 
Congratulations, from our data it seems that you have successfully passed the 
Final Evaluations. Please contact your mentor to discuss the results of your 
evaluation and to plan your goals and development plan for the rest of the 
program 
Greetings, 
The Google Open Source Programs Team 

RE: Final Evaluations results processed for LUCENE-1768: NumericRange support for new query parser

2011-08-30 Thread Uwe Schindler
Hi,

 

You should deprecate classes in 3.x that will go away in 4.0. The same applies
to methods and other code. Deprecation means that this code will no longer
work in 4.0 if it is left unchanged (which is not completely true anymore in
4.0, as we also have completely backwards incompatible changes to lower
level APIs in Lucene).
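
As a small illustration of what that looks like in 3.x code, assuming two made-up class names (OldRangeHelper, NewRangeHelper) that have nothing to do with the actual patch:

```java
// Hypothetical class names; a sketch of deprecating in 3.x what 4.0 removes.

/** Replacement that stays in both 3.x and 4.0. */
class NewRangeHelper {
  static int clamp(int value, int lo, int hi) {
    return Math.max(lo, Math.min(hi, value));
  }
}

/**
 * @deprecated Will be removed in 4.0; use {@link NewRangeHelper} instead.
 */
@Deprecated
class OldRangeHelper {
  // Delegates so existing 3.x callers keep working until they migrate.
  static int clamp(int value, int lo, int hi) {
    return NewRangeHelper.clamp(value, lo, hi);
  }
}
```

Callers compiled against 3.x then get a deprecation warning that points them at the replacement before the class disappears.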

 

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 

From: Vinicius Barros [mailto:viniciusbarros.g...@yahoo.com.br]
Sent: Wednesday, August 31, 2011 2:42 AM
To: dev@lucene.apache.org
Subject: Re: Final Evaluations results processed for LUCENE-1768:
NumericRange support for new query parser

Hi Uwe,

Yes, the patch is almost ready. I was waiting for your response to the question
below from LUCENE-1768; once I have the answer, I will submit the patch.

I still have a few changes on 3.x to submit, but I was wondering: is it
necessary to deprecate a class in 3.x if it's ONLY going to be removed in
4.0? I'm not sure I understand how these things work yet.

Regards,
Vinicius Barros

From: Uwe Schindler u...@thetaphi.de
To: dev@lucene.apache.org; 'Vinicius Barros'
viniciusbarros.g...@yahoo.com.br
Sent: Monday, August 29, 2011 7:37
Subject: RE: Final Evaluations results processed for LUCENE-1768:
NumericRange support for new query parser

Hi Vinicius,

Thank you very much!

Will you submit the patch for upgrading trunk, too (after the 3.x changes)?
Once this is finished, we will commit changes.txt entries and close the
issue.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 




[jira] [Commented] (SOLR-752) Allow better Field Compression options

2011-08-30 Thread Kim Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094219#comment-13094219
 ] 

Kim Taylor commented on SOLR-752:
-

I've had a look into this. Simon is right: DefaultSolrHighlighter uses 
doc.getValues(fieldName) to retrieve the field. 
lucene.document.Document.getValues() calls stringValue() on the appropriate 
field. The problem is that when FieldsWriter/Reader read fields from segments, 
the supplied CompressedField gets converted into a Field, which does not know 
how to interpret fieldsData. I've added another patch that alters 
DefaultSolrHighlighter to use the schema FieldType (in this case 
CompressedField) to properly interpret fieldsData.
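
To illustrate why the raw fieldsData cannot simply be read as a string, here is a rough sketch that is not the code from the attached patch. It assumes the stored bytes were produced with java.util.zip deflate compression, and CompressedValueSketch is a hypothetical helper name. Only a reader that knows about the compression step can recover the original text:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical helper; the deflate format is an assumption of this sketch. */
class CompressedValueSketch {

  /** Compresses a string value into the bytes that would be stored. */
  static byte[] compress(String value) {
    Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
    deflater.setInput(value.getBytes(StandardCharsets.UTF_8));
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[1024];
    while (!deflater.finished()) {
      int n = deflater.deflate(buffer);
      out.write(buffer, 0, n);
    }
    deflater.end();
    return out.toByteArray();
  }

  /** Restores the original string; a reader unaware of compression skips this. */
  static String decompress(byte[] stored) {
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(stored);
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buffer = new byte[1024];
      while (!inflater.finished()) {
        int n = inflater.inflate(buffer);
        // No progress and no end of stream means the input is incomplete.
        if (n == 0 && !inflater.finished()
            && (inflater.needsInput() || inflater.needsDictionary())) {
          throw new IllegalArgumentException("truncated or unsupported data");
        }
        out.write(buffer, 0, n);
      }
      return new String(out.toByteArray(), StandardCharsets.UTF_8);
    } catch (DataFormatException e) {
      throw new IllegalArgumentException("not valid compressed data", e);
    } finally {
      inflater.end();
    }
  }
}
```

Decoding the compressed bytes directly as UTF-8 does not give back the original text, which is the kind of mismatch the highlighter runs into when it treats the value as a plain field.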

 Allow better Field Compression options
 --

 Key: SOLR-752
 URL: https://issues.apache.org/jira/browse/SOLR-752
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor
 Attachments: compressedtextfield.patch


 See http://lucene.markmail.org/message/sd4mgwud6caevb35?q=compression
 It would be good if Solr handled field compression outside of Lucene's 
 Field.COMPRESS capabilities, since those capabilities are less than ideal 
 when it comes to control over compression.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-752) Allow better Field Compression options

2011-08-30 Thread Kim Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kim Taylor updated SOLR-752:


Attachment: compressed_field.patch

Patch updated to modify DefaultSolrHighlighter

 Allow better Field Compression options
 --

 Key: SOLR-752
 URL: https://issues.apache.org/jira/browse/SOLR-752
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor
 Attachments: compressed_field.patch, compressedtextfield.patch


 See http://lucene.markmail.org/message/sd4mgwud6caevb35?q=compression
 It would be good if Solr handled field compression outside of Lucene's 
 Field.COMPRESS capabilities, since those capabilities are less than ideal 
 when it comes to control over compression.




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-30 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094256#comment-13094256
 ] 

Chris Male commented on LUCENE-2308:


I'm definitely -1 for a constructor for all the properties.  We may only have a 
few properties today but it's not going to stay that way.  One of the benefits 
I see of FieldType is that we can extend it to have a greater range of 
properties that allow more customized handling of fields.  If we put everything 
into a constructor then it'll grow out of control, prevent us from adding more 
properties and we'll end up having another project just to break it up more.

With an interface, we're not forcing anything on anyone.  Users can create 
FieldTypes however they like.
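
A small sketch of the interface idea, under the assumption that the interface only exposes read accessors. FieldTypeView, BasicFieldTypes and AlsoStored are hypothetical names and not part of any patch on this issue:

```java
/** Hypothetical read-only view of a field's type; method names are assumptions. */
interface FieldTypeView {
  boolean indexed();
  boolean stored();
}

/** One way to satisfy the interface: fixed, shareable constants. */
enum BasicFieldTypes implements FieldTypeView {
  INDEXED_ONLY(true, false),
  STORED_ONLY(false, true);

  private final boolean indexed;
  private final boolean stored;

  BasicFieldTypes(boolean indexed, boolean stored) {
    this.indexed = indexed;
    this.stored = stored;
  }

  @Override
  public boolean indexed() { return indexed; }

  @Override
  public boolean stored() { return stored; }
}

/** Another way: wrap an existing type and override a single property. */
class AlsoStored implements FieldTypeView {
  private final FieldTypeView base;

  AlsoStored(FieldTypeView base) { this.base = base; }

  @Override
  public boolean indexed() { return base.indexed(); }

  @Override
  public boolean stored() { return true; }
}
```

The consumer only sees FieldTypeView, so new properties can be added by new implementations without touching a shared constructor signature.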

 Separately specify a field's type
 -

 Key: LUCENE-2308
 URL: https://issues.apache.org/jira/browse/LUCENE-2308
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: 4.0

 Attachments: LUCENE-2308-10.patch, LUCENE-2308-11.patch, 
 LUCENE-2308-12.patch, LUCENE-2308-13.patch, LUCENE-2308-14.patch, 
 LUCENE-2308-15.patch, LUCENE-2308-16.patch, LUCENE-2308-17.patch, 
 LUCENE-2308-18.patch, LUCENE-2308-19.patch, LUCENE-2308-2.patch, 
 LUCENE-2308-20.patch, LUCENE-2308-21.patch, LUCENE-2308-3.patch, 
 LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, 
 LUCENE-2308-7.patch, LUCENE-2308-8.patch, LUCENE-2308-9.patch, 
 LUCENE-2308-branch.patch, LUCENE-2308-final.patch, LUCENE-2308-ltc.patch, 
 LUCENE-2308-merge-1.patch, LUCENE-2308-merge-2.patch, 
 LUCENE-2308-merge-3.patch, LUCENE-2308.branchdiffs, 
 LUCENE-2308.branchdiffs.moved, LUCENE-2308.patch, LUCENE-2308.patch, 
 LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch


 This came up from discussions on IRC.  I'm summarizing here...
 Today when you make a Field to add to a document you can set things like
 indexed or not, stored or not, analyzed or not, details like omitTfAP,
 omitNorms, index term vectors (separately controlling
 offsets/positions), etc.
 I think we should factor these out into a new class (FieldType?).
 Then you could re-use this FieldType instance across multiple fields.
 The Field instance would still hold the actual value.
 We could then do per-field analyzers by adding a setAnalyzer on the
 FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise
 for per-field codecs (with flex), where we now have
 PerFieldCodecWrapper).
 This would NOT be a schema!  It's just refactoring what we already
 specify today.  EG it's not serialized into the index.
 This has been discussed before, and I know Michael Busch opened a more
 ambitious (I think?) issue.  I think this is a good first baby step.  We could
 consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold
 off on that for starters...




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-30 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094258#comment-13094258
 ] 

Robert Muir commented on LUCENE-2308:
-

bq. I'm definitely -1 for a constructor for all the properties. We may only 
have a few properties today but it's not going to stay that way

What is the problem? I am talking about the core FieldType that is the main 
stuff we need: e.g. indexed/stored/etc.

Subclasses can do whatever they want (builders/freezable/I don't care), but we 
should make a simple, extensible, immutable core class for the limited
set of properties that really need to be in the FieldType subclass.
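
For illustration only, a plain immutable core class along these lines might look like the sketch below. CoreFieldType, NumericCoreFieldType and the chosen properties are hypothetical, not the class from the branch:

```java
/** Hypothetical immutable core type covering only the basic properties. */
class CoreFieldType {
  private final boolean indexed;
  private final boolean stored;
  private final boolean tokenized;

  CoreFieldType(boolean indexed, boolean stored, boolean tokenized) {
    this.indexed = indexed;
    this.stored = stored;
    this.tokenized = tokenized;
  }

  boolean indexed() { return indexed; }

  boolean stored() { return stored; }

  boolean tokenized() { return tokenized; }
}

/** A subclass is free to add its own properties on top of the core ones. */
class NumericCoreFieldType extends CoreFieldType {
  private final int precisionStep;

  NumericCoreFieldType(boolean indexed, boolean stored, int precisionStep) {
    super(indexed, stored, false); // numeric values are not tokenized here
    this.precisionStep = precisionStep;
  }

  int precisionStep() { return precisionStep; }
}
```

All fields are final and there are no setters, so an instance can be shared across fields and threads without any freeze step.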




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-30 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094263#comment-13094263
 ] 

Robert Muir commented on LUCENE-2308:
-

It doesn't lock us into anything. We deprecate the old ctor and make a new one 
with the correct default.

This is easy!

Besides, you aren't assuring me. What new things really need to be added that 
apply to *all* field types across the board?!
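
A sketch of that evolution path, assuming a hypothetical omitNorms property is added after the two-argument constructor has already shipped (EvolvingFieldType is an illustrative name):

```java
/** Hypothetical type that gained a property after its first release. */
class EvolvingFieldType {
  private final boolean indexed;
  private final boolean stored;
  private final boolean omitNorms; // assumed to be the newly added property

  /**
   * Original constructor, kept so that existing callers still compile.
   *
   * @deprecated Use {@link #EvolvingFieldType(boolean, boolean, boolean)}.
   */
  @Deprecated
  EvolvingFieldType(boolean indexed, boolean stored) {
    this(indexed, stored, false); // old callers get the default value
  }

  EvolvingFieldType(boolean indexed, boolean stored, boolean omitNorms) {
    this.indexed = indexed;
    this.stored = stored;
    this.omitNorms = omitNorms;
  }

  boolean indexed() { return indexed; }

  boolean stored() { return stored; }

  boolean omitNorms() { return omitNorms; }
}
```

The deprecated constructor can then be dropped in the next major release, in line with the usual deprecate-then-remove cycle.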




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-30 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094265#comment-13094265
 ] 

Chris Male commented on LUCENE-2308:


Alright, I'll wait to see your patch using the ctor approach.




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-30 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094266#comment-13094266
 ] 

Robert Muir commented on LUCENE-2308:
-

I'm not really motivated to work on a patch with a solution you already said 
you are -1 on.

Again, I'm totally against any builder here. I'm OK with the freezable approach 
that we have now, but I think a plain immutable Java object would be a lot 
simpler and clearer. At the end of the day I'm OK with compromising on what we 
have already; I just brought this immutability idea up as a suggestion.
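
For readers unfamiliar with the freezable approach mentioned above, here is a generic sketch of the pattern. FreezableFieldType is a hypothetical name, and the real class on the branch may differ in its details:

```java
/** Hypothetical mutable type that becomes read-only once frozen. */
class FreezableFieldType {
  private boolean indexed;
  private boolean stored;
  private boolean frozen;

  void setIndexed(boolean indexed) {
    checkNotFrozen();
    this.indexed = indexed;
  }

  void setStored(boolean stored) {
    checkNotFrozen();
    this.stored = stored;
  }

  /** After this call every setter throws. */
  void freeze() {
    frozen = true;
  }

  boolean indexed() { return indexed; }

  boolean stored() { return stored; }

  private void checkNotFrozen() {
    if (frozen) {
      throw new IllegalStateException("this FieldType is frozen");
    }
  }
}
```

Compared with a plain immutable object, the trade-off is one extra flag and a runtime check in exchange for setter-style configuration.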




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-30 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094272#comment-13094272
 ] 

Chris Male commented on LUCENE-2308:


Then we're at a bit of an impasse because you're totally against what I 
suggested as well.  I'm suggesting you cut a patch and we can go over it, I 
might see how it can be simpler and clearer.




[jira] [Commented] (LUCENE-2308) Separately specify a field's type

2011-08-30 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094293#comment-13094293
 ] 

Chris Male commented on LUCENE-2308:


What about a compromise and have both the plain Java object and a builder? That 
way if someone wants the simple clear way you describe, they have the ctor they 
can go to.  If they want what I feel would be more readable code, they can use 
the builder.
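
Roughly, the compromise could look like the following sketch. DualFieldType and its three properties are hypothetical; the point is only that the builder ends up calling the same constructor that is also available directly:

```java
/** Hypothetical immutable type offering both a constructor and a builder. */
class DualFieldType {
  private final boolean indexed;
  private final boolean stored;
  private final boolean tokenized;

  /** The plain constructor, for callers who prefer it. */
  DualFieldType(boolean indexed, boolean stored, boolean tokenized) {
    this.indexed = indexed;
    this.stored = stored;
    this.tokenized = tokenized;
  }

  static Builder builder() {
    return new Builder();
  }

  boolean indexed() { return indexed; }

  boolean stored() { return stored; }

  boolean tokenized() { return tokenized; }

  /** Names each property at the call site; unset properties default to false. */
  static final class Builder {
    private boolean indexed;
    private boolean stored;
    private boolean tokenized;

    Builder indexed(boolean value) { this.indexed = value; return this; }

    Builder stored(boolean value) { this.stored = value; return this; }

    Builder tokenized(boolean value) { this.tokenized = value; return this; }

    DualFieldType build() {
      return new DualFieldType(indexed, stored, tokenized);
    }
  }
}
```

With this shape, new DualFieldType(true, false, true) and DualFieldType.builder().indexed(true).tokenized(true).build() produce equivalent instances; the second form spells out which flag is which.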
