[JENKINS] Solr-3.x - Build # 451 - Failure
Build: https://builds.apache.org/job/Solr-3.x/451/

1 tests failed.
REGRESSION: org.apache.solr.core.TestJmxIntegration.testJmxOnCoreReload

Error Message:
Number of registered MBeans is not the same as info registry size expected:50 but was:51

Stack Trace:
junit.framework.AssertionFailedError: Number of registered MBeans is not the same as info registry size expected:50 but was:51
	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147)
	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
	at org.apache.solr.core.TestJmxIntegration.testJmxOnCoreReload(TestJmxIntegration.java:137)

Build Log (for compile errors):
[...truncated 19353 lines...]

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField
[ https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093483#comment-13093483 ]

Chris Male commented on LUCENE-3312:

I'm almost done getting an initial patch for this, just one issue remaining - IndexDocValues. IndexDocValues can be both not indexed and not stored. Therefore when you retrieve the indexed fields and then the stored fields, you can miss some IndexDocValues. It seems to me that we might need a 3rd interface to cover these fields?

Break out StorableField from IndexableField
---
Key: LUCENE-3312
URL: https://issues.apache.org/jira/browse/LUCENE-3312
Project: Lucene - Java
Issue Type: Improvement
Components: core/index
Reporter: Michael McCandless
Fix For: Field Type branch

In the field type branch we have strongly decoupled Document/Field/FieldType impl from the indexer, by having only a narrow API (IndexableField) passed to IndexWriter. This frees apps up to use their own documents instead of the user-space impls we provide in oal.document.

Similarly, with LUCENE-3309, we've done the same thing on the doc/field retrieval side (from IndexReader), with the StoredFieldsVisitor.

But, maybe we should break out StorableField from IndexableField, such that when you index a doc you provide two Iterables -- one for the IndexableFields and one for the StorableFields. Either can be null.

One downside is possible perf hit for fields that are both indexed and stored (ie, we visit them twice, lookup their name in a hash twice, etc.). But the upside is a cleaner separation of concerns in API

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
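A rough sketch of the "two Iterables" idea from the issue description, for illustration only. Every name here (ToyIndexableField, ToyStorableField, ToyIndexer, ToyField) is a hypothetical stand-in and not the Lucene API; the point is just that either Iterable may be null, and that a field which is both indexed and stored gets visited twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, heavily simplified stand-ins; NOT the real Lucene interfaces.
interface ToyIndexableField {
    String name();
    String valueToInvert();
}

interface ToyStorableField {
    String name();
    String valueToStore();
}

// Toy "indexer" that takes the two Iterables separately; either may be null.
final class ToyIndexer {
    final List<String> inverted = new ArrayList<>();
    final List<String> stored = new ArrayList<>();

    void addDocument(Iterable<? extends ToyIndexableField> indexable,
                     Iterable<? extends ToyStorableField> storable) {
        if (indexable != null) {
            for (ToyIndexableField f : indexable) {
                inverted.add(f.name() + "=" + f.valueToInvert());
            }
        }
        if (storable != null) {
            for (ToyStorableField f : storable) {
                stored.add(f.name() + "=" + f.valueToStore());
            }
        }
    }
}

// A userland field that is both indexed and stored implements both
// interfaces, so it is visited once per Iterable (the perf downside
// mentioned in the description).
final class ToyField implements ToyIndexableField, ToyStorableField {
    private final String name;
    private final String value;

    ToyField(String name, String value) {
        this.name = name;
        this.value = value;
    }

    public String name() { return name; }
    public String valueToInvert() { return value; }
    public String valueToStore() { return value; }
}

final class TwoIterablesDemo {
    public static void main(String[] args) {
        ToyIndexer indexer = new ToyIndexer();
        List<ToyField> fields = Arrays.asList(new ToyField("title", "hello"));
        indexer.addDocument(fields, fields); // indexed and stored
        indexer.addDocument(fields, null);   // indexed only
        System.out.println(indexer.inverted.size() + " inverted, "
                + indexer.stored.size() + " stored");
    }
}
```

Passing the same list twice is what makes the double visit visible; an app with its own document class could hand over two different views instead.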
[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField
[ https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093499#comment-13093499 ]

Simon Willnauer commented on LUCENE-3312:

bq. I'm almost done getting an initial patch for this, just one issue remaining - IndexDocValues. IndexDocValues can be both not indexed and not stored. Therefore when you retrieve the indexed fields and then the stored fields, you can miss some IndexDocValues. It seems to me that we might need a 3rd interface to cover these fields?

To me it appears that we need some clarification of what DocValues are. Actually, when you think about it, Stored Fields and DocValues have a lot in common. A Stored Field is basically a DocValues DerefVarBytes type, and maybe down the road we should think about merging those two types together. It would be nice to have only one typesafe API that can store whatever you want, and based on the codec Lucene would decide how to store it on disk, ie. whether it is a multi-field container like Stored Fields are done today or whether the values are split apart like DocValues does it today.

For now we should try to differentiate between an InvertedField and a StoredField, ie. everything which is not an InvertedField is a StoredField. The API could basically already reflect that DocValues and StoredFields are the same and simply specify a type like Store.Packed vs. Store.ColumnStride or something like that. If we do that we could also expose loading Packed Fields via PerDocValues and have one API for our users.
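To make the Packed vs. ColumnStride distinction concrete, here is a toy in-memory sketch. It assumes nothing about how Lucene lays data out on disk: ToyStoreLayout and ToyFieldStore are made-up names, and the enum constants only borrow the terms floated in the comment above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical enum borrowing the names floated in the comment.
enum ToyStoreLayout { PACKED, COLUMN_STRIDE }

final class ToyFieldStore {
    // PACKED: all of a document's fields kept together, one record per document.
    final List<Map<String, String>> packed = new ArrayList<>();
    // COLUMN_STRIDE: one column per field name, addressed by document number.
    final Map<String, List<String>> columns = new HashMap<>();
    private int docCount = 0;

    // Registers a new document and returns its document number.
    int newDocument() {
        packed.add(new HashMap<>());
        return docCount++;
    }

    // One entry point; the layout argument decides where the value lands.
    void add(int doc, String field, String value, ToyStoreLayout layout) {
        if (layout == ToyStoreLayout.PACKED) {
            packed.get(doc).put(field, value);
        } else {
            List<String> column = columns.computeIfAbsent(field, k -> new ArrayList<>());
            while (column.size() <= doc) {
                column.add(null); // pad for documents without a value
            }
            column.set(doc, value);
        }
    }
}

final class StoreLayoutDemo {
    public static void main(String[] args) {
        ToyFieldStore store = new ToyFieldStore();
        int doc0 = store.newDocument();
        int doc1 = store.newDocument();
        store.add(doc0, "title", "first", ToyStoreLayout.PACKED);
        store.add(doc1, "price", "9", ToyStoreLayout.COLUMN_STRIDE);
        System.out.println(store.packed);
        System.out.println(store.columns);
    }
}
```

The single add method is the part that mirrors the suggestion: callers use one API, and the layout choice is just a parameter.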
[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField
[ https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093513#comment-13093513 ]

Chris Male commented on LUCENE-3312:

Just for clarification, Packed refers to the notion of a stored field? or am I lost?
[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField
[ https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093520#comment-13093520 ]

Simon Willnauer commented on LUCENE-3312:

bq. Just for clarification, Packed refers to the notion of a stored field? or am I lost?

Yes, since we pack all fields together into one location and store only the offset to the first field per document. I just used that term here to differentiate, sorry for the confusion.
[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField
[ https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093525#comment-13093525 ]

Chris Male commented on LUCENE-3312:

Given what you say about the similarities between stored fields and DocValues and the direction we seem to be heading, I think it's a good term to start using.
[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 357 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/357/

No tests ran.

Build Log (for compile errors):
[...truncated 10533 lines...]
[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField
[ https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093698#comment-13093698 ]

Chris Male commented on LUCENE-3312:

{quote}
Good question... I think the userland Field (oal.document) should implement both IndexableField and StorableField? And then oal.document.Document holds Field instances?
{quote}

Hm I'm going round in circles on this. For building and indexing a Document, having the class hold Field instances is the easiest and cleanest option. However this then means we are once again providing Field instances in the Document returned by reader.document(), meaning we lose:

{quote}
So I consider this (these indexing details are no longer available when you pull the document) a big benefit of cutting over to StorableField. Ie, it's trappy today since it's buggy, so we'd be removing that trap.
{quote}

Thoughts? :/
[jira] [Created] (LUCENE-3407) wrong stats/scoring from MemoryCodec
wrong stats/scoring from MemoryCodec
---
Key: LUCENE-3407
URL: https://issues.apache.org/jira/browse/LUCENE-3407
Project: Lucene - Java
Issue Type: Bug
Affects Versions: flexscoring branch, 4.0
Reporter: Robert Muir
Attachments: LUCENE-3407_test.patch

I hit some random failures in the flexscoring branch: weird, because it's not a random test. I noticed the test always failed with MemoryCodec, and wrote a specific test for it. I haven't traced through it yet, but I think the issue is likely that MemoryCodec is somehow returning wrong stats here?
[jira] [Updated] (LUCENE-3407) wrong stats/scoring from MemoryCodec
[ https://issues.apache.org/jira/browse/LUCENE-3407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3407:

Attachment: LUCENE-3407_test.patch

patch with a test, goes against the flex scoring branch.
[jira] [Commented] (LUCENE-3407) wrong stats/scoring from MemoryCodec
[ https://issues.apache.org/jira/browse/LUCENE-3407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093718#comment-13093718 ]

Robert Muir commented on LUCENE-3407:

The issue only happens with omitTF; my guess is that it's returning something strange for totalTermFreq... will try to make a better test.
[jira] [Updated] (LUCENE-3407) wrong stats/scoring from MemoryCodec
[ https://issues.apache.org/jira/browse/LUCENE-3407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3407:

Attachment: LUCENE-3407.patch

patch for trunk.
[jira] [Resolved] (LUCENE-3407) wrong stats/scoring from MemoryCodec
[ https://issues.apache.org/jira/browse/LUCENE-3407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-3407.

Resolution: Fixed

I committed this: no problems with any other codecs.
[jira] [Commented] (LUCENE-3407) wrong stats/scoring from MemoryCodec
[ https://issues.apache.org/jira/browse/LUCENE-3407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093791#comment-13093791 ]

Michael McCandless commented on LUCENE-3407:

Phew, nice catch!
[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField
[ https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093802#comment-13093802 ]

Chris Male commented on LUCENE-3312:

So one thought would be to have a different class being returned reader.document(), we could call it StoredDocument and it would only make access to StorableFields. I like this idea since it gets people over the hump of piggybacking Field, but it is a bw compat break. Any objections?
[jira] [Issue Comment Edited] (LUCENE-3312) Break out StorableField from IndexableField
[ https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093802#comment-13093802 ]

Chris Male edited comment on LUCENE-3312 at 8/30/11 3:17 PM:

So one thought would be to have a different class being returned by reader.document(), we could call it StoredDocument and it would only make access to StorableFields. I like this idea since it gets people over the hump of piggybacking Field, but it is a bw compat break. Any objections?

was (Author: cmale):
So one thought would be to have a different class being returned reader.document(), we could call it StoredDocument and it would only make access to StorableFields. I like this idea since it gets people over the hump of piggybacking Field, but it is a bw compat break. Any objections?
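The StoredDocument idea could look roughly like the following. This is only a sketch under assumptions: ToyStoredField, SimpleToyStoredField and ToyStoredDocument are invented names, and the real class (if it gets added) may expose a different surface. What it illustrates is that a document pulled from a reader carries stored values and nothing about how the field was indexed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical read-side view of a field: a name and a stored value only.
interface ToyStoredField {
    String name();
    String stringValue();
}

final class SimpleToyStoredField implements ToyStoredField {
    private final String name;
    private final String value;

    SimpleToyStoredField(String name, String value) {
        this.name = name;
        this.value = value;
    }

    public String name() { return name; }
    public String stringValue() { return value; }
}

// Hypothetical stand-in for what reader.document() might return: it exposes
// stored fields only, with no tokenization, norms or term vector settings.
final class ToyStoredDocument {
    private final List<ToyStoredField> fields = new ArrayList<>();

    void add(ToyStoredField field) {
        fields.add(field);
    }

    List<ToyStoredField> getFields() {
        return Collections.unmodifiableList(fields);
    }

    // Returns the first stored value for the name, or null if there is none.
    String get(String name) {
        for (ToyStoredField f : fields) {
            if (f.name().equals(name)) {
                return f.stringValue();
            }
        }
        return null;
    }
}

final class StoredDocumentDemo {
    public static void main(String[] args) {
        ToyStoredDocument doc = new ToyStoredDocument();
        doc.add(new SimpleToyStoredField("title", "Lucene"));
        System.out.println(doc.get("title"));
    }
}
```

Because the type has no indexing-related accessors at all, the trap of reading stale indexing details off a retrieved document cannot come up.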
[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField
[ https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093818#comment-13093818 ]

Michael McCandless commented on LUCENE-3312:

I agree DocValues seem both indexed and stored, but they are closer to stored fields, so let's put them under StorableField.

And indeed we could impl stored fields as a DerefVarBytes doc values field, but I think we should hold off on unifying this in the APIs we are creating here?

Ie, StorableField should just have a .docValues() method and if that returns a non-null value the indexer will index those doc values (likewise for .stringValue(), .binaryValue(), etc.)?
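A minimal sketch of the "non-null means present" convention suggested above. The names are hypothetical (ToyStorable, ToyStorableValue, ToyStorableConsumer), and docValue() returning a Long is an assumption made purely to keep the example small; the comment does not say what type .docValues() would return.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical storable field: each accessor returns null when that kind of
// value is absent. docValue() is a simplified stand-in for .docValues().
interface ToyStorable {
    String name();
    String stringValue();
    byte[] binaryValue();
    Long docValue();
}

final class ToyStorableValue implements ToyStorable {
    private final String name;
    private final String stringValue;
    private final byte[] binaryValue;
    private final Long docValue;

    ToyStorableValue(String name, String stringValue, byte[] binaryValue, Long docValue) {
        this.name = name;
        this.stringValue = stringValue;
        this.binaryValue = binaryValue;
        this.docValue = docValue;
    }

    public String name() { return name; }
    public String stringValue() { return stringValue; }
    public byte[] binaryValue() { return binaryValue; }
    public Long docValue() { return docValue; }
}

// Toy consumer: handles whichever values are non-null and records what it did.
final class ToyStorableConsumer {
    final List<String> log = new ArrayList<>();

    void consume(ToyStorable field) {
        if (field.stringValue() != null) {
            log.add("stored-string:" + field.name());
        }
        if (field.binaryValue() != null) {
            log.add("stored-binary:" + field.name());
        }
        if (field.docValue() != null) {
            log.add("doc-values:" + field.name());
        }
    }
}

final class StorableConsumerDemo {
    public static void main(String[] args) {
        ToyStorableConsumer consumer = new ToyStorableConsumer();
        consumer.consume(new ToyStorableValue("title", "hello", null, null));
        consumer.consume(new ToyStorableValue("rank", null, null, 7L));
        System.out.println(consumer.log);
    }
}
```

A field carrying only doc values (neither a string nor a binary value) is still seen by the consumer, which is the case the earlier comments worried about missing.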
[jira] [Commented] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093816#comment-13093816 ]

Robert Muir commented on LUCENE-2308:

Thanks Mike! This is really helpful! While reviewing this/merging the flexscoring branch, I had a few ideas for improvements:

* I think FT should be immutable? Personally, I don't think we should enforce patterns like Freezable or Builder at this low level; instead I think FieldType should be a simple immutable class with a single ctor that takes the minimal stuff that we (core lucene) need. It can still be concrete, but then you have to specify everything. Then, things like TextField/StringField are sugar APIs for common configurations. I don't like the idea of mutable FieldTypes that are reused across different fields because I am concerned that somehow the 'wrong configuration' will be applied accidentally.
* Along these lines, we can then remove the copy constructor, which also seems unnatural to Java users: since FieldType would then be immutable, there is no reason to ever copy it.
* I think BinaryField should be able to index as binary? This is a new feature in Lucene 4 but it's unfortunately really hard to do: there are a few approaches, but they are all difficult: custom tokenstream/AttributeImpls/AttributeFactory etc.
* In the future, a BinaryField like this could be a base impl for CollationField: due to historical reasons we expose this capability as an Analyzer but I think this isn't great: it's really an implementation detail. For example in Solr, it's a real FieldType that uses an analyzer behind the scenes. So in this sense I think it's more consistent with NumericField.

Separately specify a field's type
---
Key: LUCENE-2308
URL: https://issues.apache.org/jira/browse/LUCENE-2308
Project: Lucene - Java
Issue Type: Improvement
Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
Labels: gsoc2011, lucene-gsoc-11, mentor
Fix For: 4.0
Attachments: LUCENE-2308-10.patch, LUCENE-2308-11.patch, LUCENE-2308-12.patch, LUCENE-2308-13.patch, LUCENE-2308-14.patch, LUCENE-2308-15.patch, LUCENE-2308-16.patch, LUCENE-2308-17.patch, LUCENE-2308-18.patch, LUCENE-2308-19.patch, LUCENE-2308-2.patch, LUCENE-2308-20.patch, LUCENE-2308-21.patch, LUCENE-2308-3.patch, LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, LUCENE-2308-7.patch, LUCENE-2308-8.patch, LUCENE-2308-9.patch, LUCENE-2308-branch.patch, LUCENE-2308-final.patch, LUCENE-2308-ltc.patch, LUCENE-2308-merge-1.patch, LUCENE-2308-merge-2.patch, LUCENE-2308-merge-3.patch, LUCENE-2308.branchdiffs, LUCENE-2308.branchdiffs.moved, LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch

This came up from discussions on IRC. I'm summarizing here...

Today when you make a Field to add to a document you can set things like indexed or not, stored or not, analyzed or not, and details like omitTfAP, omitNorms, index term vectors (separately controlling offsets/positions), etc.

I think we should factor these out into a new class (FieldType?). Then you could re-use this FieldType instance across multiple fields. The Field instance would still hold the actual value.

We could then do per-field analyzers by adding a setAnalyzer on the FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise for per-field codecs (with flex), where we now have PerFieldCodecWrapper).

This would NOT be a schema! It's just refactoring what we already specify today. EG it's not serialized into the index.

This has been discussed before, and I know Michael Busch opened a more ambitious (I think?) issue. I think this is a good first baby step. We could consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold off on that for starters...
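For readers skimming the thread, this is roughly what "a simple immutable class with a single ctor" means in Java terms. The sketch is an illustration, not the FieldType from the branch: ImmutableFieldTypeSketch and SketchField are hypothetical names, and only four of the options mentioned in the issue are modelled.

```java
// Hypothetical sketch of an immutable field type; not the class in the branch.
final class ImmutableFieldTypeSketch {
    private final boolean indexed;
    private final boolean stored;
    private final boolean tokenized;
    private final boolean omitNorms;

    // Single constructor: every option is stated up front and never changes,
    // so there are no setters and no copy constructor.
    ImmutableFieldTypeSketch(boolean indexed, boolean stored,
                             boolean tokenized, boolean omitNorms) {
        this.indexed = indexed;
        this.stored = stored;
        this.tokenized = tokenized;
        this.omitNorms = omitNorms;
    }

    boolean indexed() { return indexed; }
    boolean stored() { return stored; }
    boolean tokenized() { return tokenized; }
    boolean omitNorms() { return omitNorms; }
}

// The field holds the actual value; the type instance can be shared freely.
final class SketchField {
    final String name;
    final String value;
    final ImmutableFieldTypeSketch type;

    SketchField(String name, String value, ImmutableFieldTypeSketch type) {
        this.name = name;
        this.value = value;
        this.type = type;
    }
}

final class ImmutableFieldTypeDemo {
    public static void main(String[] args) {
        ImmutableFieldTypeSketch shared =
                new ImmutableFieldTypeSketch(true, false, true, false);
        SketchField title = new SketchField("title", "some title", shared);
        SketchField body = new SketchField("body", "some text", shared);
        // Both fields point at the same instance; neither can alter it.
        System.out.println(title.type == body.type);
    }
}
```

With all members final, reusing one instance across fields cannot lead to one field's configuration leaking into another, which is the concern raised in the first bullet.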
[jira] [Commented] (LUCENE-3312) Break out StorableField from IndexableField
[ https://issues.apache.org/jira/browse/LUCENE-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093825#comment-13093825 ]

Michael McCandless commented on LUCENE-3312:

bq. So one thought would be to have a different class being returned by reader.document(), we could call it StoredDocument and it would only make access to StorableFields.

+1
[jira] [Commented] (LUCENE-3206) FST package API refactoring
[ https://issues.apache.org/jira/browse/LUCENE-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093830#comment-13093830 ]
David Smiley commented on LUCENE-3206:
I'm looking forward to these FST API improvements. It's a bit obtuse for something that is basically a SortedMap.
FST package API refactoring
Key: LUCENE-3206
URL: https://issues.apache.org/jira/browse/LUCENE-3206
Project: Lucene - Java
Issue Type: Improvement
Components: core/FSTs
Affects Versions: 3.2
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Minor
Fix For: 3.4, 4.0
Attachments: LUCENE-3206.patch
The current API is still marked @experimental, so I think there's still time to fiddle with it. I've been using the current API for some time and I do have some ideas for improvement. This is a placeholder for these -- I'll post a patch once I have a working proof of concept.
[jira] [Commented] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093856#comment-13093856 ]
Michael McCandless commented on LUCENE-2308:
bq. I think FT should be immutable?
bq. I don't like the idea of mutable FieldTypes that are reused across different fields because I am concerned that somehow the 'wrong configuration' will be applied accidentally.
This is why we have FT.freeze, and why an FT is frozen as soon as it's used in a Field. But I agree it'd be even better if we had true immutability (all fields in FT are final).
bq. I think FieldType should be a simple immutable class with a single ctor that takes the minimal stuff that we (core lucene) need.
bq. It can still be concrete, but then you have to specify everything. Then, things like TextField/StringField are sugar APIs for common configurations.
This is a neat idea! Another plus is this is a single place where we can check consistency of the settings (eg you cannot enable term vectors if indexed is false). So this would mean we'd have alternate ctors to the sugar classes for the common cases, like maybe:
{noformat}
new StringField(name, value)
new StoredStringField(name, value)
{noformat}
StringField would always omitNorms, not tokenize, index DOCS_ONLY. For TextField maybe:
{noformat}
new TextField(name, value)
new TextField(name, value, omitNorms)
new TextField(name, value, omitNorms, indexTVPos, indexTVOffsets)
new StoredTextField(name, value)
new StoredTextField(name, value, omitNorms)
new StoredTextField(name, value, omitNorms, indexTVPos, indexTVOffsets)
{noformat}
Expert usage would always have the out of invoking FT directly with all options. Even more expert usage can bypass the userspace FieldType/Field/Document entirely and code directly to IndexableField instead.
bq. I think BinaryField should be able to index as binary?
I agree! Not sure on the details of how we'd do that though...
Today this field is only stored byte[].
Separately specify a field's type
Key: LUCENE-2308
URL: https://issues.apache.org/jira/browse/LUCENE-2308
Project: Lucene - Java
Issue Type: Improvement
Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
Labels: gsoc2011, lucene-gsoc-11, mentor
Fix For: 4.0
Attachments: LUCENE-2308-10.patch, LUCENE-2308-11.patch, LUCENE-2308-12.patch, LUCENE-2308-13.patch, LUCENE-2308-14.patch, LUCENE-2308-15.patch, LUCENE-2308-16.patch, LUCENE-2308-17.patch, LUCENE-2308-18.patch, LUCENE-2308-19.patch, LUCENE-2308-2.patch, LUCENE-2308-20.patch, LUCENE-2308-21.patch, LUCENE-2308-3.patch, LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, LUCENE-2308-7.patch, LUCENE-2308-8.patch, LUCENE-2308-9.patch, LUCENE-2308-branch.patch, LUCENE-2308-final.patch, LUCENE-2308-ltc.patch, LUCENE-2308-merge-1.patch, LUCENE-2308-merge-2.patch, LUCENE-2308-merge-3.patch, LUCENE-2308.branchdiffs, LUCENE-2308.branchdiffs.moved, LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch
This came up from discussions on IRC. I'm summarizing here...
Today when you make a Field to add to a document you can set things like indexed or not, stored or not, analyzed or not, and details like omitTfAP, omitNorms, index term vectors (separately controlling offsets/positions), etc.
I think we should factor these out into a new class (FieldType?). Then you could re-use this FieldType instance across multiple fields. The Field instance would still hold the actual value.
We could then do per-field analyzers by adding a setAnalyzer on the FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise for per-field codecs (with flex), where we now have PerFieldCodecWrapper).
This would NOT be a schema! It's just refactoring what we already specify today. EG it's not serialized into the index.
This has been discussed before, and I know Michael Busch opened a more ambitious (I think?) issue. I think this is a good first baby step. We could consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold off on that for starters...
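To make the "single ctor plus sugar classes" idea from the comment above concrete, here is a minimal sketch. All class names (SketchFieldType, SketchField, SketchStringField, SketchStoredStringField) are hypothetical and are not Lucene's actual API; the flag set is an assumed subset (index options such as DOCS_ONLY are left out). It only illustrates an immutable type whose one constructor is the single place where consistency is checked, with sugar classes reusing a shared instance.

```java
// Hypothetical sketch only: NOT Lucene's FieldType. All names are invented for illustration.
final class SketchFieldType {
    final boolean indexed;
    final boolean stored;
    final boolean tokenized;
    final boolean omitNorms;
    final boolean storeTermVectors;

    // One constructor taking every option: the "expert" entry point.
    SketchFieldType(boolean indexed, boolean stored, boolean tokenized,
                    boolean omitNorms, boolean storeTermVectors) {
        // Single place to check consistency of the settings.
        if (storeTermVectors && !indexed) {
            throw new IllegalArgumentException(
                "cannot enable term vectors if indexed is false");
        }
        this.indexed = indexed;
        this.stored = stored;
        this.tokenized = tokenized;
        this.omitNorms = omitNorms;
        this.storeTermVectors = storeTermVectors;
    }
}

// A field holds the actual value plus a reference to its (shared, immutable) type.
class SketchField {
    final String name;
    final String value;
    final SketchFieldType type;

    SketchField(String name, String value, SketchFieldType type) {
        this.name = name;
        this.value = value;
        this.type = type;
    }
}

// Sugar: indexed, not stored, not tokenized, norms omitted.
final class SketchStringField extends SketchField {
    private static final SketchFieldType TYPE =
        new SketchFieldType(true, false, false, true, false);

    SketchStringField(String name, String value) {
        super(name, value, TYPE);
    }
}

// Sugar: same as above, but also stored.
final class SketchStoredStringField extends SketchField {
    private static final SketchFieldType TYPE =
        new SketchFieldType(true, true, false, true, false);

    SketchStoredStringField(String name, String value) {
        super(name, value, TYPE);
    }
}
```

Because each sugar class shares one static type instance, no new type object is allocated per field in this sketch.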
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 10390 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/10390/ All tests passed Build Log (for compile errors): [...truncated 12938 lines...]
[jira] [Commented] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093880#comment-13093880 ]
Chris Male commented on LUCENE-2308:
Much of what has been suggested here I'm looking at incorporating in LUCENE-3312, but perhaps it's best to do it in small steps. What I want to do is as follows:
- Change FieldType to an interface inside index.* and use it for the source of properties about an IndexableField. It will be simple and immutable and won't enforce any creation techniques.
- Add a builder for FieldType to document.* which will create FieldType instances.
- Add the syntactic sugar ctors suggested above which would use the builder to instantiate the FieldTypes they need.
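The interface-plus-builder plan in the comment above could look roughly like the following. This is a sketch under stated assumptions, not the LUCENE-3312 patch: SketchFieldTypeView and SketchFieldTypeBuilder are hypothetical names, and the three properties are an arbitrary subset chosen for brevity.

```java
// Hypothetical sketch only; names and properties are invented for illustration.

// Read-only view of a field's properties (would live on the index.* side).
interface SketchFieldTypeView {
    boolean indexed();
    boolean stored();
    boolean omitNorms();
}

// Mutable builder (would live on the document.* side) that produces immutable views.
final class SketchFieldTypeBuilder {
    private boolean indexed;
    private boolean stored;
    private boolean omitNorms;

    SketchFieldTypeBuilder indexed(boolean v) { this.indexed = v; return this; }
    SketchFieldTypeBuilder stored(boolean v) { this.stored = v; return this; }
    SketchFieldTypeBuilder omitNorms(boolean v) { this.omitNorms = v; return this; }

    SketchFieldTypeView build() {
        // Snapshot the current settings so later builder calls cannot change the result.
        final boolean i = indexed;
        final boolean s = stored;
        final boolean o = omitNorms;
        return new SketchFieldTypeView() {
            @Override public boolean indexed() { return i; }
            @Override public boolean stored() { return s; }
            @Override public boolean omitNorms() { return o; }
        };
    }
}
```

Each build() call allocates a new object, so callers would want to build a type once and reuse it rather than build one per field.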
[jira] [Updated] (LUCENE-2312) Search on IndexWriter's RAM Buffer
[ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Rutherglen updated LUCENE-2312:
Attachment: LUCENE-2312.patch
Here's a new patch that incrementally adds field cache and norms values. Meaning that as documents are added / indexed, norms and field cache values are automatically created. The field cache values are only added to if they have already been created. The field cache functionality needs to be completed for all types. We probably need to get the indexing lock while the field cache value is initially being created (eg, the terms enumeration). We're more or less feature complete now.
Search on IndexWriter's RAM Buffer
Key: LUCENE-2312
URL: https://issues.apache.org/jira/browse/LUCENE-2312
Project: Lucene - Java
Issue Type: New Feature
Components: core/search
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
Assignee: Michael Busch
Fix For: Realtime Branch
Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch, LUCENE-2312.patch, LUCENE-2312.patch
In order to offer users near realtime search, without incurring an indexing performance penalty, we can implement search on IndexWriter's RAM buffer. This is the buffer that is filled in RAM as documents are indexed. Currently the RAM buffer is flushed to the underlying directory (usually disk) before being made searchable.
Today's Lucene-based NRT systems must incur the cost of merging segments, which can slow indexing.
Michael Busch has good suggestions regarding how to handle deletes using max doc ids. https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
The area that isn't fully fleshed out is the terms dictionary, which needs to be sorted prior to queries executing. Currently IW implements a specialized hash table. Michael B has a suggestion here: https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 10391 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/10391/ No tests ran. Build Log (for compile errors): [...truncated 5361 lines...]
[jira] [Commented] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093895#comment-13093895 ]
Robert Muir commented on LUCENE-2308:
{quote} Add a builder for FieldType to document.* which will create FieldType instances. {quote}
Can we avoid the builder API? I think we shouldn't invite accidental creation of lots of FieldType instances during indexing... why not just a single ctor in FieldType that takes all the parameters the base class cares about? Then it serves double-duty as the 'expert' FieldType anyway; subclasses like TextField are just the sugar.
If someone wants to implement their *subclass* with a builder, they could still do this, but I don't think we should force the builder API on people with such a user-facing API: I think we should just keep it a very simple immutable class with one ctor.
Given the choice between builder and freezable though, I'll take freezable any day... but as I said before I think we should keep this even simpler.
[jira] [Assigned] (LUCENE-3406) Add source distribution packaging targets that make a tarball from a local working copy
[ https://issues.apache.org/jira/browse/LUCENE-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steven Rowe reassigned LUCENE-3406:
Assignee: Steven Rowe
Add source distribution packaging targets that make a tarball from a local working copy
Key: LUCENE-3406
URL: https://issues.apache.org/jira/browse/LUCENE-3406
Project: Lucene - Java
Issue Type: Improvement
Components: general/build
Affects Versions: 4.0
Reporter: Seung-Yeoul Yang
Assignee: Steven Rowe
Priority: Minor
Labels: patch
Fix For: 4.0
Attachments: LUCENE-3406.patch, LUCENE-3406.patch
Original Estimate: 24h
Remaining Estimate: 24h
I am adding back targets that were removed in https://issues.apache.org/jira/browse/LUCENE-2973 that are used to create source distribution packaging from a local working copy as new Ant targets.
2 things to note about the patch:
1) For package-local-src-tgz in solr/build.xml, I had to specify additional directories under solr/ that have been added since LUCENE-2973.
2) I couldn't get the package-tgz-local-src in lucene/build.xml to generate the docs folder, which does get added by package-tgz-src.
The patch is against the trunk.
[jira] [Commented] (LUCENE-3407) wrong stats/scoring from MemoryCodec
[ https://issues.apache.org/jira/browse/LUCENE-3407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093928#comment-13093928 ]
Robert Muir commented on LUCENE-3407:
SimpleText had this issue too... I had not run the tests enough, we intentionally return SimpleText less often :)
wrong stats/scoring from MemoryCodec
Key: LUCENE-3407
URL: https://issues.apache.org/jira/browse/LUCENE-3407
Project: Lucene - Java
Issue Type: Bug
Affects Versions: flexscoring branch, 4.0
Reporter: Robert Muir
Attachments: LUCENE-3407.patch, LUCENE-3407_test.patch
I hit some random failures in the flexscoring branch: weird because it's not a random test. I noticed the test always failed with MemoryCodec, and wrote a specific test for it. I haven't traced through it yet, but I think the likely issue is that MemoryCodec is somehow returning wrong stats here?
[jira] [Commented] (LUCENE-3406) Add source distribution packaging targets that make a tarball from a local working copy
[ https://issues.apache.org/jira/browse/LUCENE-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094089#comment-13094089 ]
Seung-Yeoul Yang commented on LUCENE-3406:
I've tested both targets locally, and they work fine. Thanks Steve!
[jira] [Updated] (SOLR-2714) JsonLoader does not handle null field values
[ https://issues.apache.org/jira/browse/SOLR-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yonik Seeley updated SOLR-2714:
Priority: Minor (was: Major)
Issue Type: Improvement (was: Bug)
Summary: JsonLoader does not handle null field values (was: JsonLoader does not handle null fields)
JsonLoader does not handle null field values
Key: SOLR-2714
URL: https://issues.apache.org/jira/browse/SOLR-2714
Project: Solr
Issue Type: Improvement
Affects Versions: 3.3
Reporter: Trygve Laugstøl
Priority: Minor
Attachments: SOLR-2714.patch
The parser in JsonLoader does not handle null fields when adding a document over http+json. Given this document:
{code}
[{
"timestamp":"2011-08-17T14:11:49.201Z",
"correlationId":"N44YFGSQNC",
"logType":"event",
"short":"Invalidating session: 4zy6cvdtmvu1erlay0sn6rhz",
"long":null
}]
{code}
I'm getting a response code=400 and the error message "should finish doc first" in the logs.
It seems that JsonLoader is missing a case for JSONParser.NULL in the parser event switch.
* https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/JsonLoader.java
* https://svn.apache.org/repos/asf/labs/noggit/src/main/java/org/apache/noggit/JSONParser.java
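The missing-case problem described above can be illustrated with a small self-contained sketch. It does not use noggit or the real JsonLoader: SketchEvent and SketchJsonFieldReader are hypothetical stand-ins, and skipping the null-valued field is only one plausible way to handle it (what the attached patch actually does is not shown in this message).

```java
import java.util.Map;

// Hypothetical stand-in for a pull parser's event codes; not noggit's JSONParser.
enum SketchEvent { STRING, LONG, BOOLEAN, NULL }

final class SketchJsonFieldReader {
    private SketchJsonFieldReader() {}

    /**
     * Adds one parsed field to the document map.
     * A NULL event is accepted and the field is simply skipped (an assumed policy).
     */
    static void addField(Map<String, Object> doc, String name, SketchEvent ev, String raw) {
        switch (ev) {
            case STRING:
                doc.put(name, raw);
                break;
            case LONG:
                doc.put(name, Long.parseLong(raw));
                break;
            case BOOLEAN:
                doc.put(name, Boolean.parseBoolean(raw));
                break;
            case NULL:
                // Without this case the event would fall through to the error branch,
                // which is the kind of failure the report describes.
                break;
            default:
                throw new IllegalStateException("unexpected event: " + ev);
        }
    }
}
```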
[jira] [Created] (LUCENE-3408) Remove unnecessary memory barriers in DWPT
Remove unnecessary memory barriers in DWPT
Key: LUCENE-3408
URL: https://issues.apache.org/jira/browse/LUCENE-3408
Project: Lucene - Java
Issue Type: Improvement
Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Priority: Minor
Fix For: 4.0
Currently DWPT still uses AtomicLong to count the bytesUsed. Each write access issues an implicit memory barrier which is totally unnecessary since we are doing everything single-threaded on that level. This might be very minor but we shouldn't issue unnecessary memory barriers causing processors to lock their instruction pipeline for no reason.
[jira] [Updated] (LUCENE-3408) Remove unnecessary memory barriers in DWPT
[ https://issues.apache.org/jira/browse/LUCENE-3408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Simon Willnauer updated LUCENE-3408:
Attachment: LUCENE-3408.patch
Here is a patch that replaces the AtomicLong with a simple counter class that provides threadsafe and serial implementations.
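A counter with threadsafe and serial implementations, as the comment above describes, might be sketched like this. It is an illustration under assumptions, not the attached patch: the class name SketchCounter and the factory method newCounter(boolean) are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; not the class from LUCENE-3408.patch.
abstract class SketchCounter {
    /** Adds delta and returns the new value. */
    abstract long addAndGet(long delta);

    /** Returns the current value. */
    abstract long get();

    /** Picks the implementation once, so callers do not pay for atomics they do not need. */
    static SketchCounter newCounter(boolean threadSafe) {
        return threadSafe ? new AtomicSketchCounter() : new SerialSketchCounter();
    }

    // Plain long field: writes are ordinary stores with no volatile/atomic semantics.
    // Only safe when a single thread touches the counter.
    private static final class SerialSketchCounter extends SketchCounter {
        private long count;

        @Override long addAndGet(long delta) { return count += delta; }
        @Override long get() { return count; }
    }

    // Backed by AtomicLong for use across threads.
    private static final class AtomicSketchCounter extends SketchCounter {
        private final AtomicLong count = new AtomicLong();

        @Override long addAndGet(long delta) { return count.addAndGet(delta); }
        @Override long get() { return count.get(); }
    }
}
```

A single-threaded owner such as the per-thread writer discussed in the issue would call SketchCounter.newCounter(false); shared accounting would pass true.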
[jira] [Assigned] (LUCENE-3408) Remove unnecessary memory barriers in DWPT
[ https://issues.apache.org/jira/browse/LUCENE-3408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Simon Willnauer reassigned LUCENE-3408:
Assignee: Simon Willnauer
[jira] [Resolved] (SOLR-2714) JsonLoader does not handle null field values
[ https://issues.apache.org/jira/browse/SOLR-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yonik Seeley resolved SOLR-2714.
Resolution: Fixed
Fix Version/s: 3.4
Re: Final Evaluations results processed for LUCENE-1768: NumericRange support for new query parser
Hi Uwe,
Yes, the patch is almost ready. I was waiting for your response to the question below from LUCENE-1768; once I have the answer, I will submit the patch.
I still have a few changes on 3x to submit, but I was wondering: is it necessary to deprecate a class in 3x if it's ONLY going to be removed in 4.0? Not sure if I understand how these things work yet.
Regards,
Vinicius Barros
From: Uwe Schindler u...@thetaphi.de
To: dev@lucene.apache.org; 'Vinicius Barros' viniciusbarros.g...@yahoo.com.br
Sent: Monday, August 29, 2011 7:37
Subject: RE: Final Evaluations results processed for LUCENE-1768: NumericRange support for new query parser
Hi Vinicius,
Thank you very much! Will you submit the patch for upgrading trunk, too (after the 3.x changes)? Once this is finished, we will commit changes.txt entries and close the issue.
Uwe
- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de
From: Vinicius Barros [mailto:viniciusbarros.g...@yahoo.com.br]
Sent: Sunday, August 28, 2011 4:43 AM
To: dev@lucene.apache.org
Subject: Fwd: Final Evaluations results processed for LUCENE-1768: NumericRange support for new query parser
Hi,
As you can see in the email below, I have successfully passed my GSOC 2011 program. I am really glad right now when I look back and see how much I have learned and contributed, much more than I had expected. Thanks everyone for giving me this chance, especially to Uwe and Adriano who helped me a lot along the project.
Thanks again,
Vinicius Barros
- Forwarded message -
From: no-re...@socghop.appspotmail.com no-re...@socghop.appspotmail.com
To: viniciusbarros.g...@yahoo.com.br
Cc: nor...@apache.org; u...@apache.org; uschind...@apache.org
Sent: Friday, August 26, 2011 15:10
Subject: Final Evaluations results processed for LUCENE-1768: NumericRange support for new query parser
Hi Vinicius Barros,
We have processed the evaluation for your project named LUCENE-1768: NumericRange support for new query parser with Apache Software Foundation. Congratulations, from our data it seems that you have successfully passed the Final Evaluations. Please contact your mentor to discuss the results of your evaluation and to plan your goals and development plan for the rest of the program.
Greetings,
The Google Open Source Programs Team
RE: Final Evaluations results processed for LUCENE-1768: NumericRange support for new query parser
Hi,
You should deprecate classes in 3.x that will go away in 4.0. The same applies to methods and other code. Deprecation means that this code will no longer work in 4.0 if it is left unchanged (which is not completely true anymore in 4.0, as we also have completely backwards-incompatible changes to lower-level APIs in Lucene).
Uwe
- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de
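For readers unfamiliar with the mechanics, deprecating a class in Java means marking it with the @Deprecated annotation and a @deprecated Javadoc tag that names the replacement. The sketch below is a generic illustration; OldRangeSupport and NewRangeSupport are hypothetical names and are not classes from the LUCENE-1768 patch.

```java
// Hypothetical classes for illustration only; not part of Lucene.

/** The replacement that callers should move to. */
class NewRangeSupport {
    static String describe() {
        return "range";
    }
}

/**
 * Old entry point kept so that existing 3.x callers still compile.
 *
 * @deprecated Will be removed in the next major version; use {@link NewRangeSupport} instead.
 */
@Deprecated
class OldRangeSupport {
    // Delegates to the replacement so behavior stays identical until removal.
    static String describe() {
        return NewRangeSupport.describe();
    }
}
```

Code that still uses OldRangeSupport compiles with a deprecation warning, which gives users a release cycle to migrate before the class disappears.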
[jira] [Commented] (SOLR-752) Allow better Field Compression options
[ https://issues.apache.org/jira/browse/SOLR-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094219#comment-13094219 ]
Kim Taylor commented on SOLR-752:
I've had a look into this. Simon is right, DefaultSolrHighlighter uses doc.getValues(fieldName) to retrieve the field. lucene.document.Document.getValues() calls stringValue on the appropriate field. The problem is that when FieldsWriter/Reader read fields from segments, the supplied CompressedField gets converted into a Field, which does not know how to interpret fieldsData.
I've added another patch that alters DefaultSolrHighlighter to use the schema FieldType (in this case CompressedField) to properly interpret fieldsData.
Allow better Field Compression options
Key: SOLR-752
URL: https://issues.apache.org/jira/browse/SOLR-752
Project: Solr
Issue Type: Improvement
Reporter: Grant Ingersoll
Priority: Minor
Attachments: compressedtextfield.patch
See http://lucene.markmail.org/message/sd4mgwud6caevb35?q=compression
It would be good if Solr handled field compression outside of Lucene's Field.COMPRESS capabilities, since those capabilities are less than ideal when it comes to control over compression.
[jira] [Updated] (SOLR-752) Allow better Field Compression options
[ https://issues.apache.org/jira/browse/SOLR-752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kim Taylor updated SOLR-752: Attachment: compressed_field.patch Patch updated to modify DefaultSolrHighlighter Allow better Field Compression options -- Key: SOLR-752 URL: https://issues.apache.org/jira/browse/SOLR-752 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Priority: Minor Attachments: compressed_field.patch, compressedtextfield.patch
[jira] [Commented] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094256#comment-13094256 ] Chris Male commented on LUCENE-2308: I'm definitely -1 for a constructor for all the properties. We may only have a few properties today but it's not going to stay that way. One of the benefits I see of FieldType is that we can extend it to have a greater range of properties that allow more customized handling of fields. If we put everything into a constructor then it'll grow out of control, prevent us from adding more properties and we'll end up having another project just to break it up more. With an interface, we're not forcing anything on anyone. Users can create FieldTypes however they like. Separately specify a field's type - Key: LUCENE-2308 URL: https://issues.apache.org/jira/browse/LUCENE-2308 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: LUCENE-2308-10.patch, LUCENE-2308-11.patch, LUCENE-2308-12.patch, LUCENE-2308-13.patch, LUCENE-2308-14.patch, LUCENE-2308-15.patch, LUCENE-2308-16.patch, LUCENE-2308-17.patch, LUCENE-2308-18.patch, LUCENE-2308-19.patch, LUCENE-2308-2.patch, LUCENE-2308-20.patch, LUCENE-2308-21.patch, LUCENE-2308-3.patch, LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, LUCENE-2308-7.patch, LUCENE-2308-8.patch, LUCENE-2308-9.patch, LUCENE-2308-branch.patch, LUCENE-2308-final.patch, LUCENE-2308-ltc.patch, LUCENE-2308-merge-1.patch, LUCENE-2308-merge-2.patch, LUCENE-2308-merge-3.patch, LUCENE-2308.branchdiffs, LUCENE-2308.branchdiffs.moved, LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch, LUCENE-2308.patch This came up from discussions on IRC. I'm summarizing here... 
Today when you make a Field to add to a document you can set things like indexed or not, stored or not, analyzed or not, and details like omitTfAP, omitNorms, index term vectors (separately controlling offsets/positions), etc. I think we should factor these out into a new class (FieldType?). Then you could re-use this FieldType instance across multiple fields. The Field instance would still hold the actual value. We could then do per-field analyzers by adding a setAnalyzer on the FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise for per-field codecs (with flex), where we now have PerFieldCodecWrapper). This would NOT be a schema! It's just refactoring what we already specify today. EG it's not serialized into the index. This has been discussed before, and I know Michael Busch opened a more ambitious (I think?) issue. I think this is a good first baby step. We could consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold off on that for starters...
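The core idea in the issue description above, one type object shared by several fields while each field keeps its own value, might look roughly like the following. This is only a sketch under stated assumptions: SketchFieldType, SketchField and FieldTypeReuseSketch are hypothetical names, the three flags are a small subset chosen for illustration, and none of this is the code on the branch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: holds only the "how to handle this field" flags, never a value.
final class SketchFieldType {
    final boolean indexed;
    final boolean stored;
    final boolean omitNorms;

    SketchFieldType(boolean indexed, boolean stored, boolean omitNorms) {
        this.indexed = indexed;
        this.stored = stored;
        this.omitNorms = omitNorms;
    }
}

// Hypothetical: holds the name and the actual value, and points at a shared type.
final class SketchField {
    final String name;
    final String value;
    final SketchFieldType type;

    SketchField(String name, String value, SketchFieldType type) {
        this.name = name;
        this.value = value;
        this.type = type;
    }
}

final class FieldTypeReuseSketch {
    // Builds a three-field document from only two type instances.
    static List<SketchField> buildDocument() {
        SketchFieldType storedOnly = new SketchFieldType(false, true, true);
        SketchFieldType indexedText = new SketchFieldType(true, false, false);

        List<SketchField> doc = new ArrayList<>();
        doc.add(new SketchField("id", "42", storedOnly));
        doc.add(new SketchField("title", "Separately specify a field's type", indexedText));
        doc.add(new SketchField("body", "This came up from discussions on IRC.", indexedText));
        return doc;
    }
}
```

Because the type carries no value, nothing here needs to be written into the index, which matches the description's point that this is a refactoring and not a schema.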
[jira] [Commented] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094258#comment-13094258 ] Robert Muir commented on LUCENE-2308: "I'm definitely -1 for a constructor for all the properties. We may only have a few properties today but it's not going to stay that way." What is the problem? I am talking about the core FieldType that holds the main stuff we need, e.g. indexed/stored/etc. Subclasses can do whatever they want (builders/freezable/I don't care), but we should make a simple, extendable, immutable core class for the limited set of properties that really need to be in the fieldtype subclass.
[jira] [Commented] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094263#comment-13094263 ] Robert Muir commented on LUCENE-2308: It doesn't lock us into anything. We deprecate the old ctor and make a new one with the correct default. This is easy! Besides, you aren't assuring me? What new things really need to be added that apply to *all* field types across the board?!
[jira] [Commented] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094265#comment-13094265 ] Chris Male commented on LUCENE-2308: Alright, I'll wait to see your patch using the ctor approach.
[jira] [Commented] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094266#comment-13094266 ] Robert Muir commented on LUCENE-2308: I'm not really motivated to work on a patch with a solution you already said you are -1 on? Again, I'm totally against any builder here. I'm ok with the freezable that we have now, but I think a plain immutable Java object would be a lot simpler and clearer... At the end of the day I'm ok with compromising on what we have already; I just brought this immutability idea up as a suggestion.
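For readers unfamiliar with the "freezable" style mentioned in the comment above, here is a minimal sketch of the general pattern: the object is mutable while it is being configured and rejects changes once freeze() has been called. The class FreezableTypeSketch and its methods are hypothetical and are not taken from the branch; the thread does not show the actual code.

```java
// Hypothetical freezable type: configurable until freeze() is called.
final class FreezableTypeSketch {
    private boolean indexed;
    private boolean stored;
    private boolean frozen;

    void setIndexed(boolean value) {
        checkNotFrozen();
        this.indexed = value;
    }

    void setStored(boolean value) {
        checkNotFrozen();
        this.stored = value;
    }

    boolean indexed() {
        return indexed;
    }

    boolean stored() {
        return stored;
    }

    // After this call every setter throws instead of changing state.
    void freeze() {
        this.frozen = true;
    }

    private void checkNotFrozen() {
        if (frozen) {
            throw new IllegalStateException("type is frozen and can no longer be changed");
        }
    }
}
```

The trade-off under discussion is visible even at this size: the freezable version needs a runtime check in every setter, whereas a plain immutable object would rule out late changes at compile time by having final fields and no setters.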
[jira] [Commented] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094272#comment-13094272 ] Chris Male commented on LUCENE-2308: Then we're at a bit of an impasse because you're totally against what I suggested as well. I'm suggesting you cut a patch and we can go over it; I might then see how it can be simpler and clearer.
[jira] [Commented] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094293#comment-13094293 ] Chris Male commented on LUCENE-2308: What about a compromise and have both the plain Java object and a builder? That way if someone wants the simple clear way you describe, they have the ctor they can go to. If they want what I feel would be more readable code, they can use the builder.
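The compromise proposed in the comment above, a plain immutable object with a constructor plus an optional builder that produces the same object, could be sketched as follows. ImmutableTypeSketch, its three flags and the builder method names are all hypothetical; this is one possible shape, not a patch from the issue.

```java
// Hypothetical immutable type: all state is final and set once in the constructor.
final class ImmutableTypeSketch {
    final boolean indexed;
    final boolean stored;
    final boolean tokenized;

    // The "plain Java object" route: everything is passed positionally.
    ImmutableTypeSketch(boolean indexed, boolean stored, boolean tokenized) {
        this.indexed = indexed;
        this.stored = stored;
        this.tokenized = tokenized;
    }

    // The builder route: same result, but each property is named at the call site.
    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        private boolean indexed;
        private boolean stored;
        private boolean tokenized;

        Builder indexed(boolean value) {
            this.indexed = value;
            return this;
        }

        Builder stored(boolean value) {
            this.stored = value;
            return this;
        }

        Builder tokenized(boolean value) {
            this.tokenized = value;
            return this;
        }

        // Unset properties keep their default of false.
        ImmutableTypeSketch build() {
            return new ImmutableTypeSketch(indexed, stored, tokenized);
        }
    }
}
```

With this shape, `new ImmutableTypeSketch(true, false, true)` and `ImmutableTypeSketch.builder().indexed(true).tokenized(true).build()` yield equivalent objects. The builder also absorbs new properties without changing existing call sites, which is the growth concern raised earlier in the thread, while the constructor keeps the simple path available.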