[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible

2013-06-05 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13675907#comment-13675907
 ] 

Grant Ingersoll commented on LUCENE-4055:
-

Hmm, Mike, CODEC_FILE_PATTERN is package access only.  Easy enough to 
replicate/fix, any reason not to?

 Refactor SegmentInfo / FieldInfo to make them extensible
 

 Key: LUCENE-4055
 URL: https://issues.apache.org/jira/browse/LUCENE-4055
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Reporter: Andrzej Bialecki 
Assignee: Robert Muir
 Fix For: 4.0-ALPHA

 Attachments: LUCENE-4055.patch


 After LUCENE-4050 is done the resulting SegmentInfo / FieldInfo classes 
 should be made abstract so that they can be extended by Codec-s.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible

2013-06-05 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676083#comment-13676083
 ] 

Michael McCandless commented on LUCENE-4055:


Hmm looks like it's package private in 4.3 but is (will be) public in 
4.x/trunk.  Just replicate for now :)




[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible

2013-06-02 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672556#comment-13672556
 ] 

Grant Ingersoll commented on LUCENE-4055:
-

[~mikemccand] what's the replacement strategy for IndexFileNameFilter?  I'm 
updating some old code from 3.x to 4.3 (MAHOUT-944) and not sure on what the 
equivalent approach is, or whether I even need it.  (Note, I'm still trying to 
figure out whether the patch for MAHOUT-944 is even the best way to do what it 
is trying to do, but I want to at least get it compiling first)




[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible

2013-06-02 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672622#comment-13672622
 ] 

Michael McCandless commented on LUCENE-4055:


Hi [~gsingers], I think you can use the IndexFileNames.CODEC_FILE_PATTERN in 
your filter?  You may need to add in segments_N and segments.gen as well ...
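
A minimal sketch of such a filter, for readers porting 3.x code that relied on IndexFileNameFilter. The class name LuceneIndexFileFilter is hypothetical, and the regular expression is a local copy written from memory of IndexFileNames.CODEC_FILE_PATTERN, so it is an assumption that should be checked against the Lucene version in use. The segments_N and segments.gen checks follow the suggestion above.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.regex.Pattern;

/** Hypothetical replacement for the removed IndexFileNameFilter; not a Lucene class. */
final class LuceneIndexFileFilter implements FilenameFilter {

  // Assumption: a local copy of the codec file pattern, replicated because the
  // constant is package-private in 4.3. Verify it against IndexFileNames.
  static final Pattern CODEC_FILE_PATTERN = Pattern.compile("_[a-z0-9]+(_.*)?\\..*");

  @Override
  public boolean accept(File dir, String name) {
    // Commit files do not start with '_', so the codec pattern misses them.
    if (name.equals("segments.gen") || name.startsWith("segments_")) {
      return true;
    }
    // Per-segment files look like _<segment>[_<suffix>].<extension>.
    return CODEC_FILE_PATTERN.matcher(name).matches();
  }
}
```

The filter only inspects the file name, so it can be passed to File.list(FilenameFilter) or called directly on names coming from another listing API.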




Re: [jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible

2012-05-31 Thread Renaud Delbru

Thanks Robert for the answers,
I'll investigate this approach.
--
Renaud Delbru

On 28/05/12 21:59, Robert Muir (JIRA) wrote:


 [ 
https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284553#comment-13284553
 ]

Robert Muir commented on LUCENE-4055:
-

Well you can do postingsFormat instanceof PerFieldPostingsFormat + 
postingsFormat.getPostingsFormatForField if you really want.

But keep in mind PerFieldPostingsFormat is not really special and just one we 
provide for convenience, obviously one could write their own PostingsFormat
that implements the same thing in a different way.






[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible

2012-05-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284523#comment-13284523
 ] 

Robert Muir commented on LUCENE-4055:
-

Renaud, what you are talking about is not really related to this issue: this 
issue is about allowing codecs to add metadata to SegmentInfo and FieldInfo.

What you are talking about is the indexing chain. If you want to customize 
that, just set a custom one on IndexWriterConfig.




[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible

2012-05-28 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284533#comment-13284533
 ] 

Renaud Delbru commented on LUCENE-4055:
---

Hi Robert,

sorry if it seemed a bit out of context, but I am trying to understand how to 
properly do it.

Indeed, I can create my own indexing chain which includes my TermsHashConsumer 
customisation. However, I would still need the codec metadata for every field. 
But from what you told me, it seems that this codec-specific metadata can 
now be added to FieldInfo. Is that correct?




[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible

2012-05-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284537#comment-13284537
 ] 

Robert Muir commented on LUCENE-4055:
-

{quote}
However, I would still need the codec metadata for every field.
{quote}

What codec metadata? This change only allows codecs to add additional things to 
fieldinfo/segmentinfo so they can later be read when the segment is opened. 
E.g. a CompressingStoredFieldsWriter could put in the segmentinfo an additional 
key-value like CompressingStoredFieldsWriter.algorithm=deflate.

These are basically private to that component: I don't understand why your 
indexing chain would care about this. It's at a level above all that.
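
The key-value idea can be sketched without Lucene on the classpath. Everything below is a hypothetical stand-in: SegmentAttributes is not Lucene's SegmentInfo, and CompressingStoredFieldsSketch only borrows the key name from the example above. The point is that the component writes a namespaced key at flush time and reads the same key back when the segment is opened, and nothing else needs to know about it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-segment metadata holder; a stand-in, not Lucene's SegmentInfo. */
final class SegmentAttributes {
  private final Map<String, String> attributes = new HashMap<>();

  /** Stores a codec-private value; returns the previous value for the key, or null. */
  String putAttribute(String key, String value) {
    return attributes.put(key, value);
  }

  /** Returns the stored value, or null if the key was never written. */
  String getAttribute(String key) {
    return attributes.get(key);
  }

  /** Read-only view of everything recorded for this segment. */
  Map<String, String> attributes() {
    return Collections.unmodifiableMap(attributes);
  }
}

/** Hypothetical codec component that only touches its own namespaced key. */
final class CompressingStoredFieldsSketch {
  static final String ALGORITHM_KEY = "CompressingStoredFieldsWriter.algorithm";

  /** Write side: record which algorithm was used for this segment. */
  static void onWrite(SegmentAttributes si, String algorithm) {
    si.putAttribute(ALGORITHM_KEY, algorithm);
  }

  /** Read side: recover the algorithm when the segment is opened. */
  static String onOpen(SegmentAttributes si) {
    String algorithm = si.getAttribute(ALGORITHM_KEY);
    if (algorithm == null) {
      throw new IllegalStateException("missing attribute: " + ALGORITHM_KEY);
    }
    return algorithm;
  }
}
```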





[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible

2012-05-28 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284544#comment-13284544
 ] 

Renaud Delbru commented on LUCENE-4055:
---

{quote}
What codec metadata?
{quote}

Metadata that indicates which codec is used for a particular field.

Let's say I want to have a specific TermsHashConsumerPerField depending on the 
codec used by a field. For example, for field A and field B, which use the 
Lucene40 codec, we need to use the FreqProxTermsWriterPerField. And for field C, 
which uses my own specific codec, I need to use the MyOwnTermsWriterPerField.

My current understanding tells me that to do this, the only way is to customise 
the IndexingChain with a new TermsHashConsumer that overrides the method 
TermsHashConsumer#addField(TermsHashPerField termsHashPerField, FieldInfo 
fieldInfo). This method addField will be able to instantiate the correct 
TermsHashConsumerPerField if and only if there is codec metadata in the 
FieldInfo parameter. That's why I am interested in using a customised FieldInfo 
to store codec-related metadata about a field.

Or is there a better way to get codec-related information about a field in the 
IndexingChain?





[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible

2012-05-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284548#comment-13284548
 ] 

Robert Muir commented on LUCENE-4055:
-

Codecs are not per-field, they encode the entire inverted index segment.




[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible

2012-05-28 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284551#comment-13284551
 ] 

Renaud Delbru commented on LUCENE-4055:
---

Sorry, I meant PostingsFormat instead of Codec.




[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible

2012-05-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284553#comment-13284553
 ] 

Robert Muir commented on LUCENE-4055:
-

Well you can do postingsFormat instanceof PerFieldPostingsFormat + 
postingsFormat.getPostingsFormatForField if you really want.

But keep in mind PerFieldPostingsFormat is not really special and just one we 
provide for convenience, obviously one could write their own PostingsFormat 
that implements the same thing in a different way.
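
The instanceof check described above might look roughly like the following. To keep the sketch compilable on its own, PostingsFormat and PerFieldPostingsFormat are declared here as small stand-ins that only mirror the shape of the Lucene classes; PostingsFormatResolver is a hypothetical helper name. As noted, a custom PostingsFormat could route fields differently, in which case this check simply returns the format it was given.

```java
/** Stand-in for Lucene's PostingsFormat, declared only so the sketch compiles alone. */
abstract class PostingsFormat {
  private final String name;

  PostingsFormat(String name) {
    this.name = name;
  }

  String getName() {
    return name;
  }
}

/** Stand-in mirroring PerFieldPostingsFormat: it picks a delegate format per field. */
abstract class PerFieldPostingsFormat extends PostingsFormat {
  PerFieldPostingsFormat() {
    super("PerField40");
  }

  abstract PostingsFormat getPostingsFormatForField(String field);
}

/** Hypothetical helper that finds the format actually used for one field. */
final class PostingsFormatResolver {
  static PostingsFormat resolve(PostingsFormat postingsFormat, String field) {
    if (postingsFormat instanceof PerFieldPostingsFormat) {
      // Per-field routing is in use, so ask it which format handles this field.
      return ((PerFieldPostingsFormat) postingsFormat).getPostingsFormatForField(field);
    }
    // Any other format handles every field itself (or routes in its own way).
    return postingsFormat;
  }
}
```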





[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible

2012-05-27 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284153#comment-13284153
 ] 

Renaud Delbru commented on LUCENE-4055:
---

Does this patch allow FreqProxTermsWriterPerField to depend on the 
Codec/Field? I have a use case where I need my own FreqProxTermsWriterPerField 
for certain fields.

I am asking this because I see in your patch the following line:

{code}
final FreqProxTermsWriterPerField fieldWriter = allFields.get(fieldNumber);
{code}

which seems to retrieve a particular FreqProxTermsWriterPerField for each field 
type.




[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible

2012-05-25 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283725#comment-13283725
 ] 

Andrzej Bialecki  commented on LUCENE-4055:
---

+1, this looks very good.

One comment re. SegmentInfoPerCommit. This class is not extensible and contains 
a fixed set of attributes. In LUCENE-3837 this or similar place would be the 
ideal mechanism to carry info about stacked segments, since this information is 
specific to a commit point. Unfortunately, there are no Map<String,String> 
attributes on this level, so I guess for now this type of aux data will have to 
be put in SegmentInfos.userData even though it's not index global but 
segment-specific.




[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible

2012-05-25 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283738#comment-13283738
 ] 

Robert Muir commented on LUCENE-4055:
-

Right, but I think this is correct: the codec should be responsible for 
encode/decode of inverted index segments only (the whole problem here 
originally was trying to have it also look after commits).

So it really shouldn't be customizing things about the commit, as that creates 
a confusing impedance mismatch.

I think things like stacked segments in LUCENE-3837 that need to do things 
other than implement encoding/decoding of a segment should be above the codec 
level, since it's a separate concern: if someone wants to have updatable fields, 
that's unrelated to the integer compression algorithm used or what not.





[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible

2012-05-25 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283755#comment-13283755
 ] 

Andrzej Bialecki  commented on LUCENE-4055:
---

bq. stacked segments in LUCENE-3837 that need to do things other than implement 
encoding/decoding of segment should be above the codec level ..
Certainly, that's why it would make sense to put this extended info in 
SegmentInfoPerCommit and not in any file handled by Codec.




[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible

2012-05-25 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283793#comment-13283793
 ] 

Robert Muir commented on LUCENE-4055:
-

{quote}
My comment was about the lack of easy extensibility of the codec-independent 
per-segment data (SegmentInfoPerCommit - info about stacked data is per-segment 
and per-commit), so LUCENE-3837 will need to use for now the codec-independent 
index-global data (SegmentInfos).
{quote}

Well SegmentInfos is just the list of SegmentInfoPerCommit, so I think in the 
case of LUCENE-3837 we would just place this data alongside the only other 
per-segment-per-commit data: deletes (e.g. we would add something like 
updatesGen and updatesCount or whatever).

Sure, we have to bump the file header and what not, but the file is so simple 
now, in the sense that it's just a list of segment names, the codec to decode 
them, with their deletes, that it wouldn't be a big deal.

And I think per-segment-per-commit data is pretty rare: we only have deletes, 
and in the future updates, but I can't imagine lots of other stuff belonging in 
this category (versus the per-segment metadata in SI which has been pretty 
volatile in the past).





[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible

2012-05-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281105#comment-13281105
 ] 

Robert Muir commented on LUCENE-4055:
-

Just some updates from the work in the branch (scary changes but proceeding 
nicely since Mike jumped in and did a lot of it).
Here's a list of the current progress:

* on disk, the segments_N is reduced to the stuff that actually is per-commit: 
a list of segments and deleted gens/counts, etc.
* per-segment metadata (doc count, diagnostics, etc) that is write-once is 
encoded by the codec, e.g. for 4.0's codec this is in the .si file.
* removed backwards-seeking on segments_N. so appendingcodec still works but 
doesn't need any special hacks.
* flush/merge order is changed so that fieldinfos are written last so codecs 
have a chance to add metadata to it.
* fieldinfo has a codec metadata api that codec components can use, and that 
metadata will be available on reading the segment. This metadata is for the 
codec to use to extend fieldinfo; it's not carried along during merge or 
anything.
* PerFieldPostingsFormat is changed to use the fieldinfo metadata api rather 
than a separate .per file (e.g. it records that the id field uses Pulsing).
* all the hairiness involving files() is really nice now, instead we simply 
track which files were created, and add them to the .si file. Previously
  there was a lot of logic to compute this in a symmetric way at both read and 
write time, and if you had a bug, your punishment was FNFE.
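
The last point, tracking created files instead of recomputing them, can be illustrated with a small stand-alone sketch. CreatedFilesTracker is a hypothetical name and does not reproduce the branch's actual classes; it only shows why the approach removes the read/write symmetry problem: the set is built once, as a side effect of writing, and the read side just loads it.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch: remembers every file name created while a segment is written. */
final class CreatedFilesTracker {
  private final Set<String> created = new LinkedHashSet<>();

  /** Called wherever an output is opened; returns the name for convenience. */
  String createOutput(String fileName) {
    created.add(fileName);
    return fileName;
  }

  /** The names that would be stored with the segment metadata (the .si file in 4.0). */
  Set<String> createdFiles() {
    return Collections.unmodifiableSet(created);
  }
}
```

Because nothing is derived from field flags or codec settings at read time, a mismatch between the two sides, and the resulting FileNotFoundException, cannot arise from this bookkeeping.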

not yet done:
* add metadata api to segmentinfo too, so that codec components can record 
per-segment information that they care about.
* see if we can implement 3.x's shared doc stores support with segmentinfo 
metadata api. This is tricky to do while keeping addIndexes/indexSplitter etc, 
which do sneaky things, working.
* see if we can implement 3.x normGen (separate norms) with segmentinfo 
metadata. while in 3.x lucene this was actually per-commit, since 3.x support
  is read-only we can effectively treat it as per-segment this way.
* rename stuff so that we have a clearer naming for some of these classes.

I'm also probably missing a few other things. In general I'm pretty happy with 
the metadata key-value attributes api versus subclassing. 

I tried to make subclassing work, but subclassing turned really ugly fast and 
made various codec components too tightly-coupled, e.g. 
if someone wants to combine a CompressedStoredFields with a 
PerFieldPostingsFormat and SpecialTermVectors, what would the impls be :). 

So the overly simple Map<String,String> avoids these issues, and hey it's just 
metadata after all so I don't think anything more complex is really needed. 





[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible

2012-05-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276363#comment-13276363
 ] 

Robert Muir commented on LUCENE-4055:
-

Branch location for this issue and LUCENE-4050: 
https://svn.apache.org/repos/asf/lucene/dev/branches/lucene4055






[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible

2012-05-13 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13274412#comment-13274412
 ] 

Andrzej Bialecki  commented on LUCENE-4055:
---

+1. Per commit information could be named CommitInfo (which it really is). I 
like SegmentMetadata - if we left SegmentInfo as a name it would be confusing 
with its current functionality.

Introspection could return Map<String,Object> to avoid converting e.g. numeric 
values back and forth.




[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible

2012-05-13 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13274401#comment-13274401
 ] 

Robert Muir commented on LUCENE-4055:
-

rough plan:
* create branch for this and LUCENE-4050, the work is related really.
* separate per commit information (SegmentList? Commit?) from per-segment 
metadata (SegmentInfo? SegmentMetadata?). the latter is basically si and fi. 
these names can change. maybe fi is still preserved mostly asis.
* clean up the way per-segment metadata is used in such a way that it is 
abstract and minimal to what IndexWriter/MP needs, not full of codec specific 
stuff.
* ensure codecs can privately write any metadata they need (eg hasProx) and that 
it's accessible via all codec apis.
* try to ensure codecs write things they need like hasProx to compute files() 
without iterating over fieldinfos. this would be a bonus, it's currently a 
concern of mine.
* add a basic introspection api (like Map<String,String>) for tools like luke 
to be able to display codec private values.
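
For the introspection item, a tool only needs a generic view of the key-value pairs. The helper below is a hypothetical sketch, not part of Lucene or Luke: it takes whatever string map the codec exposed and renders it in a stable order without knowing what any key means.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for a Luke-like tool; not part of Lucene. */
final class AttributeDump {
  /** Renders codec-private attributes as "key=value" lines, sorted by key. */
  static List<String> render(Map<String, String> attributes) {
    // Copy into a TreeMap so the output order does not depend on the source map.
    Map<String, String> sorted = new TreeMap<>(attributes);
    List<String> lines = new ArrayList<>();
    for (Map.Entry<String, String> entry : sorted.entrySet()) {
      lines.add(entry.getKey() + "=" + entry.getValue());
    }
    return lines;
  }
}
```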


