Re: [jira] [Commented] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

2012-05-30 Thread Andrzej Bialecki

On 30/05/2012 17:09, Robert Muir (JIRA) wrote:

I'm not sure this is true: e.g. if your postings format requires parameters to 
decode the segment, then this enforces that it records said parameters,
e.g. Pulsing records these parameters.

Codec parameters are set at index time; at read time it's your responsibility to be 
able to decode them solely from the index (this enforces that there doesn't need
to be a crazy matching of user configuration at write and read time).


I think what Mark is missing (and what I saw as a limiting factor in 
developing other codecs) is an easier way to customize Codecs based 
on composition of reusable blocks, without necessarily needing a 
separate Codec class implementation.


This could be worked around by having a configurable codec that stores 
its configuration and instantiates necessary reusable blocks, available 
using the SPI mechanism. On writing you could specify this configuration 
as Codec attributes, and they could be written out e.g. to SegmentInfos, 
and on read they would become available from SegmentInfos.attributes.


--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [jira] [Commented] (LUCENE-4069) Segment-level Bloom filters for a 2 x speed up on rare term searches

2012-05-30 Thread Robert Muir

On Wed, May 30, 2012 at 11:43 AM, Andrzej Bialecki a...@getopt.org wrote:
 On 30/05/2012 17:09, Robert Muir (JIRA) wrote:

 I'm not sure this is true: e.g. if your postings format requires
 parameters to decode the segment, then this enforces that it records said
 parameters,
 e.g. Pulsing records these parameters.

 Codec parameters are set at index time; at read time it's your responsibility
 to be able to decode them solely from the index (this enforces that there
 doesn't need to be a crazy matching of user configuration at write and read
 time).


 I think what Mark is missing (and what I saw as a limiting factor in
 developing other codecs) is an easier way to customize Codecs based on
 composition of reusable blocks, without necessarily needing a separate Codec
 class implementation.

 This could be worked around by having a configurable codec that stores its
 configuration and instantiates necessary reusable blocks, available using
 the SPI mechanism. On writing you could specify this configuration as Codec
 attributes, and they could be written out e.g. to SegmentInfos, and on read
 they would become available from SegmentInfos.attributes.


Well, honestly, I think a bug in PerFieldPostingsFormat (LUCENE-4090)
is definitely confusing the situation here.

You should be able to set Pulsing(1) on the id field and Pulsing(2) on
the date field and have everything just work, but I broke that. I think
that's what's causing the most grief.
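
To illustrate the intended behaviour (not the bug itself), here is a small
self-contained Java model. PerFieldSketch and Format are hypothetical names,
not Lucene's PerFieldPostingsFormat or its Pulsing implementation; the sketch
only shows two fields using the same format name with different parameters
and each getting its own instance back.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of per-field format selection; not a Lucene class.
public class PerFieldSketch {

    /** A postings format instance: a name plus the parameter it was built with. */
    static final class Format {
        final String name;
        final int freqCutoff;

        Format(String name, int freqCutoff) {
            this.name = name;
            this.freqCutoff = freqCutoff;
        }

        @Override
        public String toString() {
            return name + "(" + freqCutoff + ")";
        }
    }

    // Keyed by field name, so two instances sharing a format name never collide.
    private final Map<String, Format> byField = new LinkedHashMap<>();

    void set(String field, Format format) {
        byField.put(field, format);
    }

    Format formatFor(String field) {
        Format f = byField.get(field);
        if (f == null) {
            throw new IllegalArgumentException("No format configured for field: " + field);
        }
        return f;
    }

    public static void main(String[] args) {
        PerFieldSketch perField = new PerFieldSketch();
        perField.set("id", new Format("Pulsing", 1));
        perField.set("date", new Format("Pulsing", 2));
        System.out.println(perField.formatFor("id") + " " + perField.formatFor("date"));
    }
}
```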

-- 
lucidimagination.com
