A couple of thoughts...
- should this be specific to highlighting? (if not, the name should change)
- compression options make sense for both text and string fields...
perhaps it should just be added there.
- if you store term vectors for longer fields, shouldn't you just
store them for all fields? (The longer ones will presumably take up the
bulk of the index anyway.)
Regarding term vectors: like some other field properties, they are
per-field and not per-field-instance (so you can't turn them on for some
instances and off for others). On document retrieval, I think one would detect
that term vectors were stored, but one wouldn't get back any terms (I
haven't tried this though). I doubt the highlighter handles this
case.
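To make the per-field point concrete, here is a toy model. The class and method names are hypothetical, and this is not Lucene's own FieldInfos code. It keeps the "term vectors stored" flag once per field name and ORs it across documents, while actual vectors exist only for the instances indexed with them. It illustrates the suspected behaviour described above (flag set, no terms returned) and has not been checked against a real index.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical, not Lucene code): the "term vectors stored" flag is a
// property of the field name, while vectors exist only for the document/field
// instances that were indexed with them.
final class PerFieldTermVectorModel {
  private final Map<String, Boolean> vectorFlag = new HashMap<>();
  private final Map<String, List<String>> vectors = new HashMap<>(); // key: "docId/field"

  void addField(int docId, String field, List<String> terms, boolean storeVector) {
    // once any instance of the field stores vectors, the per-field flag stays on
    vectorFlag.merge(field, storeVector, Boolean::logicalOr);
    if (storeVector) {
      vectors.put(docId + "/" + field, terms);
    }
  }

  boolean fieldHasVectors(String field) {
    return vectorFlag.getOrDefault(field, false);
  }

  // null means no terms come back for this document, even if the field flag is set
  List<String> getTermVector(int docId, String field) {
    return vectors.get(docId + "/" + field);
  }

  public static void main(String[] args) {
    PerFieldTermVectorModel m = new PerFieldTermVectorModel();
    m.addField(0, "body", List.of("short"), false);                       // short value: no vectors
    m.addField(1, "body", List.of("a", "much", "longer", "text"), true);  // long value: vectors
    System.out.println(m.fieldHasVectors("body"));  // true
    System.out.println(m.getTermVector(0, "body")); // null
    System.out.println(m.getTermVector(1, "body")); // [a, much, longer, text]
  }
}
```

Under this model, a highlighter that relies only on the per-field flag would still have to cope with a missing vector for document 0.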
-Yonik
On 8/31/06, Mike Klaas <[EMAIL PROTECTED]> wrote:
Hi,
I was thinking of enabling the compressed=True field option (which
currently has no effect), as compression is important for highlighting
large fields (since they must be stored).
However, rather than exposing a Lucene implementation detail, I
decided to create a FieldType which dynamically chooses to compress
and/or term-vector a field depending on the field length (configurable
in the field type).
Any objections to committing this?
-Mike
import java.util.Map;

import org.apache.lucene.document.Field;
// SolrParams/MapSolrParams package as of 2006-era Solr
// (later moved to org.apache.solr.common.params)
import org.apache.solr.request.MapSolrParams;
import org.apache.solr.request.SolrParams;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.schema.TextField;

public class HighlitTextField extends TextField {
  /* if field size (in characters) is at least this threshold, the field
     will be stored compressed */
  public static final int DEFAULT_COMPRESS_THRESHOLD = 200;
  /* if field size (in characters) is at least this threshold, the field
     will have term vector data stored */
  public static final int DEFAULT_TERMVEC_THRESHOLD = 500;

  int compressThreshold;
  int termVecThreshold;

  private static final String CT = "compressThreshold";
  private static final String TV = "termVecThreshold";

  protected void init(IndexSchema schema, Map<String,String> args) {
    SolrParams p = new MapSolrParams(args);
    compressThreshold = p.getInt(CT, DEFAULT_COMPRESS_THRESHOLD);
    termVecThreshold = p.getInt(TV, DEFAULT_TERMVEC_THRESHOLD);
    // consume our own params so they are not passed on to TextField
    for (String prop : new String[]{CT, TV})
      args.remove(prop);
    super.init(schema, args);
  }

  /* Helpers for field construction */
  protected Field.TermVector getFieldTermVec(SchemaField field,
                                             String internalVal) {
    /* store all termvec data if field length reaches threshold */
    return internalVal.length() >= termVecThreshold ?
        Field.TermVector.WITH_POSITIONS_OFFSETS : Field.TermVector.NO;
  }

  protected Field.Store getFieldStore(SchemaField field,
                                      String internalVal) {
    /* compress field if length reaches threshold */
    return internalVal.length() >= compressThreshold ?
        Field.Store.COMPRESS : Field.Store.YES;
  }
}
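The threshold decisions in the two helpers can be exercised without Lucene or Solr. The sketch below uses a hypothetical class, ThresholdSketch, and local enums that stand in for Lucene's Field.Store and Field.TermVector. It mirrors the `>=` comparisons, so a value whose length equals a threshold already triggers compression or term vectors.

```java
// Hypothetical standalone mirror of the HighlitTextField threshold checks.
// Local enums stand in for Lucene's Field.Store and Field.TermVector.
final class ThresholdSketch {
  enum Store { YES, COMPRESS }
  enum TermVector { NO, WITH_POSITIONS_OFFSETS }

  private final int compressThreshold;
  private final int termVecThreshold;

  ThresholdSketch(int compressThreshold, int termVecThreshold) {
    this.compressThreshold = compressThreshold;
    this.termVecThreshold = termVecThreshold;
  }

  // compress once the value is at least compressThreshold characters long
  Store storeFor(String val) {
    return val.length() >= compressThreshold ? Store.COMPRESS : Store.YES;
  }

  // store full term vectors once the value is at least termVecThreshold characters long
  TermVector termVecFor(String val) {
    return val.length() >= termVecThreshold ? TermVector.WITH_POSITIONS_OFFSETS : TermVector.NO;
  }

  public static void main(String[] args) {
    ThresholdSketch t = new ThresholdSketch(200, 500); // the defaults from the post
    for (int len : new int[] {199, 200, 499, 500}) {
      String val = "x".repeat(len); // String.repeat needs Java 11+
      System.out.println(len + " " + t.storeFor(val) + " " + t.termVecFor(val));
    }
  }
}
```

Running main prints `200 COMPRESS NO` for a 200-character value and `500 COMPRESS WITH_POSITIONS_OFFSETS` for a 500-character one. Both thresholds are therefore inclusive.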