Hi David,

On 03/29/2010 at 3:36 PM, David Smiley (@MITRE.org) wrote:
> I'm not sure what to make of "or index using a heterogeneous field
> schema, grouping the different doc type instances with a unique key
> (the one) to form a composite doc"

Lucene is schema-free - you can mix and match different document types in a 
single index.  You could emulate this in Solr by merging the two document types 
into a single schema and leaving blank the fields that don't apply to a given 
instance.  E.g.:

Address-doc-type: 
        Field: Unique-key
        Field: Street
        Field: City
        ...
        
Everything-else-doc-type:
        Field: Unique-key
        Field: Blob-o'-text
        ...

Doc1: Unique-key: 1; Blob-o'-text: blobbedy-blah-blob; ...
Doc2: Unique-key: 1; Street: 12 Main St; City: Somewheresville; ...
Doc3: Unique-key: 1; Street: 243 13th St; City: Bogdownton; ...
...
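
To illustrate the grouping idea, here is a rough sketch in plain Python (deliberately not Lucene or Solr API). The field names mirror the example above; `group_by_key` and `matching_keys` are hypothetical helpers invented for this illustration. It assumes each address lives in its own document, so a query on Street and City can only match when both values come from the same address.

```python
from collections import defaultdict

# Illustrative documents mirroring the example above.
docs = [
    {"Unique-key": "1", "Blob-o'-text": "blobbedy-blah-blob"},
    {"Unique-key": "1", "Street": "12 Main St", "City": "Somewheresville"},
    {"Unique-key": "1", "Street": "243 13th St", "City": "Bogdownton"},
]

def group_by_key(docs, key_field="Unique-key"):
    """Collect all documents sharing a key into one composite entry."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[key_field]].append(doc)
    return dict(groups)

def matching_keys(docs, predicate, key_field="Unique-key"):
    """Return the keys of composites with at least one matching document."""
    return {doc[key_field] for doc in docs if predicate(doc)}

composite = group_by_key(docs)

# Both conditions are tested against a single address document.
same_address = matching_keys(
    docs,
    lambda d: d.get("Street") == "12 Main St"
    and d.get("City") == "Somewheresville",
)

# Street from one address, City from another: no single document matches.
cross_address = matching_keys(
    docs,
    lambda d: d.get("Street") == "12 Main St"
    and d.get("City") == "Bogdownton",
)
```

Because every field stays separate, each one keeps its own type, which is what would leave per-field range queries possible.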

> I could use the scheme you mention provided with the spanNear query but
> it conflates different fields into one indexed field which will mess
> with the scoring and make queries like range queries if there are dates
> involved next to impossible.

I agree, dimensional reduction can be an issue, though I'm sure there are use 
cases where the attendant scoring distortion would be acceptable, e.g. 
non-scoring filters.  (Stuffing a variable number of addresses into a single 
document will also "mess with the scoring" unless you turn off norms, which is 
of course another form of scoring-messing.)
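
To make the norms point concrete: as far as I recall, Lucene's default similarity computes a field's length norm as roughly 1/sqrt(number of terms in the field). The sketch below is a simplification that ignores boosts and the lossy one-byte norm encoding; `length_norm` and the figure of four terms per address are assumptions chosen for illustration.

```python
import math

def length_norm(num_terms):
    """Simplified length normalization: 1 / sqrt(number of terms)."""
    return 1.0 / math.sqrt(num_terms)

TERMS_PER_ADDRESS = 4  # assumed average, for illustration only

one_address = length_norm(1 * TERMS_PER_ADDRESS)
four_addresses = length_norm(4 * TERMS_PER_ADDRESS)
```

Under these assumptions, a document carrying four addresses in one field gets half the norm of a document carrying one, so the same matching term scores lower purely because of the extra addresses.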

I've seen a couple of different mentions of private SpanRangeQuery 
implementations on the mailing lists, so range queries likely wouldn't be a 
problem for long, should it become a general issue.

> This "solution" is really a hack workaround to a limitation in
> Lucene/Solr.  I was hoping to start a conversation to a more
> truer resolution to this problem rather than these workarounds
> which aren't always satisfactory.

Limitation: Solr/Lucene is not a database.  

"Solutions":
        1. Hack workaround
        2. Rewrite Solr/Lucene to be a database
        3. ? (fill in "more truer resolution" here)

Good luck,
Steve
