Hi David,

On 03/29/2010 at 3:36 PM, David Smiley (@MITRE.org) wrote:
> I'm not sure what to make of "or index using a heterogeneous field
> schema, grouping the different doc type instances with a unique key
> (the one) to form a composite doc"

Lucene is schema-free - you can mix and match different document types in a single index. You could emulate this in Solr by merging the two document types and leaving blank the parts that are inapplicable to a given instance. E.g.:

Address-doc-type:
    Field: Unique-key
    Field: Street
    Field: City
    ...

Everything-else-doc-type:
    Field: Unique-key
    Field: Blob-o'-text
    ...

Doc1: Unique-key: 1; Blob-o'-text: blobbedy-blah-blob; ...
Doc2: Unique-key: 1; Street: 12 Main St; City: Somewheresville; ...
Doc3: Unique-key: 1; Street: 243 13th St; City: Bogdownton; ...
....

> I could use the scheme you mention provided with the spanNear query but
> it conflates different fields into one indexed field which will mess
> with the scoring and make queries like range queries if there are dates
> involved next to impossible.

I agree, dimensional reduction can be an issue, though I'm sure there are use cases where the attendant scoring distortion would be acceptable, e.g. non-scoring filters. (Stuffing a variable number of addresses into a single document will also "mess with the scoring" unless you turn off norms, which is of course another form of scoring-messing.)

I've seen a couple of different mentions of private SpanRangeQuery implementations on the mailing lists, so range queries likely wouldn't be a problem for long, should it become a general issue.

> This "solution" is really a hack workaround to a limitation in
> Lucene/Solr. I was hoping to start a conversation to a more
> truer resolution to this problem rather than these workarounds
> which aren't always satisfactory.

Limitation: Solr/Lucene is not a database.

"Solutions":

1. Hack workaround
2. Rewrite Solr/Lucene to be a database
3. ? (fill in "more truer resolution" here)

Good luck,
Steve
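
P.S. In case the merged-schema idea above is unclear, here is a rough sketch in plain Python. It makes no Lucene or Solr API calls; the field names and the helpers `matches` and `composite_keys` are made up for illustration. The point is only that each address lives in its own doc, so a street from one address cannot pair with a city from another, while the shared unique key ties the docs back into one composite.

```python
from collections import defaultdict

# Hypothetical docs mirroring Doc1..Doc3 above; inapplicable fields are simply absent.
docs = [
    {"unique_key": "1", "blob_o_text": "blobbedy-blah-blob"},
    {"unique_key": "1", "street": "12 Main St", "city": "Somewheresville"},
    {"unique_key": "1", "street": "243 13th St", "city": "Bogdownton"},
]


def matches(doc, criteria):
    """True if this single doc has every field in criteria with the given value."""
    return all(doc.get(field) == value for field, value in criteria.items())


def composite_keys(docs, *criteria_sets):
    """Return the unique keys whose doc group satisfies every criteria set.

    Each criteria set must be satisfied by ONE doc in the group, which is
    what keeps fields from different addresses from being mixed together.
    """
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["unique_key"]].append(doc)
    return sorted(
        key
        for key, group in groups.items()
        if all(any(matches(d, c) for d in group) for c in criteria_sets)
    )
```

With these docs, asking for street "12 Main St" and city "Somewheresville" in one criteria set finds key "1", while "12 Main St" paired with "Bogdownton" finds nothing, since no single address doc has both. Passing two criteria sets (say, one for an address and one for the blob text) requires each to match some doc under the same key.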