On Dec 10, 2009, at 1:44 PM, Grant Ingersoll wrote:

>
> On Dec 10, 2009, at 1:26 PM, Yonik Seeley wrote:
>
>> I wouldn't necessarily link FieldType.isPolyField() to the idea of a
>> poly value source... they are two different things.
>
> Yep. The word Poly is overloaded here to mean multiple ValueSources, but it
> isn't necessarily tied to there being a poly field, even though a PolyField
> likely would create a PolyValueSource.
>
>> For example, if NumericField had not already been written in Lucene, I
>> would have perhaps just indexed both the lat and lon into the same
>> lucene field. That part can be more of an implementation detail, and
>> does not reflect the semantics of the field (the fact that it contains
>> both a lat and lon).
>
> Maybe, there are tradeoffs though.
>
> Let's get concrete and look at the VectorDistanceFunction (dist()). It can
> currently take in an even number of ValueSource instances, and the distance
> method essentially boils down to (for Euclidean Distance):
>   for (int i = 0; i < docValues1.length; i++) {
>     double v = docValues1[i].doubleVal(doc) - docValues2[i].doubleVal(doc);
>     result += v * v;
>   }
>   result = Math.sqrt(result);
>
> For example, a call to this might be:
>   dist(power = 2, x1, y1, x2, y2) - where xi, yi are ValueSources.
> (Note: "power = 2" is just me showing what the first parameter is, so that
> no one wonders why there is this extra number in there.)
>
> Now, assuming PointType fields named point1 and point2 (along with the
> others above), one could have:
>
>   dist(2, point1, point2) //distance between two PointTypes
>   dist(2, point1, x1, y1) //distance between a PointType and a user defined point
>
D'oh, I see another way of doing this: the distance functions only work with points. The second case above then becomes:

  dist(2, point1, point(x1, y1));

-Grant

> While I think this can be coded up in the lat/lon case (i.e. two values) I
> think it gets hairy when you consider a point in n-dim. space.
>
> My inclination is to fudge on this and do something in ValueSourceParser for
> each of the functions that can deal w/ poly fields (my gut says most can't)
> like:
>   addParser("dist", new ValueSourceParser() {
>     public ValueSource parse(FunctionQParser fp) throws ParseException {
>       float power = fp.parseFloat();
>       List<ValueSource> sources = fp.parseValueSourceList();
>       // Expand any PolyValueSource into its component sources. This is
>       // done unconditionally, since dist(2, point1, point2) has an even
>       // count but still needs expanding.
>       List<ValueSource> newSources = new ArrayList<ValueSource>();
>       for (ValueSource source : sources) {
>         if (source instanceof PolyValueSource) {
>           newSources.addAll(((PolyValueSource) source).getValueSources());
>         } else {
>           newSources.add(source); // just like the old one
>         }
>       }
>       sources = newSources;
>       // Do the even check on the expanded list
>       if (sources.size() % 2 != 0) {
>         throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
>             "Illegal number of sources. There must be an even number of sources");
>       }
>       int dim = sources.size() / 2;
>       List<ValueSource> sources1 = new ArrayList<ValueSource>(dim);
>       List<ValueSource> sources2 = new ArrayList<ValueSource>(dim);
>       splitSources(dim, sources, sources1, sources2);
>       return new VectorDistanceFunction(power, sources1, sources2);
>     }
>   });
>
> Of course, this requires documentation, etc. for others to be able to do the
> same for their custom Functions, but that is surmountable.
>
> -Grant
>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
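
P.S. To make the expansion idea concrete outside of Solr, here is a minimal, self-contained sketch. Source and PolySource are hypothetical stand-ins for ValueSource and the proposed PolyValueSource, not actual Solr classes. It also assumes that, after flattening, the first half of the list is one point and the second half is the other, which is only a guess at what splitSources would do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PolyDistSketch {

    /** Hypothetical stand-in for a single-valued ValueSource. */
    interface Source {
        double doubleVal(int doc);
    }

    /** Hypothetical stand-in for a PolyValueSource: it only exposes its components. */
    static final class PolySource implements Source {
        private final List<Source> components;

        PolySource(Source... components) {
            this.components = Arrays.asList(components);
        }

        List<Source> getSources() {
            return components;
        }

        @Override
        public double doubleVal(int doc) {
            throw new UnsupportedOperationException("a poly source has no single value");
        }
    }

    /** Replace every poly source with its components, keeping argument order. */
    static List<Source> flatten(List<Source> sources) {
        List<Source> flat = new ArrayList<>();
        for (Source s : sources) {
            if (s instanceof PolySource) {
                // Nested poly sources are expanded too.
                flat.addAll(flatten(((PolySource) s).getSources()));
            } else {
                flat.add(s);
            }
        }
        return flat;
    }

    /** Euclidean distance for one document over the flattened sources. */
    static double euclidean(List<Source> sources, int doc) {
        List<Source> flat = flatten(sources);
        if (flat.size() % 2 != 0) {
            throw new IllegalArgumentException(
                "There must be an even number of sources, got " + flat.size());
        }
        int dim = flat.size() / 2;
        double sum = 0.0;
        for (int i = 0; i < dim; i++) {
            // Assumption: first half is point 1, second half is point 2.
            double v = flat.get(i).doubleVal(doc) - flat.get(dim + i).doubleVal(doc);
            sum += v * v;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        PolySource point1 = new PolySource(doc -> 3.0, doc -> 4.0);
        Source x1 = doc -> 0.0;
        Source y1 = doc -> 0.0;
        // Mixed call, in the spirit of dist(2, point1, x1, y1)
        System.out.println(euclidean(Arrays.asList(point1, x1, y1), 0)); // prints 5.0
    }
}
```

Because flattening happens before the even check, the same code path covers plain sources, two points, and a point mixed with loose coordinates. An odd count after flattening is rejected, as in the parser above.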