To eliminate the possibility of errors, you need to buffer the query as
indicated in the wiki.  If you don't and you use a super-small maxDistErr
as you tell me you are doing, then you are merely making the probability
of hitting an error small (perhaps even very very small), but not
nonexistent.  I wish there was a field type that wrapped all this up so
that users wouldn't have to concern themselves with these tricky details.
I created an issue to track it:

~ David
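
For anyone following along, here is a rough client-side sketch of what "buffering" a query rectangle could look like before it is sent to Solr. It is an illustration only: the function names, the 0.5 pad, the sample rectangle for the value 10234, and the clamping to the world bounds are assumptions made for this example. The right pad depends on your data and on maxDistErr, so check the wiki before settling on one.

```python
def buffer_rect(rect, pad, world_min, world_max):
    """Grow (min_x, min_y, max_x, max_y) by pad on every side, clamped to the world bounds."""
    min_x, min_y, max_x, max_y = rect
    return (
        max(min_x - pad, world_min),
        max(min_y - pad, world_min),
        min(max_x + pad, world_max),
        min(max_y + pad, world_max),
    )


def fmt(value):
    """Plain decimal text with no exponent, e.g. 10234.5 or -100000000000."""
    return f"{value:.6f}".rstrip("0").rstrip(".")


def intersects_clause(rect):
    """Render the rectangle in the Intersects(minX minY maxX maxY) form."""
    return "Intersects(" + " ".join(fmt(v) for v in rect) + ")"


# worldBounds as posted in the schema further down the thread.
WORLD_MIN, WORLD_MAX = -100000000000, 100000000000

# Hypothetical un-buffered rectangle for the value 10234.
exact = (WORLD_MIN, 10234, 10234, WORLD_MAX)
padded = buffer_rect(exact, 0.5, WORLD_MIN, WORLD_MAX)
print(intersects_clause(padded))
# Intersects(-100000000000 10233.5 10234.5 100000000000)
```

With whole-number coordinates, any pad below 1 keeps the buffered edge from reaching the neighbouring integer value.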

On 7/24/13 9:26 AM, "Kevin Stone" <> wrote:

>I tried reducing the maxDistErr to "0.01", just to test making it smaller.
>I got maxLevels down to 45, and slightly better query times (Indexing time
>was about the same). However, my queries are not accurate anymore. I need
>to pad by 2 or 3 whole numbers to get a hit now, which won't work in real
>use. I can play with the number a bit more, but I didn't see anything
>wrong when I had it at "0.000000009". I do know about using a small
>decimal value to pad around my coordinates, and I'll probably do that for
>the real implementation, but for testing, whole numbers were working for
>all my edge cases.
>On 7/23/13 10:45 PM, "Smiley, David W." <> wrote:
>>Those are some good query response times but they could be better. You've
>>configured the field type sub-optimally.  Look again at
>> and note in
>>maxDistErr.  You've left it at the value that comes pre-configured with
>>Solr, 0.000000009, which is ~1 meter measured in degrees, and this value
>>makes no sense when your numeric range is in whole numbers.  I suspect you
>>inherited this value from Hoss's slides.  **Instead use 1.** (as shown on
>>the wiki). This affects performance in a big way since you've configured
>>the prefixTree to hold 2.22e18 values (calculated via (max-min) /
>>maxDistErr) as opposed to "just" 2e10.  Your log shows maxLevels is 50 for the
>>quad tree.  The comments in QuadPrefixTree (and I put them there once)
>>indicate maxLevels of 50 is about as much as is supported.  But again, I'm
>>not certain what the limit really is without validating.  Hopefully you
>>can stay clear of 50.  To do some tests, try querying just on the edge on
>>either side of an indexed value to make sure you match the point and then
>>don't match the indexed point as you would expect based on the
>>instructions.  Also, be sure to read more of the details on "Search" on
>>this wiki page in which you are advised to buffer the query shape
>>slightly; you didn't do this in your examples below.  This is all a bit of
>>a hack when using a field that internally is using floating point instead
>>of fixed precision.
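
To make the arithmetic above concrete, here is a back-of-the-envelope model of how the grid size and maxLevels follow from the world bounds and maxDistErr. It is a simplification and not QuadPrefixTree's actual code: the function names are made up, and the only assumptions are that each quad-tree level halves the cell width and that the level count is capped at 50, as mentioned in this thread. Note that the 2.22e18 and 2e10 figures correspond to a -10 billion to +10 billion span, while the posted worldBounds is ten times wider.

```python
def cells_per_axis(world_min, world_max, max_dist_err):
    """(max - min) / maxDistErr: how many finest-grained cells span one axis."""
    return (world_max - world_min) / max_dist_err


def quad_levels(world_min, world_max, max_dist_err, cap=50):
    """Smallest level whose cell width drops below max_dist_err, capped at `cap`."""
    width = world_max - world_min
    level = 1
    while level < cap and width / 2 ** level >= max_dist_err:
        level += 1
    return level


TEN_B = 10_000_000_000        # span of the data described in the thread
HUNDRED_B = 100_000_000_000   # worldBounds in the posted schema

print(f"{cells_per_axis(-TEN_B, TEN_B, 0.000000009):.3g}")  # 2.22e+18
print(f"{cells_per_axis(-TEN_B, TEN_B, 1):.3g}")            # 2e+10

print(quad_levels(-HUNDRED_B, HUNDRED_B, 0.000000009))  # 50 (hits the cap)
print(quad_levels(-HUNDRED_B, HUNDRED_B, 0.01))         # 45
print(quad_levels(-HUNDRED_B, HUNDRED_B, 1))            # 38
```

Under this model, the 50 in the log and the 45 reported after switching to 0.01 both fall out of the same formula.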
>>~ David Smiley
>>On 7/23/13 9:32 PM, "Kevin Stone" <> wrote:
>>>Sorry for the late response. I needed to find the time to load a lot of
>>>extra data (closer to what we're anticipating). I have an index with close
>>>to 220,000 documents, each with at least two coordinate regions anywhere
>>>between -10 billion and +10 billion, but could potentially have up to a
>>>half dozen regions in one document. The reason for the negatives is that
>>>you can read a chromosome either backwards or forwards, so many
>>>coordinates can be negative.
>>>Here is the schema field definition:
>>>        <fieldType name="geneticLocation"
>>>         class="solr.SpatialRecursivePrefixTreeFieldType"
>>>         multiValued="true"
>>>         geo="false"
>>>         worldBounds="-100000000000 -100000000000 100000000000 100000000000"
>>>         distErrPct="0"
>>>         maxDistErr="0.000000009"
>>>         units="degrees"
>>>         />
>>>Here is the first query in the log:
>>>rrPct=0, geo=false, multiValued=true, worldBounds=-100000000000
>>>-100000000000 100000000000 100000000000, maxDistErr=0.000000009,
>>>units=degrees}} strat:
>>>evels:50,ctx:SpatialContext{geo=false, calculator=CartesianDistCalc,
>>>maxLevels: 50
>>>Jul 23, 2013 9:11:45 PM org.apache.solr.core.SolrCore execute
>>>INFO: [testIndex] webapp=/solr path=/select
>>>)"&rows=100} hits=81112 status=0 QTime=122
>>>Here are some other queries to give different timings (the one above
>>>brings back quite a lot):
>>>INFO: [testIndex] webapp=/solr path=/select
>>>00000)"&rows=100} hits=6031 status=0 QTime=10
>>>Jul 23, 2013 9:13:43 PM org.apache.solr.core.SolrCore execute
>>>INFO: [testIndex] webapp=/solr path=/select
>>>s=100} hits=500 status=0 QTime=15
>>>Jul 23, 2013 9:14:14 PM org.apache.solr.core.SolrCore execute
>>>INFO: [testIndex] webapp=/solr path=/select
>>>"&rows=100} hits=4 status=0 QTime=17
>>>INFO: [testIndex] webapp=/solr path=/select
>>>057963+0)"&rows=100} hits=661 status=0 QTime=8
>>>The query times look pretty fast to me. Certainly I'm pretty impressed.
>>>Our other backup solutions (involving SQL) likely wouldn't even touch this
>>>in terms of speed.
>>>We will be testing this more in depth in the coming month. I am sort of
>>>jumping ahead of our team to research possible solutions, since this is
>>>something that worried us. Looks like it might work!
>>>On 7/23/13 1:47 PM, "David Smiley (" <>
>>>>Oh cool!  I'm glad it at least seemed to work.  Can you post your
>>>>configuration of the field type and report from Solr's logs what the
>>>>"maxLevels" is used for this field, which is logged the first time you use
>>>>the field type?
>>>>Maybe there isn't a limit under 10B after all.  Some quick'n'dirty
>>>>calculations I just did indicate there shouldn't be a problem but real
>>>>usage will be a better proof.  Indexing probably won't be terribly slow but
>>>>queries could get pretty slow if the amount of indexed data is really big.
>>>>I'd love to hear how it works out for you.  Your use-case would benefit a
>>>>lot from an improved prefix tree implementation.
>>>>I don't gather how a 3rd dimension would play into this.  Support for
>>>>multi-dimensional spatial is on the drawing board.
>>>>~ David
>>>>Kevin Stone wrote
>>>>> What are the dangers of trying to use a range of 10 billion? Simply a
>>>>> slower index time? Or will I get inaccurate results?
>>>>> I have tried it on a very small sample of documents, and it seemed to
>>>>> work. I could spend some time this week trying to get a more robust (and
>>>>> accurate) dataset loaded to play around with. The reason for the 10
>>>>> billion is to support being able to query for a region on a
>>>>> A user might want to know what genes overlap a point on a specific
>>>>> chromosome. Unless I can use 3 dimensional coordinates (which gave an
>>>>> error when I tried it), I'll need to multiply the coordinates by some
>>>>> offset for each chromosome to be able to normalise the data (at both index
>>>>> and query time). The largest chromosome (chr 1) has almost
>>>>> base pairs. I could probably squeeze the rest a bit smaller, but I'd
>>>>> rather use one size for all chromosomes, since we have more than just
>>>>> human data to deal with. It would get quite messy otherwise.
>>>>> On 7/22/13 11:50 AM, "David Smiley (" <DSMILEY@> wrote:
>>>>>>Like Hoss said, you're going to have to solve this using
>>>>>>Using PointType is *not* going to work because your durations are
>>>>>>multi-valued per document.
>>>>>>It would be useful to create a custom field type that wraps the
>>>>>>outlined on the wiki to make it easier to use without requiring the
>>>>>>think spatially.
>>>>>>You mentioned that these numeric ranges extend upwards of 10 billion.
>>>>>>Unfortunately, the current "prefix tree" implementation under the hood of
>>>>>>non-geodetic spatial, the QuadTree, is unlikely to scale to numbers that
>>>>>>big.  I don't know where the boundary is, but I doubt 10B.  You could try
>>>>>>and see what happens.  I'm working (very slowly on very little spare time)
>>>>>>on improving the PrefixTree implementations to scale to such large numbers.
>>>>>>I hope something will be available this fall.
>>>>>>~ David Smiley
>>>>>>Kevin Stone wrote
>>>>>>> I have a particular use case that I think might require a custom field
>>>>>>> type, however I am having trouble getting the plugin to work.
>>>>>>> My use case has to do with genetics data, and we are running into
>>>>>>> situations where we need to be able to query multiple regions of a
>>>>>>> chromosome (or gene, or other object types). All that really boils down to
>>>>>>> is being able to give a number, e.g. 10234, and return documents with
>>>>>>> regions containing the number. So you'd have a document with a list like
>>>>>>> ["10000:16090","400:8000","40123:43564"], and it should come back because
>>>>>>> 10234 falls within "10000:16090". If there is a better or easier way to
>>>>>>> do this please speak up. I'd rather not have to use a "join" on another
>>>>>>> index, because 1) it's more complex to set up, and 2) we might need to
>>>>>>> join against something else and you can only do one join at a time.
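
One way to picture the spatial approach to this "which regions contain my number" question is sketched below in plain Python, with no Solr calls. It assumes each region start:end is treated as a 2D point (x = start, y = end), so that "region contains q" becomes "point lies in the rectangle x <= q, y >= q". The function names and the world bounds used here are hypothetical and only serve the example.

```python
def parse_region(text):
    """Turn "10000:16090" into the point (start, end)."""
    start, end = text.split(":")
    return int(start), int(end)


def query_rect(q, world_min, world_max):
    """Rectangle (min_x, min_y, max_x, max_y) of all points with start <= q <= end."""
    return (world_min, q, q, world_max)


def point_in_rect(point, rect):
    """True if the point lies inside the rectangle, edges included."""
    x, y = point
    min_x, min_y, max_x, max_y = rect
    return min_x <= x <= max_x and min_y <= y <= max_y


regions = ["10000:16090", "400:8000", "40123:43564"]
rect = query_rect(10234, -10_000_000_000, 10_000_000_000)
hits = [r for r in regions if point_in_rect(parse_region(r), rect)]
print(hits)  # ['10000:16090']
```

Because every region of a document becomes its own point, this shape of query copes with multi-valued regions without needing a join.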
>>>>>>> Anyway... I tried creating a field type similar to a PointType just to see
>>>>>>> if I could get one working. I added the following jars to get it to
>>>>>>> compile:
>>>>>>> I am running solr 4.0.0 on jetty, and put my jar file in a
>>>>>>> folder, and specified it in my solr.xml (I have multiple cores).
>>>>>>> After starting up solr, I got the line that it picked up the jar:
>>>>>>> INFO: Adding 'file:/blah/blah/lib/CustomPlugins.jar' to classloader
>>>>>>> But I get this error about it not being able to find the
>>>>>>> AbstractSubTypeFieldType class.
>>>>>>> Here is the first bit of the trace:
>>>>>>> SEVERE: null:java.lang.NoClassDefFoundError:
>>>>>>> org/apache/solr/schema/AbstractSubTypeFieldType
>>>>>>> at java.lang.ClassLoader.defineClass1(Native Method)
>>>>>>> at java.lang.ClassLoader.defineClass(
>>>>>>> at
>>>>>>> at
>>>>>>> at$100(
>>>>>>> at$
>>>>>>> at$
>>>>>>> ...etc...
>>>>>>> Any hints as to what I did wrong? I can provide source code, or a
>>>>>>> stack trace, config settings, etc.
>>>>>>> Also, I did try to unpack the solr.war, stick my jar in
>>>>>>> repack. However, when I did that, I get a NoClassDefFoundError for
>>>>>>> plugin itself.
>>>>>>> Thanks,
>>>>>>> Kevin
>>>>>>> The information in this email, including attachments, may be confidential
>>>>>>> and is intended solely for the addressee(s). If you believe you received
>>>>>>> this email by mistake, please notify the sender by return email as soon as
>>>>>>> possible.
>>>>>> Author:
>>>>>>View this message in context:
>>>>>>Sent from the Solr - User mailing list archive at
