To eliminate the possibility of errors, you need to buffer the query as indicated on the wiki. If you don't, and you use a super-small maxDistErr as you tell me you are doing, then you are merely making the probability of hitting an error small (perhaps even very, very small), but not zero. I wish there were a field type that wrapped all this up so that users wouldn't have to concern themselves with these tricky details. I created an issue to track it: https://issues.apache.org/jira/browse/SOLR-5072
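
As a rough illustration of what "buffering the query" can look like, here is a minimal Python sketch that builds an Intersects() rectangle and pads its two query-dependent edges. It assumes ranges are indexed as points with x=start and y=end, as the queries quoted later in this thread suggest. The function name, the parameter names, and the default padding of 0.5 are hypothetical; the padding value to actually use should be taken from the wiki page, not from this sketch.

```python
from decimal import Decimal


def buffered_overlap_query(field, q_start, q_end, world_min, world_max, pad="0.5"):
    """Build a query string matching indexed ranges that overlap [q_start, q_end].

    Assumption: each range is indexed as a point (x=start, y=end), so a range
    overlaps the query when start <= q_end and end >= q_start.
    The two edges that depend on the query are pushed outward by `pad`
    (a hypothetical default) so grid rounding is less likely to drop a hit.
    """
    pad = Decimal(str(pad))
    min_x = Decimal(world_min)       # any start is acceptable on the low side
    max_x = Decimal(q_end) + pad     # start <= q_end, padded outward
    min_y = Decimal(q_start) - pad   # end >= q_start, padded outward
    max_y = Decimal(world_max)       # any end is acceptable on the high side
    # Decimal keeps large coordinates out of scientific notation.
    return f'{field}:"Intersects({min_x} {min_y} {max_x} {max_y})"'


if __name__ == "__main__":
    print(buffered_overlap_query("humanCoordinate", 60330, 6033041244, 0, 10000000000))
```

With whole-number coordinates, a padding smaller than 1 keeps the neighbouring integers out of the result while still moving the rectangle edge off the indexed value itself.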
~ David

On 7/24/13 9:26 AM, "Kevin Stone" <kevin.st...@jax.org> wrote:

>I tried reducing the maxDistErr to "0.01", just to test making it smaller.
>I got maxLevels down to 45, and slightly better query times (Indexing time
>was about the same). However, my queries are not accurate anymore. I need
>to pad by 2 or 3 whole numbers to get a hit now, which won't work in real
>use. I can play with the number a bit more, but I didn't see anything
>wrong when I had it at "0.000000009". I do know about using a small
>decimal value to pad around my coordinates, and I'll probably do that for
>the real implementation, but for testing, whole numbers were working for
>all my edge cases.
>
>-Kevin
>
>On 7/23/13 10:45 PM, "Smiley, David W." <dsmi...@mitre.org> wrote:
>
>>Kevin,
>>
>>Those are some good query response times but they could be better. You've
>>configured the field type sub-optimally. Look again at
>>http://wiki.apache.org/solr/SpatialForTimeDurations and note in particular
>>maxDistErr. You've left it at the value that comes pre-configured with
>>Solr, 0.000000009, which is ~1 meter measured in degrees, and this value
>>makes no sense when your numeric range is in whole numbers. I suspect you
>>inherited this value from Hoss's slides. **Instead use 1.** (as shown on
>>the wiki). This affects performance in a big way since you've configured
>>the prefixTree to hold 2.22e18 values (calculated via (max-min) /
>>maxDistErr) as opposed to "just" 2e10. Your log shows maxLevels is 50 for
>>quad tree. The comments in QuadPrefixTree (and I put them there once)
>>indicate maxLevels of 50 is about as much as is supported. But again, I'm
>>not certain what the limit really is without validating. Hopefully you
>>can stay clear of 50. To do some tests, try querying just on the edge on
>>either side of an indexed value to make sure you match the point and then
>>don't match the indexed point as you would expect based on the
>>instructions.
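
The relationship between the coordinate range, maxDistErr, and maxLevels discussed above can be sketched with a little arithmetic. The following is only an estimate under stated assumptions, not the QuadPrefixTree code: it assumes each quad tree level halves the cell size along each axis, and it treats the 50 reported in the log as a hard cap. The function and constant names are hypothetical. It uses the worldBounds from the schema quoted in this thread (-1e11 to 1e11).

```python
import math

# Assumption: 50 is treated as a cap because that is the value the log in this
# thread reports and the figure described as "about as much as is supported".
ASSUMED_MAX_LEVELS = 50


def estimate_quad_levels(world_min, world_max, max_dist_err, cap=ASSUMED_MAX_LEVELS):
    """Estimate how many quad tree levels are needed to resolve max_dist_err.

    Each level is assumed to halve the cell width, so the number of levels is
    about log2((max - min) / maxDistErr), rounded up and then capped.
    """
    cells_per_axis = (world_max - world_min) / max_dist_err
    levels = math.ceil(math.log2(cells_per_axis))
    return min(levels, cap)


if __name__ == "__main__":
    for err in (0.000000009, 0.01, 1):
        print(err, estimate_quad_levels(-1e11, 1e11, err))
```

Under these assumptions the estimate gives 50 (capped) for maxDistErr=0.000000009 and 45 for maxDistErr=0.01, which agrees with the two maxLevels values reported in this thread, and 38 for maxDistErr=1.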
>>Also, be sure to read more of the details on "Search" on
>>this wiki page in which you are advised to buffer the query shape
>>slightly; you didn't do this in your examples below. This is all a bit of
>>a hack when using a field that internally is using floating point instead
>>of fixed precision.
>>
>>~ David Smiley
>>
>>On 7/23/13 9:32 PM, "Kevin Stone" <kevin.st...@jax.org> wrote:
>>
>>>Sorry for the late response. I needed to find the time to load a lot of
>>>extra data (closer to what we're anticipating). I have an index with
>>>close to 220,000 documents, each with at least two coordinate regions
>>>anywhere between -10 billion to +10 billion, but could potentially have
>>>up to maybe half dozen regions in one document. The reason for the
>>>negatives, is because you can read a chromosome either backwards or
>>>forwards, so many coordinates can be minus.
>>>
>>>Here is the schema field definition:
>>>
>>>  <fieldType name="geneticLocation"
>>>      class="solr.SpatialRecursivePrefixTreeFieldType"
>>>      multiValued="true"
>>>      geo="false"
>>>      worldBounds="-100000000000 -100000000000 100000000000 100000000000"
>>>      distErrPct="0"
>>>      maxDistErr="0.000000009"
>>>      units="degrees"
>>>  />
>>>
>>>
>>>Here is the first query in the log:
>>>
>>>INFO: geneticLocation{class=org.apache.solr.schema.SpatialRecursivePrefixTreeFieldType,analyzer=org.apache.solr.schema.FieldType$DefaultAnalyzer,args={distErrPct=0, geo=false, multiValued=true, worldBounds=-100000000000 -100000000000 100000000000 100000000000, maxDistErr=0.000000009, units=degrees}} strat: RecursivePrefixTreeStrategy(prefixGridScanLevel:46,SPG:(QuadPrefixTree(maxLevels:50,ctx:SpatialContext{geo=false, calculator=CartesianDistCalc, worldBounds=Rect(minX=-1.0E11,maxX=1.0E11,minY=-1.0E11,maxY=1.0E11)}))) maxLevels: 50
>>>Jul 23, 2013 9:11:45 PM org.apache.solr.core.SolrCore execute
>>>INFO: [testIndex] webapp=/solr path=/select params={wt=xml&q=humanCoordinate:"Intersects(0+60330+6033041244+10000000000)"&rows=100} hits=81112 status=0 QTime=122
>>>
>>>
>>>Here are some other queries to give different timings (the one above
>>>brings back quite a lot):
>>>
>>>INFO: [testIndex] webapp=/solr path=/select params={wt=xml&q=humanCoordinate:"Intersects(0+6000000000+6900000000+10000000000)"&rows=100} hits=6031 status=0 QTime=10
>>>Jul 23, 2013 9:13:43 PM org.apache.solr.core.SolrCore execute
>>>INFO: [testIndex] webapp=/solr path=/select params={wt=xml&q=humanCoordinate:"Intersects(0+0+10000000+10000000000)"&rows=100} hits=500 status=0 QTime=15
>>>Jul 23, 2013 9:14:14 PM org.apache.solr.core.SolrCore execute
>>>INFO: [testIndex] webapp=/solr path=/select params={wt=xml&q=humanCoordinate:"Intersects(0+7831329+7831329+10000000000)"&rows=100} hits=4 status=0 QTime=17
>>>INFO: [testIndex] webapp=/solr path=/select params={wt=xml&q=humanCoordinate:"Intersects(-10000000000+-1051057963+-1001057963+0)"&rows=100} hits=661 status=0 QTime=8
>>>
>>>
>>>The query times look pretty fast to me. Certainly I'm pretty impressed.
>>>Our other backup solutions (involving SQL) likely wouldn't even touch
>>>this in terms of speed.
>>>
>>>
>>>We will be testing this more in depth in the coming month. I am sort of
>>>jumping ahead of our team to research possible solutions, since this is
>>>something that worried us. Looks like it might work!
>>>
>>>Thanks,
>>>-Kevin
>>>
>>>On 7/23/13 1:47 PM, "David Smiley (@MITRE.org)" <dsmi...@mitre.org>
>>>wrote:
>>>
>>>>Oh cool! I'm glad it at least seemed to work. Can you post your
>>>>configuration of the field type and report from Solr's logs what the
>>>>"maxLevels" is used for this field, which is logged the first time you
>>>>use the field type?
>>>>
>>>>Maybe there isn't a limit under 10B after all.
>>>>Some quick'n'dirty calculations I just did indicate there shouldn't be
>>>>a problem but real-world usage will be a better proof. Indexing
>>>>probably won't be terribly slow, queries could get pretty slow if the
>>>>amount of indexed data is really high. I'd love to hear how it works
>>>>out for you. Your use-case would benefit a lot from an improved prefix
>>>>tree implementation.
>>>>
>>>>I don't gather how a 3rd dimension would play into this. Support for
>>>>multi-dimensional spatial is on the drawing board.
>>>>
>>>>~ David
>>>>
>>>>
>>>>Kevin Stone wrote
>>>>> What are the dangers of trying to use a range of 10 billion? Simply a
>>>>> slower index time? Or will I get inaccurate results?
>>>>> I have tried it on a very small sample of documents, and it seemed to
>>>>> work. I could spend some time this week trying to get a more robust
>>>>> (and accurate) dataset loaded to play around with. The reason for the
>>>>> 10 billion is to support being able to query for a region on a
>>>>> chromosome.
>>>>>
>>>>> A user might want to know what genes overlap a point on a specific
>>>>> chromosome. Unless I can use 3 dimensional coordinates (which gave an
>>>>> error when I tried it), I'll need to multiply the coordinates by some
>>>>> offset for each chromosome to be able to normalise the data (at both
>>>>> index and query time). The largest chromosome (chr 1) has almost
>>>>> 250,000,000 base pairs. I could probably squeeze the rest a bit
>>>>> smaller, but I'd rather use one size for all chromosomes, since we
>>>>> have more than just human data to deal with. It would get quite messy
>>>>> otherwise.
>>>>>
>>>>>
>>>>> On 7/22/13 11:50 AM, "David Smiley (@MITRE.org)" <DSMILEY@> wrote:
>>>>>
>>>>>>Like Hoss said, you're going to have to solve this using
>>>>>>http://wiki.apache.org/solr/SpatialForTimeDurations
>>>>>>Using PointType is *not* going to work because your durations are
>>>>>>multi-valued per document.
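
The per-chromosome normalisation described in the quoted message above can be sketched as follows. This is one possible reading, in which each chromosome gets its own non-overlapping block of the number line and a range is then turned into an "x y" point with x=start and y=end. The stride of 300,000,000, the function names, and the "x y" output format are assumptions made for illustration; the stride only needs to exceed the largest chromosome length mentioned in the thread (almost 250,000,000 base pairs).

```python
# Hypothetical stride: any value larger than the longest chromosome works.
ASSUMED_CHROM_STRIDE = 300_000_000


def normalise(chrom_index, position, stride=ASSUMED_CHROM_STRIDE):
    """Map (chromosome, position) onto one shared number line.

    Chromosome k occupies [k * stride, (k + 1) * stride), so coordinates from
    different chromosomes can never collide. The same mapping must be applied
    at index time and at query time.
    """
    if not 0 <= position < stride:
        raise ValueError(f"position {position} does not fit in stride {stride}")
    return chrom_index * stride + position


def range_to_point(chrom_index, start, end, stride=ASSUMED_CHROM_STRIDE):
    """Turn a range on one chromosome into an 'x y' point (x=start, y=end)."""
    if start > end:
        raise ValueError("start must not exceed end")
    x = normalise(chrom_index, start, stride)
    y = normalise(chrom_index, end, stride)
    return f"{x} {y}"


if __name__ == "__main__":
    # The range 10000:16090 on the chromosome with index 1.
    print(range_to_point(1, 10000, 16090))
```

Negative coordinates for reverse-strand reads, mentioned elsewhere in the thread, are not handled here and would need their own convention.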
>>>>>>
>>>>>>It would be useful to create a custom field type that wraps the
>>>>>>capability outlined on the wiki to make it easier to use without
>>>>>>requiring the user to think spatially.
>>>>>>
>>>>>>You mentioned that these numeric ranges extend upwards of 10 billion
>>>>>>or so. Unfortunately, the current "prefix tree" implementation under
>>>>>>the hood for non-geodetic spatial, the QuadTree, is unlikely to scale
>>>>>>to numbers that big. I don't know where the boundary is, but I doubt
>>>>>>10B. You could try and see what happens. I'm working (very slowly on
>>>>>>very little spare time) on improving the PrefixTree implementations
>>>>>>to scale to such large numbers; I hope something will be available
>>>>>>this fall.
>>>>>>
>>>>>>~ David Smiley
>>>>>>
>>>>>>
>>>>>>Kevin Stone wrote
>>>>>>> I have a particular use case that I think might require a custom
>>>>>>> field type, however I am having trouble getting the plugin to work.
>>>>>>> My use case has to do with genetics data, and we are running into
>>>>>>> several situations where we need to be able to query multiple
>>>>>>> regions of a chromosome (or gene, or other object types). All that
>>>>>>> really boils down to is being able to give a number, e.g. 10234,
>>>>>>> and return documents that have regions containing the number. So
>>>>>>> you'd have a document with a list like
>>>>>>> ["10000:16090","400:8000","40123:43564"], and it should come back
>>>>>>> because 10234 falls between "10000:16090". If there is a better or
>>>>>>> easier way to do this please speak up. I'd rather not have to use a
>>>>>>> "join" on another index, because 1) it's more complex to set up,
>>>>>>> and 2) we might need to join against something else and you can
>>>>>>> only do one join at a time.
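
For checking search results against the requirement stated above, a brute-force reference check can be handy. The sketch below is plain Python and does not involve Solr at all; it assumes ranges are written as "start:end" strings with inclusive bounds, as in the example list, and the function names are hypothetical.

```python
def parse_range(text):
    """Parse a 'start:end' string into a (start, end) tuple of ints."""
    start, end = text.split(":")
    return int(start), int(end)


def document_matches(ranges, number):
    """Return True if any 'start:end' range (inclusive) contains the number."""
    return any(start <= number <= end for start, end in map(parse_range, ranges))


if __name__ == "__main__":
    doc = ["10000:16090", "400:8000", "40123:43564"]
    print(document_matches(doc, 10234))   # falls inside 10000:16090
    print(document_matches(doc, 9000))    # falls between the ranges
```

Running the same numbers through this check and through the index, especially at the exact start and end of a range and one unit either side, is a simple way to exercise the edge cases discussed in this thread.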
>>>>>>>
>>>>>>> Anyway… I tried creating a field type similar to a PointType just
>>>>>>> to see if I could get one working. I added the following jars to
>>>>>>> get it to compile:
>>>>>>> apache-solr-core-4.0.0,lucene-core-4.0.0,lucene-queries-4.0.0,apache-solr-solrj-4.0.0.
>>>>>>> I am running solr 4.0.0 on jetty, and put my jar file in a
>>>>>>> sharedLib folder, and specified it in my solr.xml (I have multiple
>>>>>>> cores).
>>>>>>>
>>>>>>> After starting up solr, I got the line that it picked up the jar:
>>>>>>> INFO: Adding 'file:/blah/blah/lib/CustomPlugins.jar' to classloader
>>>>>>>
>>>>>>> But I get this error about it not being able to find the
>>>>>>> AbstractSubTypeFieldType class.
>>>>>>> Here is the first bit of the trace:
>>>>>>>
>>>>>>> SEVERE: null:java.lang.NoClassDefFoundError:
>>>>>>> org/apache/solr/schema/AbstractSubTypeFieldType
>>>>>>> at java.lang.ClassLoader.defineClass1(Native Method)
>>>>>>> at java.lang.ClassLoader.defineClass(ClassLoader.java:791)
>>>>>>> at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>>>>>>> at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
>>>>>>> at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
>>>>>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>>>>>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>>>>> ...etc…
>>>>>>>
>>>>>>>
>>>>>>> Any hints as to what I did wrong? I can provide source code, or a
>>>>>>> fuller stack trace, config settings, etc.
>>>>>>>
>>>>>>> Also, I did try to unpack the solr.war, stick my jar in
>>>>>>> WEB-INF/lib, then repack. However, when I did that, I get a
>>>>>>> NoClassDefFoundError for my plugin itself.
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Kevin
>>>>>>>
>>>>>>> The information in this email, including attachments, may be
>>>>>>> confidential and is intended solely for the addressee(s). If you
>>>>>>> believe you received this email by mistake, please notify the
>>>>>>> sender by return email as soon as possible.
>>>>>>
>>>>>>
>>>>>>-----
>>>>>> Author:
>>>>>>http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
>>>>>>--
>>>>>>View this message in context:
>>>>>>http://lucene.472066.n3.nabble.com/custom-field-type-plugin-tp4079086p4079494.html
>>>>>>Sent from the Solr - User mailing list archive at Nabble.com.
>>>>
>>>>
>>>>-----
>>>> Author:
>>>>http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
>>>>--
>>>>View this message in context:
>>>>http://lucene.472066.n3.nabble.com/custom-field-type-plugin-tp4079086p4079822.html
>>>>Sent from the Solr - User mailing list archive at Nabble.com.