On Tue, Dec 29, 2009 at 8:41 PM, patrick o'leary <pj...@pjaol.com> wrote:

> I'm afraid I just took a sample set of data that was available to me at my
> last job and ran the test.
> It roughly matched my expectations for LocalLucene at the time, and what
> Uwe predicted for Trie.
>

Do you still think there would be such a drastic difference in a lower
density situation?


>
> To give you an idea of its performance in production: bounding box
> retrieval on a single Solr core of about 3 million docs, on a dual-core
> 2.3 GHz server with (I think) 8 GB of RAM, averaged about 8-12 ms, with
> ~3,000 results per result set.
>
> The slow part of geo search was always the distance calculation, not the
> bounding box retrieval.
> I've seen feedback suggesting that a Hilbert curve is faster again, by an
> average of 40%, so say 4-6 ms for bounding box retrieval.
>
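To make the split Patrick describes concrete, here is a minimal sketch (not LocalLucene's or Solr's code) of a two-phase geo search: a cheap bounding-box prefilter, then an exact great-circle (haversine) distance for every surviving candidate. The function names, the mean Earth radius of 6371 km and the in-memory list of `(id, lat, lon)` tuples standing in for an index are all assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def bounding_box(lat, lon, radius_km):
    """Return (min_lat, max_lat, min_lon, max_lon) enclosing a circle of radius_km.

    Simplified sketch: boxes crossing the 180th meridian are not split.
    """
    ang = radius_km / EARTH_RADIUS_KM  # angular radius in radians
    dlat = math.degrees(ang)
    s = math.sin(ang) / math.cos(math.radians(lat))
    if s >= 1.0:
        # The circle reaches past a pole, so every longitude qualifies.
        return max(lat - dlat, -90.0), min(lat + dlat, 90.0), -180.0, 180.0
    dlon = math.degrees(math.asin(s))  # widest longitude offset of the circle
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points (degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def geo_search(points, lat, lon, radius_km):
    """Return [(id, distance_km)] within radius_km of (lat, lon), nearest first."""
    min_lat, max_lat, min_lon, max_lon = bounding_box(lat, lon, radius_km)
    # Phase 1: plain comparisons only (in Solr this would be an index lookup).
    candidates = [(pid, plat, plon) for pid, plat, plon in points
                  if min_lat <= plat <= max_lat and min_lon <= plon <= max_lon]
    # Phase 2: one trigonometric distance per candidate; this cost grows
    # with the size of the result set, not the size of the index.
    hits = []
    for pid, plat, plon in candidates:
        d = haversine_km(lat, lon, plat, plon)
        if d <= radius_km:
            hits.append((pid, d))
    hits.sort(key=lambda h: h[1])
    return hits
```

With ~3,000 results per query, phase 2 runs the haversine formula ~3,000 times, which is consistent with Patrick's observation that distance calculation, not box retrieval, dominates.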

Yeah, I have looked into Hilbert curves a little myself. Do you think it's an
approach worth investigating, or will it add more complexity?
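For a sense of how much complexity a Hilbert curve adds, here is a minimal sketch of the classic iterative `xy2d` mapping from a grid cell (x, y) to its position along a Hilbert curve. Consecutive positions on the curve are always adjacent cells, so a bounding box tends to be covered by a few contiguous key ranges. Quantizing lat/lon onto a `2**order` grid and the helper names `latlon_to_cell` / `hilbert_key` are hypothetical; this is not code from any of the patches discussed.

```python
def xy2d(n, x, y):
    """Hilbert index of cell (x, y) on an n x n grid, n a power of two."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the next level's sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def latlon_to_cell(lat, lon, order):
    """Quantize lat/lon onto a 2**order x 2**order grid (hypothetical helper)."""
    n = 1 << order
    x = min(int((lon + 180.0) / 360.0 * n), n - 1)
    y = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return x, y


def hilbert_key(lat, lon, order=16):
    """Single sortable integer key for a point, suitable for range queries."""
    x, y = latlon_to_cell(lat, lon, order)
    return xy2d(1 << order, x, y)
```

The key is one integer per document, so retrieval becomes a handful of numeric range queries; the added complexity lies mostly in decomposing a box into those ranges.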


> But that still doesn't solve the long haul of distance calculations, which
> has been one of my focuses recently, with a new projection and a
> distance calculation based upon that projection.
>
>

Tell us more! Yeah, I also ran into the cost of the distance calculations,
which is why I went down the road of doing the calculations in parallel and
of reducing the cost of the calculations themselves. This has been pretty
effective, but I am very interested in this new projection idea.
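Chris's parallel implementation isn't shown in the thread; the following is only a minimal sketch of the general idea, assuming the bounding-box candidates are already in memory as `(id, lat, lon)` tuples. The chunking scheme, worker count and function names are hypothetical. In CPython, threads won't actually speed up pure-Python arithmetic because of the GIL; the sketch shows the structure, which in Java (as in Solr) runs truly in parallel, while in Python you would swap in `ProcessPoolExecutor` or vectorized math.

```python
import math
from concurrent.futures import ThreadPoolExecutor

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points (degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def _distances(chunk, lat, lon):
    """Distances for one chunk of candidates; runs inside a worker."""
    return [(pid, haversine_km(lat, lon, plat, plon)) for pid, plat, plon in chunk]


def parallel_distances(candidates, lat, lon, workers=4):
    """Compute (id, distance_km) for every candidate, split across workers."""
    if not candidates:
        return []
    size = math.ceil(len(candidates) / workers)
    chunks = [candidates[i:i + size] for i in range(0, len(candidates), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so output order matches input order.
        parts = pool.map(_distances, chunks, [lat] * len(chunks), [lon] * len(chunks))
        return [pair for part in parts for pair in part]
```

Because each candidate's distance is independent, the work splits cleanly; only the final merge (here a flatten, in practice a sort or top-N) is sequential.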


>
>
> On Tue, Dec 29, 2009 at 11:31 AM, Chris Male <gento...@gmail.com> wrote:
>
> > Hi,
> >
> > I had never done any experiments comparing them; that was what I was
> > hoping would be explored further, and it seems you have done that. Do you
> > have more statistics by chance? Does the difference (which is pretty
> > dramatic) stay a constant ratio as you change the density and/or distances?
> >
> > On Tue, Dec 29, 2009 at 8:25 PM, patrick o'leary <pj...@pjaol.com> wrote:
> >
> > > Hmm, so it's faster to do 2 range searches than to use the TermEnumerator
> > > to find maybe 4-6 individual CartesianTier ids?
> > >
> > > I tried similar approaches about 2 years ago that just weren't fast
> > > enough, and I've even published comparisons between Trie data types and
> > > CartesianTier ids:
> > >
> > > https://issues.apache.org/jira/browse/SOLR-773?focusedCommentId=12708605&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12708605
> > >
> > > The speed of Trie matched Uwe's expectations, about 100 ms, but
> > > Cartesian is just 12 ms.
> > >
> > > As for the custom code: you'd have to have custom code to figure out the
> > > bounding box from a point anyway, unless you want the user to figure that
> > > out. And the Cartesian stuff is pretty small; its underlying structure can
> > > and now does use Trie (simply because it's the only numeric field cache
> > > interface common to Lucene and Solr).
> > >
> > > P
> > >
> > >
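Patrick's question contrasts two ways of turning a bounding box into candidate documents: two numeric range searches over separate lat and lon fields (the TrieField approach Chris suggests) versus looking up a handful of precomputed tile ids (the CartesianTier approach). The following is a toy in-memory illustration, not the Lucene implementation: sorted Python lists stand in for the index, and the tile scheme (fixed-size degree tiles keyed by `(row, column)`) is hypothetical and does not reproduce CartesianTier's actual id encoding.

```python
import bisect
import math
from collections import defaultdict


def build_range_index(points):
    """Two sorted 'columns', as separate lat and lon numeric fields would give."""
    by_lat = sorted(points, key=lambda p: p[1])
    by_lon = sorted(points, key=lambda p: p[2])
    return ([p[1] for p in by_lat], [p[0] for p in by_lat],
            [p[2] for p in by_lon], [p[0] for p in by_lon])


def two_range_search(index, min_lat, max_lat, min_lon, max_lon):
    """Two range scans, then intersect the matching id sets."""
    lat_keys, lat_ids, lon_keys, lon_ids = index
    in_lat = set(lat_ids[bisect.bisect_left(lat_keys, min_lat):bisect.bisect_right(lat_keys, max_lat)])
    in_lon = set(lon_ids[bisect.bisect_left(lon_keys, min_lon):bisect.bisect_right(lon_keys, max_lon)])
    return in_lat & in_lon


def build_tile_index(points, tile_deg):
    """Hypothetical tile scheme: each point is filed under one (row, col) tile id."""
    tiles = defaultdict(list)
    for pid, plat, plon in points:
        tiles[(math.floor(plat / tile_deg), math.floor(plon / tile_deg))].append((pid, plat, plon))
    return tiles


def tile_search(tiles, tile_deg, min_lat, max_lat, min_lon, max_lon):
    """Visit only the few tiles overlapping the box, then trim edge tiles."""
    hits = set()
    for row in range(math.floor(min_lat / tile_deg), math.floor(max_lat / tile_deg) + 1):
        for col in range(math.floor(min_lon / tile_deg), math.floor(max_lon / tile_deg) + 1):
            for pid, plat, plon in tiles.get((row, col), ()):
                # Tiles on the edge of the box also hold points outside it.
                if min_lat <= plat <= max_lat and min_lon <= plon <= max_lon:
                    hits.add(pid)
    return hits
```

Both return the same candidates; the performance question in the thread is about cost. The range approach touches every document in two wide stripes before intersecting, while the tile approach touches only the documents in a few small tiles.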
> > > On Tue, Dec 29, 2009 at 11:11 AM, Chris Male (JIRA) <j...@apache.org> wrote:
> > >
> > > >
> > > >    [ https://issues.apache.org/jira/browse/SOLR-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795112#action_12795112 ]
> > > >
> > > > Chris Male commented on SOLR-1586:
> > > > ----------------------------------
> > > >
> > > > Ah yes, sorry, TrieFields. I don't see searching 2 fields as a downside,
> > > > since that's just an implementation detail, like the Spatial Tile (which
> > > > requires you to have up to 15 fields). Assuming you can use the Point
> > > > FieldType to index an x and a y field, it just becomes another option,
> > > > like Spatial Tile. The fact that they are supported out of the box is
> > > > part of the attraction, as it would reduce how much custom code has to
> > > > be maintained.
> > > >
> > > > > Create Spatial Point FieldTypes
> > > > > -------------------------------
> > > > >
> > > > >                 Key: SOLR-1586
> > > > >                 URL:
> https://issues.apache.org/jira/browse/SOLR-1586
> > > > >             Project: Solr
> > > > >          Issue Type: Improvement
> > > > >            Reporter: Grant Ingersoll
> > > > >            Assignee: Grant Ingersoll
> > > > >            Priority: Minor
> > > > >             Fix For: 1.5
> > > > >
> > > > >         Attachments: examplegeopointdoc.patch.txt,
> > > > >                      SOLR-1586-geohash.patch,
> > > > >                      SOLR-1586.Mattmann.112209.geopointonly.patch.txt,
> > > > >                      SOLR-1586.Mattmann.112209.geopointonly.patch.txt,
> > > > >                      SOLR-1586.Mattmann.112409.geopointandgeohash.patch.txt,
> > > > >                      SOLR-1586.Mattmann.112409.geopointandgeohash.patch.txt,
> > > > >                      SOLR-1586.Mattmann.112509.geopointandgeohash.patch.txt,
> > > > >                      SOLR-1586.Mattmann.120709.geohashonly.patch.txt,
> > > > >                      SOLR-1586.Mattmann.121209.geohash.outarr.patch.txt,
> > > > >                      SOLR-1586.Mattmann.121209.geohash.outstr.patch.txt,
> > > > >                      SOLR-1586.Mattmann.122609.patch.txt, SOLR-1586.patch, SOLR-1586.patch
> > > > >
> > > > >
> > > > > Per SOLR-773, create field types that hide the details of creating tiers,
> > > > > geohash and lat/lon fields.
> > > > > Fields should take in lat/lon points in a single form, as in:
> > > > > <field name="foo">lat lon</field>
> > > >
> > > > --
> > > > This message is automatically generated by JIRA.
> > > > -
> > > > You can reply to this email to add a comment to the issue online.
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > Chris Male | Software Developer | JTeam BV.| www.jteam.nl
> >
>



-- 
Chris Male | Software Developer | JTeam BV.| www.jteam.nl
