Re: [Neo4j] Server Plugin Example to help with large queries over REST API

Jacob Hansson Wed, 04 May 2011 12:11:26 -0700

On Tue, May 3, 2011 at 12:06 AM, Todd Chaffee <t...@mikamai.com> wrote:


> Hi Jake,
>
> The short answer to "should we?" is right in the neo4j REST API
> documentation: "The query syntax used here depends on what index provider
> you chose when you created the index."
>

That's a really good point. You'd think I would have realized that our API
is already provider-specific since I was the one who wrote that very
statement in the docs. I think I was wearing my "inside the server" hat
(where the implementation is provider-agnostic, we just pipe that query
string through), instead of my "REST API user" hat while writing my previous
email.


>
> Since the provider is "discoverable" via the REST API, the client app can
> decide if it speaks that provider's query language or indicate an
> "unsupported" error as the case may be.  No reason for the REST API to not
> provide "discoverable" URIs or parameters to all the capabilities of each
> provider.
>

If we decide to continue down the path of different APIs for different
index-providers, then I absolutely agree. My concern is that by going that
path, we push the problem off to the server clients, based on the assumption
that they will be the ones adding indexes other than Lucene, and so they
should know how to use them.

I'm not sure that is necessarily true. There is currently work going on
trying out alternatives to Lucene. If other index implementations emerge, it
would be super-awesome if server clients could "just swap" providers and see
speed improvements.

That would mean providing a uniform way to do index queries.

Perhaps a combination of both is possible - provide a fixed set of index
functions that should be available from all providers, but also let
providers extend the API to add extra features.
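
To make the "combination" idea a bit more concrete, here is a rough sketch
in Python. It is not server code, and every class and method name in it is
made up for illustration. A base class carries the operations that all
providers would have to support, and a provider can advertise extras that a
client checks for before using them.

```python
from abc import ABC, abstractmethod


class IndexProvider(ABC):
    """Hypothetical common contract; none of these names exist in Neo4j."""

    @abstractmethod
    def query(self, index_name, query_string):
        """Provider-agnostic lookup that every provider must implement."""

    def extensions(self):
        """Names of optional, provider-specific features (none by default)."""
        return frozenset()


class LuceneLikeProvider(IndexProvider):
    """Stand-in for a provider that adds sorting and limits on top."""

    def __init__(self, entries):
        self._entries = entries  # index name -> list of stored values

    def query(self, index_name, query_string):
        # Toy matching: substring search instead of a real query language.
        return [v for v in self._entries.get(index_name, []) if query_string in v]

    def extensions(self):
        return frozenset({"sorted_query"})

    def sorted_query(self, index_name, query_string, limit):
        return sorted(self.query(index_name, query_string))[:limit]


def first_hits(provider, index_name, query_string, limit):
    """Use the extension when it is offered, else fall back to the common API."""
    if "sorted_query" in provider.extensions():
        return provider.sorted_query(index_name, query_string, limit)
    return provider.query(index_name, query_string)[:limit]


provider = LuceneLikeProvider({"names": ["carla", "bob", "anna", "dave"]})
hits = first_hits(provider, "names", "a", 2)
```

A client written against first_hits() keeps working if the provider is
swapped for one without the extension; it just loses the sorting.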


>
> The longer version:
>
> Concerning "We really don't want the API to be Lucene-specific (which is
> why we're currently only allowing that one query string as input)",
>
> I wonder if this is a case of not providing "what would work right now" just
> for maintaining "future flexibility" for a requirement that doesn't yet
> exist?
>
> REST paging and better access to the full Lucene API are open requests from
> some of the community.  Are there any customers / community members
> currently asking for a pluggable alternative to Lucene as of today?  It's
> not a rhetorical question because maybe I'm just not aware of any such
> requests.
>

"Opening up" the index API to reveal more of the underlying Lucene features
would indeed allow us to do both sorting and paging on index searches.

There are two things that still leave me hesitant to invest more time
into solving this problem with this seemingly simple solution:

One - There is currently work underway looking over our take on indexing,
work that will hopefully lead to simplifying index usage a lot. The timeline
for that is rather short, and it is likely that any changes done to the
index API now will have to be redone within a short time frame.

Two - I would much rather have a solution to paging and sorting that is the
same for both traversals and indexes. The Lucene solution does solve some of
the pain we've discussed in other threads, but not all of it. Rather than
investing a bit of time to get a halfway-solution, I'd love to invest a
little more time to get the full hullaballooza.

It should also be noted that extending the Indexing API in a way where
different implementations can have different APIs is not a *super* easy
thing to do; it would take a bit of effort to do it well.


>
> The goal of providing a pluggable index implementation is worthy. Is it
> maybe too early? Are index provider APIs so standardized we can drop them
> in as easily as we could, for example, switch database providers these days?
> Although the neo4j Index API is an abstraction of the Lucene API, the two
> look tightly coupled to me as of today.  (E.g. neo4j IndexHits is Lucene
> Hits).
>
> The approach of "hiding away" advanced functionality until some standard is
> reached, or until we better understand the problem space, is not optimal.
>  Not that there aren't good motivations.  We want to create a durable REST
> API with URIs that don't change or break with each release.
>
> I have some ideas on how this could be done at the same time as providing
> full functionality today.
>
> Mostly based around 1) using "vendor MIME media types" instead of the
> generic Accept:application/json


I'm not sure what you mean by this, could you elaborate on where we could
use vendor MIME types, and for what purpose?


> and 2) enhancing the REST API so it is fully
> HATEOAS and "responses from the server will be documents that include URIs
> to everything you can do next".
>
> Point 2 also becomes very interesting around paging (think "next" and
> "previous" URIs as part of the returned document).
>
> I can provide more details and concrete examples if this approach sounds
> interesting.
>
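
The "next" and "previous" idea above could look something like the
following. This is only a sketch in Python; the URI path and the offset and
limit parameter names are assumptions for illustration, not part of the
current REST API.

```python
def page_document(base_uri, results, offset, limit):
    """Build one page of results plus the URIs a client may follow next."""
    page = results[offset:offset + limit]
    doc = {"results": page}
    # Only advertise links that actually lead somewhere.
    if offset + limit < len(results):
        doc["next"] = f"{base_uri}?offset={offset + limit}&limit={limit}"
    if offset > 0:
        doc["previous"] = f"{base_uri}?offset={max(offset - limit, 0)}&limit={limit}"
    return doc


names = [f"node-{i}" for i in range(10)]
doc = page_document("/index/node/names/page", names, offset=4, limit=4)
```

The client never builds paging URIs itself; it just follows whichever
links are present in the document it received.
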
> Further reading:
>
> http://barelyenough.org/blog/2008/05/versioning-rest-web-services/
>
>
> http://barelyenough.org/blog/2007/05/hypermedia-as-the-engine-of-application-state/
>
> Todd
>
>
>
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Mon, 2 May 2011 13:45:00 +0200
> > From: Jacob Hansson <ja...@voltvoodoo.com>
> > Subject: Re: [Neo4j] Server Plugin Example to help with large queries
> >        over REST API
> > To: Neo4j user discussions <user@lists.neo4j.org>
> > Message-ID: <BANLkTi=xvR-UdkHimK6M=az3m+e0azu...@mail.gmail.com>
> > Content-Type: text/plain; charset=ISO-8859-1
> >
> > I did a bit of poking at this last week, and noticed the same thing you
> > did - we don't expose much of the power of the Lucene index through the
> > REST API.
> >
> > The question is, how should we go about doing it (and perhaps even
> > "Should we?")? We really don't want the API to be Lucene-specific (which
> > is why we're currently only allowing that one query string as input). I
> > suppose we could keep a generic API like we have now, but also allow index
> > implementations to expose "extra" API goodies in some reasonable way.
> >
> > /jake
> >
> > On Sun, May 1, 2011 at 4:12 PM, Todd Chaffee <t...@mikamai.com> wrote:
> >
> > > Hi Michael,
> > >
> > > I actually appreciated the suggestion because it got me looking closer
> > > at the Lucene query syntax and thinking about its limitations.  I would
> > > like to see limits, sorting, and operators like > and <.  If I remember
> > > right they are available in the Lucene Java API, so it's a shame they
> > > haven't been added to the query syntax yet.
> > >
> > > Thanks,
> > > Todd
> > >
> > >
> > > > ------------------------------
> > > >
> > > > Message: 2
> > > > Date: Sat, 30 Apr 2011 15:09:50 +0200
> > > > From: Michael Hunger <michael.hun...@neotechnology.com>
> > > > Subject: Re: [Neo4j] User Digest, Vol 49, Issue 85
> > > > To: Neo4j user discussions <user@lists.neo4j.org>
> > > > Message-ID: <bbb22a8d-32db-481a-bd2b-a5fc0faa9...@neotechnology.com>
> > > > Content-Type: text/plain; charset=us-ascii
> > > >
> > > > Todd,
> > > >
> > > > Sorry, you're right, I mixed up range queries with limits.
> > > >
> > > > I thought that would also be possible via the query parser syntax.
> > > >
> > > > It would be nice if Lucene supported query limits via its parsed
> > > > query syntax.
> > > >
> > > > Sorry for the confusion.
> > > >
> > > > Michael
> > > >
> > > > On 30.04.2011 at 14:48, Todd Chaffee wrote:
> > > >
> > > > > Hi Michael,
> > > > >
> > > > > Unless I'm misunderstanding something, what you suggested won't
> > > > > help.  I have only 1 key: "name".  If I search on it with a query
> > > > > like ?query=name:*a* it is going to return all nodes with the letter
> > > > > 'a' in the name.  The result set could be over 100,000 nodes.  I
> > > > > want it to return just the first 4 nodes.  Does that make sense?  If
> > > > > there is a simpler way of achieving this aside from a custom plugin
> > > > > I am all ears.
> > > > >
> > > > > When did the full Lucene query API syntax become available with the
> > > > > REST API?  The docs have only changed in the last few days, but I'm
> > > > > guessing (hoping) the docs were a bit behind and now reflect version
> > > > > 1.3 of the server?
> > > > >
> > > > > Thanks,
> > > > > Todd
> > > > >
> > > > >
> > > > >
> > > > >> Todd,
> > > > >>
> > > > >> what about the full Lucene query API syntax available with the
> > > > >> REST API changes?
> > > > >>
> > > > >> http://components.neo4j.org/neo4j-server/snapshot/rest.html#Index_search_-_Using_a_query_language
> > > > >>
> > > > >> e.g.
> > > > >>
> > > > >> GET /index/node/my_nodes?query=the_key:the_* AND the_other_key:[1 TO 100]
> > > > >>
> > > > >> with curl:
> > > > >> curl -g -H Accept:application/json \
> > > > >>   'http://localhost:7474/db/data/index/node/my_nodes?query=the_key:the_*%20AND%20the_other_key:[1%20TO%20100]'
> > > > >>
> > > > >> Shouldn't that help too?
> > > > >>
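
For clients that build this URL in code rather than by hand, the query
string can be percent-encoded before it is appended. A small Python sketch,
using only the standard library; the index name and keys are the ones from
the example above, and how the server treats the encoded characters is
assumed to follow normal URL decoding.

```python
from urllib.parse import quote

base = "http://localhost:7474/db/data/index/node/my_nodes"
lucene_query = "the_key:the_* AND the_other_key:[1 TO 100]"

# safe="" percent-encodes everything except letters, digits and "_.-~",
# so spaces, colons, brackets and the wildcard are all escaped.
url = f"{base}?query={quote(lucene_query, safe='')}"
```

Encoding the whole value this way avoids surprises from shells or HTTP
clients that give "*", "[" and "]" a special meaning.
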
> > > > >> Cheers
> > > > >>
> > > > >> Michael
> > > > >>
> > > > >>
> > > > >> On 30.04.2011 at 01:46, Todd Chaffee wrote:
> > > > >>
> > > > >>> This doesn't solve the lack of paging in the REST API, but it is a
> > > > >>> small example of how I was able to limit the returned results for a
> > > > >>> large query.
> > > > >>>
> > > > >>> Using the REST API query, looking for all nodes with the letter 'a'
> > > > >>> somewhere in the name key, would look something like this:
> > > > >>>
> > > > >>> curl -H Accept:application/json \
> > > > >>>   'http://localhost:7474/db/data/index/node/names/name?query=*a*'
> > > > >>>
> > > > >>>
> > > > >>> For my installation, that tries to return a few hundred thousand
> > > > >>> nodes.  Not a good idea.
> > > > >>>
> > > > >>>
> > > > >>> The new version looks like this, using the server plugin feature
> > > > >>> (http://docs.neo4j.org/chunked/stable/server-plugins.html):
> > > > >>>
> > > > >>>
> > > > >>> curl -X POST -H Accept:application/json \
> > > > >>>   -H Content-Type:application/json \
> > > > >>>   http://localhost:7474/db/data/ext/NodeIndex/graphdb/limit_by_count \
> > > > >>>   -d '{"index":"names", "key":"name", "query":"*a*", "count":4}'
> > > > >>>
> > > > >>>
> > > > >>> With the interesting part being the count of 4 at the end.  Only 4
> > > > >>> nodes are returned and it happens FAST because on the server side
> > > > >>> the iteration stops at 4, only 4 nodes are created, and only 4 are
> > > > >>> sent over the wire.
> > > > >>>
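
The reason stopping the iteration early is cheap can be shown with a small
Python analogue. This is not the plugin's code (the plugin is a server
extension; see the GitHub link below); the function names and the fake
"index" are assumptions for illustration. The point is that hits are
produced lazily, so nothing beyond the requested count is ever touched.

```python
from itertools import islice


def index_hits(names, needle, touched):
    """Lazily yield matching names, recording each entry that gets examined."""
    for name in names:
        touched.append(name)
        if needle in name:
            yield name


def limit_by_count(hits, count):
    """Take at most `count` hits and stop pulling from the source."""
    return list(islice(hits, count))


names = [f"a{i}" for i in range(100_000)]
touched = []
first_four = limit_by_count(index_hits(names, "a", touched), 4)
```

Here only four entries are examined out of 100,000, which mirrors the
"iteration stops at 4" behaviour described above.
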
> > > > >>>
> > > > >>> More documentation and source code showing how I did this over at
> > > > >>> github.  Hope this helps some of you out there while we wait for the
> > > > >>> REST API to support paging.
> > > > >>>
> > > > >>> https://github.com/tchaffee/Neo4J-REST-PHP-API-client
> > > > >>>
> > > > >>> Todd
> > > > >>>
> > > > >>> --
> > > > >>>
> > > > >>> MIKAMAI | Making Media Social
> > > > >>> http://mikamai.com
> > > > >>> +447868260229
> > > > >>> _______________________________________________
> > > > >>> Neo4j mailing list
> > > > >>> User@lists.neo4j.org
> > > > >>> https://lists.neo4j.org/mailman/listinfo/user
> > > > >>
> > > > >>
> > > > >>
> > > >
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > Jacob Hansson
> > Phone: +46 (0) 763503395
> > Twitter: @jakewins
> >
> >
> > ------------------------------
> >
> > Message: 2
> > Date: Mon, 2 May 2011 14:10:50 +0200
> > From: Peter Neubauer <peter.neuba...@neotechnology.com>
> > Subject: Re: [Neo4j] User Digest, Vol 49, Issue 83
> > To: Neo4j user discussions <user@lists.neo4j.org>
> > Message-ID: <banlktim9j12dyvfseqkzhysgct1u3-3...@mail.gmail.com>
> > Content-Type: text/plain; charset=ISO-8859-1
> >
> > Very cool Todd!
> > Keep us updated on the progress!
> >
> > Cheers,
> >
> > /peter neubauer
> >
> > GTalk:      neubauer.peter
> > Skype       peter.neubauer
> > Phone       +46 704 106975
> > LinkedIn    http://www.linkedin.com/in/neubauer
> > Twitter     http://twitter.com/peterneubauer
> >
> > http://www.neo4j.org        - Your high performance graph database.
> > http://startupbootcamp.org/ - Öresund - Innovation happens HERE.
> > http://www.thoughtmade.com  - Scandinavia's coolest Bring-a-Thing party.
> >
> >
> >
> > On Sat, Apr 30, 2011 at 12:14 AM, Todd Chaffee <t...@mikamai.com> wrote:
> > > Hi Kobla,
> > >
> > > I have been working day and night to get the PHP REST API working for
> > > version 1.3 of Neo4j.  Available now on git:
> > >
> > > https://github.com/tchaffee/Neo4J-REST-PHP-API-client
> > >
> > > It's a very robust implementation, except for Traversals which I will
> > > try to finish in the next few days.  Fully unit tested.
> > >
> > > Now you have another option as well.
> > >
> > > Todd
> > >
> > > --
> > >
> > > MIKAMAI | Making Media Social
> > > http://mikamai.com
> > > +447868260229
> > >
> > >
> > >
> > >> Message: 1
> > >> Date: Fri, 29 Apr 2011 14:46:20 +0200
> > >> From: Kobla Gbenyo <ko...@riastudio.fr>
> > >> Subject: [Neo4j] Framework for CRUD operations on Neo4j Server.
> > >> To: user@lists.neo4j.org
> > >> Message-ID: <4dbab31c.5080...@riastudio.fr>
> > >> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> > >>
> > >> Hello,
> > >>
> > >> Is there any framework (in Java or PHP) by which I can perform CRUD
> > >> operations (Create, Read, Update, Delete) on Neo4j REST Server?
> > >>
> > >> Thanks!
> > >>
> > >> --
> > >> Kobla GBENYO,
> > >> S/C M. Jean MATHE,
> > >> 28 Rue de la Normandie,
> > >> 79 000 Niort.
> > >>
> > >> (+33) 6 26 07 93 41 / 6 62 26 64 47
> > >> http://www.gbenyo-expo.fr
> > >>
> > >>
> > >>
> > >
> >
> >
> > ------------------------------
> >
> > Message: 3
> > Date: Mon, 2 May 2011 13:47:42 +0100
> > From: Jim Webber <j...@neotechnology.com>
> > Subject: Re: [Neo4j] Lucene/Neo Indexing Question
> > To: Neo4j user discussions <user@lists.neo4j.org>
> > Message-ID: <5c3ac568-e5af-4fbc-bbc8-3f7de7915...@neotechnology.com>
> > Content-Type: text/plain; charset=us-ascii
> >
> > I actually think this discussion on supernodes is very valuable - we've
> > seen it crop up in a small, but significant, number of implementations
> > and we've seen credible and hairy workarounds.
> >
> > [pure speculation follows]
> >
> > I'm wondering whether it would be possible to intercept the write to the
> > "red" colour node (as in Rick's domain) and redirect that to an
> > underlying balanced tree (effectively an index of red things). That is,
> > the "red" node is actually a function which as a side-effect attaches
> > relationships to nodes in a balanced tree-of-red. All of which happens
> > without any explicit stopping and calling out to separate indexes.
> >
> > I suspect this is non-trivial given we optimise around a stable,
> > performant on-disk structure, but I'd love* to hear the kernel hackers'
> > views on this.
> >
> > Jim
> >
> > * Unless those views are that I'm a bozo, then I'll just reluctantly hear
> > them for the sake of completeness.
> >
> > ------------------------------
> >
> > Message: 4
> > Date: Mon, 2 May 2011 06:31:43 -0700
> > From: Rick Bullotta <rick.bullo...@thingworx.com>
> > Subject: Re: [Neo4j] Lucene/Neo Indexing Question
> > To: Neo4j user discussions <user@lists.neo4j.org>
> > Message-ID: <09df3402c845ec489a3323a06208f20d06376...@p3pw5ex1mb14.ex1.secureserver.net>
> > Content-Type: text/plain; charset="us-ascii"
> >
> > No doubt it could be done, Jim - but then the traversals get more complex
> > of course.  Ideally it would be largely transparent via the index
> > framework.  Alternatively, I wonder if there is work that could be done at
> > the kernel level to deal with optimizing frequent relationship
> > attachment/detachment on "hotspot" situations/supernodes.
> >
> >
> >
> >
> > ------------------------------
> >
> > Message: 5
> > Date: Mon, 2 May 2011 16:54:47 +0200
> > From: Niels Hoogeveen <pd_aficion...@hotmail.com>
> > Subject: Re: [Neo4j] Lucene/Neo Indexing Question
> > To: <user@lists.neo4j.org>
> > Message-ID: <col110-w47f98f7dbe872e1145b36a8b...@phx.gbl>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> >
> > Have you thought about using the in-graph Timeline index for this? Make
> > each color node the root of a Timeline and add the car nodes as entries
> > to that index. This may reduce your synchronization problems and is
> > something you can probably test without having to make too much of an
> > investment.
> > > From: rick.bullo...@thingworx.com
> > > To: user@lists.neo4j.org
> > > Date: Mon, 2 May 2011 04:09:59 -0700
> > > Subject: Re: [Neo4j] Lucene/Neo Indexing Question
> > >
> > > Hi, Michael.
> > >
> > > The nature of the domain model really doesn't lend itself to any
> > > logical partitioning of "supernodes", so it would indeed have to be
> > > something very arbitrary/random.
> > >
> > > For now, I think we will have to either deal with the performance
> > > issues or switch to using Lucene for the indexing, but we can't do that
> > > until we have the ability to query the list of terms for a given key
> > > (which is a necessary function in our domain model).  We could perhaps
> > > keep a list of "terms" as nodes *and* index them, but that seems
> > > redundant.
> > >
> > > Ultimately, I think the solution is to hide the complexity via the
> > > indexing framework and to offer a variety of in-graph indexing models
> > > that address specific types of domain requirements.
> > >
> > > Rick
> > >
> > > ________________________________________
> > > From: user-boun...@lists.neo4j.org [user-boun...@lists.neo4j.org] On Behalf Of Michael Hunger [michael.hun...@neotechnology.com]
> > > Sent: Monday, May 02, 2011 3:49 AM
> > > To: Neo4j user discussions
> > > Subject: Re: [Neo4j] Lucene/Neo Indexing Question
> > >
> > > Perhaps then it is sensible to introduce a second layer of nodes, so
> > > that you split down your "supernodes" and distribute the write
> > > contention?
> > >
> > > It would be interesting to see whether putting a round robin on that
> > > second level of color nodes would be enough to spread lock contention.
> > >
> > > This is what Peter talks about in his activity stream update scenario.
> > >
> > > And in general perhaps a step to a more performant in-graph index.
> > >
> > > When thinking about in-graph indexes I thought it might perhaps be
> > > interesting to re-use the HashMap approach: declare x (2^n) bucket
> > > nodes, then have relationships from the index-root node to the bucket
> > > nodes, with the (re-distributed) hashcode & (x-1) as relationship types,
> > > and below each bucket node have relationships to the concrete nodes,
> > > with the concrete value as a relationship attribute.
> > >
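
The HashMap-style bucket selection described above can be sketched in a few
lines of Python. Plain dictionaries stand in for the bucket nodes and
relationships, and the helper names are made up for illustration; the
bit-mixing step is only loosely modeled on what java.util.HashMap does and
is an assumption, not a description of any Neo4j index.

```python
BUCKETS = 16  # x = 2^n, so (x - 1) works as a bit mask


def spread(h):
    """Mix the high bits into the low bits before masking."""
    h &= 0xFFFFFFFF
    return h ^ (h >> 16)


def bucket_for(value_hash):
    """Pick a bucket via hashcode & (x - 1)."""
    return spread(value_hash) & (BUCKETS - 1)


def string_hash(s):
    """Deterministic String.hashCode()-style hash (Python's hash() is salted)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


# bucket number -> {value: [node ids]}, standing in for bucket nodes and rels
index = {b: {} for b in range(BUCKETS)}
for node_id, color in enumerate(["red", "blue", "red", "green"]):
    bucket = bucket_for(string_hash(color))
    index[bucket].setdefault(color, []).append(node_id)

red_nodes = index[bucket_for(string_hash("red"))]["red"]
```

A lookup hashes the value, jumps to one bucket, and then only has to
compare the values stored under that bucket.
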
> > > I think this will be addressed even better with Craig's indexes or the
> > Collection abstractions that Andreas Kollegger is working on.
> > >
> > > Cheers
> > >
> > > Michael
> > >
> > > On 02.05.2011 at 12:16, Rick Bullotta wrote:
> > >
> > > > Hi, Niels.
> > > >
> > > > That's what we're doing now, but it has performance issues with large
> > > > numbers of relationships when "cars" are constantly being added, since
> > > > the "color" nodes become synchronization bottlenecks for updates.
> > > >
> > > > Rick
> > > >
> > > > ________________________________________
> > > > From: user-boun...@lists.neo4j.org [user-boun...@lists.neo4j.org] On Behalf Of Niels Hoogeveen [pd_aficion...@hotmail.com]
> > > > Sent: Sunday, May 01, 2011 9:41 AM
> > > > To: user@lists.neo4j.org
> > > > Subject: Re: [Neo4j] Lucene/Neo Indexing Question
> > > >
> > > > One option would be to create a unique value node for each distinct
> > > > color and create a relationship from car to that value node. The value
> > > > nodes can be grouped together with relationships to some reference node.
> > > >
> > > > This gives the opportunity of finding all distinct colors, and it
> > > > allows you to find all cars with that particular color.
> > > >> Date: Sun, 1 May 2011 14:41:40 +0200
> > > >> From: matt...@neotechnology.com
> > > >> To: user@lists.neo4j.org
> > > >> Subject: Re: [Neo4j] Lucene/Neo Indexing Question
> > > >>
> > > >> 2011/4/26 Rick Bullotta <rick.bullo...@thingworx.com>:
> > > >>> Hi, Mattias.
> > > >>>
> > > >>> Here's a use case:
> > > >>>
> > > >>> I have a million nodes representing cars, and those nodes are all
> > > >>> "tagged" with some value, let's say a color name, as a property.  I
> > > >>> have indexed those nodes on the color property value.  Now I'd like to
> > > >>> present a list of the distinct color values with which nodes (cars)
> > > >>> have been tagged.  At present, I'd need to iterate through all
> > > >>> million, read the property, and maintain a "distinct" HashSet as I
> > > >>> iterate through them.
> > > >>>
> > > >>> I've tried using relationships from the "car" node(s) to a set of
> > > >>> "color" node(s), but had scalability/performance issues when there are
> > > >>> lots of car nodes being added/deleted (the "color" node quickly
> > > >>> becomes a hot spot/synchronization choke point).
> > > >>
> > > >> Alright, yeah, such nodes can become bottlenecks, so I see your
> > > >> problem for sure.
> > > >>>
> > > >>> Rick
> > > >>>
> > > >>>
> > > >>> -----Original Message-----
> > > >>> From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Mattias Persson
> > > >>> Sent: Tuesday, April 26, 2011 2:17 PM
> > > >>> To: Neo4j user discussions
> > > >>> Subject: Re: [Neo4j] Lucene/Neo Indexing Question
> > > >>>
> > > >>> Hi Rick,
> > > >>>
> > > >>> No, not really. What is the use case for having such a method?
> > > >>>
> > > >>> 2011/4/26 Rick Bullotta <rick.bullo...@thingworx.com>:
> > > >>>> Hi, all.
> > > >>>>
> > > >>>> Is there a method or suggested approach for obtaining a list of
> > > >>>> all of the distinct key values in a given index?  I don't care about
> > > >>>> the indexed nodes or relationships themselves, just the value(s) of
> > > >>>> the key.
> > > >>>>
> > > >>>> Thanks,
> > > >>>>
> > > >>>> Rick
> > > >>>>
> > > >>>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>> Mattias Persson, [matt...@neotechnology.com]
> > > >>> Hacker, Neo Technology
> > > >>> www.neotechnology.com
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> Mattias Persson, [matt...@neotechnology.com]
> > > >> Hacker, Neo Technology
> > > >> www.neotechnology.com
> > > >
> > >
> >
> >
> > ------------------------------
> >
> > Message: 6
> > Date: Mon, 2 May 2011 17:28:58 +0200
> > From: Craig Taverner <cr...@amanzi.com>
> > Subject: Re: [Neo4j] Lucene/Neo Indexing Question
> > To: Neo4j user discussions <user@lists.neo4j.org>
> > Message-ID: <BANLkTimm1O=8syuccaf+u59f8sz+jun...@mail.gmail.com>
> > Content-Type: text/plain; charset=ISO-8859-1
> >
> > Thinking back to your original domain description, cars with colors,
> > surely you have more properties than just colors to index?
> >
> > If you have two or more properties, then you use combinations of
> > properties for the first level of the index tree, which provides your
> > logical partitioning of supernodes in a domain-specific way. For example,
> > consider having the four properties color, manufacturer, model, year.
> > The first level of index nodes would be the set of unique combinations of
> > all possible properties (all existing combinations, actually). This set is
> > much larger than the set of colors, so red will occur many times. As a
> > result you dramatically reduce node contention, and the number of
> > relationships per node is much smaller. Then if you want to perform the
> > query for all red cars, your traverser needs to be only slightly more
> > complex, basically 'find all cars with color red and any value of the
> > other properties'.
> >
> > This is the design of the 'amanzi-index' I started on github in December
> > (but did not complete). It was focusing on doing queries on multiple
> > properties at the same time, but does effectively cover your case of
> > reducing node contention, if you can add more properties to the index. It
> > also has the concept of a mapper from the domain-specific property to the
> > index key, which was designed to reduce the number of index nodes, but in
> > your case you could also use it to increase the number of index nodes,
> > using some of the ideas by Jim and Michael. Jim suggested that instead of
> > 'red' always mapping to the same node, it could map to a set of different
> > nodes (randomly selected, or round robin). Michael discussed a distributed
> > hash-code, which I do not fully understand, but it does sound relevant :-)
> >
> > So, in short, using the design of the amanzi-index you could help this
> > problem in two ways:
> >
> >   - index together with other properties to get a domain-specific
> >   partitioning of the 'supernodes'
> >   - add a mapper between the color and the index key to get partitioning
> >   of the supernodes
> >
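
The second option above, a mapper that spreads one hot value over several
index keys, might look roughly like this in Python. It is not the
amanzi-index code; the class and method names are hypothetical, and a
dictionary stands in for the index nodes.

```python
from itertools import count


class RoundRobinMapper:
    """Hypothetical mapper: one logical value -> several index keys."""

    def __init__(self, partitions):
        self._partitions = partitions
        self._counter = count()

    def key_for_write(self, value):
        # Rotate through the partitions so writers hit different keys.
        return (value, next(self._counter) % self._partitions)

    def keys_for_read(self, value):
        # A reader has to visit every partition of the value.
        return [(value, p) for p in range(self._partitions)]


mapper = RoundRobinMapper(partitions=3)
index = {}
for car_id in range(7):
    index.setdefault(mapper.key_for_write("red"), []).append(car_id)

red_cars = sorted(
    car for key in mapper.keys_for_read("red") for car in index.get(key, [])
)
```

Writes are spread over three "red" keys instead of one, at the price of a
read that has to merge the partitions.
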
> >
> > On Mon, May 2, 2011 at 1:09 PM, Rick Bullotta
> > <rick.bullo...@thingworx.com>wrote:
> >
> > > Hi, Michael.
> > >
> > > The nature of the domain model really doesn't lend itself to any
> logical
> > > partioning of "supernodes", so it would indeed have to be something
> very
> > > arbitary/random.
> > >
> > > For now, I think we will have to either deal with the performance
> issues
> > or
> > > switch to using Lucene for the indexing, but we can't do that yet until
> > we
> > > have the ability to query the list of terms for a given key (which is a
> > > necessary function in our domain model).  We could perhaps keep a list
> of
> > > "terms" as nodes *and* index them, but that seems redundant.
> > >
> > > Ultimately, I think the solution is to hide the complexity via the
> > indexing
> > > framework and to offer a variety of in-graph indexing models that
> address
> > > specific types of domain requirements.
> > >
> > > Rick
> > >
> > > ________________________________________
> > > From: user-boun...@lists.neo4j.org [user-boun...@lists.neo4j.org] On
> > > Behalf Of Michael Hunger [michael.hun...@neotechnology.com]
> > > Sent: Monday, May 02, 2011 3:49 AM
> > > To: Neo4j user discussions
> > > Subject: Re: [Neo4j] Lucene/Neo Indexing Question
> > >
> > > Perhaps then it is sensible to introduce a second layer of nodes, so
> that
> > > you split down your "supernodes" and distribute the write contention?
> > >
> > > Would be interesting if putting a round robin on that second level of
> > color
> > > nodes would be enough to spread lock contention?
> > >
> > > This is what peter talks about in his activity stream update scenario.
> > >
> > > And in general perhaps a step to a more performant in-graph index.
> > >
> > > When thinking about in-graph indexes I thought it might perhaps be
> > > interesting to re-use the HashMap approach of declaring x (2^n)
> > bucket-nodes
> > > then having from the index-root node relationships with the
> > (re-distributed)
> > > hashcode & (x-1) relationship-types to the bucket nodes and below the
> > bucket
> > > node rels with the concrete value as an relationship attribute to the
> > > concrete nodes.
> > >
> > > I think this will be addressed even better with Craig's indexes or the
> > > Collection abstractions that Andreas Kollegger is working on.
> > >
> > > Cheers
> > >
> > > Michael
> > >
> > > Am 02.05.2011 um 12:16 schrieb Rick Bullotta:
> > >
> > > > Hi, Niels.
> > > >
> > > > That's what we're doing now, but it has performance issues with large
> > #'s
> > > of relationships when "cars" are constantly being added, since the
> > "color"
> > > nodes become synchronization bottlenecks for updates.
> > > >
> > > > Rick
> > > >
> > > > ________________________________________
> > > > From: user-boun...@lists.neo4j.org [user-boun...@lists.neo4j.org] On
> > > Behalf Of Niels Hoogeveen [pd_aficion...@hotmail.com]
> > > > Sent: Sunday, May 01, 2011 9:41 AM
> > > > To: user@lists.neo4j.org
> > > > Subject: Re: [Neo4j] Lucene/Neo Indexing Question
> > > >
> > > > One option would be to create a unique value node for each distinct
> > color
> > > and create a relationship from car to that value node. The value nodes
> > can
> > > be grouped together with relationships to some reference node.
> > > >
> > > > This gives the opportunity of finding all distinct colors, and it
> > allows
> > > you to find all cars with that particular color.
> > > >> Date: Sun, 1 May 2011 14:41:40 +0200
> > > >> From: matt...@neotechnology.com
> > > >> To: user@lists.neo4j.org
> > > >> Subject: Re: [Neo4j] Lucene/Neo Indexing Question
> > > >>
> > > >> 2011/4/26 Rick Bullotta <rick.bullo...@thingworx.com>:
> > > >>> Hi, Mattias.
> > > >>>
> > > >>> Here's a use case:
> > > >>>
> > > >>> I have a million nodes representing cars, and those nodes are all
> > > >>> "tagged" with some value, let's say a color name, as a property.  I
> > > >>> have indexed those nodes on the color property value.  Now I'd like
> > > >>> to present a list of the distinct color values with which nodes
> > > >>> (cars) have been tagged.  At present, I'd need to iterate through all
> > > >>> million, read the property, and maintain a "distinct" HashSet as I
> > > >>> iterate through them.
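
The brute-force approach Rick describes could be sketched like this. Nodes are modelled as plain property maps so the example runs without Neo4j; the class and method names are assumptions for illustration.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Sketch of the full-scan approach: collect distinct property values in a HashSet. */
final class DistinctValuesSketch {
    private DistinctValuesSketch() {}

    /** Visits every node once and keeps each distinct value found under the given key. */
    static Set<Object> distinctValues(Iterable<Map<String, Object>> nodes, String key) {
        Set<Object> seen = new HashSet<>();
        for (Map<String, Object> node : nodes) {
            Object value = node.get(key);
            if (value != null) {   // skip nodes that do not carry the property
                seen.add(value);
            }
        }
        return seen;
    }
}
```

The cost is one property read per node, so a million cars means a million reads even if only a handful of colors exist, which is why a way to list the index's key values directly would help.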
> > > >>>
> > > >>> I've tried using relationships from the "car" node(s) to a set of
> > > >>> "color" node(s), but had scalability/performance issues when there
> > > >>> are lots of car nodes being added/deleted (the "color" node quickly
> > > >>> becomes a hot spot/synchronization choke point).
> > > >>
> > > >> All right, yeah, such nodes can become bottlenecks, so I see your
> > > >> problem for sure.
> > > >>>
> > > >>> Rick
> > > >>>
> > > >>>
> > > >>> -----Original Message-----
> > > >>> From: user-boun...@lists.neo4j.org
> > > >>> [mailto:user-boun...@lists.neo4j.org] On Behalf Of Mattias Persson
> > > >>> Sent: Tuesday, April 26, 2011 2:17 PM
> > > >>> To: Neo4j user discussions
> > > >>> Subject: Re: [Neo4j] Lucene/Neo Indexing Question
> > > >>>
> > > >>> Hi Rick,
> > > >>>
> > > >>> No, not really. What's the use case for having such a method?
> > > >>>
> > > >>> 2011/4/26 Rick Bullotta <rick.bullo...@thingworx.com>:
> > > >>>> Hi, all.
> > > >>>>
> > > >>>> Is there a method or suggested approach for obtaining a list of all
> > > >>>> of the distinct key values in a given index?  I don't care about the
> > > >>>> indexed nodes or relationships themselves, just the value(s) of the
> > > >>>> key.
> > > >>>>
> > > >>>> Thanks,
> > > >>>>
> > > >>>> Rick
> > > >>>>
> > > >>>> _______________________________________________
> > > >>>> Neo4j mailing list
> > > >>>> User@lists.neo4j.org
> > > >>>> https://lists.neo4j.org/mailman/listinfo/user
> > > >>>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>> Mattias Persson, [matt...@neotechnology.com]
> > > >>> Hacker, Neo Technology
> > > >>> www.neotechnology.com
> > > >>> _______________________________________________
> > > >>> Neo4j mailing list
> > > >>> User@lists.neo4j.org
> > > >>> https://lists.neo4j.org/mailman/listinfo/user
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> Mattias Persson, [matt...@neotechnology.com]
> > > >> Hacker, Neo Technology
> > > >> www.neotechnology.com
> > > >> _______________________________________________
> > > >> Neo4j mailing list
> > > >> User@lists.neo4j.org
> > > >> https://lists.neo4j.org/mailman/listinfo/user
> > > >
> > > > _______________________________________________
> > > > Neo4j mailing list
> > > > User@lists.neo4j.org
> > > > https://lists.neo4j.org/mailman/listinfo/user
> > >
> > > _______________________________________________
> > > Neo4j mailing list
> > > User@lists.neo4j.org
> > > https://lists.neo4j.org/mailman/listinfo/user
> > >
> >
> >
> > ------------------------------
> >
> > _______________________________________________
> > User mailing list
> > User@lists.neo4j.org
> > https://lists.neo4j.org/mailman/listinfo/user
> >
> >
> > End of User Digest, Vol 50, Issue 4
> > ***********************************
> >
>
>
>
> --
>
> MIKAMAI | Making Media Social
> http://mikamai.com
> +447868260229
> _______________________________________________
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>



-- 
Jacob Hansson
Phone: +46 (0) 763503395
Twitter: @jakewins
_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
