Re: [Neo4j] Are graphs ok for lots of Event data

2011-12-07 Thread Craig Taverner
Of course the graph can be used for processing event data; whether that
works well in your case depends on the details. But we have used it for
this, and I can discuss a few points.

The event stream is obviously just a linear chain and can be modeled
as such in the graph (e.g. with NEXT relationships between event
nodes). However, this does not bring much advantage over the original
flat file, which already has an implicit next (the next line, assuming
time ordering). You could instead use a TimeLineIndex to manage the
order, and then you would have an advantage over disordered original
data. Durations between events can be new nodes with START and END
relationships to the individual events, with the time difference
optionally added as a property of the duration node.

One nice thing about the graph is that you can keep adding data and
structure as you go, sometimes much later. So your question about
adding server and number of items processed, etc, can be added later,
at your convenience.

When grouping events together and getting statistics, some things can
be added incrementally, like max/min/count/total. But percentile is
not so trivial. Consider the case where you want to know the
statistics for each hour of events. If you have an hour node connected
to all event nodes in that hour, you can update the
max/min/count/total values as new event data enters the database. But
percentile needs to be calculated once all events in the hour have
arrived. This can be handled at the application level.
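A minimal plain-Java sketch of the incremental part (class and field names are invented; in the graph these values would be properties on the hour node, updated as each event is attached):

```java
// Incrementally maintainable statistics for a group such as an hour node.
// max/min/count/total can be updated per event in O(1); percentile cannot,
// since it needs the full set of values for the period.
class HourStats {
    long count = 0, total = 0;
    long min = Long.MAX_VALUE, max = Long.MIN_VALUE;

    void add(long value) {  // called as each new event enters the database
        count++;
        total += value;
        if (value < min) min = value;
        if (value > max) max = value;
    }

    double mean() { return count == 0 ? 0.0 : (double) total / count; }
}
```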
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] path finding using OSM ways

2011-12-07 Thread Craig Taverner
We do indeed have twice the node count (and twice the relationship
count). This is a necessary side effect of the fact that an OSM node
can participate in more than one way (at intersections as well as
shared edges of polygons, etc.). In addition, with shared edges the
direction can be reversed from one way to the other, so we need a
completely separate set of nodes and relationships to model one way
versus the other. We have considered a compacted version of the graph
where we only use the extra nodes and relationships when they are
needed, but the code to decide when they are needed, or to convert the
subgraph to the expanded version when needed (i.e. when a new joined
way is loaded), would be much more complex, and therefore susceptible
to bugs. We chose a cleaner, simpler code base over a more complex but
more compact graph.

Now we also want to model historical changes. It appears that the use
of multiple nodes/relationships will also allow us to model this, so
it is a good thing (tm) :-)

For routing, I would create a set of relationships directly connecting
all nodes that are intersection points, ignoring all the nodes along
the way. We can add edge weights to these new relationships for the
distance traveled, or other appropriate weighting factors (type of
road, possible speed, hindrances, etc.). This graph would be ideal
for routing calculations. The main OSM graph is not ideal for routing,
but is designed to be a true and accurate reflection of the original
OSM data and topology stored in the open street map database. With
Neo4j we can do both :-)
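A toy sketch of building such a routing relationship, assuming plain Euclidean distance as the edge weight (all names are invented; real weights would use geographic distance, road type, speed, etc.):

```java
import java.util.List;

// Collapse the chain of OSM points between two intersections into one
// direct routing edge carrying the accumulated distance as its weight.
class RoutingEdge {
    final long fromIntersection, toIntersection;
    final double distance;  // edge weight for the routing subgraph

    RoutingEdge(long from, long to, double d) {
        fromIntersection = from;
        toIntersection = to;
        distance = d;
    }

    // points: ordered {x, y} coordinates along the way, including both
    // intersection endpoints
    static RoutingEdge compress(long from, long to, List<double[]> points) {
        double d = 0;
        for (int i = 1; i < points.size(); i++) {
            double dx = points.get(i)[0] - points.get(i - 1)[0];
            double dy = points.get(i)[1] - points.get(i - 1)[1];
            d += Math.sqrt(dx * dx + dy * dy);  // sum the segment lengths
        }
        return new RoutingEdge(from, to, d);
    }
}
```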

These routing relationships have not been added to the current OSM
model in neo4j-spatial, but would be relatively trivial to add (if we
ignore advanced concepts like turning restrictions). They could be
added by the OSMImporter code that identifies intersections, with only
a few lines of extra code (I think ;-)

On 12/6/11, danielb danielbercht...@gmail.com wrote:

 craig.taverner wrote

 ...
 - Create a way-point node for these
 ...


 Hi together,

 I wonder why we should add extra nodes to the graph (if I understand Craig
 correctly)? Wouldn't you then end up expanding twice the node count
 (way-point nodes and the OSM nodes themselves), because you have to query
 the OSM id (or any other identification value of the end node) on every
 expand, plus lat/lon if you don't have precomputed edge weights? I would
 just connect the OSM nodes directly with new edges to form a routing
 subgraph.

 Best Regards,
 Daniel




Re: [Neo4j] Feedback requested: Major wish list item for Neo4J

2011-12-07 Thread Craig Taverner
I definitely second this suggestion. We have recently been working on a
binary store for dense data that we would like to access as if it were
properties of nodes. Right now we have properties that are references to
files on disk, and then handle the binary data ourselves, but this does not
benefit from any transactional advantages. Rick's suggestion of a pluggable
store would suit us very well, because I presume Neo4j would specify the
interface/API used to implement such a store in a way that could be
handled atomically within transactions, and then we could satisfy that with
our own store.
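As a rough illustration of what such a pluggable interface might look like (entirely hypothetical, not an actual Neo4j API; the stream-based access is modeled on Rick's getPropertyAsStream suggestion):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical shape of a pluggable blob-property store: the database would
// define the interface, and a custom backend (filesystem, S3, ...) would
// implement it. Streams avoid holding large values in memory.
interface BlobStore {
    OutputStream openForWrite(long nodeId, String key);
    InputStream openForRead(long nodeId, String key);
}

// Toy in-memory backend for illustration only (no transactions, no durability).
class MemoryBlobStore implements BlobStore {
    private final Map<String, ByteArrayOutputStream> data = new HashMap<>();

    public ByteArrayOutputStream openForWrite(long nodeId, String key) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        data.put(nodeId + "/" + key, out);
        return out;
    }

    public ByteArrayInputStream openForRead(long nodeId, String key) {
        return new ByteArrayInputStream(data.get(nodeId + "/" + key).toByteArray());
    }
}
```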

On Wed, Dec 7, 2011 at 3:43 PM, Rick Bullotta
rick.bullo...@thingworx.comwrote:

 One area where I would love to see the Neo4J team focus some energy is in
 the efficient storage and retrieval of blob/large text properties.  Similar
 to the indexing strategy in Neo4J, it would be nice if this was pluggable
 (and it could depend on some other data store more optimized for blob/clob
 properties).

 The keys for this to be successful are:

 - Transacted
 - Does not store these properties in memory except when accessed (and
 then, perhaps offer a getPropertyAsStream method and a
 setPropertyFromStream method for optimal performance)
 - Transparent - should just work

 Nice to haves, but not at all required in the first iteration:

 - Pluggable (store in Neo4J native, filesystem, EC2 simple storage, etc.)

 Addition of these capabilities would move Neo4J into a dramatically
 expanded realm of potential applications, some of which are quite mind
 blowing, both in the social realm and in the enterprise realm.

 Feedback welcomed!


Re: [Neo4j] OSMImporter: Is there a way to do incremental imports?

2011-12-06 Thread Craig Taverner
There was only a method ending in 'WithCheck', or something like that,
lying unused in the code from last year. Nothing more than that. Except for
thinking about it, which is why I wrote the previous mail.
On Dec 2, 2011 12:50 PM, Peter Neubauer pe...@neubauer.se wrote:

 Not sure,
 Craig, do you have the code somewhere?

 /peter

 On Tue, Nov 22, 2011 at 4:17 PM, grimace macegh...@gmail.com wrote:
  thanks for the response(s)!  The hardware I'm testing on is not the best
  and has only 4G of RAM, so I'm limited, but this seems the best
  opportunity for me to learn this... that being said...
 
  For incremental imports, stitching osm files together, we re-activate the
  old code that tests the lucene index before adding nodes and relations.
  There might be some subtle edge cases to consider, but a set of tests with
  overlapping and non-overlapping osm files should flush them out.
 
  I'd love to play with this. Is the old code there for me to re-enable in
  testing? Or can you point me to where this might be put in?
 
  Thx,
  Greg
 


Re: [Neo4j] possibility to merge some neo4j databases?

2011-11-29 Thread Craig Taverner
There are two approaches I can think of:
- use a better index for mapping ids. Lucene is too slow. Memory hashtables
are memory bound. Peter has been investigating alternative dbs like bdb. I
tried, but did not finish, a hashmap of cached arrays, and Chris wrote his
big data import project on github, which is a hashmap of cached hashmaps.
Many promising solutions, but none yet complete. All target the general
case of id mapping.
- for this specific case, merging small databases, I had an idea a couple
of years ago which I still think will work: bulk appending entire
databases, by offsetting all internal ids by the current max id. I remember
the reason Johan did not like this idea was that it suffered from the same
flaws as the batch inserter: locking the entire db, no rollback, and risk
of entire db corruption. For people happy with the batch inserter, perhaps
this is still an option, but it is unlikely to get prioritized by the neo
team because of the corruption risks. It would, however, perform
spectacularly well, since the id map is a trivial function.

Personally I hope someone completes Chris's persistent hashmap or a similar
solution. Id maps are a recurring theme and would be very valuable.
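The bulk-append idea reduces the id map to simple arithmetic; a toy sketch (invented names, with relationships modeled as plain id pairs rather than real store records):

```java
// Sketch of bulk-appending database B onto database A: add A's current max
// id as an offset to every internal id in B, so the id map is a trivial
// function instead of a lookup structure like Lucene or a hashmap.
class BulkAppend {
    // relationships modeled as {startNodeId, endNodeId} pairs
    static long[][] appendRelationships(long[][] a, long[][] b, long offsetForB) {
        long[][] merged = new long[a.length + b.length][];
        System.arraycopy(a, 0, merged, 0, a.length);
        for (int i = 0; i < b.length; i++) {
            // every internal id from B is shifted by the same constant
            merged[a.length + i] =
                new long[] { b[i][0] + offsetForB, b[i][1] + offsetForB };
        }
        return merged;
    }
}
```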
On Nov 29, 2011 12:07 PM, osallou olivier.sal...@gmail.com wrote:

 Hi,
 I need to batch insert millions of data items in neo4j.
 It is quite difficult to keep everything in a Map to get node ids, so it
 needs frequent lookups in the index to get node ids for relationships, and
 throughput is quite low.

 Is there any way to build several neo4j databases (independently) and then
 to merge them? (I could build many small dbs in parallel)

 Thanks

 Olivier



Re: [Neo4j] Contributors section in the manual

2011-11-29 Thread Craig Taverner
What is the sort order? Date of first commit, number of lines, commits,
packages?
On Nov 21, 2011 2:35 PM, Peter Neubauer peter.neuba...@neotechnology.com
wrote:

 Everyone,
 have started to put some random people in, see
 http://docs.neo4j.org/chunked/snapshot/contributors.html .

 Any ideas what more info to provide here, or how to make this nicer?

 Cheers,

 /peter neubauer

 GTalk:  neubauer.peter
 Skype   peter.neubauer
 Phone   +46 704 106975
 LinkedIn   http://www.linkedin.com/in/neubauer
 Twitter  http://twitter.com/peterneubauer

 http://www.neo4j.org  - NOSQL for the Enterprise.
 http://startupbootcamp.org/- Öresund - Innovation happens HERE.



 On Sun, Nov 13, 2011 at 10:42 AM, Peter Neubauer
 peter.neuba...@neotechnology.com wrote:
  To start with,
  The manual is for the direct codebase that is part of the distribution.
 The
  next step is to include sections and pointers to other stable related
  projects and drivers.
 
  Does that sound reasonable?
 
  On Nov 13, 2011 1:36 AM, Nigel Small ni...@nigelsmall.name wrote:
 
  Are you looking for info on associated projects like py2neo or direct
  contributions to the main code base?
 
  On a side note, I've been getting quite a few hits to my blog post on
  pagination in Neo4j. The bits I wrote for that are all Python/py2neo
 again
  but that or something similar might be worth including somewhere on the
  Neo
  site as it appears to be a reasonably sought-after topic.
 
  Cheers
 
  *Nigel Small*
  Phone: +44 7814 638 246
  Blog: http://nigelsmall.name/
  GTalk: ni...@nigelsmall.name
  MSN: nasm...@live.co.uk
  Skype: technige
  Twitter: @technige https://twitter.com/#!/technige
  LinkedIn: http://uk.linkedin.com/in/nigelsmall
 
 
 
  On 12 November 2011 20:40, Peter Neubauer
  peter.neuba...@neotechnology.comwrote:
 
   Hi guys,
   I would love to add a section on contributors to the Neo4j Manual, in
   http://docs.neo4j.org/chunked/snapshot/community.html so that all of
   you that participate in the process can be found in there.
  
   Do you have any suggestions on how to present this, that is - what
   info, links and maybe a short presentation snippets and pictures?
   Graph to components or simply a table?
  
   Thoughts?
  
   Cheers,
  
   /peter neubauer
  


Re: [Neo4j] OSMImporter: Is there a way to do incremental imports?

2011-11-22 Thread Craig Taverner
I did some initial work on incremental imports back in 2010, but stopped
due to some complications:

   - We needed to mix lucene reads and writes during the import (read to
   check if the node already exists, so we don't import twice) and this
   performs very badly in the batch inserter. We decided to first code a
   non-batch insert mode before re-starting the incremental import work. Now
   Peter and I did code a non-batch importer in early 2011, but never went
   back to complete the incremental import.
   - We wanted to support both the case of importing multiple OSM files
   that could be stitched together by resolving overlaps, as well as the case
   of applying changesets to the existing OSM model. This increased the
   complexity of the work just enough to ensure it got dropped. In early 2011
   we also added support for changesets in the model (but only as a data
   structure, not in terms of importing changesets). So we are one step
   closer to this as well.

Since we now have non-batch importing, and changeset data structures, the
opportunity to re-start the incremental import and importing changesets is
there. It should not be too hard.

For incremental imports, stitching osm files together, we re-activate the
old code that tests the lucene index before adding nodes and relations.
There might be some subtle edge cases to consider, but a set of tests with
overlapping and non-overlapping osm files should flush them out.
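The "check the index before adding" step amounts to a get-or-create pattern; a stand-in sketch, with a HashMap playing the role of the Lucene osm-id index (all names invented):

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the incremental-import check: before creating a node for an
// OSM id, look it up in the index; only create (and index) it if absent.
// In the real importer the index would be Lucene and the node a graph node.
class IncrementalImport {
    private final Map<Long, Long> osmIdIndex = new HashMap<>(); // osm id -> node id
    private long nextNodeId = 0;

    long getOrCreateNode(long osmId) {
        Long existing = osmIdIndex.get(osmId);
        if (existing != null) return existing;  // already imported: reuse
        long nodeId = nextNodeId++;             // simulate node creation
        osmIdIndex.put(osmId, nodeId);
        return nodeId;
    }
}
```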

For applying changesets, more thinking is still required. Do we want to
support history in the model, or only the latest version? Should we verify
that only newer changesets are applied and in the right order, or rely on
the user to get it right?

I can say that we did some thinking this summer on the data structures
required to support a complete change history. This relies on the fact that
we already support multiple possible ways on the same nodes, so we can
also, in principle, support multiple possible 'versions' of ways on the
same nodes. More thinking is required, but I have a suspicion that we
should actually go ahead and do this properly with full history, because
that might be the only way to make sure the user never messes things up by
importing in the wrong order.

On Tue, Nov 22, 2011 at 9:58 AM, Peter Neubauer 
peter.neuba...@neotechnology.com wrote:

 Gregory,
 incremental loads (and thus, restarts of OSM imports) are a feature we
 want to add later on, but it's not in there yet. This would also mean
 we could stitch in other areas on demand, and support submitting
 changesets back to OSM or at least capture them, so you as an OSM
 based app can contribute to OSM automagically.

 I know it's much to ask, but help here would be greatly appreciated. I
 hope to lab with Michael Hunger on import of data into OSM (and
 others) this Friday and hope to get somewhere :)

 Cheers,

 /peter neubauer




 On Tue, Nov 22, 2011 at 7:15 AM, grimace macegh...@gmail.com wrote:
  I've been playing with OSMImporter; tried batch and native java.  I've
 had
  mixed success trying to import the planet, but since it's of considerable
  size, the job usually blows up or grinds to a halt about half way.  I
 think
  the most I've made it to is 651M nodes and that's not even the ways or
  relations.   I just don't know enough about it and thought I would ask
  before I try to dive in to it, but what would I have to do to so that I
  could restart the job ( where it left off ) when it blows?
 


Re: [Neo4j] osm_import.rb

2011-11-11 Thread Craig Taverner
Hi,

Sorry for a late contribution to this discussion. I will try to make a few
comments to cover the various mails above.

Firstly, the neo4j-spatial.rb GEM at version 0.0.8 on RubyGems works with
Neo4j-Spatial 0.6, which does include the non-batch inserter code, so in
principle should work for you. However, there is a need to change one line
of code in the Ruby to make it use the normal graph API instead of the
batch inserter. I will commit this change later, but for now you would
change line 118 of osm.rb (see
https://github.com/craigtaverner/neo4j-spatial.rb/blob/master/lib/neo4j/spatial/osm.rb#L118),
to instead look like:
  #@importer.import_file batch_inserter, @osm_path
  @importer.import_file normal_database, @osm_path, false, 5000
(basically you replace 'batch_inserter' with 'normal_database' and add the
two extra parameters 'false, 5000').

Looking at the errors you are getting, I see they are, as you suspected,
related to out-of-date instructions. I will try to get round to updating
the instructions soon, but in the meantime:

   - For using the Ruby Gem, you should use the osm_import command (added
   automatically to your path when you install the gem). So you can replace
   the command 'jruby -S examples/osm_import.rb' with just 'osm_import'.
   - When using the code directly from github, there is a jar missing in
   the lib/neo4j/spatial/jars directory. This is the
   neo4j-spatial-0.6-SNAPSHOT.jar, which can be downloaded and copied into
    that directory manually. The direct link to this file on the
    m2.neo4j.org site is:
    http://m2.neo4j.org/org/neo4j/neo4j-spatial/0.6-SNAPSHOT/neo4j-spatial-0.6-SNAPSHOT.jar

Your last comment about 'includePoints' concerns a setting for whether or
not to use all OSM points as individual geometries. The default is false
because you normally do not want to search for all the individual points
on a long road, but for the road itself. I recommend leaving this as
false, unless you have a specific need.

Regards, Craig

On Thu, Nov 10, 2011 at 2:51 PM, grimace macegh...@gmail.com wrote:

 I ended up trying again with just java (but still running with
 batchInserter), adjusting my memory settings and max heap, it's currently
 working on the americas.osm file from cloudmade -
 http://downloads.cloudmade.com/americas#downloads_breadcrumbs. The file is
 about 99 GB when assembled.

 I'm running on ubuntu 11.10 Core 2 Duo 2.Ghz with 4G ram (not very fast,
 but
 what I have available right now),

 Java Heap -- -Xmx=3072M
 config settings:
 neostore.nodestore.db.mapped_memory=1000M
 neostore.relationshipstore.db.mapped_memory=300M
 neostore.propertystore.db.mapped_memory=400M
 neostore.propertystore.db.strings.mapped_memory=800M
 neostore.propertystore.db.arrays.mapped_memory=100M

 My code is essentially from the test suite that you suggested but I am
 using
 the batchImporter instead.  I'm about 1/3 of the way through and don't want
 to interrupt the process, but when it's done I'll try it without the batch
 importer.  It runs at about 4500 nodes/second.  Is that reasonable? I
 haven't looked at performance numbers from anyone else. Would the non batch
 performance be better?

 Is it better to 'includePoints' or not?

 One question I had was: once I get this imported via this method (neo4j
 embedded), is it possible to move the imported db to a neo4j server? I'm
 hoping it is. If so, what would that process be?





Re: [Neo4j] Neo4j low-level data storage

2011-10-07 Thread Craig Taverner
I think Daniel's questions are very relevant, and not just to OSM. Any
large graph (of which OSM is simply a good example) will be affected by
fragmentation, and that can affect performance. I was recently hit by poor
performance of GIS queries (not OSM) related to fragmentation of the index
tree. I will describe that problem below, but first let me describe my view
on Daniel's question.

It is true that if parts of the graph that are geographically close are
also close on disk, the load time for bounding box queries will be faster.
However, this is not a problem that is easy to solve in a generic way,
because it requires knowledge of the domain. I can see two ways to create a
less fragmented graph:

   - Have a de-fragmenting algorithm that re-organizes an existing graph
   according to some rules. This does not exist in neo4j (yet?), but is
   probably easier to generalize, since it should be possible to first analyse
   the connectedness of the graph, and then defragment based on that. This
   means a reasonable solution might be possible without knowing the domain.
   - Be sure to load domain specific data in the order you wish to query it.
   In other words, create a graph that is already de-fragmented.

This second approach is the route I have started following (at least I've
taken one or two tiny baby-steps in that direction, but plan for more). In
the case of the OSM model produced by the OSMImporter in Neo4j-Spatial, we
do not do much here. We are importing the data in the order it was created
in the original postgres database (i.e. in the order it was originally
added to open street map). However, since the XML format puts ways after
all nodes, we actually also store all ways after all nodes, which means
that to load any particular way completely from the database requires
hitting disk at at least two very different locations: the location of the
way node and the interconnects between the nodes, and the location(s) of
the original location nodes. This multiple hit will occur on the nodes,
relationships and properties tables in a similar way.

So I can also answer a question Daniel asked about the ids. The Neo4j
nodes, relationships and properties have their own id spaces, so you can
have node 1, relationship 1 and property 1.

Let's consider a real example: a street made of 5 points, added early to
OSM (so low ids in both postgres and in neo4j). The OSM file will have
these nodes near the top, but the way that connects them together will be
near the bottom of the file. In Postgres the nodes and ways are in
different tables, and will both be near the top. In neo4j both osm-ways
and osm-nodes are neo4j-nodes (in the same 'table'). The osm-nodes will
have low ids, but the way will have a high id. Also, we use proxy nodes to
connect osm-ways to osm-nodes, and these will be created together with the
way. So we will have 5 nodes with low ids, and 8 nodes with high ids (5
proxy nodes, 1 way node, 1 geometry node and 1 tags node). If the way was
big and/or edited multiple times, we could get even higher fragmentation.

Personally I think that fragmenting one geometry into a few specific
locations is not a big problem for the neo4j caches. However, when we are
talking about a result-set or traversal of thousands or hundreds of
thousands of geometries, then doubling or tripling the number of disk hits
due to fragmentation can definitely have a big impact.

How can this fragmentation situation be improved? One idea is to load the
data with two passes. The current loader is trying to optimize OSM import
speed, which is difficult already (and slower than in rdbms due to increased
explicit structure), and so we have a single pass loader, with a lucene
index for reconnecting ways to nodes. However, I think we could change this
to a two pass loader, with the first pass reading and indexing the point
nodes into a unique id-index (for fast postgres id lookup), and the second
pass would connect the ways, and store both the nodes and ways to the
database at the same time, in continuous disk space. This would improve
query performance, and if we make a good unique-id index faster than lucene,
we will actually also improve import speed .. :-)

Now all of the above does not answer the original question regarding
bounding box queries. All we will have done with this is improve the load
time for complete OSM geometries (by reducing geometry fragmentation). But
what about the index itself. We are storing the index as part of the graph.
Today, Neo4j-spatial uses an RTree index that is created at the end of the
load in OSMImporter. This means we load the complete OSM file, and then we
index it. This is a good idea because it will store the entire RTree in
contiguous disk space. Sort of: there is one issue with the RTree node
splitting that will cause slight fragmentation, but I think it is not too
serious. Now when performing bounding box queries, the main work done by the
RTree will hit the minimum amount of disk space, until 

Re: [Neo4j] Neo4j in GIS Applications

2011-10-07 Thread Craig Taverner
Hi all,

I am certainly behind on my emails, but I did just answer a related
question about OSM and fragmentation, and I think that might have answered
some of Daniel's questions.

But I can say a little more about OSM and Neo4j here, specifically about the
issue of joins in postgres. Let me start by describing where I think
postgres might be faster than neo4j, and then move onto where neo4j is
faster than postgres.

Importing OSM data into postgres will be faster than neo4j because the
foreign keys are simple integer references between tables, indexed using
postgres's high-performance indexes. In Neo4j the relationships are much
more detailed explicit bi-directional references, taking more disk space
(but no index space). The disk write time is longer (more data written),
but the advantages of not having an index make it worthwhile.

So that leads naturally to where neo4j is faster. The reason there is no
index on the foreign key is because there is no need for one. Each
relationship contains the id of the node it points to (and points from), and
that id is directly mapped to the location on disk of the node itself. So
this is more like an array lookup, because all nodes are the same size on
disk. So the 'join' you perform when traversing from one osm-node to another
is extremely fast, but more importantly it is not affected by database size.
It is O(1) in performance! Fantastic! In an rdbms, the need for an index on
the foreign key means you are building a tree structure to get the join
down from O(N) to O(log(N)) or something better, but never as good as O(1).

In neo4j-spatial, if you perform a bounding box query, you are traversing an
RTree, which does not exist in postgres, but does exist in PostGIS. In both
Neo4j-Spatial and PostGIS you are working with a tree index that will slow
things down if there is a lot of data, and currently the postgis rtree is
better optimized than the neo4j-spatial rtree. But if you are performing
more graph-like processing, for example proximity searches, or routing
analysis, then you will get the full O(1) benefits of the graph database,
and no way can postgres match that :-)

OK. Lots of hype, but I get enthusiastic sometimes. Take anything I say
with a pinch of salt: believe the parts that make sense to you, and try
some tests otherwise. It would be great to hear your experiences with
modeling OSM in neo4j versus postgres.

Regards, Craig

On Tue, Oct 4, 2011 at 7:18 PM, Andreas Kollegger 
andreas.kolleg...@neotechnology.com wrote:

 Hi Daniel,

 If you haven't yet, you should check out the work done in the Neo4j Spatial
 project - https://github.com/neo4j/spatial - which has fairly
 comprehensive
 support for GIS.

 Data locality, as you mention, is exactly a big advantage of using a graph
 for geospatial data. Take a look at the Neo4j Spatial project and let us
 know what you think.

 Best,
 Andreas

 On Tue, Oct 4, 2011 at 9:58 AM, danielb danielbercht...@gmail.com wrote:

  Hello everyone,
 
  I am going to write my master thesis about the suitability of graph
  databases in GIS applications (at least I hope so^^). The database has to
  provide topological queries, network analysis and the ability to store
  large
  amount of mapdata for viewing - all based on OSM-data of Germany ( 100M
  nodes). Most likely I will compare Neo4j to PostGIS.
  As a starting point I want to know why you would recommend Neo4j to do
 the
  job? What are the main advantages of a graph database compared to a
  (object-)relational database in the GIS environment? The main focus and
 the
  goal of this work should be to show a performance improvement over
  relational databases.
  In a student project (OSM navigation system) we worked with relational
  (SQLite) and object-oriented (Perst) databases on netbook hardware and
  embedded systems. The relational database approach showed us two
 problems:
  If you transfer the OSM model directly into tables then you have a lot of
  joins which slows everything down (and lots of redundancy when using
  different tables for each zoom level). The other way is to store as much
 as
  possible in one big (sparse) table. But this would also have some
  performance issues I guess and from a design perspective it is not a nice
  solution. The object-oriented database also suffered from many random
 reads
  when loading a bounding box. In addition we could not say how data was
  stored in detail.
  The performance indeed increased after caching occured or by the use of
 SSD
  hardware. You can also store everything in RAM (money does the job), but
  for
  now you have to assume that all of the data has to be read from a slow
 disk
  the first time. Can Neo4j be configured to read for example a bounding
 box
  of OSM data from disk in an efficient way (data locality)?
  Maybe you also have some suggestions where I should have a look at in
 this
  work and what can be improved in Neo4j to get better results. I also
 would
  appreciate related papers.
 
  

Re: [Neo4j] Problem Installing Spatial (Beginner)

2011-10-07 Thread Craig Taverner
Sorry for such a late response, I missed this mail.

I must first point out that it seems you are trying to use Neo4j-Spatial in
the standalone server version of Neo4j. That is possible, but not well
supported. We have only exposed a few of the functions in the server, and do
not test it regularly.

The main way we are using neo4j-spatial at the moment is in the embedded
version of neo4j. This is where the maven instructions come in because they
assume you are writing a Java application that will embed the database. If
you are using a java application, and you can start using maven, then
everything should be easy to get working.

However, since I am relatively sure you are using neo4j-server, I think you
are getting into deep water. We need to improve our support for neo4j server
more before I can recommend you try it. The next release, 0.7, is focusing
on geoprocessing features, and then we hope to expose this in neo4j-server
in 0.8. Hopefully then things will be much easier for you.

On Tue, Sep 27, 2011 at 5:24 PM, handloomweaver a...@atomised.coop wrote:

 Hi

 I wonder if someone would be so kind to help. I'm new to Neo4j and was
 trying to install Neo4j Spatial to try its GIS features out. I need to be
 clear that I have no experience of Java & Maven so I'm struggling a bit.

 I want to install Neo4j & Spatial once somewhere on my 4GB MacBook Pro. I
 have no problem downloading the Neo4j Java Binary and starting it. But I'm
 confused about the Spatial library. Looking at the Github page it says
 either use Maven or copy a zip file into a folder in Neo4j. Is the zip file
 the Github repository contents or something else?

 I've tried the Maven way (mvn install) described on GitHub but I'm firstly
 confused about if/where Neo4j is being installed (does it install it first,
 where?) and anyway the install fails. It seems to be a JVM Heap memory
 problem? Why is it failing? How can I make it not fail? Is it a config file
 somewhere that needs tweaking?

 http://handloomweaver.s3.amazonaws.com/Terminal_Output.txt
 http://handloomweaver.s3.amazonaws.com/surefire-reports.zip

 I'm really keen to use Neo4J spatial but the barrier to entry for the less
 technical GIS developer is proving too high for me!

 I'd SO appreciate some help/pointers. I apologise that I am posting such a
 NOOB question on your forum but I've exhausted Google searches.

 Thanks





 --
 View this message in context:
 http://neo4j-community-discussions.438527.n3.nabble.com/Problem-Installing-Spatial-Beginner-tp3372924p3372924.html
 Sent from the Neo4j Community Discussions mailing list archive at
 Nabble.com.
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Spatial query with property filter

2011-08-29 Thread Craig Taverner
I can elaborate a little on what Peter says. The DynamicLayer support is
indeed the only way to do what you want right now, but I think it is
actually quite a good fit for your use case. When defining a dynamic layer
you are actually just defining a 'returnable evaluator', which will be
applied to the nodes during the RTree spatial search. This means that the
primary search is spatial, but for each leaf node (geometry) the dynamic
layer query is applied as a filter.

If you use CQL for the query, then all geometries are converted into JTS
geometry classes for the filter (which adds a little overhead, so if the
spatial query is not your limiting factor, this can affect performance). If
you use JSON for the query, it is applied directly to the graph as a pattern
match. So JSON should be faster, but does also require that you know the
structure of the graph, which the CQL approach does not.

Peter's pointer to the TestDynamicLayers class is the best place to start for
seeing how to use both CQL and JSON filter syntaxes.
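For illustration, a rough Java sketch of defining such a dynamic layer follows. This is a hedged sketch only: the entry points (addLayerConfig and the Constants geometry types) are recalled from the TestDynamicLayers test class and may differ in your version, so verify against that class before relying on them.

```java
// Sketch under assumption: API names follow the 0.x-era TestDynamicLayers test.
import org.neo4j.gis.spatial.Constants;
import org.neo4j.gis.spatial.Layer;
import org.neo4j.gis.spatial.SpatialDatabaseService;
import org.neo4j.gis.spatial.osm.OSMLayer;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public class DynamicLayerSketch {
    public static void main(String[] args) {
        GraphDatabaseService db = new EmbeddedGraphDatabase("target/osm-db");
        try {
            SpatialDatabaseService spatial = new SpatialDatabaseService(db);
            OSMLayer osmLayer = (OSMLayer) spatial.getLayer("map.osm");

            // CQL-style filter: leaf geometries are converted to JTS objects
            // before the predicate is evaluated (slight overhead).
            Layer cql = osmLayer.addLayerConfig("highways-cql",
                    Constants.GTYPE_LINESTRING, "highway is not null");

            // JSON-style filter: applied directly to the graph as a pattern
            // match, so faster, but requires knowing the graph structure.
            Layer json = osmLayer.addLayerConfig("highways-json",
                    Constants.GTYPE_LINESTRING,
                    "{\"properties\": {\"highway\": \"primary\"}}");
        } finally {
            db.shutdown();
        }
    }
}
```

In both cases the primary search remains the RTree; the dynamic layer only filters the leaf geometries that come back.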

On Mon, Aug 29, 2011 at 11:59 AM, Peter Neubauer 
peter.neuba...@neotechnology.com wrote:

 Hi there,
 well, spatial querying is not something that can be easily stuck into an
 iterator. If you want more than casual querying, I think you need to use
 the
 GeoTools APIs, we provide support for CQL as a query lang there, see

 https://github.com/neo4j/spatial/blob/master/src/test/java/org/neo4j/gis/spatial/TestDynamicLayers.java#L60 for
 some examples. Basically, you define a dynamic layer with a CQL query,
 which will return the subset of the full layer (e.g. a SimplePointLayer)
 that matches that query.

 Would that help?

 Cheers,

 /peter neubauer

 GTalk:  neubauer.peter
 Skype   peter.neubauer
 Phone   +46 704 106975
 LinkedIn   http://www.linkedin.com/in/neubauer
 Twitter  http://twitter.com/peterneubauer

 http://www.neo4j.org   - Your high performance graph database.
 http://startupbootcamp.org/- Öresund - Innovation happens HERE.
 http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.


 On Mon, Aug 29, 2011 at 1:37 AM, faffi obscurredbyclo...@gmail.com
 wrote:

  Hey guys,
 
  I'm seeing some kind of disconnect between the spatial and the regular
  graph
  traversing query. I can't find a way of executing a spatial query like in
  SimplePointLayer but also providing something like a ReturnEvaluator.
 
  My use case is essentially "for all nodes within a 10km radius, return all
  with name foo". Do I actually have to iterate through all the nodes
  returned by the query in a list and individually check them?
 
  Thanks,
  faffi
 
  --
  View this message in context:
 
 http://neo4j-community-discussions.438527.n3.nabble.com/Spatial-query-with-property-filter-tp3291410p3291410.html
  Sent from the Neo4j Community Discussions mailing list archive at
  Nabble.com.
  ___
  Neo4j mailing list
  User@lists.neo4j.org
  https://lists.neo4j.org/mailman/listinfo/user
 
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] neo4j spatial and postgis

2011-08-13 Thread Craig Taverner
Or if you want a command line import, try the ruby gem 'neo4j-spatial.rb'.
Once installed you can type:
osm_import file.shp
On Aug 13, 2011 10:33 AM, Andreas Wilhelm a...@kabelbw.de wrote:
 Hi,

 with the pgsql2shp tool you can dump your postgis db to a shapefile and
 you should be able to import it in Neo4j Spatial in the following way:

 String shpPath = SHP_DIR + File.separator + layerName;
 ShapefileImporter importer = new ShapefileImporter(graphDb(), new
 NullListener(), commitInterval);
 importer.importFile(shpPath, layerName);


 Best Regards

 Andreas



 Am 12.08.2011 11:10, schrieb chen zhao:
 Hi,

 I am very interested in neo4j spatial, but I do not know how to import the
 spatial data.

 My data are stored in postgis. I read the documents
 http://wiki.neo4j.org/content/Spatial_Data_Storage and
 http://wiki.neo4j.org/content/Importing_and_Exporting_Spatial_Data, but I
 still do not know how to import data from postgis or import shapefiles.

 Could you provide some detail information?

 Please advise.

 zhao
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user


 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Neo4j Spatial and gtype property

2011-07-29 Thread Craig Taverner
Yes. If you have performed a search and now have SpatialDatabaseRecord
results, then that is the best method to use.
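As a small illustration (a sketch, not tested against any particular release), iterating over search results with getGeometry() could look like this; the `results` list is assumed to come from a prior spatial search:

```java
// Assumption: 'results' is a List<SpatialDatabaseRecord> from a prior search.
import java.util.List;

import org.neo4j.gis.spatial.SpatialDatabaseRecord;

import com.vividsolutions.jts.geom.Geometry;

public class GeometryPrinter {
    public static void print(List<SpatialDatabaseRecord> results) {
        for (SpatialDatabaseRecord record : results) {
            // Decoded through the layer's GeometryEncoder, whatever the
            // internal storage format (WKB, WKT, point properties, ...).
            Geometry geometry = record.getGeometry();
            System.out.println(geometry); // JTS Geometry.toString() prints WKT
        }
    }
}
```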

On Thu, Jul 28, 2011 at 6:03 AM, Christopher Schmidt 
fakod...@googlemail.com wrote:

 So best is to use SpatialDatabaseRecord.getGeometry()?

 Christopher

 On Wed, Jul 27, 2011 at 10:50 PM, Craig Taverner cr...@amanzi.com wrote:

  Actually we do allow multiple geometry types in the same layer, but some
  actions, like export to shapefile, will fail. We even test for this in
  TestDynamicLayers.
 
  You can use the gtype if you want, but it is specific to some
  GeometryEncoders, and might change in future releases. It would be better
  to get the layer's geometry encoder and use that.
  On Jul 27, 2011 6:04 PM, Peter Neubauer 
  peter.neuba...@neotechnology.com
  wrote:
   Christopher,
   What do you mean by allowing to use? Yes, these properties are used to
   store the Geometry Type for a Layer and for geometry nodes. Sadly, you
   cannot have more than one Geometry in Layers due to the limitations of
   e.g. the GeoTools stack.
  
   Cheers,
  
   /peter neubauer
  
   GTalk:  neubauer.peter
   Skype   peter.neubauer
   Phone   +46 704 106975
   LinkedIn   http://www.linkedin.com/in/neubauer
   Twitter  http://twitter.com/peterneubauer
  
   http://www.neo4j.org   - Your high performance graph
  database.
   http://startupbootcamp.org/- Öresund - Innovation happens HERE.
   http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing
 party.
  
  
  
   On Wed, Jul 27, 2011 at 4:07 AM, Christopher Schmidt
   fakod...@googlemail.com wrote:
   Hi all,
  
   is it allowed to use the gtype-property to get the geometry type
  numbers?
  
   (Which are defined in org.neo4j.gis.spatial.Constants)
  
   --
   Christopher
   twitter: @fakod
   blog: http://blog.fakod.eu
   ___
   Neo4j mailing list
   User@lists.neo4j.org
   https://lists.neo4j.org/mailman/listinfo/user
  
   ___
   Neo4j mailing list
   User@lists.neo4j.org
   https://lists.neo4j.org/mailman/listinfo/user
  ___
  Neo4j mailing list
  User@lists.neo4j.org
  https://lists.neo4j.org/mailman/listinfo/user
 



 --
 Christopher
 twitter: @fakod
 blog: http://blog.fakod.eu
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Neo4j Spatial and gtype property

2011-07-27 Thread Craig Taverner
Actually we do allow multiple geometry types in the same layer, but some
actions, like export to shapefile, will fail. We even test for this in
TestDynamicLayers.

You can use the gtype if you want, but it is specific to some
GeometryEncoders, and might change in future releases. It would be better to
get the layer's geometry encoder and use that.
On Jul 27, 2011 6:04 PM, Peter Neubauer peter.neuba...@neotechnology.com
wrote:
 Christopher,
 What do you mean by allowing to use? Yes, these properties are used to
 store the Geometry Type for a Layer and for geometry nodes. Sadly, you
 cannot have more than one Geometry in Layers due to the limitations of
 e.g. the GeoTools stack.

 Cheers,

 /peter neubauer

 GTalk:  neubauer.peter
 Skype   peter.neubauer
 Phone   +46 704 106975
 LinkedIn   http://www.linkedin.com/in/neubauer
 Twitter  http://twitter.com/peterneubauer

 http://www.neo4j.org   - Your high performance graph database.
 http://startupbootcamp.org/- Öresund - Innovation happens HERE.
 http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.



 On Wed, Jul 27, 2011 at 4:07 AM, Christopher Schmidt
 fakod...@googlemail.com wrote:
 Hi all,

 is it allowed to use the gtype-property to get the geometry type numbers?

 (Which are defined in org.neo4j.gis.spatial.Constants)

 --
 Christopher
 twitter: @fakod
 blog: http://blog.fakod.eu
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user

 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] How often are Spatial snapshots published?

2011-07-22 Thread Craig Taverner
Interesting that if you look at the github 'blame' for that file (see
https://github.com/neo4j/neo4j-spatial/blame/master/src/main/java/org/neo4j/gis/spatial/SpatialTopologyUtils.java),
you find that all the findClosestEdges methods were added in October 2010.
So if Nolan has a version older than that, then something weird is going on.
He must have the very first version from September 2010, which is not
compatible with any recent Neo4j, GeoTools or uDig.

When I look at m2.neo4j.org I can see that the latest 0.6-SNAPSHOT is from
May. So we do have a problem, but not one that takes us back to last
September.

Nolan, perhaps your pom.xml refers to an older neo4j-spatial? You should use
0.6-SNAPSHOT. And we will change that again soon (to 0.7) since we are
making changes to the geoprocessing and indexing.

On Fri, Jul 22, 2011 at 10:04 AM, Anders Nawroth
and...@neotechnology.com wrote:

 Hi!

 The deployment seems to be broken at the moment, I'll look into that ASAP.

 /anders

 2011-07-22 09:28, Peter Neubauer skrev:
  Nolan,
  safest is to build it yourself from GitHub, I will check the
  deployment. Is that ok for now?
 
  /peter
 
  On Fri, Jul 22, 2011 at 3:57 AM, Nolan Darilekno...@thewordnerd.info
  wrote:
  I'm looking at the Spatial sources from Git, and am seeing lots of
  versions of SpatialTopologyUtils.findClosestEdges that don't appear to
  be in the snapshot I'm downloading. For instance,
 
   public static ArrayListPointResult  findClosestEdges(Point point,
   Layer layer) {
 
 
  doesn't appear to be in the snapshot build I have--that or my local
  cache is broken.
 
  Are these snapshots rebuilt regularly?
 
  Thanks.
  ___
  Neo4j mailing list
  User@lists.neo4j.org
  https://lists.neo4j.org/mailman/listinfo/user
 
  ___
  Neo4j mailing list
  User@lists.neo4j.org
  https://lists.neo4j.org/mailman/listinfo/user
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] How to create a graph database out of a huge dataset?

2011-07-19 Thread Craig Taverner
I'm not sure it's such a good idea to call tx.success() on every iteration
of the loop. I suggest calling it only at the commit, and once after the loop
(i.e. move it two lines down).

Also I think a commit size of 50k is a little large. You're probably not
going to see much improvement past 10k. In fact I generally only use 1k
myself (but I hear 10k is popular too :-)
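In outline, the batching pattern being suggested looks like this (a sketch against the 1.x embedded API; the per-line work is a placeholder, not the poster's actual logic):

```java
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Transaction;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public class BatchedImport {
    // Around 1k-10k per transaction, per the advice above.
    private static final int COMMIT_INTERVAL = 10000;

    public static void importLines(Iterable<String> lines) {
        GraphDatabaseService db = new EmbeddedGraphDatabase("target/batch-db");
        Transaction tx = db.beginTx();
        int counter = 0;
        try {
            for (String line : lines) {
                // Placeholder for the real per-line work (nodes, indexing, ...)
                db.createNode().setProperty("line", line);
                if (++counter % COMMIT_INTERVAL == 0) {
                    tx.success();      // mark this batch successful
                    tx.finish();       // commit it
                    tx = db.beginTx(); // start the next batch
                }
            }
            tx.success(); // mark the final, partial batch successful
        } finally {
            tx.finish(); // commits the last batch (or rolls back on error)
            db.shutdown();
        }
    }
}
```

Note that tx.success() is called only at each commit point and once after the loop, exactly as suggested.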

On Sun, Jul 17, 2011 at 8:53 PM, st3ven st3...@web.de wrote:

 Hi,

 thanks for your fast answer.
 Right now I'm using lucene for 6M authors, but my whole dataset consists of
 nearly 25M authors.
 Can I use Lucene there also? Because I think it is getting really slow to
 check if a user already exists.
 How can I change my heap memory settings and my memory-map settings, since
 I'm using the transactional mode?
 Because I think with 25M authors I will get an OutOfMemory exception.

 Here is my code that I have already written so far:

 import java.io.BufferedReader;
 import java.io.FileReader;
 import java.io.IOException;

 import org.neo4j.graphdb.GraphDatabaseService;
 import org.neo4j.graphdb.Node;
 import org.neo4j.graphdb.Relationship;
 import org.neo4j.graphdb.Transaction;
 import org.neo4j.graphdb.index.Index;
 import org.neo4j.graphdb.index.IndexHits;
 import org.neo4j.graphdb.index.IndexManager;
 import org.neo4j.kernel.EmbeddedGraphDatabase;

 public class WikiGraphRegUser {

     /**
      * @param args
      */
     public static void main(String[] args) throws IOException {
         BufferedReader bf = new BufferedReader(new FileReader("E:/wiki0.csv"));
         WikiGraphRegUser wgru = new WikiGraphRegUser();
         wgru.createGraphDatabase(bf);
     }

     private String articleName = "";
     private GraphDatabaseService db;
     private IndexManager index;
     private Index<Node> authorList;
     private int transactionCounter = 0;
     private Node article;
     private boolean isFirstAuthor = false;
     private Node author;
     private Relationship relationship;
     private int node;

     private void createGraphDatabase(BufferedReader bf) {
         db = new EmbeddedGraphDatabase("target/db");
         index = db.index();
         authorList = index.forNodes("Author");

         String zeile;
         Transaction tx = db.beginTx();

         try {
             // reads lines of CSV-file
             while ((zeile = bf.readLine()) != null) {
                 if (transactionCounter++ % 50000 == 0) {
                     tx.success();
                     tx.finish();
                     tx = db.beginTx();
                 }
                 // String[] looks like this: Article %;% Timestamp %;% Author
                 String[] artikelinfo = zeile.split("%;% ");
                 if (artikelinfo.length != 3) {
                     System.out.println("ERROR: check CSV");
                     for (int i = 0; i < artikelinfo.length; i++) {
                         System.out.println(artikelinfo[i]);
                     }
                     return;
                 }

                 if (articleName.equals("")) {
                     // create Article and connect with ReferenceNode
                     article = createArticle(artikelinfo[0],
                             db.getReferenceNode(), MyRelationshipTypes.ARTICLE);
                     articleName = artikelinfo[0];
                     isFirstAuthor = true;
                 } else if (!articleName.equals(artikelinfo[0])) {
                     // create Article and connect with ReferenceNode
                     article = createArticle(artikelinfo[0],
                             db.getReferenceNode(), MyRelationshipTypes.ARTICLE);
                     articleName = artikelinfo[0];
                     isFirstAuthor = true;
                 }
                 // checks if author already exists
                 IndexHits<Node> hits = authorList.get("Author", artikelinfo[2]);
                 // if new author
                 if (hits.size() == 0) {
                     if (isFirstAuthor) {
                         // creates author and connects him with an article
                         author = createAndConnectNode(artikelinfo[2], article,
                                 MyRelationshipTypes.WROTE, artikelinfo[1]);
                         isFirstAuthor = false;
                     } else {
                         author

Re: [Neo4j] Neo4j Spatial - Keep OSM imports - Use in GeoServer

2011-07-12 Thread Craig Taverner
I am travelling at the moment, so cannot give a long answer, but can suggest
you look at the wiki page for neo4j in uDig, because there we have made some
updates concerning which jars to use, and that will probably help you get
this working.
On Jul 12, 2011 10:59 AM, Robin Cura robin.c...@gmail.com wrote:
 Hi,

 First of all, thanks a lot to both of you for your answers, I have only
been
 able to try this yesterday, and it released me from lots of troubles.

 I succeeded editing the Neo4jTestCase.java file in Netbeans, as you told.
 I've got troubles to install latest JRuby release (needed for
neo4j-spatial)
 within my Ubuntu, so, I'll make this later, but it's really a good thing
to
 know considering the simplicity of use.

 Creating those databases made me realize another problem. In fact, I
 followed the tutorial about using neo4j db in Geoserver, and it appears
that
 my neo4j plugin for Geoserver doesn't work, as I always get this error
when
 trying to create a new store linking to my neo4j database.
 My database is a folder named "db1" (and "db2" for the other one), located
 in my ~/ folder.

 In Geoserver, I create a new store and make it link to
 file:/home/administrateur/db1/neostore.id
 But each time, I got this errror :

 Error connecting to Store.

 There was an error trying to connect to store neo4jstore. Do you want to
 save it anyway?

 Original exception error:

 Could not acquire data access 'neo4jstore'

 I tried with my 2 databases, and same problem.
 It seems those 2 db aren't the problem, as I've been able to
open/visualise
 those in Gephi (using neo4j import plugin).

 My guess is that my neo4j-spatial plugin for Geoserver isn't working
 properly.

 The main problem is that, since the tutorial was written, neo4j changed.

 In the tuto, we have to place some files in geoserver/WEB-INF/lib/ folder
:

 - json-simple-1.1.jar -- No problem, this file is still used
 - geronimo-jta_1.1_spec-1.1.1.jar -- Same, this is still the version
 used in neo4j
 - neo4j-kernel-1.2-1.2.M04.jar -- Replaced this one with my current
 neo4j kernel jar, neo4j-kernel-1.4.jar
 - neo4j-index-1.2-1.2.M04.jar
 - neo4j-spatial.jar-- Replaced this one with the latest build returned
 by using sudo mvn clean package : neo4j-spatial-0.6-SNAPSHOT.jar

 My problem is that there is no more neo4j-index file in latest neo4j
 releases. There are some neo4j-lucene-index files, but 1.4 doesn't seem to
 use neo4j-index anymore.
 When I only put neo4j-lucene-index.jar, Geoserver doesn't propose any
option
 to create a Store from Neo4j databases.

 So, what I did is I used the neo4j-index-1.3-1.3.M01.jar file from
previous
 release of Neo4j : Geoserver proposes to create a Store from a Neo4j db,
but
 I got the error message quoted above.

 Any idea how I could make this work? What is the file that replaces
 neo4j-index in Neo4j 1.4?

 I attach one of my databases, archived, so that one of you with a working
 neo4j plugin in Geoserver could test it and confirm the problem isn't with
 the DB.

 Thanks,

 Robin Cura

 2011/7/9 Craig Taverner cr...@amanzi.com

 Another option is to run the main method of OSMImport class, which
expects
 command line arguments for database location and OSM file, and will
simply
 import a file once. This is not tested often, so there is a risk things
 have
 changed, but it is worth a try.

 Another, even easier, option in my opinion is the JRuby gem,
 neo4j-spatial.rb. See http://rubygems.org/gems/neo4j-spatial

 To get this running, just install JRuby from http://jruby.org, and then
 install the gem with jruby -S gem install neo4j-spatial and then you
will
 have new console commands like 'import_layer'. If you run 'import_layer
 mydata.osm', it will import it to a new database, which you can use. See
 the
 github page for more information:
 https://github.com/craigtaverner/neo4j-spatial.rb

 On Thu, Jul 7, 2011 at 10:47 AM, Peter Neubauer 
 peter.neuba...@neotechnology.com wrote:

  Robin,
 
  the database is deleted after each run in Neo4jTestCase.java,
 
  @Override
  @After
  protected void tearDown() throws Exception {
      shutdownDatabase(true);
      super.tearDown();
  }
 
  if you change to shutdownDatabase(false), the database will not be
  deleted. In this case, make sure to run just that test in order not to
  write several tests to the same DB for clarity.
 
  mvn test -Dtest=TestDynamicLayers
 
  Does that work for you?
 
 
  Cheers,
 
  /peter neubauer
 
  GTalk: neubauer.peter
  Skype peter.neubauer
  Phone +46 704 106975
  LinkedIn http://www.linkedin.com/in/neubauer
  Twitter http://twitter.com/peterneubauer
 
  http://www.neo4j.org - Your high performance graph
 database.
  http://startupbootcamp.org/ - Öresund - Innovation happens HERE.
  http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
 
 
 
  On Tue, Jul 5, 2011 at 6:07 PM, Robin Cura robin.c...@gmail.com
wrote:
   Hello,
  
   First of all, I don't know anything in java, and I'm trying to figure
 out
  if
   neo4j could

Re: [Neo4j] Neo4j Spatial - Keep OSM imports

2011-07-08 Thread Craig Taverner
Another option is to run the main method of OSMImport class, which expects
command line arguments for database location and OSM file, and will simply
import a file once. This is not tested often, so there is a risk things have
changed, but it is worth a try.

Another, even easier, option in my opinion is the JRuby gem,
neo4j-spatial.rb. See http://rubygems.org/gems/neo4j-spatial

To get this running, just install JRuby from http://jruby.org, and then
install the gem with jruby -S gem install neo4j-spatial and then you will
have new console commands like 'import_layer'. If you run 'import_layer
mydata.osm', it will import it to a new database, which you can use. See the
github page for more information:
https://github.com/craigtaverner/neo4j-spatial.rb

On Thu, Jul 7, 2011 at 10:47 AM, Peter Neubauer 
peter.neuba...@neotechnology.com wrote:

 Robin,

 the database is deleted after each run in Neo4jTestCase.java,

 @Override
 @After
 protected void tearDown() throws Exception {
     shutdownDatabase(true);
     super.tearDown();
 }

 if you change to shutdownDatabase(false), the database will not be
 deleted. In this case, make sure to run just that test in order not to
 write several tests to the same DB for clarity.

 mvn test -Dtest=TestDynamicLayers

 Does that work for you?


 Cheers,

 /peter neubauer

 GTalk:  neubauer.peter
 Skype   peter.neubauer
 Phone   +46 704 106975
 LinkedIn   http://www.linkedin.com/in/neubauer
 Twitter  http://twitter.com/peterneubauer

 http://www.neo4j.org   - Your high performance graph database.
 http://startupbootcamp.org/- Öresund - Innovation happens HERE.
 http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.



 On Tue, Jul 5, 2011 at 6:07 PM, Robin Cura robin.c...@gmail.com wrote:
  Hello,
 
  First of all, I don't know anything in java, and I'm trying to figure out
 if
  neo4j could be useful for my projects. If it is, I will of course learn
 a
  bit of java so that I can use neo4j in a decent way for my needs.
 
  I'd like to use a neo4j spatial database together with GeoServer.
  For this, I'm following the tutorial here :
  http://wiki.neo4j.org/content/Neo4j_Spatial_in_GeoServer
  But this paragraph is blocking me :
  
 
- One option for the database location is a database created using the
unit tests in Neo4j Spatial. The rest of this wiki assumes that you ran
 the
TestDynamicLayers unit test which loads an OSM dataset for the city of
 Malmö
in Sweden, and then creates a number of Dynamic Layers (or views) on
 this
data, which we can publish in GeoServer.
- If you do use the unit test for the sample database, then the
 location
of the database will be in the target/var/neo4j-db directory of the
 Neo4j
Source code.
 
  
 
  My problem is I do not succeed in keeping those neo4j spatial databases
  created with the tests: when I run TestDynamicLayers, it builds databases (in
  target/var/neo4j-db), but as soon as the database is successfully loaded, it
  deletes it and starts importing another database, and so on.
 
  My poor understanding of java doesn't help a lot. I tried to edit the
  .java in Netbeans + Maven, but so far it doesn't work: all the directories
  created during the tests are deleted when the test ends.
 
  Any idea how I could keep those databases ?
 
  Thanks,
 
  Robin
  ___
  Neo4j mailing list
  User@lists.neo4j.org
  https://lists.neo4j.org/mailman/listinfo/user
 
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] neo4j spatial bounding box vs. lat/lon

2011-07-06 Thread Craig Taverner
Hi Boris,

I can see the new update method here:
https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/server/plugin/SpatialPlugin.java#L138

And the commit for it is here:
https://github.com/neo4j/neo4j-spatial/commit/22eaf91957a6265ef1e6923b5da572b75383b83e

Hope that helps.

Let me know if this works. The REST method is entirely untested, but does
wrap code that is tested, so I'm relatively optimistic :-)

Regards, Craig

On Wed, Jul 6, 2011 at 1:51 AM, Boris Kizelshteyn bo...@popcha.com wrote:

 Hi Craig,

 This is awesome!

 Where is the update method? I can't find the code on github.

 Thanks!

 On Sat, Jul 2, 2011 at 6:00 PM, Craig Taverner cr...@amanzi.com wrote:

  As I understand it, Andreas is working on the much more complex problem
 of
  updating OSM geometries. That is more complex because it involves
  restructuring the connected graph.
 
  The case Boris has is much simpler, just modifying the WKT or WKB in the
  editable layer. In the Java API this is simply to call the
  GeometryEncoder.encodeGeometry() method, which will modify the geometry
 in
  place (ie. replace the old geometry with a new one). However, I do not
  think
  it is that simple on the REST interface. I can check, but think we will
  need
  a new method for updating geometries. Internally it is trivial to code.
 
  So I just added a quick method, called updateGeometryFromWKT, which
  requires
  the geometry (in WKT), the existing geometry node-id, and the layer. Give
  it
  a try.
 
  On Sat, Jul 2, 2011 at 5:10 PM, Peter Neubauer neubauer.pe...@gmail.com
  wrote:
 
   Actually,
   Andreas Wilhelm is working right now on updating geometries.
  
   Sent from my phone.
   On Jul 2, 2011 5:00 PM, Boris Kizelshteyn bo...@popcha.com wrote:
Wow that's great! I'll try it out asap. This leads to my next
 question:
   how
do I update the geometry in a layer, rather than add new? What I am
   thinking
of doing is having a multipoint geometry associated with each of my
  user
nodes which will represent their location history. My plan is to add
  the
geometry to a world layer and then associate the returned node with
  the
user. How do I then add new points to that connected node? Can I just
   edit
the wkt and assume the index will update? Or do you have a better
   suggestion
for doing this? I would rather avoid having each point be a seperate
  node
   as
I am tracking gps data and getting lots of coordinates, it would be
  many
thousands of nodes per user.
   
Many thanks!
   
   
   
On Sat, Jul 2, 2011 at 6:48 AM, Craig Taverner cr...@amanzi.com
   wrote:
   
Hi Boris,
   
Ah! You are using the REST API. That changes a lot, since Neo4j
  Spatial
   is
only recently exposed in REST and we do not expose most of the
capabilities
I have discussed in this thread, or indeed in my other answer
 today.
   
I did recently add some REST methods that might work for you,
   specifically
the addEditableLayer, which makes a WKB layer, and the
addGeometryWKTToLayer, for adding any kind of Geometry (including
LineString) to the layer. However, these were only added recently,
  and
   I
have no experience using them myself, so consider this very much
   prototype
code. From your other question today, can I assume you are having
   trouble
making sense of the data coming back? So we need a better way to
  return
the
results in WKT instead of WKB? One option would be to enhance the
addEditableLayer method to allow the creation of WKT layers instead
  of
   WKB
layers, so the internal representation is more internet friendly.
   
I've just added untested support for setting the format to WKT for
  the
internal representation of the editable layer in the REST
 interface.
   This
is
untested (outside of my usual unit tests, that is), and is only in
  the
trunk
of neo4j-spatial, but you are welcome to try it out and see what
   happens.
   
Regards, Craig
   
On Fri, Jul 1, 2011 at 5:29 PM, Boris Kizelshteyn 
 bo...@popcha.com
wrote:
   
 Hi Craig,

 Thanks so much for this reply. It is very insightful. Is it
  possible
   for
me
 to implement the LineString geometries and lookups using REST?

 Many thanks!

 On Wed, Jun 8, 2011 at 4:58 PM, Craig Taverner cr...@amanzi.com
 
wrote:

  OK. I understand much better what you want now.
 
  Your person nodes are not geographic objects, they are persons
  that
can
 be
  at many positions and indeed move around. However, the 'path'
  that
they
  take
  is a geographic object and can be placed on the map and
 analysed
  geographically.
 
  So the question I have is how do you store the path the person
   takes?
Is
  this a bunch of position nodes connected back to that person?
 Or
perhaps
 a
  chain of position-(next

Re: [Neo4j] GSoC 2011 Neo4j Geoprocessing | Weekly Report #6

2011-07-02 Thread Craig Taverner
Hi Andreas,

Sounds like good progress over all. It is only a week to the mid-terms, so
it would be good to do a general code overview and see if this can be
integrated with trunk. Shall we plan for a review and test integration in
the middle of next week?

Regards, Craig

On Sat, Jul 2, 2011 at 10:25 AM, Andreas Wilhelm a...@kabelbw.de wrote:

 Hi,

 This week I had a little blocker with deleting some subgraph nodes and
 relations. For that I made a separate test to identify the problem and
 try to find a solution.

 Apart from that I integrated an additional spatial type function to get
 the distance between geometry nodes and
 updated the already existing spatial type functions to the new API.


 Best Regards

 Andreas Wilhelm
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] reify links with other neo4j databases located on different distributed servers

2011-07-02 Thread Craig Taverner
As far as I know there is no internal support for transparent traversals
across shards. Generally people are doing that in the application layer.
However, I think there might be a middle ground of sorts. If we modify the
relationship expander, I could imagine that relationships that span two
shards could be modified to return the node on the other shard. This would make
the traversal return nodes across shards, but since I've not tried this
myself, I am uncertain whether there are other consequences.

On Sat, Jul 2, 2011 at 4:03 AM, Aliabbas Petiwala aliabba...@gmail.com wrote:

 Hi,

 I cannot figure out how my application logic can reify links with
 other neo4j databases located on different distributed servers?
 hence , how can i make the traversals and graph algorithms transparent
 to the location of the different databases ?
 --
 Aliabbas Petiwala
 M.Tech CSE



Re: [Neo4j] wkb value in node created by addGeometryWKTToLayer

2011-07-02 Thread Craig Taverner
Hi Boris,

You do not need to read the property yourself from the node, rather use the
GeometryEncoder for this, it converts from the internal spatial storage to
the Geometry class, which you can work with. If you call geom.toString() you
will get a nice printable version (in WKT). Using the GeometryEncoder is a
particularly good idea because we support many internal storage formats, not
just the WKB you found. If you have point data only, you should consider
using the SimplePointLayer (created with
SpatialDatabaseService.createSimplePointLayer()), which will store the Point
as two properties, for latitude and longitude.

Back to your main question: WKB and WKT are two different formats for
representing spatial data. We support both with the WKBGeometryEncoder and
WKTGeometryEncoder classes, but in both cases we convert from that format to
JTS Geometry class for performing spatial operations on. Internally these
classes use the WKBReader/WKBWriter (and WKT versions of this) for
performing the conversions. If you want to convert between WKB and WKT
yourself, you should just use the JTS code directly.

But as I said before, I do not think you need to do this. If you are getting
your nodes from a search using the index, something like
search.getResults().get(0).getGeometry().toString() will return the WKT
version.

Regards, Craig

On Sat, Jul 2, 2011 at 1:04 AM, Boris Kizelshteyn bo...@popcha.com wrote:

 Craig or anyone who can answer this: what does the wkb value represent
 here.
 I know its the well known bytes, but how do I get back to wkt? I thought it
 was a byte array, but I can't seem to get my original values back. Form the
 values in the test case I have:

 POINT(15.2 60.1)


 wkb:

 [0,0,0,0,2,0,0,0,2,64,46,51,51,51,51,51,51,64,78,25,-103,-103,-103,-103,-102,64,46,-103,-103,-103,-103,-103,-102,64,78,12,-52,-52,-52,-52,-51]

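For intuition about the raw bytes in the question above: WKB starts with a byte-order flag, then a 4-byte geometry type, then the coordinates, so it can be decoded by hand with java.nio (illustrative only; in practice use the GeometryEncoder or JTS as described above). Decoded this way, the quoted array appears to be a two-point LINESTRING rather than a POINT, which would explain why the original point values did not reappear:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Minimal, hand-rolled decoder for simple WKB POINT and LINESTRING values.
// Not production code - a sketch of what the bytes mean.
public class WkbSketch {

    public static double[] decodeCoords(byte[] wkb) {
        ByteBuffer buf = ByteBuffer.wrap(wkb);
        // First byte selects endianness: 0 = big-endian (XDR), 1 = little-endian (NDR)
        buf.order(buf.get() == 0 ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN);
        int geomType = buf.getInt();                 // 1 = Point, 2 = LineString
        int numPoints = (geomType == 1) ? 1 : buf.getInt();
        double[] coords = new double[numPoints * 2]; // x/y pairs
        for (int i = 0; i < coords.length; i++) {
            coords[i] = buf.getDouble();
        }
        return coords;
    }
}
```

Running this on the quoted byte array yields type 2 (LineString) with the two coordinate pairs (15.1, 60.2) and (15.3, 60.1), which look like the layer bounding box from the test rather than the inserted point.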


Re: [Neo4j] neo4j spatial bounding box vs. lat/lon

2011-07-02 Thread Craig Taverner
Hi Boris,

Ah! You are using the REST API. That changes a lot, since Neo4j Spatial is
only recently exposed in REST and we do not expose most of the capabilities
I have discussed in this thread, or indeed in my other answer today.

I did recently add some REST methods that might work for you, specifically
the addEditableLayer, which makes a WKB layer, and the
addGeometryWKTToLayer, for adding any kind of Geometry (including
LineString) to the layer. However, these were only added recently, and I
have no experience using them myself, so consider this very much prototype
code. From your other question today, can I assume you are having trouble
making sense of the data coming back? So we need a better way to return the
results in WKT instead of WKB? One option would be to enhance the
addEditableLayer method to allow the creation of WKT layers instead of WKB
layers, so the internal representation is more internet friendly.

I've just added untested support for setting the format to WKT for the
internal representation of the editable layer in the REST interface. This is
untested (outside of my usual unit tests, that is), and is only in the trunk
of neo4j-spatial, but you are welcome to try it out and see what happens.

Regards, Craig

On Fri, Jul 1, 2011 at 5:29 PM, Boris Kizelshteyn bo...@popcha.com wrote:

 Hi Craig,

 Thanks so much for this reply. It is very insightful. Is it possible for me
 to implement the LineString geometries and lookups using REST?

 Many thanks!

 On Wed, Jun 8, 2011 at 4:58 PM, Craig Taverner cr...@amanzi.com wrote:

  OK. I understand much better what you want now.
 
  Your person nodes are not geographic objects, they are persons that can
 be
  at many positions and indeed move around. However, the 'path' that they
  take
  is a geographic object and can be placed on the map and analysed
  geographically.
 
  So the question I have is how do you store the path the person takes? Is
  this a bunch of position nodes connected back to that person? Or perhaps
 a
  chain of position-(next)-position-(next)-position, etc? However you
 have
  stored this in the graph, you can express this as a geographic object by
  implementing the GeometryEncoder interface. See, for example, the 6 lines
  of
  code it takes to traverse a chain of NEXT locations and produce a
  LineString
  geometry in the SimpleGraphEncoder at
 
 
  https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/encoders/SimpleGraphEncoder.java#L82
  If
  you do this, you can create a layer that uses your own geometry encoder
 (or
  the SimpleGraphEncoder I referenced above, if you use the same graph
  structure) and your own domain model will be expressed as LineString
  geometries and you can perform spatial operations on them.
 
  Alternatively, if your data is more static in nature, and you are
 analysing
  only what the person did in the past, and the graph will therefor not
  change, perhaps you do not care to store the locations in the graph, and
  you
  can just import them as a LineString directly into a standard layer.
 
  Whatever route you take, the final action you want to perform is to find
  points near the LineString (path the person took). I do not think the
  bounding box is the right approach for that either. You need to try, for
  example, the method findClosestEdges in the utilities class at
 
 
  https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/SpatialTopologyUtils.java#L115
  This
  method can find the part of the persons path that it closest to the point
  of
  interest. There also also many other geographic operations you might be
  interested in trying, once you have a better feel for the types of
 queries
  you want to ask.
 
  Regards, Craig
 
  On Wed, Jun 8, 2011 at 2:17 AM, Boris Kizelshteyn bo...@popcha.com
  wrote:
 
   Thanks for the detailed response! Here is what I'm trying to do and I'm
   still not sure how to accomplish it:
  
   1. I have a node which is a person
  
   2. I have geo data as that person moves around the world
  
   3. I use the geodata to create a bounding box of where that person has
  been
   today
  
   4. I want to say, was this person A near location X today?
  
   5. I do this by seeing if location X is in A's bounding box.
  
   From looking at what you suggest doing, it's not clear how I assign the
   node
   person A to a layer? Is it that the bounding box is now in the layer
 and
   not
   in the node? The issue then becomes, how od I associate the two as the
   RTree
   relationship seems to establish itself on the bounding box between the
  node
   and the layer.
  
   Many thanks for your patience as I learn this challenging material
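The NEXT-chain traversal Craig points to can be sketched without any Neo4j dependencies; here a plain list stands in for the chain of position nodes, and the output is the path as one WKT LINESTRING (illustrative stand-in, not the actual SimpleGraphEncoder code):

```java
import java.util.Arrays;
import java.util.List;

// Stand-alone sketch of what a GeometryEncoder does for a chain of positions:
// walk the ordered chain and emit the whole path as a WKT LINESTRING.
public class ChainToLineString {

    public static String toWkt(List<double[]> chain) {
        StringBuilder wkt = new StringBuilder("LINESTRING(");
        for (int i = 0; i < chain.size(); i++) {
            double[] p = chain.get(i);
            if (i > 0) wkt.append(", ");
            wkt.append(p[0]).append(' ').append(p[1]); // "x y" per coordinate
        }
        return wkt.append(')').toString();
    }

    public static void main(String[] args) {
        List<double[]> chain = Arrays.asList(
                new double[]{15.2, 60.1}, new double[]{15.3, 60.2});
        System.out.println(toWkt(chain));
    }
}
```

In the real encoder the list would instead be produced by following NEXT relationships from node to node.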

Re: [Neo4j] neo4j spatial bounding box vs. lat/lon

2011-07-02 Thread Craig Taverner
As I understand it, Andreas is working on the much more complex problem of
updating OSM geometries. That is more complex because it involves
restructuring the connected graph.

The case Boris has is much simpler, just modifying the WKT or WKB in the
editable layer. In the Java API this is simply to call the
GeometryEncoder.encodeGeometry() method, which will modify the geometry in
place (ie. replace the old geometry with a new one). However, I do not think
it is that simple on the REST interface. I can check, but I think we will need
a new method for updating geometries. Internally it is trivial to code.

So I just added a quick method, called updateGeometryFromWKT, which requires
the geometry (in WKT), the existing geometry node-id, and the layer. Give it
a try.

On Sat, Jul 2, 2011 at 5:10 PM, Peter Neubauer neubauer.pe...@gmail.com wrote:

 Actually,
 Andreas Wilhelm is working right now on updating geometries.

 Sent from my phone.
 On Jul 2, 2011 5:00 PM, Boris Kizelshteyn bo...@popcha.com wrote:
  Wow that's great! I'll try it out asap. This leads to my next question:
 how
  do I update the geometry in a layer, rather than add new? What I am
 thinking
  of doing is having a multipoint geometry associated with each of my user
  nodes which will represent their location history. My plan is to add the
  geometry to a world layer and then associate the returned node with the
  user. How do I then add new points to that connected node? Can I just
 edit
  the wkt and assume the index will update? Or do you have a better
 suggestion
  for doing this? I would rather avoid having each point be a seperate node
 as
  I am tracking gps data and getting lots of coordinates, it would be many
  thousands of nodes per user.
 
  Many thanks!
 
 
 
  On Sat, Jul 2, 2011 at 6:48 AM, Craig Taverner cr...@amanzi.com
 wrote:
 
  Hi Boris,
 
  Ah! You are using the REST API. That changes a lot, since Neo4j Spatial
 is
  only recently exposed in REST and we do not expose most of the
  capabilities
  I have discussed in this thread, or indeed in my other answer today.
 
  I did recently add some REST methods that might work for you,
 specifically
  the addEditableLayer, which makes a WKB layer, and the
  addGeometryWKTToLayer, for adding any kind of Geometry (including
  LineString) to the layer. However, these were only added recently, and
 I
  have no experience using them myself, so consider this very much
 prototype
  code. From your other question today, can I assume you are having
 trouble
  making sense of the data coming back? So we need a better way to return
  the
  results in WKT instead of WKB? One option would be to enhance the
  addEditableLayer method to allow the creation of WKT layers instead of
 WKB
  layers, so the internal representation is more internet friendly.
 
  I've just added untested support for setting the format to WKT for the
  internal representation of the editable layer in the REST interface.
 This
  is
  untested (outside of my usual unit tests, that is), and is only in the
  trunk
  of neo4j-spatial, but you are welcome to try it out and see what
 happens.
 
  Regards, Craig
 
  On Fri, Jul 1, 2011 at 5:29 PM, Boris Kizelshteyn bo...@popcha.com
  wrote:
 
   Hi Craig,
  
   Thanks so much for this reply. It is very insightful. Is it possible
 for
  me
   to implement the LineString geometries and lookups using REST?
  
   Many thanks!
  
   On Wed, Jun 8, 2011 at 4:58 PM, Craig Taverner cr...@amanzi.com
  wrote:
  
OK. I understand much better what you want now.
   
Your person nodes are not geographic objects, they are persons that
  can
   be
at many positions and indeed move around. However, the 'path' that
  they
take
is a geographic object and can be placed on the map and analysed
geographically.
   
So the question I have is how do you store the path the person
 takes?
  Is
this a bunch of position nodes connected back to that person? Or
  perhaps
   a
chain of position-(next)-position-(next)-position, etc? However
 you
   have
stored this in the graph, you can express this as a geographic
 object
  by
implementing the GeometryEncoder interface. See, for example, the 6
  lines
of
code it takes to traverse a chain of NEXT locations and produce a
LineString
geometry in the SimpleGraphEncoder at
   
   
  
 

  https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/encoders/SimpleGraphEncoder.java#L82
If
you do this, you can create a layer that uses your own geometry
  encoder
   (or
the SimpleGraphEncoder I referenced above, if you use the same
 graph
structure) and your own domain model will be expressed as
 LineString
geometries and you can perform spatial operations on them.
   
Alternatively, if your data is more static

[Neo4j] Cypher error in neo4j-spatial

2011-07-02 Thread Craig Taverner
Hi,

Recent builds of Neo4j-Spatial no longer like Peter's new bounding box query.
Peter is on vacation, and I am not familiar with the code (nor cypher), so I
thought I would just dump the error message here for now in case someone can
give me a quick pointer.

The line of code is:
Query query = parser.parse( "start n=(layer1,'bbox:[15.0, 16.0, 56.0,
57.0]') match (n) -[r] - (x) return n.bbox, r:TYPE, x.layer?, x.bbox?" );

The error is:
org.neo4j.cypher.SyntaxError: string matching regex `\z' expected but `:'
found
at org.neo4j.cypher.parser.CypherParser.parse(CypherParser.scala:75)
at org.neo4j.cypher.javacompat.CypherParser.parse(CypherParser.java:39)
at
org.neo4j.gis.spatial.IndexProviderTest.testNodeIndex(IndexProviderTest.java:91)

Regards, Craig


Re: [Neo4j] traversing densely populated nodes

2011-06-30 Thread Craig Taverner
This topic has come up before, and the domain level solutions are usually
very similar, like Norbert's category/proxy nodes (to group by
type/direction) and Niels' TimeLineIndex (BTree). I wonder whether we can
build a generic user-level solution that can also be wrapped to appear as an
internal database solution?

For example, consider Niels's solution of the TimeLine index. In this case
we group all the nodes based on a consistent hash. Usually the timeline
would use a timestamp, but really any reasonably variable property can do,
even the node-id itself. Then we have a BTree between the dense nodes and
the root node (node with too many relationships). How about this crazy idea,
create an API that mimics the normal node.getRelationship*() API, but
internally traverses the entire tree? And also for creating the
relationships? So for most code we just do the usual
node.createRelationshipTo(node,type,direction) and node.traverse(...), but
internally we actually traverse the b-tree.

This would solve the performance bottleneck being observed while keeping the
'illusion' of directly connected relationships. The solution would be
implemented mostly in the application space, so will not need any changes to
the core database. I see this as being of the same kind of solution as the
auto-indexing. We setup some initial configuration that results in certain
structures being created on demand. With auto-indexing we are talking about
mostly automatically adding lucene indexes. With this idea we are talking
about automatically replacing direct relationships with b-trees to resolve a
specific performance issue.

And when the relationship density is very low, if the b-tree is
auto-balancing, it could just be a direct relationship anyway.
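The grouping idea can be illustrated without Neo4j at all; in this sketch plain Java maps stand in for bucket nodes, target node ids are spread by a hash, and the flat "all relationships" view is reassembled by walking root to bucket to target (a hypothetical structure, not the proposed API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of hiding a tree behind a flat relationship API:
// instead of millions of relationships hanging directly off one dense node,
// targets are spread over buckets chosen by a hash, so each lookup only
// touches root -> bucket -> target.
public class BucketedRelationships {
    private final int fanout;
    private final Map<Integer, List<Long>> buckets = new HashMap<>();

    public BucketedRelationships(int fanout) {
        this.fanout = fanout;
    }

    // Mimics node.createRelationshipTo(target, ...): route through a bucket
    public void connect(long targetId) {
        int bucket = Math.floorMod(Long.hashCode(targetId), fanout);
        buckets.computeIfAbsent(bucket, k -> new ArrayList<>()).add(targetId);
    }

    // Mimics node.getRelationships(): the 'illusion' of direct connections
    public List<Long> allTargets() {
        List<Long> all = new ArrayList<>();
        for (List<Long> b : buckets.values()) {
            all.addAll(b);
        }
        return all;
    }

    public int bucketCount() {
        return buckets.size();
    }
}
```

A real implementation would replace the map with bucket nodes and b-tree levels in the graph, but the caller-facing shape stays the same.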

On Wed, Jun 29, 2011 at 6:56 PM, Agelos Pikoulas
agelos.pikou...@gmail.com wrote:

 My problem pattern is exactly the same as Niels's :

 A dense-node has millions of relations of a certain direction and type,
 and only a few (sparse) relations of a different direction and type.
 The traversing is usually following only those sparse relationships on
 those
 dense-nodes.

 Now, even when traversing only these sparse relations, neo4j becomes
 extremely slow,
 with clearly non-linear growth (the big CS O).

 Some tests I ran (email me if you want the code) reveal that even the number
 of those dense-nodes in the database greatly influences the results.

 I just reported to Michael the runs with the latest M05 snapshot, which are
 not very positive...
 I have suggested an (auto) indexing of relationship types / direction that
 is used by traversing frameworks,
 but I ain't no graphdb-engine expert :-(

 A'


 Message: 5
  Date: Wed, 29 Jun 2011 18:19:10 +0200
  From: Niels Hoogeveen pd_aficion...@hotmail.com
  Subject: Re: [Neo4j] traversing densely populated nodes
  To: user@lists.neo4j.org
  Message-ID: col110-w326b152552b8f7fbe1312d8b...@phx.gbl
  Content-Type: text/plain; charset=iso-8859-1
 
 
  Michael,
 
 
 
  The issue I am refering to does not pertain to traversing many relations
 at
  once
 
  but the impact many relationship of one type have on relationships
 
  of another type on the same node.
 
 
 
  Example:
 
 
 
  A topic class has 2 million outgoing relationships of type HAS_INSTANCE
  and
 
  has 3 outgoing relationships of type SUB_CLASS_OF.
 
 
 
  Fetching the 3 relations of type SUB_CLASS_OF takes very long,
 
  I presume due to the presence of the 2 million other relationships.
 
 
 
  I have no need to ever fetch the HAS_INSTANCE relationships from
 
  the topic node. That relation is always traversed from the other
 direction.
 
 
 
  I do want to know the class of a topic instance, leading to the topic
 class,
 
  but have no real interest ever to traverse all topic instances from the
  topic
 
  class (at least not directly.. i do want to know the most recent
 addition,
 
  and that's what I use the timeline index for).
 
 
 
  Niels
 
 
   From: michael.hun...@neotechnology.com
   Date: Wed, 29 Jun 2011 17:50:08 +0200
   To: user@lists.neo4j.org
   Subject: Re: [Neo4j] traversing densely populated nodes
  
   I think this is the same problem that Angelos is facing, we are
 currently
  evaluating options to improve the performance on those highly connected
  supernodes.
  
   A traditional option is really to split them into groups or even kind of
  shard their relationships to a second layer.
  
   We're looking into storage improvement options as well as modifications
  to retrieval of that many relationships at once.
  
   Cheers
  
   Michael
 



Re: [Neo4j] Database engine using Neo4j

2011-06-30 Thread Craig Taverner
Hi Kriti,

I can comment on a few things, especially neo4j-spatial:

   - Neo4j is certainly good for social networks, and people have used it
   for that, but I personally do not have experience with that so I will not
   comment further (others can chip in where necessary).
   - Neo4j-Spatial is good for performing some spatial queries on your
   domain data. So you start by modeling your domain however you want, and then
   when you want to start using neo4j-spatial, just add all nodes that have
   spatial components (eg. location) to the spatial index and they will be
   available for querying. The SimplePointLayer class has support for querying
   by proximity, which sounds like what you want. You can also query with a
   filter on properties (so only nearby objects matching some other criteria).
   - I do my neo4j-spatial development in eclipse, so there should be no
   issues for you using eclipse. Just use m2eclipse, and add the dependency to
   your pom.xml. The current version of neo4j-spatial requires Neo4j 1.4, so if
   you are using an older Neo4j, you might need to make minor changes.
   - Neo4j is not optimized for storing BLOBs, so while it can store images
   as byte[], it is advisable to rather store a reference to the image (eg.
   URI), and store the image in another way (filesystem, other database, etc.)
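As a rough illustration of the proximity queries mentioned above (plain geometry math, not the SimplePointLayer API): the haversine formula gives the great-circle distance between two lat/lon points, which is the kind of check behind a "nearby places" query:

```java
// Illustrative proximity math (a hypothetical helper, not part of
// neo4j-spatial): great-circle distance via the haversine formula.
public class Proximity {
    private static final double EARTH_RADIUS_KM = 6371.0;

    public static double distanceKm(double lat1, double lon1,
                                    double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        // Haversine term: sin^2(dLat/2) + cos(lat1) * cos(lat2) * sin^2(dLon/2)
        double a = Math.pow(Math.sin(dLat / 2), 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }
}
```

A spatial index like the one in neo4j-spatial avoids computing this for every point by first pruning candidates with bounding boxes.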

Regards, Craig

On Wed, Jun 29, 2011 at 2:06 PM, kriti sharma kriti.0...@gmail.com wrote:

 Dear Users,

 I am developing a time capsule DB engine using Neo4j as a database.
 I intend to develop three scales (temporal, geo/spatial and
 egocentric/personal relationships) in the db structure.
 For the geolocation part, I would like to be able to query upon a location
 keyword and also some nearby places/photos/people that I have in my DB.

 Do you think neo4j spatial will be a good choice for such a spatial scheme?
 I have developed a timeline in the usual neo4j using the timeline feature. Can
 I
 simply integrate neo4j spatial in my existing code for neo4j in eclipse?

 I am retrieving data from twitter, flickr, facebook etc., so the format of
 data may not be uniform. Therefore I found Neo4j to be an excellent option.
 Has some work been done in modelling a user's Facebook data(friends and
 networks) relationships in Neo4j?

 How should I go about storing images in the DB?

 Thanks
 Kriti



Re: [Neo4j] traversing densely populated nodes

2011-06-30 Thread Craig Taverner
In the amanzi-index I link all indexed nodes into the index tree, so
traversals are straight up the tree. Of course this also means that there
are at least as many relationships as indexed nodes.

I was reviewing Michael's code for the relationship expander, and think that
is a great idea, transparently using an index instead of the normal
relationships API, and can imagine using the relationship expander to
instead traverse the BTree to the final relationship to the leaf nodes.

So if we imagine a BTree with perhaps 10 or 20 hops from the root to the
leaf node, the relationship expander Michael described would complete all
hops and return only the last relationship, giving the illusion of direct
connections from root to leaf. This would certainly perform well, especially
for cases where there are factors limiting the number of relationships we
want returned. I think the request for type and direction is the first
obvious case, but we could be even more explicit than that, if we pass
constraints based on the BTree's consistent hash.
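How many hops the expander would actually hide depends only on the tree's fanout and the number of leaves; a small illustrative helper (hypothetical, not part of any Neo4j code) makes the arithmetic concrete:

```java
// Back-of-envelope estimate: hops from root to leaf in a b-tree with the
// given fanout. Each extra level multiplies the leaf capacity by the fanout.
public class BTreeDepth {

    public static int depth(long leaves, int fanout) {
        int hops = 0;
        long capacity = 1;
        while (capacity < leaves) {
            capacity *= fanout;
            hops++;
        }
        return hops;
    }
}
```

With a fanout of 100, two million leaf relationships fit in 4 levels; even a modest fanout of 10 needs only 7, so the transparent expander stays cheap.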

On Thu, Jun 30, 2011 at 11:36 PM, Niels Hoogeveen pd_aficion...@hotmail.com
 wrote:


 In theory the approach I described earlier could work, though there are
 some pitfalls to the current implementation that need ironing out before
 this can become a recommended approach.
 The choice of Timeline instead of Btree may actually be the wrong choice
 after all. I chose Timeline because of my familiarity with this particular
 class, but its implementation may actually not be all that suitable for this
 particular use case. This has to do with the fact that Timeline is not just
  a tree, but a list where entries with an interval of max. 1000 are stored
 in a Btree index. This works reasonably well for a Timeline, but makes the
 approach less ideal for storing dense relationships.
 The problem with the Timeline implementation is the ability to look up the
 tree root from a particular leaf. In an ordinary Btree this would simply be a
 traversal from the leaf through the layers of block nodes to the tree root.
 In Timeline the traversal will be different. It first has to move through
 the Timeline list until it finds an entry that is stored in the Btree (which
 worst case takes 1000 hops), and then it has to traverse the Btree up to the
 tree root. To avoid this complicated traversal I ended up doing a lookup
 through Lucene of the timeline URI (which is stored in all timeline list
 entries). In fact I might as well have added the URI of the dense node as a
 property and do the lookup through Lucene without the Timeline, it just
 happens that I like the sort order of Timeline, making it a useful approach
 anyway.
 I will experiment using Btree directly (without Timeline) and see if that
 leads to a simpler and faster traversal from leaf to root node.
 There is one more issue before this can become production ready. Btree as
 it is implemented now is not thread safe (per the implementation's Javadocs),
 so it needs some love and attention to make it work properly.
 Niels

  Date: Thu, 30 Jun 2011 13:57:20 +0200
  From: cr...@amanzi.com
  To: user@lists.neo4j.org
  Subject: Re: [Neo4j] traversing densely populated nodes
 
   This topic has come up before, and the domain level solutions are
 usually
  very similar, like Norbert's category/proxy nodes (to group by
  type/direction) and Niels' TimeLineIndex (BTree). I wonder whether we can
  build a generic user-level solution that can also be wrapped to appear as
 an
  internal database solution?
 
  For example, consider Niels's solution of the TimeLine index. In this
 case
  we group all the nodes based on a consistent hash. Usually the timeline
  would use a timestamp, but really any reasonably variable property can
 do,
  even the node-id itself. Then we have a BTree between the dense nodes and
  the root node (node with too many relationships). How about this crazy
 idea,
  create an API that mimics the normal node.getRelationship*() API, but
  internally traverses the entire tree? And also for creating the
  relationships? So for most code we just do the usual
  node.createRelationshipTo(node,type,direction) and node.traverse(...),
 but
  internally we actually traverse the b-tree.
 
  This would solve the performance bottleneck being observed while keeping
 the
  'illusion' of directly connected relationships. The solution would be
  implemented mostly in the application space, so will not need any changes
 to
  the core database. I see this as being of the same kind of solution as
 the
  auto-indexing. We setup some initial configuration that results in
 certain
  structures being created on demand. With auto-indexing we are talking
 about
  mostly automatically adding lucene indexes. With this idea we are talking
  about automatically replacing direct relationships with b-trees to
 resolve a
  specific performance issue.
 
  And when the relationship density is very low, if the b-tree is
  auto-balancing, it could just be a direct 

Re: [Neo4j] neo4j-graph-collections

2011-06-29 Thread Craig Taverner
I have previously used two solutions to deal with multiple types in btrees:

   - My first index in 2009 was a btree-like n-dim index using generics to
   support int[], long[], float[] and double[] (no strings). I used this for
   TimeLine (long[1]) and Location (double[2]). The knowledge about what type
   was used was in the code for constructing the index (whether a new index or
   accessing an existing index in the graph).
   - In December I started my amanzi-index (on github:
   https://github.com/craigtaverner/amanzi-index)
   that is also btree-like, n-dimensional. But this time it can index multiple
   types in the same tree (so a float, int and string in the same tree, instead
   of being forced to have all properties of the same type). It is a re-write
   of the previous index to support Strings, and mixed types. This time it does
   save the type information in meta-data at the tree root.

The idea of using a 'comparator' class for the types is similar, but simpler
than the idea I implemented for amanzi-index, where I have mapper classes
that describe not only how to compare types, but also how to map from values
to index keys and back. This includes (to some extent) the concept of the
lucene analyser, since the mapper can decide on custom distribution of, for
example, strings and category indexes.

For both of these indexes, you configure the index up front, and then only
call index.add(node) to index a node. This will fit in well with the new
auto-indexing ideas in neo4j.
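The 'comparator' class idea mentioned above can be sketched in plain Java: the comparator's class name would be stored as a property on the index root node and resolved reflectively on load, so user-defined property types can be indexed without changing the index code (hypothetical classes, not the actual graph-collections API):

```java
import java.util.Comparator;

// Sketch of the comparator-registry idea: resolve a property comparator
// from a class name that would be read from the index root node.
public class ComparatorRegistry {

    public static class LongComparator implements Comparator<Object> {
        public int compare(Object a, Object b) {
            return Long.compare(((Number) a).longValue(), ((Number) b).longValue());
        }
    }

    public static class StringComparator implements Comparator<Object> {
        public int compare(Object a, Object b) {
            return ((String) a).compareTo((String) b);
        }
    }

    // Unknown user-defined types just need to be on the classpath.
    @SuppressWarnings("unchecked")
    public static Comparator<Object> forName(String className) {
        try {
            return (Comparator<Object>) Class.forName(className)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Unknown comparator: " + className, e);
        }
    }
}
```

Users could then register BigDecimal, date, or fraction comparators the same way, as suggested below.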

On Wed, Jun 29, 2011 at 2:25 PM, Niels Hoogeveen
pd_aficion...@hotmail.com wrote:






 At this moment Btree only supports the primitive datatype long, while Rtree
 only supports the datatype double. For Btree it makes sense to at least
 support strings, floats, doubles and ints too. Use cases for these data
 types are pretty obvious and are Btree-backed in (almost) every RDBMS
 product around. I think the best solution would be to create Comparator
 objects wrapping these primitive data types and store the class name of the
 comparator in the root of the index tree. This allows users to create their own
 comparators for datatypes not covered yet. It would make sense that people would
 want to store BigInt and BigDecimal objects in a Btree too; others may want
 to store dates (instead of datetime), fractions, complex numbers or even
 more exotic data types.
 Niels
  From: sxk1...@hotmail.com
  To: user@lists.neo4j.org
  Date: Tue, 28 Jun 2011 22:43:24 -0700
  Subject: Re: [Neo4j] neo4j-graph-collections
 
 
   I've read through this thread in more detail and have a few thoughts.
  When you talk about type, I am assuming that you are referring to an
  interface that both Btree and Rtree can implement. For the data types, I'd like
  to understand the use cases first before implementing the different data
  types; maybe we could store types of Object instead of Long or Double and
  implement comparators in a more meaningful fashion. Also I was wondering
 if unit tests would need to be extracted out of the spatial component and
 embedded inside the graph-collections component as well or whether we'd
 potentially need to write brand new unit tests as well.
  Craig as I mentioned I'd love to help, let me know if it would be
 possible to fork a repo or to talk in more detail this week.
  Regards
 
   From: pd_aficion...@hotmail.com
   To: user@lists.neo4j.org
   Date: Wed, 29 Jun 2011 01:35:43 +0200
   Subject: Re: [Neo4j] neo4j-graph-collections
  
  
   As to the issue of n-dim doubles, it would be interesting to consider
  creating a set of classes of type Orderable (supporting <, <=, >, >=
  operations), which we can use in both Rtree and Btree. Right now Btree only
 supports datatype Long. This should also become more generic. A first step
 we can take is at least wrap the common datatypes in Orderable classes.
   Niels
  
Date: Wed, 29 Jun 2011 00:32:15 +0200
From: cr...@amanzi.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] neo4j-graph-collections
   
The RTree in principle should be generalizable, but the current
implementation in neo4j-spatial does make a few assumptions specific
 to
spatial data, and makes use of spatial envelopes for the tree node
 bounding
boxes. It is also specific to 2D. We could make a few improvements
 first,
like generalizing to n-dimensions, replacing the recursive search
 with a
traverser and generalizing the bounding boxes to be simple
 double-arrays.
Then the only thing left would be to decide if it is ok for it to be
 based
on n-dim doubles or should be generalized to more types.
   
On Tue, Jun 28, 2011 at 11:14 PM, Saikat Kanjilal 
  sxk1...@hotmail.com wrote:
   
 I would be interested in helping out with this, let me know next
 steps.

 Sent from my iPhone

 On Jun 28, 2011, at 8:49 AM, Niels Hoogeveen 
 pd_aficion...@hotmail.com
 wrote:

 
  A couple of weeks ago Peter Neubauer set up a repository for
 in-graph
 

Re: [Neo4j] neo4j-graph-collections

2011-06-29 Thread Craig Taverner
It is technically possible, but it is a somewhat specialized index, not a
normal BTree, so I think you would want both (mine and a classic btree). My
index performs better for certain data patterns, is best with semi-ordered
data and moderately even distributions (since it has no rebalancing), and
requires the developer to pick a good starting 'resolution' which means they
should know something about their data. Perhaps we just port some of the
typing support into a btree in the collections project?

On Wed, Jun 29, 2011 at 4:19 PM, Niels Hoogeveen
pd_aficion...@hotmail.com wrote:


 Craig,
 Would it be possible to merge your work on Amanzi with the work the Neo
 team has done on the Btree component that is now in neo4j-graph-collections,
 so we can eventually have one implementation that meets a broad variety of
 needs?
 Niels

  Date: Wed, 29 Jun 2011 15:34:47 +0200
  From: cr...@amanzi.com
  To: user@lists.neo4j.org
  Subject: Re: [Neo4j] neo4j-graph-collections
 
  I have previously used two solutions to deal with multiple types in
 btrees:
 
 - My first index in 2009 was a btree-like n-dim index using generics
 to
 support int[], long[], float[] and double[] (no strings). I used this
 for
 TimeLine (long[1]) and Location (double[2]). The knowledge about what
 type
 was used was in the code for constructing the index (whether a new
 index or
 accessing an existing index in the graph).
 - In December I started my amanzi-index (on github:
  https://github.com/craigtaverner/amanzi-index)
 that is also btree-like, n-dimensional. But this time it can index
 multiple
 types in the same tree (so a float, int and string in the same tree,
 instead
 of being forced to have all properties of the same type). It is a
 re-write
 of the previous index to support Strings, and mixed types. This time
 it does
 save the type information in meta-data at the tree root.
 
  The idea of using a 'comparator' class for the types is similar, but
 simpler
  than the idea I implemented for amanzi-index, where I have mapper classes
  that describe not only how to compare types, but also how to map from
 values
  to index keys and back. This includes (to some extent) the concept of the
  lucene analyser, since the mapper can decide on custom distribution of,
 for
  example, strings and category indexes.
 
  For both of these indexes, you configure the index up front, and then
 only
  call index.add(node) to index a node. This will fit in well with the new
  auto-indexing ideas in neo4j.
 
  On Wed, Jun 29, 2011 at 2:25 PM, Niels Hoogeveen
   pd_aficion...@hotmail.com wrote:
 
  
  
  
  
  
   At this moment Btree only supports the primitive datatype long, while
 Rtree
   only supports the datatype double. For Btree it makes sense to at least
   support strings, floats, doubles and ints too. Use cases for these data
   types are pretty obvious and are Btree backed in (almost) every RDBMS
    product around. I think the best solution would be to create Comparator
   objects wrapping these primitive data types and store the class name of
 the
   comparator in root of the index tree. This allows users to create their
 own
   comparators for datatypes not covered yet. It would make sense people
 would
   want to store BigInt and BigDecimal objects in a Btree too, others may
 want
   to store dates (instead of datetime), fractions, complex numbers or
 even
   more exotic data types.
   Niels
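Niels's Comparator-wrapper idea can be sketched in plain Java. The `Orderable` class below is a hypothetical illustration of the approach (wrap a value with its comparator, and persist the comparator's class name at the tree root), not code from graph-collections:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the Comparator-wrapper idea: a BTree key that
// delegates ordering to a pluggable comparator, so the tree is not
// hard-coded to a single primitive type such as Long.
public class OrderableDemo {
    static final class Orderable<T> implements Comparable<Orderable<T>> {
        final T value;
        final Comparator<T> comparator;

        Orderable(T value, Comparator<T> comparator) {
            this.value = value;
            this.comparator = comparator;
        }

        @Override
        public int compareTo(Orderable<T> other) {
            // All ordering decisions go through the wrapped comparator.
            return comparator.compare(this.value, other.value);
        }

        // The string a tree root node could persist as meta-data, so an
        // existing index can reconstruct its comparator when reopened.
        String comparatorClassName() {
            return comparator.getClass().getName();
        }
    }

    public static void main(String[] args) {
        Comparator<Long> longCmp = Comparator.naturalOrder();
        List<Orderable<Long>> keys = new ArrayList<>();
        keys.add(new Orderable<>(42L, longCmp));
        keys.add(new Orderable<>(7L, longCmp));
        Collections.sort(keys); // ordering delegated to the wrapped comparator
        for (Orderable<Long> k : keys) {
            System.out.println(k.value); // prints 7 then 42
        }
    }
}
```

Users wanting BigDecimal, dates, or fractions would then only supply their own Comparator implementation, as Niels suggests.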
From: sxk1...@hotmail.com
To: user@lists.neo4j.org
Date: Tue, 28 Jun 2011 22:43:24 -0700
Subject: Re: [Neo4j] neo4j-graph-collections
   
   
I've read through this thread in more detail and have a few thoughts,
   when you talk about type I am assuming that you are referring to an
   interface that both (Btree,Rtree) can implement, for the data types I'd
 like
   to understand the use cases first before implementing the different
 data
   types, maybe we could store types of Object instead of Long or Double
 and
   implement comparators in a more meaningful fashion.   Also I was
 wondering
   if unit tests would need to be extracted out of the spatial component
 and
   embedded inside the graph-collections component as well or whether we'd
   potentially need to write brand new unit tests as well.
Craig as I mentioned I'd love to help, let me know if it would be
   possible to fork a repo or to talk in more detail this week.
Regards
   
 From: pd_aficion...@hotmail.com
 To: user@lists.neo4j.org
 Date: Wed, 29 Jun 2011 01:35:43 +0200
 Subject: Re: [Neo4j] neo4j-graph-collections


 As to the issue of n-dim doubles, it would be interesting to
 consider
    creating a set of classes of type Orderable (supporting <, <=, >, >=
   operations), this we can use in both Rtree and Btree. Right now Btree
 only
   supports datatype Long. This should also become more generic. A first
 step
   we can take is at least wrap the common datatypes in Orderable classes.
 Niels

Re: [Neo4j] neo4j-graph-collections

2011-06-29 Thread Craig Taverner
I think moving the RTree to the generic collections would not be too hard. I
saw that Saikat showed interest in doing this himself.

Saikat, contact me off-list for further details on what I think could be
done to make this port.

On Wed, Jun 29, 2011 at 9:52 PM, Niels Hoogeveen
pd_aficion...@hotmail.com wrote:


 Peter, I totally agree. Having the Rtree index stripped of spatial
 dependencies and moved into graph-collections should be our first priority. Once that is
 done we can focus on the other issues.
 Which doesn't mean we should stop discussing future improvements like
 setting up comparators (or something to that extent) that can be reusable,
 but we shouldn't try to get that up before Rtree is in graph-collections.
 Niels

  From: peter.neuba...@neotechnology.com
  Date: Wed, 29 Jun 2011 21:10:15 +0200
  To: user@lists.neo4j.org
  Subject: Re: [Neo4j] neo4j-graph-collections
 
  Craig,
  just gave you push access to the graph collections in case you want to
  do anything there.
 
  Also, IMHO it would be more important to isolate and split out the
  RTree component from Spatial than to optimize it - that could be done
  in the new place with targeted performance tests later?
 
  Cheers,
 
  /peter neubauer
 
  GTalk:  neubauer.peter
  Skype   peter.neubauer
  Phone   +46 704 106975
  LinkedIn   http://www.linkedin.com/in/neubauer
  Twitter  http://twitter.com/peterneubauer
 
  http://www.neo4j.org   - Your high performance graph
 database.
  http://startupbootcamp.org/- Öresund - Innovation happens HERE.
  http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
 
 
 
  On Wed, Jun 29, 2011 at 4:19 PM, Niels Hoogeveen
  pd_aficion...@hotmail.com wrote:
  
   Craig,
   Would it be possible to merge your work on Amanzi with the work the Neo
 team has done on the Btree component that is now in neo4j-graph-collections,
 so we can eventually have one implementation that meets a broad variety of
 needs?
   Niels
  
   Date: Wed, 29 Jun 2011 15:34:47 +0200
   From: cr...@amanzi.com
   To: user@lists.neo4j.org
   Subject: Re: [Neo4j] neo4j-graph-collections
  
   I have previously used two solutions to deal with multiple types in
 btrees:
  
  - My first index in 2009 was a btree-like n-dim index using
 generics to
  support int[], long[], float[] and double[] (no strings). I used
 this for
  TimeLine (long[1]) and Location (double[2]). The knowledge about
 what type
  was used was in the code for constructing the index (whether a new
 index or
  accessing an existing index in the graph).
  - In December I started my amanzi-index (on github:
   https://github.com/craigtaverner/amanzi-index)
  that is also btree-like, n-dimensional. But this time it can index
 multiple
  types in the same tree (so a float, int and string in the same
 tree, instead
  of being forced to have all properties of the same type). It is a
 re-write
  of the previous index to support Strings, and mixed types. This
 time it does
  save the type information in meta-data at the tree root.
  
   The idea of using a 'comparator' class for the types is similar, but
 simpler
   than the idea I implemented for amanzi-index, where I have mapper
 classes
   that describe not only how to compare types, but also how to map from
 values
   to index keys and back. This includes (to some extent) the concept of
 the
   lucene analyser, since the mapper can decide on custom distribution
 of, for
   example, strings and category indexes.
  
   For both of these indexes, you configure the index up front, and then
 only
   call index.add(node) to index a node. This will fit in well with the
 new
   auto-indexing ideas in neo4j.
  
   On Wed, Jun 29, 2011 at 2:25 PM, Niels Hoogeveen
    pd_aficion...@hotmail.com wrote:
  
   
   
   
   
   
At this moment Btree only supports the primitive datatype long,
 while Rtree
only supports the datatype double. For Btree it makes sense to at
 least
support strings, floats, doubles and ints too. Use cases for these
 data
types are pretty obvious and are Btree backed in (almost) every
 RDBMS
 product around. I think the best solution would be to create
 Comparator
objects wrapping these primitive data types and store the class name
 of the
comparator in root of the index tree. This allows users to create
 their own
comparators for datatypes not covered yet. It would make sense
 people would
want to store BigInt and BigDecimal objects in a Btree too, others
 may want
to store dates (instead of datetime), fractions, complex numbers or
 even
more exotic data types.
Niels
 From: sxk1...@hotmail.com
 To: user@lists.neo4j.org
 Date: Tue, 28 Jun 2011 22:43:24 -0700
 Subject: Re: [Neo4j] neo4j-graph-collections


 I've read through this thread in more detail and have a few
 thoughts,
when you talk about type I am assuming that you are referring to an
interface that both 

Re: [Neo4j] neo4j-graph-collections

2011-06-28 Thread Craig Taverner
The RTree in principle should be generalizable, but the current
implementation in neo4j-spatial does make a few assumptions specific to
spatial data, and makes use of spatial envelopes for the tree node bounding
boxes. It is also specific to 2D. We could make a few improvements first,
like generalizing to n-dimensions, replacing the recursive search with a
traverser and generalizing the bounding boxes to be simple double-arrays.
Then the only thing left would be to decide if it is ok for it to be based
on n-dim doubles or should be generalized to more types.
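The generalization described above (bounding boxes as simple double-arrays rather than 2D spatial envelopes) can be sketched independently of the RTree itself. This is a stand-alone illustrative class under that assumption, not the neo4j-spatial implementation:

```java
import java.util.Arrays;

// Sketch of an n-dimensional bounding box stored as two double arrays,
// the generalization suggested for the RTree's 2D spatial envelopes.
public class NDimEnvelope {
    final double[] min;
    final double[] max;

    NDimEnvelope(double[] min, double[] max) {
        if (min.length != max.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        this.min = min.clone();
        this.max = max.clone();
    }

    // Grow this box so it also covers the given point (used when
    // inserting a new entry under a tree node).
    void expandToInclude(double[] point) {
        for (int d = 0; d < min.length; d++) {
            min[d] = Math.min(min[d], point[d]);
            max[d] = Math.max(max[d], point[d]);
        }
    }

    // Two boxes overlap only if they overlap in every dimension; this is
    // the test a search traversal uses to decide whether to descend.
    boolean intersects(NDimEnvelope other) {
        for (int d = 0; d < min.length; d++) {
            if (other.max[d] < min[d] || other.min[d] > max[d]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        NDimEnvelope box = new NDimEnvelope(new double[]{0, 0, 0}, new double[]{1, 1, 1});
        box.expandToInclude(new double[]{2, -1, 0.5});
        System.out.println(Arrays.toString(box.min) + " " + Arrays.toString(box.max));
        // prints [0.0, -1.0, 0.0] [2.0, 1.0, 1.0]
    }
}
```

Nothing here is specific to 2D or to geometry types, which is the point of the proposed refactoring.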

On Tue, Jun 28, 2011 at 11:14 PM, Saikat Kanjilal sxk1...@hotmail.com wrote:

 I would be interested in helping out with this, let me know next steps.

 Sent from my iPhone

 On Jun 28, 2011, at 8:49 AM, Niels Hoogeveen pd_aficion...@hotmail.com
 wrote:

 
  A couple of weeks ago Peter Neubauer set up a repository for in-graph
 datastructures: https://github.com/peterneubauer/graph-collections.
  At this time of writing only the Btree/Timeline index is part of this
 component.
  In my opinion it would be interesting to move the Rtree parts of
 neo-spatial to neo4j-graph-collections too.
  I looked at the code but don't feel competent to separate out those
 classes that support generic Rtrees from those classes that are clearly
 spatial related.
  Is there any enthusiasm for such a project and if so, who is willing and
 able to do this?
  Niels
 
 
 
  ___
  Neo4j mailing list
  User@lists.neo4j.org
  https://lists.neo4j.org/mailman/listinfo/user
 
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] cassandra + neo4j graph

2011-06-27 Thread Craig Taverner
Hi,

I can comment on the spatial side. The neo4j-spatial library
(https://github.com/neo4j/neo4j-spatial) provides
some tools for doing spatial analysis on your data. I do
not know exactly what you plan to do, but since you mention user and place
locations, I guess you are likely to be asking the database for proximity
searches (users near me, or places of interest near me), in which case the
SimplePointLayer class (https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/SimplePointLayer.java)
should provide you what you need. Read the code (linked above), it is
simple. Or read the related blog post, "Neo4j Spatial, Part 1: Finding things
close to other things"
(http://blog.neo4j.org/2011/03/neo4j-spatial-part1-finding-things.html).
You also do not need to include neo4j-spatial from the beginning. Just model
your graph in a way suiting your domain, and when you want to enable spatial
searches, include neo4j-spatial dependencies in your pom and start using it.
If you happen to conform to one of the expected spatial structures, you can
add your nodes to the spatial index directly; otherwise implement a
GeometryEncoder and things should work from there. What I think you might
find interesting is that you can edit the search mechanism to filter on both
spatial and domain specific characteristics in the same pass. There are
various options for this, so we can discuss that later, should you wish.
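For orientation, the kind of question a point layer answers (which stored place is nearest a given coordinate) can be illustrated with a naive linear scan over great-circle distances. This sketch only shows the geometry of a proximity query; the actual layer answers it through its spatial index rather than scanning:

```java
// Naive illustration of a "places near me" query: haversine great-circle
// distance plus a linear scan. Illustration of the query's geometry only,
// not the SimplePointLayer implementation.
public class ProximityDemo {
    static final double EARTH_RADIUS_KM = 6371.0;

    // Haversine great-circle distance between two (lat, lon) points, in km.
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Return the index of the stored point closest to (lat, lon).
    static int closest(double[][] points, double lat, double lon) {
        int best = -1;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < points.length; i++) {
            double d = distanceKm(lat, lon, points[i][0], points[i][1]);
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Stockholm, Malmö, Gothenburg as (lat, lon) pairs.
        double[][] places = { {59.33, 18.06}, {55.60, 13.00}, {57.71, 11.97} };
        // Query from a point in southern Sweden: Malmö (index 1) is nearest.
        System.out.println(closest(places, 55.87, 12.83)); // prints 1
    }
}
```

A spatial index replaces the O(n) scan with a tree search, but the distance computation and "closest" semantics are the same.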

Regards, Craig


On Mon, Jun 27, 2011 at 3:49 PM, Aliabbas Petiwala aliabba...@gmail.com wrote:

 thanks for the informative reply. To add more: the social networking
 website will be geo-aware, and some spatial info also needs to be
 stored, like the coordinates of the user node or the coordinates of
 the location/place. How can we add more? Also, will neo4j alone + spatial
 suffice? Can there be multiple masters for load balancing, and how
 about splitting the graph in the design itself, like designing in terms
 of multiple graphs which are mapped to a glue graph?
 Hats off for building such a pioneering technology!

 regards,
 Aliabbas

 On 6/26/11, Jim Webber j...@neotechnology.com wrote:
  Hi Aliabbas,
 
  It's difficult to make pronouncements about your solution design without
  knowing about it, but here are some heuristics that can help you to plan
  whether you go with a native Neo4j solution or mix it up with other
 stores.
  All of these are only ideas and you should test first to ensure they make
  sense in your domain.
 
  1. Document/record size. If each node is likely to contain a lot of data
  (e.g. many megabytes) then you may choose to hold that outside of Neo4j
  (e.g. file system, KV store). Otherwise Neo4j.
 
  2. Length of individual fields. If they're small enough to fit within our
  short-string parameters (optimised around post codes, telephone numbers
 etc)
  then you get a performance boost compared to longer strings (which live
 in a
  separate store file in Neo4j). If your individual fields are really
 really
  long (See above, many megabytes), then consider moving them outside
 Neo4j.
  If you can slice up your fields into shorter strings then you'll get a
 good
  performance and footprint boost.
 
  3. Many machines. Neo4j has master/slave replication so write performance
 is
  asymptotically limited by the IO performance of the master (while reads
  scale horizontally, pretty much). The number of nodes you have is not a
  problem for Neo4j, so what is critical is whether a single master can
 handle
  the write load you want to throw at it. Since modern buses are fast, and
  since graph data structures are often less write-heavy than equivalents
 in
  other data stores*, I'd suggest that you might be well served by Neo4j
 here.
 
  But my overriding advice is to spike something with Neo4j and then, only
 if
  you find something that doesn't work in your context, to think about
 adding
  another data store.
 
  Jim
 
  * I'll be blogging about this shortly since it's a common enough
  misconception that 1000 writes in a relational/other NOSQL database
 implies
  1000 writes in a graph, whereas often it's a single write meaning graphs
 can
  be 1000 times better for the same workload.
  ___
  Neo4j mailing list
  User@lists.neo4j.org
  https://lists.neo4j.org/mailman/listinfo/user
 


 --
 Aliabbas Petiwala
 M.Tech CSE
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Recent slowdown in imports with lucene

2011-06-26 Thread Craig Taverner
Sorry for the lack of details. I wrote the email late at night, as I am
again.

Anyway, the relevant code in github is
OSMImporter.java (https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/osm/OSMImporter.java).
When adding nodes to the graph, it also adds the osm-id to a lucene index.
There is no index#removal call, only multiple index#add calls within the
same transaction. In fact we call index.add and index.get for one index (osm
changesets), while calling index.add on another (osm-nodes). The relevant
lines of code are 812 for adding new OSM nodes to the graph, and 914 for
finding changesets in a different index.

I have not investigated for which version of neo4j the slowdown started, or
if there is somehow some other cause. I will try find time to do that later
this week. But I thought I should ask on the list anyway in case anyone else
has a similar problem, or if there are some obvious answers.

On Sun, Jun 26, 2011 at 1:45 PM, Mattias Persson
matt...@neotechnology.com wrote:

 Please elaborate on how you are using your index. Are you using
 Index#remove(entity,key) or Index#remove(entity) followed by get/query in
 the same tx? There was a recent change in transactional state
 implementation, where a full representation (in-memory lucene index) was
 needed for it to be able to return accurate results in some corner cases.
 That change could slow things down, but not that much though. I'll give
 some
 different scenarios a go and see if I can find some culprit for this.

 But again, a little more information would be useful, as always.

 2011/6/26 Craig Taverner cr...@amanzi.com

  Hi,
 
  Has anyone noticed a slowdown of imports into neo4j with recent
 snapshots?
  Neo4j-spatial importing OSM data (which uses lucene to find matching
 nodes
  for ways) is suddenly running much slower than usual on non-batch
 imports.
  For most of my medium sized test cases, I normally have surprisingly
  similar
  import times for batch inserter and non-batch inserter
  (EmbeddedGraphDatabase) versions of the OSM import, but in recent runs
 the
  normal API is now more than 10 times slower. Down to 70 nodes per second,
  which is insanely slow.
 
  Any idea if there is something in the recent snapshots for me to look
 into?
  Reproducing the problem requires simply running the TestOSMImport test
  cases
  in neo4j-spatial. I have only tried this on my laptop, so I have not
 ruled
  out that there is something local going on.
 
  Regards, Craig
  ___
  Neo4j mailing list
  User@lists.neo4j.org
  https://lists.neo4j.org/mailman/listinfo/user
 



 --
 Mattias Persson, [matt...@neotechnology.com]
 Hacker, Neo Technology
 www.neotechnology.com
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Recent slowdown in imports with lucene

2011-06-26 Thread Craig Taverner
Hi again,

My apologies, but I have found the problem, and it is in the OSMImporter
itself, nothing to do with Lucene or Neo4j. Peter made a commit in May
(https://github.com/neo4j/neo4j-spatial/commit/b5e0f1d1a11ed9c8b2b8074f529362a1607a7643#src/main/java/org/neo4j/gis/spatial/osm/OSMImporter.java)
that at first glance appears to be a cleanup of my code (removal of string
literals), but it did have two meaningful changes I only saw on deeper
inspection:

   - Addition of the map type: exact to the index creation (when I
   removed this, node creation improved from 70/s to 140/s)
   - User control over the commit size (previously I had hard-coded this to
   5000 nodes per tx).

There was a small but significant bug in the commit size: the new user
parameter was never used to initialize anything, with the consequence that
every node was committed individually. Setting the block size back to 5000
increased the node creation rate to nearly 1 (over 100 times faster).
That is a serious improvement.
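The batching pattern at issue can be sketched generically. The `commit` callback below is a stand-in for transaction handling, and `importNodes` is a hypothetical harness, not the actual OSMImporter code:

```java
// Generic sketch of commit batching: group writes into transactions of
// `batchSize` operations instead of committing each node individually.
// The Runnable stands in for tx.success()/tx.finish(); this is not the
// OSMImporter implementation.
public class BatchCommitDemo {
    // Returns the number of commits issued for the given workload.
    static int importNodes(int totalNodes, int batchSize, Runnable commit) {
        int commits = 0;
        int inTx = 0;
        for (int i = 0; i < totalNodes; i++) {
            // ... create node, add its id to the index, etc. ...
            inTx++;
            // The bug described above meant the configured size never took
            // effect, so in practice every single node was committed.
            if (inTx >= batchSize) {
                commit.run();
                commits++;
                inTx = 0;
            }
        }
        if (inTx > 0) { // commit the final partial batch
            commit.run();
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        int commits = importNodes(12_000, 5000, () -> { /* tx boundary */ });
        System.out.println(commits); // prints 3 (two full batches + remainder)
    }
}
```

With a batch size of 1 the same 12,000-node import costs 12,000 commits, which is the two-orders-of-magnitude slowdown observed.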

Sorry again for wasting space on the list. I'm glad this was a user error,
though, not a neo4j issue :-)

Regards, Craig

On Mon, Jun 27, 2011 at 12:54 AM, Craig Taverner cr...@amanzi.com wrote:

 Sorry for the lack of details. I wrote the email late at night, as I am
 again.

 Anyway, the relevant code in github is 
 OSMImporter.java (https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/osm/OSMImporter.java).
 When adding nodes to the graph, it also adds the osm-id to a lucene index.
 There is no index#removal call, only multiple index#add calls within the
 same transaction. In fact we call index.add and index.get for one index (osm
 changesets), while calling index.add on another (osm-nodes). The relevant
 lines of code are 812 for adding new OSM nodes to the graph, and 914 for
 finding changesets in a different index.

 I have not investigated for which version of neo4j the slowdown started, or
 if there is somehow some other cause. I will try find time to do that later
 this week. But I thought I should ask on the list anyway in case anyone else
 has a similar problem, or if there are some obvious answers.


 On Sun, Jun 26, 2011 at 1:45 PM, Mattias Persson 
 matt...@neotechnology.com wrote:

 Please elaborate on how you are using your index. Are you using
 Index#remove(entity,key) or Index#remove(entity) followed by get/query in
 the same tx? There was a recent change in transactional state
 implementation, where a full representation (in-memory lucene index) was
 needed for it to be able to return accurate results in some corner cases.
 That change could slow things down, but not that much though. I'll give
 some
 different scenarios a go and see if I can find some culprit for this.

 But again, a little more information would be useful, as always.

 2011/6/26 Craig Taverner cr...@amanzi.com

  Hi,
 
  Has anyone noticed a slowdown of imports into neo4j with recent
 snapshots?
  Neo4j-spatial importing OSM data (which uses lucene to find matching
 nodes
  for ways) is suddenly running much slower than usual on non-batch
 imports.
  For most of my medium sized test cases, I normally have surprisingly
  similar
  import times for batch inserter and non-batch inserter
  (EmbeddedGraphDatabase) versions of the OSM import, but in recent runs
 the
  normal API is now more than 10 times slower. Down to 70 nodes per
 second,
  which is insanely slow.
 
  Any idea if there is something in the recent snapshots for me to look
 into?
  Reproducing the problem requires simply running the TestOSMImport test
  cases
  in neo4j-spatial. I have only tried this on my laptop, so I have not
 ruled
  out that there is something local going on.
 
  Regards, Craig
  ___
  Neo4j mailing list
  User@lists.neo4j.org
  https://lists.neo4j.org/mailman/listinfo/user
 



 --
 Mattias Persson, [matt...@neotechnology.com]
 Hacker, Neo Technology
 www.neotechnology.com
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user



___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


[Neo4j] Recent slowdown in imports with lucene

2011-06-25 Thread Craig Taverner
Hi,

Has anyone noticed a slowdown of imports into neo4j with recent snapshots?
Neo4j-spatial importing OSM data (which uses lucene to find matching nodes
for ways) is suddenly running much slower than usual on non-batch imports.
For most of my medium sized test cases, I normally have surprisingly similar
import times for batch inserter and non-batch inserter
(EmbeddedGraphDatabase) versions of the OSM import, but in recent runs the
normal API is now more than 10 times slower. Down to 70 nodes per second,
which is insanely slow.

Any idea if there is something in the recent snapshots for me to look into?
Reproducing the problem requires simply running the TestOSMImport test cases
in neo4j-spatial. I have only tried this on my laptop, so I have not ruled
out that there is something local going on.

Regards, Craig
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Neo4j -- Can it be embedded in Android?

2011-06-24 Thread Craig Taverner
I heard that Peter Neubauer made a port of neo4j to android a few years ago,
but that nothing has been done since and no version since then would work.
So my understanding is that it does not work on android, but that it is
possible to make it work (with some work ;-).

Peter is away, but I expect he would have a better answer than me.

On Fri, Jun 24, 2011 at 1:33 PM, Sidharth Kshatriya sid.kshatr...@gmail.com
 wrote:

 Dear All,

 I have googled for this on the web and did not arrive at a satisfactory
 answer.

 *Question: Is it possible to run Neo4j on Android? *

 Thanks,

 Sidharth

 --
 Sidharth Kshatriya
 www.sidk.info
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Neo4j -- Can it be embedded in Android?

2011-06-24 Thread Craig Taverner
Personally what I would like to see would be a sub-graph approach, with the
android device storing a sub-graph of the main database, and updating that
asynchronously with the server. Seems like something that can be done in a
domain specific way, but much harder to do generically. I wanted this for
OSM, with the local OSM graph on the android device representing a local map
supporting fast LBS services, and automatically updating from the main OSM
graph on a big central server as the user travels.

On Fri, Jun 24, 2011 at 2:56 PM, Rick Bullotta
rick.bullo...@thingworx.com wrote:

 I think the limited capabilities of the Android device(s) (RAM, primarily)
 limit the usefulness of Neo4J versus alternatives since the datasets are
 usually small and simple in mobile apps.  If we need any heavy-duty graph
 work for a mobile app, we'd do it on the server.

 -Original Message-
 From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org]
 On Behalf Of Sidharth Kshatriya
 Sent: Friday, June 24, 2011 8:53 AM
 To: Neo4j user discussions
 Subject: Re: [Neo4j] Neo4j -- Can it be embedded in Android?

 Yes, I saw that on the mailing list archives too. I would have thought there
 would be some interest in using this on android -- but there seems to be no
 news about it since...

 On Fri, Jun 24, 2011 at 6:13 PM, Rick Bullotta
  rick.bullo...@thingworx.com wrote:

  I remember something like that, too.  The main issue is probably the
  non-traditional file system that Android exposes.
 
  -Original Message-
  From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org]
  On Behalf Of Craig Taverner
  Sent: Friday, June 24, 2011 8:37 AM
  To: Neo4j user discussions
  Subject: Re: [Neo4j] Neo4j -- Can it be embedded in Android?
 
  I heard that Peter Neubauer made a port of neo4j to android a few years
  ago,
  but that nothing has been done since and no version since then would
 work.
  So my understanding is that it does not work on android, but that it is
  possible to make it work (with some work ;-).
 
  Peter is away, but I expect he would have a better answer than me.
 
  On Fri, Jun 24, 2011 at 1:33 PM, Sidharth Kshatriya 
  sid.kshatr...@gmail.com
   wrote:
 
   Dear All,
  
   I have googled for this on the web and did not arrive at a satisfactory
   answer.
  
   *Question: Is it possible to run Neo4j on Android? *
  
   Thanks,
  
   Sidharth
  
   --
   Sidharth Kshatriya
   www.sidk.info
   ___
   Neo4j mailing list
   User@lists.neo4j.org
   https://lists.neo4j.org/mailman/listinfo/user
  
  ___
  Neo4j mailing list
  User@lists.neo4j.org
  https://lists.neo4j.org/mailman/listinfo/user
  ___
  Neo4j mailing list
  User@lists.neo4j.org
  https://lists.neo4j.org/mailman/listinfo/user
 



 --
 Sidharth Kshatriya
 www.sidk.info
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] neo4j-spatial roadmap/stability

2011-06-23 Thread Craig Taverner
Hi Christopher,

Thanks for your interest in neo4j and neo4j-spatial. I will answer your
questions and comments inline.

I am working for the largest German speaking travel and holiday portal.
 Currently we are using a relatively simple MySQL based spatial distance
 functionality. We plan to enhance this by something which is capable of a
 flexible set of spatial queries. We will evaluate Neo4j-Spatial for that
 and
 benchmark it against PostGIS/PostGreSQL.


This would be a very interesting application for neo4j-spatial. I'm sure we
could support you in that. Obviously it is not as mature as PostGIS, but I
think it is very suitable for flexible queries, especially if you plan to
combine a complex domain model with spatial data, or expose a spatial
element to existing domains.

I found some Roadmap descriptions in the Neo4j Wiki (
 http://wiki.neo4j.org/content/Neo4j_Spatial_Project_Plan), but I am not
 sure
 that these are still valid. Craig said (somewhere) that Neo4j Spatial is
 still alpha (I hope that this means that only the interfaces are still
 unstable). And I know that neo4j-spatial is an open source project where
 there is no Neo Technology responsibility.


The project plan you found was unfortunately the original plan put down
before neo4j-spatial really started, and represents the expectations for
2010. Most of these were met, and several other capabilities achieved in
addition. I will edit the wiki to more accurately reflect the current status
of the project.

However, it is still true that it is in an alpha state. The APIs are likely
to change. Since last September we have viewed it as an alpha release,
available for people to try out and provide feedback on. We believe it is
capable of many useful tasks, and can be used for real applications. But it
has not been in the 'wild' for long, and so there are probably remaining
bugs and performance issues. In addition, as mentioned before, we will
almost certainly change the APIs a little as we receive more feedback and
move the system forward. Already in 2011 there have been three new additions
influencing the API: the SimplePointLayer for LBS and related capabilities,
the beginnings of the REST API for inclusion in Neo4j-Server, and the
Geoprocessing features.

Can you drop a few words about the Spatial roadmap, its stability and
 planned licensing (all based on using it on a high volume web site)?


I think we need Peter's opinion on the licensing. I believe it is currently
the same as neo4j itself. The code comments state AGPL, and I am not sure if
the recent decision to move the core to GPL is applicable to the spatial
code.

For the roadmap we will also update the wiki pages. Currently the efforts
are to:

   - Improve the OSM model API (some basic API for exploring the OSM ways
   and nodes, already in place but needing some refinement)
   - Improve the REST API for spatial (we have some customers trying this
   out, and will make enhancements based on their feedback)
   - Integrate the spatial index into the new automatic indexing feature of
   Neo4j (some initial prototype of this is in place, and will be refined for
   the 1.5 release of Neo4j)
   - Improved Geoprocessing support, particularly on the OSM model. This is
   involving a GSoC project and will be presented at FOSS4G in Denver this
   year. See
   http://2011.foss4g.org/sessions/geoprocessing-neo4j-spatial-and-osm

Regards, Craig
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] More spatial questions

2011-06-19 Thread Craig Taverner
Hi Nolan,

I think I can answer a few of your questions. Firstly, some background. The
graph model of the OSM data is based largely on the XML-formatted OSM
documents, and there you will find 'nodes', 'ways', 'relations' and 'tags'
each as their own xml-tag, and as a consequence each will also have their
own neo4j-node in the graph. Another point is that the geometry can be based
on one or more nodes or ways, and so we always create another node for the
geometry, and link it to the osm-node, way or relation that represents that
geometry.

What all this boils down to is that you cannot find the tags on the geometry
node itself. You cannot even find the location on that node. If you want to
use the graph model in a direct way, as you have been trying, you really do
need to know how the OSM data is modeled. For example, for a LineString
geometry, you would need to traverse from the geometry node to the way node
and finally to the tags node (to get the tags). To get to the locations is
even more complex. Rather than do that, I would suggest that you work with
the OSM API we provided with the OSMLayer, OSMDataset and OSMGeometryEncoder
classes. Then you do not need to know the graph model at all.

For example, OSMDataset has a method for getting a Way object from a node,
and the returned object can be queried for its nodes, geometry, etc.
Currently we provide methods for returning neo4j-nodes as well as objects
that make spatial sense. One minor issue here is the ambiguity inherent in
the fact that both neo4j and OSM make use of the term 'node', but for
different things. We have various solutions to this, sometimes replacing
'node' with 'point' and sometimes prefixing with 'osm'. The unit tests in
TestsForDocs includes some tests for the OSM API.

My first goal is to find the nearest OSM node to a given lat, lon. My
 attempts seem to be made of fail thus far, however. Here's my code:


Most of the OSM dataset is converted into LineStrings, and what you really
want to do is find the closest vertex of the closest LineString. We have a
utility function 'findClosestEdges' in the SpatialTopologyUtils class for
that. The unit tests in TestSpatialUtils, and the testSnapping() method in
particular, show use of this.
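
A rough sketch of that call (a sketch only, assuming an already-populated layer and a JTS GeometryFactory; the layer name and coordinate are illustrative, and the exact overloads are best checked against the testSnapping() test mentioned above):

```java
// Sketch only: find the closest LineString edges to a query point.
GeometryFactory geomFactory = new GeometryFactory();
Point queryPoint = geomFactory.createPoint(new Coordinate(12.9639, 56.0441));

Layer layer = spatialService.getLayer("map.osm");
List<SpatialDatabaseRecord> closest =
        SpatialTopologyUtils.findClosestEdges(queryPoint, layer);
if (!closest.isEmpty()) {
    // The first record holds the geometry nearest the query point.
    System.out.println(closest.get(0).getGeometry());
}
```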

My thinking is that nodes should be represented as points, so I can't
 see why this fails. When I run this in a REPL, I do get a node back. So
 far so good. Next, I want to get the node's tags. So I run:


The spatial search will return 'geometries', which are spatial objects. In
neo4j-spatial every geometry is represented by a unique node, but it is not
required that that node contain coordinates or tags. That is up to the
GeometryEncoder. In the case of the OSM model, this information is
elsewhere, because of the nature of the OSM graph, which is a highly
interconnected network of points, most of which do not represent Point
geometries, but are part of much more complex geometries (streets, regions,
buildings, etc.).

n.getSingleRelationship(OSMRelation.TAGS, Direction.INCOMING)


The geometry node is not connected directly to the tags node. You need two
steps to get there. But again, rather than figure out the graph yourself,
use the API. In this case, instead of getting the geometry node from the
SpatialDatabaseRecord, rather just get the properties using getPropertyNames
and getProperty(String). This API works the same on all kinds of spatial
data, and in the case of OSM data will return the TAGS, since those are
interpreted as attributes of the geometries.
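
In practice that looks something like the following sketch, where `record` is a SpatialDatabaseRecord returned by a spatial search (the printing is illustrative):

```java
// Read the geometry's attributes through the record API; for OSM layers
// these are the tags, reached without any manual graph traversal.
for (String key : record.getPropertyNames()) {
    System.out.println(key + " = " + record.getProperty(key));
}
```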

n.getSingleRelationship(OSMRelationship.GEOM,
 Direction.INCOMING).getOtherNode(n).getPropertyKeys
 I see what appears to be a series of tags (oneway, name, etc.) Why are
 these being returned for OSMRelation.GEOM rather than OSMRelation.TAGS?


These are not the tags. Now you have found the node representing an OSM
'Way'. This has a few properties on it that are relevant to the way, the
name, whether the street is oneway or not, etc. Sometimes these are based on
values in the tags, but they are not the tags themselves. This node is
connected to the geometry node and the tags node, so you were half-way there
(to the tags that is). You started at the geometry node, and stepped over to
the way node, and one more step (this time with the TAGS relationship) would
have got you to the tags.

But again, I advise against trying to explore the OSM graph by itself. As
you have already found, it is not completely trivial. What you should have
done is access the attributes directly from the search results.

Additionally, I see the property way_osm_id, which clearly isn't a tag.
 It would also seem to indicate that this query returned a way rather
 than a node like I'd hoped. This conclusion is further borne out by the
 tag names. So clearly I'm not getting the search correct. But beyond
 that, the way being returned by this search isn't close to the lat,lon I
 provided. What am I missing?


The lat/long values are quite a bit deeper in the graph. In the case 

Re: [Neo4j] Auto Indexing for Neo4j

2011-06-18 Thread Craig Taverner
I am using only one relationship type in my index tree, and made traversal
decisions based on properties of the tree nodes, but have considered an
'optimization' based on embedding the index keys into the relationship
types, which I think is what you did. However, I am not convinced it will
work well because I suspect there will be losses if the total number of
relationship types gets very high. I think this is a separate issue to the
total number of relationships, but might affect all traversers, since there
must exist a hashmap of all relationship types.

Still it is very cool what Peter says below, because if all these
'experiments' with in-graph indexes can get put behind the standard index
API, then we can get much more testing of this approach, and hopefully learn
what we need to make this a viable solution for wide use.
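
The key-chopping idea Michael describes below can be sketched in plain Java; the `idx_` prefix and the zero-padding to 18 digits are my own assumptions, not taken from his code:

```java
import java.util.ArrayList;
import java.util.List;

public class LongKeyChunker {
    // Split a non-negative long key into fixed 3-digit chunks; each chunk
    // names the relationship type for one level of the in-graph index tree.
    public static List<String> chunk(long key) {
        List<String> types = new ArrayList<String>();
        String digits = String.format("%018d", key); // pad to a multiple of 3
        for (int i = 0; i < digits.length(); i += 3) {
            types.add("idx_" + digits.substring(i, i + 3));
        }
        return types;
    }

    public static void main(String[] args) {
        // A lookup for key 123456789 walks one relationship type per level:
        System.out.println(chunk(123456789L));
    }
}
```

With three digits per level there are at most 1,000 distinct relationship types per tree level, which matches the "1000 additional rel-types" Michael mentions.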

On Wed, Jun 15, 2011 at 4:56 AM, Michael Hunger 
michael.hun...@neotechnology.com wrote:

 A problem with a probably dumb index in a graph that I created for an
 experiment was the
 performance of getAllRelationships on that machine (it was a very large
 graph with all nodes being indexed).

 It was a mapping from long values to nodes, my simplistic approach just
 chopped the long values into chunks of 3 digits and used those 3 digits as
 relationship-types (i.e. 1000 additional rel-types).
 to form a tree which pointed to the node in question at the end.

 Will have to investigate that further.


 Am 14.06.2011 um 23:43 schrieb Peter Neubauer:

  Craig,
  the autoindexing is one step in this direction. The other is to enable
  the Spatial and other in-graph indexes like the graph-collections
  (timeline etc) at all to be treated like normal index providers. When
  that is done (will talk to Mattias who is coming back from vacation
  tomorrow on that), we are in a position to think about more complex
  autoindex providers.
 
  Also, the possibility to treat Neo4j Spatial and other graph
  structures as index providers, would hook into the index framework and
  expose things to higher level queries like Cypher and Gremlin, e.g.
  combining a spatial bounding box geometry search with a graph
  traversal for suitable properties that are less than 2 kilometers from
  the nearest school, sorting the results, returning only price and lat
  as columns, the 3 topmost hits.
 
  START geom = (index:spatial:'BBOX(the_geom, -90, 40, -60, 45)')
  MATCH (geom)--(fast), (fast)-[r, :NEAR]-(school)
  WHERE fast.rooms > 4 AND school.classes > 4 AND r.length < 2 RETURN
  fast.pic?, fast.lon?, fast.lat?
  SORT BY fast.price, fast.lat^
  SLICE 3
 
  So, I think the next step is to make in-graph indexing structures plug
  into the index framework, and then into autoindexing :)
 
 
  Cheers,
 
  /peter neubauer
 
  GTalk:  neubauer.peter
  Skype   peter.neubauer
  Phone   +46 704 106975
  LinkedIn   http://www.linkedin.com/in/neubauer
  Twitter  http://twitter.com/peterneubauer
 
  http://www.neo4j.org   - Your high performance graph
 database.
  http://startupbootcamp.org/- Öresund - Innovation happens HERE.
  http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
 
 
 
  On Tue, Jun 14, 2011 at 5:49 PM, Craig Taverner cr...@amanzi.com
 wrote:
  This is great news.
 
  Now I'm really curious about the next step, and that is allowing indexes
  other than lucene. For example, the RTree index in neo4j-spatial was
 never
  possible to wrap behind the normal index API, because that was designed
 only
  for properties of nodes (and relationships), but the RTree is based on
  something completely different (complete spatial geometries). However,
 the
  new auto-indexing feature implies that any node can be added to an index
  without the developer needing to know anything about the index API.
 Instead
  the index needs to know if the node is appropriate for indexing. This is
  suitable for both lucene and the RTree.
 
  So what I'd like to see is that when configuring auto-indexing in the
 first
  place, instead of just specifying properties to index, specify some
 indexer
  implementation that can be created and run internally. For example,
 perhaps
  you pass the classname of some class that implements some necessary
  interface, and then that is instantiated, passed config properties, and
 used
  to index new or modified nodes. One method I could imagine this
 interface
  having would be a listener for change events to be evaluated for whether
 or
  not the index should be activated for a node change. For the lucene
 property
  index, this method would return true if the property exists on that
 node.
  For the RTree this method would return true if the node contained the
  meta-data required for neo4j-spatial to recognize it as a spatial type?
  Alternatively just an index method that does nothing when the nodes are
 not
  to be indexed, and indexes when necessary?
 
  So, are we now closer to having this kind of support?
 
  On Tue, Jun 14, 2011 at 11:30 PM, Chris

Re: [Neo4j] Slow Traversals on Nodes with too many Relationships

2011-06-15 Thread Craig Taverner
Could this also be related to the possibility that in order to determine
relationship type and direction, the relationships need to be loaded from
disk? If so, then having a large number of relationships on the same node
would decrease performance, if the number was large enough to affect the
disk io caching.

If this is the case, perhaps adding a proxy node for the incoming
relationships would work around the problem? Of course, then you have doubled
the number of part nodes (two for each part: one part and one container
proxy).
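
A sketch of that work-around with the embedded API (relationship type names are my own, and this assumes an open transaction and existing container nodes):

```java
// Route the bulky incoming CONTAINS relationships through a proxy node,
// leaving the part node with only its few HASPART relationships to scan.
Node part = graphDb.createNode();
Node containsProxy = graphDb.createNode();
containsProxy.createRelationshipTo(part,
        DynamicRelationshipType.withName("PROXY_FOR"));

// Containers attach to the proxy instead of the part itself:
container.createRelationshipTo(containsProxy,
        DynamicRelationshipType.withName("CONTAINS"));
```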

On Wed, Jun 15, 2011 at 10:27 PM, Rick Bullotta rick.bullo...@thingworx.com
 wrote:

 I would respectfully disagree that it doesn't necessarily represent
 production usage, since in some cases each query/traversal will be unique
 and isolated to a part of a subgraph, meaning a cold query may be
 the norm

 -Original Message-
 From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org]
 On Behalf Of Michael Hunger
 Sent: Wednesday, June 15, 2011 10:25 AM
 To: Neo4j user discussions
 Subject: Re: [Neo4j] Slow Traversals on Nodes with too many Relationships

 That is rather a case of warming up your caches.

 Determining the traversal speed from the first run is not a good benchmark
 as it doesn't represent production usage :)
 The same (warming up) is true for all kinds of benchmarks (except for
 startup performance benchmarks).

 Cheers

 Michael

 Am 15.06.2011 um 14:48 schrieb Agelos Pikoulas:

  I have a few Part nodes related with each other via HASPART
  relationships/edges.
  (eg Part1---HASPART---Part2---HASPART---Part3 etc) .
  TraversalDescription works fine, following each Part's outgoing HASPART
  relationship.
 
  Then I add a large number (say 100.000) of Container Nodes, where each
  Container has a CONTAINS relation to almost *every* Part node.
  Hence each Part node now has a 100.000 incoming CONTAINS relationships
 from
  Container nodes,
  but only a few outgoing HASPART relationships to other Part nodes.
 
  Now my previous TraversalDescription run extremely slow (several seconds
  inside each IteratorPath.next() call)
  Note that I do define relationships(RT.HASPART, Direction.OUTGOING) on
 the
  TraversalDescription,
  but it seems it's not used by neo4j as a hint. Note that on a subsequent
 run
  of the same Traversal, its very quick indeed.
 
  Is there any way to use Indexing on relationships for such a scenario, to
  boost things up ?
 
  Ideally, the Traversal framework could use automatic/declarative indexing
 on
  Node Relationship types and/or direction to perform such traversals
 quicker.
 
  Regards


Re: [Neo4j] Most Efficient way to query in my use cases

2011-06-15 Thread Craig Taverner
Another common thing to do in this case is create a node for the purchase
action. This node would be related to the purchaser (user), item (pen) and
shop, and would contain data appropriate to the purchase (date/time, price,
etc).

Then traverse from the shop or the pen to all purchase actions that
reference the other one (shop or pen).
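
A sketch of that model in the embedded API (relationship type and property names are illustrative, and an open transaction is assumed):

```java
// The purchase event is a node of its own, linked to the three parties.
Node purchase = graphDb.createNode();
purchase.setProperty("timestamp", System.currentTimeMillis());
purchase.setProperty("price", 1.50);

user.createRelationshipTo(purchase, DynamicRelationshipType.withName("MADE"));
purchase.createRelationshipTo(pen, DynamicRelationshipType.withName("OF_ITEM"));
purchase.createRelationshipTo(shop, DynamicRelationshipType.withName("AT_SHOP"));

// Finding all pens a shop sold is then a two-step traversal:
// shop <-AT_SHOP- purchase -OF_ITEM-> pen
```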

On Thu, Jun 16, 2011 at 4:48 AM, Jim Webber j...@neotechnology.com wrote:

 Hi Manav,

 I think there's a relationship missing here.

 Pen--SOLD_BY--shop

 That way it's easy to find all the pens that a shop sold, and who they sold
 them to.

 In general modelling your domain expressively does not come at an increased
 cost with Neo4j (caveat: you can still create write hotspots).

 Jim


Re: [Neo4j] Slow Traversals on Nodes with too many Relationships

2011-06-15 Thread Craig Taverner
I understood that on windows the memory mapped sizes needed to be included
in the heap, since they are not allocated outside the heap as they are on
linux/mac. So in this case he needs a larger heap (and make sure the memory
mapped files are much smaller than the heap). The relevant part of the
configuration settings doc says:

When running Neo4j on Windows the size of the memory-mapped nioneo
configurations need to be added to the heap size parameter. On Linux and
Unix-systems memory mapped IO is not included in the heap size.


I still think that the solution to this case is to group the different
relationship types into separate sub-graphs, so that the performance of
traversing HAS_ONE is not affected by the number of relationships of
CONTAINS. Of course traversing the CONTAINS will still be slow without
increasing the cache, as you suggest.

On Thu, Jun 16, 2011 at 12:07 AM, Michael Hunger 
michael.hun...@neotechnology.com wrote:

 Agelos,

 sorry, didn't want to sound that way.

 512M ram is not very much for larger graphs. Neo4j has to cache nodes,
 relationships in the heap as well as you own datastructures.

 The memory mapped files for the datastores are kept outside the heap.

 Normally with your 4G I'd suggest using about 1.5G for heap and 1.5G for
 the memory mapped files.
 http://wiki.neo4j.org/content/Configuration_Settings

 Do you have a small test-case available that creates your graph and runs
 your traversal? Then I could have a look at that and also do some
 profiling to determine the issues for this slowdown.

 The indexing doesn't help as it also has to hit caches or disk. The graph
 traversal is normally a very efficient operation that shouldn't experience
 this bad performance.

 Cheers

 Michael


 P.S. I just use my mail client for handling the mailing list and it works
 fine for me. Imho Gmail groups threads automatically.


 Am 15.06.2011 um 17:40 schrieb Agelos Pikoulas:

  Re: [Neo4j] Slow Traversals on Nodes with too many
Relationships
 
  I have to respectfully agree with Rick Bullotta.
 
  I was suspecting the big-O is not linear for this case.
 
  To verify I added x4 Container nodes (400.000) and their appropriate
  Relationships, and it is now *unbelievably* slow :
  It does not take x4 more; it takes more than 30-40 seconds for each
  next(). Mind you, 100K nodes = ~2 secs for each next() !!!
 
  And only to make matters worse, the subsequent runs weren't fast either -
  they actually took more time than the first
  (1st TotalTraversalTime= 389936ms, 2nd TotalTraversalTime= 443948ms)
 
  The whole setup is running on
  Eclipse 3.6, with -Xmx512m on JavaVM,
  Windows2003 VMWare machine with 4GB, running on a fast 2nd gen SSD (OCZ
  Vertex 2). The neo4J data resides on this SSD.
  The 100.000 nodes data files were ~250MB, the 400.000 one is ~1GB.
 
  I wonder what would happen if the Container nodes were a few million
 (which
  will be my case) - it will run forever.
 
  Could you please look into my suggestion - i.e. using a 'smart' behind
  the scenes Indexing on both *RelationshipType* and *Direction* that
  Traversals actually use to boost things up ?
 
  To another topic, how does one use this mailing list - I use it through
  gmail and I am utterly lost - is there a better client/UI to actually
  post/reply into threads ?


Re: [Neo4j] Auto Indexing for Neo4j

2011-06-14 Thread Craig Taverner
This is great news.

Now I'm really curious about the next step, and that is allowing indexes
other than lucene. For example, the RTree index in neo4j-spatial was never
possible to wrap behind the normal index API, because that was designed only
for properties of nodes (and relationships), but the RTree is based on
something completely different (complete spatial geometries). However, the
new auto-indexing feature implies that any node can be added to an index
without the developer needing to know anything about the index API. Instead
the index needs to know if the node is appropriate for indexing. This is
suitable for both lucene and the RTree.

So what I'd like to see is that when configuring auto-indexing in the first
place, instead of just specifying properties to index, specify some indexer
implementation that can be created and run internally. For example, perhaps
you pass the classname of some class that implements some necessary
interface, and then that is instantiated, passed config properties, and used
to index new or modified nodes. One method I could imagine this interface
having would be a listener for change events to be evaluated for whether or
not the index should be activated for a node change. For the lucene property
index, this method would return true if the property exists on that node.
For the RTree this method would return true if the node contained the
meta-data required for neo4j-spatial to recognize it as a spatial type?
Alternatively just an index method that does nothing when the nodes are not
to be indexed, and indexes when necessary?

So, are we now closer to having this kind of support?

On Tue, Jun 14, 2011 at 11:30 PM, Chris Gioran 
chris.gio...@neotechnology.com wrote:

 Good news everyone,

 A request that's often come up on the mailing list is a mechanism for
 automatically indexing properties of nodes and relationships.

 As of today's SNAPSHOT, auto-indexing is part of Neo4j which means nodes
 and relationships can now be indexed based on convention, requiring
 far less effort and code from the developer's point of view.

 Getting hold of an automatic index is straightforward:

 AutoIndexer<Node> nodeAutoIndexer = graphDb.index().getNodeAutoIndexer();
 AutoIndex<Node> nodeAutoIndex = nodeAutoIndexer.getAutoIndex();

 Once you've got an instance of AutoIndex, you can use it as a read-only
 Index<Node>.

 The AutoIndexer interface also supports runtime changes and
 enabling/disabling the auto indexing functionality.

 To support the new features, there are new Config
 options you can pass to the startup configuration map in
 EmbeddedGraphDatabase, the most important of which are:

 Config.NODE_AUTO_INDEXING (defaults to false)
 Config.RELATIONSHIP_AUTO_INDEXING (defaults to false)

 If set to true (independently of each other) these properties will
 enable auto indexing functionality and at the successful finish() of
 each transaction, all newly added properties on the primitives for which
 auto indexing is enabled will be added to a special AutoIndex (and
 deleted or changed properties will be updated accordingly too).

 There are options for fine-grained control to determine which
 properties are indexed, default behaviors and so forth. For example, by
 default all properties are indexed. If you want only the properties "name"
 and "age" for Nodes and "since" and "until" for Relationships
 to be auto indexed, simply set the initial configuration as follows:

 Config.NODE_KEYS_INDEXABLE = "name, age";
 Config.RELATIONSHIP_KEYS_INDEXABLE = "since, until";

 For the semantics of the auto-indexing operations, constraints and more
 detailed examples, see the documentation available at

 http://docs.neo4j.org/chunked/1.4-SNAPSHOT/auto-indexing.html
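
Putting the options above together, startup might look like this sketch (the store path is illustrative, and the exact constant values are best checked against the linked documentation):

```java
// Enable auto-indexing of selected node and relationship properties
// via the startup configuration map (1.4-SNAPSHOT era API).
Map<String, String> config = new HashMap<String, String>();
config.put(Config.NODE_AUTO_INDEXING, "true");
config.put(Config.RELATIONSHIP_AUTO_INDEXING, "true");
config.put(Config.NODE_KEYS_INDEXABLE, "name, age");
config.put(Config.RELATIONSHIP_KEYS_INDEXABLE, "since, until");
GraphDatabaseService graphDb = new EmbeddedGraphDatabase("path/to/db", config);
```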

 We're pretty excited about this feature since we think it'll make your
 lives
 as developers much more productive in a range of use-cases. If you're
 comfortable with using SNAPSHOT versions of Neo4j, please try it out
 and let us know what you think - we'd really value your feedback.

 If you're happier with using packaged milestones then this feature
 will be available from 1.4 M05 in a couple of weeks from now.


Re: [Neo4j] Traversals versus Indexing

2011-06-13 Thread Craig Taverner
Think of your domain model graph as a kind of index. Traversing that should
generally be faster than a generic index like lucene. Of course some things
do not graph well, and you should use lucene for those. But if you can find
something with a graph traversal, that is likely the way to go.

Also you should think of structuring the graph to suit the queries you plan
to perform. Then you will optimize the traversals.
On Jun 13, 2011 11:33 AM, espeed ja...@jamesthornton.com wrote:
 It depends on the traversal you are running.

 --
 View this message in context:
http://neo4j-user-list.438527.n3.nabble.com/Neo4j-Traversals-versus-Indexing-tp3057515p3057538.html
 Sent from the Neo4J User List mailing list archive at Nabble.com.


Re: [Neo4j] neo4j-spatial

2011-06-09 Thread Craig Taverner
Hi Saikat,

Yes, your explanation was clear, but I was busy with other work and failed
to respond - my bad ;-)

Anyway, your idea is nice. And I can think of a few ways to model this in
the graph, but at the end of the day the most important thing to decide
first is what queries are you going to perform? Do you want a creative map,
that while not drawn to scale, can still be asked questions like 'how far
from the roller-coaster to the closest lunch venue?'. That kind of question
could make use of the graph and the spatial extensions to provide an answer
and show the route on the creative map, even if it is not a real to-scale
map. Is that what you want to see?

You can also try contacting me on Skype.

Regards, Craig

On Thu, Jun 9, 2011 at 5:35 AM, Saikat Kanjilal sxk1...@hotmail.com wrote:


  Hi Craig,
  Following up on this thread, was this explanation clear? If so
  I'd like to talk more details.
  Regards

 From: sxk1...@hotmail.com
 To: user@lists.neo4j.org
 Subject: RE: [Neo4j] neo4j-spatial
 Date: Sun, 5 Jun 2011 20:15:27 -0700








  Hey Craig,
  Thanks for responding. So, to be clear, a theme park can have its
  own map created by the graphic artists that work at the theme park company.
  This map is sometimes a 2D or sometimes a 3D map that really has no notion
  of lat/long coordinates or GPS. What I am proposing is that we have the
  ability to inject GPS coordinates into this creative map through some
  mechanism that understands what the GPS coordinates of each point in this
  creative map are. So that's where the Google map comes in: the Google or
  Bing map would potentially have lat/long coordinates of every point in a
  theme park, so now the challenge is how do we transfer that knowledge inside
  this 2D or 3D creative map so that we can run neo4j traversal algorithms
  inside a map that has been injected with GPS data. A theme park is just the
  beginning; imagine having the power to inject this information into any 2D
  or 3D map - that would be pretty amazing. In essence I am doing this so
  that the creative map itself can use neo4j and be highly interactive and
  meaningful.
  Let me know if that's still unclear and if so let's talk on Skype.
 Regards

  Date: Mon, 6 Jun 2011 01:13:08 +0200
  From: cr...@amanzi.com
  To: user@lists.neo4j.org
  Subject: Re: [Neo4j] neo4j-spatial
 
  Hi Saikat,
 
  This sounds worth discussing further. I think I need to hear more about
 your
  use case. I do not know what the term 'creative map' means, and what
  traversals you are planning to do? When you talk about 'plotting points',
 do
  you mean you have a GPS and are moving inside a real theme park and want
 to
  see this inside google maps? Or are you just drawing a path on an
  interactive GIS?
 
  I think once I have some more understanding of what your use case is,
 what
  problem you are trying to solve, I am sure I will be able to give advice
 on
  how best to approach it, if it relates to anything else we are doing, or
  whether this is something you would need to put some coding time into :-)
 
  Regards, Craig
 
  On Sun, Jun 5, 2011 at 8:26 PM, Saikat Kanjilal sxk1...@hotmail.com
 wrote:
 
  
    Craig et al,
    I have an interesting use case that I've been thinking about and
   I was wondering if it would make a good candidate for inclusion inside
   neo4j-spatial, I've read through the wiki (
   http://wiki.neo4j.org/content/Collaboration_on_Spatial_Projects) and
 was
   interested in using neo4j-spatial to take any creative 2D Map and
   geo-enabling it.  To explain in more detail lets say you are at a
 certain
   latitude and longitude in a theme park inside a google map (or a bing
 map),
   now you want to have the ability to reference that same latitude and
   longitude inside a 2d or a 3d creative map of that theme park and then
 be
   able to plot these points and enable traversal algorithms inside the
   creative map.
   I was wondering if you guys are thinking about this usecase, if not I'd
   love to work on and discuss this in more detail to see whether this
 fits
   into the neo4j-spatial roadmap.
   Thoughts?


Re: [Neo4j] neo4j spatial bounding box vs. lat/lon

2011-06-08 Thread Craig Taverner
OK. I understand much better what you want now.

Your person nodes are not geographic objects, they are persons that can be
at many positions and indeed move around. However, the 'path' that they take
is a geographic object and can be placed on the map and analysed
geographically.

So the question I have is how do you store the path the person takes? Is
this a bunch of position nodes connected back to that person? Or perhaps a
chain of position-(next)-position-(next)-position, etc? However you have
stored this in the graph, you can express this as a geographic object by
implementing the GeometryEncoder interface. See, for example, the 6 lines of
code it takes to traverse a chain of NEXT locations and produce a LineString
geometry in the SimpleGraphEncoder at
https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/encoders/SimpleGraphEncoder.java#L82

If you do this, you can create a layer that uses your own geometry encoder (or
the SimpleGraphEncoder I referenced above, if you use the same graph
structure) and your own domain model will be expressed as LineString
geometries and you can perform spatial operations on them.
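
In the spirit of that encoder, the NEXT-chain walk might look like this sketch (property and relationship names are illustrative, not the actual SimpleGraphEncoder code; `firstPositionNode` is assumed to exist):

```java
// Walk the chain of NEXT relationships, collecting one coordinate per
// position node, then build a JTS LineString from the result.
List<Coordinate> coords = new ArrayList<Coordinate>();
Node current = firstPositionNode;
while (current != null) {
    coords.add(new Coordinate((Double) current.getProperty("lon"),
                              (Double) current.getProperty("lat")));
    Relationship next = current.getSingleRelationship(
            DynamicRelationshipType.withName("NEXT"), Direction.OUTGOING);
    current = (next == null) ? null : next.getEndNode();
}
LineString path = new GeometryFactory()
        .createLineString(coords.toArray(new Coordinate[coords.size()]));
```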

Alternatively, if your data is more static in nature, and you are analysing
only what the person did in the past, and the graph will therefore not
change, perhaps you do not care to store the locations in the graph, and you
can just import them as a LineString directly into a standard layer.

Whatever route you take, the final action you want to perform is to find
points near the LineString (path the person took). I do not think the
bounding box is the right approach for that either. You need to try, for
example, the method findClosestEdges in the utilities class at
https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/SpatialTopologyUtils.java#L115

This method can find the part of the person's path that is closest to the
point of interest. There are also many other geographic operations you might
be interested in trying, once you have a better feel for the types of queries
you want to ask.

Regards, Craig

On Wed, Jun 8, 2011 at 2:17 AM, Boris Kizelshteyn bo...@popcha.com wrote:

 Thanks for the detailed response! Here is what I'm trying to do and I'm
 still not sure how to accomplish it:

 1. I have a node which is a person

 2. I have geo data as that person moves around the world

 3. I use the geodata to create a bounding box of where that person has been
 today

 4. I want to say, was this person A near location X today?

 5. I do this by seeing if location X is in A's bounding box.

 From looking at what you suggest doing, it's not clear how I assign the node
 person A to a layer? Is it that the bounding box is now in the layer and not
 in the node? The issue then becomes, how do I associate the two, as the RTree
 relationship seems to establish itself on the bounding box between the node
 and the layer.

 Many thanks for your patience as I learn this challenging material.

 On Tue, Jun 7, 2011 at 4:13 PM, Craig Taverner cr...@amanzi.com wrote:

  I think you need to differentiate the bounding boxes of the data in the
  layer (stored in the database), and the bounding box of the search query.
  The search query is not stored in the database, and will not be seen as a
  node or nodes in the database. So if you want to search for data within
  some
  bounding box or polygon, then express that in the search query, and you
 do
  not need to care about how your nodes are stored in the database.
 
  So when you say you want to make a larger bounding box, I assume you are
  talking about the query itself. The REST API has the method
  findGeometriesInLayer, which takes minx, maxx, miny, maxy parameters and
  you
  can set those to whatever you want for your query.
 
  The REST API also exposes the CQL query language supported by GeoTools.
  This
  allows you to perform SQL-like queries on geometries and feature
  attributes.
  For example, you can search for all objects within a specific polygon
 (not
  just a rectangular bounding box), as well as conforming to certain
  attributes. See
 
  http://docs.geoserver.org/latest/en/user/tutorials/cql/cql_tutorial.html for
  some examples of CQL.
 
  However, our current CQL support is not fully integrated with the RTree
  index. This means that the CQL itself will not benefit from the index,
 but
  be a raw search. You can, however, still get the benefit of the index by
  passing in the bounding box separately. So, for example, you want to
 search
  for data in a polygon. Make the polygon object, get its bounding box and
  also the CQL query string. Then make a 'dynamic layer' using the CQL
 (which
  is a bit like making a prepared statement

Re: [Neo4j] neo4j spatial bounding box vs. lat/lon

2011-06-07 Thread Craig Taverner
Hi,

The bounding boxes are used by the RTree index, which is a typical way to
index spatial data. For Point data, the lat/long and the bounding box are
the same thing, but for other shapes (streets/LineString and Polygons), the
bounding box is quite different to the actual geometry (which is not just a
single lat/long, but a set of connected points forming a complex shape).

The RTree does not differentiate between points and other geometries,
because it cares only about the bounding box, and therefore we provide that
even for something as simple as a Point.
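To illustrate the point in plain Ruby (illustrative code, not the Neo4j Spatial API): a Point's bounding box degenerates to the point itself, while a LineString's bounding box spans all of its member coordinates.

```ruby
# Compute the [minx, miny, maxx, maxy] bounding box of a coordinate list.
def bbox(coords)
  xs = coords.map(&:first)
  ys = coords.map(&:last)
  [xs.min, ys.min, xs.max, ys.max]
end

point_bbox = bbox([[13.0, 55.6]])                              # a Point
line_bbox  = bbox([[13.0, 55.6], [13.4, 55.8], [13.2, 55.5]])  # a LineString
```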

Does that answer the question?

Regards, Craig

On Tue, Jun 7, 2011 at 4:57 PM, Boris Kizelshteyn bo...@popcha.com wrote:

 Greetings!

 Perhaps someone using neo4j-spatial can answer this seemingly simple
 question. Nodes classified into layers have both lat/lon properties and
 bounding boxes; the bounding box seems to be required to establish the
 relationship between node and layer. However, the node is not found if the
 lat/lon does not match the query. Can someone explain the relationship
 between these two properties on a node?

 Many thanks!


Re: [Neo4j] neo4j spatial bounding box vs. lat/lon

2011-06-07 Thread Craig Taverner
I think you need to differentiate the bounding boxes of the data in the
layer (stored in the database), and the bounding box of the search query.
The search query is not stored in the database, and will not be seen as a
node or nodes in the database. So if you want to search for data within some
bounding box or polygon, then express that in the search query, and you do
not need to care about how your nodes are stored in the database.

So when you say you want to make a larger bounding box, I assume you are
talking about the query itself. The REST API has the method
findGeometriesInLayer, which takes minx, maxx, miny, maxy parameters and you
can set those to whatever you want for your query.

The REST API also exposes the CQL query language supported by GeoTools. This
allows you to perform SQL-like queries on geometries and feature attributes.
For example, you can search for all objects within a specific polygon (not
just a rectangular bounding box), as well as conforming to certain
attributes. See
http://docs.geoserver.org/latest/en/user/tutorials/cql/cql_tutorial.html for
some examples of CQL.

However, our current CQL support is not fully integrated with the RTree
index. This means that the CQL itself will not benefit from the index, but
be a raw search. You can, however, still get the benefit of the index by
passing in the bounding box separately. So, for example, you want to search
for data in a polygon. Make the polygon object, get its bounding box and
also the CQL query string. Then make a 'dynamic layer' using the CQL (which
is a bit like making a prepared statement). Then perform the same
'findGeometriesInLayer' method mentioned above, using the bounding box and
the dynamic layer (containing the CQL). This has the effect of using the
RTree index for a first approximate search, followed by pure CQL for the
final mile.
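The two-phase idea can be sketched in plain Ruby (illustrative only; in Neo4j Spatial the real work is done by the RTree index and GeoTools/CQL): a coarse bounding-box pass stands in for the index, and an exact point-in-polygon test stands in for the "final mile".

```ruby
# Bounding box of a polygon given as [[x, y], ...] coordinates.
def bbox_of(polygon)
  xs = polygon.map(&:first)
  ys = polygon.map(&:last)
  [xs.min, ys.min, xs.max, ys.max]
end

# Phase 1: cheap rectangle test (what an RTree index does at scale).
def in_bbox?(b, pt)
  pt[0].between?(b[0], b[2]) && pt[1].between?(b[1], b[3])
end

# Phase 2: exact ray-casting point-in-polygon test (the "final mile").
def in_polygon?(poly, pt)
  inside = false
  j = poly.size - 1
  poly.each_with_index do |(xi, yi), i|
    xj, yj = poly[j]
    if (yi > pt[1]) != (yj > pt[1]) &&
       pt[0] < (xj - xi) * (pt[1] - yi) / (yj - yi) + xi
      inside = !inside
    end
    j = i
  end
  inside
end

triangle   = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
candidates = [[1.0, 1.0], [3.5, 3.5], [9.0, 9.0]]
coarse = candidates.select { |pt| in_bbox?(bbox_of(triangle), pt) }
exact  = coarse.select { |pt| in_polygon?(triangle, pt) }
```

Note how [3.5, 3.5] survives the bounding-box pass but is rejected by the exact test, which is exactly why the second phase is needed.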

See examples of this in action in the Unit tests in the source code.
https://github.com/neo4j/neo4j-spatial/blob/master/src/test/java/org/neo4j/gis/spatial/ServerPluginTest.java#L109
has
examples of CQL queries on the REST API.

On Tue, Jun 7, 2011 at 5:48 PM, Boris Kizelshteyn bo...@popcha.com wrote:

 Thanks! So it seems you are saying that the bounding box represents a
 single
 point and is the same as the lat/lon? What if I make the bounding box
 bigger? What I am trying to do is geo queries against a bounding box made
 of
 a set of points, rather than individual points. So the query is, find the
 nodes where the given point falls inside their bounding boxes. Can I do
 this
 with REST?

 Thanks!

 On Tue, Jun 7, 2011 at 11:34 AM, Craig Taverner cr...@amanzi.com wrote:

  Hi,
 
  The bounding boxes are used by the RTree index, which is a typical way to
  index spatial data. For Point data, the lat/long and the bounding box are
  the same thing, but for other shapes (streets/LineString and Polygons),
 the
  bounding box is quite different to the actual geometry (which is not just
 a
  single lat/long, but a set of connected points forming a complex shape).
 
  The RTree does not differentiate between points and other geometries,
  because it cares only about the bounding box, and therefore we provide
 that
  even for something as simple as a Point.
 
  Does that answer the question?
 
  Regards, Craig
 
  On Tue, Jun 7, 2011 at 4:57 PM, Boris Kizelshteyn bo...@popcha.com
  wrote:
 
   Greetings!
  
   Perhaps someone using neo4j-spatial can answer this seemingly simple
   question. Nodes classified into layers have both lat/lon properties and
   bounding boxes; the bounding box seems to be required to establish the
   relationship between node and layer. However, the node is not found if
   the lat/lon does not match the query. Can someone explain the
   relationship between these two properties on a node?
  
   Many thanks!


Re: [Neo4j] Sample Linear Referencing Functions in Neo4j Spatial and GSoC

2011-06-07 Thread Craig Taverner
Done.

Although now we have 20 lines of comments for 1 line of method code.
Previously we had 4 lines of comments for one line of code. Whew!

On Tue, Jun 7, 2011 at 11:02 AM, Peter Neubauer 
peter.neuba...@neotechnology.com wrote:

 Very cool.
 Maybe you could just doc the parameters more than pointing to the Oracle
 reference, so one can see it directly in the JavaDoc?

 Cheers,

 /peter neubauer

 GTalk:  neubauer.peter
 Skype   peter.neubauer
 Phone   +46 704 106975
 LinkedIn   http://www.linkedin.com/in/neubauer
 Twitter  http://twitter.com/peterneubauer

 http://www.neo4j.org   - Your high performance graph database.
 http://startupbootcamp.org/- Öresund - Innovation happens HERE.
 http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.


 On Thu, Jun 2, 2011 at 2:13 PM, Craig Taverner cr...@amanzi.com wrote:

  Hi,
 
  Recently someone asked a question on StackOverflow about whether Neo4j
  Spatial was capable of one of the Oracle geoprocessing functions,
  SDO_LRS.LOCATE_PT
  specifically. Since this is related to the ongoing GSoC projects for
 Neo4j
  Spatial, I thought I would do a quick investigation. What I found was
 that
  the requested capabilities are available in JTS (which we include in
 Neo4j
  Spatial), but with very different names. The code to achieve this in JTS
 is
  'new LengthIndexedLine(geometry).extractPoint(measure,offset)'. I have
  wrapped these in the
  SpatialTopologyUtils.locatePoint(geometry,measure,offset), so that it is
  accessible together with some other spatial topology functions, and also
  looks more like the Oracle function.
 
  I pushed this to github, and think it can be included as a prototype for
  the
  discussions for the GSoC on Geoprocessing.
 
  Regards, Craig


Re: [Neo4j] GSoC 2011 Neo4j Geoprocessing | Weekly Report #2

2011-06-07 Thread Craig Taverner
I suggest you code review them first. Especially since there are API
changes.

On Tue, Jun 7, 2011 at 10:11 AM, Peter Neubauer 
peter.neuba...@neotechnology.com wrote:

 Very nice Andreas!

 You consider it safe to pull these changes into the main repo?

 Cheers,

 /peter neubauer

 GTalk:  neubauer.peter
 Skype   peter.neubauer
 Phone   +46 704 106975
 LinkedIn   http://www.linkedin.com/in/neubauer
 Twitter  http://twitter.com/peterneubauer

 http://www.neo4j.org   - Your high performance graph database.
 http://startupbootcamp.org/- Öresund - Innovation happens HERE.
 http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.


 On Sun, Jun 5, 2011 at 1:39 PM, Andreas Wilhelm a...@kabelbw.de wrote:

  Hi,
 
  This week I implemented update and search capability for spatial
  functions and following spatial functions with JUnit tests:
 
  ST_AsText, ST_AsKML, ST_AsGeoJSON, ST_AsBinary and ST_Reverse.
 
 
  Best Regards
 
  Andreas


Re: [Neo4j] [SoC] Re: GSoC 2011 Weekly report - OSM data mining and editing capabilities in uDig and Geotools

2011-06-05 Thread Craig Taverner
Hi Mirco,

Sounds like progress. Some suggestions:

   - I do not think you need to change the code for neo4j and udig, but only
   for neo4j-spatial and udig-community/neo4j. It is OK to make clones of those
   so you have the code for review, but they are quite core, and you should not
   need to actually change them.
   - Focus on neo4j-spatial and udig-community/neo4j, which are the two
   projects you will certainly make changes to. All uDig GUI changes can be
   made in udig-community/neo4j.
   - You might even want to make a new udig plugin in a new git project,
   perhaps udig-community/osm, for the OSM editor work. The neo4j plugin would
   provide the communication layer for neo4j and any neo4j data sources, while
   the OSM plugin would provide OSM specific features, including the additional
   views and editors required to support a complete 'OSM Editor' capability.

Regards, Craig

On Sun, Jun 5, 2011 at 1:51 AM, Mirco Franzago mircofranz...@gmail.comwrote:

 Weekly report #2
 ==What I did==
 - The main work was to set up the whole development environment: Eclipse +
 uDig + Neo4j.
 - I forked the repositories on GitHub for my code: [0], [1] and [2] are
 respectively the repositories for udig, neo4j and neo4j-spatial.
 - The target was to have Eclipse with the uDig SDK taken from GitHub, just
 as for neo4j, to be able to commit the udig code and the neo4j code from
 the same environment.
 - I set up the Apache Maven tool and the EGit plugin to be able to use
 them directly from Eclipse.
 - After these steps, and some fighting against the jars to import, it was
 possible to execute uDig with the Neo4j plugins and to test the main
 functionalities.
 - I started the code analysis to understand where to put my hands next
 week :-)

 ==Next week plan==
 - Fix some remaining problems (as a new git user) with the commit command.
 - Finally start the real coding after the initial head-cracking problems.

 [0] https://github.com/mircofranzago/udig-platform
 [1] https://github.com/mircofranzago/neo4j
 [2] https://github.com/mircofranzago/neo4j-spatial




 2011/5/31 Mirco Franzago mircofranz...@gmail.com

 Hi all,
 I am Mirco Franzago and I started to work to my google summer of code 2011
 project. I weekly will update this thread to let the community know about
 the work done and the work that will do.
 Last week I could not do much because I was very busy with my last exam
 before summer. Now I'm ready to start this new job.



 ___
 SoC mailing list
 s...@lists.osgeo.org
 http://lists.osgeo.org/mailman/listinfo/soc




[Neo4j] Sample Linear Referencing Functions in Neo4j Spatial and GSoC

2011-06-02 Thread Craig Taverner
Hi,

Recently someone asked a question on StackOverflow about whether Neo4j
Spatial was capable of one of the Oracle geoprocessing functions, SDO_LRS.LOCATE_PT
specifically. Since this is related to the ongoing GSoC projects for Neo4j
Spatial, I thought I would do a quick investigation. What I found was that
the requested capabilities are available in JTS (which we include in Neo4j
Spatial), but with very different names. The code to achieve this in JTS is
'new LengthIndexedLine(geometry).extractPoint(measure,offset)'. I have
wrapped these in the
SpatialTopologyUtils.locatePoint(geometry,measure,offset), so that it is
accessible together with some other spatial topology functions, and also
looks more like the Oracle function.
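For readers unfamiliar with length-indexed lines, here is a plain-Ruby sketch of the core interpolation that LengthIndexedLine performs (offset handling omitted; this is not the JTS API itself, just the idea):

```ruby
# Return the [x, y] point lying at distance `measure` along a polyline,
# measured from its start; a measure past the end clamps to the end point.
def locate_point(line, measure)
  traveled = 0.0
  line.each_cons(2) do |(x1, y1), (x2, y2)|
    seg = Math.hypot(x2 - x1, y2 - y1)
    if traveled + seg >= measure
      t = (measure - traveled) / seg  # fraction along this segment
      return [x1 + t * (x2 - x1), y1 + t * (y2 - y1)]
    end
    traveled += seg
  end
  line.last
end

line = [[0.0, 0.0], [3.0, 0.0], [3.0, 4.0]]  # total length 7.0
```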

I pushed this to github, and think it can be included as a prototype for the
discussions for the GSoC on Geoprocessing.

Regards, Craig


Re: [Neo4j] path finding using OSM ways

2011-05-31 Thread Craig Taverner
Hi Bryce,

Nice to see you back.

The OSM data model in Neo4j-Spatial, created by the OSMImporter, is designed
to mimic the complete contents of the XML files provided for OSM. As it is,
this is not ideal for routing because it traces the complete set of nodes
for the ways, while for routing you really want a graph that connects each
waypoint by a single relationship. So, if I were to perform routing on top
of the OSM model, I would actually build an overlay graph that just connects
the waypoints. The current model has a vertex called a 'way', but that is
not a way-point, because it represents the entire way (eg. a street). We
would need to do the following:

   - Identify ways that are streets (as opposed to non-routing types like
   regions, buildings, lakes, etc.)
   - Identify the points that are intersections (way-points)
   - Create a way-point node for these
   - Add relationships between way points if they are connected by streets
   in the OSM model
   - Weight the relationships by the length of the streets
   - Then apply the A* algorithm (which I have no experience with myself,
   but others in neo4j certainly do)

I think everything but the last part would be very easy to add to the
OSMImporter itself, so that the routing graph exists in any OSM model. Today
it does not exist, and routing would be more difficult and expensive (since
you would have to traverse a much more complex graph, unnecessarily).
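As a rough plain-Ruby sketch of the last two steps above (the weighted way-point graph and its search; the road data and names are made up, and plain Dijkstra stands in for A*):

```ruby
# Dijkstra over an adjacency hash {node => {neighbor => weight}}.
# Returns [path, total_cost] from source to target.
def shortest_path(graph, source, target)
  dist = Hash.new(Float::INFINITY)
  dist[source] = 0.0
  prev = {}
  unvisited = graph.keys.to_a
  until unvisited.empty?
    u = unvisited.min_by { |n| dist[n] }
    break if u == target || dist[u] == Float::INFINITY
    unvisited.delete(u)
    graph[u].each do |v, w|
      if dist[u] + w < dist[v]
        dist[v] = dist[u] + w
        prev[v] = u
      end
    end
  end
  path = [target]
  path.unshift(prev[path.first]) while prev[path.first]
  [path, dist[target]]
end

# Way-points connected by relationships weighted with street length.
roads = {
  "A" => { "B" => 2.0, "C" => 5.0 },
  "B" => { "A" => 2.0, "C" => 1.0 },
  "C" => { "A" => 5.0, "B" => 1.0 },
}
```

The point of the overlay graph is precisely that the algorithm only ever touches way-points and weighted edges, never the full chain of OSM geometry nodes.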

Regards, Craig

On Tue, May 31, 2011 at 4:31 AM, bryce hendrix brycehend...@gmail.comwrote:

 I am finally getting back to experimenting with Neo4j. Because it has been
 a
 while since I last looked at it, I've forgotten just about everything. I
 want to start with something simple, is there any sample code which does A*
 path finding over OSM ways?

 Thanks,
 Bryce


Re: [Neo4j] Embedded with webadmin

2011-05-25 Thread Craig Taverner
While HA is one option, with two processes 'sharing' a database, one being
the server and the other the embedded app, there is another option, and that
is to integrate the two apps. If your app is a web-app and also needs to
run in something like Jetty or Winstone, perhaps you could run both the
server and your app together in the same process?
this is to write your app as a server extension within the neo4j-server
extensions API. I suspect there are other ways to do this where your app is
in control and simply accesses (and starts) the relevant code from
neo4j-server, but I don't know how to do that. Could be interesting to find
out.

On Tue, May 24, 2011 at 11:39 PM, Adriano Henrique de Almeida 
adrianoalmei...@gmail.com wrote:

 Yep,

 the Neo4j server is just a REST API over the Neo4j database, so the data is
 still stored on disk. So all you need to do is point your Java application
 to the Neo4j db directory.

 Remember that you'll be unable to start both your app and the Neo4j server
 at the same time against the same database. For that situation, you'll need
 Neo4j HA.

 Regards

 2011/5/24 Chris Baranowski pharcos...@gmail.com

  Hi all,
 
  I searched this mailing list some but couldn't find a definitive answer:
  is it possible to use the web admin with an embedded neo4j database? I'd
  like to run embedded in my project and also be able to administrate
 online.
 
  Thanks!
  Chris
 



 --
 Adriano Almeida
 Caelum | Ensino e Inovação
 www.caelum.com.br


Re: [Neo4j] finding nodes between two nodes

2011-05-18 Thread Craig Taverner
If you remove the depth=1, and specify the direction, you can get to the
excluded dishes in one traversal:

relationships = [{"type" => "answered", "direction" => "outgoing"},
                 {"type" => "excludes", "direction" => "outgoing"}]


That will simplify the code a lot.

It does not get to the safe dishes in one traversal, but at least it moved
three down to two.
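The combined traversal plus set-difference can be sketched with an in-memory stand-in for the graph (plain Ruby, not the Neography API; the data is illustrative):

```ruby
# Tiny adjacency structure standing in for the real graph.
graph = {
  "Joy"         => { "answered" => ["Nut Allergy"] },
  "Nut Allergy" => { "excludes" => ["Nut Salad"] },
  "Menu"        => { "dish"     => ["Pasta", "Nut Salad"] },
}

# Breadth-first traversal following the given relationship types
# in one pass, returning every node reached.
def traverse(graph, start, rel_types)
  frontier = [start]
  visited = []
  until frontier.empty?
    node = frontier.shift
    rel_types.each do |rel|
      ((graph[node] || {})[rel] || []).each do |nbr|
        next if visited.include?(nbr)
        visited << nbr
        frontier << nbr
      end
    end
  end
  visited
end

# One traversal for the excluded dishes, one lookup for the menu,
# then a set difference for the safe dishes.
excluded   = traverse(graph, "Joy", ["answered", "excludes"])
all_dishes = graph["Menu"]["dish"]
safe       = all_dishes - excluded
```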

On Wed, May 18, 2011 at 3:11 PM, noppanit noppani...@gmail.com wrote:

     Customer              Menu
        |                 /    \
  [rel:customer]   [rel:dish]  [rel:dish]
        |               /          \
       Joy           Pasta      Nut Salad
        |                           |
  [rel:answered]                    |
        |                           |
    Nut Allergy --[rel:excludes]----+

 So basically this is what I wanted to do and this is the shorter version of
 my graph. I want to get all the dishes that Joy is not allergic to. So, at
 the end I want to get Pasta which Joy is not allergic to, because Joy
 answered the question that she is allergic to Nut, and Nut salad contains
 Nut.

 The solution that I'm using right now is: traverse to get all the dishes
 from the menu node and store them in one array. In a second traversal I get
 all the dishes that Joy is allergic to, by following the "answered"
 relationships from Joy and then the "excludes" relationships. Then I take
 the difference of the two arrays to get all the dishes that Joy can eat. I
 was wondering: could I get all the dishes that Joy could eat in just one
 traversal? This is the code that I use to do that. I'm using Ruby and
 Neography.

  def get_excluded_dishes_for_customer(customer_name)
    customer = neo.get_index('customersIndex', 'name', customer_name)
    answered_dishes = neo.traverse(customer, "nodes",
      {"relationships" => [{"type" => "answered", "direction" => "all"}],
       "depth" => 1})

    @array_of_answered_dishes = prepares_data(answered_dishes)

    @array_of_excluded_dishes = Array.new

    @array_of_answered_dishes.each do |answered_dish|
      a_dish = neo.get_index('fredsIndex', 'name', answered_dish[:text])
      excluded_dishes = neo.traverse(a_dish, "nodes",
        {"relationships" => [{"type" => "excludes", "direction" => "all"}],
         "depth" => 1})

      prepared_excluded_dishes = prepares_data(excluded_dishes)
      prepared_excluded_dishes.each do |text|
        @array_of_excluded_dishes << text[:text]
      end
    end

    # Only unique dishes
    @array_of_excluded_dishes = @array_of_excluded_dishes.uniq

    @all_dishes = prepares_data(get_all_dishes)
    @only_dish_names = Array.new

    @all_dishes.each do |text|
      @only_dish_names << text[:text]
    end

    @array_of_excluded_dishes = @only_dish_names - @array_of_excluded_dishes

    return @array_of_excluded_dishes
  end

 Thank you very much.

 --
 View this message in context:
 http://neo4j-user-list.438527.n3.nabble.com/Neo4j-finding-nodes-between-two-nodes-tp2938387p2956858.html
 Sent from the Neo4J User List mailing list archive at Nabble.com.


Re: [Neo4j] Color suggestions for the Self-Relationship bike shed

2011-05-17 Thread Craig Taverner
What about a system config option enabling/disabling loops? Then we could
have option 1, but people that never use loops can still get the extra loop
check by setting the config option.

On Tue, May 17, 2011 at 2:01 AM, Stephen Roos sr...@careerarcgroup.comwrote:

 We are not going to use loops, but would still vote for #1. Checking
 against loops seems more like business logic that Neo4j clients should be
 responsible for.

 -Original Message-
 From: Tobias Ivarsson [mailto:tobias.ivars...@neotechnology.com]
 Sent: Monday, May 16, 2011 7:02 AM
 To: Neo user discussions
 Subject: Re: [Neo4j] Color suggestions for the Self-Relationship bike shed

 Does anyone NOT planning to use loops have an opinion in the matter?
 That would be very valuable input.

 Cheers,
 --
 Tobias Ivarsson tobias.ivars...@neotechnology.com
 Hacker, Neo Technology
 www.neotechnology.com
 Cellphone: +46 706 534857



Re: [Neo4j] Timeline index

2011-05-09 Thread Craig Taverner
Very good points.

But I must admit that there is a demand for automatic indexing. I personally
am not using it, but I would like prepared indexes: indexes that can be
configured up front, after which you simply add the node. I see your point
about this implying more schema (in the index preparation), but I do not see
that as avoidable.

I think (or hope) that for automatic indexes, the criteria for how a node
qualifies for indexing would be defined by the developer, hopefully with
code, so it can be very general and flexible. For example, I guess that
whenever a node is added to the graph, an event is triggered to pass the
node to any listeners that look for patterns to match. For performance I
guess there should be some simple patterns like the existence of some
property to index, but it would be good if the user can define the code to
be called, so more complex cases can be considered, like exploring the local
sub-graph and indexing based on some more complex criteria. Certainly the
user will then have the power to hurt performance, but that is currently the
case anyway :-)

On Mon, May 9, 2011 at 8:07 PM, Niels Hoogeveen
pd_aficion...@hotmail.comwrote:


 Automatic indexes could be a very nice feature, though personally I would
 very much like to maintain the ability to manually index nodes and
 relationships. There are situations where I store a different value in a
 property than I store in the index (string properties containing html tags,
 but indexes that store those same values with the html tags stripped). There
 are also situations where the indexed node is not the node that actually
 contains the property being indexed (eg. in quad-store layout, a value node
 contains the property, but the related node is used in the index). I can
 also conceive of indexes where there is not even a stored property value
 involved.
 Having an automatic index would certainly make things easier in some
 scenarios, but it's not easy to create an automatic indexing mechanism that
 works for all possible use cases.
 I am also a little bit concerned about such a feature, because it would
 result in schema-creep. One of the most powerful features I find in Neo4J is
 how storage and schema are completely independent. In fact the store can be
 used without any schema at all, while at the same time the store can be used
 to persist a schema if that is needed.
 One of the things I disliked about table based databases is the mixing of
 storage and schema. It is impossible to define an entity without defining a
 table, which immediately creates a schema entity. Having strict separation
 of storage and schema is one of the reasons NOSQL databases are so flexible.
 Such separation makes it possible to invent different types of schemata for
 different use cases.
 When I still used relational databases, I always ended up replicating the
 schema facility of the underlying database to add more meta information to
 the database. Being able to roll my own schema facility is therefore one of
 the key features that made Neo4J such an attractive option. If more schema
 facilities would eventually creep into the kernel, those advantages would
 slowly dissipate.

  Date: Mon, 9 May 2011 18:34:10 +0200
  From: cr...@amanzi.com
  To: user@lists.neo4j.org
  Subject: Re: [Neo4j] Timeline index
 
  +10 for both of Niels' responses. I think both external and in-graph
 indexes
  should be supported.
 
  The last time I talked to Mattias about this it sounded like the only
 really
  clean option for integrating them behind one API would be once automatic
  indexes are supported, because at that point indexes get configured
 up-front
  (like the BTree and RTree) and then simply used (behind the scenes in
  automated indexes). I'm hoping automatic indexes are planned for 1.4,
 then
  all of this can come together :-)
 
  On Mon, May 9, 2011 at 3:14 PM, Niels Hoogeveen
  pd_aficion...@hotmail.comwrote:
 
  
   Rick, I am looking forward to the results of your investigation. I see
 a
   need for both external search mechanisms (Lucene, and possible Solr),
 as
   well as in-graph search mechanisms based on constrained traversals (eg.
   Timeline index based on a Btree and the Rtree index used in
 neo4j-spatial).
   Any progress in either direction is most welcome.
  
From: rick.bullo...@thingworx.com
To: matt...@neotechnology.com; user@lists.neo4j.org
Date: Mon, 9 May 2011 03:57:13 -0700
Subject: Re: [Neo4j] Timeline index
   
Niels/Mattias: we are also exploring a Solr implementation for the
 index
   framework.  There are some potential benefits using Solr in a large
   graph/HA/distributed scenario that we are investigating.  The tough
 part is
   the distributed transactioning, though.
   
   
- Reply message -
From: Mattias Persson matt...@neotechnology.com
Date: Mon, May 9, 2011 6:14 am
Subject: [Neo4j] Timeline index
To: Neo4j user discussions user@lists.neo4j.org
   
2011/4/12 Niels Hoogeveen 

Re: [Neo4j] Timeline index

2011-05-09 Thread Craig Taverner
I'm confident that given the history of neo4j, there will be no forcing of a
schema :-)

And I'm thinking of previous developments that added convenience and value,
like jo4neo, neo4j.rb, even the meta-model. Useful, but no-one was ever
forced or even pushed to use them. I hope the new automatic indexing will be
likewise a convenient alternative to consider.

On Mon, May 9, 2011 at 10:17 PM, Niels Hoogeveen
pd_aficion...@hotmail.comwrote:


 Mattias/Craig,
 Of course I don't want to deny people the opportunity to have easy indexing
 features, as long as it remains optional and doesn't lead to schema-creep
 into the Neo4j kernel.
 Having configurable event handlers that allow for automatic indexing, while
 maintaining the possibility to manually maintain indices sounds like a
 reasonable solution.
 Over the last year I have dedicated many hours to create my own schema
 driven CMS in Neo4J, which makes me vigilant to make sure the Neo4j kernel
 remains as schema-less as possible. see also:
 http://lists.neo4j.org/pipermail/user/2011-May/008431.html
 Adding schema/type/class information to Neo4j is likely to be much in
 demand the bigger applications grow, and I applaud all developments in those
 directions, as long as they remain optional. The schema needs for my
 application may differ very much from the schema needs in other
 applications, making it important not to add too many assumptions in the
 neo4j kernel. Having property keys and relationship labels is, as far as I
 am concerned the right dose of schema at the kernel level.


  Date: Mon, 9 May 2011 20:50:56 +0200
  From: matt...@neotechnology.com
  To: user@lists.neo4j.org
  Subject: Re: [Neo4j] Timeline index
 
  2011/5/9 Niels Hoogeveen pd_aficion...@hotmail.com
 
  
   Automatic indexes could be a very nice feature, though personally I
 would
   very much like to maintain the ability to manually index nodes and
   relationships. There are situations where I store a different value in
 a
   property than I store in the index (string properties containing html
 tags,
   but indexes that store those same values with the html tags stripped).
 There
   are also situations where the indexed node is not the node that
 actually
   contains the property being indexed (eg. in quad-store layout, a value
 node
   contains the property, but the related node is used in the index). I
 can
   also conceive of indexes where there is not even a stored property
 value
   involved.
   Having an automatic index would certainly make things easier in some
   scenarios, but it's not easy to create an automatic indexing mechanism
 that
   works for all possible use cases.
   I am also a little bit concerned about such a feature, because it would
   result in schema-creep. One of the most powerful features I find in
 Neo4J is
   how storage and schema are completely independent. In fact the store
 can be
   used without any schema at all, while at the same time the store can be
 used
   to persist a schema if that is needed.
   One of the things I disliked about table based databases is the mixing
 of
   storage and schema. It is impossible to define an entity without
 defining a
   table, which immediately creates a schema entity. Having strict
 separation
   of storage and schema is one of the reasons NOSQL databases are so
 flexible.
   Such separation makes it possible to invent different types of schemata
 for
   different use cases.
   When I still used relational databases, I always ended up replicating
 the
   schema facility of the underlying database to add more meta information
 to
   the database. Being able to roll my own schema facility is therefore
 one of
   the key features that made Neo4J such an attractive option. If more
 schema
   facilities would eventually creep into the kernel, those advantages
 would
   slowly dissipate.
  
 
  These issues with automatic indexing are exactly those that I struggle
 with
  when I try to get my head around automatic indexing. At its core I don't
  like it, because it takes away control, but for 80% of the use cases I
 think
  it'd be useful. I don't think that neo4j will ever be strict schematic in
  any way, although some inferred types could possibly be implemented in
 some
  way, via TransactionEventHandlers.
 
  A couple of months ago I played around with auto indexing as a lab
 project
  and ended up with the exact same solution that Craig just replied with. So
  I'd say that, or the middle way of preconfiguring indexes up front, would
  pretty much make most people happy IMHO.
 
  Date: Mon, 9 May 2011 18:34:10 +0200
 
From: cr...@amanzi.com
To: user@lists.neo4j.org
Subject: Re: [Neo4j] Timeline index
   
 +10 for both of Niels' responses. I think both external and in-graph
   indexes
should be supported.
   
The last time I talked to Mattias about this it sounded like the only
   really
clean option for integrating them behind one API would be once
 automatic
indexes are 

Re: [Neo4j] First-class type property on relationships but not nodes; why?

2011-05-05 Thread Craig Taverner
Another view of things would be to say that ideally there should be no first
class type on either relationships or nodes, since that is a domain specific
concept (as Niels says he wants two types, but Rick wants one, and some
object models type nodes by relating them to a separate node representing a
class).

Then the addition of a type to a relationship is, in my opinion, a
performance optimization for many graph algorithms, since the traverser will
perform well if it has 'first class' access to this information instead of
hitting the property store. I guess this is my take on Tobias' point that the
type is a navigational feature.

Now I wonder whether the traverser, and many known graph algorithms, could
be made faster or easier to code if nodes also had first class types? I
don't know the answer to this, but assume that if it did really help, it
would have been done already ;-)

On Thu, May 5, 2011 at 3:52 PM, Niels Hoogeveen
pd_aficion...@hotmail.com wrote:


 The meta model component (though in need of some attention), already allows
 the typing of a node. An important difference between the typing in the meta
 model component and the suggestion made in this thread is the fact that a
 node according to the meta model can have more than one type, while the
 RelationshipType in kernel only allows one type per relationship.
 For my modeling need, the ability to assign more than one type per node is
 essential. Adding a singular type at kernel level would only make things
 more confusing.
 I would go further than what Tobias says and would say RelationshipType is
 nothing but a name, just like various properties have names. Types would
 require much more information, like cardinality, source/target constraints
 etc. Those are all part of the meta model where they belong.

  From: tobias.ivars...@neotechnology.com
  Date: Thu, 5 May 2011 15:33:04 +0200
  To: user@lists.neo4j.org
  Subject: Re: [Neo4j] First-class type property on relationships but not
 nodes; why?
 
  The RelationshipType isn't a type. It is a navigational feature.
 
  I've slapped this link around for a few years now, every time this
 question
  has been brought up:
  http://lists.neo4j.org/pipermail/user/2008-October/000848.html
 
  The fact that RelationshipType is a navigational feature and not a type
  means that there is in fact already a corresponding thing for nodes: the
  Indexes.
 
  But I agree that there would be things we could gain by adding a Type
  concept to Nodes. Such as for example better automatic indexing. But I
 don't
  know what it would look like. And I want it to be clear that such a
 feature
  is very different from what RelationshipType is today.
 
  Cheers,
  Tobias
 
  On Thu, May 5, 2011 at 10:29 AM, Aseem Kishore aseem.kish...@gmail.com
 wrote:
 
   I've found it interesting that Neo4j has a mandatory type property on
   relationships, but not nodes. Just curious, what's the reasoning behind
 the
   design having this distinction?
  
   If you say you need to know what type of relationship these two nodes
   have, I would reply, don't you also need to know what type of nodes
 they
   are, as well?
  
   Similarly, if you say because there can be many different types of
   relationships, I would reply, there can also be many different types
 of
   nodes, and in both cases, there doesn't need to be.
  
   A perfect example is in the documentation/tutorial: movies and actors.
 Just
   the fact that we talk about the nodes in the database as movies and
   actors -- wouldn't it be helpful for the database to support that
   categorization first-class?
  
   To be precise, it's easy for us to add a type property to nodes
 ourselves
   (we do in our usage), but it's not a first-class property like
   relationships, where queries and traversals can easily and naturally
   specify
   the type or types they expect.
  
   Thanks!
  
   Aseem
   ___
   Neo4j mailing list
   User@lists.neo4j.org
   https://lists.neo4j.org/mailman/listinfo/user
  
 
 
 
  --
  Tobias Ivarsson tobias.ivars...@neotechnology.com
  Hacker, Neo Technology
  www.neotechnology.com
  Cellphone: +46 706 534857




Re: [Neo4j] Examples of multiple indices in use?

2011-05-05 Thread Craig Taverner
This is how we use it, for performance. Since some data will be much more
dense than other data, and we don't want the index lookup of the sparse data
to be impacted by the dense data, we make separate indexes.

On Thu, May 5, 2011 at 3:47 PM, Peter Hunsberger peter.hunsber...@gmail.com
 wrote:

 On Thu, May 5, 2011 at 4:16 AM, Mattias Persson
  matt...@neotechnology.com wrote:

  2011/5/5 Aseem Kishore aseem.kish...@gmail.com
 
   Interesting. That's assuming a person and an organization can share the
   same
   name. Maybe an edge case in this example, but I can understand. Thanks.
  
 
  Hmm, no not share the same name, but have the name property in
 common...
  would you really like to ask an index question for a name and get back
 both
  persons and organisations mixed in the result? You may, but in many cases
  you wouldn't... or?
 
  The separation of indexes can also give you better performance.  For
 example, consider the case that you have 1000 organizations and 200
 people.  You really don't want to have to search past all 200
 people just to find 1 organization.



Re: [Neo4j] Lucene/Neo Indexing Question

2011-05-02 Thread Craig Taverner
Thinking back to your original domain description, cars with colors, surely
you have more properties than just colors to index?

If you have two or more properties, then you can use combinations of
properties for the first level of the index tree, which provides your logical
partitioning of supernodes in a domain specific way. For example,
consider having the four properties color, manufacturer, model, year. The
first level of index nodes would be the set of unique combinations of all
possible properties (all existing combinations, actually). This set is much
larger than the set of colors, so 'red' will occur many times. As a result you
dramatically reduce node contention, and the number of relationships per
node is much lower. If you then want to perform the query for all red cars,
your traverser needs to be only slightly more complex, basically
'find all cars with color red and any value of the other properties'.
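The combination-as-first-level idea can be sketched in plain Python, with a dict standing in for the index nodes. This is an illustration of the scheme only, not the amanzi-index API; the class and method names here are made up for the sketch:

```python
# First index level = the set of existing (color, manufacturer, model, year)
# combinations, so a hot value like "red" is spread over many index nodes
# and per-node relationship counts stay low.
from collections import defaultdict

class CompositeIndex:
    def __init__(self, keys):
        self.keys = keys                  # e.g. ("color", "manufacturer", "model", "year")
        self.buckets = defaultdict(list)  # combination tuple -> cars

    def add(self, car):
        combo = tuple(car[k] for k in self.keys)
        self.buckets[combo].append(car)   # far fewer entries per bucket than per color

    def find(self, **criteria):
        # 'all red cars, any value of the other properties': scan the
        # (much smaller) set of combination keys, not the cars themselves
        for combo, cars in self.buckets.items():
            if all(combo[self.keys.index(k)] == v for k, v in criteria.items()):
                yield from cars

index = CompositeIndex(("color", "manufacturer", "model", "year"))
index.add({"color": "red", "manufacturer": "Saab", "model": "9-3", "year": 2004})
index.add({"color": "red", "manufacturer": "Volvo", "model": "V70", "year": 2007})
index.add({"color": "blue", "manufacturer": "Saab", "model": "9-3", "year": 2004})

red_cars = list(index.find(color="red"))
```

In the real in-graph version each combination would be an index node with relationships to the matching car nodes, and the `find` scan would be a slightly more complex traverser as described above.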

This is the design of the 'amanzi-index' I started on github in December
(but did not complete). It was focusing on doing queries on multiple
properties at the same time, but does effectively cover your case of
reducing node contention, if you can add more properties to the index. It
also has the concept of a mapper from the domain specific property to the
index key, which was designed to reduce the number of index nodes, but in
your case you could also use it to increase the number of index nodes, using
some of the ideas by Jim and Michael. Jim suggested that instead of 'red'
always mapping to the same node, it could map to a set of different nodes
(randomly selected, or round robin). Michael discussed a distributed
hash-code, which I do not fully understand, but it does sound relevant :-)

So, in short, using the design of the amanzi-index you could help this
problem in two ways:

   - index together with other properties to get a domain-specific
   partitioning of the 'supernodes'
   - Add a mapper between the color and the index key to get partitioning of
   the supernodes


On Mon, May 2, 2011 at 1:09 PM, Rick Bullotta
 rick.bullo...@thingworx.com wrote:

 Hi, Michael.

  The nature of the domain model really doesn't lend itself to any logical
  partitioning of supernodes, so it would indeed have to be something very
  arbitrary/random.

 For now, I think we will have to either deal with the performance issues or
 switch to using Lucene for the indexing, but we can't do that yet until we
 have the ability to query the list of terms for a given key (which is a
 necessary function in our domain model).  We could perhaps keep a list of
 terms as nodes *and* index them, but that seems redundant.

 Ultimately, I think the solution is to hide the complexity via the indexing
 framework and to offer a variety of in-graph indexing models that address
 specific types of domain requirements.

 Rick

 
 From: user-boun...@lists.neo4j.org [user-boun...@lists.neo4j.org] On
 Behalf Of Michael Hunger [michael.hun...@neotechnology.com]
 Sent: Monday, May 02, 2011 3:49 AM
 To: Neo4j user discussions
 Subject: Re: [Neo4j] Lucene/Neo Indexing Question

 Perhaps then it is sensible to introduce a second layer of nodes, so that
 you split down your supernodes and distribute the write contention?

 Would be interesting if putting a round robin on that second level of color
 nodes would be enough to spread lock contention?

 This is what peter talks about in his activity stream update scenario.

 And in general perhaps a step to a more performant in-graph index.

  When thinking about in-graph indexes I thought it might perhaps be
  interesting to re-use the HashMap approach of declaring x (2^n) bucket-nodes,
  then having, from the index-root node, relationships with the (re-distributed)
  hashcode & (x-1) as relationship-types to the bucket nodes, and below the
  bucket node rels with the concrete value as a relationship attribute to the
  concrete nodes.
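The bucket scheme Michael describes can be illustrated in a few lines of Python, with dicts and lists standing in for the index-root, bucket nodes, and value-carrying relationships (the function names are invented for this sketch):

```python
# x = 2**n bucket nodes under one index root; bucket chosen by
# hashcode & (x - 1), the same power-of-two trick java.util.HashMap uses.
NUM_BUCKETS = 16

def bucket_for(value):
    return hash(value) & (NUM_BUCKETS - 1)

index_root = {b: [] for b in range(NUM_BUCKETS)}   # bucket -> [(value, node_id)]

def index_node(value, node_id):
    # the 'relationship' carries the concrete value, so a lookup only
    # filters within one bucket instead of contending on one supernode
    index_root[bucket_for(value)].append((value, node_id))

def lookup(value):
    return [n for v, n in index_root[bucket_for(value)] if v == value]

index_node("red", 1)
index_node("red", 2)
index_node("blue", 3)
```

The point of the extra level is that writers adding cars of the same color lock different bucket nodes, spreading the write contention that a single color supernode would concentrate.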

 I think this will be addressed even better with Craig's indexes or the
 Collection abstractions that Andreas Kollegger is working on.

 Cheers

 Michael

 Am 02.05.2011 um 12:16 schrieb Rick Bullotta:

  Hi, Niels.
 
  That's what we're doing now, but it has performance issues with large #'s
 of relationships when cars are constantly being added, since the color
 nodes become synchronization bottlenecks for updates.
 
  Rick
 
  
  From: user-boun...@lists.neo4j.org [user-boun...@lists.neo4j.org] On
 Behalf Of Niels Hoogeveen [pd_aficion...@hotmail.com]
  Sent: Sunday, May 01, 2011 9:41 AM
  To: user@lists.neo4j.org
  Subject: Re: [Neo4j] Lucene/Neo Indexing Question
 
  One option would be to create a unique value node for each distinct color
 and create a relationship from car to that value node. The value nodes can
 be grouped together with relationships to some reference node.
 
  This gives the opportunity of finding all distinct colors, and it allows
 you to find all cars with that 

[Neo4j] Geoprocessing with Neo4j Spatial and OSM

2011-05-02 Thread Craig Taverner
Hi all,

I have applied to FOSS4G to talk about Geoprocessing with Neo4j Spatial and
OSM. This talk will include the new work we've done on the open street map
model. In addition, we got two GSoC students this year, on related projects
OSM Editor and Geoprocessing with OSM, and so they are likely to
contribute some interesting new content as well.

If you are interested in graph databases in GIS, OSM or geoprocessing,
consider voting for my talk at http://community-review.foss4g.org/. I have
included the abstract of the talk below.

Regards, Craig

-
What better way to perform geoprocessing than on a graph! And what better
dataset to play with than Open Street Map!
Since we presented Neo4j Spatial at FOSS4G last year, our support for
geoprocessing functions and for modeling, editing and visualization of OSM
data has improved considerably. We will discuss the advantages of using a
graph database for geographic data and geoprocessing, and we will
demonstrate this using the amazing Open Street Map data model.


Re: [Neo4j] New blog post on non-graph stores for graph-y things

2011-04-26 Thread Craig Taverner

 On foreign key I think it was a subconscious choice to avoid it, since it
 has very strong semantics in other data models. I wanted to try to convey
 the concept of pointers without muddying that with the stricter semantics
 of foreign keys and referential integrity.


Perhaps I'm over-optimistic, but I would like to find some common
terminology we could use when describing the differences between different
database types. It really helps people understand a new database if you can
compare and contrast the finer details and subtle differences. I have found
using the term 'foreign key' effective precisely because it brings to mind
the rdbms approach, and helps the user see a mapping between modeling in
rdbms and modeling in graphs.

But I agree that 'foreign key' brings other aspects that may not be
appropriate, and so a more general term would be better. You say 'pointer',
but that to my mind is an aspect of both a foreign key and a relationship/edge.
Perhaps there is no single magic word, and we have to pick and choose to
suit the circumstances ;-)


Re: [Neo4j] New blog post on non-graph stores for graph-y things

2011-04-24 Thread Craig Taverner
Hi Jim,

As always, I enjoyed reading your blog. It was well written and made the
point (including even a plug in the last line ;-)

While Aseem's observations are valid, I think you handled it correctly, with
the product recall example being only a relatively small win for graph
databases, and moving on to the bigger wins with deeper traversals.

One thing I realized while reading it was that, although you did not
emphasize it, the example of the Document store was actually an example of
the use of foreign keys, and is applicable to all non-graph databases,
including relational databases.

I wonder, is the use of the term 'foreign key' applicable to all these
cases? I think so, but have not found the term used much (and not in the
blog). I think of a foreign key as being the reference from one object (or
table, or kv entry, or document) to another. So your documents containing
'friend:other', where, to my mind, using foreign keys. I feel the
distinctive difference between a foreign key and a true graph is the need
for an index to provide performance in the join. Graphs have two advantages
over foreign keys, one is that the relationships can be traversed in both
directions, removing the need for the complementary foreign key Aseem
describes, and another is that the performance of a local graph traversal is
so high (implicit local index), that no index is required to traverse. I
think both these points are described in your blog entry, although in
different terms.
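The two advantages can be made concrete with a small sketch: document-style foreign keys (dicts standing in for documents) versus a graph relationship visible from both endpoints. The names and data here are invented for illustration:

```python
# Document store: only a forward foreign key is stored, so the reverse
# direction needs either a complementary key or a full scan / index.
docs = {
    "alice": {"friend": "bob"},   # forward foreign key only
    "bob": {},
}

def friends_of(name):                 # forward lookup: cheap
    f = docs[name].get("friend")
    return [f] if f else []

def friended_by(name):                # reverse lookup: full scan without an index
    return [k for k, d in docs.items() if d.get("friend") == name]

# Graph model: one relationship record, traversable from both ends,
# so no complementary key and no index is needed for the join.
edges = [("alice", "bob")]

def neighbours(name):
    return [b for a, b in edges if a == name] + \
           [a for a, b in edges if b == name]
```

The `friended_by` scan is exactly the cost that forces non-graph stores to maintain an index for the join, which is the distinction drawn above.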

Regards, Craig

On Thu, Apr 21, 2011 at 7:19 PM, Jim Webber j...@neotechnology.com wrote:

 Hi guys,

 A while ago we were discussing using non-graph native backend for graph
 operations. I've finally gotten around to writing up my thoughts on the
 thread here:

 http://jim.webber.name/2011/04/21/e2f48ace-7dba-4709-8600-f29da3491cb4.aspx

 As always, I'd value your thoughts and feedback.

 Jim



Re: [Neo4j] REST results pagination

2011-04-22 Thread Craig Taverner

 Good catch, forgot to add the in-graph representation of the results to my
 mail, thanks for adding that part. Temporary (transient) nodes and
 relationships would really rock here, with the advantage that with HA you
 have them distributed to all cluster nodes.
  Certainly Craig has some interesting things to add to this, as those
  probably resemble his in-graph indexes / R-Trees.


I certainly make use of this model, much more so for my statistical analysis
than for graph indexes (but I'm planning to merge indexes and statistics).

However, in my case the structures are currently very domain specific. But I
think the idea is sound and should be generalizable. What I do is have a
concept of a 'dataset' on which queries can be performed. The dataset is
usually the root of a large sub-graph. The query parser (domain specific)
creates a hashcode of the query, checks if the dataset node already has a
resultset (as a connected sub-graph with its own root node containing the
previous query hashcode), and if so returns that (traverses it); otherwise it
performs the complete dataset traversal, creating the resultset as a new
subgraph, and then returns it. This works well specifically for statistical
queries, where the resultset is much smaller than the dataset, so adding new
subgraphs has small impact on the database size, and the resultset is much
faster to return, so this is a performance enhancement for multiple requests
from the client. Also, I keep the resultset permanently, not temporarily.
Very few operations modify the dataset, and if they do, we delete all
resultsets, and they get re-created the next time. My work on merging the
indexes with the statistics is also planned to only recreate 'dirty' subsets
of the result-set, so modifying the dataset has minimal impact on the query
performance.
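The caching scheme just described can be sketched as follows, with plain Python objects standing in for the dataset and resultset subgraphs (the `Dataset` class and its methods are invented for this sketch; in the real system both sides are Neo4j subgraphs):

```python
# Hash the query, look for a cached resultset hanging off the dataset root,
# and only run the full traversal on a miss. Any modification deletes all
# resultsets, which get re-created on the next request.
import hashlib

class Dataset:
    def __init__(self, events):
        self.events = events
        self.resultsets = {}          # query hashcode -> cached resultset

    def query(self, query_text, compute):
        key = hashlib.sha1(query_text.encode()).hexdigest()
        if key not in self.resultsets:            # miss: full dataset traversal
            self.resultsets[key] = compute(self.events)
        return self.resultsets[key]               # hit: traverse small resultset

    def modify(self, event):
        self.events.append(event)
        self.resultsets.clear()       # 'delete all resultsets' on change

ds = Dataset([3, 1, 4, 1, 5])
stats = ds.query("max", max)          # computed by full traversal
again = ds.query("max", max)          # served from the cached resultset
ds.modify(9)                          # invalidates the cache
refreshed = ds.query("max", max)      # recomputed on the next request
```

Because statistical resultsets are much smaller than the dataset, the cache adds little to the database size while making repeated client requests fast, which is the trade-off described above.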

After reading Rick's previous email I started thinking of approaches to
generalizing this, but I think your 'transient' nodes perhaps encompass
everything I thought about. Here is an idea:

   - Have new nodes/relations/properties tables on disk, like a second graph
   database, but different in the sense that it has one-way relations into the
   main database, which cannot be seen by the main graph and so are by
   definition not part of the graph. These can have transience and expiry
   characteristics. Then we can build the resultset graphs as transient graphs
   in the transient database, with 'drill-down' capabilities to the original
   graph (something I find I always need for statistical queries, and something
   a graph is simply much better at than a relational database).
   - Use some kind of hashcode in the traversal definition or query to
   identify existing, cached, transient graphs in the second database, so you
   can rely on those for repeated queries, or pagination or streaming, etc.

As traversers are lazy, a count operation is not so easily possible; you
 could run the traversal and discard the results. But then the client could
 also just pull those results until it reaches its
 internal thresholds and then decide to use more filtering, or stop the
 pulling and ask the user for more filtering (you can always retrieve n+1 and
 show the user that there are more than n results available).


Yes. Count needs to perform the traversal. So the only way to not have to
traverse twice is to keep a cache. If we make the cache a transient
sub-graph (possibly in the second database I described above), then we have
the interesting behaviour that count() takes a while, but subsequent
queries, pagination or streaming, are fast.

Please don't forget that a count() query in a RDBMS can be as ridiculously
 expensive as the original query (especially if just the column selection was
 replaced with count, and sorting, grouping etc. was still left in place
 together with lots of joins).


Good to hear they have the same problem as us :-)
(or even more problems)

Sorting on your own instead of letting the db do that mostly harms the
 performance as it requires you to build up all the data in memory, sort it
 and then use it. Instead of having the db do that more efficiently, stream
 the data and you can use it directly from the stream.


Client side sorting makes sense if you know the domain well enough to know,
for example, you will receive a small enough result set to 'fit' in the
client, and want to give the user multiple interactive sort options without
hitting the database again. But I agree that in general it makes sense to
get the database to do the sort.

Cheers, Craig


Re: [Neo4j] REST results pagination

2011-04-21 Thread Craig Taverner

 I can only think of a few use cases where losing some of the expected
 result is ok, for instance if you want to peek at the result.


IMHO, paging is, by definition, a peek. Since the client controls when the
next page will be requested, it is not possible, or reasonable, to enforce
that the complete set of pages (if ever requested) will represent a
consistent result set. This is not supported by relational databases either.
The result set, and meaning of a page, can change between requests. So it
can, and does, happen that some of the expected result is lost.

This is completely different to the streaming result, which I see Jim
commented on, and so I might just reply to his mail too :-)

I'm waiting for one of those SlapOnTheFingersExceptions' that Tobias has
 been handing out :)


My fingers are, as yet, unscathed. The slap can come at any moment! :-)

This sounds really cool, would be a great thing to look into!


Should you want examples, I have a wiki page on this topic at
http://redmine.amanzi.org/wiki/geoptima/Geoptima_Event_Log


Re: [Neo4j] REST results pagination

2011-04-21 Thread Craig Taverner
I think Jim makes a great point about the differences between paging and
streaming, being client or server controlled. I think there is a related
point to be made, and that is that paging does not, and cannot, guarantee a
consistent total result set. Since the database can change between page
requests, they can be inconsistent. It is possible for the same record to
appear in two pages, or for a record to be missed. This is certainly how
relational databases work in this regard.

But in the streaming case, we expect a complete and consistent result set.
Unless, of course, the client cuts off the stream. The use case is very
different, while paging is about getting a peek at the data, and rarely
about paging all the way to the end, streaming is about getting the entire
result, but streamed for efficiency.

On Thu, Apr 21, 2011 at 5:00 PM, Jim Webber j...@neotechnology.com wrote:

 This is indeed a good dialogue. The pagination versus streaming was
 something I'd previously had in my mind as orthogonal issues, but I like the
 direction this is going. Let's break it down to fundamentals:

 As a remote client, I want to be just as rich and performant as a local
 client. Unfortunately,  Deutsch, Amdahl and Einstein are against me on that,
 and I don't think I am tough enough to defeat those guys.

 So what are my choices? I know I have to be more granular to try to
 alleviate some of the network penalty so doing operations bulkily sounds
 great.

 Now what I need to decide is whether I control the rate at which those bulk
 operations occur or whether the server does. If I want to control those
 operations, then paging seems sensible. Otherwise a streamed (chunked)
 encoding scheme would make sense if I'm happy for the server to throw
 results back at me at its own pace. Or indeed you can mix both so that pages
 are streamed.

 In either case if I get bored of those results, I'll stop paging or I'll
 terminate the connection.

 So what does this mean for implementation on the server? I guess this is
 important since it affects the likelihood of the Neo Tech team implementing
 it.

 If the server supports pagination, it means we need a paging controller in
 memory per paginated result set being created. If we assume that we'll only
 go forward in pages, that's effectively just a wrapper around the traversal
 that's been uploaded. The overhead should be modest, and apart from the
 paging controller and the traverser, it doesn't need much state. We would
 need to add some logic to the representation code to support next links,
 but that seems a modest task.

 If the server streams, we will need to decouple the representation
 generation from the existing representation logic since that builds an
 in-memory representation which is then flushed. Instead we'll need a
 streaming representation implementation which seems to be a reasonable
 amount of engineering. We'll also need a new streaming binding to the REST
 server in JAX-RS land.

 I'm still a bit concerned about how rude it is for a client to just drop
 a streaming connection. I've asked Mark Nottingham for his authoritative
 opinion on that. But still, this does seem popular and feasible.

 Jim








Re: [Neo4j] REST results pagination

2011-04-20 Thread Craig Taverner
To respond to your arguments it would be worth noting a comment by Michael
DeHaan later on in this thread. He asked for 'something more or less
resembling a database cursor (see MongoDB's API).' The trick is to achieve
this without having to store a lot of state on the server, so it is robust
against server restarts or crashes.

If we compare to the SQL situation, there are two numbers passed by the
client, the page size and the offset. The state can be re-created by the
database server entirely from this information. How this is implemented in a
relational database I do not know, but whether the database is relational or
a graph, certain behaviors would be expected, like robustness against
database content changes between the requests, and coping with very long
gaps between requests. In my opinion the database cursor could be achieved
by both of the following approaches:

   - Starting the traversal from the beginning, and only returning results
   after passing the cursor offset position
   - Keeping a live traverser in the server, and continuing it from the
   previous position

Personally I think the second approach is simply a performance optimization
of the first. So robustness is achieved by having both, with the second one
working when possible (no server restarts, timeout not expiring, etc.), and
falling back to the first in other cases. This achieves performance and
robustness. What we do not need to do with either case is keep an entire
result set in memory between client requests.
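The first approach (re-create the cursor from offset and page size alone) can be sketched with a lazy generator; the live-traverser optimization would simply keep the generator alive between requests. The function names and the integer 'nodes' are stand-ins for illustration:

```python
# Stateless pagination: the only state is (offset, page_size) supplied by
# the client, so the scheme survives server restarts and crashes.
from itertools import islice

def traverse(graph_size):
    # stand-in for a lazy traversal yielding nodes in a stable order
    yield from range(graph_size)

def page(offset, page_size, graph_size=100):
    # start from the beginning, discard everything before the cursor offset,
    # return one page; nothing is kept in memory between requests
    return list(islice(traverse(graph_size), offset, offset + page_size))

first = page(0, 10)    # nodes 0..9
third = page(20, 10)   # nodes 20..29, after re-traversing past 0..19
```

Keeping a live traverser server-side avoids the re-traversal cost, but as argued above it is only an optimization: when the traverser is gone (restart, timeout), the server can always fall back to this offset replay.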

Now when you add sorting into the picture, then you need to generate the
complete result-set in memory, sort, paginate and return only the requested
page. If the entire process has to be repeated for every page requested,
this could perform very badly for large result sets. I must believe that
relational databases do not do this (but I do not know how they paginate
sorted results, unless the sort order is maintained in an index).

To avoid keeping everything in memory, or repeatedly reloading everything to
memory on every page request, we need sorted results to be produced on the
stream. This can be done by keeping the sort order in an index. This is very
hard to do in a generic way, which is why I thought it best done in a domain
specific way.

Finally, I think we are really looking at two, different but valid use
cases. The need for generic sorting combined with pagination, and the need
for pagination on very large result sets. The former use case can work with
re-traversing and sorting on each client request, is fully generic, but will
perform badly on large result sets. The latter can perform adequately on
large result sets, as long as you do not need to sort (and use the database
cursor approach to avoid loading the result set into memory).

On Wed, Apr 20, 2011 at 2:01 PM, Jacob Hansson ja...@voltvoodoo.com wrote:

 On Wed, Apr 20, 2011 at 11:25 AM, Craig Taverner cr...@amanzi.com wrote:

  I think sorting would need to be optional, since it is likely to be a
  performance and memory hog on large traversals. I think one of the key
  benefits of the traversal framework in the Embedded API is being able to
  traverse and 'stream' a very large graph without occupying much memory. If
  this can be achieved in the REST API (through pagination), that is a very
  good thing. I assume the main challenge is being able to freeze a traverser
  and keep it on hold between client requests for the next page. Perhaps you
  have already solved that bit?
 

 While I agree with you that the ability to effectively stream the results
 of
 a traversal is a very useful thing, I don't like the persisted traverser
 approach, for several reasons. I'm sorry if my tone below is a bit harsh, I
 don't mean it that way, I simply want to make a strong case for why I think
 the hard way is the right way in this case.

 First, the only good restful approach I can think of for doing persisted
 traversals would be to create a traversal resource (since it is an object
 that keeps persistent state), and get back an id to refer to it. Subsequent
 calls to paged results would then be to that traversal resource, updating
 its state and getting results back. Assuming this is the correct way to
 implement this, it comes with a lot of questions. Should there be a timeout
 for these resources, or is the user responsible for removing them from
 memory? What happens when the server crashes and the client can't find the
 traversal resources it has ids for?

 If we somehow solve that or find some better approach, we end up with an
 API
 where a client can get paged results, but two clients performing the same
 traversal on the same data may get back the same result in different order
 (see my comments on sorting based on expected traversal behaviour below).
 This means that the API is really only useful if you actually want to get
 the entire result back. If that was the problem we wanted to solve, a
 streaming solution is a much easier and faster

Re: [Neo4j] How to combine both traversing and index queries?

2011-04-19 Thread Craig Taverner
Another approach to this problem is to consider that an index is actually
structured as a graph (a tree), and so if you write the tree into the graph
together with your data model, you can combine the index and the traversal
into a pure graph traversal. Of course, it is insufficient to simply build
both the index tree and the domain model as two graphs that only connect at
the result nodes. You need to build a combined graph that achieves the
purpose of both indexing and domain structure. This is a very domain
specific thing and so there are no general purpose solutions. You have to
build the graph to suit your domain.

One approach is to build the domain graph first, then decide why you want
indexing, and without adding lucene (or any external index) to the mix,
think about how to modify the graph to also achieve the same effect.
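The 'index is itself a graph' idea can be pictured with a small in-graph tree whose leaves are the domain nodes, so an index lookup is just another traversal over the same structure. The tree shape and keys below are hypothetical, purely to illustrate the principle:

```python
# A tree index written into the graph: interior nodes carry index keys,
# leaves point at domain nodes, and lookup is an ordinary traversal.
class TreeNode:
    def __init__(self, key=None, value=None):
        self.key, self.value, self.children = key, value, []

def insert(root, path, value):
    node = root
    for key in path:                       # descend, extending the tree as needed
        child = next((c for c in node.children if c.key == key), None)
        if child is None:
            child = TreeNode(key)
            node.children.append(child)
        node = child
    node.children.append(TreeNode(value=value))   # leaf = domain node

def lookup(root, path):
    node = root
    for key in path:                       # the 'index query' is a traversal
        node = next(c for c in node.children if c.key == key)
    return [c.value for c in node.children]

root = TreeNode()
insert(root, ["se", "stockholm"], "cafe-1")
insert(root, ["se", "stockholm"], "cafe-2")
insert(root, ["se", "malmo"], "cafe-3")
hits = lookup(root, ["se", "stockholm"])
malmo_hits = lookup(root, ["se", "malmo"])
```

The domain-specific work Craig describes is in deciding what the interior keys should be so that the same tree serves both the index queries and the domain traversals, rather than being a second, disconnected graph.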

On Mon, Apr 18, 2011 at 8:54 PM, Willem-Paul Stuurman 
w.p.stuur...@knollenstein.com wrote:

 Hi Ville,

 We ran into a similar problem, basically wanting to search only part of the
 graph using Lucene. We used traversal to determine the nodes to search from,
 and from there on used Lucene to do a search on nodes connected to the nodes
 from the traversal result.

 We solved it as follows:
 - defined a TransactionEventHandler to auto-update the indexes with node
 properties, but also add relationships to the same index. We use the
 relationship.name() as the property name for Lucene, with the 'other node'
 id as the value.
 - traverse to get a set of nodes from where on the search. We apply the ACL
 here to only return nodes the user is allowed to see.
 - create a BooleanQuery for Lucene with the relationship.name() field
 names and id's. So if the relationship would be 'IS_FRIEND_OF' and we want
 to do a full text search for 'trinity' on friends of people with ids 1,2 and
 3, we create a query that contains: +(name:trinity) +(isfriendof:1
 isfriendof:2 isfriendof:3)

 To make sure we only get back 'person' nodes we also indexed the node type
 (in our case 'emtype'), so the complete query is:
 +emtype:person +name:trinity +(isfriendof:1 isfriendof:2 isfriendof:3)

 This way you can easily traverse to define the 'edges' of where to search
 and let Lucene handle the search within that region.

 Optionally we add the ACL to the Lucene query as well using the same
 technique, basically adding all group ids the current user is member of and
 has a 'CAN_ACCESS' relationship with the node:
 +emtype:person +name:trinity +(isfriendof:1 isfriendof:2 isfriendof:3)
 +(canaccess:233 canaccess:254 canaccess:324)
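For illustration, the query composition Paul describes can be sketched as plain string building (the class and method names here are hypothetical; a real implementation would build a Lucene BooleanQuery rather than a raw string):

```java
import java.util.List;
import java.util.stream.Collectors;

public class RegionQueryBuilder {
    // Build a Lucene-style query string that restricts a full-text search
    // to nodes connected (via the given relationship field) to the node ids
    // returned by the traversal step.
    static String buildQuery(String type, String field, String term,
                             String relField, List<Long> nodeIds) {
        String idClause = nodeIds.stream()
                .map(id -> relField + ":" + id)
                .collect(Collectors.joining(" "));
        return "+emtype:" + type + " +" + field + ":" + term + " +(" + idClause + ")";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("person", "name", "trinity",
                "isfriendof", List.of(1L, 2L, 3L)));
        // → +emtype:person +name:trinity +(isfriendof:1 isfriendof:2 isfriendof:3)
    }
}
```

The ACL clause from the example above would simply be one more `+(canaccess:… …)` group appended the same way.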

 It works for us because in our case we know the traversal will return a
 reasonable set of nodes (not thousands+). Lucene can return thousands of
 nodes, but that's not a problem of course. And we can still use the fun
 stuff like sorting, paging and score results.

 Hope this helps.

 Cheers

 Paul


 PS: we always use lower case field names without underscores because
 somehow it makes Lucene happier


 On 18 apr 2011, at 11:19, Mattias Persson wrote:

  2011/4/18 Michael Hunger michael.hun...@neotechnology.com:
  Would it be also possible to go the other way round?
 
  E.g. have the index-results (name:Vil*) as starting point and traverse
 backwards the two steps to your start node? (Either using a traversal or the
 shortest path graph algo with a maximum way-length)?
 
  That's what I suggested, but it doesn't exist yet :)
 
  To do it that way today (do a traversal from each and every index
  result) would probably be slower than doing one traversal with
  filtering.
 
 
  Cheers
 
  Michael
 
  Am 18.04.2011 um 11:03 schrieb Mattias Persson:
 
  Hi Ville,
 
  2011/4/14 Ville Mattila vi...@mattila.fi:
  Hi there,
 
  I am somehow stuck with a problem of combining traversing and queries
  to indices efficiently - something like finding all people with a name
  starting with Vil* two steps away from a reference node.
 
  Traversing all friends within two steps from the reference node is
  trivial, but I find it a bit inefficient to apply a return evaluator
  in each of the nodes visited during traversal. Or is it so? How about
  more complex criteria which may involve more than one property or even
  more complex (Lucene) queries?
 
 
  The best solution IMHO (one that isn't available yet) would be to let
  a traversal have multiple starting points, that is have the index
  result as starting point.
 
  I think that doing a traversal and filtering with an evaluator is the
  way to go. Have you tried doing this and saw a bad performance for it?
 
  I was thinking to spice up my Neo4j setup with Elasticsearch
  (www.elasticsearch.org) to dedicate Neo4j to keep track of the
  relationships and ES to index all the data in them, however it makes
  me feel very uncomfortable to keep up the consistency when data gets
  updated. However, now I need to also keep the Neo4j indices updated. And,
  needless to say, combining traversal and an external index is yet more
  complicated. However I like 

Re: [Neo4j] Neo4J Spatial - issue with bounding box indices in OSM Layer.

2011-03-31 Thread Craig Taverner
Hi Robert,

I took a look at this and the issue is that you are using the
OSMGeometryEncoder to decode the RTree nodes. And the GeometryEncoder is
designed to be specific to your data model, while the RTree internal data is
hard-coded into the RTree design. So there is no guarantee that any
particular data model's geometry encoder will store a bounding box in the
same way as the RTree does.

In this particular case the RTree uses the conventions of the JTS Envelope,
while the OSMGeometryEncoder uses the conventions of the GeoTools envelope.
I looked around and the WKBGeometryEncoder we use for ESRI Shapefile support
uses JTS, so your code would have worked there.

So you should definitely not use the OSM-specific geometry encoder for
looking at anything other than the OSM-specific geometries. The correct API
to get the bounding box from the index is, in fact, the
getLayerBoundingBox() method you already used.

So the only mistake was to grab the root node of the RTree and pass it to
the OSMGeometryEncoder as if it were an OSM Geometry Node, which it is not.

Does that clarify things?

Regards, Craig

On Tue, Mar 29, 2011 at 9:44 AM, Robert Boothby rob...@presynt.com wrote:

 Sorry about dropping out at the end of last week - had some personal
 issues to deal with. I have the following unit test code that
 illustrates the breakdown in the envelope definition:

@Test
public void useLayer() {
    final OSMLayer osmLayer = (OSMLayer) spatialDB.getLayer("OSM-BUCKS");
    final GeometryFactory factory = osmLayer.getGeometryFactory();
    System.out.println("Unit of measure: " +
        CRS.getEllipsoid(osmLayer.getCoordinateReferenceSystem()).getAxisUnit().toString());

    final Point point = factory.createPoint(new
        Coordinate(-0.812988, 51.796726)); // 51.808721,-0.689735

    final Envelope layerBoundingBox = osmLayer.getIndex().getLayerBoundingBox();
    final Envelope usedEnvelope = osmLayer.getGeometryEncoder()
        .decodeEnvelope(((RTreeIndex) osmLayer.getIndex()).getIndexRoot());
    System.out.println("Layer Bounding Box: " + layerBoundingBox.toString());
    System.out.println("Envelope used in search: " + usedEnvelope.toString());
    assertEquals(layerBoundingBox, usedEnvelope);

    SearchContain searchContain = new SearchContain(point);
    osmLayer.getIndex().executeSearch(searchContain);

    for (SpatialDatabaseRecord record : searchContain.getResults()) {
        System.out.println("Container: " + record);
        for (String propertyName : record.getPropertyNames()) {
            final Object propertyValue = record.getProperty(propertyName);
            if (propertyValue != null) {
                System.out.println("\t" + propertyName + " : " + propertyValue);
            }
        }
    }
}

 It throws an assertion failure at the 'assertEquals(layerBoundingBox,
 usedEnvelope)' - effectively I pull the layer index envelope out using
 the same code as the 'AbstractSearch.getEnvelope(Node geomNode)' does
 (used in AbstractSearchIntersection.needsToVisit()) and compare it
 with what the layer thinks the envelope should be. The same numeric
 values appear in the two envelopes but in different fields.

 Hopefully this gives you all you need to diagnose the problem - but if
 not let me know and we can work out how to drop my rather large test
 data set and project into a common place.

 Robert.


Re: [Neo4j] question

2011-03-30 Thread Craig Taverner

  I think for that the TimelineIndex interface would have to be extended to
 be able to hold additional data so that you can do compound
 queries
 http://docs.neo4j.org/chunked/milestone/indexing-lucene-extras.html#indexing-lucene-compound
 to
 it and get exactly the functionality you're asking for with only one
 index. Another way is to just copy the LuceneTimeline code and roll this
 yourself, it's really small, mostly one-liners for each implemented method.


Alternatively just roll your own graph-tree structure that provides the same
capabilities. Then you can index any combination of properties together, to
suit your planned queries. This is obviously much more work than Mattias'
suggestion, and does require that you know more about your domain (ie. less
general). But it does allow you to inspect the index itself with graph
traversals, gremlin or neoclipse, which is not possible with lucene.


Re: [Neo4j] question

2011-03-30 Thread Craig Taverner

 Ok, in fact it shouldn't be a performance downgrade even with large
 blobs, right? It just depends on whether the queried part refers to an
 id or similar and that node then is simply connected to the blob. I.e.
 I extract the blob only when I am sure it is the wanted. Is that what
 you meant?


If the blob is a property of a node, it is not loaded when you access that
node, but when you access its properties. I do not know enough about the
implications on performance with large blobs, only that it has been
mentioned many times before that for really large blobs, rather store them
somewhere else (eg. filesystem) and reference them from the graph (eg. path
to file, url, etc.). But I still believe that blobs are not big enough to
really be a concern, but perhaps someone more knowledgeable can correct me
here?

Probably because of the database type the old way of having a look
 at the data is not possible any longer. But then which is the right
 way? Having a console and let Gremlin shine?


Filtering the neoclipse view with relationship types and directions helps.
Limiting the number of nodes returned helps a lot. I use 100 max. But I use
neoclipse as a visualization tool, mostly for visualizing the structure, not
for analytics.

Ok, I change my question. What do you do when you have two big types
 of data, one that does perfectly fit in the graph concept, and one the
 really doesn't have anything to do with it? I guess you put everything
 into the neo4j db and then query one with the graph traverser and the
 other one with the lucene indexer?
 My questions might seem a bit dummy, I apologize for that, I am trying
 to understand why and how I should make use of a graph database.


When I'm deciding between using a graph or using lucene, the size of the
data is not really a factor, but its 'graphiness' :-) For example, if I have
a property of very high diversity, like peoples names, then lucene is a
natural choice. If you have a property with structure, like categories or
tags, or inheritance, or other relationship concepts, then the graph is
best. There are cases in the middle, for example I generally model numerical
properties in the graph, but I think most others would use lucene. I use the
graph because it naturally leads to statistics data. For example, if we use
the time property, and collect all events in the same second and connect
them to the same 1s time node, we now know the number of events in that
second from the structure of the graph. Connect each 1s node to a 1min node,
and we know how many seconds in that minute contained data, etc. Obviously
this is a very simple special case, and usually I keep more statistical
metadata in the graph tree than mere counters, but the result is that your
index now contains lots of statistics you can query without even touching
the original data nodes (ie. very fast statistics queries).
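As a rough sketch of that counting idea, here the graph tree is stood in for by plain Java maps (in a real graph the event, 1s and 1min nodes would be connected by relationships, and the counters kept as node properties):

```java
import java.util.TreeMap;

public class TimeTreeStats {
    // Maps stand in for the 1s and 1min index nodes of the graph tree.
    final TreeMap<Long, Integer> perSecond = new TreeMap<>();
    final TreeMap<Long, Integer> perMinute = new TreeMap<>();

    void addEvent(long epochMillis) {
        long second = epochMillis / 1000;   // event -> 1s node
        long minute = second / 60;          // 1s node -> 1min node
        perSecond.merge(second, 1, Integer::sum);
        perMinute.merge(minute, 1, Integer::sum);
    }

    // Number of events in a given second, read straight off the 'structure'.
    int eventsInSecond(long second) { return perSecond.getOrDefault(second, 0); }

    // Number of events in a given minute.
    int eventsInMinute(long minute) { return perMinute.getOrDefault(minute, 0); }

    // How many distinct seconds within the minute contained data.
    int activeSecondsInMinute(long minute) {
        return perSecond.subMap(minute * 60L, (minute + 1) * 60L).size();
    }

    public static void main(String[] args) {
        TimeTreeStats stats = new TimeTreeStats();
        long[] events = {0, 500, 1500, 61000};   // millisecond timestamps
        for (long t : events) stats.addEvent(t);
        System.out.println(stats.eventsInMinute(0) + " events in minute 0, "
                + stats.activeSecondsInMinute(0) + " active seconds");
        // → 3 events in minute 0, 2 active seconds
    }
}
```

The point is that the statistics queries never touch the original event data, only the (much smaller) index structure.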


Re: [Neo4j] question

2011-03-30 Thread Craig Taverner

 That sounds nice. My scenario is something like: I have a centralized
 database. On the Desktop side I have a workstation on which I do GIS
 analysis. People want to get a chunk of data of interest, so they can
 pollute them with their analyses until they are happy. So it is a
 bit the concept of a distributed versioning system. I have my local
 workspace, on which I play, then I could push back that result I
 liked.
 Anything like that around? :)


Not that I know of, but the issue I believe has been tackled by many users
of neo4j. It is quite domain specific, and so not easy to generalize, but
probably not that hard to implement for a limited, specific domain.

For example, I have a product that has three components, an Android client
collecting data and posting JSON packets at a central neo4j server, which
adds them to a graph. Then my desktop app, just like yours, queries the
server for a subset of the data, duplicates that in its internal, local
neo4j database, and performs statistical calculations on that. I do not
(yet) publish these results back to the central server, so I have not dealt
with any versioning or conflict resolution, but have thought about it (at
least within the scope of my domain).


Re: [Neo4j] question

2011-03-30 Thread Craig Taverner
Agreed, Rick. My opinion is that the main reason to roll your own index is to
make use of domain specific optimizations not available with generic
indices. In my case the main win is the combination of statistics result and
index that is possible.

But I have to confess, the real reason I started using graphs as indexes was
just that I thought the graph concept so cool, I did not want to pollute it
with something non-graphy. Foolish ideology, I know, and I grew out of that
more than a year ago, but it did influence many of my early neo4j decisions
:-)

On Wed, Mar 30, 2011 at 1:49 PM, Rick Bullotta
rick.bullo...@thingworx.comwrote:

  My experience with using large graph trees for indexes has been mixed,
 with performance issues under heavy read/write load, perhaps due to the many
 potential locks required during insertions.  We switched to the timeline
 index, fwiw.



 - Reply message -
 From: Craig Taverner cr...@amanzi.com
 Date: Wed, Mar 30, 2011 7:43 am
 Subject: [Neo4j] question
 To: Neo4j user discussions user@lists.neo4j.org

 
   I think for that the TimelineIndex interface would have to be extended
 to
  be able to hold additional data so that you can do compound
  queries
 
 http://docs.neo4j.org/chunked/milestone/indexing-lucene-extras.html#indexing-lucene-compound
  to
  it and get exactly the functionality you're asking for with only one
  index. Another way is to just copy the LuceneTimeline code and roll this
  yourself, it's really small, mostly one-liners for each implemented
 method.
 

 Alternatively just roll your own graph-tree structure that provides the
 same
 capabilities. Then you can index any combination of properties together, to
 suit your planned queries. This is obviously much more work than Mattias'
 suggestion, and does require that you know more about your domain (ie. less
 general). But it does allow you to inspect the index itself with graph
 traversals, gremlin or neoclipse, which is not possible with lucene.


[Neo4j] Google Summer of Code 2011

2011-03-29 Thread Craig Taverner
Hi all,

Last year Neo4j was represented in the Google Summer of Code with two
successful projects, in collaboration with Gephi and OSGeo. This year we are
again interested in supporting GSoC projects within other open source
organizations interested in integrating with Neo4j. The
OSGeo (http://osgeo.org) has already welcomed our offer to mentor Neo4j
Spatial (http://components.neo4j.org/neo4j-spatial/snapshot/neo4j-spatial/projects)
within the OSGeo umbrella. We will be updating the Neo4j, uDig and
possibly GeoTools and GeoServer wikis where necessary. If you have ideas
that are not related to Neo4j Spatial, contact us on the mailing list and
suggest which accepted organization we could partner with. If the idea is
interesting enough, we should be able to find a mentor for it.

For further information, here are some links to follow:

   - GSoC ideas on the uDig wiki (focusing on Neo4j Spatial):
     http://udig.refractions.net/confluence/display/HACK/Summer+of+Code
   - Neo4j GSoC ideas on the Neo4j wiki, with a wide range of interesting ideas:
     http://wiki.neo4j.org/content/Google_Summer_of_Code_Ideas
   - List of accepted GSoC organizations we could consider partnering with for
     GSoC projects:
     http://socghop.appspot.com/gsoc/program/accepted_orgs/google/gsoc2010

Regards, Craig


Re: [Neo4j] question

2011-03-28 Thread Craig Taverner
Hi Andrea,

I am cc'ing the list with these answers, since I think there are questions
here others know much more about. I will answer all with what I know, or
think I know :-)

1)  what can I put into the node? I see that the superclass proposes
 Property types. So I was wondering if blobs are completely out of
 question.


You can put any java primitive, strings and arrays of primitives and
strings. Since anything can be serialized to byte[], you can store anything
you want really. But practically it is considered non-ideal to store very
large blobs or strings due to reduced performance. But my understanding is
that blobs are typically small enough to not be a problem for this. So just
store the blob contents as a string.
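Since any Serializable value can be turned into a byte[] property, a minimal sketch of the round trip (plain java.io serialization; the class name is illustrative, and the performance caveats above still apply):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class BlobProperty {
    // Serialize a value to byte[], suitable for storing as a node property.
    static byte[] toBytes(Serializable value) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(value);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deserialize the property value back into an object.
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        byte[] blob = toBytes("some document content");
        System.out.println(fromBytes(blob));
        // → some document content
    }
}
```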

2) could neoclipse act like a kind query engine for udig? Or perhaps
 be easily adapted? I am wondering how an advanced user could browse
 the database.


There are a lot of ways of browsing the database, especially for web user
interfaces. See
http://wiki.neo4j.org/content/Visualization_options_for_graphs

I have a modified and embedded version of neoclipse inside my AWE
application (see the online videos on youtube and vimeo). I use it to allow
*advanced users to browse the database* :-)

3) Is the jo4neo the right way to go to use annotations? Is it something you
 use?


I have not used jo4neo. But it seems like a convenient approach. One concern
I have with object databases is that you risk losing performance on
traversals if the objects need to be de-serialized on access. I know early
versions of the Ruby library neo4j.rb had that issue, but it was resolved
later. Perhaps jo4neo never had the problem.

4) Indexes in rdbm are done on a table basis and every new record gets
 inserted. Now you have to add the values to the index? It looks to me
 as if you index only what you want to, right? But while not indexing
 in an rdbm leads to slower results, in this case the result is
 missing?


This is only due to there being two search API's. If you search using the
index, you get answers from the index. If you search using the graph you get
answers from the graph. Many questions are actually faster using the graph,
so you should not be too quick to use an index at all. In fact, dare I say
it, if you need an index perhaps you have not modeled the graph correctly
:-)
(having said that, I do use the index myself, but less often than the
graph).

In an RDBMS the normal, non-indexed search is extremely slow (brute force
search), and the index is a drop-in replacement for that. But in the graph,
the only brute force search would be a full database scan, which no-one sane
would do... Instead the graph search requires knowledge of the graph
structure, and therefore the search query can be complex, and by definition
completely different to an index search. So the graph search and index
search are far too different to be placed behind the same API. However, I
could imagine that some object db wrappers like neo4j.rb and jo4neo might be
able to do this, since they have influence over, and knowledge of, the graph
structure they create.

5)  does neo4j have a replication tool? I.e. is it possible to sync a
 remote and a embedded database instance? Hibernate used to help here.
 Are there tools to help?


Yes. There are two. There is a hot backup option for making backups of live
running databases. And there is the relatively new HA (high availability)
infrastructure for keeping clusters of databases synchronized. The various
databases can be running as embedded or as server, it does not matter, they
can still be synchronized. There are rules, however, as to which is master,
and which is best to write to. See
http://wiki.neo4j.org/content/High_Availability_Cluster

6) Timeseries. The only way to hande then seems to be the timeline, right?
 So I happily create a timeline like:
 timeLine = new Timeline("precipitations", firstNode, graphDb);
 timeLine.addNode(firstNode, time);
 and then add the whole timeseries nodes. By that they are indexed
 already. (btw the insertion of about 9000 values took quite some time
 more than H2 or postgres, is that normal? Not that this bothers me
 that much, but I would like to know if I am doing somethign wrong)


I have no experience with the timeline class. I have always rolled my own
time index, and it was fast.

However, the most likely issue you are facing is with too many transactions.
Group your commits. The easiest way to do this is every 1000, or 10000 (pick
a number between these two :-), just do
tx.success();tx.finish();tx=db.beginTx();

The above number of 9000 could be added in one single commit. So move the
try{}finally{} around the entire loop. Then add the intermediate commits as
described above if you think you will get more data than that. I personally
commit every 1000. I found going to a bigger number helped, but not that
much. Most of the performance gains are achieved in the first 1000.
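The commit-grouping pattern can be sketched self-contained as follows (the Tx and Db interfaces here are illustrative stand-ins for the embedded API's Transaction and GraphDatabaseService, so only the batching logic is shown):

```java
public class BatchInsert {
    // Minimal stand-ins for the embedded API (illustrative, not the real classes).
    interface Tx { void success(); void finish(); }
    interface Db { Tx beginTx(); }

    // Insert n items, grouping writes into one transaction per batchSize.
    // Returns the number of commits performed.
    static int insertBatched(Db db, int n, int batchSize) {
        int commits = 0;
        Tx tx = db.beginTx();
        try {
            for (int i = 1; i <= n; i++) {
                // ... create the event node here ...
                if (i % batchSize == 0) {   // intermediate commit
                    tx.success();
                    tx.finish();
                    commits++;
                    tx = db.beginTx();
                }
            }
            tx.success();
        } finally {
            tx.finish();                    // final commit
            commits++;
        }
        return commits;
    }

    static Db noopDb() {
        return () -> new Tx() {
            public void success() {}
            public void finish() {}
        };
    }

    public static void main(String[] args) {
        System.out.println(insertBatched(noopDb(), 9000, 1000) + " commits");
        // → 10 commits
    }
}
```

With 9000 inserts and a batch size of 1000, the whole load costs 10 commits instead of 9000 one-row transactions.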

And how do I run to query it? I couldn't find any docs and testcase 

Re: [Neo4j] Neo4J Spatial - issue with bounding box indices in OSM Layer.

2011-03-24 Thread Craig Taverner
Hi Robert,

Interesting work you're doing. I just read your blogs and I think it would
be great to discuss your tests in more detail. Michael Hunger has done some
interesting tests on the scalability of the OSM import, and could probably
give suggestions on configuring the import.

Looking at your code below, I think you have swapped the x and y around in
the Coordinate constructor. It should be Coordinate(x,y), but the values you
have passed look like lat,long (which means y,x).

Also, the SearchContain should return geometries that contain the point you
passed, so if your point is within a lake, or building, or some other
polygon geometry, you should get results, but I do not think it will return
anything if you point is not actually contained within closed polygons. To
give a more complete answer, I think I would need to run and test your code.
Hopefully the above comments help resolve the issue.

Regards, Craig

On Thu, Mar 24, 2011 at 12:42 PM, Robert Boothby rob...@presynt.com wrote:

 Hi, I've been playing with Neo4j Spatial and the OSM data imports to
 see how it all fits together. I've been blogging on my experiences
 (http://bbboblog.blogspot.com).

 It's still early days but think that I have run into an issue. Having
 imported the OSM data successfully I've tried to execute this code to
 determine whether the centre of the town of Aylesbury (UK) is within
 the county of Buckinghamshire (which it is) and to pull back all nodes
 which contain the centre of the town:

 final OSMLayer osmLayer = (OSMLayer) spatialDB.getLayer("OSM-BUCKS");
 final GeometryFactory factory = osmLayer.getGeometryFactory();
 final Point point = factory.createPoint(new
     Coordinate(51.796726, -0.812988));

 SearchContain searchContain = new SearchContain(point);

 osmLayer.getIndex().executeSearch(searchContain);
 for (SpatialDatabaseRecord record : searchContain.getResults()) {
     System.out.println("Container: " + record);
 }

 The layer does contain the appropriate data imported from a .osm file
 extract for Buckinghamshire (the smallest file for an English county).

 When I've tried to run it I've got no results and when I've debugged
 it appears that the bbox property attributes (minx, maxx, miny, maxy)
 for the layer's root node are incorrect (mixed up) - minx=-1.1907455,
 maxx = 51.0852483, miny=0.3909055, maxy=52.2274931 causing the search
 to return immediately. Am I using this API correctly and have I
 stumbled into a genuine bug?

 Thank you,

 Robert Boothby.


Re: [Neo4j] Neo4J Spatial - issue with bounding box indices in OSM Layer.

2011-03-24 Thread Craig Taverner
I will need to double check this. I know there was a dispute early on with
Neo4j Spatial because the JTS library orders bbox params in one way, and
GeoTools does it another way, so you might be seeing the results of that. I
believed we sorted that all out, but perhaps not. I have have just checked
the RTree code, and it does the right thing, however it is possible that
code outside is passing in bbox parameters in a different order than
expected.

Perhaps you have some sample test code I can use to check this with?
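The effect of the two conventions can be sketched with plain arrays. JTS's Envelope constructor takes (x1, x2, y1, y2), i.e. {minx, maxx, miny, maxy}, while a bbox stored as coordinate pairs is {minx, miny, maxx, maxy}; reading one layout as the other swaps elements 1 and 2, which is exactly the transposition reported above (the method name here is illustrative):

```java
import java.util.Arrays;

public class BboxOrder {
    // Pair order:          {minx, miny, maxx, maxy}
    // JTS Envelope order:  {minx, maxx, miny, maxy}
    // Converting between them swaps elements 1 and 2.
    static double[] pairOrderToEnvelopeOrder(double[] bbox) {
        return new double[]{bbox[0], bbox[2], bbox[1], bbox[3]};
    }

    public static void main(String[] args) {
        // The mis-read root envelope values from the thread:
        double[] misread = {-1.1907455, 51.0852483, 0.3909055, 52.2274931};
        // Re-interpreting them recovers a sane Buckinghamshire bbox
        // (x in -1.19..0.39, y in 51.08..52.22):
        System.out.println(Arrays.toString(pairOrderToEnvelopeOrder(misread)));
        // → [-1.1907455, 0.3909055, 51.0852483, 52.2274931]
    }
}
```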

On Thu, Mar 24, 2011 at 7:01 PM, Robert Boothby rob...@presynt.com wrote:

 Thanks for the response Craig. You're right I had mixed up the
 latitude and longitude coordinates. I've now got the expected
 answers...

 However the transposition of the elements of the spatial index's root
 node geometry envelope definitely occurs. I just wouldn't have spotted
 it if I hadn't mixed up the coordinates and debugged it. The envelopes
 of the result nodes do not have transposed elements.

 The index root node element bounding box envelope looks like:
 Env[-1.1907455 : 51.0852483, 0.3909055 : 52.2274931], my point
 envelope looks like Env[-0.812988 : -0.812988, 51.796726 : 51.796726]
 and the other node envelopes looks like Env[-0.8577898 : -0.7723875,
 51.7930991 : 51.8378447], Env[-0.8502576 : -0.8028032, 51.786247 :
 51.8211053], Env[-0.8516907 : -0.7686481, 51.7932282 :
 51.8335107],Env[-0.8574819 : -0.7733667, 51.7921887 :
 51.8394132],Env[-0.9067499 : -0.6564949, 51.6661276 : 51.9251302],
 Env[-0.8159025 : -0.8114534, 51.7961756 : 51.7997272], .

 It appears that the maxx and miny values have been swapped in the
 spatial index root node.

 There may be certain scenarios when valid coordinates are excluded
 because the spatial index root node has the incorrect envelope.

 Robert.


Re: [Neo4j] [Neo4j Spatial] Need advice on modeling my spatial domain properly

2011-03-18 Thread Craig Taverner
Hi Chris,

A lot depends on your final intentions of how to use the model. There are
many, many ways to model this, and each has its pros and cons. Let me try to
briefly describe two options I can think of that are related to the two
factors you suggest below.

*Option one - model time in the graph*

In this case I'm assuming you want to store information about car movements.
For example, you are a logistics company tracking your fleet, and each
car/truck has a GPS and uploads data continuously. This data would be stored
as an event stream in the database, indexed spatially in the RTree, and
indexed with other indexes too (time of event (timeline index), which car
(category index), which driver (category index), other properties of
interest (lucene), etc.). You can relate the car to the OSM model through
routing information (eg. the car is following a planned route on the OSM
graph). Perhaps you model the route as a chain of nodes also, resulting in a
three layer graph, the static OSM, the planned route and the actual route
coming in live. This approach results in a very complete data model that can
be historically mined for statistics and behaviours (eg. which cars match
planned routes best, general speed patterns, driving behaviours, etc.)

For this model there is value in adding your own geometry encoder if you
wish to expose your own data (routes, and car traces) to a map or GIS. Since
it is all point data, you could just use the SimplePointEncoder, but then
you would not see lines, only points. If you want lines, rather make your
own geometry encoder that understands how the nodes are connected in chains.
Review the code of the sample encoders, it is not complex.

*Option two - model time in analysis*

Assuming the previous case is overkill, and you have no interest in fleet
tracking and historical modeling, and all you want is a map that shows a
single point for a car as it moves, it might be possible to not include the
car in the database at all. Where do you get the car data from? If it is a
stream of information from some data source, that stream could be consumed
by the map view itself, just updating the points on the map. If you wish the
map to not have to know about your own stream, then you can use the
database. Perhaps you do something very simple, just store each car location
in a SimplePointLayer (like the blog), and whenever a car change event
arrives (from your source of car data, whatever that is), you could remove
the car node from the RTree index and re-add it (basically re-index the
point at a new location). The map needs to redraw that layer too, so you
need to trigger that. If there are lots of cars moving all the time, rather
just redraw the map layer on a timer.
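The remove-and-re-add re-indexing step can be sketched with a toy grid hash standing in for the RTree (cell size, the long-packing scheme and all names are arbitrary choices for illustration, not Neo4j Spatial API):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class MovingPointIndex {
    // Coarse grid hash standing in for the RTree: cell key -> car ids.
    final Map<Long, Set<String>> cells = new HashMap<>();
    final Map<String, Long> carCell = new HashMap<>();

    // Pack a 0.01-degree grid cell into one long (collisions possible in
    // theory, ignored for this sketch).
    static long cellOf(double x, double y) {
        return (((long) Math.floor(x / 0.01)) << 32)
                ^ (((long) Math.floor(y / 0.01)) & 0xffffffffL);
    }

    // Re-index: remove the car from its old cell and add it at the new location.
    void moveCar(String car, double x, double y) {
        Long old = carCell.get(car);
        if (old != null) {
            Set<String> s = cells.get(old);
            if (s != null) s.remove(car);
        }
        long cell = cellOf(x, y);
        cells.computeIfAbsent(cell, k -> new HashSet<>()).add(car);
        carCell.put(car, cell);
    }

    Set<String> carsNear(double x, double y) {
        return cells.getOrDefault(cellOf(x, y), Set.of());
    }

    public static void main(String[] args) {
        MovingPointIndex index = new MovingPointIndex();
        index.moveCar("car-1", 0.005, 0.005);
        index.moveCar("car-1", 1.0, 1.0);    // re-indexed to a new cell
        System.out.println(index.carsNear(1.0, 1.0));
        // → [car-1]
    }
}
```

The map layer then only needs to be redrawn (on change events, or on a timer) and queried for the cells in view.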

The reason I called this 'model time in analysis' is that since there is no
time component in the graph, each car has only one current position, any
analysis of car behaviour would have to be done external to the graph,
perhaps on the incoming gps stream. So this is much more limited in
possibility than the previous case.

As you can see I had to make a ton of assumptions about your data and your
intentions to describe the above models. I assume the odds are low that I
matched your exact case very well, but hopefully I gave you some ideas to
think about.

Regards, Craig

On Fri, Mar 18, 2011 at 11:57 AM, Christoph K. 
klaassen.christ...@googlemail.com wrote:

 Hi people,

 i'm working on a project, where i want to map live data of cars on streets.
 I take my map data from OSM-maps for test purposes - so there's no problem
 at all.
 But i have no idea on how to integrate my car data. Should i implement my
 own geometryencoder, so that my car nodes can contain a position property.
 Or does it make sense to relate my car nodes to point nodes, which are
 representing the current position of my car? Some advice would be great!

 greetings from bavaria
 Christoph


Re: [Neo4j] [Neo4j Spatial] Need advice on modeling my spatial domain properly

2011-03-18 Thread Craig Taverner
Hi Chris,

I'm glad my comments were of help. I think you are exactly right in starting
simple and enhancing as you need. This is certainly one of the points of a
'schema-less database' after all :-)

Neo4j can handle very large amounts of data, and every car on the planet
would fit, but perhaps not every car logging GPS points every few seconds
for a long time! You can deal with the performance issues as they arise
(although it helps to keep performance in mind as you go, of course).

If you are curious about a related system I was involved in, take a look at
the videos on my vimeo account at http://vimeo.com/craigtaverner. They show
an android application being driven around (in cars), uploading GPS and
other data to a central database (using neo4j), then being downloaded by a
desktop application containing another neo4j database which duplicates a
small subset of the data, and performs statistical analysis of the results
and displaying on a map. Not the same use case as yours, but certainly
related.

Regards, Craig

On Fri, Mar 18, 2011 at 1:11 PM, Christoph K. 
klaassen.christ...@googlemail.com wrote:

 Hi Craig,

 wow, this is a great reply :) Thank you very much for your advices.

 To be a bit more precise about the project: it's a mix of both of your
 options.
 Option 2 is a great start from which i could go on and test, how neo4j
 behaves in my special case. I get the data from the car itself (via umts or
 sth like that) and want to provide some environment informations back to
 the
 car. But it is intended that i have to deal with a huge amount of cars...
 thousands up to... yeah maybe all cars on this planet ;-) (think big).
 Option 1 with the possibility to inspect historical data would be
 interesting, but i'm not sure, if neo4j is powerful enough to store that
 amount of data, which is intended to be collected. So this would not be a
 feature at this time but interesting for later enhancements.

 I think i will try a simple encoder implementation to do a hybrid of your
 options :) this leaves the option to extend the model more easily if it's
 desired.

 greetz
 Chris

 On Fri, Mar 18, 2011 at 12:47 PM, Craig Taverner cr...@amanzi.com wrote:

  Hi Chris,
 
  A lot depends on your final intentions of how to use the model. There are
  many, many ways to model this, and each has its pros and cons. Let me try to
  briefly describe two options I can think of that are related to the two
  factors you suggest below.
 
  *Option one - model time in the graph*
 
  In this case I'm assuming you want to store information about car
  movements.
  For example, you are a logistics company tracking your fleet, and each
  car/truck has a GPS and uploads data continuously. This data would be
  stored
  as an event stream in the database, indexed spatially in the RTree, and
  indexed with other indexes too (time of event (timeline index), which car
  (category index), which driver (category index), other properties of
  interest (lucene), etc.). You can relate the car to the OSM model through
  routing information (eg. the car is following a planned route on the OSM
  graph). Perhaps you model the route as a chain of nodes also, resulting
 in
  a
  three layer graph, the static OSM, the planned route and the actual route
  coming in live. This approach results in a very complete data model that
  can be historically mined for statistics and behaviours (eg. which cars match
  planned routes best, general speed patterns, driving behaviours, etc.)
 
  For this model there is value in adding your own geometry encoder if you
  wish to expose your own data (routes, and car traces) to a map or GIS.
  Since
  it is all point data, you could just use the SimplePointEncoder, but then
  you would not see lines, only points. If you want lines, rather make your
  own geometry encoder that understands how the nodes are connected in
  chains.
  Review the code of the sample encoders; it is not complex.
 
  *Option two - model time in analysis*
 
  Assuming the previous case is overkill, and you have no interest in fleet
  tracking and historical modeling, and all you want is a map that shows a
  single point for a car as it moves, it might be possible to not include
 the
  car in the database at all. Where do you get the car data from? If it is
 a
  stream of information from some data source, that stream could be
 consumed
  by the map view itself, just updating the points on the map. If you wish
  the
  map to not have to know about your own stream, then you can use the
  database. Perhaps you do something very simple, just store each car
  location
  in a SimplePointLayer (like the blog), and whenever a car change event
  arrives (from your source of car data, whatever that is), you could
 remove
  the car node from the RTree index and re-add it (basically re-index the
  point at a new location). The map needs to redraw that layer too, so you
  need to trigger that. If there are lots of cars moving all the time,
 rather

Re: [Neo4j] Where is the beer?

2011-03-17 Thread Craig Taverner
When I added my face, I tested to make sure it scaled the same as the
others. The results: no images scale at all, no matter what zoom level. They
are all fixed size images.

Google said we should load images up to 64x64, so I originally loaded a
64x64 image, but since it was noticeably larger than the others, I scaled it
down to 48x48. It is still a bit bigger, which is probably what is still
bothering you.

Seems we have a +1 for faces (from Peter) and a -1 for faces (from Anders).
What do others think?
(I must admit if I am going to be the only one with a face, then perhaps the
vote is clear ...)

On Thu, Mar 17, 2011 at 3:54 PM, Anders Nawroth and...@neotechnology.com wrote:

 Please don't do the pretty face thing, such icons aren't scaled in any
 sensible way when zooming out!

 Or find out how to make them scale down ...

 /anders

 On 03/17/2011 03:30 PM, Peter Neubauer wrote:
  Jordi,
  do you need an invite to add yourself? Btw, the map looks really
  pretty now! Need to get some pretty face like Craig on my icon :)
 
  Cheers,
 
  /peter neubauer
 
  GTalk:  neubauer.peter
  Skype   peter.neubauer
  Phone   +46 704 106975
  LinkedIn   http://www.linkedin.com/in/neubauer
  Twitter  http://twitter.com/peterneubauer
 
  http://www.neo4j.org   - Your high performance graph
 database.
  http://startupbootcamp.org/- Öresund - Innovation happens HERE.
  http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
 
 
 
  On Mon, Mar 14, 2011 at 9:02 PM, Jordi Valverde de...@eclipsi.net
  wrote:
  Invite me: jvalve...@gmail.com :-)
 
  El 14/03/11 14:21, Andreas Kollegger
  andreas.kolleg...@neotechnology.com  escribió:
 
  I've shared a map with you called Neo4j Graphistas:
  You can view and edit this map at
 
 http://maps.google.com/maps/ms?ie=UTF8hl=enoe=UTF8msa=0msid=2157872407
  36307886514.00049e70e573cbd8a91e5
 
  Where are people graphing? Add yourself to the map (or at least your
 city
  ;)
 
  Note: To edit this map, you'll need to sign into Google with this email
  address. To use a different email address, just reply to this message
 and
  ask me to invite your other one.  If you don't have a Google account,
 you
  can create one at
 
 http://www.google.com/accounts/NewAccount?reqemail=user@lists.neo4j.org.
 
  Cheers,
  Andreas
 
  On Mar 14, 2011, at 2:04 PM, Alfredas Chmieliauskas wrote:
 
  Great! I think thats a great idea!
  A
 
  On Mon, Mar 14, 2011 at 2:02 PM, Michael Hunger
  michael.hun...@neotechnology.com  wrote:
  I would,
 
  I already have extensive plans for that.
 
  I will share them with you :)
 
  Cheers
 
  Michael
 
  Am 14.03.2011 um 13:50 schrieb Alfredas Chmieliauskas:
 
  Who would like to start a social networking site for developers (on
  top of neo4j technology and community)?
  I'm in.
 
  A
 
 
  On Mon, Mar 14, 2011 at 1:45 PM, bhargav gunda
  bhargav@gmail.com  wrote:
  Stockholm, Sweden
 
  On Mon, Mar 14, 2011 at 1:41 PM, Alfredas Chmieliauskas
  al.fre...@gmail.com  wrote:
 
  Amsterdam
 
  On Mon, Mar 14, 2011 at 1:15 PM, Axel Morgnera...@morgner.de
  wrote:
  Hi everybody,
 
  as said, here's a new thread for the idea of having beer and
 talk
  meetings.
 
  Possible locations so far:
 
  Malmö
  London
  Berlin
  Frankfurt
 
  Looking forward to seeing more Neo4j people in personal!
 
  Greetings
 
  Axel
 
 
  On 14.03.2011 13:02, Peter Neubauer wrote:
 
  Berlin sounds great.
  Last year a couple of guys met up at StudiVZ, and suddenly
 we
  were 30
  people. Go for it, there is a LOT of good vibe in Beerlin!

  Cheers,

  /peter neubauer

  GTalk:  neubauer.peter
  Skype   peter.neubauer
  Phone   +46 704 106975
  LinkedIn   http://www.linkedin.com/in/neubauer
  Twitter  http://twitter.com/peterneubauer

  http://www.neo4j.org   - Your high performance
 graph
  database.
  http://startupbootcamp.org/- Öresund - Innovation
 happens
  HERE.
  http://www.thoughtmade.com - Scandinavia's coolest
 Bring-a-Thing
  party.



  On Mon, Mar 14, 2011 at 12:37 PM, Michael Hunger
  michael.hun...@neotechnology.com  wrote:
  They guys could create at least one in Malmö? Isn't Andreas
  there as
  well, and certainly some more fine folks?

  We can do one locally here in Gemany, perhaps Berlin
 (perhaps
  we can
  combine that with our monthly flight to CPH).

  Cheers

  Michael

  Am 14.03.2011 um 11:50 schrieb Jim Webber:

  Hey Rick,

  It was a pleasure to meet you too. And this got me
 thinking -
  it
  would be great to meet more folks from this list, or to form user
  groups, or generally just get a beer and talk Neo4j graphs.

  Is there, for example, a strong London contingent on this
  list? I
  only know me and Nat Pryce so far. Anyone else care to get
  together in
  London?

  Jim
 
  

Re: [Neo4j] Graph design

2011-03-16 Thread Craig Taverner
One key point of David's suggestion is that it takes into account that each
action of the user could take place from a different IP. Massimo's original
model implied that the user would always be at the same IP for all actions,
or, if he could change IPs, you would not know which of them related to
which action.

So even though David's model is more complex, it seems more correct.

Another solution is to create a uid-ip node, representing all cases where a
particular user is at a particular IP. Then that would have direct relations
to all domains (as Massimo originally had), and it would have a single
relationship to its user and its IP nodes. The graph looks similar to
David's, but we would have far fewer nodes (all actions from the same uid-ip
are merged).
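As a toy illustration of the uid-ip idea in plain Java (no Neo4j API involved; all names here are invented for the sketch), merging all actions from the same user/IP pair onto one key:

```java
import java.util.*;

public class UidIpSketch {
    // Stand-in for the uid-ip node idea: every action by the same user
    // from the same IP collapses onto one (uid, ip) key, which carries a
    // single set of relationships out to the domains acted upon.
    private final Map<String, Set<String>> uidIpToDomains = new HashMap<>();

    public void recordAction(String uid, String ip, String domain) {
        String uidIpKey = uid + "@" + ip; // the merged uid-ip "node"
        uidIpToDomains.computeIfAbsent(uidIpKey, k -> new TreeSet<>()).add(domain);
    }

    // Number of merged uid-ip nodes (vs. one node per action in the
    // more verbose model).
    public int nodeCount() {
        return uidIpToDomains.size();
    }

    public Set<String> domainsFor(String uid, String ip) {
        return uidIpToDomains.getOrDefault(uid + "@" + ip, Set.of());
    }
}
```

Three actions from two user/IP pairs produce only two merged "nodes", which is the space saving over the per-action model.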

On Wed, Mar 16, 2011 at 7:08 PM, Massimo Lusetti mluse...@gmail.com wrote:

 On Wed, Mar 16, 2011 at 7:03 PM, David Montag
 david.mon...@neotechnology.com wrote:

  Massimo,
  If you'd like, I could skype with you later this afternoon (in 4-5 hours)
  and discuss it?
  David

 Wow that's would be cool... But hopefully I'm going to be sleeping, I
 need it... Anyway I'll do my homework and come back to you!

 Thanks for the offer... really appreciated.
 --
 Massimo
 http://meridio.blogspot.com


Re: [Neo4j] Neo4j spatial

2011-03-16 Thread Craig Taverner
Hi Saikat,

There are a few places you can look for code samples. One of the best places
is the set of test cases included in neo4j spatial. You can find them at
https://github.com/neo4j/neo4j-spatial/tree/master/src/test/java/org/neo4j/gis/spatial.
In particular, since you are interested mostly in point data, take a look at
TestSimplePointLayer.java
(https://github.com/neo4j/neo4j-spatial/blob/master/src/test/java/org/neo4j/gis/spatial/TestSimplePointLayer.java)
and LayersTest.java
(https://github.com/neo4j/neo4j-spatial/blob/master/src/test/java/org/neo4j/gis/spatial/LayersTest.java).

What you will find in those classes is Java code for adding points to the
database, similar, but more extensive than the code in the blog.

Regarding your specific case, if you are working with a normal google map or
bing map, and want to port the points into a local database, you would need
to export them, and write a simple importer. If you have written a mashup
between google or bing maps and your own neo4j-based web application, you
should be able to use some client side coding to automate this, accessing
the map, and posting the points directly into your own server (where of
course you would have some code adding the points to the database). Does
this answer your question?

Regards, Craig

On Thu, Mar 17, 2011 at 12:32 AM, Saikat Kanjilal sxk1...@hotmail.com wrote:

 Hi Folks,
 I was reading through the docs on neo4j spatial and was wondering about a
 few things:

 1) If I have a google or bing map and I manually plot some points can I use
 neo4j spatial to automate the loading of those points into my neo4j db?

 2) Are there code samples for neo4j-spatial or implementations I can look
 at for a deeper look at the API's etc?

 Best Regards

 Sent from my iPhone


Re: [Neo4j] Traversal framework

2011-03-15 Thread Craig Taverner
I like the pipes idea. What I would like to see is nested traversers. The
pipe example below seems to imply single hops at each step, but it would be
nicer to allow each step to traverse until it reached a certain criterion,
at which point a different traversal would take over.

In the old and current APIs, it seems that to do this you need to create a
traversal, iterate over it, and create a new traversal inside the loop.

We created a Ruby DSL for nested traversals a year or so ago that looks a
bit like:


  chart 'Distribution analysis' do
self.domain_axis='categories'
self.range_axis='values'
    select 'First dataset', :categories=>'name', :values=>'value' do
  from {
from {
  traverse(:outgoing,:CHILD,1)
  where {type=='gis' and name=='network.csv'}
}
traverse(:outgoing,:AGGREGATION,1)
where {name=='azimuth' and get_property(:select)=='max' and
distribute=='auto'}
  }
  traverse(:outgoing,:CHILD,:all)
end
  end

This is quite a complex example, but the key points are the from method,
which defines where to start a traversal; the traverse method, which defines
the traversal itself; and the where method, which acts like the old
ReturnableEvaluator.

Will the new pipes provide something like this?
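For illustration, here is the nested idea in plain Java over a toy adjacency-list graph (this is not the Neo4j traversal API, and all identifiers are invented for the sketch): an outer traversal runs until a node matches a handover criterion, at which point an inner traversal takes over from that node.

```java
import java.util.*;
import java.util.function.Predicate;

public class NestedTraversalSketch {
    // Outer traversal: walk the graph from `start` until `handover`
    // matches; at each matching node, an inner traversal (here: collect
    // everything reachable from it) takes over instead of the outer one
    // continuing past that node.
    public static Set<String> traverse(Map<String, List<String>> graph,
                                       String start,
                                       Predicate<String> handover) {
        Set<String> results = new TreeSet<>();
        Deque<String> outer = new ArrayDeque<>(List.of(start));
        Set<String> seen = new HashSet<>();
        while (!outer.isEmpty()) {
            String node = outer.pop();
            if (!seen.add(node)) continue;
            if (handover.test(node)) {
                // inner traversal: all nodes reachable from the handover node
                Deque<String> inner = new ArrayDeque<>(graph.getOrDefault(node, List.of()));
                while (!inner.isEmpty()) {
                    String n = inner.pop();
                    if (results.add(n)) {
                        inner.addAll(graph.getOrDefault(n, List.of()));
                    }
                }
            } else {
                outer.addAll(graph.getOrDefault(node, List.of()));
            }
        }
        return results;
    }
}
```

The single-hop pipe steps would correspond to a handover predicate that matches immediately; the nested form lets each phase run to an arbitrary depth first.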

On Tue, Mar 15, 2011 at 9:19 AM, Massimo Lusetti mluse...@gmail.com wrote:

 On Tue, Mar 15, 2011 at 9:11 AM, Mattias Persson
 matt...@neotechnology.com wrote:

  I'm positive that some nice API will enter the kernel at some point,
 f.ex.
  I'm experimenting with an API like this:
 
   for(Node node :
 
 PipeBuilder.fromNode(startNode).into(otherNode(A)).into(otherNode(B)).nodes())
  {
  // node will be (3) from the example above
   }
 
 
  I hope I didn't confuse you with all this :)

 Nope, the opposite. Thanks for the clarification and that kind of API
 would be a killer feature IMHO.

 It will be even more pleasant to work with neo4j...

 Cheers
 --
 Massimo
 http://meridio.blogspot.com


[Neo4j] Neo4j Spatial and GeoTools

2011-03-15 Thread Craig Taverner
Hi,

There were a few comments on twitter about the use of GeoTools in Neo4j
Spatial, so I wanted to elaborate on the discussion with a short description
of where and why we include some GeoTools libraries in Neo4j Spatial.

The discussion started with two tweets by martengustafson
(http://twitter.com/#!/martengustafson):

Current state of Java GIS libraries and their interdependencies: #fail
(http://twitter.com/#!/search?q=%23fail)

followed shortly by:

I'm sorry Neo4J spatial. I'm sure you're nice and all but the endless cruft
that is OpenGIS and GeoTools brought you down with them.


There followed a short chat between Martin and me to find out what his
concerns were. You can follow the chat on Twitter, but my summary of it is
that Martin noticed the use of the Coordinate and Geometry classes in my
blog post at
http://blog.neo4j.org/2011/03/neo4j-spatial-part1-finding-things.html. In
addition he noticed that to run the test cases, or the code on the blog, you
needed to include geotools libraries in the classpath, and he had negative
experiences with geotools in the past. So he felt this dependency reflected
negatively on Neo4j Spatial.

My answers to Martin were to explain briefly why we use GeoTools and where.
I would like to elaborate in more detail here. Firstly, the main core of
Neo4j Spatial does not use GeoTools, but rather JTS, a lower level topology
library with a lot less dependency-complexity than GeoTools, and so
hopefully much less of a concern to Martin. The Coordinate and Geometry
classes used in the blog are from JTS. Martin admitted that he thought JTS
used GeoTools, not the other way round. The TestSimplePointLayer test case
the blog was based on has no GeoTools dependencies itself.

However, the fact remains that several GeoTools libraries are still
dependencies of Neo4j Spatial. While the core design does not require
GeoTools, there are three places they are used:

   - The APIs to expose the Neo4j Spatial data to well-known GISs that use
   GeoTools, like GeoServer
   (http://wiki.neo4j.org/content/Neo4j_Spatial_in_GeoServer) and uDig
   (http://wiki.neo4j.org/content/Neo4j_Spatial_in_uDig). Obviously we
   need GeoTools libraries to enable GeoTools compatibility.
   - Some current and future import/export utilities. The ShapefileImporter,
   for example, uses GeoTools support for reading shapefiles. We have
   investigated, but not included, GeoJSON support based on GeoTools also.
   - The DefaultLayer implementation uses GeoTools for the WKB and WKT
   Geometry Encoders, since WKB and WKT are well supported in GeoTools. While
   this is currently part of the core code, the design of Neo4j Spatial is
   pluggable, so you do not need to use this code. However, you do need to
   include the relevant GeoTools libraries.

Early on in the project, it was considered to split Neo4j Spatial into two
parts, a core that only required JTS and extensions that required GeoTools.
However, this route was not taken for a few reasons. Chief among them was
the fact that most people trying out Neo4j Spatial were happy with maven,
and so dependencies were not a serious issue. So simplicity of development
won out, and we kept GeoTools dependencies in the core.

Recently I felt the negative side of this when I developed the
neo4j-spatial.rb Ruby gem (https://rubygems.org/gems/neo4j-spatial), and
needed to include dependencies in the gem. The Maven dependencies were a bit
overzealous, and so too many libraries were included. Michael Hunger came to
the rescue and cleaned up the dependencies somewhat, so the current gem is
quite a lot thinner than the earlier ones.

Anyway, aside from my larger-than-necessary early gems, Martin's concerns on
twitter are the first community criticism of the use of GeoTools in Neo4j
Spatial. I would like to know if others feel this is something we should be
concerned with, and try to split the core out so as not to require GeoTools?
My personal impression is that the benefits far outweigh the disadvantages,
but I would like to know what others think.

Regards, Craig


Re: [Neo4j] Nice change from all this Java

2011-03-10 Thread Craig Taverner
Cool syntax. I love that he has not skimped on docs.

But isn't the EPL going to be a problem here? (à la the EPL/GPL clash)

On Wed, Mar 9, 2011 at 10:38 PM, Andres Taylor 
andres.tay...@neotechnology.com wrote:

 Hey all,

 Wanted to share something I just found on reddit.

 https://github.com/wagjo/borneo

 After being force fed all this Java, a bit of Clojure was very welcome.
 Looks really nice too.

 Andrés


Re: [Neo4j] Neoclipse Wish List

2011-03-07 Thread Craig Taverner

  But as far as I know there is no filter on properties of nodes or
  relationships. It should be easy to add, though.

 How would you like that to look?


Perhaps this could use the REST API traversal syntax also?

 - List of existing lucene indices available to query

 The new search dialog now lists the node indices from the integrated
 index. The relationship indices will soon come, but there's more
 refactoring needed to add relationship search. The next step is then to
 add Lucene query support, not only lookup for exact matches.


Excellent. I had not checked that.

 - visualize sub-graph not by a fixed depth, but by a traversal query
  (possibly using the syntax of the REST API, since that is dynamically
  interpreted)

 Nice idea to use the REST API syntax!


And it is hopefully easy to implement :-)
(so we can see it soon?)

Regards, Craig


Re: [Neo4j] Batch Insertion, how to index relationships?

2011-03-07 Thread Craig Taverner

 I think you'll have to add a dummy key/value to each relationship, like
 exists/true or whatever. The overhead for that is insignificant, and once
 relationships are indexed with whatever key/value, they can be queried with
 those additional start/end nodes.


And I believe you can index the relationship with these dummy values even if
the properties do not exist on the relationship itself, right? That saves a
lot of space.
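To make the point concrete, here is a toy in-memory version in plain Java (not the real BatchInserterIndex API; the identifiers are made up for the sketch): the exists/true entry lives only in the index structure, so nothing needs to be stored on the relationship itself.

```java
import java.util.*;

public class RelIndexSketch {
    // key -> value -> relationship ids. Index entries exist independently
    // of any properties stored on the relationships themselves, which is
    // why the dummy key/value costs no space on the relationship records.
    private final Map<String, Map<String, Set<Long>>> index = new HashMap<>();

    public void add(long relId, String key, String value) {
        index.computeIfAbsent(key, k -> new HashMap<>())
             .computeIfAbsent(value, v -> new TreeSet<>())
             .add(relId);
    }

    public Set<Long> get(String key, String value) {
        return index.getOrDefault(key, Map.of()).getOrDefault(value, Set.of());
    }
}
```

Querying exists/true then returns every indexed relationship, ready to be filtered further by start/end node.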


Re: [Neo4j] Neoclipse Wish List

2011-03-05 Thread Craig Taverner
You can filter the number of nodes in the preferences settings (click the
gear icon on the top left, and then click neo4j, and change maximum number
of nodes).

You can filter the relationships by types in the relationships view,
deselect the types (also by incoming/outgoing).

But as far as I know there is no filter on properties of nodes or
relationships. It should be easy to add, though.

My wish list includes:

   - Menu of possible alternative roots (places in the database to jump
   directly to, possible based on a lucene query).
   - List of existing lucene indices available to query
   - visualize sub-graph not by a fixed depth, but by a traversal query
   (possibly using the syntax of the REST API, since that is dynamically
   interpreted)


On Sat, Mar 5, 2011 at 10:38 PM, Rick Bullotta 
rick.bullo...@burningskysoftware.com wrote:

 Neoclipse is an awesome tool, but here are a few items that would greatly
 increase the utility and usability:



 1) Provide a limit on the # of nodes/relationships that are displayed (and
 a
 warning that additional nodes and relationships were not shown)

 2) Provide display filters based on node and/or relationship property
 values
 and relationship types



 I think these would greatly improve the situation when very large or
 complex
 graphs are involved, since now the visualizations become far too busy to
 do anything.



 Thoughts?



 Rick



Re: [Neo4j] Release of 1.0.0

2011-03-04 Thread Craig Taverner
Thanks for a great gem, Andreas,

One thing I noticed is that rubygems.org still lists version 0.4.6 as the
official release.

On Thu, Mar 3, 2011 at 10:42 AM, Peter Neubauer 
peter.neuba...@neotechnology.com wrote:

 Amazing work Andreas,
 the community is truly thankful to both you and all the contributors
 and Rubyists out there!

 Cheers,

 /peter neubauer

 GTalk:  neubauer.peter
 Skype   peter.neubauer
 Phone   +46 704 106975
 LinkedIn   http://www.linkedin.com/in/neubauer
 Twitter  http://twitter.com/peterneubauer

 http://www.neo4j.org   - Your high performance graph database.
 http://startupbootcamp.org/- Öresund - Innovation happens HERE.
 http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.



 On Wed, Mar 2, 2011 at 7:58 PM, Andreas Ronge andreas.ro...@gmail.com
 wrote:
  Hi
 
   Just promoted 1.0.0.beta.32 to version 1.0.0: neo4j.rb version 1.0.0
   #neo4j #jruby #rails
  I also have written a blog how to get started with Neo4j/Rails 3
 
  http://blog.jayway.com/2011/03/02/neo4j-rb-1-0-0-and-rails-3/
 
  Thanks a lot for all your feed back and contributions !
 
  /Andreas
 
  --
  You received this message because you are subscribed to the Google Groups
 neo4jrb group.
  To post to this group, send email to neo4...@googlegroups.com.
  To unsubscribe from this group, send email to
 neo4jrb+unsubscr...@googlegroups.com.
  For more options, visit this group at
 http://groups.google.com/group/neo4jrb?hl=en.
 
 


Re: [Neo4j] Load last 10 created nodes from neo4j

2011-03-04 Thread Craig Taverner
What about taking the id of the last node created, and just decrementing
backwards by 1, ten times, and getting those ten nodes? This will not take
id reuse into account, though, so if you have node deletion, it will not
necessarily give the last ten added, only the ten with the highest ids. I expect
that quite often it will be the same thing, though. Depending on what you
need this for, perhaps this is good enough?
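A rough sketch of the decrement idea in plain Java, with a set of deleted ids standing in for getNodeById() failing on a missing node (illustrative only, not the Neo4j API):

```java
import java.util.*;

public class LastNodesSketch {
    // Walk node ids downwards from the highest id, skipping ids that no
    // longer exist, and keep the first `n` found. Without id reuse these
    // are the n most recently created nodes; with deletion and id reuse
    // they are merely the n nodes with the highest ids.
    public static List<Long> lastN(long highestId, Set<Long> deleted, int n) {
        List<Long> result = new ArrayList<>();
        for (long id = highestId; id >= 0 && result.size() < n; id--) {
            if (!deleted.contains(id)) { // stands in for getNodeById() succeeding
                result.add(id);
            }
        }
        return result;
    }
}
```

With ids 14 and 12 deleted, walking down from 15 yields the ten highest surviving ids.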

On Fri, Mar 4, 2011 at 2:59 PM, Rick Bullotta 
rick.bullo...@burningskysoftware.com wrote:

 It would be pretty easy to use an index to do this (keep only the 10 most
 recent in the index), but you'd need to implement code everywhere you
 add/delete nodes and relationships, and if you're using an abstraction
 layer, it wouldn't be possible.

 In short, it's absolutely possible, but you'll need to implement it in your
 code.

 -Original Message-
 From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org]
 On
 Behalf Of mike_t
 Sent: Friday, March 04, 2011 8:54 AM
 To: user@lists.neo4j.org
 Subject: [Neo4j] Load last 10 created nodes from neo4j

 Hi, is it possible to load for example the last 10 created nodes or
 relationships from the neo4j db?

 I know it is not the normal use case for a graph db but however i need a
 solution.

 Thanks for your answers,

 Mike

 --
 View this message in context:

 http://neo4j-user-list.438527.n3.nabble.com/Load-last-10-created-nodes-from-neo4j-tp2633139p2633139.html
 Sent from the Neo4J User List mailing list archive at Nabble.com.


Re: [Neo4j] How to copy a complete database?

2011-03-03 Thread Craig Taverner
What would be the consequence of running a background thread that iterates
through all nodes and relationships, re-saving any property that qualifies
as a short string? I assume the properties store would get a lot
of empty space in the beginning, or would old-id reuse kick in and prevent
this?

It seems like something that can be done at application level easily enough.
And if only some fraction of your properties are really short strings, this
should be more efficient than copying the entire database.

On Thu, Mar 3, 2011 at 11:52 AM, Tobias Ivarsson 
tobias.ivars...@neotechnology.com wrote:

 No there is no simpler way, yet. We've been thinking about creating a
 short
 string compression tool for accomplishing this, but haven't done so yet.

 Cheers,
 Tobias

 On Thu, Mar 3, 2011 at 11:35 AM, Balazs E. Pataki pat...@dsd.sztaki.hu
 wrote:

  Hi,
 
  I have a big database based on Neo4J 1.2. Now, if I would like to use
  the short strings feature of Neo4j 1.3 M03 I should regenerate my full
  database, that is all strings should be reset so that it may or may not
  be stored according to the new short strings policy.
 
  It seems to me that the easiest way to do this would be to somehow be
  able to copy the full 1.2 database to a newly created 1.3 M03 database
  by traversing the 1.2 database. But there may be a simpler (neo4j
  builtin) way to do this. Any hints about this?
 
  Thanks,
  ---
  balazs
 



 --
 Tobias Ivarsson tobias.ivars...@neotechnology.com
 Hacker, Neo Technology
 www.neotechnology.com
 Cellphone: +46 706 534857


Re: [Neo4j] Can anyone compile the latest Neo4J Spatial?

2011-02-27 Thread Craig Taverner
That error is from the PNG exporter, and has never had any side effects.
Also that code is not called at all by the test case I am running, which is
TestOSMImport. You should see that error only from TestDynamicLayers.

Having said that, I have added a fix for it, and will commit that later
today or tomorrow.

On Sun, Feb 27, 2011 at 12:42 PM, Andreas Kollegger 
andreas.kolleg...@neotechnology.com wrote:

 On a fresh and clean Ubuntu VM (after installing java, maven, git, etc), I
 just cloned neo4j-spatial and tried `mvn clean install`.

 During the build, I noticed a few of these scattered about:

 Feb 27, 2011 11:58:26 AM org.geotools.map.MapContent finalize
 SEVERE: Call MapContent dispose() to prevent memory leaks

 But ended up with a successful build. No failures, no errors.

 Cheers,
 Andreas

 On Feb 27, 2011, at 7:22 AM, Peter Neubauer wrote:

  Mmmh,
  the index provider kernel extension subsystem has been changed between
  1.3.M01 and M02. I suspect an incompatible kernel version being
  resolved by maven. let me try to run this tomorrow from home with
  moving away my current maven repo and get everything fresh. (Sitting
  on a 3G conenction right now).
 
  Hopefully I can tell you tonight, otherwise tomorrow how that works,
  ok? Also, you could try to move away your ~/.m2/repository for one
  build and try getting all artifacts fresh from the netz?
 
  Cheers,
 
  /peter neubauer
 
  GTalk:  neubauer.peter
  Skype   peter.neubauer
  Phone   +46 704 106975
  LinkedIn   http://www.linkedin.com/in/neubauer
  Twitter  http://twitter.com/peterneubauer
 
  http://www.neo4j.org   - Your high performance graph
 database.
  http://startupbootcamp.org/- Öresund - Innovation happens HERE.
  http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
 
 
 
  On Sun, Feb 27, 2011 at 2:03 AM, Nolan Darilek no...@thewordnerd.info
 wrote:
  -BEGIN PGP SIGNED MESSAGE-
  Hash: SHA1
 
  On 02/26/2011 05:56 PM, Craig Taverner wrote:
  It is working for me too.
 
  One thing that is interesting about the error message is that it says it
  looks like another instance is running in the *same JVM*. Is that the
  usual error message? (The complete text was "this is usually caused by
  another Neo4j kernel already running in this JVM for this particular
  store".)
 
  The error is occurring at the very start of the very first test case in
 the
  TestSpatial class, so cannot be due to another test in that class.
 
  Still, I would take Peter's advice, check that no other Java test
  processes are running, manually delete the database to be sure, and then
  try again.
 
 
  I don't mean to be difficult, but I *literally* did:
 
  git clone ... neo4j-spatial
  cd neo4j-spatial
  mvn install
 
  If I can get more pristine than that then do let me know, but I can't
  see how.
 
  The one process you'll see open in this transcript is a web app. It has
  nothing to do with Neo4j other than hosting its jars among its
  dependencies. The database is not even used at this time and, indeed,
  the exact same behavior happens if it isn't running.
 
  My next question: does someone have a development dependency hanging
  around in their local m2 repository that I don't? When you've verified
  that you can build a clean tree, you've first backed up ~/.m2 and
  removed it? In any case:
 
  nolan@nolan-desktop:~/src/neo4j$ ps auxw | grep java
  nolan 12111  1.4  2.8 1423840 109504 pts/1  Sl+  16:07   2:22 java
  -XX:+HeapDumpOnOutOfMemoryError -XX:+CMSClassUnloadingEnabled
  -Dsbt.log.noformat=true -jar /home/nolan/bin/sbt-launcher.jar jetty
  nolan 13153  0.0  0.0   7624   896 pts/4  S+   18:55   0:00 grep
  --color=auto java
  nolan@nolan-desktop:~/src/neo4j$ git clone git://github.com/neo4j/neo4j-spatial
  Initialized empty Git repository in
  /home/nolan/src/neo4j/neo4j-spatial/.git/
  remote: Counting objects: 3065, done.
  ...
  Resolving deltas: 100% (1247/1247), done.
  nolan@nolan-desktop:~/src/neo4j$ cd neo4j-spatial
  nolan@nolan-desktop:~/src/neo4j/neo4j-spatial$ mvn install
  [INFO] Scanning for projects...
  [INFO]
  -
 
  [INFO] Building Neo4j Spatial Components
  [INFO]task-segment: [install]
  [INFO]
  -
 
  [WARNING] POM for 'org.rrd4j:rrd4j:pom:2.0.6:provided' is invalid.
 
  Its dependencies (if any) will NOT be available to the current build.
  [INFO] [enforcer:enforce {execution: enforce-maven}]
  [INFO] [license:check {execution: check-licenses}]
  [INFO] Checking licenses...
  [INFO] [dependency:unpack-dependencies {execution: get-test-data}]
  [INFO] Unpacking
 
 /home/nolan/.m2/repository/org/neo4j/spatial/osm

Re: [Neo4j] MMap Error on importing large data

2011-02-27 Thread Craig Taverner
Funny you should suggest this. 10 minutes ago I started a new test run that
does exactly that. If any test case takes more than 5min to import, I run 3
gc() with 1s sleeps in between. OK, so not your 5s sleep, but let's see if
it helps.

I also reduced Xmx to 1600M, and reduced the mmap settings by a similar
amount, hopefully generally reducing the app's memory consumption somewhat.
Let's see what happens this time.
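For reference, the GC nudge described above can be sketched like this (a plain-JDK illustration only, not the actual test code; the class and variable names are mine):

```java
import java.util.ArrayList;
import java.util.List;

// Drop references to the large object first, then run a few explicit
// GC passes with one-second sleeps in between, as suggested below.
public class GcNudge {
    public static void main(String[] args) throws InterruptedException {
        List<byte[]> chunks = new ArrayList<byte[]>();
        chunks.add(new byte[16 * 1024 * 1024]); // stand-in for the inserter's state
        chunks = null;                          // null the reference so it can be collected

        for (int i = 0; i < 3; i++) {           // three passes, 1s apart
            System.gc();
            Thread.sleep(1000);
        }

        Runtime rt = Runtime.getRuntime();
        long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
        System.out.println("heap in use after GC: " + usedMb + "MB");
    }
}
```

Note that System.gc() is only a hint to the JVM; the sleeps give the collector a chance to actually run between passes.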

On Sun, Feb 27, 2011 at 6:45 PM, Michael Hunger 
michael.hun...@neotechnology.com wrote:

 you can try to null the batch inserter and all its external deps that you
 control, and add several System.gc() calls with some Thread.sleep(5000) in
 between; that should free the heap
 you can also output the runtime's free memory, or even better have a jconsole
 running concurrently to see heap allocation history (it might even show mmap)

 Michael

 Sent from my iBrick4


 Am 27.02.2011 um 18:33 schrieb Craig Taverner cr...@amanzi.com:

 
  What about the IOException 'Operation not permitted'?
  Can you check the access rights on your store?
 
 
  They look fine (644 and 755). Also, it would seem strange for the access
  rights to change in the middle of a run. The database is being written to
  continuously for  about 5 hours successfully before this error. I also
 note
  that I have 20GB free space, so running out of disk space seems unlikely.
  Having said that, I will do another run with a parallel check for disk
 space
  also.
 
  While googling I saw that you had a similar problem in November, that
 Johan
  answered.
  From the answer it seems that the kernel adapts its memory usage and
  segmentation from the store size.
  So as the store size before the import was zero, probably some of the
  adjustments that normally
  take place for such a large store won't be done.
 
 
  I create both the batch inserter and the graph database service with a
 set
  configuration, as in the top of the file at
 
 https://github.com/neo4j/neo4j-spatial/blob/master/src/test/java/org/neo4j/gis/spatial/Neo4jTestCase.java
 
  So your suggestion to run the batch insert in a first VM run and the API
  work in a second one makes a lot of
  sense to me, because the kernel is then able to optimize memory usage at
  startup (if you didn't supply a config file).
 
 
  I will try that tomorrow perhaps. I would need to extract the test code
 to a
  place I can use from a console app first. But I noticed also that Mattias
  thought that two JVMs would not help.
 
  Regarding the test-issue. I would really love to have this code elsewhere
  and just used in the tests, then it could be used
  by other people too, and that would perhaps also make it easier to reproduce
 your
  problem just with the data file.
 
 
  I can do that. I'm short of time right now, but will see if I can get to
  that soon. Should be relatively simple to extract to the OSMDataset, so
  other users can call it. Basically the code traverses both the GIS
 (layers)
  views of the OSM data model, and the OSM view (ways, nodes, changesets,
  users) and produces some statistics on what is found. Could be generally
  interesting. The one messy part is the code also makes a number of
  assertions for expected patterns, and this only makes sense in the JUnit
  test. I would need to save the stats to a map, return that to the junit
 code
  so it can make the assertions later.
 
  Can you point me to the data file used and attach the test case that you
  probably modified locally? Then I'd try this at my machine.
 
 
  I've just pushed the code to github. The test class is the TestOSMImport.
  Currently it skips a test if the test data is missing, and there is only
  data for two specific test cases in the code base (Billesholm and Malmö).
 To
  get it to run the big tests, simply download denmark.osm and/or
 croatia.osm
  from downloads.cloudmade.com. At the moment croatia.osm imports fine, at
  reasonable performance, but denmark.osm is the one giving the problems.
 
  Looks like the memory mapped buffer configuration needs to be tweaked.
 
 
  From Johans previous answer, combined with something I read on the wiki,
 it
  seems that the batch inserter needs different mmap settings than the
 normal
  API. I read that the batch inserter uses the heap for its mmap, while the
  normal API does not. If I understand correctly, this means that when
 using
  the batch inserter, we have to use smaller mmap, otherwise we might fill
 the
  heap too soon?
 
  In any case, it seems like keeping mmap settings relatively small should
  avoid this problem, although it might not lead to the best performance? Have I
  understood correctly?
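As an illustration of the kind of settings involved (the keys below are the Neo4j 1.x-era mmap configuration keys; the values are invented for this sketch and are not Craig's actual numbers):

```java
import java.util.HashMap;
import java.util.Map;

// A hedged sketch of a "relatively small" mmap configuration map of the kind
// the test case passes to both the batch inserter and the embedded database.
// Keeping these small relative to -Xmx matters when the batch inserter
// allocates its windows on the heap.
public class MmapConfigSketch {
    static final Map<String, String> NORMAL_CONFIG = new HashMap<String, String>();
    static {
        NORMAL_CONFIG.put("neostore.nodestore.db.mapped_memory", "50M");
        NORMAL_CONFIG.put("neostore.relationshipstore.db.mapped_memory", "120M");
        NORMAL_CONFIG.put("neostore.propertystore.db.mapped_memory", "150M");
        NORMAL_CONFIG.put("neostore.propertystore.db.strings.mapped_memory", "100M");
        NORMAL_CONFIG.put("neostore.propertystore.db.arrays.mapped_memory", "0M");
    }

    public static void main(String[] args) {
        // The map would then be handed to the store on startup,
        // e.g. new BatchInserterImpl(storeDir, NORMAL_CONFIG) in Neo4j 1.x.
        System.out.println(NORMAL_CONFIG.size() + " mmap settings configured");
    }
}
```

The point is simply that the sum of these mapped windows plus the rest of the working set has to fit under the heap limit when the buffers are heap-backed.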
 
  On Windows heap buffers are used by default and auto configuration
  will look how much heap is available. Getting out of memory exceptions
  is an indication that the configuration passed in is using more memory
  than available heap.
 
 
  I am currently using -Xmx2048m on a 4GB RAM machine, 32-bit java, and the
  settings:
 
 static {
 NORMAL_CONFIG.put

[Neo4j] MMap Error on importing large data

2011-02-26 Thread Craig Taverner
Hi,

I was importing a reasonably large OSM dataset into Neo4j Spatial, and this
involves a batch inserter which imports everything, followed by switching to
a normal embedded graph database for adding nodes to the RTree index, which
is an in-graph tree structure. The batch inserter phase worked fine, but
sometime into the RTree index (normal graph API tree creation), the process
terminated and I got the error message in the console:

mmap failed for CEN and END part of zip file


Since this was running in JUnit, I also got a stack trace in junit console,
which I have included below, but the key elements are that it occurred on a
line in my code that extracts a double[] property from a node:

double[] bbox = (double[]) geomNode.getProperty("bbox");


This was certainly not the first time that method was called, and in fact
this code has been stable for many months, so I think something deeper
inside is going wrong. The stack trace goes further to enter logging, so it
seems like it was trying to print a warning, but the logging seems to be
trying to load a jar file, which gave a ZipException.

I do not know which is more relevant, the 'mmap' error in the console, or
the logWarn/ZipFile error in the stack trace. Has anyone seen something
like this, or have any ideas how I can trace this further? It took nearly 5
hours to run to this point, so it is not easy to duplicate. Could this in
any way be due to issues with memory, or the heap-versus-memory-mapping
question? Considering that I switch between the batch inserter and the
normal API in the same Java runtime with the same settings, are there things
that I should be taking into account here?

Regards, Craig

java.lang.InternalError
at sun.misc.URLClassPath$JarLoader.getResource(Unknown Source)
at sun.misc.URLClassPath.getResource(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at sun.misc.Launcher$ExtClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.util.ResourceBundle$RBClassLoader.loadClass(Unknown Source)
at java.util.ResourceBundle$Control.newBundle(Unknown Source)
at java.util.ResourceBundle.loadBundle(Unknown Source)
at java.util.ResourceBundle.findBundle(Unknown Source)
at java.util.ResourceBundle.findBundle(Unknown Source)
at java.util.ResourceBundle.getBundleImpl(Unknown Source)
at java.util.ResourceBundle.getBundle(Unknown Source)
at java.util.logging.Level.getLocalizedName(Unknown Source)
at java.util.logging.SimpleFormatter.format(Unknown Source)
at java.util.logging.StreamHandler.publish(Unknown Source)
at java.util.logging.ConsoleHandler.publish(Unknown Source)
at java.util.logging.Logger.log(Unknown Source)
at java.util.logging.Logger.doLog(Unknown Source)
at java.util.logging.Logger.log(Unknown Source)
at java.util.logging.Logger.warning(Unknown Source)
at org.neo4j.kernel.impl.nioneo.store.PersistenceWindowPool.logWarn
(PersistenceWindowPool.java:605)
at
org.neo4j.kernel.impl.nioneo.store.PersistenceWindowPool.refreshBricks(PersistenceWindowPool.java:505)
at
org.neo4j.kernel.impl.nioneo.store.PersistenceWindowPool.acquire(PersistenceWindowPool.java:123)
at
org.neo4j.kernel.impl.nioneo.store.CommonAbstractStore.acquireWindow(CommonAbstractStore.java:490)
at
org.neo4j.kernel.impl.nioneo.store.AbstractDynamicStore.getLightRecords(AbstractDynamicStore.java:397)
at
org.neo4j.kernel.impl.nioneo.store.PropertyStore.getRecord(PropertyStore.java:343)
at
org.neo4j.kernel.impl.nioneo.xa.WriteTransaction.propertyGetValue(WriteTransaction.java:1147)
at
org.neo4j.kernel.impl.nioneo.xa.NioNeoDbPersistenceSource$NioNeoDbResourceConnection.loadPropertyValue(NioNeoDbPersistenceSource.java:407)
at
org.neo4j.kernel.impl.persistence.PersistenceManager.loadPropertyValue(PersistenceManager.java:79)
at
org.neo4j.kernel.impl.core.NodeManager.loadPropertyValue(NodeManager.java:572)
at org.neo4j.kernel.impl.core.Primitive.getPropertyValue(Primitive.java:538)
at org.neo4j.kernel.impl.core.Primitive.getProperty(Primitive.java:158)
at org.neo4j.kernel.impl.core.NodeProxy.getProperty(NodeProxy.java:134)
at
org.neo4j.gis.spatial.osm.OSMGeometryEncoder.decodeEnvelope(OSMGeometryEncoder.java:115)
at org.neo4j.gis.spatial.RTreeIndex.getEnvelope(RTreeIndex.java:243)
at org.neo4j.gis.spatial.RTreeIndex.chooseSubTree(RTreeIndex.java:349)
at org.neo4j.gis.spatial.RTreeIndex.add(RTreeIndex.java:76)
at org.neo4j.gis.spatial.osm.OSMLayer.addWay(OSMLayer.java:102)
at org.neo4j.gis.spatial.osm.OSMImporter.reIndex(OSMImporter.java:212)
at org.neo4j.gis.spatial.osm.OSMImporter.reIndex(OSMImporter.java:185)
at
org.neo4j.gis.spatial.TestOSMImport.loadTestOsmData(TestOSMImport.java:99)
at 

Re: [Neo4j] MMap Error on importing large data

2011-02-26 Thread Craig Taverner
Hi,

I ran the test again with parallel logging of lsof once a minute and ps once
every ten minutes. The open file descriptors oscillated around 200, peaking
at 218, and at 211 just before the crash, so this seems OK. The memory was
around 2.3GB for the entire batch insertion, but went up to nearly 2.8
shortly before the crash (during the normal graph database service phase). I
was only checking memory every 10 minutes on a several hour run, so I do not
have details from immediately before the crash. I wonder if the batch
inserter is not freeing memory, and the normal graph API needs more memory
than it can get. Perhaps it is not possible (or sensible) to run them both
in the same JVM in series like I do?
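The sampling described here, compressed into a self-contained illustration (plain JDK; the real run used lsof and ps externally, and a ten-minute interval rather than the short one used here so the sketch finishes quickly):

```java
// Log JVM heap use periodically alongside a long-running import, to catch
// memory growth like the 2.3GB -> 2.8GB climb described above.
public class HeapSampler {
    public static void main(String[] args) throws InterruptedException {
        Runtime rt = Runtime.getRuntime();
        for (int i = 0; i < 3; i++) {           // three quick samples for the sketch
            long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
            long maxMb = rt.maxMemory() / (1024 * 1024);
            System.out.println("sample " + i + ": used=" + usedMb
                    + "MB of max=" + maxMb + "MB");
            Thread.sleep(100);                  // real run: minutes between samples
        }
    }
}
```

Sampling in-process like this captures only heap, not mmap'ed or direct buffers, which is why an external ps (resident set size) alongside it is still useful.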

I redirected the console to a file and this time I got two stack traces, one
from JUnit and one in the console file:

   - The junit stack trace is similar to before, at a different point in my
   code (a getSingleRelationship), but at the same point in the neo4j code on a
   logWarn call at line 605 of PersistenceWindowPool.java. It seems from
   Michael's previous mail, and the memory trace I ran, that this is very much
   related to running out of memory.
   - The console stack trace starts with the error MappedMemException:
   Unable to map pos=10086537 recordSize=9 totalSize=52443
   from MappedPersistenceWindow.java:60. This traces back to the same
   getSingleRelationship as the other stack trace. The underlying exception is
   an IOException 'Operation not permitted' from MappedPersistenceWindow.java:52.

I have attached both stack traces to this file.

To answer Michael's comments about JUnit, the reason I have been running
this in JUnit, is that my junit test case has a lot of code that analyses
the final graph structure and creates statistics about the graph that are
useful for verifying that the OSM import worked and conformed to expected
patterns. I use this as part of the standard test suite on small OSM files.
The occasional use of this on large OSM files is convenient for me, since I
can just uncomment some test case and get very useful statistics from the
post-import test code. I could move it to a separate console app, but then I
would want to move all the post-import test and verification code to a
common place, while really it is primarily for the test cases.

Thanks for all the advice so far.

Regards, Craig

On Sat, Feb 26, 2011 at 3:08 PM, Mattias Persson
matt...@neotechnology.com wrote:

 It may be that too many files are open... there has been some previous
 mail about batch insertion (referring to lucene index insertion)
 keeping files open. Could you do an:

lsof -n | grep name-of-your-store-dir | wc

 and see if that returns a high number, > 1000 or something?

 2011/2/26 Michael Hunger michael.hun...@neotechnology.com:
  The error occurs here:
 catch ( OutOfMemoryError e )
 {
 e.printStackTrace();
 ooe++;
 logWarn( "Unable to allocate direct buffer" );
 }
 
  And I assume that's why the jdk classloader can't load the stuff needed
 for localizing  Level.WARN.
  And that results in the error.
 
  So what would be much more helpful is the stacktrace from
 e.printStackTrace() from your console.
 
  The one from the zipfile is probably rather from the JDK reading from one
 of its JARs.
 
 So if that error occurred while creating your RTree, you should be able
 to reproduce it as the store is still intact after the initial import.
 
  You can also zip the store and your code and share it via dropbox.
 
 What are your heap and mmap settings, and which version of jdk, neo4j
 etc. you were running?
 
  We can try to run it on one of our test servers then it should be faster
 and not bother someone's box for so long.
 
  Cheers
 
  Michael
 
 P.S.: What bothers me is that you run imports via a JUnit test; isn't
 there OSMImport.main() exactly for that purpose?
 
 
  Am 26.02.2011 um 12:38 schrieb Craig Taverner:
 
  Hi,
 
  I was importing a reasonably large OSM dataset into Neo4j Spatial, and
 this
  involves a batch inserter which imports everything, followed by
 switching to
  a normal embedded graph database for adding nodes to the RTree index,
 which
  is an in-graph tree structure. The batch inserter phase worked fine, but
  sometime into the RTree index (normal graph API tree creation), the
 process
  terminated and I got the error message in the console:
 
  mmap failed for CEN and END part of zip file
 
 
  Since this was running in JUnit, I also got a stack trace in junit
 console,
  which I have included below, but the key elements are that it occurred
 on a
  line in my code that extracts a double[] property from a node:
 
  double[] bbox = (double[]) geomNode.getProperty("bbox");
 
 
  This was certainly not the first time that method was called, and in
 fact
  this code has been stable for many months, so I think something deeper
  inside is going wrong. The stack trace goes further

Re: [Neo4j] Can anyone compile the latest Neo4J Spatial?

2011-02-26 Thread Craig Taverner
It is working for me too.

One thing that is interesting about the error message is that it says it
looks like another instance is running in the *same JVM*. Is that the usual
error message? (Complete text was "this is usually caused by another Neo4j
kernel already running in this JVM for this particular store".)

The error is occurring at the very start of the very first test case in the
TestSpatial class, so cannot be due to another test in that class.

Still, I would take Peter's advice: check that no other java test processes are
running, manually delete the database to be sure, and then try again.

On Sat, Feb 26, 2011 at 9:30 PM, Peter Neubauer 
peter.neuba...@neotechnology.com wrote:

 Nolan,
 I am running GIT fresh tests without problems. Are you having some old
 Java process running? Seems Neo4j refuses to start because it can't
 lock the files ...

 Cheers,

 /peter neubauer

 GTalk:  neubauer.peter
 Skype   peter.neubauer
 Phone   +46 704 106975
 LinkedIn   http://www.linkedin.com/in/neubauer
 Twitter  http://twitter.com/peterneubauer

 http://www.neo4j.org   - Your high performance graph database.
 http://startupbootcamp.org/- Öresund - Innovation happens HERE.
 http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.



 On Sat, Feb 26, 2011 at 8:26 PM, Nolan Darilek no...@thewordnerd.info
 wrote:
  -BEGIN PGP SIGNED MESSAGE-
  Hash: SHA1
 
  Wondering if something didn't get committed somewhere? I tried both with
  an update as well as with a fresh checkout and am getting a ton of test
  errors, not failures.
 
  Here's a sample Surefire output. This was gotten from a fresh checkout
  running mvn install:
 
  -------------------------------------------------------------------------------
  Test set: org.neo4j.gis.spatial.TestSpatial
  -------------------------------------------------------------------------------
  Tests run: 10, Failures: 0, Errors: 10, Skipped: 0, Time elapsed: 1.558
  sec <<< FAILURE!
  Test Import of billesholm.osm(org.neo4j.gis.spatial.TestSpatial$1)  Time
  elapsed: 1.513 sec <<< ERROR!
  java.lang.AbstractMethodError
 at
 
 org.neo4j.kernel.KernelExtension$KernelData.loadAll(KernelExtension.java:178)
 at
  org.neo4j.kernel.EmbeddedGraphDbImpl$2.load(EmbeddedGraphDbImpl.java:164)
 at
  org.neo4j.kernel.EmbeddedGraphDbImpl.<init>(EmbeddedGraphDbImpl.java:169)
 at
 
  org.neo4j.kernel.EmbeddedGraphDatabase.<init>(EmbeddedGraphDatabase.java:80)
 at
 
 org.neo4j.gis.spatial.Neo4jTestCase.reActivateDatabase(Neo4jTestCase.java:123)
 at
 org.neo4j.gis.spatial.Neo4jTestCase.setUp(Neo4jTestCase.java:88)
 at org.neo4j.gis.spatial.TestSpatial.setUp(TestSpatial.java:255)
 at junit.framework.TestCase.runBare(TestCase.java:132)
 at junit.framework.TestResult$1.protect(TestResult.java:110)
 at junit.framework.TestResult.runProtected(TestResult.java:128)
 at junit.framework.TestResult.run(TestResult.java:113)
 at junit.framework.TestCase.run(TestCase.java:124)
 at junit.framework.TestSuite.runTest(TestSuite.java:243)
 at junit.framework.TestSuite.run(TestSuite.java:238)
 at
 
 org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
 at
 
 org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:59)
 at
 
 org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.executeTestSet(AbstractDirectoryTestSuite.java:120)
 at
 
 org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.execute(AbstractDirectoryTestSuite.java:103)
 at org.apache.maven.surefire.Surefire.run(Surefire.java:169)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:616)
 at
 
 org.apache.maven.surefire.booter.SurefireBooter.runSuitesInProcess(SurefireBooter.java:350)
 at
 
 org.apache.maven.surefire.booter.SurefireBooter.main(SurefireBooter.java:1021)
 
  Test Spatial Index on
  billesholm.osm(org.neo4j.gis.spatial.TestSpatial$2)  Time elapsed: 0.003
  sec <<< ERROR!
  org.neo4j.graphdb.TransactionFailureException: Could not create data
  source [nioneodb], see nested exception for cause of error
 at
 
 org.neo4j.kernel.impl.transaction.TxModule.registerDataSource(TxModule.java:153)
 at
 org.neo4j.kernel.GraphDbInstance.start(GraphDbInstance.java:106)
 at
  org.neo4j.kernel.EmbeddedGraphDbImpl.<init>(EmbeddedGraphDbImpl.java:167)
 at
 
  org.neo4j.kernel.EmbeddedGraphDatabase.<init>(EmbeddedGraphDatabase.java:80)
 at
 
 org.neo4j.gis.spatial.Neo4jTestCase.reActivateDatabase(Neo4jTestCase.java:123)
 at
 
