Re: [Neo4j] Question about REST interface concurrency
Do your disk benchmark tests flush the data to disk, or do they just write it and leave the file system / OS to flush whenever it feels like it (which makes them much faster, of course)?

2011/4/25 Stephen Roos sr...@careerarcgroup.com:

Hi Jim,

I took a look at my disk utilization and I'm only getting up to about 9379 KBps (write). My disk benchmarking tests show max write rates of around 220 MBps, so I shouldn't be maxed out there.

Interestingly, I don't see that much data in the graph.db directory (about 15 MB after creating 150k empty nodes, no relationships, no index). The largest file is nioneo_logical.log.1 (14 MB), the next largest is neostore.nodestore.db (1.3 MB). I don't know if that information is helpful, but I thought it was a bit strange that I'm sustaining disk write rates of 9 MBps for over 40 secs, yet I don't have anywhere close to 9 * 40 MB of data.

I do wonder about the flush operation though. Flush is a blocking operation, so maybe that's the bottleneck even though the disk isn't over-utilized. I'll look into that. Let me know if you have any other ideas.

Thanks!
Stephen

-----Original Message-----
From: Jim Webber [mailto:j...@neotechnology.com]
Sent: Friday, April 22, 2011 3:34 AM
To: Neo4j user discussions
Subject: Re: [Neo4j] Question about REST interface concurrency

Hi Stephen,

I think the network IO you've measured is consistent with the rest of the behaviour you've described. What I'm thinking is that you're simply reaching the limits of: create transaction, create a node, complete transaction, flush to filesystem (that is, you're basically testing disk write speed/seek time/etc). Can you check how busy your IO to disk is? I expect it'll be relatively high.

Jim
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user

--
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
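To make the flush question concrete, here is a minimal sketch (plain Java NIO, not Neo4j code) that times the same small writes with and without forcing each one to disk. The class name FlushBenchmark, the 64-byte record size and the record count are assumptions chosen for illustration; a per-record force only loosely approximates a "one transaction per node" workload.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Neo4j: compares buffered and forced writes.
final class FlushBenchmark {

    /**
     * Writes `records` 64-byte records to `file` (truncating it first).
     * When `sync` is true, every record is forced to the storage device
     * before the next one is written. Returns the elapsed time in nanoseconds.
     */
    static long timeWrites(Path file, int records, boolean sync) throws IOException {
        ByteBuffer record = ByteBuffer.allocate(64);
        long start = System.nanoTime();
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            for (int i = 0; i < records; i++) {
                record.clear(); // reset position so the full buffer is written again
                while (record.hasRemaining()) {
                    channel.write(record);
                }
                if (sync) {
                    channel.force(false); // blocks until the file content is on the device
                }
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("flush-benchmark", ".bin");
        try {
            long buffered = timeWrites(file, 1_000, false);
            long forced = timeWrites(file, 1_000, true);
            System.out.printf("buffered: %d ms, forced: %d ms%n",
                    buffered / 1_000_000, forced / 1_000_000);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

If the forced run is much slower while the byte count is identical, the limit is the number of flushes rather than raw throughput, which would be consistent with a low MBps figure on a disk that benchmarks far higher.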
Re: [Neo4j] New blog post on non-graph stores for graph-y things
On foreign key I think it was a subconscious choice to avoid it, since it has very strong semantics in other data models. I wanted to try to convey the concept of pointers without muddying that with the stricter semantics of foreign keys and referential integrity.

Perhaps I'm over-optimistic, but I would like to find some common terminology we could use when describing the differences between different database types. It really helps people understand a new database if you can compare and contrast the finer details and subtle differences. I have found using the term 'foreign key' effective precisely because it brings to mind the RDBMS approach, and helps the user see a mapping between modeling in an RDBMS and modeling in graphs. But I agree that 'foreign key' brings other aspects that may not be appropriate, and so a more general term would be better. You say 'pointer', but that to my mind is an aspect of both a foreign key and a relationship/edge. Perhaps there is no single magic word, and we have to pick and choose to suit the circumstances ;-)
Re: [Neo4j] REST results pagination
Here at 49 responses, I'd like to reiterate Craig's earlier point that we are really talking about several separate issues, and I'm wondering if we should split this discussion up, because it is getting very hard to follow.

As I see it, we are looking at three things:

* Paging
  Use case: UI code that presents a paged, infinite-scrolled or similar interface to the user. Peeking at results for debugging or other purposes.
* Streaming
  Use case: Returning huge data sets without killing anyone.
* Sorting
  Use case: Presenting lists of things to users, and applications that care about the order of results for some other reason.

I think we've come to agree that streaming and paging serve similar but different purposes and are not quite able to replace each other's functionality. I also took the liberty of elevating sorting to its own topic, because I believe it should be a generic thing you can do on a result set, whereas paging and streaming are different means of returning a result set.

If we want to continue this discussion, would anyone object to splitting it into these three parts?

/Jake

On Sun, Apr 24, 2011 at 2:18 AM, Rick Bullotta rick.bullo...@thingworx.com wrote:

Let's discuss sometime soon. Creating resources that need to be cached or saved in session state brings with it a whole bunch of negative aspects...

- Reply message -
From: Michael Hunger michael.hun...@neotechnology.com
Date: Fri, Apr 22, 2011 10:57 pm
Subject: [Neo4j] REST results pagination
To: Neo4j user discussions user@lists.neo4j.org

I spent some time looking at what others are doing for inspiration. I kind of like the Riak/Basho approach with multipart chunks, and the approach of explicitly creating a resource for the query that can be navigated (either via pages or first, next, [prev, last] links) and expires (and could be reconstructed).
Cheers
Michael

Good discussion: http://stackoverflow.com/questions/924472/paging-in-a-rest-collection

CouchDB: http://wiki.apache.org/couchdb/HTTP_Document_API
  startKey + limit, endKey + limit, sorting, insert/update order

Mongoose: [cursor-id]+batch_size

OrientDB: .../[limit]

Sones: no real REST API, but a SQL on top of the graph: http://developers.sones.de/documentation/graph-query-language/select/
  with limit, offset, but also depth (for graph)

HBase: explicitly creates scanners, which can then be accessed with next operations, and which expire after no activity for a certain timeout

riak: http://wiki.basho.com/REST-API.html
  client-id header for client identification - sticky?
  optional query parameters for including properties, and whether to stream the data: keys=[true,false,stream]

  If "keys=stream", the response will be transferred using chunked-encoding, where each chunk is a JSON object. The first chunk will contain the "props" entry (if props was not set to false). Subsequent chunks will contain individual JSON objects with the "keys" entry containing a sublist of the total keyset (some sublists may be empty).

  riak seems to support partial JSON, non-closed elements: -d '{"props":{"n_val":5'

  returns multiple responses in one go:

    Content-Type: multipart/mixed; boundary=YinLMzyUR9feB17okMytgKsylvh

    --YinLMzyUR9feB17okMytgKsylvh
    Content-Type: application/x-www-form-urlencoded
    Link: </riak/test>; rel="up"
    Etag: 16vic4eU9ny46o4KPiDz1f
    Last-Modified: Wed, 10 Mar 2010 18:01:06 GMT

    {"bar":"baz"}
    (this block can be repeated n times)
    --YinLMzyUR9feB17okMytgKsylvh--
    * Connection #0 to host 127.0.0.1 left intact
    * Closing connection #0

  Query results:
  Content-Type: always multipart/mixed, with a boundary specified

  Understanding the response body:
  The response body will always be multipart/mixed, with each chunk representing a single phase of the link-walking query. Each phase will also be encoded in multipart/mixed, with each chunk representing a single object that was found.
  If no objects were found or "keep" was not set on the phase, no chunks will be present in that phase. Objects inside phase results will include Location headers that can be used to determine bucket and key. In fact, you can treat each object-chunk similarly to a complete response from read object, without the status code.

    HTTP/1.1 200 OK
    Server: MochiWeb/1.1 WebMachine/1.6 (eat around the stinger)
    Expires: Wed, 10 Mar 2010 20:24:49 GMT
    Date: Wed, 10 Mar 2010 20:14:49 GMT
    Content-Type: multipart/mixed; boundary=JZi8W8pB0Z3nO3odw11GUB4LQCN
    Content-Length: 970

    --JZi8W8pB0Z3nO3odw11GUB4LQCN
    Content-Type: multipart/mixed; boundary=OjZ8Km9J5vbsmxtcn1p48J91cJP

    --OjZ8Km9J5vbsmxtcn1p48J91cJP
    Content-Type: application/json
    Etag: 3pvmY35coyWPxh8mh4uBQC
    Last-Modified: Wed, 10 Mar 2010 20:14:13 GMT

    {"riak":"CAP"}
    --OjZ8Km9J5vbsmxtcn1p48J91cJP--

    --JZi8W8pB0Z3nO3odw11GUB4LQCN
    Content-Type: multipart/mixed; boundary=RJKFlAs9PrdBNfd74HANycvbA8C

    --RJKFlAs9PrdBNfd74HANycvbA8C
    Location: /riak/test/doc2
    Content-Type: application/json
    Etag:
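As a rough illustration of the paging versus streaming distinction discussed in this thread, the sketch below shows both as plain Java methods over an in-memory result. It is not the Neo4j REST API; the class ResultDelivery and its method names are hypothetical, and a real server would read from a query cursor rather than a list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only: two ways of handing a result set to a client.
final class ResultDelivery {

    /** Paging: return one bounded slice; the client asks again with a new offset. */
    static <T> List<T> page(List<T> results, int offset, int limit) {
        if (offset < 0 || limit < 0) {
            throw new IllegalArgumentException("offset and limit must be non-negative");
        }
        int from = Math.min(offset, results.size());
        int to = (int) Math.min((long) from + limit, results.size());
        return new ArrayList<>(results.subList(from, to));
    }

    /**
     * Streaming: push the whole result to the sink in fixed-size chunks,
     * holding at most one chunk in memory. Returns the number of chunks sent.
     */
    static <T> int stream(Iterator<T> results, int chunkSize, Consumer<List<T>> sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunks = 0;
        List<T> chunk = new ArrayList<>(chunkSize);
        while (results.hasNext()) {
            chunk.add(results.next());
            if (chunk.size() == chunkSize) {
                sink.accept(chunk);
                chunk = new ArrayList<>(chunkSize);
                chunks++;
            }
        }
        if (!chunk.isEmpty()) { // final, possibly shorter chunk
            sink.accept(chunk);
            chunks++;
        }
        return chunks;
    }
}
```

The paging method needs either a stable, re-runnable query or server-side state between calls, which is the caching concern Rick raises; the streaming method needs neither but gives the client no random access.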
Re: [Neo4j] Regarding Sub grouping In Graph DB
2011/4/25 Chris Gioran chris.gio...@neotechnology.com:

If I understand you correctly, you want to create groups of the nodes and relationships that exist in your graph. The way you do that depends on a few things.

1. What degree of logical separation do you want? If two nodes, A and B, have a relationship in common, will they always end up in the same category? If not, should the relationship remain?*

2. How do you plan to access those nodes? If you want to find out the category of nodes and relationships as you iterate over your data, then maybe a tagging property would work. If you want direct access to the nodes that belong to a category, an option is a "supernode" that represents that category and is related to every node that belongs to it.

Creating a tagging property has the advantage of being indexable, and it works on both nodes and relationships. A "supernode" (the quotes are justified since we are talking about a simple node that just receives special semantics, not some inherently different node kind) is a more graphy way of doing things and probably faster too.
Example of a tagging property:

    // graphDb is the GraphDatabaseService and categoryIndex an Index<Node>;
    // both are assumed to exist already, and the code runs inside a transaction.

    // Add
    Node resource = graphDb.createNode();
    resource.setProperty( "category", "some category" );
    categoryIndex.add( resource, "category", resource.getProperty( "category" ) );

    // Get
    for ( Node resource : categoryIndex.get( "category", "some category" ) )
    {
        // Each resource with that category
    }

Example of having the tags as nodes with relationships to their group members:

    // CATEGORY is a RelationshipType, assumed to be defined elsewhere.

    // Add
    // TODO: check if the category you're creating exists first,
    // or create all categories up front
    Node category = categoryIndex.get( "name", "some category" ).getSingle();
    Node resource = graphDb.createNode();
    resource.createRelationshipTo( category, CATEGORY );

    // Get
    Node category = categoryIndex.get( "name", "some category" ).getSingle();
    for ( Relationship rel : category.getRelationships( CATEGORY, Direction.INCOMING ) )
    {
        Node resource = rel.getStartNode();
    }

The latter solution is traversal friendly, whereas the former isn't.

I am sure there are other ways and I would also like to see how people categorize entities in their graphs.

*What currently cannot be done is role-based hard separation of nodes. Suppose there are two nodes, a and b, with a relation a--b. Say a ends up in category A and b in category B, and the relationship stays. If someone with access to A accesses a, then she will be able to access b, no matter what.

cheers,
CG

On Mon, Apr 25, 2011 at 9:53 PM, pooja naik npooj...@yahoo.com wrote:

Hi Chris,

Thanks for a prompt response. I am trying to create a network graph infrastructure for SPs, in such a way that they can lease out parts of this network infrastructure to other small companies. This requirement makes it necessary to create a logical boundary on the main network graph, in other words to create overlay graphs on the underlying physical graph. Would categories still help me here? Is there any other way to do it?
Let me know.

Thanks n Regards
Pooja

From: Chris Gioran chris.gio...@neotechnology.com
To: Neo4j user discussions user@lists.neo4j.org
Sent: Monday, April 25, 2011 11:34 AM
Subject: Re: [Neo4j] Regarding Sub grouping In Graph DB

Hi Pooja,

what would qualify as a subcategory in your use case? The most straightforward thing I can come up with is a property in each node named category or something similar, that has an enumerated value (not a Java Enumeration but something treated as such) that would place the node in its category. This is one of many ways to go ahead, depending on what degrees of separation you want, flexibility etc. Could you elaborate a little on what you are trying to do?

cheers,
CG

On Mon, Apr 25, 2011 at 9:27 PM, pooja naik npooj...@yahoo.com wrote:

Hi all,

I am using Neo4j for an IP network resource graph in my project. I would like to know whether there is a way to divide the graphical network into sub categories in Neo4j. Any help or pointers are appreciated.

Thanks
Pooja

--
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
[Neo4j] Lucene/Neo Indexing Question
Hi, all.

Is there a method or suggested approach for obtaining a list of all of the distinct key values in a given index? I don't care about the indexed nodes or relationships themselves, just the value(s) of the key.

Thanks,
Rick
Re: [Neo4j] REST results pagination
In addition to what Jake just said about splitting this thread apart, I'd like to bring up what Rick suggested about getting together to thrash this out. Can you guys think about when we might want a Skype call for this? We have to take into account timezones to cover from CET through to PST (unless anyone is even further east?).

Jim
[Neo4j] Self-referencing relationships
I know this topic has been discussed before[1] and that a trac issue was also created[2]. I see that a patch was submitted as part of the issue, but from browsing the source code[3] it appears that self-referencing relationships are still a no-go in Neo4j.

Are there any plans to apply the patch submitted by tobias OR, in general, to provide for self-referencing relationships in Neo4j?

We are evaluating Neo4j against OrientDB for an internal tool at Yahoo. While Neo4j looks like a far more mature product overall, the lack of self-referencing relationships might become a sticking point (yes, I understand they can be worked around with a middle node, but that requires the application to handle special cases which we would like to avoid).

References:
[1] http://www.mail-archive.com/user@lists.neo4j.org/msg03996.html
[2] https://trac.neo4j.org/ticket/239
[3] https://github.com/neo4j/community/blob/master/kernel/src/main/java/org/neo4j/kernel/impl/core/RelationshipImpl.java#L45

Thank you,

shaunak kashyap
technical yahoo
shau...@yahoo-inc.com
direct 408-349-4024 mobile 408-203-2450
701 first avenue, sunnyvale, ca, 94089-0703, us
phone (408) 349 3300 fax (408) 349 3301
Re: [Neo4j] Self-referencing relationships
Hi Shaunak,

As you've noticed, self-referencing nodes have been considered before, and I remember being perplexed by their absence when I first became a Neo4j user. Changing the support is simple enough, but there was obviously a conscious design decision. Why?

Anecdotally (and wiser, longer-memoried minds should correct me), self-referencing nodes lead to more trouble than they're worth. They're considered an error because there is more value in being alerted that you just related a node to itself than there is in the few cases where you absolutely must have them. Being a database, the decision has been to err on the side of avoiding problems, even at the cost of some convenience.

An approach that could allow for intentional self-referencing, while still protecting against accidental self-references, would be to implement an explicit Node.relateToSelfAs(RelationshipType type).

What does your model look like, that you expect to require self-referencing?

Best,
Andreas
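Andreas's suggestion of an explicit Node.relateToSelfAs(RelationshipType type) can be sketched with a toy in-memory model. Nothing below is Neo4j code: ToyNode and ToyRelationship are hypothetical stand-ins, relationship types are plain strings, and relateToSelfAs is only the proposal from this thread, not an existing Neo4j method.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, not the Neo4j API): accidental self-loops are
// rejected, intentional ones must go through an explicit method.
final class ToyNode {
    final String name;
    final List<ToyRelationship> outgoing = new ArrayList<>();

    ToyNode(String name) {
        this.name = name;
    }

    /** The ordinary path: relating a node to itself is treated as an error. */
    ToyRelationship createRelationshipTo(ToyNode other, String type) {
        if (other == this) {
            throw new IllegalArgumentException(
                    "start and end node are the same; use relateToSelfAs(type) if that is intended");
        }
        return link(other, type);
    }

    /** The explicit path: the caller states that a loop is wanted. */
    ToyRelationship relateToSelfAs(String type) {
        return link(this, type);
    }

    private ToyRelationship link(ToyNode end, String type) {
        ToyRelationship relationship = new ToyRelationship(this, end, type);
        outgoing.add(relationship);
        return relationship;
    }
}

final class ToyRelationship {
    final ToyNode start;
    final ToyNode end;
    final String type;

    ToyRelationship(ToyNode start, ToyNode end, String type) {
        this.start = start;
        this.end = end;
        this.type = type;
    }

    boolean isLoop() {
        return start == end;
    }
}
```

The design choice is that the error case and the intentional case take different code paths, so a bug that passes the same node twice still fails loudly.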
Re: [Neo4j] Self-referencing relationships
Hi,

Another point of contention: for Blueprints Sail, Neo4j is not fully compliant with the direct mapping between property graph and RDF graph, as it doesn't support self-loops.

https://github.com/tinkerpop/blueprints/wiki/Sail-Ouplementation

Thanks,
Marko.

http://markorodriguez.com
Re: [Neo4j] Self-referencing relationships
Thanks for the explanation, Andreas.

We are modeling the company's infrastructure. Examples of nodes are hosts, switches, consoles, bootboxes, etc. Examples of relationships are uplink, boots, etc. An example of a self-referencing node is a console that also serves as a bootbox for itself.

Again, we could certainly model this with a middle node, but in our case that feels like a hack in what would otherwise have been a very clean graph exactly mirroring our real-world setup.

Shaunak
Re: [Neo4j] [Neo4jSpatial] Find way/region where count of things is greater/equal/less X
Christoph,

you could just update some counter, and index on that, when you are indexing an entity into a particular bounding box or other area of your map? Do you have any definition of what a spot is? Is that a fixed area? In that case, this approach might work. It would mean updating this index when things are moved/inserted into your structure, but still I think that might work out...

Cheers,

/peter neubauer

GTalk: neubauer.peter
Skype: peter.neubauer
Phone: +46 704 106975
LinkedIn: http://www.linkedin.com/in/neubauer
Twitter: http://twitter.com/peterneubauer

http://www.neo4j.org - Your high performance graph database.
http://startupbootcamp.org/ - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.

On Tue, Apr 26, 2011 at 9:45 AM, Christoph K. klaassen.christ...@googlemail.com wrote:

Hey guys,

I'm back on track and trying to do some things with Neo4j Spatial. As the subject of this post says, I'm looking for a way to find a spot on my OSM map where more (or less, or equal) than x cars/people/buildings are located. Is there a way to perform this task without traversing my whole map rectangle by rectangle?

Use case: I'm trying to find bottlenecks dynamically on my OSM map, where a bottleneck is determined by an amount x of cars in a region y.

Thanks for your help =)

Greetings from Munich
Christoph
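The counter idea can be sketched independently of Neo4j Spatial. The example below assumes a "spot" is a fixed square grid cell, which is only one possible definition; the class CellCounter, its cell-key format and its methods are hypothetical. In a real setup the counts would presumably live as properties on nodes representing the areas, kept up to date inside the same transaction that inserts or moves an entity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: per-cell counters maintained on insert/move, so that
// "which spots hold at least x things" does not require scanning the map.
final class CellCounter {
    private final double cellSize;
    private final Map<String, Integer> counts = new HashMap<>();

    CellCounter(double cellSize) {
        if (cellSize <= 0) {
            throw new IllegalArgumentException("cellSize must be positive");
        }
        this.cellSize = cellSize;
    }

    /** Key of the fixed-size grid cell containing the point, e.g. "3:-1". */
    private String cellOf(double x, double y) {
        return (long) Math.floor(x / cellSize) + ":" + (long) Math.floor(y / cellSize);
    }

    void insert(double x, double y) {
        counts.merge(cellOf(x, y), 1, Integer::sum);
    }

    void remove(double x, double y) {
        // Drop the entry entirely once the last entity leaves the cell.
        counts.computeIfPresent(cellOf(x, y), (cell, n) -> n > 1 ? n - 1 : null);
    }

    void move(double fromX, double fromY, double toX, double toY) {
        remove(fromX, fromY);
        insert(toX, toY);
    }

    int count(double x, double y) {
        return counts.getOrDefault(cellOf(x, y), 0);
    }

    /** Cells currently holding at least `threshold` entities, sorted by key. */
    List<String> cellsWithAtLeast(int threshold) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= threshold) {
                result.add(entry.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }
}
```

Finding cells with fewer than x entities works the same way, except that empty cells have no entry and would need to be enumerated separately.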
Re: [Neo4j] Self-referencing relationships
Hi Shaunak,

Interesting domain. So I guess you'd like a relationship which goes:

    +--console--BOOTBOX_FOR--+
    |                        |
    +------------------------+

Which I find is an interesting model for your domain. It will match the physical infrastructure well, but even in your email you used the noun "a bootbox", which to me highlights a key domain entity. And such domain entities deserve their own node.

Though this sketch will be invariably wrong since I don't know your domain:

    my console--BOOTED_BY--my bootbox
    my machine--HAS_A--console
    my machine--IS_A--bootbox

I think you don't often want a node in the middle, but you could be missing a more subtle composite.

Jim
Re: [Neo4j] Wiki documentation neo4j+restfulie.
Hi Jim,

We (Adriano and Jose) wrote the Restfulie example and it is already published on the wiki. Thanks for the attention. It is the basic doc, just like the rest-client one.

Thanks

2011/4/19 Jim Webber j...@neotechnology.com

Hi José,

Please feel free to add to the wiki. We've had a problem with spammers recently, so if you run into permissions problems please shout.

Jim

On 22 Mar 2011, at 20:19, jdbjun...@gmail.com wrote:

Hi,

going through the Neo4j documentation I found some examples of how to access the Neo4j API using two REST libraries (rest-client, neography). After reading it, I've decided to do the same tests using the library restfulie, of which I'm a committer.

Am I allowed to change the wiki, adding the restfulie example? If that is OK, is anyone willing to review it before I change the wiki?

Thanks,
José Donizetti.

--
Adriano Almeida
Caelum | Ensino e Inovação
www.caelum.com.br
Re: [Neo4j] Self-referencing relationships
Where I've wanted self-referencing nodes is when mapping a sequence of user actions:

    user clicks button A - [then clicks] - button B - [then clicks] - button B (again) - [then clicks] - button C - [then ...

Run this over a few tens of thousands of users (incrementing counts) and a few dozen buttons, and you can start to find things like:

    "This sequence of button presses is the most common path for users through our product."
    "If a user is on button B, it is pretty likely they'll click it again."
    "The group of users who are based in California tend to click buttons on the left side of the screen, while the ones in Massachusetts tend to click buttons on the right side of the screen."

There may have been other ways to do this modeling, but the way I implemented it really needed self-referencing relationships to count all the [then clicks] relationships, not just the ones that move to new buttons.

--
Rick Otten
rot...@windfish.net
O=='=+
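Rick's counting scheme amounts to a table of transition counts in which "B then B" is just another entry. A minimal sketch in plain Java (not Neo4j; the class ClickTransitions and its methods are hypothetical) shows why the self-transition matters for the "likely to click it again" finding:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: counts [then clicks] transitions between buttons,
// including transitions from a button to itself.
final class ClickTransitions {
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    /** Records every consecutive pair in one user's click sequence. */
    void record(List<String> clicks) {
        for (int i = 0; i + 1 < clicks.size(); i++) {
            counts.computeIfAbsent(clicks.get(i), from -> new HashMap<>())
                  .merge(clicks.get(i + 1), 1, Integer::sum);
        }
    }

    int count(String from, String to) {
        return counts.getOrDefault(from, Collections.<String, Integer>emptyMap())
                     .getOrDefault(to, 0);
    }

    /** The most frequent successor of `from`, or null if none was recorded. */
    String mostLikelyNext(String from) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry
                : counts.getOrDefault(from, Collections.<String, Integer>emptyMap()).entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }
}
```

In a graph model each inner map entry would correspond to one counted relationship between two button nodes, and the entry where both keys are equal is exactly the self-referencing relationship being asked for.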
Re: [Neo4j] Self-referencing relationships
On 26.04.2011 18:20, Andreas Kollegger wrote:

What does your model look like, that you expect to require self-referencing?

Oh, I think a loop is just the smallest circle within a graph. So as Neo4j supports graphs instead of just hierarchical trees, it would be a natural fit ;)

Perhaps it would be best for both sides of the discussion to support a compile-time or runtime switch turning loops/self-edges on|off.

Just my 2'edges ;)

Achim
Re: [Neo4j] Lucene/Neo Indexing Question
Hi Rick,

No, not really. What's the use case for having such a method?

2011/4/26 Rick Bullotta rick.bullo...@thingworx.com:

Hi, all.

Is there a method or suggested approach for obtaining a list of all of the distinct key values in a given index? I don't care about the indexed nodes or relationships themselves, just the value(s) of the key.

Thanks,
Rick

--
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
Re: [Neo4j] Self-referencing relationships
I've wanted to do similar tracking of paths within a graph, but am not clear about your approach. Were you creating new relationships between each node directly to represent an event? I suppose you'd have to add the user id and a sequence number into each relationship to keep the tracking distinct and preserve order. Which would work, except for repeats to the same node.

Instead of sequence numbers in the relationships, you could just use a sequence of events. I tend to try to keep distinct concepts within the graph separate. Like possible paths (a graph of possibilities) vs actual paths (which are trees because they are a sequence of events).

With buttons, I would first create the graph of buttons with possible paths:

A - B - C - {wherever}

Then track each user traversal by referencing those nodes in sequence:

User1 -- event1
event1 -clicked- A
event1 -next- event2
event2 -clicked- B
event2 -next- event3
event3 -clicked- B
event3 -next- event4
event4 -clicked- C
etc...

Subsequent users would either increment counts along that path or create new branches when they diverge.

Cheers,
Andreas

On Apr 26, 2011, at 12:57 PM, Rick Otten wrote:

Where I've wanted self referencing nodes is when mapping a sequence of user actions:

user clicks button A - [then clicks] - button B - [then clicks] - button B (again) - [then clicks] - button C - [then ...

Run this over a few tens of thousands of users (incrementing counts), and a few dozen buttons, and you can start to find things like:

This sequence of button presses is the most common path for users through our product.
If a user is on button B, it is pretty likely they'll click it again.
The group of users who are based in California tend to click buttons on the left side of the screen, while the ones in Massachusetts tend to click buttons on the right side of the screen.

There may have been other ways to do this modeling, but the way I implemented it really needed self referencing relationships to count all the [then clicks] relationships, not just the ones that move to new buttons.

Hi Shaunak,

As you've noticed, self-referencing nodes have been considered before, and I remember being perplexed by the lack when I first became a Neo4j user. Changing the support is simple enough, but there was obviously a conscious design decision. Why?

Anecdotally (and wiser, longer memoried minds should correct me), self-referencing nodes lead to more trouble than they're worth. So they're considered an error because there is more value in being alerted that you just related a node to itself than there is value in the few cases where you absolutely must have them. Being a database, the decision has been to err on the side of avoiding problems, even at the cost of some convenience.

An approach that could allow for intentional self-referencing, while still protecting against accidental self-references, would be to implement an explicit Node.relateToSelfAs(RelationshipType type).

What does your model look like, that you expect to require self-referencing?

Best,
Andreas

On Apr 26, 2011, at 11:15 AM, Shaunak Kashyap wrote:

I know this topic has been discussed before[1] and that a trac issue was also created[2]. I see that a patch was submitted as part of the issue, but from browsing the source code[3] it appears that self-referencing relationships are still a no-go in Neo4J. Are there any plans to apply the patch submitted by tobias OR, in general, to provide for self-referencing relationships in Neo4J?

We are evaluating Neo4J against OrientDB for an internal tool at Yahoo. While Neo4J looks like a far more mature product overall, the lack of self-referencing relationships might become a sticking point (yes, I understand they can be worked around with a middle node, but that requires the application to handle special cases which we would like to avoid).

References:
[1] http://www.mail-archive.com/user@lists.neo4j.org/msg03996.html
[2] https://trac.neo4j.org/ticket/239
[3] https://github.com/neo4j/community/blob/master/kernel/src/main/java/org/neo4j/kernel/impl/core/RelationshipImpl.java#L45

Thank you,
shaunak kashyap
technical yahoo
shau...@yahoo-inc.com
direct 408-349-4024 mobile 408-203-2450
701 first avenue, sunnyvale, ca, 94089-0703, us
phone (408) 349 3300 fax (408) 349 3301

--
Rick Otten
rot...@windfish.net
O=='=+
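The event-chain idea Andreas describes in this thread can be sketched in plain Java, without any Neo4j API. This is only an illustration under stated assumptions: the class and method names (`ClickPathSketch`, `chain`, `countTransitions`) are hypothetical, and a real graph would store events and buttons as nodes with `clicked` and `next` relationships. The point it shows is that a repeated click on button B becomes two distinct events, so no relationship from B to itself is needed, while the B-to-B step can still be counted.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ClickPathSketch {
    /** One user action: the button clicked, plus a link to the event that followed. */
    static final class Event {
        final String button;
        Event next; // null for the last event of a session

        Event(String button) {
            this.button = button;
        }
    }

    /** Builds the event1 -next- event2 -next- ... chain for one user session. */
    static Event chain(List<String> clicks) {
        Event head = null;
        Event tail = null;
        for (String button : clicks) {
            Event e = new Event(button);
            if (head == null) {
                head = e;
            } else {
                tail.next = e;
            }
            tail = e;
        }
        return head;
    }

    /** Adds each "then clicks" step of a chain to counts, keyed as "from->to". */
    static void countTransitions(Event head, Map<String, Integer> counts) {
        for (Event e = head; e != null && e.next != null; e = e.next) {
            // A repeat such as B->B is counted like any other step.
            counts.merge(e.button + "->" + e.next.button, 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        countTransitions(chain(List.of("A", "B", "B", "C")), counts);
        countTransitions(chain(List.of("A", "B", "C")), counts);
        System.out.println(counts);
    }
}
```

Aggregating the counts outside the button graph is a design choice of this sketch; it trades the convenience of counting directly on relationships for not needing self-referencing relationships at all.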
Re: [Neo4j] REST results pagination
On Fri, 2011-04-22 at 17:43 +0100, Jim Webber wrote:

Hi Michael,

Just in case we're not talking about the same kind of streaming -- when I think streaming, I think streaming uploads, streaming downloads, etc.

I'm thinking chunked transfers. That is, the server starts sending a response and then eventually terminates it when the whole response has been sent to the client. Although it seems a bit rude, the client could simply opt to close the connection when it's read enough, provided what it has read makes sense.

Sometimes document fragments can make sense:

In this case we certainly don't have well-formed XML, but some streaming API (e.g. StAX) might already have been able to create some local objects on the client side as the Earth and Mars nodes came in. I don't think this is elegant at all, but it might be practical. I've asked Mark Nottingham for his view on this since he's pretty sensible about Web things.

Any intermediate proxies would have to cache the whole thing; many proxies are not designed for streaming responses, so they might read the whole thing before relaying it (although they seem to be getting a bit better at this with video over HTTP). So the server would probably end up generating the whole thing if there was a proxy in the path.

I think it's workable, but I'm not sure it is ideal...

Justin
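Jim's point about usable fragments can be illustrated with the JDK's built-in StAX reader. The XML from his original example did not survive in this archive, so the document shape below (a `planets` root with `planet` elements) is purely an assumption, as are the class and method names. The sketch reads elements as they arrive and stops as soon as it has enough, which is the moment a client could close the connection; if the input ends mid-element first, it keeps whatever parsed cleanly.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class PartialStreamSketch {
    /** Reads up to `wanted` planet elements from a possibly incomplete document. */
    static List<String> readSome(String xmlSoFar, int wanted) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xmlSoFar));
            while (names.size() < wanted && reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "planet".equals(reader.getLocalName())) {
                    names.add(reader.getElementText());
                }
            }
            reader.close(); // a real client would also close the HTTP connection here
        } catch (XMLStreamException e) {
            // The stream ended before the document was well-formed;
            // the elements collected so far are still usable.
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical response body, cut off mid-element.
        String truncated = "<planets><planet>Earth</planet><planet>Mars</planet><planet>Ju";
        System.out.println(readSome(truncated, 2));
    }
}
```

As Justin notes, this only helps when nothing between client and server buffers the whole response first.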
Re: [Neo4j] Lucene/Neo Indexing Question
Hi, Mattias.

Here's a use case: I have a million nodes representing cars, and those nodes are all tagged with some value, let's say a color name, as a property. I have indexed those nodes on the color property value. Now I'd like to present a list of the distinct color values with which nodes (cars) have been tagged. At present, I'd need to iterate through all million, read the property, and maintain a distinct HashSet as I iterate through them.

I've tried using relationships from the car node(s) to a set of color node(s), but had scalability/performance issues when there are lots of car nodes being added/deleted (the color node quickly becomes a hot spot/synchronization choke point).

Rick

-Original Message-
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Mattias Persson
Sent: Tuesday, April 26, 2011 2:17 PM
To: Neo4j user discussions
Subject: Re: [Neo4j] Lucene/Neo Indexing Question

Hi Rick,

No, not really. What's the use case for having such a method?
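The full scan Rick describes can be sketched as follows, using plain maps as stand-ins for node properties (an assumption; no Neo4j API is used here). The second class, `ValueCounts`, is not something proposed in the thread: it is a hypothetical alternative that keeps a per-value counter next to the index and updates it whenever a car is added or deleted, so the distinct values can be listed without touching a million nodes or funnelling writes through one color node.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

final class DistinctValuesSketch {
    /** The scan from the message above: read one property per node, collect distinct values. */
    static Set<String> distinctByScan(Iterable<? extends Map<String, ?>> nodes, String key) {
        Set<String> distinct = new HashSet<>();
        for (Map<String, ?> properties : nodes) {
            Object value = properties.get(key);
            if (value != null) {
                distinct.add(value.toString());
            }
        }
        return distinct;
    }

    /** Hypothetical alternative: reference counts per value, maintained on every add/delete. */
    static final class ValueCounts {
        private final ConcurrentHashMap<String, Long> counts = new ConcurrentHashMap<>();

        void added(String value) {
            counts.merge(value, 1L, Long::sum);
        }

        void removed(String value) {
            // Returning null drops the entry once the last node with this value is gone.
            counts.computeIfPresent(value, (k, n) -> n > 1 ? n - 1 : null);
        }

        Set<String> distinct() {
            return new TreeSet<>(counts.keySet());
        }
    }
}
```

The counter map would have to be updated in step with the graph and rebuilt by one scan after a restart, so it is a trade-off rather than a drop-in fix.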
Re: [Neo4j] Self-referencing relationships
In my case I wasn't looking to retrace a specific user's path, but rather trying to identify popular paths. It was 'quick and easy' to do with a graph model (except for some confusion about counting self referencing relationships).

I was, in fact, adding user nodes so I could see the types of things certain groups of users did (within my data sample), but that was simply leveraging the graph to answer two classes of questions at the same time, rather than directly relevant to the popular path problem.

I think I also created event 1, event 2, etc. nodes and linked them to the button nodes. These didn't really tell me if someone was likely to jump from button B to button E (since they could get to button B at any event number), but they did tell me whether certain buttons were more likely to be selected early in the user experience or later.

There were lots of graph based explorations possible in that quick study. I barely scratched the surface in the time I had to work on it...
[Neo4j] Paging of REST results
Just thought I'd weigh in on the paging of REST results. It's essential for my app and is unfortunately forcing me to stick with MySQL for part of the app. I hope a couple of concrete examples will help.

1) Drop down AJAX type-ahead showing the first 4 results of searching for someone's name. 2m names total as nodes. This is working ok with MySQL because I can use LIMIT, so over the wire I only send back 4 results. If the user types David there are over 7000 nodes that match. That eliminates the possibility of using the REST query API.

A deeper concern is that the Index API for Neo4j does not expose the Lucene IndexSearch fields that would allow something like giving an offset when retrieving a document. I.e.

Document doc = hits.doc(offset);

If it did, I would be tempted to write my own plugin, but it seems in this case I would have to extend Index.

For the final example, once the user sees the dropdown, it's highly unlikely that their David is in the top four results from the AJAX type-ahead, so there is an option to click for more results. That brings them to a

--
MIKAMAI | Making Media Social
http://mikamai.com
+447868260229
Re: [Neo4j] Paging of REST results
Oops, hit send too early. To finish up: once the user clicks for more results, they get something very similar to Google search results, and they can scroll through the pages. Again, very easy with MySQL using LIMIT.

Would really love to eliminate the MySQL implementation since it takes up a lot of memory on the server. I was a bit disappointed in the speed of the Index searching. The MySQL fulltext searches are usually subsecond, but with Lucene I was sometimes seeing over 10 seconds to come back with results. Maybe because it has to create so many Nodes and send them over the wire?

Any suggestions on improving the speed, or am I stuck with MySQL until paging is implemented in the REST API?

Thanks, Todd

--
MIKAMAI | Making Media Social
http://mikamai.com
+447868260229
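The LIMIT-style paging Todd is asking for amounts to skipping `offset` hits and returning at most `limit` of them, so that only one page is turned into result objects and sent over the wire. Below is a minimal sketch of that idea over a generic iterator. It is not part of the Neo4j REST API or the Lucene API; `PagingSketch.page` is a hypothetical helper of the kind a server-side plugin might wrap around an index query.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class PagingSketch {
    /** Returns at most `limit` items, starting after the first `offset` items. */
    static <T> List<T> page(Iterator<T> hits, int offset, int limit) {
        if (offset < 0 || limit < 0) {
            throw new IllegalArgumentException("offset and limit must be >= 0");
        }
        // Skip the earlier pages without keeping references to their items.
        for (int skipped = 0; skipped < offset && hits.hasNext(); skipped++) {
            hits.next();
        }
        List<T> result = new ArrayList<>();
        while (result.size() < limit && hits.hasNext()) {
            result.add(hits.next());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = List.of("David A", "David B", "David C", "David D", "David E");
        System.out.println(page(names.iterator(), 0, 4)); // type-ahead: first four hits
        System.out.println(page(names.iterator(), 4, 4)); // "more results": next page
    }
}
```

Skipping still walks the earlier hits, so deep pages get slower; whether that matters depends on how cheaply the underlying index can produce hits it does not have to materialize.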
Re: [Neo4j] Question about REST interface concurrency
Hi Jim,

From what I understand, it flushes with various granularities, though I'd suspect that it's not flushing after writes the size of empty nodes, so this is certainly a possible bottleneck point. I've been looking through the code and don't see exactly where the flush takes place. Can you point me at the right class?

I did come across the PersistenceWindowPool class, which seems to come into play when the underlying node record is updated during the transaction commit. It looks as if the windows are mapped over contiguous blocks of the primitives' ID space and that, because the new node IDs are typically sequential, each of my create-node operations is likely to target the same window. These windows are locked, and waiting threads are queued up to wait for the locking thread to notify on unlock. Am I reading the code correctly? If so, do you have any thoughts on how we might remove that bottleneck?

Thanks again for your help,
Stephen

-Original Message-
From: Mattias Persson [mailto:matt...@neotechnology.com]
Sent: Tuesday, April 26, 2011 12:19 AM
To: Neo4j user discussions
Subject: Re: [Neo4j] Question about REST interface concurrency

Do your disk benchmark tests flush the data to disk or just write to it, letting the file system / OS flush whenever it feels like it (making it much faster, of course)?
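Stephen's reading of the window mapping can be made concrete with a toy model. This is not Neo4j's PersistenceWindowPool code: the integer-division mapping, the window size, and all names below are assumptions chosen only to show why sequential IDs would pile onto one window if windows cover contiguous ID blocks, as he suspects.

```java
import java.util.Map;
import java.util.TreeMap;

final class WindowContentionSketch {
    /** Hypothetical mapping: each window covers a contiguous block of record IDs. */
    static long windowFor(long recordId, int recordsPerWindow) {
        return recordId / recordsPerWindow;
    }

    /** Counts how many of `count` sequential IDs, starting at firstId, land in each window. */
    static Map<Long, Integer> hitsPerWindow(long firstId, int count, int recordsPerWindow) {
        Map<Long, Integer> hits = new TreeMap<>();
        for (long id = firstId; id < firstId + count; id++) {
            hits.merge(windowFor(id, recordsPerWindow), 1, Integer::sum);
        }
        return hits;
    }

    public static void main(String[] args) {
        // With an assumed 4096 records per window, 1000 freshly created
        // sequential node IDs all map to a single window (and a single lock).
        System.out.println(hitsPerWindow(0, 1000, 4096));
    }
}
```

If the real mapping behaves like this, concurrent creators would serialize on one window lock regardless of how idle the disk is, which would be consistent with the low disk utilization reported earlier in the thread.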