Re: [Neo4j] traversal framework question

2010-12-07 Thread Mattias Persson
Hi Joshi,

the problem may be that your traversal description will traverse from
actor to director (in both directions) and also from director to
actors (also in both directions).

Your "manual" traverser traverses actors to directors in both
directions and then only incoming relationships from director to
actor. So there's a difference there.

In your case it may be better to do the manual thing, or instead have
two relationship types, actor_worked_with_director and
director_worked_with_actor so that you can specify a more granular
traversal description with:

actor_worked_with_director: BOTH
director_worked_with_actor: INCOMING
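The "manual" two-step approach can be sketched in plain Java. Step 1 collects the given actor's directors. Step 2 follows each director's incoming WORKED_WITH relationships back to other actors. This is a hypothetical in-memory model, not the Neo4j traversal API. Relationships are `WorkedWith(actor, director)` records directed actor --> director. Counting each (actor, director) pair only once, so that repeated relationships do not inflate the score, is an assumption about what "shared directors" should mean.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SharedDirectors {
    // Hypothetical stand-in for one WORKED_WITH relationship, directed actor --> director.
    record WorkedWith(String actor, String director) {}

    static List<Map.Entry<String, Integer>> rank(List<WorkedWith> rels, String givenActor) {
        // Step 1: the distinct directors the given actor has worked with.
        Set<String> directors = new HashSet<>();
        for (WorkedWith r : rels) {
            if (r.actor().equals(givenActor)) {
                directors.add(r.director());
            }
        }

        // Index the incoming side: director -> actors that have a relationship to it.
        Map<String, List<String>> incoming = new HashMap<>();
        for (WorkedWith r : rels) {
            incoming.computeIfAbsent(r.director(), k -> new ArrayList<>()).add(r.actor());
        }

        // Step 2: for each of those directors, walk its incoming relationships back
        // to other actors. A set per actor counts each shared director only once.
        Map<String, Set<String>> sharedByActor = new HashMap<>();
        for (String director : directors) {
            for (String actor : incoming.getOrDefault(director, List.of())) {
                if (!actor.equals(givenActor)) {
                    sharedByActor.computeIfAbsent(actor, k -> new HashSet<>()).add(director);
                }
            }
        }

        // Sort by number of shared directors (descending), then by actor name.
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : sharedByActor.entrySet()) {
            ranked.add(Map.entry(e.getKey(), e.getValue().size()));
        }
        ranked.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return ranked;
    }

    public static void main(String[] args) {
        // The example pattern from the original question.
        List<WorkedWith> rels = List.of(
                new WorkedWith("Actor1", "Director1"),
                new WorkedWith("Actor1", "Director2"),
                new WorkedWith("Actor1", "Director3"),
                new WorkedWith("Actor1", "Director4"),
                new WorkedWith("Actor2", "Director1"),
                new WorkedWith("Actor2", "Director2"),
                new WorkedWith("Actor2", "Director3"),
                new WorkedWith("Actor3", "Director4"));
        System.out.println(rank(rels, "Actor1")); // prints [Actor2=3, Actor3=1]
    }
}
```

Keeping the directors from step 1 in a set means the given actor's side is counted once per director, however many relationships lead there.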


-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] How to delete a node when it's already deleted ?

2010-12-07 Thread Mattias Persson
2010/12/7, Andres Taylor :
> On Tue, Dec 7, 2010 at 1:54 PM, Mattias Persson
> wrote:
>
>
>> The (new) integrated index API cleans up deleted nodes/relationships that
>> are left behind automatically and lazily (at least the lucene impl does)
>> so
>> no worries for the index part at least.
>>
>
> Unless you are unlucky, and the index points to a node id that's been
> reused. In which case you'll get false positives, and very hard to
> find/reproduce bugs. I think a more general auto-index solution is needed.
> Don't you agree?
>

Yes, you're right, there's always that risk. Auto-indexing requires a
meta model, sort of... I think that's why there isn't a generic
auto-indexer already. Sure, there is a meta-model component, but it
hasn't received any love in a long time.

> Andrés




Re: [Neo4j] Cats and Dogs, living together

2010-12-07 Thread Javier de la Rosa
On Tue, Dec 7, 2010 at 10:39, Peter Neubauer
 wrote:
> There is a strong desire for having some graph visualization in the
> Neo4j Admin console, so - if you think it is interesting, I think
> there might be a strong case for the projects working together on the
> visualization component. I don't have the timeframe laid out yet but
> Neo Technology can dedicate resources to it early next year.
>
> Would that make sense to you?

Of course, yes. We're not very familiar with Java, but I don't think it's
needed, because most of the code in the web admin console
would be JavaScript. The only thing to decide is the data format to
exchange between the server and the browser to paint the graph and
its properties.

Anyway, we will keep working on Sylva and its visual component,
although more slowly than we would like.
If anything comes up, I'm always available by e-mail.

>
> Cheers,
>
> /peter neubauer
>
> GTalk:      neubauer.peter
> Skype       peter.neubauer
> Phone       +46 704 106975
> LinkedIn   http://www.linkedin.com/in/neubauer
> Twitter      http://twitter.com/peterneubauer
>
> http://www.neo4j.org               - Your high performance graph database.
> http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.

Re: [Neo4j] Performance DB with large datasets

2010-12-07 Thread Marius Kubatz
Hi,

there is still no difference in the performance, which is somewhat disturbing.
I can't see the nioneo memory mapping being allocated in the Java
process at all. It goes up to the heap size and then stops there.

Marius


-- 
"Programs must be written for people to read, and only incidentally
for machines to execute."

- Abelson & Sussman, SICP, preface to the first edition


Re: [Neo4j] How to delete a node when it's already deleted ?

2010-12-07 Thread Andres Taylor
On Tue, Dec 7, 2010 at 1:54 PM, Mattias Persson
wrote:


> The (new) integrated index API cleans up deleted nodes/relationships that
> are left behind automatically and lazily (at least the lucene impl does) so
> no worries for the index part at least.
>

Unless you are unlucky, and the index points to a node id that's been
reused, in which case you'll get false positives and very hard to
find/reproduce bugs. I think a more general auto-index solution is needed.
Don't you agree?
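The id-reuse problem can be illustrated with a tiny in-memory model. Everything here is hypothetical: a `HashMap` store with a free-id list and a separate name-to-id index, not Neo4j's stores or its index API. It only shows how a stale index entry that outlives its node can resolve to an unrelated node once the id is handed out again.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model (not Neo4j code): a node store that reuses freed ids,
// plus an index kept separately that maps a key to a node id.
class IdReuseDemo {
    final Map<Long, String> nodes = new HashMap<>();   // node id -> "name" property
    final Deque<Long> freeIds = new ArrayDeque<>();    // ids available for reuse
    final Map<String, Long> index = new HashMap<>();   // indexed name -> node id
    long nextId = 0;

    long createNode(String name) {
        // Reuse a freed id if one is available, otherwise allocate a new one.
        long id = freeIds.isEmpty() ? nextId++ : freeIds.pop();
        nodes.put(id, name);
        return id;
    }

    // Deletes the node but leaves its index entry behind (a not-yet-cleaned stray).
    void deleteNode(long id) {
        nodes.remove(id);
        freeIds.push(id);
    }

    // Index lookup that trusts the stored id without re-checking the node.
    String lookup(String key) {
        Long id = index.get(key);
        return id == null ? null : nodes.get(id);
    }

    public static void main(String[] args) {
        IdReuseDemo db = new IdReuseDemo();
        long alice = db.createNode("alice");
        db.index.put("alice", alice);
        db.deleteNode(alice);                    // stale entry "alice" -> id 0 remains
        db.createNode("bob");                    // id 0 is reused for an unrelated node
        System.out.println(db.lookup("alice"));  // prints bob: a false positive
    }
}
```

A lookup that re-checks the indexed value against the node it returns would detect this case.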

Andrés


Re: [Neo4j] How to delete a node when it's already deleted ?

2010-12-07 Thread Adam Mendlik
I have the same requirements on a project I'm working on. What I did was use
an abstract wrapper class for Node that helps with deletion.

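import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;

// NodeWrapper is a placeholder name; the original snippet showed only the members.
public abstract class NodeWrapper {
    protected final Node underlyingNode;
    protected boolean isDeleted;

    protected NodeWrapper( Node underlyingNode ) {
        this.underlyingNode = underlyingNode;
    }

    public boolean isDeleted() {
        return isDeleted;
    }

    // Deletes every relationship attached to the node, then the node itself.
    public void delete() {
        for ( Relationship rel : underlyingNode.getRelationships() ) {
            rel.delete();
        }
        underlyingNode.delete();
        isDeleted = true;
    }
}
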
I agree that it would be nice to have similar functionality right in the
API, at least for the forced deletion operation.
-Adam


[Neo4j] traversal framework question

2010-12-07 Thread Joshi Hemant - hjoshi
I have a graph with 2 types of nodes: actors and directors. The graph is
constructed such that an actor may have worked with multiple directors
and a director may have worked with different actors. The objective is
to find the list of actors (sorted by frequency) who have shared the
largest number of directors with a given actor (Actor1). The traversal
description I wrote looks like:
Actor1 --> Director1 <-- Actor2
Actor1 --> Director2 <-- Actor2
Actor1 --> Director3 <-- Actor2
Actor1 --> Director4 <-- Actor3
... and so on

for ( Node otherActorNode : Traversal.description()
        .relationships( MyRelationshipTypes.WORKED_WITH, Direction.BOTH )
        .breadthFirst()
        .uniqueness( Uniqueness.NODE_PATH )
        .prune( Traversal.pruneAfterDepth( 2 ) )
        .traverse( givenActorNode ).nodes() ) {
    // Keep frequency updated for otherActorNode
}
// Display sorted list of otherActorNode that have worked with a common director

The problem is that I am getting higher (incorrect) scores for the number
of unique directors shared by a given pair of actors. I used NODE_PATH
uniqueness to make sure that we do not traverse the same path twice.
If I split the single traverser call into 2 calls:
1. Get all directors the givenActorNode has worked_with
2. For all director nodes, get incoming worked_with relationships and
count frequencies (except givenActorNode)
I get the correct results.

What am I missing in the single traversal description above?
-Hemant

***
The information contained in this communication is confidential, is
intended only for the use of the recipient named above, and may be legally
privileged.

If the reader of this message is not the intended recipient, you are
hereby notified that any dissemination, distribution or copying of this
communication is strictly prohibited.

If you have received this communication in error, please resend this
communication to the sender and delete the original message or any copy
of it from your computer system.

Thank You.




Re: [Neo4j] Cats and Dogs, living together

2010-12-07 Thread Peter Neubauer
Javier,
thanks for the feedback!

There is a strong desire for having some graph visualization in the
Neo4j Admin console, so - if you think it is interesting, I think
there might be a strong case for the projects working together on the
visualization component. I don't have the timeframe laid out yet but
Neo Technology can dedicate resources to it early next year.

Would that make sense to you?

Cheers,

/peter neubauer


Re: [Neo4j] Cats and Dogs, living together

2010-12-07 Thread Javier de la Rosa
On Mon, Dec 6, 2010 at 16:57, Peter Neubauer
 wrote:
> Very very cool Javier!

Thank you :)

> Is this built using the Neo4j Python bindings or pure REST? Also, is
> there a public website available to refer to?

For now we are using only REST, but the performance is not what we
expected when we process large sets of nodes returned by a traversal (too
many HTTP requests). So we are now evaluating whether using the Python
bindings or building a Java socket server and a Python socket client
would be better. When you run a traversal or use the indices, the
REST API returns the URL of each node returned, so we need to
make one extra HTTP request per node. It would be great if we could
send an optional parameter to make the server return all properties.

> Another question - regarding visualisation, what was your experience
> of the best performing lib for JavaScript out there regarding large
> amount of nodes and relationships to render, and adaptability for UI?
> Currently, it seems there is
>
> - TheJIT
> - Processing.js
> - Graphdracula

TheJIT was our first approach, but with large datasets it is not very
fast. Besides, its interaction options are a bit limited and
hard to extend.
Processing.js is, without doubt, the most promising solution. We were
happy using Processing.js, but you need to build everything required to
represent graphs, nodes and edges. It's very low-level programming,
and for now browsers can't cope with it, so we used a mixed
approach: Processing.js in the browser and NetworkX on the
server side for some layout calculations, etc.
Graphdracula was an inspiration for us. It's very beautiful, but it's
still very incomplete. I guess it will be very useful in the near
future, but we need total control of everything that happens in the UI. So
now we are using Raphaël, the core library of Graphdracula, and we are
implementing several layout algorithms in JavaScript and some ways to
interact with nodes to expand the graph by browsing.

I hope to set up a Sylva test site soon; then I will e-mail this
list, if that's alright with you.

Best regards.

>
>
> Cheers,
>
> /peter neubauer
>
> On Thu, Dec 2, 2010 at 6:03 PM, Javier de la Rosa  wrote:
>> Hi, everybody,
>>
>> I work in a lab at the University of Western Ontario, with humanists.
>> The needs of humanities research make current SQL databases
>> unsuitable, because it's hard to change the schemas or build queries
>> with several JOINs. So we are developing a system which mixes a
>> relational database and Neo4j.
>>
>> Sylva [1], as it's called (previously Graphgamel), stores all data in
>> the Neo4j database as a graph. On the other hand, the multimedia files
>> (image, video and audio files) are stored using the relational one
>> (over Django). Besides, the relational part allows the definition of
>> lazy and dynamic schemas, very useful to model the world from the
>> humanistic point of view. Users can create nodes and relationships,
>> but according to a certain kind of integrity defined in the schema.
>>
>> Sylva also has a very early version of visualization through Raphaël
>> and Processing.js.
>> Here you can see some screenshots [2, 3, 4] and a video demo [5]. Our
>> goal is to adapt the django-qbe project [6] to our schema tool in
>> order to produce Gremlin queries in a visual way.
>>
>> But for now we are using the Neo4j REST component, which is not very
>> fast and has some limitations.
>>
>> It's an alpha version, but it goes without saying :-)
>>
>> [1] https://github.com/escalant3/graphgamel
>> [2] http://dl.dropbox.com/u/2630535/sylva.png
>> [3] http://dl.dropbox.com/u/2630535/plexigraph.png
>> [4] http://dl.dropbox.com/u/2630535/grafo.png
>> [5] http://www.youtube.com/watch?v=r04eV7vghfs (sorry, no subtitles
>> or audio yet)
>> [6] http://versae.github.com/qbe/
>>
>> On Wed, Dec 1, 2010 at 12:52, Andreas Kollegger
>>  wrote:
>>> Would anybody be willing to share experiences with trying to introduce 
>>> Neo4j into a system with another relational (or other NoSQL) database?
>>>
>>> We're starting to think about best practices for integration:
>>> * Hybrid data-modeling: what goes where?
>>> * XA transactions
>>> * message queues for data distribution
>>> * data migration strategies
>>>
>>> Any problems or feature-requests related to living in a 
>>> multi-storage-platform world are welcome.
>>>
>>> Cheers,
>>> Andreas
>>>
>>
>>
>>
>> --
>> Javier de la Rosa
>> http://versae.es

Re: [Neo4j] Problem with the lucene index

2010-12-07 Thread Javier de la Rosa
On Tue, Dec 7, 2010 at 07:10, Peter Neubauer
 wrote:
> Hi there,
> while Mattias is right that the current REST API exposes the
> old Index API, for the next milestone we are working on server
> extensions which can execute Java and even script code against the
> running server database instance and return node, path,
> relationship and custom representations. This will enable you (or
> us) to write an extension that actually does a lookup and returns the
> matching nodes, so you don't have to wait until the API is fixed. We
> are going to get this into 1.2 M06, so it is only 1 week away :)

Great! Does this mean we will see the Lucene index in the REST API?




Re: [Neo4j] Performance DB with large datasets

2010-12-07 Thread Marius Kubatz
Hi Peter,

thank you very much for your quick reply. Unfortunately there is no
messages.log; it seems I have an older db version.

I'm sending you the ls dump from the directory:
total 5318580
        11  active_tx_log
      4096  lucene
      4096  lucene-fulltext
        27  neostore
         9  neostore.id
  34954011  neostore.nodestore.db
         9  neostore.nodestore.db.id
1917225350  neostore.propertystore.db
       133  neostore.propertystore.db.arrays
         9  neostore.propertystore.db.arrays.id
    190425  neostore.propertystore.db.id
     10485  neostore.propertystore.db.index
         9  neostore.propertystore.db.index.id
     10449  neostore.propertystore.db.index.keys
         9  neostore.propertystore.db.index.keys.id
2047597776  neostore.propertystore.db.strings
  30790905  neostore.propertystore.db.strings.id
 901093347  neostore.relationshipstore.db
    149433  neostore.relationshipstore.db.id
        20  neostore.relationshiptypestore.db
         9  neostore.relationshiptypestore.db.id
       215  neostore.relationshiptypestore.db.names
         9  neostore.relationshiptypestore.db.names.id
   2097160  nioneo_logical.log.1
         4  nioneo_logical.log.active
        88  tm_tx_log.1
     29365  tm_tx_log.2

I have 3.848.862 nodes and 53.355.402 relationships in my graph.

Thus I created the following Neo4j properties file:
neostore.nodestore.db.mapped_memory=30M
neostore.relationshipstore.db.mapped_memory=1685M
neostore.propertystore.db.mapped_memory=1000M
neostore.propertystore.db.strings.mapped_memory=1000M
neostore.propertystore.db.arrays.mapped_memory=0M

I have 8 GB of RAM, the Java VM is running with -Xmx2048m, and the
memory mapping should consume roughly 4 GB.
I just started the experiment again; the first run is traversing a
neighborhood of 124 nodes and 2.279.166 edges,
so I'm very curious how this will end :)
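The arithmetic behind these settings can be checked against the ls listing above. The sketch below is illustrative only. It hard-codes the file sizes and mapped_memory values quoted in this message and assumes the "M" suffix means MiB (1024 * 1024 bytes), which is an assumption rather than something stated here. Under that assumption the settings add up to 3715 MiB (about 3.6 GiB). Of the four stores given a non-zero setting, only neostore.relationshipstore.db fits entirely into its mapped window.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Compares each store file size (from the listing) with its mapped_memory setting.
// Assumption: "M" in the config means MiB (1024 * 1024 bytes).
class MappedMemoryCheck {
    static final long MIB = 1024L * 1024L;

    // store name -> { file size in bytes, configured mapped memory in MiB }
    static Map<String, long[]> stores() {
        Map<String, long[]> m = new LinkedHashMap<>();
        m.put("neostore.nodestore.db",             new long[] {34_954_011L, 30});
        m.put("neostore.relationshipstore.db",     new long[] {901_093_347L, 1685});
        m.put("neostore.propertystore.db",         new long[] {1_917_225_350L, 1000});
        m.put("neostore.propertystore.db.strings", new long[] {2_047_597_776L, 1000});
        m.put("neostore.propertystore.db.arrays",  new long[] {133L, 0});
        return m;
    }

    // Sum of all mapped_memory settings, in MiB.
    static long totalMappedMiB() {
        long total = 0;
        for (long[] v : stores().values()) {
            total += v[1];
        }
        return total;
    }

    // True when the whole store file fits into its mapped window.
    static boolean fits(String store) {
        long[] v = stores().get(store);
        return v[0] <= v[1] * MIB;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, long[]> e : stores().entrySet()) {
            long fileMiB = (e.getValue()[0] + MIB - 1) / MIB;   // rounded up
            System.out.printf("%-36s file %5d MiB, mapped %5d MiB, fits: %b%n",
                    e.getKey(), fileMiB, e.getValue()[1], fits(e.getKey()));
        }
        System.out.println("total mapped: " + totalMappedMiB() + " MiB");
    }
}
```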

Thanks for your help!

Regards

Marius


Re: [Neo4j] How to delete a node when it's already deleted ?

2010-12-07 Thread Chris Gioran
On Tue, Dec 7, 2010 at 3:09 PM, Andreas Ronge  wrote:
> A Node#isDeleted() method would also be fine.

The way I see it, there are two concerns here.
The first is focused on the lower levels, where the
WriteTransaction/LockReleaser discover an illegal operation - deletion
of an already deleted primitive. This is a hard error and at their
level it should throw an exception and of course set the tx to
rollback only. This is mainly an engineering decision.
The second is the user level where either the same logic should apply
or a check should be made first to make sure things don't go downhill.
Obviously the current approach is the former.
Having just an isDeleted() method is kind of awkward because it would
litter the code with if statements and things would be even worse with
(checked) exceptions. Maybe stealing a bit off the id would be a
better solution and have the NodeImpl/NodeProxy objects do the check
internally. BTW, I think that from a user perspective with the current
kernel such a wrapper object (with a boolean field possibly) would be
the best approach, minimizing the bookkeeping in "business logic"
code.
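The behaviour described above and the suggested wrapper with a boolean field can be modelled in a few lines. In that behaviour, deleting an already-deleted primitive is a hard error that also marks the transaction rollback-only. `FakeTx` and `GuardedNode` are hypothetical stand-ins, not Neo4j classes, and the exception type and messages are assumptions. The sketch shows why catching the exception is not enough: the transaction is already doomed. It also shows how a guard in the wrapper keeps the second delete from ever reaching the transaction.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical transaction model (not Neo4j): a double delete is a hard error
// that also marks the transaction rollback-only.
class FakeTx {
    private final Set<Long> deleted = new HashSet<>();
    private boolean rollbackOnly = false;
    private boolean success = false;

    void delete(long id) {
        if (!deleted.add(id)) {
            rollbackOnly = true;
            throw new IllegalStateException("Node " + id + " already deleted");
        }
    }

    void success() { success = true; }

    // Returns whether success() was called; throws if the tx is rollback-only.
    boolean finish() {
        if (rollbackOnly) {
            throw new IllegalStateException("Unable to commit transaction");
        }
        return success;
    }
}

// Wrapper with a boolean field: a second delete() is a no-op.
class GuardedNode {
    private final FakeTx tx;
    private final long id;
    private boolean isDeleted = false;

    GuardedNode(FakeTx tx, long id) { this.tx = tx; this.id = id; }

    boolean isDeleted() { return isDeleted; }

    void delete() {
        if (isDeleted) return;   // already gone: skip instead of failing the tx
        tx.delete(id);
        isDeleted = true;
    }
}

class DeleteDemo {
    public static void main(String[] args) {
        // 1) Catching the exception does not help: commit still fails.
        FakeTx tx1 = new FakeTx();
        tx1.delete(1);
        try { tx1.delete(1); } catch (IllegalStateException ignored) { }
        tx1.success();
        try { tx1.finish(); } catch (IllegalStateException e) {
            System.out.println("tx1: " + e.getMessage());
        }
        // 2) With the wrapper, the second delete never reaches the transaction.
        FakeTx tx2 = new FakeTx();
        GuardedNode node = new GuardedNode(tx2, 2);
        node.delete();
        node.delete();
        tx2.success();
        System.out.println("tx2 committed: " + tx2.finish());
    }
}
```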

What I find more interesting to discuss are the semantics of
operations on primitives. At the moment there is no standard to adhere
to and in that respect there is a decision to be made. What I mean is:
what is the proper thing to do, conceptually, when doing basic
primitive manipulations. Since there is an effort to standardize a
graph traversal algebra, a similar thing should be done on a data
definition level, with rationalization and detailed description of
what is the Right Thing (TM) to do when, for instance, one deletes a
Node from a graph, regardless of implementation. Obviously my thinking
is influenced by the relational model, where there are hard
constraints on different things - primary keys are an obvious example
here. In that case, the proper thing to do was to make it propagate a
hard error all the way up and all implementations do exactly that. In
this way, behavior is standardized for all common operations. Should
graph databases, beginning with Neo, undergo a similar process? Such
an effort would give definite answers to most such problems, for
example the "forced"/cascading deletion issue mentioned before.

On the other hand, maybe I am overthinking this.

cheers,
CG


Re: [Neo4j] How to delete a node when it's already deleted ?

2010-12-07 Thread Andreas Ronge
A Node#isDeleted() method would also be fine.


On Tue, Dec 7, 2010 at 1:54 PM, Mattias Persson
 wrote:
> 2010/12/7 Andreas Ronge 
>
>> Hi
>>
>> I want to avoid keeping track of whether a node has been deleted or not.
>> How can I implement this?
>>
>> I tried to simply catch the exception but then I can't commit the
>> transaction.
>>
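>>  Transaction tx = db.beginTx();
>>  Node node = db.createNode();
>>  try {
>>    node.delete();
>>    node.delete();  // second delete of the same node throws
>>  } catch ( RuntimeException e ) { /* ignored */ }
>>  tx.success();
>>  tx.finish();  // BANG - org.neo4j.graphdb.TransactionFailureException: Unable to commit transaction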
>>
>> The same applies for deleting relationships.
>>
>
> Maybe it could be allowed to delete a node more than once in the same
> transaction... I don't know what drawbacks that would have?
>
>
>>
>> Also, it would be great if there was a force parameter on the delete
>> method.
>>
>> I would prefer a boolean indicating whether the delete was successful
>> instead of an exception
>> (same for node.getSingleRelationship() and maybe other methods)
>>
>> I saw on the list that other people have also requested a similar
>> feature, Alexandru Popescu:
>> >2. I was surprised to see a `Node`.delete() failing. The reason was it
>> >had relationships. I think adding a method `Node`.delete(boolean
>> >force) would
>> >make code much easier. The method would automatically:
>> >
>> >- remove all relationships
>>
>
> This one has been discussed since the dawn of time. It can potentially have
> unexpected side effects on your graph by deleting relationships you may not
> have been aware were connected to the node. But if that's what (a lot of)
> people want, then I can't say it shouldn't be there.
>
>
>> >- clean up indexes
>> >
>>
> The (new) integrated index API cleans up deleted nodes/relationships that
> are left behind, automatically and lazily (at least the Lucene implementation
> does), so there are no worries for the index part at least. It does so by
> slipping delete commands for such stray entities into future write
> transactions.
>
>>
>> /Andreas
>>
>
>
>
> --
> Mattias Persson, [matt...@neotechnology.com]
> Hacker, Neo Technology
> www.neotechnology.com
>


Re: [Neo4j] How to delete a node when it's already deleted ?

2010-12-07 Thread Mattias Persson
2010/12/7 Andreas Ronge 

> Hi
>
> I want to avoid keeping track of whether a node has been deleted or not.
> How can I implement this?
>
> I tried to simply catch the exception but then I can't commit the
> transaction.
>
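>  Node node = db.createNode();
>  try {
>      node.delete();
>      node.delete(); // deleting the same node a second time throws
>  } catch (RuntimeException e) {
>      // swallowed, but the transaction can no longer commit
>  }
>  tx.success();
>  tx.finish(); // BANG - org.neo4j.graphdb.TransactionFailureException: Unable to commit transaction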
>
> The same applies for deleting relationships.
>

Maybe it could be allowed to delete a node more than once in the same
transaction... I don't know what drawbacks that would have.
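To make the idea concrete, here is a minimal sketch of what an idempotent delete could look like, combining the wish for a boolean result with a Node#isDeleted()-style check. It is a toy in-memory model only: ToyTransaction, deleteIfPresent and isDeleted are hypothetical names, not part of the Neo4j API, and real transaction state lives inside the kernel.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy model (not the Neo4j API): the transaction remembers which
// node ids it has already deleted, so a repeated delete becomes a no-op that
// reports false instead of throwing and poisoning the transaction.
class ToyTransaction {
    private final Set<Long> existing = new HashSet<>();
    private final Set<Long> deletedInTx = new HashSet<>();

    long createNode(long id) {
        existing.add(id);
        return id;
    }

    /** Returns true if this call deleted the node, false if it was already gone. */
    boolean deleteIfPresent(long nodeId) {
        if (!existing.contains(nodeId) || deletedInTx.contains(nodeId)) {
            return false; // never existed or already deleted: nothing to do
        }
        deletedInTx.add(nodeId);
        return true;
    }

    /** Hypothetical counterpart of the suggested Node#isDeleted(). */
    boolean isDeleted(long nodeId) {
        return deletedInTx.contains(nodeId);
    }
}
```

The design choice is to treat a second delete as "nothing left to do" rather than as an error, which is exactly what would let the surrounding transaction still commit.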


>
> Also, it would be great if there was a force parameter on the delete
> method.
>
> I would prefer a boolean indicating whether the delete was successful
> instead of an exception (the same goes for node.getSingleRelationship()
> and maybe other methods).
>
> I saw on the list that other people have also requested a similar
> feature, for example Alexandru Popescu:
> >2. I was surprised to see `Node.delete()` fail. The reason was that it
> >had relationships. I think adding a method `Node.delete(boolean force)`
> >would make code much easier. The method would automatically:
> >
> >- remove all relationships
>

This one has been discussed since the dawn of time. It can potentially have
unexpected side effects on your graph by deleting relationships you may not
have been aware were connected to the node. But if that's what (a lot of)
people want, then I can't say it shouldn't be there.
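A hypothetical delete(force=true) would roughly amount to deleting every attached relationship before the node; against the 1.x API that is a loop over node.getRelationships() calling delete() on each, followed by node.delete(). The sketch below models this on a toy in-memory graph (ToyGraph and its methods are illustrative names, not Neo4j classes) and returns how many relationships it removed, which makes the side effect mentioned above visible to the caller.

```java
import java.util.*;

// Toy in-memory graph (hypothetical, not Neo4j): nodes are ids and each
// relationship id maps to its {start, end} node ids.
class ToyGraph {
    private final Set<Long> nodes = new HashSet<>();
    private final Map<Long, long[]> relationships = new HashMap<>();
    private long nextRelId = 0;

    void addNode(long id) { nodes.add(id); }

    long relate(long start, long end) {
        long id = nextRelId++;
        relationships.put(id, new long[] {start, end});
        return id;
    }

    /** Plain delete: refuses while relationships remain, as Node.delete() does. */
    void delete(long node) {
        for (long[] ends : relationships.values()) {
            if (ends[0] == node || ends[1] == node) {
                throw new IllegalStateException("Node " + node + " still has relationships");
            }
        }
        nodes.remove(node);
    }

    /** Hypothetical forced delete: removes every attached relationship first
     *  and returns how many were removed, so the cascade is not silent. */
    int forceDelete(long node) {
        List<Long> attached = new ArrayList<>();
        for (Map.Entry<Long, long[]> e : relationships.entrySet()) {
            long[] ends = e.getValue();
            if (ends[0] == node || ends[1] == node) attached.add(e.getKey());
        }
        // Remove after collecting to avoid modifying the map while iterating it.
        for (Long relId : attached) relationships.remove(relId);
        delete(node);
        return attached.size();
    }

    boolean hasNode(long id) { return nodes.contains(id); }
    int relationshipCount() { return relationships.size(); }
}
```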


> >- clean up indexes
> >
>
The (new) integrated index API cleans up deleted nodes/relationships that
are left behind, automatically and lazily (at least the Lucene implementation
does), so there are no worries for the index part at least. It does so by
slipping delete commands for such stray entities into future write
transactions.
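The lazy cleanup described above can be pictured with a small toy model. This is an illustration of the idea under stated assumptions, not the Lucene index implementation: LazyCleanupIndex and its methods are hypothetical. Lookups skip hits whose node no longer exists and remember them, and the next write transaction removes those stray entries.

```java
import java.util.*;

// Toy model (hypothetical, not Lucene or Neo4j code) of lazy index cleanup.
class LazyCleanupIndex {
    private final Map<String, Set<Long>> entries = new HashMap<>();
    private final Set<Long> liveNodes;                 // shared view of the node store
    private final Set<Long> pendingRemovals = new HashSet<>();

    LazyCleanupIndex(Set<Long> liveNodes) { this.liveNodes = liveNodes; }

    void add(String value, long nodeId) {
        entries.computeIfAbsent(value, k -> new HashSet<>()).add(nodeId);
    }

    /** Returns only live hits; hits for deleted nodes are queued for removal. */
    List<Long> get(String value) {
        List<Long> hits = new ArrayList<>();
        for (long id : entries.getOrDefault(value, Collections.emptySet())) {
            if (liveNodes.contains(id)) hits.add(id);
            else pendingRemovals.add(id);
        }
        Collections.sort(hits);
        return hits;
    }

    /** Stands in for the next write transaction: applies the queued removals. */
    int commitWriteTransaction() {
        int removed = 0;
        for (Set<Long> ids : entries.values()) {
            for (Long stray : pendingRemovals) if (ids.remove(stray)) removed++;
        }
        pendingRemovals.clear();
        return removed;
    }

    int rawEntryCount(String value) {
        return entries.getOrDefault(value, Collections.emptySet()).size();
    }
}
```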

>
> /Andreas
>





[Neo4j] How to delete a node when it's already deleted ?

2010-12-07 Thread Andreas Ronge
Hi

I want to avoid keeping track of whether a node has been deleted or not.
How can I implement this?

I tried to simply catch the exception but then I can't commit the transaction.

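  Node node = db.createNode();
  try {
      node.delete();
      node.delete(); // deleting the same node a second time throws
  } catch (RuntimeException e) {
      // swallowed, but the transaction can no longer commit
  }
  tx.success();
  tx.finish(); // BANG - org.neo4j.graphdb.TransactionFailureException: Unable to commit transaction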

The same applies for deleting relationships.
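Why catching the exception does not help can be illustrated with a toy transaction. The model below assumes (it is not taken from Neo4j's source) that a failed operation marks the transaction rollback-only before the exception reaches the caller, so success() followed by finish() can no longer commit. ToyTx, run and finish are hypothetical names.

```java
// Toy model (hypothetical, not Neo4j's implementation) of a transaction that
// becomes rollback-only as soon as any operation inside it fails.
class ToyTx {
    private boolean success;
    private boolean rollbackOnly;

    void run(Runnable op) {
        try {
            op.run();
        } catch (RuntimeException e) {
            rollbackOnly = true; // failure is recorded before the caller sees it
            throw e;
        }
    }

    void success() { success = true; }

    /** Returns true if committed, false if rolled back as requested;
     *  throws if success() was called on a transaction marked rollback-only. */
    boolean finish() {
        if (success && rollbackOnly) {
            throw new IllegalStateException("Unable to commit transaction");
        }
        return success;
    }
}
```

Under this model, swallowing the exception only hides it from the caller; the transaction has already recorded the failure.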

Also, it would be great if there was a force parameter on the delete method.

I would prefer a boolean indicating whether the delete was successful
instead of an exception (the same goes for node.getSingleRelationship()
and maybe other methods).

I saw on the list that other people have also requested a similar
feature, for example Alexandru Popescu:
>2. I was surprised to see `Node.delete()` fail. The reason was that it
>had relationships. I think adding a method `Node.delete(boolean force)`
>would make code much easier. The method would automatically:
>
>- remove all relationships
>- clean up indexes
>

/Andreas


Re: [Neo4j] Problem with the lucene index

2010-12-07 Thread Peter Neubauer
Hi there,
while Mattias is right that the current REST API exposes the old Index
API, for the next milestone we are working on server extensions which can
execute Java and even script code against the running server database
instance and return node, path, relationship and custom representations.
This will enable you (or us) to write an extension that actually does a
lookup and returns the matching nodes, so you don't have to wait until the
API is fixed. We are going to get this into 1.2 M06, so it is only a week
away :)

Cheers,

/peter neubauer

GTalk:      neubauer.peter
Skype       peter.neubauer
Phone       +46 704 106975
LinkedIn   http://www.linkedin.com/in/neubauer
Twitter      http://twitter.com/peterneubauer

http://www.neo4j.org               - Your high performance graph database.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.



On Tue, Dec 7, 2010 at 12:27 PM, Mattias Persson
 wrote:
> Unfortunately the REST API isn't on par with the new index API, so REST
> exposes the old IndexService. This will be fixed soon.
>
> 2010/12/7 Kaan Meralan 
>
>> Hi,
>>
>> These days I am playing with Neo4j (neo4j-1.2.M04) and I have some
>> problems with the Lucene indexing.
>>
>> I construct a graph (via the batch inserter) and index a property (via the
>> Lucene index batch inserter) to prototype a simple system. I can see the
>> nodes both with webadmin and the REST API (curl and PHP), but I can't see
>> any record related to my indexing. Even "curl -H Accept:application/json
>> http://xxx:7474/index/" returns nothing. The weird thing is that although I
>> can see the index, lucene and lucene-fulltext directories under the
>> graph.db directory, the index directory seems to be empty (it contains only
>> lucene.log.1, lucene.log.active and lucene-store.db). The lucene directory
>> is not empty, though: it contains a Lucene index folder named after my
>> indexed property.
>>
>> Does anybody have any idea?
>>
>> Thanks...
>>
>> //kaan
>>
>
>
>
>


Re: [Neo4j] Performance DB with large datasets

2010-12-07 Thread Peter Neubauer
Marius,
could you send the size of the store files, the number of
nodes/relationships and the neo4j.properties you are currently running
with? Examples and details are at
http://wiki.neo4j.org/content/Configuration_Settings and
http://docs.neo4j.org/html/milestone/#_configuration_amp_performance

The current configuration is normally dumped into DB_DIR/messages.log
at startup, so if you can provide that it would be great.

Cheers,

/peter neubauer




On Tue, Dec 7, 2010 at 12:51 PM, Marius Kubatz  wrote:
> Hi,
>
> I'm conducting experiments with two databases and have noticed a
> radical performance drop when dealing with large databases.
> I am working with dense triadic datasets, consisting of three node
> types: A,B,C and hyperedges Y.
> Basically one hyperedge y := (a,b,c) is stored in the db as 3 neo4j
> relationships between nodes: a-b, a-c, b-c.
>
> Large DB setup, size of the DB on disk is 5.09 GB.
> The largest files are node.properties and node.properties.string with
> 2.X GB each.
> # of nodes per type
> A: 71,756
> B: 3,322,519
> C: 454,587
>
> # of hyperedges:
> Y: 17,785,134
>
> Small DB setup, size of the DB on disk is 796.9 MB
> # of nodes per type
> A: 48,471
> B: 169,960
> C: 47,984
>
> # of hyperedges:
> Y: 8,963,895
>
> Machine: AMD X2 2.8 GHz, 8 GB RAM, Ubuntu 9.10 64bit
> Java VM with -Xmx6000m,
> Neo4j as embedded DB
> Neo4j transaction commit every 1 actions.
> No inserts into the graph, just retrieval of nodes and edges.
> Read operations only on node and edge properties.
> Write operations only on node properties.
>
> The algorithm basically iterates through all neighbors of two nodes
> and then through all edges in the neighborhood.
>
> An average computation on the small DB creates a neighborhood of ~200
> neighbors and ~1,200,000 edges and takes 16 seconds.
> An average computation on the large DB creates a neighborhood of ~350
> neighbors and ~2,500,000 edges and takes over 30 minutes...
>
> Could it be that the properties files are not cached and are loaded each
> time a property is accessed?
> Or is my transaction buffer set too high?
> Any ideas how to improve the performance?
>
> Thank you for your help and best regards,
>
> Marius
>


[Neo4j] Performance DB with large datasets

2010-12-07 Thread Marius Kubatz
Hi,

I'm conducting experiments with two databases and have noticed a
radical performance drop when dealing with large databases.
I am working with dense triadic datasets, consisting of three node
types: A,B,C and hyperedges Y.
Basically one hyperedge y := (a,b,c) is stored in the db as 3 neo4j
relationships between nodes: a-b, a-c, b-c.
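The storage scheme above can be sketched as follows; Edge and Hyperedges are hypothetical stand-ins for Neo4j relationships, not real classes. The helper also spells out the arithmetic the scheme implies: assuming every hyperedge gets its own three relationships, the large DB holds 3 × 17,785,134 = 53,355,402 relationships and the small one 3 × 8,963,895 = 26,891,685 (fewer if node pairs are shared between hyperedges, which the message does not say).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the hyperedge decomposition: y = (a, b, c) becomes
// three pairwise edges a-b, a-c and b-c.
final class Hyperedges {
    static final class Edge {
        final String from, to;
        Edge(String from, String to) { this.from = from; this.to = to; }
        @Override public String toString() { return from + "-" + to; }
    }

    static List<Edge> decompose(String a, String b, String c) {
        return Arrays.asList(new Edge(a, b), new Edge(a, c), new Edge(b, c));
    }

    /** Relationship count if each hyperedge is stored as its own three edges. */
    static long relationshipCount(long hyperedges) {
        return 3L * hyperedges;
    }
}
```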

Large DB setup, size of the DB on disk is 5.09 GB.
The largest files are node.properties and node.properties.string with
2.X GB each.
# of nodes per type
A: 71,756
B: 3,322,519
C: 454,587

# of hyperedges:
Y: 17,785,134

Small DB setup, size of the DB on disk is 796.9 MB
# of nodes per type
A: 48,471
B: 169,960
C: 47,984

# of hyperedges:
Y: 8,963,895

Machine: AMD X2 2.8 GHz, 8 GB RAM, Ubuntu 9.10 64bit
Java VM with -Xmx6000m,
Neo4j as embedded DB
Neo4j transaction commit every 1 actions.
No inserts into the graph, just retrieval of nodes and edges.
Read operations only on node and edge properties.
Write operations only on node properties.

The algorithm basically iterates through all neighbors of two nodes
and then through all edges in the neighborhood.
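One possible reading of this algorithm, sketched on a plain adjacency map rather than on Neo4j (Neighborhood, link, neighborsOfBoth and edgesInNeighborhood are hypothetical helpers), is: take the union of both nodes' neighbor sets, then visit every edge incident to that union. Under this reading the work is driven by the degrees of the neighbors rather than by the number of neighbors.

```java
import java.util.*;

// Hypothetical adjacency-map stand-in for the graph; the real code walks
// Neo4j relationships and reads their properties instead.
final class Neighborhood {
    static void link(Map<String, Set<String>> adj, String a, String b) {
        adj.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        adj.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    /** Union of the neighbor sets of x and y. */
    static Set<String> neighborsOfBoth(Map<String, Set<String>> adj, String x, String y) {
        Set<String> result = new HashSet<>(adj.getOrDefault(x, Collections.emptySet()));
        result.addAll(adj.getOrDefault(y, Collections.emptySet()));
        return result;
    }

    /** Counts distinct undirected edges touching the neighborhood; in the real
     *  graph each visited edge is at least one relationship read. */
    static int edgesInNeighborhood(Map<String, Set<String>> adj, Set<String> hood) {
        Set<String> seen = new HashSet<>();
        for (String n : hood) {
            for (String m : adj.getOrDefault(n, Collections.emptySet())) {
                // Normalize the pair so a-b and b-a are counted once.
                seen.add(n.compareTo(m) < 0 ? n + "|" + m : m + "|" + n);
            }
        }
        return seen.size();
    }
}
```

Working from the figures reported below: the small DB run handles about 1,200,000 / 200 = 6,000 edges per neighbor at roughly 1,200,000 / 16 s = 75,000 edges/s, while the large DB run handles about 2,500,000 / 350 ≈ 7,100 edges per neighbor at no more than 2,500,000 / 1,800 s ≈ 1,400 edges/s. Per edge the large run is thus roughly 50 times slower, which is consistent with the suspicion that the store files are not being cached, though these numbers alone cannot confirm it.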

An average computation on the small DB creates a neighborhood of ~200
neighbors and ~1,200,000 edges and takes 16 seconds.
An average computation on the large DB creates a neighborhood of ~350
neighbors and ~2,500,000 edges and takes over 30 minutes...

Could it be that the properties files are not cached and are loaded each
time a property is accessed?
Or is my transaction buffer set too high?
Any ideas how to improve the performance?

Thank you for your help and best regards,

Marius


Re: [Neo4j] Problem with the lucene index

2010-12-07 Thread Mattias Persson
Unfortunately the REST API isn't on par with the new index API, so REST
exposes the old IndexService. This will be fixed soon.

2010/12/7 Kaan Meralan 

> Hi,
>
> These days I am playing with Neo4j (neo4j-1.2.M04) and I have some
> problems with the Lucene indexing.
>
> I construct a graph (via the batch inserter) and index a property (via the
> Lucene index batch inserter) to prototype a simple system. I can see the
> nodes both with webadmin and the REST API (curl and PHP), but I can't see
> any record related to my indexing. Even "curl -H Accept:application/json
> http://xxx:7474/index/" returns nothing. The weird thing is that although I
> can see the index, lucene and lucene-fulltext directories under the
> graph.db directory, the index directory seems to be empty (it contains only
> lucene.log.1, lucene.log.active and lucene-store.db). The lucene directory
> is not empty, though: it contains a Lucene index folder named after my
> indexed property.
>
> Does anybody have any idea?
>
> Thanks...
>
> //kaan
>





Re: [Neo4j] Problem compiling neo4j-rdf-sail

2010-12-07 Thread Mattias Persson
2010/12/7, Schmidt, Dennis :
> Hey Mattias,
>
> It really seems to be something specific to Windows (at least my Windows 7
> machine). I used "mvn package" as well as "mvn clean install" and neither
> worked. So after your mail I tried everything again on my Mac, and there we
> go: it compiled flawlessly with no tests failing. However, I still had to
> manually update the  and the parent-pom version.
>

Great that the source of the problem is known! Let's see if there's a
fix for it. I'll make sure the right repository and parent-pom tags
are committed.

> Thanks a lot,
> Dennis
>
>> a) How do you package it, with "mvn package" or similar? Maven produces
>> jar files under target/
>> b) Those tests aren't failing on my machine (just tested), nor on our
>> build box. Do they fail even with a clean checkout from Subversion, or
>> only with your local modifications (assuming you have local modifications
>> to some parts of the code when you build)? Do those tests also fail if you
>> run them individually, e.g. running only testBasic() in your test
>> session? It might be a Windows issue though; I'll try to run it on a
>> Windows machine as well.
>

