Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node

2011-09-28 Thread Johan Svensson
On Thu, Sep 22, 2011 at 2:15 PM, st3ven  wrote:
>
> Hi Johan,
>
> I changed the settings as you described, but that did not change the speed
> significantly.

The previous configuration would make the machine use swap and that
will kill performance.

>
> To store the degree as a property on each node is an option, but I want the
> node degree to be calculated from the graph database as I also want to check

The problem is that you are trying to access an 85GB+ dataset using
only 16GB of RAM. The recommendation, then, is to aggregate the information
(store the degree count as a property on each node).
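For what it's worth, a minimal, untested sketch of that aggregation (not from this thread): it reuses the variables from Stephan's import code further down (inserter, node1, node2, knows), assumes the relationship loop ends with inserter.createRelationship(node1, node2, knows, properties) as the truncated code suggests, and uses a hypothetical degreeByNode map (java.util.HashMap / java.util.Map):

// Hypothetical sketch: count each node's degree while the relationships
// are written, then store the count as a "degree" property per node.
Map<Long, Integer> degreeByNode = new HashMap<Long, Integer>();

// inside the relationship loop, right after the (assumed)
// inserter.createRelationship(node1, node2, knows, properties):
Integer d1 = degreeByNode.get(node1);
degreeByNode.put(node1, d1 == null ? 1 : d1 + 1);
Integer d2 = degreeByNode.get(node2);
degreeByNode.put(node2, d2 == null ? 1 : d2 + 1);

// after the loop, before shutting the inserter down:
for (Map.Entry<Long, Integer> entry : degreeByNode.entrySet()) {
    Map<String, Object> props = new HashMap<String, Object>(
            inserter.getNodeProperties(entry.getKey()));
    props.put("degree", entry.getValue());
    inserter.setNodeProperties(entry.getKey(), props);
}

// later, in the embedded GraphDatabaseService, the degree becomes a plain
// property read instead of a relationship scan:
// int degree = (Integer) node.getProperty("degree", 0);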

Peter also mentioned using HA (cache sharding) but if you can just get
some more RAM into the machine you will see an improvement.

An SSD would also help here, since you are touching all edges in the
graph, while a mechanical disk (in this setup) will have horrible
performance (low throughput with 99% load on the disk). There are SSD
solutions that handle terabytes of data today, and they are dropping in
price.

Regards,
Johan


Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node

2011-09-22 Thread st3ven
Hi Johan,

I changed the settings as you described, but that did not change the speed
significantly.

To store the degree as a property on each node is an option, but I want the
node degree to be calculated from the graph database as I also want to check
some other metrics on the entire graph.

Cheers,
Stephan



Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node

2011-09-22 Thread st3ven
Hi Linan,

that would only fit that one scenario, and I wouldn't be using the graph database
to get the node degree. In that scenario I could also use my relationship
file to calculate the node degree, but I also want to check some other
metrics on the graph database, like the clustering coefficient and so on. Then I
would run into the same problem again. I need something to walk very fast
through the entire graph database.

Maybe you have another tip for me ;-).

Cheers,
Stephan



Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node

2011-09-21 Thread Johan Svensson
Hi Stephan,

You could try lowering the heap size to -Xmx2G and setting cache_type=weak, with
10G memory-mapped for relationships. The machine only has 16GB of RAM and
will not be able to process such a large dataset at in-memory speeds.
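Expressed in the configuration file Stephan posted further down in the thread, that amounts to something like the following (the 10G figure and the weak cache are Johan's suggestion; the other entries stay as Stephan configured them):

cache_type=weak
neostore.relationshipstore.db.mapped_memory=10G
# JVM side: shrink the heap, e.g. -server -Xmx2G, so the remaining RAM can
# go to the memory-mapped store files and the OS page cache.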

Another option is to calculate degree at insertion time and store it
as a property on each node.

Regards,
Johan

On Wed, Sep 21, 2011 at 12:44 PM, st3ven  wrote:
> Hi Linan,
>
> I just tried it with the outgoing relationships, but unfortunately that
> didn't speed things up.
>
> The size of my db is around 140GB and so it is not possible for me to dump
> the full directory into a ramfs.
> My files on the hard disk have the following size:
> neostore.nodestore.db = 31MB
> neostore.relationshipstore.db = 85GB
> neostore.propertystore.db = 65GB
> neostore.propertystore.db.strings = 180MB
> Is there maybe a chance of reducing the size of my database?
>
> Cheers,
> Stephan
>


Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node

2011-09-21 Thread Linan Wang
Hi Stephan,
I miscalculated the size of relationshipstore.db; I thought it was
around 8G instead of 85G. The only option left, I think, is to build an
index, something like this:
idx = db.index().forNodes("knows");
idx.add(thisguy, "knows", thatguy.getId());
idx.add(thatguy, "known_by", thisguy.getId());
The benefit is that when querying, the result size is pre-calculated,
so it would save some iteration time.
The problem is the index file size, which should be around 85G.

On Wed, Sep 21, 2011 at 11:44 AM, st3ven  wrote:
> Hi Linan,
>
> I just tried it with the outgoing relationships, but unfortunately that
> didn't speed things up.
>
> The size of my db is around 140GB and so it is not possible for me to dump
> the full directory into a ramfs.
> My files on the hard disk have the following size:
> neostore.nodestore.db = 31MB
> neostore.relationshipstore.db = 85GB
> neostore.propertystore.db = 65GB
> neostore.propertystore.db.strings = 180MB
> Is there maybe a chance of reducing the size of my database?
>
> Cheers,
> Stephan
>



-- 
Best regards

Linan Wang


Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node

2011-09-21 Thread st3ven
Hi Linan,

I just tried it with the outgoing relationships, but unfortunately that
didn't speed things up.

The size of my db is around 140GB and so it is not possible for me to dump
the full directory into a ramfs.
My files on the hard disk have the following size:
neostore.nodestore.db = 31MB
neostore.relationshipstore.db = 85GB
neostore.propertystore.db = 65GB
neostore.propertystore.db.strings = 180MB 
Is there maybe a chance of reducing the size of my database?

Cheers,
Stephan



Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node

2011-09-21 Thread st3ven
Peter,
I don't think this would help me, because then I wouldn't be using the graph to
get the node degree; to get the node degree I could also just use my file
with all relationships, but I want to use the graph database to get that.

The problem for me is that I don't just want to get the node degree, I also
want to check some other metrics on the graph database, like the clustering
coefficient, and then I would again have the problem that I can't read through
the entire database.

Creating an index for the node degree would only fit that one scenario, and
then I couldn't go any further.

Maybe you have another tip for me ;-).

Thanks for your help!

Cheers,
Stephan



Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node

2011-09-21 Thread st3ven
Unfortunately an SSD is not an option, because I would need an SSD with
around 150GB of capacity, as my database is 140GB.

Yesterday I already tried to configure Neo4j to use more memory for mapping,
but it seems that Neo4j doesn't allocate all the memory I configured.
I noticed that my system uses just 4GB after Neo4j has been running for a while,
even though I have 16GB available.

I downloaded the following configuration file,
http://dist.neo4j.org/neo_default.props, and changed the entries like this:

neostore.nodestore.db.mapped_memory=31M
neostore.relationshipstore.db.mapped_memory=8G
neostore.propertystore.db.mapped_memory=90M
neostore.propertystore.db.index.mapped_memory=1M
neostore.propertystore.db.index.keys.mapped_memory=1M
neostore.propertystore.db.strings.mapped_memory=180M
neostore.propertystore.db.arrays.mapped_memory=130M

My files on the hard disk have the following size:
neostore.nodestore.db = 31MB
neostore.relationshipstore.db = 85GB
neostore.propertystore.db = 65GB
neostore.propertystore.db.strings = 180MB

Should I maybe also change something in the cache settings in that file?
What settings would be good for me?

As I already said, I am currently using the following Java parameters:
-server -Xmx8G -XX:+UseParallelGC -XX:+UseNUMA

Is there anything else I should change?

Thanks for your help!

Cheers, 
Stephan






Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node

2011-09-20 Thread Linan Wang
Stephan,
what's the size of your db? If it's under 10G, how about just dumping the
full directory into a ramfs? Leave 1G to the JVM and it'll do the heavy I/O
on the ramfs. I think it's a simple solution and could yield an
interesting result. Please let me know the result if you try it. Thanks.

On Tue, Sep 20, 2011 at 5:41 PM, Peter Neubauer
 wrote:
> Steven,
> the index is built into the DB, so you can use something like
> http://docs.neo4j.org/chunked/snapshot/tutorials-java-embedded-index.html
> to index all your nodes into Lucene (in one index, the node as key,
> the number of relationships as numeric value when creating them). When
> reading, you would simply request all keys from the index and iterate
> over them. I am not terribly sure how fast it is, but given that
> you are just loading up documents, Lucene should be reasonably fast.
>
> Let us know if that works out!
>
> Cheers,
>
> /peter neubauer
>
> GTalk:      neubauer.peter
> Skype       peter.neubauer
> Phone       +46 704 106975
> LinkedIn   http://www.linkedin.com/in/neubauer
> Twitter      http://twitter.com/peterneubauer
>
> http://www.neo4j.org               - Your high performance graph database.
> http://startupbootcamp.org/    - Öresund - Innovation happens HERE.
> http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
>
>
>
> On Tue, Sep 20, 2011 at 6:01 PM, st3ven  wrote:
>> Hello Peter,
>>
>> it's a pity that neo4j doesn't support full graph-scans.
>>
>> Is there maybe a possibility to cache more relationships to speed things up
>> a little bit.
>> I recognized that only the iteration over the relationships is taking hours.
>> The time to get all relationships of one node is quite fast.
>>
>> I think I could try your second solution:
>> - Store the relationships as a property in an Index (e.g. Lucene) and
>> as the index for all entries. Thus, you are using an index for what it
>> is good at - global operations over all documents.
>>
>> But I didn't understand it correctly. Do you mean an Index which stores the
>> ID of a relationship and creating such an Index for every node?
>> Could you maybe give me a code example for that?
>> That would be very kind of you.
>>
>> The first solution is not really realizable, because I don't know the number
>> of relationships of every node.
>> I would have to count the relationships before the insertion and that would
>> make my database useless for the node degree query.
>>
>> Thank you very much for your help!
>>
>> Cheers,
>> Stephan
>>



-- 
Best regards

Linan Wang


Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node

2011-09-20 Thread Peter Neubauer
Steven,
the index is built into the DB, so you can use something like
http://docs.neo4j.org/chunked/snapshot/tutorials-java-embedded-index.html
to index all your nodes into Lucene (in one index, with the node as key and
the number of relationships as a numeric value, written when creating them). When
reading, you would simply request all keys from the index and iterate
over them. I am not terribly sure how fast it is, but given that
you are just loading up documents, Lucene should be reasonably fast.
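A rough, untested sketch of one reading of that suggestion, reusing the BatchInserterIndex machinery from Stephan's import code further down in the thread; the index name "degrees", the key "degree", and the degreeByNode map holding the per-node counts are placeholders, not an established recipe:

// during the batch import, next to the existing "author" index:
BatchInserterIndex degrees = indexProvider.nodeIndex("degrees",
        MapUtil.stringMap("type", "exact"));

// once the per-node relationship counts are known (accumulated while the
// relationships are written), index each node with its count:
for (Map.Entry<Long, Integer> entry : degreeByNode.entrySet()) {
    degrees.add(entry.getKey(), MapUtil.map("degree", entry.getValue()));
}
degrees.flush();

// at read time the counts would then come out of Lucene via the index API
// (db.index().forNodes("degrees")) instead of a scan over the
// relationship store.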

Let us know if that works out!

Cheers,

/peter neubauer

GTalk:      neubauer.peter
Skype       peter.neubauer
Phone       +46 704 106975
LinkedIn   http://www.linkedin.com/in/neubauer
Twitter      http://twitter.com/peterneubauer

http://www.neo4j.org               - Your high performance graph database.
http://startupbootcamp.org/    - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.



On Tue, Sep 20, 2011 at 6:01 PM, st3ven  wrote:
> Hello Peter,
>
> it's a pity that neo4j doesn't support full graph-scans.
>
> Is there maybe a possibility to cache more relationships to speed things up
> a little bit.
> I recognized that only the iteration over the relationships is taking hours.
> The time to get all relationships of one node is quite fast.
>
> I think I could try your second solution:
> - Store the relationships as a property in an Index (e.g. Lucene) and
> as the index for all entries. Thus, you are using an index for what it
> is good at - global operations over all documents.
>
> But I didn't understand it correctly. Do you mean an Index which stores the
> ID of a relationship and creating such an Index for every node?
> Could you maybe give me a code example for that?
> That would be very kind of you.
>
> The first solution is not really realizable, because I don't know the number
> of relationships of every node.
> I would have to count the relationships before the insertion and that would
> make my database useless for the node degree query.
>
> Thank you very much for your help!
>
> Cheers,
> Stephan
>


Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node

2011-09-20 Thread st3ven
Hello Peter,

it's a pity that neo4j doesn't support full graph-scans.

Is there maybe a possibility to cache more relationships to speed things up
a little bit?
I noticed that only the iteration over the relationships is taking hours;
the call to get all relationships of one node returns quite fast.

I think I could try your second solution:
- Store the relationships as a property in an Index (e.g. Lucene) and
ask the index for all entries. Thus, you are using an index for what it
is good at - global operations over all documents. 

But I didn't understand it correctly. Do you mean an index which stores the
ID of a relationship, and creating such an index for every node?
Could you maybe give me a code example for that?
That would be very kind of you.

The first solution is not really feasible, because I don't know the number
of relationships of every node.
I would have to count the relationships before the insertion, and that would
make the database useless for the node-degree query.

Thank you very much for your help!

Cheers,
Stephan



Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node

2011-09-20 Thread Linan Wang
Hi Stephan,
my theory is that most of the time is spent retrieving incoming
relationships. Could you try again, but this time only retrieve
outgoing relationships?

for (Node node : db.getAllNodes()) {
    if (node.getId() > 0) {
        long test = System.currentTimeMillis();
        Iterable<Relationship> rels = node.getRelationships(knows, Direction.OUTGOING);
        System.out.print("Retrieval:" + (System.currentTimeMillis() - test));

        test = System.currentTimeMillis();
        int count = com.google.common.collect.Iterables.size(rels);
        System.out.print("ms; Counting:" + (System.currentTimeMillis() - test));
        System.out.println("ms; number of edges:" + count);
    }
}

On Tue, Sep 20, 2011 at 4:37 PM, st3ven  wrote:
> Hello again,
>
> the bottleneck is the iteration.
> I did some tests with it to check whether the iteration or the relationship
> retrieval is too slow.
>
> My test results look like this:
>
> Retrieval:1ms; Counting:158ms; number of edges:116407
> Retrieval:0ms; Counting:2ms; number of edges:1804
> Retrieval:0ms; Counting:0ms; number of edges:22
> Retrieval:0ms; Counting:0ms; number of edges:31
> Retrieval:0ms; Counting:0ms; number of edges:39
> Retrieval:0ms; Counting:2ms; number of edges:1213
> Retrieval:0ms; Counting:0ms; number of edges:57
> Retrieval:0ms; Counting:36ms; number of edges:59420
> Retrieval:0ms; Counting:335ms; number of edges:175156
> Retrieval:1ms; Counting:168ms; number of edges:146697
> Retrieval:0ms; Counting:354ms; number of edges:285051
> Retrieval:0ms; Counting:0ms; number of edges:50
> Retrieval:0ms; Counting:11ms; number of edges:20960
> Retrieval:0ms; Counting:0ms; number of edges:43
> Retrieval:0ms; Counting:0ms; number of edges:51
> Retrieval:0ms; Counting:1ms; number of edges:647
> Retrieval:0ms; Counting:5ms; number of edges:10216
> Retrieval:0ms; Counting:2ms; number of edges:3444
> Retrieval:0ms; Counting:0ms; number of edges:1128
> Retrieval:1ms; Counting:312ms; number of edges:319127
> Retrieval:1ms; Counting:0ms; number of edges:5
> Retrieval:0ms; Counting:760ms; number of edges:104741
> Retrieval:0ms; Counting:11ms; number of edges:9210
> Retrieval:0ms; Counting:0ms; number of edges:31
> Retrieval:1ms; Counting:3ms; number of edges:3116
> Retrieval:0ms; Counting:37ms; number of edges:70835
> Retrieval:0ms; Counting:383ms; number of edges:296445
> Retrieval:1ms; Counting:0ms; number of edges:120
> Retrieval:0ms; Counting:2ms; number of edges:1526
> Retrieval:0ms; Counting:0ms; number of edges:71
> Retrieval:0ms; Counting:42ms; number of edges:35960
> Retrieval:0ms; Counting:90ms; number of edges:9644
> Retrieval:0ms; Counting:186ms; number of edges:129981
> Retrieval:0ms; Counting:1ms; number of edges:1213
> Retrieval:1ms; Counting:143ms; number of edges:124495
> Retrieval:0ms; Counting:0ms; number of edges:58
> Retrieval:0ms; Counting:75ms; number of edges:56195
> Retrieval:0ms; Counting:99ms; number of edges:92574
> Retrieval:0ms; Counting:0ms; number of edges:13
> Retrieval:0ms; Counting:50ms; number of edges:26350
> Retrieval:0ms; Counting:2ms; number of edges:1856
> Retrieval:1ms; Counting:376ms; number of edges:114166
> Retrieval:0ms; Counting:9528ms; number of edges:11956
> Retrieval:0ms; Counting:50047ms; number of edges:12645
> Retrieval:1ms; Counting:43687ms; number of edges:15025
>
> The first results came up very fast, because they seem to have been cached
> cause I did that quite often.
> As you can see the last 4 results weren't cached and it took a huge amount
> of time to do the iteration over the relationships.
>
> I checked that with the following code:
>
> for (Node node : db.getAllNodes()) {
>     if (node.getId() > 0) {
>         long test = System.currentTimeMillis();
>         Iterable<Relationship> rels = node.getRelationships(knows);
>         System.out.print("Retrieval:" + (System.currentTimeMillis() - test));
>
>         test = System.currentTimeMillis();
>         int count = com.google.common.collect.Iterables.size(rels);
>         System.out.print("ms; Counting:" + (System.currentTimeMillis() - test));
>         System.out.println("ms; number of edges:" + count);
>     }
> }
> Is there maybe a possibility to cache more relationships, or do you have any
> idea how to speed up the iteration process?
>
> Thanks for your help again!
>
> Cheers,
> Stephan
>

Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node

2011-09-20 Thread Michael Hunger
The "retrieval" is only virtual, as it is lazy.

When I get back to my machine on Thursday, I gonna run your tests and get back 
to you. I have made some modifications on the relationship loading and want to 
see how that affects this.

There are issues loading lots of relationships with cold caches in a one-by-one 
usecase. As the larger segment caching only kicks in if there are a certain 
number of misses of the memory mapped file loading.

Using an SSD would also speed up your use-case.

Configuring Neo4j to use more memory for memory mapping would also help.

Cheers

Michael

Am 20.09.2011 um 17:37 schrieb st3ven:

> Hello again,
> 
> the bottleneck is the iteration.
> I did some tests with it to check whether the iteration or the relationship
> retrieval is too slow.
> 
> My test results look like this:
> 
> Retrieval:1ms; Counting:158ms; number of edges:116407
> Retrieval:0ms; Counting:2ms; number of edges:1804
> Retrieval:0ms; Counting:0ms; number of edges:22
> Retrieval:0ms; Counting:0ms; number of edges:31
> Retrieval:0ms; Counting:0ms; number of edges:39
> Retrieval:0ms; Counting:2ms; number of edges:1213
> Retrieval:0ms; Counting:0ms; number of edges:57
> Retrieval:0ms; Counting:36ms; number of edges:59420
> Retrieval:0ms; Counting:335ms; number of edges:175156
> Retrieval:1ms; Counting:168ms; number of edges:146697
> Retrieval:0ms; Counting:354ms; number of edges:285051
> Retrieval:0ms; Counting:0ms; number of edges:50
> Retrieval:0ms; Counting:11ms; number of edges:20960
> Retrieval:0ms; Counting:0ms; number of edges:43
> Retrieval:0ms; Counting:0ms; number of edges:51
> Retrieval:0ms; Counting:1ms; number of edges:647
> Retrieval:0ms; Counting:5ms; number of edges:10216
> Retrieval:0ms; Counting:2ms; number of edges:3444
> Retrieval:0ms; Counting:0ms; number of edges:1128
> Retrieval:1ms; Counting:312ms; number of edges:319127
> Retrieval:1ms; Counting:0ms; number of edges:5
> Retrieval:0ms; Counting:760ms; number of edges:104741
> Retrieval:0ms; Counting:11ms; number of edges:9210
> Retrieval:0ms; Counting:0ms; number of edges:31
> Retrieval:1ms; Counting:3ms; number of edges:3116
> Retrieval:0ms; Counting:37ms; number of edges:70835
> Retrieval:0ms; Counting:383ms; number of edges:296445
> Retrieval:1ms; Counting:0ms; number of edges:120
> Retrieval:0ms; Counting:2ms; number of edges:1526
> Retrieval:0ms; Counting:0ms; number of edges:71
> Retrieval:0ms; Counting:42ms; number of edges:35960
> Retrieval:0ms; Counting:90ms; number of edges:9644
> Retrieval:0ms; Counting:186ms; number of edges:129981
> Retrieval:0ms; Counting:1ms; number of edges:1213
> Retrieval:1ms; Counting:143ms; number of edges:124495
> Retrieval:0ms; Counting:0ms; number of edges:58
> Retrieval:0ms; Counting:75ms; number of edges:56195
> Retrieval:0ms; Counting:99ms; number of edges:92574
> Retrieval:0ms; Counting:0ms; number of edges:13
> Retrieval:0ms; Counting:50ms; number of edges:26350
> Retrieval:0ms; Counting:2ms; number of edges:1856
> Retrieval:1ms; Counting:376ms; number of edges:114166
> Retrieval:0ms; Counting:9528ms; number of edges:11956
> Retrieval:0ms; Counting:50047ms; number of edges:12645
> Retrieval:1ms; Counting:43687ms; number of edges:15025
> 
> The first results came up very fast, because they seem to have been cached
> cause I did that quite often.
> As you can see the last 4 results weren't cached and it took a huge amount
> of time to do the iteration over the relationships.
> 
> I checked that with the following code:
> 
> for (Node node : db.getAllNodes()) {
>     if (node.getId() > 0) {
>         long test = System.currentTimeMillis();
>         Iterable<Relationship> rels = node.getRelationships(knows);
>         System.out.print("Retrieval:" + (System.currentTimeMillis() - test));
>
>         test = System.currentTimeMillis();
>         int count = com.google.common.collect.Iterables.size(rels);
>         System.out.print("ms; Counting:" + (System.currentTimeMillis() - test));
>         System.out.println("ms; number of edges:" + count);
>     }
> }
> Is there maybe a possibility to cache more relationships, or do you have any
> idea how to speed up the iteration process?
> 
> Thanks for your help again!
> 
> Cheers,
> Stephan
> 


Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node

2011-09-20 Thread st3ven
Hello again,

the bottleneck is the iteration.
I did some tests with it to check whether the iteration or the relationship
retrieval is too slow.

My test results look like this:

Retrieval:1ms; Counting:158ms; number of edges:116407
Retrieval:0ms; Counting:2ms; number of edges:1804
Retrieval:0ms; Counting:0ms; number of edges:22
Retrieval:0ms; Counting:0ms; number of edges:31
Retrieval:0ms; Counting:0ms; number of edges:39
Retrieval:0ms; Counting:2ms; number of edges:1213
Retrieval:0ms; Counting:0ms; number of edges:57
Retrieval:0ms; Counting:36ms; number of edges:59420
Retrieval:0ms; Counting:335ms; number of edges:175156
Retrieval:1ms; Counting:168ms; number of edges:146697
Retrieval:0ms; Counting:354ms; number of edges:285051
Retrieval:0ms; Counting:0ms; number of edges:50
Retrieval:0ms; Counting:11ms; number of edges:20960
Retrieval:0ms; Counting:0ms; number of edges:43
Retrieval:0ms; Counting:0ms; number of edges:51
Retrieval:0ms; Counting:1ms; number of edges:647
Retrieval:0ms; Counting:5ms; number of edges:10216
Retrieval:0ms; Counting:2ms; number of edges:3444
Retrieval:0ms; Counting:0ms; number of edges:1128
Retrieval:1ms; Counting:312ms; number of edges:319127
Retrieval:1ms; Counting:0ms; number of edges:5
Retrieval:0ms; Counting:760ms; number of edges:104741
Retrieval:0ms; Counting:11ms; number of edges:9210
Retrieval:0ms; Counting:0ms; number of edges:31
Retrieval:1ms; Counting:3ms; number of edges:3116
Retrieval:0ms; Counting:37ms; number of edges:70835
Retrieval:0ms; Counting:383ms; number of edges:296445
Retrieval:1ms; Counting:0ms; number of edges:120
Retrieval:0ms; Counting:2ms; number of edges:1526
Retrieval:0ms; Counting:0ms; number of edges:71
Retrieval:0ms; Counting:42ms; number of edges:35960
Retrieval:0ms; Counting:90ms; number of edges:9644
Retrieval:0ms; Counting:186ms; number of edges:129981
Retrieval:0ms; Counting:1ms; number of edges:1213
Retrieval:1ms; Counting:143ms; number of edges:124495
Retrieval:0ms; Counting:0ms; number of edges:58
Retrieval:0ms; Counting:75ms; number of edges:56195
Retrieval:0ms; Counting:99ms; number of edges:92574
Retrieval:0ms; Counting:0ms; number of edges:13
Retrieval:0ms; Counting:50ms; number of edges:26350
Retrieval:0ms; Counting:2ms; number of edges:1856
Retrieval:1ms; Counting:376ms; number of edges:114166
Retrieval:0ms; Counting:9528ms; number of edges:11956
Retrieval:0ms; Counting:50047ms; number of edges:12645
Retrieval:1ms; Counting:43687ms; number of edges:15025

The first results came up very fast, because they seem to have been cached
since I had done this quite often.
As you can see, the last 4 results weren't cached and it took a huge amount
of time to do the iteration over the relationships.

I checked that with the following code:

for (Node node : db.getAllNodes()) {
    if (node.getId() > 0) {
        long test = System.currentTimeMillis();
        Iterable<Relationship> rels = node.getRelationships(knows);
        System.out.print("Retrieval:" + (System.currentTimeMillis() - test));

        test = System.currentTimeMillis();
        int count = com.google.common.collect.Iterables.size(rels);
        System.out.print("ms; Counting:" + (System.currentTimeMillis() - test));
        System.out.println("ms; number of edges:" + count);
    }
}
Is there maybe a possibility to cache more relationships, or do you have any
idea how to speed up the iteration process?

Thanks for your help again!

Cheers,
Stephan



Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node

2011-09-20 Thread Linan Wang
Hi Stephan,
I'm wondering if it makes any difference if you specify the relationship
type when counting degrees:
RelationshipType knows = DynamicRelationshipType.withName("KNOWS");

Iterable<Relationship> rels = node.getRelationships(knows);
int count = com.google.common.collect.Iterables.size(rels);

Besides, do you know where the bottleneck is, the node iteration
or the relationship retrieval?

On Tue, Sep 20, 2011 at 1:38 PM, st3ven  wrote:
> Hi,
>
> I already tried these java parameters, but that didn't really speedup the
> process and i already turned atime off.
> As Java parameters I am using right now -d64 -server -Xms7G -Xmx14G
> -XX:+UseParallelGC -XX:+UseNUMA
> What I've also noticed is, that reading from the database is really slow on
> my hard disk.
> It just reads 1mb/s and sometimes 8mb/s, but that is really slow. My hard
> disk can normally read and copy files much faster.
> Also very strange is, that the workload of the hard disk is around 99% with
> reading 1mb/s.
>
> My OS is Ubuntu Linux x64 and my file system is ext4.
>
> On the neo4j Wiki I found some performance guides, but these didn't really
> help.
> Do you know what I can do else?
>
>
> Perfomance Guides:
> http://wiki.neo4j.org/content/Linux_Performance_Guide
> http://wiki.neo4j.org/content/Linux_Performance_Guide
> http://wiki.neo4j.org/content/Configuration_Settings
> http://wiki.neo4j.org/content/Configuration_Settings
>
> I also added a configurtion file, but it seems that my Java program doesn't
> use all of the Ram.
>
> Thanks for your help!
>
> Cheers,
> Stephan
>
>
>



-- 
Best regards

Linan Wang


Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node

2011-09-20 Thread Peter Neubauer
Steven,
in this scenario you are reading through the entire db, and it is basically
cold. Neo4j is not optimized in itself to do full graph scans. I
see a few solutions for you:

- Store the number of relationships as a property on nodes and read
only that. This works if the updates to your graph are not too
frequent.

- Store the relationships as a property in an Index (e.g. Lucene) and
ask the index for all entries. Thus, you are using an index for what it
is good at - global operations over all documents.

- use HA or just file copy to replicate the graph on several
instances, and send a sharded query to all of them (e.g. count 100K
node degrees on all of the instances and return the result). This
query is very easy to do in a map/reduce fashion.

Is that feasible?

Cheers,

/peter neubauer

GTalk:      neubauer.peter
Skype       peter.neubauer
Phone       +46 704 106975
LinkedIn   http://www.linkedin.com/in/neubauer
Twitter      http://twitter.com/peterneubauer

http://www.neo4j.org               - Your high performance graph database.
http://startupbootcamp.org/    - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.



On Tue, Sep 20, 2011 at 1:00 PM, st3ven  wrote:
> Peter,
>
> the import of the data into the graph database is not the main problem for
> me.
> The lookup of nodes from the index is fast enough for me.
> To create the database it took me nearly half a day.
>
> My main problem here is getting the node degree of every node.
> As I already said I am using this code to get the node degree of every node:
>
> for (Node node : db.getAllNodes()) {
>     counter = 0;
>
>     if (node.getId() > 0) {
>         for (Relationship rel : node.getRelationships()) {
>             counter++;
>         }
>
>         System.out.println(node.getProperty("name").toString() + ": " + counter);
>     }
> }
>
> After 3 days I only got the node degree of 8 nodes and I want to
> optimize my traversal here, cause this is very slow.
> What can I do to make this faster or do I have to change my code for getting
> the node degree?
> I only posted my import code because I thought I could maybe optimize there
> something for this traversal.
>
> Thank you very much for your help!
>
> Cheers,
> Stephan
>
>


Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node

2011-09-20 Thread st3ven
Hi,

I already tried these Java parameters, but that didn't really speed up the
process, and I have already turned atime off.
As Java parameters I am currently using -d64 -server -Xms7G -Xmx14G
-XX:+UseParallelGC -XX:+UseNUMA
What I've also noticed is that reading from the database is really slow on
my hard disk.
It reads just 1MB/s and sometimes 8MB/s, which is really slow; my hard
disk can normally read and copy files much faster.
Also very strange is that the workload of the hard disk is around 99% while
reading at 1MB/s.

My OS is Ubuntu Linux x64 and my file system is ext4.

On the Neo4j wiki I found some performance guides, but these didn't really
help.
Do you know what else I can do?


Performance guides:
http://wiki.neo4j.org/content/Linux_Performance_Guide
http://wiki.neo4j.org/content/Configuration_Settings

I also added a configuration file, but it seems that my Java program doesn't
use all of the RAM.

Thanks for your help!

Cheers,
Stephan





Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node

2011-09-20 Thread Linan Wang
Hi Stephan,
have you set -Xms, -XX:+UseNUMA, and -XX:+UseConcMarkSweepGC? They
could speed up the process significantly.
Also, if you like, JRockit is fast and free now; give it a try.
By the way, which file system are you using? Have you turned off atime?
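For example, combined with the flags Stephan already uses, the command line would look something like this (the heap sizes, classpath and main class are only placeholders):

java -d64 -server -Xms4G -Xmx4G -XX:+UseNUMA -XX:+UseConcMarkSweepGC \
     -cp <classpath> <main class>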

On Tue, Sep 20, 2011 at 12:00 PM, st3ven  wrote:
> Peter,
>
> the import of the data into the graph database is not the main problem for
> me.
> The lookup of nodes from the index is fast enough for me.
> To create the database it took me nearly half a day.
>
> My main problem here is getting the node degree of every node.
> As I already said I am using this code to get the node degree of every node:
>
> for (Node node : db.getAllNodes()) {
>     counter = 0;
>
>     if (node.getId() > 0) {
>         for (Relationship rel : node.getRelationships()) {
>             counter++;
>         }
>
>         System.out.println(node.getProperty("name").toString() + ": " + counter);
>     }
> }
>
> After 3 days I only got the node degree of 8 nodes and I want to
> optimize my traversal here, cause this is very slow.
> What can I do to make this faster or do I have to change my code for getting
> the node degree?
> I only posted my import code because I thought I could maybe optimize there
> something for this traversal.
>
> Thank you very much for your help!
>
> Cheers,
> Stephan
>
>



-- 
Best regards

Linan Wang


Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node

2011-09-20 Thread st3ven
Peter,

the import of the data into the graph database is not the main problem for
me.
The lookup of nodes from the index is fast enough for me.
Creating the database took me nearly half a day.

My main problem here is getting the node degree of every node.
As I already said I am using this code to get the node degree of every node:

for (Node node : db.getAllNodes()) {
    counter = 0;

    if (node.getId() > 0) {
        for (Relationship rel : node.getRelationships()) {
            counter++;
        }

        System.out.println(node.getProperty("name").toString() + ": " + counter);
    }
}

After 3 days I have only gotten the node degree of 8 nodes, and I want to
optimize my traversal here, because this is very slow.
What can I do to make this faster, or do I have to change my code for getting
the node degree?
I only posted my import code because I thought I could maybe optimize
something there for this traversal.

Thank you very much for your help!

Cheers,
Stephan




Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node

2011-09-20 Thread Peter Neubauer
Steven,
the most performant way to insert data with the BatchInserter is to
first insert only the nodes from your node file (that should be fast).
After that (or at the same time), find a way to generate the
relationship file with Neo4j IDs rather than being forced to look the
nodes up in indexes during relationship insertion. That lookup is taking the
bulk of the time, so if you could write your node IDs back to a file, then
massage the relationship text file to include node FROM and TO IDs
(e.g. using Perl or Bash or Ruby) and import that file referring to
these IDs directly, that should be much faster.
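A small, hypothetical sketch of the write-back half of that, slotted into the node loop of Stephan's import code (it needs java.io.BufferedWriter and java.io.FileWriter in addition to the existing imports; the output path is only a placeholder):

// while creating the nodes, also dump "name <TAB> nodeId" to a file so the
// relationship file can later be rewritten (with Perl/Bash/Ruby, as
// suggested above) to carry FROM and TO node IDs directly:
BufferedWriter idMap = new BufferedWriter(
        new FileWriter("/media/sdg1/Wikipedia/author-node-ids.tsv"));
while ((zeile = bf.readLine()) != null) {
    Map<String, Object> properties = MapUtil.map("name", zeile);
    long node = inserter.createNode(properties);
    authors.add(node, properties);
    idMap.write(zeile + "\t" + node);
    idMap.newLine();
}
idMap.close();

// with IDs in the relationship file, the second loop can call
// inserter.createRelationship(fromId, toId, knows, properties) directly,
// with no index lookups at all.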

HTH

Cheers,

/peter neubauer

GTalk:      neubauer.peter
Skype       peter.neubauer
Phone       +46 704 106975
LinkedIn   http://www.linkedin.com/in/neubauer
Twitter      http://twitter.com/peterneubauer

http://www.neo4j.org               - Your high performance graph database.
http://startupbootcamp.org/    - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.



On Tue, Sep 20, 2011 at 12:23 PM, st3ven  wrote:
> Hello neo4j-comunity,
>
>
>
> I am creating a graph database for a social network.
>
> To create the graph database I am using the Batch Inserter.
>
> The Batch Inserter inserts data from 2 files into the graph database.
>
>
>
> Files:
>
> 1. the first file contains the Nodes I want to create (about 3.5M Nodes)
>
> The file looks like this:
> Author 1
> Author 2
> Author 2 ...
>
> 2. the second file contains every Relationship between the Nodes (about 2.5
> billion Relationships)
>
>
> This file looks like this:
> Author1; Author2; timestamp
> Author2; Author3; timestamp
> Author1; Author3; timestamp...
>
> The specifications of my Computer look like this:
>
>
>
> Intel Core i7 3,4Ghz
>
> 16GB Ram
>
> Geforce GT 420 1GB
>
> 2TB harddrive
>
>
>
> My Code to create the graph database looks like this:
>
>
>
> package wikiOSN;
>
> import java.io.BufferedReader;
> import java.io.FileReader;
> import java.io.IOException;
> import java.util.Map;
>
> import org.neo4j.graphdb.DynamicRelationshipType;
> import org.neo4j.graphdb.index.BatchInserterIndex;
> import org.neo4j.graphdb.index.BatchInserterIndexProvider;
> import org.neo4j.helpers.collection.MapUtil;
> import org.neo4j.index.impl.lucene.LuceneBatchInserterIndexProvider;
> import org.neo4j.kernel.impl.batchinsert.BatchInserter;
> import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;
>
> public class CreateAndConnectNodes {
>
>        public static void main(String[] args) throws IOException {
>                BufferedReader bf = new BufferedReader(new FileReader(
>                                "/media/sdg1/Wikipedia/Reduced 
> Files/autoren-der-wikiartikel"));
>                BufferedReader bf2 = new BufferedReader(new FileReader(
>                                "/media/sdg1/Wikipedia/Reduced 
> Files/wikipedia-output"));
>                CreateAndConnectNodes cacn = new CreateAndConnectNodes();
>                cacn.createGraphDatabase(bf, bf2);
>
>        }
>
>        private long relationCounter = 0;
>
>        private void createGraphDatabase(BufferedReader bf, BufferedReader bf2)
>                        throws IOException {
>                BatchInserter inserter = new BatchInserterImpl(
>                                "target/socialNetwork-batchinsert");
>                BatchInserterIndexProvider indexProvider = new
> LuceneBatchInserterIndexProvider(
>                                inserter);
>                BatchInserterIndex authors = indexProvider.nodeIndex("author",
>                                MapUtil.stringMap("type", "exact"));
>                authors.setCacheCapacity("name", 10);
>
>                String zeile;
>                String zeile2;
>
>                while ((zeile = bf.readLine()) != null) {
>                        Map properties = 
> MapUtil.map("name", zeile);
>                        long node = inserter.createNode(properties);
>                        authors.add(node, properties);
>                }
>                bf.close();
>                System.out.println("Nodes created!");
>                authors.flush();
>                String node = "";
>                long node1 = 0;
>                long node2 = 0;
>                while ((zeile2 = bf2.readLine()) != null) {
>                        if (relationCounter++ % 1 == 0) {
>
>                                System.out
>                                                .println("Edges already 
> created: " + relationCounter);
>
>                        }
>                        String[] relation = zeile2.split("%;% ");
>                        if (node == "") {
>                                node = relation[0];
>                                if (authors.get("name", 
> relation[0]).getSingle() != null) {
>                                        node1 = authors.get("name", 
> relation[0]).getSingle();
>                                } else {
>                         

[Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node

2011-09-20 Thread st3ven
Hello neo4j-community,

I am creating a graph database for a social network.
To create the graph database I am using the BatchInserter.
The BatchInserter inserts data from 2 files into the graph database.

Files:

1. The first file contains the nodes I want to create (about 3.5M nodes).
The file looks like this:
Author 1
Author 2
Author 2 ...

2. The second file contains every relationship between the nodes (about 2.5
billion relationships).
This file looks like this:
Author1; Author2; timestamp
Author2; Author3; timestamp
Author1; Author3; timestamp...

The specifications of my computer look like this:

Intel Core i7 3.4GHz
16GB RAM
GeForce GT 420 1GB
2TB hard drive

My code to create the graph database looks like this:



package wikiOSN;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Map;

import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.index.BatchInserterIndex;
import org.neo4j.graphdb.index.BatchInserterIndexProvider;
import org.neo4j.helpers.collection.MapUtil;
import org.neo4j.index.impl.lucene.LuceneBatchInserterIndexProvider;
import org.neo4j.kernel.impl.batchinsert.BatchInserter;
import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;

public class CreateAndConnectNodes {

    public static void main(String[] args) throws IOException {
        BufferedReader bf = new BufferedReader(new FileReader(
                "/media/sdg1/Wikipedia/Reduced Files/autoren-der-wikiartikel"));
        BufferedReader bf2 = new BufferedReader(new FileReader(
                "/media/sdg1/Wikipedia/Reduced Files/wikipedia-output"));
        CreateAndConnectNodes cacn = new CreateAndConnectNodes();
        cacn.createGraphDatabase(bf, bf2);
    }

    private long relationCounter = 0;

    private void createGraphDatabase(BufferedReader bf, BufferedReader bf2)
            throws IOException {
        BatchInserter inserter = new BatchInserterImpl(
                "target/socialNetwork-batchinsert");
        BatchInserterIndexProvider indexProvider = new LuceneBatchInserterIndexProvider(
                inserter);
        BatchInserterIndex authors = indexProvider.nodeIndex("author",
                MapUtil.stringMap("type", "exact"));
        authors.setCacheCapacity("name", 10);

        String zeile;
        String zeile2;

        while ((zeile = bf.readLine()) != null) {
            Map<String, Object> properties = MapUtil.map("name", zeile);
            long node = inserter.createNode(properties);
            authors.add(node, properties);
        }
        bf.close();
        System.out.println("Nodes created!");
        authors.flush();
        String node = "";
        long node1 = 0;
        long node2 = 0;
        while ((zeile2 = bf2.readLine()) != null) {
            if (relationCounter++ % 1 == 0) {
                System.out.println("Edges already created: " + relationCounter);
            }
            String[] relation = zeile2.split("%;% ");
            if (node == "") {
                node = relation[0];
                if (authors.get("name", relation[0]).getSingle() != null) {
                    node1 = authors.get("name", relation[0]).getSingle();
                } else {
                    System.out.println("Autor 1: " + relation[0]);
                    break;
                }
            }
            if (!node.equals(relation[0])) {
                node = relation[0];
                if (authors.get("name", relation[0]).getSingle() != null) {
                    node1 = authors.get("name", relation[0]).getSingle();
                } else {
                    System.out.println("Autor 1: " + relation[0]);
                    break;
                }
            }
            if (authors.get("name", relation[1]).getSingle() != null) {
                node2 = authors.get("name", relation[1]).getSingle();
            } else {
                System.out.println("Autor 2: " + relation[1]);
                break;
            }

            Map<String, Object> properties = MapUtil.map("timestamp",
                    Long.parseLong(relation[2].trim()));