Re: [Neo4j] Big index solutions?

2011-03-14 Thread Massimo Lusetti
On Mon, Mar 14, 2011 at 9:26 AM, Mattias Persson
matt...@neotechnology.com wrote:

 Hmm, that doesn't look very good. I'm very keen on looking at your code for
 this test, if possible, since I haven't experienced a slowdown like this
 before.

 I just did an insertion test of 1.5M indexed nodes and there was virtually
 no slowdown at all.

I'll send you the download URL privately.

Cheers
-- 
Massimo
http://meridio.blogspot.com
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Big index solutions?

2011-03-13 Thread Massimo Lusetti
On Fri, Mar 11, 2011 at 10:31 AM, Peter Neubauer
peter.neuba...@neotechnology.com wrote:

 No,
 nothing is failing. It is just that in big insertion scenarios, joining
 nodes together into relationships requires index lookups, and often only
 an exact index is needed for that. We have had good experiences with
 Lucene, but when importing e.g. big OpenStreetMap datasets we need to run
 lookups during insertion, and Lucene takes a lot of time in these cases.

That seems very similar to my use case... I need to import big data,
bigger than OpenStreetMap, and I'm using plain Lucene to index and to
query while indexing...

 That is why I think exact lookups, like K/V stores, would be
 interesting in these scenarios, as an alternative. They _should_
 perform better than Lucene.

I'm keen to hear your results...

Cheers
-- 
Massimo
http://meridio.blogspot.com


Re: [Neo4j] Big index solutions?

2011-03-13 Thread Massimo Lusetti
On Fri, Mar 11, 2011 at 10:37 AM, Mattias Persson
matt...@neotechnology.com wrote:

 And I'm curious about why the neo4j lucene layer adds overhead and what your
 code looks like in your own solution.

I really don't know; I haven't had time to investigate the neo4j code. I'm
indexing a SHA1 hash key pointing to a Node, just to prove the presence of
the data within the DB. To compare them, I ran this with a plain Lucene
solution (using the ChenilleKit.codehaus.org project) and with your Index
framework. Here are the results; each log line covers one round of 20 rows
of data ("Fatte 20 righe in N ms" is Italian for "Done 20 rows in N ms"):

With the neo4j Index framework:

[DEBUG] 11/56/2011 14:56:08,438 StatistichengModule.PickUpPoller Fatte 20 righe in 108482 ms
[DEBUG] 11/59/2011 14:59:51,061 StatistichengModule.PickUpPoller Fatte 20 righe in 146664 ms
[DEBUG] 11/02/2011 15:02:30,317 StatistichengModule.PickUpPoller Fatte 20 righe in 159256 ms
[DEBUG] 11/05/2011 15:05:08,680 StatistichengModule.PickUpPoller Fatte 20 righe in 158363 ms
[DEBUG] 11/08/2011 15:08:34,501 StatistichengModule.PickUpPoller Fatte 20 righe in 205821 ms
[DEBUG] 11/12/2011 15:12:51,690 StatistichengModule.PickUpPoller Fatte 20 righe in 257189 ms
[DEBUG] 11/17/2011 15:17:11,589 StatistichengModule.PickUpPoller Fatte 20 righe in 259899 ms
[DEBUG] 11/21/2011 15:21:45,109 StatistichengModule.PickUpPoller Fatte 20 righe in 273520 ms
[DEBUG] 11/26/2011 15:26:04,802 StatistichengModule.PickUpPoller Fatte 20 righe in 259693 ms
[DEBUG] 11/30/2011 15:30:32,269 StatistichengModule.PickUpPoller Fatte 20 righe in 267467 ms
[DEBUG] 11/34/2011 15:34:32,804 StatistichengModule.PickUpPoller Fatte 20 righe in 240535 ms
[DEBUG] 11/43/2011 15:43:14,960 StatistichengModule.PickUpPoller Fatte 20 righe in 355129 ms
[DEBUG] 11/50/2011 15:50:22,323 StatistichengModule.PickUpPoller Fatte 20 righe in 427363 ms
[DEBUG] 11/58/2011 15:58:12,846 StatistichengModule.PickUpPoller Fatte 20 righe in 470523 ms


With the plain Lucene solution (external to neo4j db):


[DEBUG] 12/55/2011 14:55:03,997 StatistichengModule.PickUpPoller Fatte 20 righe in 48138 ms
[DEBUG] 12/55/2011 14:55:53,533 StatistichengModule.PickUpPoller Fatte 20 righe in 49537 ms
[DEBUG] 12/56/2011 14:56:54,773 StatistichengModule.PickUpPoller Fatte 20 righe in 61240 ms
[DEBUG] 12/57/2011 14:57:54,157 StatistichengModule.PickUpPoller Fatte 20 righe in 59384 ms
[DEBUG] 12/58/2011 14:58:52,667 StatistichengModule.PickUpPoller Fatte 20 righe in 58510 ms
[DEBUG] 12/00/2011 15:00:20,518 StatistichengModule.PickUpPoller Fatte 20 righe in 87851 ms
[DEBUG] 12/02/2011 15:02:29,176 StatistichengModule.PickUpPoller Fatte 20 righe in 82548 ms
[DEBUG] 12/04/2011 15:04:52,302 StatistichengModule.PickUpPoller Fatte 20 righe in 77700 ms
[DEBUG] 12/07/2011 15:07:09,584 StatistichengModule.PickUpPoller Fatte 20 righe in 77727 ms
[DEBUG] 12/08/2011 15:08:26,778 StatistichengModule.PickUpPoller Fatte 20 righe in 77194 ms
[DEBUG] 12/09/2011 15:09:39,495 StatistichengModule.PickUpPoller Fatte 20 righe in 72717 ms
[DEBUG] 12/11/2011 15:11:04,032 StatistichengModule.PickUpPoller Fatte 20 righe in 84537 ms
[DEBUG] 12/12/2011 15:12:35,806 StatistichengModule.PickUpPoller Fatte 20 righe in 91774 ms
[DEBUG] 12/14/2011 15:14:13,789 StatistichengModule.PickUpPoller Fatte 20 righe in 97983 ms
[DEBUG] 12/15/2011 15:15:48,196 StatistichengModule.PickUpPoller Fatte 20 righe in 94407 ms
[DEBUG] 12/17/2011 15:17:16,819 StatistichengModule.PickUpPoller Fatte 20 righe in 88623 ms
[DEBUG] 12/19/2011 15:19:09,743 StatistichengModule.PickUpPoller Fatte 20 righe in 76255 ms
[DEBUG] 12/20/2011 15:20:16,357 StatistichengModule.PickUpPoller Fatte 20 righe in 66614 ms
[DEBUG] 12/21/2011 15:21:16,136 StatistichengModule.PickUpPoller Fatte 20 righe in 59779 ms
[DEBUG] 12/22/2011 15:22:29,535 StatistichengModule.PickUpPoller Fatte 20 righe in 73399 ms
[DEBUG] 12/23/2011 15:23:51,419 StatistichengModule.PickUpPoller Fatte 20 righe in 81884 ms
[DEBUG] 12/25/2011 15:25:14,380 StatistichengModule.PickUpPoller Fatte 20 righe in 82961 ms
[DEBUG] 12/26/2011 15:26:21,857 StatistichengModule.PickUpPoller Fatte 20 righe in 67477 ms
[DEBUG] 12/27/2011 15:27:34,632 StatistichengModule.PickUpPoller Fatte 20 righe in 72775 ms
[DEBUG] 12/28/2011 15:28:45,497 StatistichengModule.PickUpPoller Fatte 20 righe in 70864 ms
[DEBUG] 12/29/2011 15:29:44,987 StatistichengModule.PickUpPoller Fatte 20 righe in 59491 ms
[DEBUG] 12/31/2011 15:31:46,182 StatistichengModule.PickUpPoller Fatte 20 righe in 87324 ms
[DEBUG] 12/33/2011 15:33:16,841 StatistichengModule.PickUpPoller Fatte 20 righe in 90659 ms
[DEBUG] 12/34/2011 15:34:48,457 StatistichengModule.PickUpPoller Fatte 20 righe in 91616 ms
[DEBUG] 12/36/2011 15:36:17,984 StatistichengModule.PickUpPoller Fatte 20 righe in 89527 ms
[DEBUG] 12/38/2011 15:38:02,083 StatistichengModule.PickUpPoller Fatte 20 righe in 104099 ms
[DEBUG] 12/39/2011 
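
The two runs above can be compared with a few lines of Java. This is only a sketch: it assumes every entry ends in "<rows> righe in <millis> ms", the class and method names are made up for the example, and the sample strings contain just the first and the last complete entry of each run as posted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RoundTimings {
    // Matches the tail of one log entry, e.g. "Fatte 20 righe in 108482 ms".
    // \s+ also matches a line break in the middle of a wrapped entry.
    private static final Pattern ROUND =
            Pattern.compile("(\\d+)\\s+righe\\s+in\\s+(\\d+)\\s+ms");

    /** Returns the duration in milliseconds of every round found in the log text. */
    static List<Long> durations(String log) {
        List<Long> result = new ArrayList<>();
        Matcher m = ROUND.matcher(log);
        while (m.find()) {
            result.add(Long.parseLong(m.group(2)));
        }
        return result;
    }

    /** Ratio of the last round to the first round: a rough measure of slowdown. */
    static double slowdown(List<Long> durations) {
        long first = durations.get(0);
        long last = durations.get(durations.size() - 1);
        return (double) last / first;
    }

    public static void main(String[] args) {
        // First and last complete entries of each run, copied from the logs above.
        String neo4jIndex = "Fatte\n20 righe in 108482 ms\nFatte\n20 righe in 470523 ms";
        String plainLucene = "Fatte\n20 righe in 48138 ms\nFatte\n20 righe in 104099 ms";
        System.out.println("neo4j index slowdown:  " + slowdown(durations(neo4jIndex)));
        System.out.println("plain Lucene slowdown: " + slowdown(durations(plainLucene)));
    }
}
```

For the values posted, the last round takes about 4.3 times as long as the first with the neo4j Index framework, and about 2.2 times as long with plain Lucene (up to the last complete entry).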

Re: [Neo4j] Big index solutions?

2011-03-13 Thread Peter Neubauer
Let me know how it goes!

/peter

Send from my mobile device, please excuse typos and brevity.
On Mar 13, 2011 9:18 AM, Massimo Lusetti mluse...@gmail.com wrote:
 On Tue, Mar 8, 2011 at 7:26 PM, Ashwin Jayaprakash
 ashwin.jayaprak...@gmail.com wrote:

 Try JDBM2 - http://code.google.com/p/jdbm2/issues/detail?id=1

 It's been resurrected by another author.

 Seems pretty interesting in the context... Going to give it a whirl

 Thanks!
 --
 Massimo
 http://meridio.blogspot.com


Re: [Neo4j] Big index solutions?

2011-03-13 Thread Massimo Lusetti
On Sun, Mar 13, 2011 at 10:29 AM, Peter Neubauer
neubauer.pe...@gmail.com wrote:

 Let me know how it goes!

Here are the results: http://dl.dropbox.com/u/22802242/neo4j-stats.ods

As you can see, JDBM is slower than Lucene in my tests, and its times
grow more steeply from round to round.

Every round parses 20 rows of my data; the data parsed is obviously the
same for both.

Let me know if you are interested in anything more.
-- 
Massimo
http://meridio.blogspot.com


Re: [Neo4j] Big index solutions?

2011-03-13 Thread Massimo Lusetti
On Sun, Mar 13, 2011 at 4:43 PM, Massimo Lusetti mluse...@gmail.com wrote:

 On Sun, Mar 13, 2011 at 10:29 AM, Peter Neubauer
 neubauer.pe...@gmail.com wrote:

 Let me know how it goes!

 Here are the results: http://dl.dropbox.com/u/22802242/neo4j-stats.ods

 As you can see, JDBM is slower than Lucene in my tests, and its times
 grow more steeply from round to round.

 Every round parses 20 rows of my data; the data parsed is obviously the
 same for both.

 Let me know if you are interested in anything more.

BTW, I've updated the results file with some rounds using the internal
neo4j index framework. I had to stop that run early because it would have
taken too long to complete.

Cheers
-- 
Massimo
http://meridio.blogspot.com


Re: [Neo4j] Big index solutions?

2011-03-13 Thread Peter Neubauer
Thanks Massimo,
Will check it out tomorrow!

/peter

Send from my mobile device, please excuse typos and brevity.


Re: [Neo4j] Big index solutions?

2011-03-13 Thread Massimo Lusetti
On Sun, Mar 13, 2011 at 4:50 PM, Peter Neubauer
neubauer.pe...@gmail.com wrote:

 Thanks Massimo,
 Will check it out tomorrow!

You're welcome, it's easy and fun to play with neo4j.

Please note that every test was conducted with neo4j 1.3.M04, untuned.
Every round involves the creation of nodes and relationships, not only
the indexing, and every insert operation involves at least one check of
the index before or while inserting new data.
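
The access pattern described here (one index check before every insert, keyed by a SHA1 hash as in the test discussed in this thread) could be sketched like this. It is a stand-in only: a HashMap plays the role of the index and a counter plays the role of node creation. None of it is the neo4j or Lucene API, and all names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

final class CheckThenInsert {
    // Stand-in for the exact index: SHA-1 hex key -> node id.
    private final Map<String, Long> index = new HashMap<>();
    private long nextNodeId = 0;
    private long lookups = 0;

    /** Hex-encoded SHA-1 digest of the row, used as the index key. */
    static String sha1Hex(String row) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(row.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /** Checks the index first; creates a new "node" only if the key is absent. */
    long getOrCreate(String row) {
        String key = sha1Hex(row);
        lookups++;                       // one index check per insert attempt
        Long existing = index.get(key);
        if (existing != null) {
            return existing;
        }
        long id = nextNodeId++;          // stand-in for creating a node
        index.put(key, id);
        return id;
    }

    long lookups() { return lookups; }

    int nodeCount() { return index.size(); }

    public static void main(String[] args) {
        CheckThenInsert importer = new CheckThenInsert();
        for (String row : new String[] {"row-a", "row-b", "row-a"}) {
            System.out.println(row + " -> node " + importer.getOrCreate(row));
        }
        System.out.println(importer.lookups() + " lookups, " + importer.nodeCount() + " nodes");
    }
}
```

Because every row triggers a lookup, the cost of a single exact lookup dominates as the index grows, which is why the lookup path is what the timings end up measuring.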

I hope to be able to publish the code in the near future...

Cheers
-- 
Massimo
http://meridio.blogspot.com


Re: [Neo4j] Big index solutions?

2011-03-11 Thread Peter Neubauer
Nice Ashwin,
sounds like a great ac, will definitely keep track of it. If I do a
Neo4j Index provider for JDBM2, would you be able to help me tweak
it to behave well?

Cheers,

/peter neubauer

GTalk:      neubauer.peter
Skype       peter.neubauer
Phone       +46 704 106975
LinkedIn   http://www.linkedin.com/in/neubauer!
Twitter      http://twitter.com/peterneubauer

http://www.neo4j.org               - Your high performance graph database.
http://startupbootcamp.org/    - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.



On Tue, Mar 8, 2011 at 7:26 PM, Ashwin Jayaprakash
ashwin.jayaprak...@gmail.com wrote:
 Try JDBM2 - http://code.google.com/p/jdbm2/issues/detail?id=1

 It's been resurrected by another author.

 Ashwin (http://www.ashwinjayaprakash.com)


Re: [Neo4j] Big index solutions?

2011-03-11 Thread Massimo Lusetti
On Fri, Mar 11, 2011 at 10:12 AM, Peter Neubauer
peter.neuba...@neotechnology.com wrote:

 Nice Ashwin,
 sounds like a great ac, will definitely keep track of it. If I do a
 Neo4j Index provider for JDBM2, would you be able to help me tweak
 it to behave well?

Did Lucene really fail with a big index? I have a Lucene index with
millions of entries and it runs smoothly... I'm talking about a pure
Lucene index, not the Neo4j implementation.

In fact, I had issues with the Lucene index within Neo4j, and as soon as I
started to use an externally managed index (ChenilleKit Lucene module) I
had no issues with a big index.

Curious about this thread.

Cheers
-- 
Massimo
http://meridio.blogspot.com


Re: [Neo4j] Big index solutions?

2011-03-11 Thread Peter Neubauer
No,
nothing is failing. It is just that in big insertion scenarios, joining
nodes together into relationships requires index lookups, and often only
an exact index is needed for that. We have had good experiences with
Lucene, but when importing e.g. big OpenStreetMap datasets we need to run
lookups during insertion, and Lucene takes a lot of time in these cases.

That is why I think exact lookups, like K/V stores, would be
interesting in these scenarios, as an alternative. They _should_
perform better than Lucene.
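
To illustrate why an exact key/value lookup is all this import step needs, here is a small sketch that turns an edge list of string keys into pairs of node ids. A HashMap stands in for the K/V store and the ids are plain counters. This is not the batchinserter or Lucene API, and the names are invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class EdgeListImport {
    // Stand-in for an exact K/V index: external key -> node id.
    private final Map<String, Long> keyToNode = new HashMap<>();
    // Each entry is {startNodeId, endNodeId}.
    private final List<long[]> relationships = new ArrayList<>();

    /** Exact lookup by key; assigns the next free id if the key is new. */
    private long nodeFor(String key) {
        Long id = keyToNode.get(key);
        if (id == null) {
            id = (long) keyToNode.size();
            keyToNode.put(key, id);
        }
        return id;
    }

    /** Joins two nodes into a relationship; needs two exact lookups per edge. */
    void addEdge(String fromKey, String toKey) {
        long from = nodeFor(fromKey);
        long to = nodeFor(toKey);
        relationships.add(new long[] {from, to});
    }

    int nodeCount() { return keyToNode.size(); }

    List<long[]> relationships() { return relationships; }

    public static void main(String[] args) {
        EdgeListImport importer = new EdgeListImport();
        importer.addEdge("way-1", "way-2");
        importer.addEdge("way-2", "way-3");
        importer.addEdge("way-1", "way-3");
        System.out.println(importer.nodeCount() + " nodes, "
                + importer.relationships().size() + " relationships");
    }
}
```

No ranking, tokenizing or range query is involved, only key equality, which is the case a K/V store is built for.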

Cheers,

/peter neubauer



Re: [Neo4j] Big index solutions?

2011-03-11 Thread Mattias Persson
2011/3/11 Massimo Lusetti mluse...@gmail.com

 On Fri, Mar 11, 2011 at 10:12 AM, Peter Neubauer
 peter.neuba...@neotechnology.com wrote:

  Nice Ashwin,
  sounds like a great ac, will definitely keep track of it. If I do a
  Neo4j Index provider for JDBM2, would you be able to help me tweak
  it to behave well?

 Did Lucene really fail with a big index? I have a Lucene index with
 millions of entries and it runs smoothly... I'm talking about a pure
 Lucene index, not the Neo4j implementation.

 In fact, I had issues with the Lucene index within Neo4j, and as soon as I
 started to use an externally managed index (ChenilleKit Lucene module) I
 had no issues with a big index.


 Curious about this thread.


And I'm curious about why the neo4j lucene layer adds overhead and what your
code looks like in your own solution.






-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com


Re: [Neo4j] Big index solutions?

2011-03-11 Thread Ashwin Jayaprakash
Ah..well..you see, I haven't used it extensively yet :)

But I'm pretty sure that the author of JDBM2 will be of help in that regard.

There is another one I have my eyes on - http://code.google.com/p/babudb/

But there's always BerkeleyDB.

Regards,
Ashwin.




Re: [Neo4j] Big index solutions?

2011-03-08 Thread Ashwin Jayaprakash
Try JDBM2 - http://code.google.com/p/jdbm2/issues/detail?id=1

It's been resurrected by another author.

Ashwin (http://www.ashwinjayaprakash.com)


Re: [Neo4j] Big index solutions?

2011-03-07 Thread Peter Neubauer
Paul,
thanks for the info! I am down to the same candidates, BerkeleyDB or JDBM.
Currently I am trying to get someone from the BDB team to configure the
index with me, so I can get a real impression of the performance (I am
just not able to tune things right myself).

Meanwhile - is there anyone out there with good knowledge of
BerkeleyDB who could help for a couple of hours to configure the Java
Edition to perform well in a Neo4j setup?

Cheers,

/peter neubauer


Re: [Neo4j] Big index solutions?

2011-03-06 Thread Paul A. Jackson
Hi Peter,

I finished my testing.  I tried JDBM tree and map, HSQL, and JBoss Cache as a 
wrapper around both HSQL and JDBM.  I found that JBoss Cache doesn't 
necessarily persist to disk at the end of a transaction, so it fails the ACID 
test.  HSQL is super fast in memory but was terrible when forced to commit every 
transaction.  (I tested 1.8, which doesn't support transactions; each update is 
its own transaction.  Maybe 2.0 is better.)  So that leaves JDBM.  The tree 
(surprisingly) was much faster than the map.  I know from experience that JDBM 
doesn't scale well with multiple threads, yet for this application I think it 
may still be a good fit.  It would be nice, though, if they at least used a 
ReentrantReadWriteLock rather than method synchronization to allow concurrent 
reads.
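
As an illustration of the locking point, here is a minimal sketch of a sorted map guarded by a ReentrantReadWriteLock instead of synchronized methods. It is not JDBM code: a java.util.TreeMap stands in for the on-disk tree, and the class name is hypothetical.

```java
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class ReadWriteLockedTree {
    private final TreeMap<String, Long> tree = new TreeMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Readers share the read lock, so several threads can look up at once. */
    Long get(String key) {
        lock.readLock().lock();
        try {
            return tree.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Writers take the exclusive write lock and block readers only while writing. */
    void put(String key, long value) {
        lock.writeLock().lock();
        try {
            tree.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ReadWriteLockedTree index = new ReadWriteLockedTree();
        index.put("key-1", 42L);

        // Two readers may hold the read lock at the same time.
        Runnable reader = () -> System.out.println(
                Thread.currentThread().getName() + " read " + index.get("key-1"));
        Thread t1 = new Thread(reader, "reader-1");
        Thread t2 = new Thread(reader, "reader-2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

With synchronized methods the two readers would take turns; with the read lock they only wait while a writer holds the write lock.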

Hope that helps.

Thanks,
-Paul

Paul Jackson, Principal Software Engineer
Pitney Bowes Business Insight
4200 Parliament Place | Suite 600 | Lanham, MD  20706-1844  USA
O: 301.918.0850 | M: 703.862.0120 | www.pb.com
paul.jack...@pb.com 
 

-Original Message-
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On 
Behalf Of Peter Neubauer
Sent: Tuesday, December 21, 2010 11:12 AM
To: rick.bullo...@burningskysoftware.com
Cc: Neo4j user discussions
Subject: Re: [Neo4j] Big index solutions?

Mmh,
we are looking at JDBM now, and it seems to be promising. Will inform
you on the progress of that!

Cheers,

/peter neubauer



Re: [Neo4j] Big index solutions?

2010-12-21 Thread rick.bullo...@burningskysoftware.com
That should fit in RAM just fine, except for the effect of the string 
block/page size probably.  What about a btree backed by neo relationships? Not 
fast enough?
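
Whether this fits in RAM depends on what is counted. A back-of-the-envelope sketch, assuming keys of exactly 20 characters at one byte each (or two bytes for UTF-16) plus an 8-byte node id per entry. Per-object and data-structure overhead is deliberately ignored, so real memory use would be higher; the class and parameter names are illustrative.

```java
final class IndexSizeEstimate {
    /** Raw payload in bytes: one fixed-length key plus one id per entry. */
    static long rawBytes(long entries, int keyLength, int bytesPerChar, int idBytes) {
        return entries * ((long) keyLength * bytesPerChar + idBytes);
    }

    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long entries = 100_000_000L;   // 100M strings, as in the question
        int keyLength = 20;            // characters per key

        long oneBytePerChar = rawBytes(entries, keyLength, 1, 8);
        long twoBytesPerChar = rawBytes(entries, keyLength, 2, 8);

        System.out.println("1 byte/char:  " + oneBytePerChar + " bytes, "
                + toGiB(oneBytePerChar) + " GiB");
        System.out.println("2 bytes/char: " + twoBytesPerChar + " bytes, "
                + toGiB(twoBytesPerChar) + " GiB");
    }
}
```

Under these assumptions the raw payload is 2,800,000,000 bytes at one byte per character and 4,800,000,000 bytes at two.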

- Reply message -
From: Peter Neubauer peter.neuba...@neotechnology.com
Date: Mon, Dec 20, 2010 3:54 pm
Subject: [Neo4j] Big index solutions?
To: Neo4j user discussions user@lists.neo4j.org

Hi folks,
I wonder if any of you has seen a fast exact index solution that works
for the batchinserter (FAST) and over big indexes (like 100M strings
of length 20 characters) that don't fit in RAM.

Lucene is unable to cache such indexes and gets slow.

Does anybody have experience with other reverse lookup solutions like
Berkeley DB, Ehcache or others? It would be great to combine them with
the batchinserter to be able to quickly insert big edge-lists with
node-index-lookups into Neo4j ...

Cheers,

/peter neubauer



Re: [Neo4j] Big index solutions?

2010-12-20 Thread Paul A. Jackson
I do not have any direct experience, but I was wondering if anyone has experience 
with JBoss Cache over JDBM and could speculate on its applicability.

Also, I would like to see this fast exact indexer available with 
GraphDatabaseService, not just BatchInserter. I am not able to use the 
BatchInserter because I need to be able to query the graph and update existing 
nodes during loads.

Thanks,
-Paul




Re: [Neo4j] Big index solutions?

2010-12-20 Thread Craig Taverner
Actually, I think my composite index design would work for this (if it was
completed). At least the design has no node deletion actions (unlike the
RTree) and so can support the batch-inserter (in principle, not in code,
since it is currently coded to the normal API). Might be worth a try :-)

It does not store anything in memory, other than the neo4j caches (or batch
inserter cache if we port to that API), so it should scale like you want.
