Re: [Neo4j] Big index solutions?
On Mon, Mar 14, 2011 at 9:26 AM, Mattias Persson matt...@neotechnology.com wrote:
> Hmm, that doesn't look very good. I'm very keen on looking at your code for this test, if possible, since I haven't experienced a slowdown like this before. I just did an insertion test of 1.5M indexed nodes and there was virtually no slowdown at all.

I'm going to send you the download URL privately.

Cheers
-- 
Massimo
http://meridio.blogspot.com
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Big index solutions?
On Fri, Mar 11, 2011 at 10:31 AM, Peter Neubauer peter.neuba...@neotechnology.com wrote:
> No, things are not failing, it is just that in big insertion scenarios the index lookup when joining nodes together into relationships, there is often just an exact index needed in order to do that. We have good experiences with Lucene, but when importing e.g. big OpenStreetMap datasets, we need to run lookups during insertion, and we experience Lucene taking a lot of time in these cases.

That seems very similar to my use case... I need to import big data, bigger than OpenStreetMap, and I'm using plain Lucene for indexing and for querying while indexing...

> That is why I think exact lookups, like K/V stores, would be interesting in these scenarios, as an alternative. They _should_ perform better than Lucene.

I'm eager to hear your results...

Cheers
-- 
Massimo
http://meridio.blogspot.com
Re: [Neo4j] Big index solutions?
On Fri, Mar 11, 2011 at 10:37 AM, Mattias Persson matt...@neotechnology.com wrote:
> And I'm curious about why the neo4j lucene layer adds overhead and how your code looks like in your own solution.

I really don't know; I haven't had time to investigate the Neo4j code. I'm indexing a SHA1 hash key that points to a Node, just to check whether the data is already present in the DB. To compare the two, I did this both with plain Lucene (through the ChenilleKit project, ChenilleKit.codehaus.org) and with your Index framework. Here are the results, 20 rows of data per round ("Fatte 20 righe in N ms" is Italian for "processed 20 rows in N ms"):

With the neo4j Index framework:
[DEBUG] 11/56/2011 14:56:08,438 StatistichengModule.PickUpPoller Fatte 20 righe in 108482 ms
[DEBUG] 11/59/2011 14:59:51,061 StatistichengModule.PickUpPoller Fatte 20 righe in 146664 ms
[DEBUG] 11/02/2011 15:02:30,317 StatistichengModule.PickUpPoller Fatte 20 righe in 159256 ms
[DEBUG] 11/05/2011 15:05:08,680 StatistichengModule.PickUpPoller Fatte 20 righe in 158363 ms
[DEBUG] 11/08/2011 15:08:34,501 StatistichengModule.PickUpPoller Fatte 20 righe in 205821 ms
[DEBUG] 11/12/2011 15:12:51,690 StatistichengModule.PickUpPoller Fatte 20 righe in 257189 ms
[DEBUG] 11/17/2011 15:17:11,589 StatistichengModule.PickUpPoller Fatte 20 righe in 259899 ms
[DEBUG] 11/21/2011 15:21:45,109 StatistichengModule.PickUpPoller Fatte 20 righe in 273520 ms
[DEBUG] 11/26/2011 15:26:04,802 StatistichengModule.PickUpPoller Fatte 20 righe in 259693 ms
[DEBUG] 11/30/2011 15:30:32,269 StatistichengModule.PickUpPoller Fatte 20 righe in 267467 ms
[DEBUG] 11/34/2011 15:34:32,804 StatistichengModule.PickUpPoller Fatte 20 righe in 240535 ms
[DEBUG] 11/43/2011 15:43:14,960 StatistichengModule.PickUpPoller Fatte 20 righe in 355129 ms
[DEBUG] 11/50/2011 15:50:22,323 StatistichengModule.PickUpPoller Fatte 20 righe in 427363 ms
[DEBUG] 11/58/2011 15:58:12,846 StatistichengModule.PickUpPoller Fatte 20 righe in 470523 ms

With the plain Lucene solution (external to the neo4j db):
[DEBUG] 12/55/2011 14:55:03,997 StatistichengModule.PickUpPoller Fatte 20 righe in 48138 ms
[DEBUG] 12/55/2011 14:55:53,533 StatistichengModule.PickUpPoller Fatte 20 righe in 49537 ms
[DEBUG] 12/56/2011 14:56:54,773 StatistichengModule.PickUpPoller Fatte 20 righe in 61240 ms
[DEBUG] 12/57/2011 14:57:54,157 StatistichengModule.PickUpPoller Fatte 20 righe in 59384 ms
[DEBUG] 12/58/2011 14:58:52,667 StatistichengModule.PickUpPoller Fatte 20 righe in 58510 ms
[DEBUG] 12/00/2011 15:00:20,518 StatistichengModule.PickUpPoller Fatte 20 righe in 87851 ms
[DEBUG] 12/02/2011 15:02:29,176 StatistichengModule.PickUpPoller Fatte 20 righe in 82548 ms
[DEBUG] 12/04/2011 15:04:52,302 StatistichengModule.PickUpPoller Fatte 20 righe in 77700 ms
[DEBUG] 12/07/2011 15:07:09,584 StatistichengModule.PickUpPoller Fatte 20 righe in 77727 ms
[DEBUG] 12/08/2011 15:08:26,778 StatistichengModule.PickUpPoller Fatte 20 righe in 77194 ms
[DEBUG] 12/09/2011 15:09:39,495 StatistichengModule.PickUpPoller Fatte 20 righe in 72717 ms
[DEBUG] 12/11/2011 15:11:04,032 StatistichengModule.PickUpPoller Fatte 20 righe in 84537 ms
[DEBUG] 12/12/2011 15:12:35,806 StatistichengModule.PickUpPoller Fatte 20 righe in 91774 ms
[DEBUG] 12/14/2011 15:14:13,789 StatistichengModule.PickUpPoller Fatte 20 righe in 97983 ms
[DEBUG] 12/15/2011 15:15:48,196 StatistichengModule.PickUpPoller Fatte 20 righe in 94407 ms
[DEBUG] 12/17/2011 15:17:16,819 StatistichengModule.PickUpPoller Fatte 20 righe in 88623 ms
[DEBUG] 12/19/2011 15:19:09,743 StatistichengModule.PickUpPoller Fatte 20 righe in 76255 ms
[DEBUG] 12/20/2011 15:20:16,357 StatistichengModule.PickUpPoller Fatte 20 righe in 66614 ms
[DEBUG] 12/21/2011 15:21:16,136 StatistichengModule.PickUpPoller Fatte 20 righe in 59779 ms
[DEBUG] 12/22/2011 15:22:29,535 StatistichengModule.PickUpPoller Fatte 20 righe in 73399 ms
[DEBUG] 12/23/2011 15:23:51,419 StatistichengModule.PickUpPoller Fatte 20 righe in 81884 ms
[DEBUG] 12/25/2011 15:25:14,380 StatistichengModule.PickUpPoller Fatte 20 righe in 82961 ms
[DEBUG] 12/26/2011 15:26:21,857 StatistichengModule.PickUpPoller Fatte 20 righe in 67477 ms
[DEBUG] 12/27/2011 15:27:34,632 StatistichengModule.PickUpPoller Fatte 20 righe in 72775 ms
[DEBUG] 12/28/2011 15:28:45,497 StatistichengModule.PickUpPoller Fatte 20 righe in 70864 ms
[DEBUG] 12/29/2011 15:29:44,987 StatistichengModule.PickUpPoller Fatte 20 righe in 59491 ms
[DEBUG] 12/31/2011 15:31:46,182 StatistichengModule.PickUpPoller Fatte 20 righe in 87324 ms
[DEBUG] 12/33/2011 15:33:16,841 StatistichengModule.PickUpPoller Fatte 20 righe in 90659 ms
[DEBUG] 12/34/2011 15:34:48,457 StatistichengModule.PickUpPoller Fatte 20 righe in 91616 ms
[DEBUG] 12/36/2011 15:36:17,984 StatistichengModule.PickUpPoller Fatte 20 righe in 89527 ms
[DEBUG] 12/38/2011 15:38:02,083 StatistichengModule.PickUpPoller Fatte 20 righe in 104099 ms
[DEBUG] 12/39/2011
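Massimo's benchmark keys each row by its SHA1 hash and checks the index for that key before inserting. The JDK-only sketch below covers only the key derivation and the presence check. The class and method names are hypothetical, and an in-memory HashSet stands in for the real index (plain Lucene via ChenilleKit, or the Neo4j Index framework). It therefore says nothing about why the two sets of timings above differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: derive a SHA1 hex key per row and check presence before inserting.
final class Sha1PresenceCheck {
    // Stand-in for the exact index (Lucene, Neo4j index, K/V store, ...).
    private final Set<String> seen = new HashSet<>();

    static String sha1Hex(String row) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(row.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff)); // two lowercase hex digits per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Java platforms are required to provide SHA-1, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if the row was new (and records its key), false if it was already present. */
    boolean addIfAbsent(String row) {
        return seen.add(sha1Hex(row));
    }
}
```

If a disk-backed exact index replaced the HashSet, each `addIfAbsent` call would become one index lookup, plus one index write for new rows.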
Re: [Neo4j] Big index solutions?
Let me know how it goes!

/peter

Sent from my mobile device, please excuse typos and brevity.

On Mar 13, 2011 9:18 AM, Massimo Lusetti mluse...@gmail.com wrote:
> On Tue, Mar 8, 2011 at 7:26 PM, Ashwin Jayaprakash ashwin.jayaprak...@gmail.com wrote:
>> Try JDBM2 - http://code.google.com/p/jdbm2/issues/detail?id=1 It's been resurrected by another author.
>
> Seems pretty interesting in this context... Going to give it a whirl. Thanks!
> -- Massimo http://meridio.blogspot.com
Re: [Neo4j] Big index solutions?
On Sun, Mar 13, 2011 at 10:29 AM, Peter Neubauer neubauer.pe...@gmail.com wrote:
> Let me know how it goes!

Here are the results: http://dl.dropbox.com/u/22802242/neo4j-stats.ods

As you can see, JDBM is slower than Lucene in my tests, and its time per round grows more steeply. Every round parses 20 rows of my data, and the data parsed is obviously the same for both.

Let me know if you are interested in anything else.

-- 
Massimo
http://meridio.blogspot.com
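One way to put a number on "the growing trend in ms is steeper" is to fit a straight line to each run's per-round timings and compare the slopes. The sketch below is a plain ordinary least-squares fit. `RoundTrend` and `slopeMsPerRound` are hypothetical names, and rounds are assumed to be numbered 0, 1, 2, and so on.

```java
// Hypothetical helper: fits ms = a + b * round by ordinary least squares and returns b,
// i.e. how many extra milliseconds each successive round costs on average.
final class RoundTrend {
    static double slopeMsPerRound(long[] msPerRound) {
        int n = msPerRound.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two rounds");
        }
        double meanX = (n - 1) / 2.0; // mean of round numbers 0..n-1
        double meanY = 0;
        for (long ms : msPerRound) {
            meanY += ms;
        }
        meanY /= n;
        double num = 0; // sum of (x - meanX) * (y - meanY)
        double den = 0; // sum of (x - meanX)^2
        for (int i = 0; i < n; i++) {
            double dx = i - meanX;
            num += dx * (msPerRound[i] - meanY);
            den += dx * dx;
        }
        return num / den;
    }
}
```

A larger slope means each additional round costs more milliseconds on average. Running it on the per-round PickUpPoller timings from each run gives one comparable figure per backend.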
Re: [Neo4j] Big index solutions?
On Sun, Mar 13, 2011 at 4:43 PM, Massimo Lusetti mluse...@gmail.com wrote:
> Here are the results: http://dl.dropbox.com/u/22802242/neo4j-stats.ods

BTW, I've updated it with some rounds using the internal neo4j index framework. I had to stop that run early because it would have taken too long to complete.

Cheers
-- 
Massimo
http://meridio.blogspot.com
Re: [Neo4j] Big index solutions?
Thanks Massimo, will check it out tomorrow!

/peter

Sent from my mobile device, please excuse typos and brevity.

On Mar 13, 2011 4:49 PM, Massimo Lusetti mluse...@gmail.com wrote:
> BTW I've updated with some rounds using the internal neo4j index framework, I've had to stop it sooner cause it would have taken too much to complete.
Re: [Neo4j] Big index solutions?
On Sun, Mar 13, 2011 at 4:50 PM, Peter Neubauer neubauer.pe...@gmail.com wrote:
> Thanks Massimo, Will check it out tomorrow!

You're welcome; it's easy and fun to play with Neo4j. Please note the following:
- every test was run with an untuned Neo4j 1.3.M04;
- every round involves creating nodes and relationships, not only indexing;
- every insert operation involves at least one index check before (or while) inserting new data.

I hope to be able to publish the code in the near future...

Cheers
-- 
Massimo
http://meridio.blogspot.com
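Each round mixes index checks with node and relationship creation, so per-round timings measure the whole insert path, not the index alone. Below is a minimal sketch of a round-based timing loop in the spirit of the PickUpPoller logs. It assumes rows arrive as strings and that a caller-supplied, hypothetical `insertRow` callback does the check-then-insert work.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of a round-based loader: rows are processed in fixed-size rounds
// and the wall-clock time of each round is recorded and printed.
final class RoundTimer {
    static long[] timeRounds(List<String> rows, int roundSize, Consumer<String> insertRow) {
        int rounds = (rows.size() + roundSize - 1) / roundSize; // last round may be partial
        long[] msPerRound = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            int from = r * roundSize;
            int to = Math.min(from + roundSize, rows.size());
            long start = System.nanoTime();
            for (String row : rows.subList(from, to)) {
                // Whatever insertRow does (index lookup, node/relationship creation)
                // is included in this round's time.
                insertRow.accept(row);
            }
            msPerRound[r] = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("Processed %d rows in %d ms%n", to - from, msPerRound[r]);
        }
        return msPerRound;
    }
}
```

Timing the index lookups separately inside `insertRow` would show how much of each round is index cost.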
Re: [Neo4j] Big index solutions?
Nice, Ashwin, sounds great; I will definitely keep track of it. If I write a Neo4j index provider for JDBM2, would you be able to help me tune it to perform well?

Cheers,

/peter neubauer

GTalk: neubauer.peter
Skype peter.neubauer
Phone +46 704 106975
LinkedIn http://www.linkedin.com/in/neubauer
Twitter http://twitter.com/peterneubauer

http://www.neo4j.org - Your high performance graph database.
http://startupbootcamp.org/ - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.

On Tue, Mar 8, 2011 at 7:26 PM, Ashwin Jayaprakash ashwin.jayaprak...@gmail.com wrote:
> Try JDBM2 - http://code.google.com/p/jdbm2/issues/detail?id=1 It's been resurrected by another author.
Re: [Neo4j] Big index solutions?
On Fri, Mar 11, 2011 at 10:12 AM, Peter Neubauer peter.neuba...@neotechnology.com wrote:
> If I do a Neo4j Index provider for JDBM2, would you be able to help me to tweak it to behave good?

Did Lucene really fail with a big index? I have a Lucene index with millions of entries and it runs smoothly... I'm talking about a pure Lucene index, not the Neo4j implementation. In fact, I had issues with the Lucene index within Neo4j, and as soon as I started using an externally managed index (the ChenilleKit Lucene module) I had no more issues with big indexes.

Curious about this thread.

Cheers
-- 
Massimo
http://meridio.blogspot.com
Re: [Neo4j] Big index solutions?
No, things are not failing. It is just that in big insertion scenarios, the index lookups used to join nodes together with relationships often only need an exact index. We have good experiences with Lucene, but when importing e.g. big OpenStreetMap datasets, we need to run lookups during insertion, and we see Lucene taking a lot of time in these cases. That is why I think exact lookups, like K/V stores, would be interesting in these scenarios, as an alternative. They _should_ perform better than Lucene.

Cheers,

/peter neubauer

On Fri, Mar 11, 2011 at 10:27 AM, Massimo Lusetti mluse...@gmail.com wrote:
> Did it really fail Lucene with big index? I got Lucene index with millions of entries and it run smoothly... I'm talking of pure Lucene's index not the Neo4j implementation.
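Peter describes an import pattern in which each relationship's endpoints are resolved through an exact key lookup while inserting. Here is a hypothetical, in-memory sketch of it. A HashMap plays the role of the exact index (Lucene, JDBM2 or a K/V store in this discussion), and node ids and relationships are plain Java values, not Neo4j or BatchInserter calls.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: import an edge list whose endpoints are external keys,
// resolving each key to a node id through an exact key -> id lookup.
final class EdgeListImport {
    // A relationship between two node ids (plain values, not a Neo4j Relationship).
    static final class Rel {
        final long from;
        final long to;

        Rel(long from, long to) {
            this.from = from;
            this.to = to;
        }
    }

    private final Map<String, Long> exactIndex = new HashMap<>(); // stand-in exact index
    private final List<Rel> relationships = new ArrayList<>();
    private long nextNodeId = 0;

    // One exact lookup per endpoint; a node is created only when its key is missing.
    long getOrCreateNode(String key) {
        Long id = exactIndex.get(key);
        if (id == null) {
            id = nextNodeId++;
            exactIndex.put(key, id);
        }
        return id;
    }

    void importEdge(String fromKey, String toKey) {
        relationships.add(new Rel(getOrCreateNode(fromKey), getOrCreateNode(toKey)));
    }

    int nodeCount() {
        return exactIndex.size();
    }

    List<Rel> relationships() {
        return relationships;
    }
}
```

The only index operations here are an exact `get` per endpoint and a `put` per unseen key. No scoring or tokenization is involved, which is why a plain K/V store is a plausible fit for this access pattern.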
Re: [Neo4j] Big index solutions?
2011/3/11 Massimo Lusetti mluse...@gmail.com
> Did it really fail Lucene with big index? I got Lucene index with millions of entries and it run smoothly... I'm talking of pure Lucene's index not the Neo4j implementation. In fact I got issues with Lucene index within Neo4j and as soon as I started to use an external managed index (Chenillekit Lucene module) I got no issue with big index. Curious about this thread.

And I'm curious about why the Neo4j Lucene layer adds overhead, and what the code in your own solution looks like.

-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
Re: [Neo4j] Big index solutions?
Ah... well... you see, I haven't used it extensively yet :) But I'm pretty sure the author of JDBM2 will be able to help in that regard. There is another one I have my eye on: http://code.google.com/p/babudb/ But there's always BerkeleyDB.

Regards,
Ashwin.

On Fri, Mar 11, 2011 at 1:12 AM, Peter Neubauer peter.neuba...@neotechnology.com wrote:
> If I do a Neo4j Index provider for JDBM2, would you be able to help me to tweak it to behave good?
Re: [Neo4j] Big index solutions?
Try JDBM2 - http://code.google.com/p/jdbm2/issues/detail?id=1 It's been resurrected by another author.

Ashwin (http://www.ashwinjayaprakash.com)
Re: [Neo4j] Big index solutions?
Paul, thanks for the info! I am down to the same two, BerkeleyDB or JDBM. Currently I am trying to get someone from the BDB team to configure the index with me, so I can get a real impression of the performance (I am just not able to tune things right).

Meanwhile, is there anyone out there with good knowledge of BerkeleyDB who could help for a couple of hours to configure the Java Edition to perform well in a Neo4j setup?

Cheers,

/peter neubauer

On Sun, Mar 6, 2011 at 9:14 PM, Paul A. Jackson paul.jack...@pb.com wrote:
> Hi Peter, I finished my testing. I tried jdbm tree and map, HSQL, and jboss cache as a wrapper around both HSQL and jdbm.
Re: [Neo4j] Big index solutions?
Hi Peter,

I finished my testing. I tried jdbm tree and map, HSQL, and JBoss Cache as a wrapper around both HSQL and jdbm. I found that JBoss Cache doesn't necessarily persist to disk at the end of a transaction, so it fails the ACID test. HSQL is super fast in memory but was terrible when forced to commit every transaction. (I tested 1.8, which doesn't support transactions; each update is its own transaction. Maybe 2.0 is better.)

So that leaves jdbm. The tree (surprisingly) was much faster than the map. I know from experience that jdbm doesn't scale well with multiple threads, yet in this application I was thinking it may still be a good fit. It would be nice, though, if they at least used a ReentrantReadWriteLock rather than method synchronization, to allow concurrent reads.

Hope that helps.

Thanks,
-Paul

Paul Jackson, Principal Software Engineer
Pitney Bowes Business Insight
4200 Parliament Place | Suite 600 | Lanham, MD 20706-1844 USA
O: 301.918.0850 | M: 703.862.0120 | www.pb.com
paul.jack...@pb.com
Every connection is a new opportunity™

Please consider the environment before printing or forwarding this email. If you do print this email, please recycle the paper.

This email message may contain confidential, proprietary and/or privileged information. It is intended only for the use of the intended recipient(s). If you have received it in error, please immediately advise the sender by reply email and then delete this email message. Any disclosure, copying, distribution or use of the information contained in this email message to or by anyone other than the intended recipient is strictly prohibited. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of the Company.
-----Original Message-----
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Peter Neubauer
Sent: Tuesday, December 21, 2010 11:12 AM
To: rick.bullo...@burningskysoftware.com
Cc: Neo4j user discussions
Subject: Re: [Neo4j] Big index solutions?

Mmh, we are looking at JDBM now, and it seems promising. Will keep you informed of the progress!

Cheers,

/peter neubauer

On Tue, Dec 21, 2010 at 12:19 PM, rick.bullo...@burningskysoftware.com rick.bullo...@burningskysoftware.com wrote:
> That should fit in RAM just fine, except for the effect of the string block/page size probably. What about a btree backed by neo relationships? Not fast enough?
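Paul's remark about JDBM's locking can be illustrated with java.util.concurrent. A ReentrantReadWriteLock lets many threads hold the read lock at once while writes stay exclusive, whereas synchronized methods serialize readers as well. The wrapper below is a hypothetical sketch around a HashMap, not JDBM's actual code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: a key/value index guarded by a read/write lock instead of
// method-level synchronization, so concurrent lookups do not block each other.
final class ReadWriteLockedIndex<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    V get(K key) {
        lock.readLock().lock(); // shared: many readers may hold this at once
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    V put(K key, V value) {
        lock.writeLock().lock(); // exclusive: blocks readers and other writers
        try {
            return map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Whether this helps depends on the read/write mix. During a write-heavy bulk load, the write lock still serializes most of the work.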
Re: [Neo4j] Big index solutions?
That should fit in RAM just fine, except probably for the effect of the string block/page size. What about a B-tree backed by Neo4j relationships? Not fast enough?

----- Reply message -----
From: Peter Neubauer peter.neuba...@neotechnology.com
Date: Mon, Dec 20, 2010 3:54 pm
Subject: [Neo4j] Big index solutions?
To: Neo4j user discussions user@lists.neo4j.org

Hi folks,

I wonder if any of you has seen a fast exact index solution that works for the batchinserter (FAST) and over big indexes (like 100M strings of length 20 characters) that don't fit in RAM. Lucene is unable to cache such indexes and gets slow. Does anybody have experience with other reverse lookup solutions like Berkeley DB, Ehcache or others? It would be great to combine them with the batchinserter to be able to quickly insert big edge lists with node index lookups into Neo4j...

Cheers,

/peter neubauer
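Whether an index of this size fits in RAM can be roughed out with simple arithmetic. The sketch assumes 100M keys of 20 characters each (from Peter's post), an 8-byte node id per entry, and zero overhead for object headers, pointers or the index structure, so its figures are lower bounds. At one byte per character the payload is 100,000,000 × (20 + 8) = 2,800,000,000 bytes. At two bytes per character (UTF-16 Java chars) it is 100,000,000 × (40 + 8) = 4,800,000,000 bytes.

```java
// Back-of-envelope lower bound for an exact index's raw payload.
// Assumptions (labelled, not measured): fixed-length keys, a fixed-size value per entry
// (e.g. an 8-byte node id), and no per-entry overhead for objects or the index structure.
final class IndexSizeEstimate {
    static long rawPayloadBytes(long entries, int keyChars, int bytesPerChar, int valueBytes) {
        // Widen to long before multiplying so large entry counts do not overflow int.
        return entries * ((long) keyChars * bytesPerChar + valueBytes);
    }
}
```

Real structures add per-entry overhead on top of this payload, which is why caching such an index in memory gets hard.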
Re: [Neo4j] Big index solutions?
I do not have any direct experience, but I was wondering if anyone has experience with JBoss Cache over JDBM and could speculate on its applicability.

Also, I would like to see this fast exact indexer available with GraphDatabaseService, not just BatchInserter. I am not able to use the BatchInserter because I need to be able to query the graph and update existing nodes during loads.

Thanks,
-Paul

-----Original Message-----
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Peter Neubauer
Sent: Monday, December 20, 2010 3:54 PM
To: Neo4j user discussions
Subject: [Neo4j] Big index solutions?

Hi folks, I wonder if any of you has seen a fast exact index solution that works for the batchinserter (FAST) and over big indexes (like 100M strings of length 20 characters) that don't fit in RAM. Lucene is unable to cache such indexes and gets slow.
Re: [Neo4j] Big index solutions?
Actually, I think my composite index design would work for this (if it were completed). At least the design has no node deletion actions (unlike the RTree), so it can support the batch inserter (in principle, not in code, since it is currently coded to the normal API). Might be worth a try :-) It does not store anything in memory other than the Neo4j caches (or the batch inserter cache if we port it to that API), so it should scale the way you want.

On Mon, Dec 20, 2010 at 9:54 PM, Peter Neubauer peter.neuba...@neotechnology.com wrote:
> Hi folks, I wonder if any of you has seen a fast exact index solution that works for the batchinserter (FAST) and over big indexes (like 100M strings of length 20 characters) that don't fit in RAM. Lucene is unable to cache such indexes and gets slow.