Re: [Neo4j] KeyError in python

2011-12-13 Thread Jacopo Farina
I tried again to run the program and still got the same error, at the same 
point. I'm running it on Ubuntu 10.10, but I could try on a pc with Windows 
7 and more RAM.
Cheers,
Jacopo
___
NOTICE: THIS MAILING LIST IS BEING SWITCHED TO GOOGLE GROUPS, please register 
and consider posting at https://groups.google.com/forum/#!forum/neo4j

Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


[Neo4j] KeyError in python

2011-12-06 Thread Jacopo Farina
Hi all!
I'm using python embedded in order to classify all the nodes in a neo4j
graph previously labeled with properties.
The graph is about 3.9GB with 7M nodes and 30-40M relationships. I've two
questions:
1- the program worked correctly for hours then crashed suddenly with this
error:
Traceback (most recent call last):
  File "assegnaCategoria.py", line 14, in <module>
    for n in db.nodes:
  File "/usr/local/lib/python2.6/dist-packages/neo4j/__init__.py", line 44, in __getitem__
    return self.get(items)
  File "/usr/local/lib/python2.6/dist-packages/neo4j/__init__.py", line 61, in get
    rethrow_current_exception_as(KeyError)
  File "/usr/local/lib/python2.6/dist-packages/neo4j/util.py", line 76, in rethrow_current_exception_as
    raise ErrorClass(msg)
KeyError: u'Node[9327924]'

2- the program is very slow. I started it at 6 pm and it crashed at ~65% of
the work at 4 am. It only reads the database, never changing it; is there a
way to set it to use the cache intensively? I would put it in /dev/shm/ but
my RAM is 3GB and the database is bigger.

The code is this http://codepad.org/leSwqhnc

Cheers,
Jacopo


Re: [Neo4j] google n grams data set in neo4j

2011-11-28 Thread Jacopo Farina
That's AMAZING!
I was just thinking about using Neo4j to store some extracted n-grams, I
previously did it with a SQLite database but maybe using a graph an
application could surf between nodes more efficiently.
One question: is it possible to download the google ngram corpus release
(or at least some part of it) for free (and legally, of course) ? I've
found just this page (
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13) but
it seems I would have to pay.
Cheers,
Jacopo Farina


2011/11/28 Peter Neubauer peter.neuba...@neotechnology.com

 Seriously cool stuff René!

 I would love to hear more as the project progresses! Also, maybe the
 dataset could be added to the example dataset collection for playing around
 with neo4j? WDYT?

 Cheers,

 /peter neubauer

 GTalk:  neubauer.peter
 Skype   peter.neubauer
 Phone   +46 704 106975
 LinkedIn   http://www.linkedin.com/in/neubauer
 Twitter  http://twitter.com/peterneubauer

 http://www.neo4j.org  - NOSQL for the Enterprise.
 http://startupbootcamp.org/- Öresund - Innovation happens HERE.


 2011/11/27 René Pickhardt r.pickha...@googlemail.com

  Hey Everyone,
 
  I am currently advising two high school students on a programming project
  for a German student competition.
 
  They have inserted the German Google n-gram data set (several GB of
  natural language) into a Neo4j database and used it to make sentence
  predictions to improve typing speed.
 
  The entire project is far from being complete, but there is some code
  available on how we modelled n-grams in Neo4j and what we used for
  prediction.
 
  Both approaches are very basic and work as you would expect. Still, they
  already work decently, showing again the power of Neo4j.
 
  We would be happy for some feedback, thoughts and suggestions for further
  improvement. Find more info in my blog post:
 
 
 http://www.rene-pickhardt.de/download-google-n-gram-data-set-and-neo4j-source-code-for-storing-it/
 
  or in the source code:
 
 
 http://code.google.com/p/complet/source/browse/trunk/Completion_DataCollector/src/completion_datacollector/Main.java?spec=svn64r=64
 
  By the way, even though the code is just hacked together, it uses hashmaps
  to store nodes in memory and increase insertion speed, and builds the
  Lucene index later. Of course it would be even better to use the batch
  inserter.
 
  best regards René
  --
  --
  mobile: +49 (0)176 6433 2481
 
  Skype: +49 (0)6131 / 4958926
 
  Skype: rene.pickhardt
 
  www.rene-pickhardt.de
   http://www.beijing-china-blog.com


Re: [Neo4j] Python embedded and Java batch inserter

2011-10-18 Thread Jacopo Farina
Yes! It worked!
Thanks a lot.
Jacopo Farina

2011/10/18 Jacob Hansson jacob.hans...@neotechnology.com

 I think this might be another case of a problem that would be a lot easier
 to solve if the python bindings pushed out full stack traces. It's in my
 backlog to fix that.

 I'm gonna *guess* that the problem has to do with store upgrades. Are you
 using the 1.5.M02 version of neo4j embedded?

 If so, you need to tell the database it's ok to upgrade the store, like
 this:


 from neo4j import GraphDatabase
 db = GraphDatabase('somefolder', allow_store_upgrade=True)

 On Mon, Oct 17, 2011 at 4:19 PM, Jacopo Farina jacopo1.far...@gmail.com
 wrote:

  Hi all,
  I'm trying to work with Neo4j embedded with Python, but, if I try to open
 a
  database created with the Java batch inserter (neo4j 1.4) by using
  Python-embedded, I get the error described here
  https://trac.neo4j.org/ticket/275
  The database works correctly when opened in Neoclipse or with neo4j
  stand-alone server (by replacing the data folder), and if I create a new
  database directly with Python embedded it works too. Is there a way to
  solve
  this problem?
 
  Then, how fast is the neo4j-REST traverser compared with the Java
  embedded traverser on a graph with 3 million nodes?
 

 As far as I know, we don't have any benchmarks that compare this. The speed
 difference should not be huge though. The server is a lot slower for reads
 and writes, but for traversals, the overhead of the server plays a much
 smaller part.

 Sidenote: The size of your graph should generally not affect traversal
 speed (unless the graph does not fit in memory); what matters is how much
 of the graph you traverse. In Java embedded, we used to say that you get
 maybe 1,000,000 relationships traversed per second (although speed can be
 both higher and much lower, depending on what your traversal does).


 
  Cheers!
  Jacopo Farina
 


 /Jake

 --
 Jacob Hansson
 Phone: +46 (0) 763503395
 Twitter: @jakewins


[Neo4j] Python embedded and Java batch inserter

2011-10-17 Thread Jacopo Farina
Hi all,
I'm trying to work with Neo4j embedded with Python, but, if I try to open a
database created with the Java batch inserter (neo4j 1.4) by using
Python-embedded, I get the error described here
https://trac.neo4j.org/ticket/275
The database works correctly when opened in Neoclipse or with neo4j
stand-alone server (by replacing the data folder), and if I create a new
database directly with Python embedded it works too. Is there a way to solve
this problem?

Then, how fast is the neo4j-REST traverser compared with the Java
embedded traverser on a graph with 3 million nodes?

Cheers!
Jacopo Farina


Re: [Neo4j] Open Data Sets?

2011-09-29 Thread Jacopo Farina
Thanks a lot for the datasets!
I created a graph with English Wikipedia pages and category names and
relationships, starting from the .sql files available on the wikimedia site

http://dumps.wikimedia.org/backup-index.html

The resulting graph is about 3GB of size, so I can't easily distribute it,
but the code is trivial, feel free to use it: http://pastebin.com/mj3bkDmZ

It contains a utility class to read the file line by line; I'm sorry for
the comments in Italian. The program avoids loading most of the stub or
redirect categories, and the execution should take 5-7 hours.

Cheers,
Jacopo Farina


2011/9/29 Peter Neubauer peter.neuba...@neotechnology.com

 Rene,
 that would be very cool to make available, thanks for sharing!

 Cheers,

 /peter neubauer

 GTalk:  neubauer.peter
 Skype   peter.neubauer
 Phone   +46 704 106975
 LinkedIn   http://www.linkedin.com/in/neubauer
 Twitter  http://twitter.com/peterneubauer

 http://www.neo4j.org   - Your high performance graph database.
 http://startupbootcamp.org/- Öresund - Innovation happens HERE.
 http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.



 2011/9/29 René Pickhardt r.pickha...@googlemail.com:
  I recently made a blog article on a data source for 119 network graphs.
 
 
 http://www.rene-pickhardt.de/download-network-graph-data-sets-from-konect-the-koblenz-network-colection/
 
  I also post new data sets in the dataset category of my blog:
  http://www.rene-pickhardt.de/cat/data/data-sets/
 
  Have fun. Soon I will make available a social network graph extracted
  from Wikipedia.
  On 29.09.2011 at 00:09, McKinley mckinley1...@gmail.com wrote:
  I want to create some demos of relational database extraction into Neo4j,
  but I cannot share the data I use with the public. I know there are
  several resources for open data sets on the net, but has anyone on the
  list already found a nice, large data set that you are happy with?
 
  Thanks,
 
  McKinley


Re: [Neo4j] Executing arbitrary code through REST (was: Specifying best first order of traverse over REST API)

2011-08-26 Thread Jacopo Farina
Nice!
Is there a way to run it easily from Python with neo4jrestclient, without
creating an HTTP request manually? In general, I wasn't able to understand
how to run a Gremlin script, or just a query, in Python through the
restclient library.
It's probably very simple, but I only started using it in Python
yesterday.

Cheers,
Jacopo

2011/8/25 Peter Neubauer peter.neuba...@neotechnology.com

 Guys,
 with the custom sorting in Lucene and this thread coming up all the time, I
 took the time to document the execution of arbitrary Groovy and thus, Java
 calls through REST. In the example below, there are calls to Neo4j APIs,
 Gremlin stuff and custom sorting using Lucene classes, and return of a
 Neo4j
 search hit object.

 You can do all this in a Neo4j Server plugin, but if you need to, this is
 an
 example on how to do it with only REST.


 http://docs.neo4j.org/chunked/snapshot/gremlin-plugin.html#rest-api-send-an-arbitrary-groovy-script---lucene-sorting

 Hope that helps for future reference!

 Cheers,

 /peter neubauer

 GTalk:  neubauer.peter
 Skype   peter.neubauer
 Phone   +46 704 106975
 LinkedIn   http://www.linkedin.com/in/neubauer
 Twitter  http://twitter.com/peterneubauer

 http://www.neo4j.org   - Your high performance graph database.
 http://startupbootcamp.org/- Öresund - Innovation happens HERE.
 http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.


 On Thu, Aug 25, 2011 at 1:00 AM, Matt Luongo m...@scholr.ly wrote:

  +1, we could really use that. Client-side sorting sucks.
 
  --
  Matt Luongo
  Co-Founder, Scholr.ly
 
 
 
  On Wed, Aug 24, 2011 at 4:43 PM, Aseem Kishore aseem.kish...@gmail.com
  wrote:
 
   I've just spent a bunch of time reading into how one can control the
   ordering of a traverse beyond simple breadth first or depth first.
  More
   precisely, even when breadth first, how one can control *which*
 neighbors
   are traversed first.
  
   (It matters less in which order they're traversed vs. which order
 they're
   returned if you're returning all results, since you can just sort on
 the
   client. But it matters a lot if you want to use the paged traverser,
  since
   you're then only returning the first results.)
  
   I've learned that this is doable from Java by writing your own
   BranchSelector implementation:
  
   http://components.neo4j.org/neo4j/1.4.1/apidocs/
  
   I've found the built-in implementations, e.g. the pre-order
 breadth-first
   and depth-first:
  
  
  
 
 https://github.com/neo4j/community/blob/master/kernel/src/main/java/org/neo4j/kernel/PreorderBreadthFirstSelector.java
  
  
  
 
 https://github.com/neo4j/community/blob/master/kernel/src/main/java/org/neo4j/kernel/PreorderDepthFirstSelector.java
  
   To achieve a custom best first, Igor Dovgiy for example shared that
 he
   modeled his implementation of the breadth-first selector, except just
  using
   a PriorityQueue instead of a regular Queue.
  
   My question is: is there any way to specify this sort of thing over the
   REST
   API instead of having to write a plugin? If not, does that sound like a
   reasonable feature request?
  
   I really just want something simple: nodes ordered by some timestamp
   property. It's killing us that we can't do this today. We might just
 have
   to
   look into writing this as a plugin...
  
   Thanks!
  
   Aseem


Re: [Neo4j] Executing arbitrary code through REST (was: Specifying best first order of traverse over REST API)

2011-08-26 Thread Jacopo Farina
Thanks a lot for the answer.
Cheers!
Jacopo Farina

2011/8/26 Javier de la Rosa ver...@gmail.com

 On Fri, Aug 26, 2011 at 13:12, Matt Luongo m...@scholr.ly wrote:
  I think Javier is working on adding a returns=type style parameter in
  the most recent source so that
  the client can figure out what type to cast the data into-

 Exactly, I still have to fix some issues. Actually, it would be great if
 there were some mechanism for an extension to set the type of its
 returned values. But, in the meantime, the returns param is the only
 option we have.



 --
 Javier de la Rosa
 http://versae.es


Re: [Neo4j] Neo4j vs orient db

2011-07-14 Thread Jacopo Farina
It would be very interesting to see a comparison.

The MVRB Tree (former RB+ Tree) used by OrientDB sounds amazing reading what
they claim on the site, but I wasn't able to find any test except this one
http://code.google.com/p/orient/wiki/DatabaseBenchmarks
and I'm not able to fully compare them by myself.

Cheers,
Jacopo Farina

2011/7/11 Jacob Hansson ja...@voltvoodoo.com

 On Sun, Jul 10, 2011 at 7:43 AM, Aliabbas aliabba...@gmail.com wrote:

  Thanks, Andrew! Can you share with us your experiment for very large
  databases? OrientDB also claims to be highly scalable and to follow a
  distributed model. How does that compare to Neo4j's scalability? Neo4j
  seems to be more open and honest than OrientDB in describing its
  limitations?
 

 As far as I can tell, the theoretical scalability and the distributed model
 does not apply if you want to use the graphy features of OrientDB. At least
 that's what it says on the OrientDB google code page. I do believe that
 there is support for master/slave replication though, similar to the Neo4j
 HA product.

 If you have the time, I'd encourage you to try both databases out, and pick
 whichever suits your needs. We are of course happy to answer any questions
 about Neo4j that you have :)

 Aliabbas Petiwala
  Composed on mobile n97mini
 
  -original message-
  Subject: Re: [Neo4j] Neo4j vs orient db
  From: Andrew White li...@andrewewhite.net
  Date: 10/07/2011 5:49 pm
 
  I've seen a few studies but nothing very complete. I'm no expert by far,
  but the gist I got was that as of late 2010, OrientDB had really fast
  load/read times but that Neo4j was *far* better at graph traversal.
 
  I am in the process of evaluating OrientDB from the perspective of
  dense graphs. I get the feeling a lot of performance is going to be
  dependent on the data model chosen for the particular platform. I
  welcome any Neo4j expert opinions on the matter though.
 
  Andrew
 
  On 07/10/2011 06:38 AM, Peter Neubauer wrote:
   Hi there,
   no, I have not seen anything like that, at least no relevant studies.
   We have not seen any import of more than a couple of million records into
   OrientDB so far.
  
   Cheers,
  
   /peter neubauer
  
   GTalk:  neubauer.peter
   Skype   peter.neubauer
   Phone   +46 704 106975
   LinkedIn   http://www.linkedin.com/in/neubauer
   Twitter  http://twitter.com/peterneubauer
  
   http://www.neo4j.org   - Your high performance graph
  database.
   http://startupbootcamp.org/- Öresund - Innovation happens HERE.
   http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing
 party.
  
  
  
   On Sun, Jul 10, 2011 at 8:52 AM, Aliabbas Petiwala
 aliabba...@gmail.com
   wrote:
   Are there any evaluation results and code available comparing Neo4j with
   OrientDB for very large graph databases?
  
   --
   Aliabbas Petiwala
   M.Tech CSE
 



 --
 Jacob Hansson
 Phone: +46 (0) 763503395
 Twitter: @jakewins


Re: [Neo4j] finding all shortest paths between one node and all other nodes in a large scale databse

2011-05-12 Thread jacopo . farina
Hi,

I had the same problem and solved it by assigning a distance label to every
node.

The procedure is:

1- take the starting node N and add it to a set A; define an empty set B

2- set the value d=1

3- for every node M in A:

 3.1 set the distance label of M to d

 3.2 for every node X which is connected to M and doesn't yet have a
 distance label, add X to the set B

4- increase d by 1

5- empty A and put all the entries of B in it, then empty B

6- go back to step 3, until A is empty
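
The steps above can be sketched in plain Python, independent of any Neo4j
API. This is only an illustration under stated assumptions: the graph is a
hypothetical in-memory dict mapping each node to its neighbours, and the names
label_distances, neighbors and first_label are made up for the example. One
small deviation from the steps: a node gets its label as soon as it is
discovered, so that a node connected to another node of the same level is not
queued and labelled a second time.

```python
from collections import defaultdict


def label_distances(neighbors, start, first_label=1):
    """Label every node reachable from `start`, level by level.

    `neighbors` maps a node to an iterable of the nodes connected to it
    (an assumed in-memory stand-in for the real graph).
    """
    distance = {start: first_label}   # the "distance label" of each node
    current = {start}                 # set A in the steps above
    d = first_label
    while current:                    # step 6: stop when A is empty
        following = set()             # set B
        for m in current:
            for x in neighbors.get(m, ()):
                if x not in distance:      # X has no distance label yet
                    distance[x] = d + 1    # label on discovery
                    following.add(x)
        d += 1                        # step 4
        current = following           # step 5
    return distance


# Tiny undirected example graph, invented for illustration.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
graph = defaultdict(set)
for u, v in edges:
    graph[u].add(v)
    graph[v].add(u)

labels = label_distances(graph, "a")
```

With first_label=1 the starting node gets label 1, as in step 2; passing
first_label=0 makes each label equal to the number of hops from the start.
One run labels every reachable node, so averaging path lengths over the whole
graph takes one run per start node rather than one shortest-path call per
pair.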


I have an implementation of it, but it's a little bit old (I created it for
a Neo4j 0.1 graph)

- Original Message -

 From: Neo4j user discussions <user@lists.neo4j.org>

 To: Neo4j user discussions <user@lists.neo4j.org>

 Subject: Re: [Neo4j] finding all shortest paths between one node and all
other nodes in a large scale databse

 Date: 12/05/11 09:30

 

  Thanks for all your response,

 

 Here is the size of the graph db:

 

 Nodes       Size
 ---------   ----------
 100,000     97MB
 200,000     182MB
 300,000     267MB
 ...
 5,000,000   expect 5GB

 

 I've tried to use 5 virtual machines, each one has 2 cores and 1G memory,

 Running 2 threads on each VM.

 

 Nodes     Time
 -------   -------
 50,000    >20mins
 150,000   >40hrs

 

 Obviously, when the number of nodes increases, the time spent executing
 GraphAlgoFactory.shortestPath increases heavily and non-linearly.

 It's hard to estimate how many machines I need if I want to deal with
 5,000,000 nodes or even 10,000,000.

 I think Map/Reduce will be one solution for me, and I'll try to use BFS
 traversal, which may avoid some duplicated work.

 

 

 Nice to discuss with you,

 

 Thanks.

 

 2011/5/12 Michael Hunger <michael.hun...@neotechnology.com>

 > Hey JueiTing,
 >
 > I'm not sure if Hadoop is needed here.
 > What are the current performance characteristics of the shortest path
 > you are using?
 >
 > You could take a decent machine and just fire up, e.g., blocks of 10k
 > node pairs to a ThreadPoolExecutor with cores*2 threads.
 > Each of those tasks only has to return the sum of the path lengths (and
 > you know the block size), so you can sum the whole thing up into a long
 > and divide it by the number of pairs processed at the end.
 >
 > Instead of brute-force looping one and a half times over all nodes, it
 > is perhaps better to do a breadth-first traversal over all nodes
 > (regardless of relationships, just outgoing rels), returning unique
 > paths from the traverser and calculating the path lengths from the
 > start nodes to each _connected_ node on the paths
 > (start node -> path[0..n].nodes[1..n])
 >
 > Cheers
 >
 > Michael
 >
 > On 11.05.2011 at 21:12, Peter Neubauer wrote:

 >
 > > Hi JueiTing,
 > > I think this is a typical case for a massive Map/Reduce job. I am
 > > thinking of combining Hadoop works with replicas of the graph and then
 > > do the computation. I believe Paddy Fitzgerald has been working with
 > > these approaches and can give some feedback.
 > >
 > > Of course, given the size of the graph, that might prove a problem.
 > > OTOH, if there are no modifications during the computation, you could
 > > run the calculations on read-only databases from the same store. Would
 > > that work?
 > >
 > > Cheers,
 > >
 > > /peter neubauer
 > >
 > > GTalk:  neubauer.peter
 > > Skype   peter.neubauer
 > > Phone   +46 704 106975
 > > LinkedIn   http://www.linkedin.com/in/neubauer
 > > Twitter  http://twitter.com/peterneubauer
 > >
 > > http://www.neo4j.org - Your high performance graph database.
 > > http://startupbootcamp.org/ - Öresund - Innovation happens HERE.
 > > http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
 > >
 > >
 > > On Wed, May 11, 2011 at 12:47 AM, <wshir...@gmail.com> wrote:
 > >
 > >> Hi,
 > >>
 > >> I'm trying to use Neo4j graph database to store a
 > >> large social network (more than 5,000,000 nodes) for academic research.
 > >>
 > >> I need to compute the separation degree (path length) between any two
 > >> nodes in the graph then get the average degree of whole database.
 > >> The solution I'm using now is achieved by executing API
 > >> GraphAlgoFactory.shortestPath,
 > >> but it means I need to execute (n*(n-1))/2 times to get all path
 > >> length. I don't think it's a very good idea :(
 > >>
 > >> So, I'm wondering that is there any function which can assign one node
 > >> and whole DB as Input, and return the
 > >> paths or only path lengths between the node and all other nodes in the