Right, there could be even faster ways, but they would need a few additional
methods in NodeManager :)
Cheers
Michael
On 23.07.2011 at 15:30, John cyuczieekc wrote:
> Hey Michael,
> I took a very quick look, I think I understand it, looks like it attempts to
> get nodes by id starting from 0 until highest possible ID.
> public synchronized boolean hasNext()
> {
> while ( currentNode == null && currentNodeId <= highId )
> {
> try
> {
> currentNode = getNodeById( currentNodeId++ );
> }
> catch ( NotFoundException e )
> {
> // ok we try next
> }
> }
> return currentNode != null;
> }
> (seems to work even when neo4j recycles deleted ids;
> highId=100012 in my case, where I had 100,011 relationships each with unique
> nodes, so likely 100,012 nodes)
>
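To make the id-scan idea above concrete, here is a minimal, self-contained sketch. It does not use Neo4j at all: the `Map<Long, T>` store, the class name `IdScanSketch` and the helper `count` are stand-ins invented for illustration, and a missing key plays the role of the NotFoundException in the real code.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

// Hypothetical stand-in for the record store: id -> record, with gaps for unused ids.
class IdScanSketch {

    /** Walks every id in [0, highId] and skips ids that have no record. */
    static <T> Iterator<T> scan(final Map<Long, T> store, final long highId) {
        return new Iterator<T>() {
            private long currentId = 0;
            private T current = null;

            @Override
            public boolean hasNext() {
                // Probe ids until a record is found or the id range is exhausted.
                while (current == null && currentId <= highId) {
                    current = store.get(currentId++); // null means "id not in use"
                }
                return current != null;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                T result = current;
                current = null; // force the next hasNext() to probe again
                return result;
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    /** Counts the records reachable by scanning ids 0..highId. */
    static <T> int count(Map<Long, T> store, long highId) {
        int n = 0;
        for (Iterator<T> it = scan(store, highId); it.hasNext(); it.next()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Map<Long, String> store = new TreeMap<Long, String>();
        store.put(0L, "a");
        store.put(3L, "b"); // ids 1 and 2 are gaps
        store.put(7L, "c"); // ids 4..6 are gaps
        System.out.println(count(store, 7)); // prints 3
    }
}
```

The cost is proportional to highId rather than to the number of live records, so a store with many unused ids pays for every gap it probes.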
> Thank you for pointing me to that, I will consider doing the same with
> getRelationshipById() and bench them both xD
> Done, looks like it halves the time when using getAllRels()
> ie. (output)
> counting inside the same transaction...
> Node `one` has 1,753,000 out rels, time=10,235,246,378
> tx.finish() time=1,070,892,633
> counting one's outgoing rels (outside of initial transaction)...
> Node `one` has 1,753,000 out rels, time=1,743,920,993
> counting all outgoing rels of all nodes ... via getAllNodes
> all relationships in the database = 1,753,000 timedelta=26,404,843,957 ns
> counting all Relationships ... via getAllRels
> all relationships in the database = 1,753,000 timedelta=699,410,620 ns
> counting all outgoing rels of all nodes ... via getAllNodes
> all relationships in the database = 1,753,000 timedelta=1,667,748,552 ns
> counting all Relationships ... via getAllRels
> all relationships in the database = 1,753,000 timedelta=655,286,990 ns
> counting all outgoing rels of all nodes ... via getAllNodes
> all relationships in the database = 1,753,000 timedelta=*1,261*,898,459 ns
> counting all Relationships ... via getAllRels
> all relationships in the database = 1,753,000 timedelta=*663*,559,121 ns
> counting all outgoing rels of all nodes ... via getAllNodes
> all relationships in the database = 1,753,000 timedelta=1,257,557,629 ns
> counting all Relationships ... via getAllRels
> all relationships in the database = 1,753,000 timedelta=637,826,233 ns
> Shutting down database ...
>
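As a quick check of the "halves the time" reading, the ratio can be computed from the warm timings quoted above. This is only arithmetic on the numbers in the output; the class and method names are made up for the sketch.

```java
import java.util.Locale;

// Plain arithmetic on the warm-run timings pasted in the output above.
class SpeedupSketch {

    /** How many times faster the second measurement is than the first. */
    static double ratio(long slowerNanos, long fasterNanos) {
        return (double) slowerNanos / fasterNanos;
    }

    public static void main(String[] args) {
        // Third pass: via getAllNodes vs. via getAllRels
        double third = ratio(1261898459L, 663559121L);
        // Fourth pass: via getAllNodes vs. via getAllRels
        double fourth = ratio(1257557629L, 637826233L);
        System.out.println(String.format(Locale.ROOT, "third pass:  %.2fx", third));
        System.out.println(String.format(Locale.ROOT, "fourth pass: %.2fx", fourth));
    }
}
```

Both ratios come out a bit under 2x, which matches the rough "halves the time" summary for warm runs.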
> and the sample program for this (I did have to add getAllRels(), which is
> similar to getAllNodes()):
> -------------------- in EmbeddedGraphDbImpl.java
> public Iterable<Relationship> getAllRels() {
> return new Iterable<Relationship>() {
>
> @Override
> public Iterator<Relationship> iterator() {
> long highId = nodeManager.getHighestPossibleIdInUse(
> Relationship.class );
> return new AllRelsIterator( highId );
> }
> };
> }
>
> private class AllRelsIterator implements Iterator<Relationship> {
>
> private final long highId;
> private long currentRelId = 0;
> private Relationship currentRel = null;
>
>
> AllRelsIterator( long highId ) {
> this.highId = highId;
> }
>
>
> @Override
> public synchronized boolean hasNext() {
> while ( currentRel == null && currentRelId <= highId ) {
> try {
> currentRel = getRelationshipById( currentRelId++ );
> } catch ( NotFoundException e ) {
> // ok we try next
> }
> }
> return currentRel != null;
> }
>
>
> @Override
> public synchronized Relationship next() {
> if ( !hasNext() ) {
> throw new NoSuchElementException();
> }
>
> Relationship nextRel = currentRel;
> currentRel = null;
> return nextRel;
> }
>
>
> @Override
> public void remove() {
> throw new UnsupportedOperationException();
> }
> }
>
> --------------
> /**
> * Licensed to Neo Technology under one or more contributor
> * license agreements. See the NOTICE file distributed with
> * this work for additional information regarding copyright
> * ownership. Neo Technology licenses this file to you under
> * the Apache License, Version 2.0 (the "License"); you may
> * not use this file except in compliance with the License.
> * You may obtain a copy of the License at
> *
> * http://www.apache.org/licenses/LICENSE-2.0
> *
> * Unless required by applicable law or agreed to in writing,
> * software distributed under the License is distributed on an
> * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
> * KIND, either express or implied. See the License for the
> * specific language governing permissions and limitations
> * under the License.
> */
> package org.neo4j.examples;
>
> import java.io.*;
> import java.text.*;
>
> import org.neo4j.graphdb.*;
> import org.neo4j.graphdb.index.*;
> import org.neo4j.kernel.*;
>
>
>
> public class CalculateShortestPath {
>
> private static final int
> SHOWINFO_IF_COUNTING_REL_TOOK_MORE_THAN_ns = 2 * 300;
> private static final int SHOWINFO_IF_REL_TOOK_MORE_THAN_ns
> = 30000;
> private static final int SHOWEVERY_xTH_REL
> = 10000;
> private static final int HOWMANY_RELATIONSHIPS
> = 109000;
> private static final String DB_PATH
> = "neo4j-shortest-path";
> private static final String NAME_KEY
> = "name";
> private static RelationshipType KNOWS
> = DynamicRelationshipType
>
> .withName( "KNOWS" );
>
> private static GraphDatabaseService graphDb;
> private static Index<Node> indexService;
> private static DecimalFormat commaDelimitedFormatter
> = new DecimalFormat( "###,###" );
>
>
> public static String number( double val ) {
> return commaDelimitedFormatter.format( val );
> }
>
>
> public static void main( final String[] args ) {
> // deleteFileOrDirectory( new File( DB_PATH ) );// XXX:
> graphDb = new EmbeddedGraphDatabase( DB_PATH );
> registerShutdownHook();
> indexService = graphDb.index().forNodes( "nodes" );
> Transaction rootTx;
> rootTx = graphDb.beginTx();
> Node one = getOrCreateNode( "one" );
> DynamicRelationshipType moo = DynamicRelationshipType.withName(
> "moo" );
> try {
> for ( int i = 1; i <= HOWMANY_RELATIONSHIPS; i++ ) {
> long start = System.nanoTime();
> Relationship rel = one.createRelationshipTo(
> graphDb.createNode(), moo );
> long end = System.nanoTime();
> if ( ( i % SHOWEVERY_xTH_REL == 0 ) || ( end - start >
> SHOWINFO_IF_REL_TOOK_MORE_THAN_ns ) ) {
> System.out.println( number( i ) + " timeDelta=" +
> number( end - start ) );
> }
> }
>
> System.out.println( "counting inside the same transaction..." );
> long start = System.nanoTime();
> Iterable<Relationship> rel = one.getRelationships(
> Direction.OUTGOING, moo );
> long count = 0;
> long tstart = 0;
> for ( Relationship relationship : rel ) {
> // long tend = System.nanoTime();
> count++;
> // if ( ( tend - tstart > SHOWINFO_IF_COUNTING_REL_TOOK_MORE_THAN_ns ) ) {
> // System.out.println( number( count ) + " timeDelta=" + number( tend - tstart ) );
> // }
> // tstart = System.nanoTime();
> }
> long end = System.nanoTime();
> System.out.println( "Node `" + one.getProperty( NAME_KEY )
> + "` has " + number( count ) + " out rels, time="
> + number( end - start ) );
>
> rootTx.success();
> } finally {
> long start = System.nanoTime();
> rootTx.finish();
> long end = System.nanoTime();
> System.out.println( "tx.finish() time=" + number( end - start )
> );
> }
>
>
> System.out.println(
> "counting one's outgoing rels (outside of initial transaction)..." );
> long start = System.nanoTime();
> Iterable<Relationship> rel = one.getRelationships(
> Direction.OUTGOING, moo );
> long count = 0;
> for ( Relationship relationship : rel ) {
> count++;
> }
> long end = System.nanoTime();
> System.out.println( "Node `" + one.getProperty( NAME_KEY )
> + "` has " + number( count ) + " out rels, time="
> + number( end - start ) );
>
> int repeat = 3;
> do {
> System.out.println(
> "counting all outgoing rels of all nodes ... via getAllNodes" );
> count = 0;
> start = System.nanoTime();
> Iterable<Node> allNodes = graphDb.getAllNodes();
> for ( Node node : allNodes ) {
> Iterable<Relationship> allRels = node.getRelationships(
> Direction.OUTGOING );
> for ( Relationship relationship : allRels ) {
> count++;
> }
> }
> end = System.nanoTime();
> System.out.println( "all relationships in the database = " +
> number( count ) + " timedelta=" + number( end - start )
> + " ns" );
>
>
> System.out.println( "counting all Relationships ... via getAllRels" );
> count = 0;
> start = System.nanoTime();
> Iterable<Relationship> allRels = graphDb.getAllRels();
> for ( Relationship relationship : allRels ) {
> count++;
> }
> end = System.nanoTime();
> System.out.println( "all relationships in the database = " +
> number( count ) + " timedelta=" + number( end - start )
> + " ns" );
> } while ( repeat-- > 0 );
> }
>
>
> private static Node getOrCreateNode( String name ) {
> Node node = indexService.get( NAME_KEY, name ).getSingle();
> if ( node == null ) {
> System.out.println( "creating new node with name=" + name );
> node = graphDb.createNode();
> node.setProperty( NAME_KEY, name );
> indexService.add( node, NAME_KEY, name );
> }
> return node;
> }
>
>
> private static void registerShutdownHook() {
> // Registers a shutdown hook for the Neo4j instance so that it
> // shuts down nicely when the VM exits (even if you "Ctrl-C" the
> // running example before it's completed)
> Runtime.getRuntime().addShutdownHook( new Thread() {
>
> @SuppressWarnings( "synthetic-access" )
> @Override
> public void run() {
> System.out.println( "Shutting down database ..." );
> graphDb.shutdown();
> }
> } );
> }
>
>
> private static void deleteFileOrDirectory( File file ) {
> if ( file.exists() ) {
> if ( file.isDirectory() ) {
> for ( File child : file.listFiles() ) {
> deleteFileOrDirectory( child );
> }
> }
> file.delete();
> }
> }
> }
>
> ========
> here's another output when no additions were done (using same sample
> program) but trying to count the relationships via getAllRels first, then
> getAllNodes, and repeating this block:
> counting all Relationships ... via getAllRels
> all relationships in the database = 1,753,000 timedelta=*10,1*02,256,698 ns
> counting all outgoing rels of all nodes ... via getAllNodes
> all relationships in the database = 1,753,000 timedelta=*29,9*02,912,485 ns
> counting all Relationships ... via getAllRels
> all relationships in the database = 1,753,000 timedelta=687,562,153 ns
> counting all outgoing rels of all nodes ... via getAllNodes
> all relationships in the database = 1,753,000 timedelta=1,377,601,567 ns
> counting all Relationships ... via getAllRels
> all relationships in the database = 1,753,000 timedelta=*648*,269,229 ns
> counting all outgoing rels of all nodes ... via getAllNodes
> all relationships in the database = 1,753,000 timedelta=*1,371*,030,811 ns
> counting all Relationships ... via getAllRels
> all relationships in the database = 1,753,000 timedelta=756,425,427 ns
> counting all outgoing rels of all nodes ... via getAllNodes
> all relationships in the database = 1,753,000 timedelta=1,256,402,192 ns
> Shutting down database ...
>
> ======
> and here's another way:
> counting all Relationships ... via getAllRels
> all relationships in the database = 1,753,000 timedelta=*10,154*,915,686 ns
> counting all Relationships ... via getAllRels
> all relationships in the database = 1,753,000 timedelta=*690*,298,744 ns
> counting all Relationships ... via getAllRels
> all relationships in the database = 1,753,000 timedelta=649,536,192 ns
> counting all Relationships ... via getAllRels
> all relationships in the database = 1,753,000 timedelta=1,978,228,693 ns
> counting all Relationships ... via getAllRels
> all relationships in the database = 1,753,000 timedelta=653,655,768 ns
> counting all Relationships ... via getAllRels
> all relationships in the database = 1,753,000 timedelta=647,365,027 ns
> counting all outgoing rels of all nodes ... via getAllNodes
> all relationships in the database = 1,753,000 timedelta=*28,549*,060,316 ns
> counting all outgoing rels of all nodes ... via getAllNodes
> all relationships in the database = 1,753,000 timedelta=1,355,109,837 ns
> counting all outgoing rels of all nodes ... via getAllNodes
> all relationships in the database = 1,753,000 timedelta=1,442,695,434 ns
> counting all outgoing rels of all nodes ... via getAllNodes
> all relationships in the database = 1,753,000 timedelta=1,438,563,566 ns
> counting all outgoing rels of all nodes ... via getAllNodes
> all relationships in the database = 1,753,000 timedelta=1,366,895,645 ns
> counting all outgoing rels of all nodes ... via getAllNodes
> all relationships in the database = 1,753,000 timedelta=*1,384*,237,380 ns
> Shutting down database ...
>
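The first run of each block above is the cold-cache one and is roughly 10 to 20 times slower than the rest, and one warm run (1,978,228,693 ns) is an outlier. A small sketch of one way to summarise such a series: drop the first timing and take the median of the remainder. The class and method names are invented for illustration; the numbers are the ones pasted above.

```java
import java.util.Arrays;

// Summarises a series of timings where the first entry is a cold-cache run.
class WarmRunSummary {

    /** Median of all timings except the first (cold) one. */
    static long warmMedian(long[] timingsNanos) {
        if (timingsNanos.length < 2) {
            throw new IllegalArgumentException("need at least one warm run");
        }
        long[] warm = Arrays.copyOfRange(timingsNanos, 1, timingsNanos.length);
        Arrays.sort(warm);
        int mid = warm.length / 2;
        // Odd count: middle element; even count: mean of the two middle elements.
        return warm.length % 2 == 1 ? warm[mid] : (warm[mid - 1] + warm[mid]) / 2;
    }

    public static void main(String[] args) {
        long[] viaGetAllRels = {
            10154915686L, 690298744L, 649536192L,
            1978228693L, 653655768L, 647365027L
        };
        long[] viaGetAllNodes = {
            28549060316L, 1355109837L, 1442695434L,
            1438563566L, 1366895645L, 1384237380L
        };
        System.out.println(warmMedian(viaGetAllRels));  // prints 653655768
        System.out.println(warmMedian(viaGetAllNodes)); // prints 1384237380
    }
}
```

The median is used instead of the mean so that the single 1.98 s run does not skew the getAllRels figure.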
> I'll paste the program as it is for this last test (though it's the same
> one, but a bit reordered):
> I also cleaned it up some:
> /**
> * Licensed to Neo Technology under one or more contributor
> * license agreements. See the NOTICE file distributed with
> * this work for additional information regarding copyright
> * ownership. Neo Technology licenses this file to you under
> * the Apache License, Version 2.0 (the "License"); you may
> * not use this file except in compliance with the License.
> * You may obtain a copy of the License at
> *
> * http://www.apache.org/licenses/LICENSE-2.0
> *
> * Unless required by applicable law or agreed to in writing,
> * software distributed under the License is distributed on an
> * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
> * KIND, either express or implied. See the License for the
> * specific language governing permissions and limitations
> * under the License.
> */
> package org.neo4j.examples;
>
> import java.io.*;
> import java.text.*;
>
> import org.neo4j.graphdb.*;
> import org.neo4j.graphdb.index.*;
> import org.neo4j.kernel.*;
>
>
>
> public class Copy_2_of_CalculateShortestPath {
>
> private static final String DB_PATH =
> "neo4j-shortest-path";
> private static final String NAME_KEY = "name";
>
> private static GraphDatabaseService graphDb;
> private static Index<Node> indexService;
> private static DecimalFormat commaDelimitedFormatter = new
> DecimalFormat( "###,###" );
>
>
> public static String number( double val ) {
> return commaDelimitedFormatter.format( val );
> }
>
>
> public static void main( final String[] args ) {
> // deleteFileOrDirectory( new File( DB_PATH ) );// XXX:
> graphDb = new EmbeddedGraphDatabase( DB_PATH );
> registerShutdownHook();
> indexService = graphDb.index().forNodes( "nodes" );
> Transaction rootTx;
> Node one = getOrCreateNode( "one" );
> DynamicRelationshipType moo = DynamicRelationshipType.withName(
> "moo" );
>
> int repeat = 5;
> do {
> System.out.println( "counting all Relationships ... via getAllRels" );
> long count = 0;
> long start = System.nanoTime();
> Iterable<Relationship> allRels = graphDb.getAllRels();
> for ( Relationship relationship : allRels ) {
> count++;
> }
> long end = System.nanoTime();
> System.out.println( "all relationships in the database = " +
> number( count ) + " timedelta=" + number( end - start )
> + " ns" );
> } while ( repeat-- > 0 );
>
> repeat = 5;
> do {
> System.out.println(
> "counting all outgoing rels of all nodes ... via getAllNodes" );
> long count = 0;
> long start = System.nanoTime();
> Iterable<Node> allNodes = graphDb.getAllNodes();
> for ( Node node : allNodes ) {
> Iterable<Relationship> allRels2 = node.getRelationships(
> Direction.OUTGOING );
> for ( Relationship relationship : allRels2 ) {
> count++;
> }
> }
> long end = System.nanoTime();
> System.out.println( "all relationships in the database = " +
> number( count ) + " timedelta=" + number( end - start )
> + " ns" );
> } while ( repeat-- > 0 );
> }
>
>
> private static Node getOrCreateNode( String name ) {
> Node node = indexService.get( NAME_KEY, name ).getSingle();
> if ( node == null ) {
> System.out.println( "creating new node with name=" + name );
> node = graphDb.createNode();
> node.setProperty( NAME_KEY, name );
> indexService.add( node, NAME_KEY, name );
> }
> return node;
> }
>
>
> private static void registerShutdownHook() {
> // Registers a shutdown hook for the Neo4j instance so that it
> // shuts down nicely when the VM exits (even if you "Ctrl-C" the
> // running example before it's completed)
> Runtime.getRuntime().addShutdownHook( new Thread() {
>
> @SuppressWarnings( "synthetic-access" )
> @Override
> public void run() {
> System.out.println( "Shutting down database ..." );
> graphDb.shutdown();
> }
> } );
> }
>
>
> private static void deleteFileOrDirectory( File file ) {
> if ( file.exists() ) {
> if ( file.isDirectory() ) {
> for ( File child : file.listFiles() ) {
> deleteFileOrDirectory( child );
> }
> }
> file.delete();
> }
> }
> }
>
> All in all, a little better, but not making much difference for me at the
> moment, until reaching some very high amount of relationships or doing lots
> of counts.
> Well great, see you later :)
>
> counting all Relationships ... via getAllRels
> all relationships in the database = 1,753,000 timedelta=11,904,833,914 ns
> counting all Relationships ... via getAllRels
> all relationships in the database = 1,753,000 timedelta=715,916,208 ns
> counting all Relationships ... via getAllRels
> all relationships in the database = 1,753,000 timedelta=652,136,373 ns
> counting all Relationships ... via getAllRels
> all relationships in the database = 1,753,000 timedelta=770,903,756 ns
> counting all Relationships ... via getAllRels
> all relationships in the database = 1,753,000 timedelta=651,695,964 ns
> counting all Relationships ... via getAllRels
> all relationships in the database = 1,753,000 timedelta=658,016,686 ns
> counting all outgoing rels of all nodes ... via getAllNodes
> all relationships in the database = 1,753,000 timedelta=28,067,437,117 ns
> counting all outgoing rels of all nodes ... via getAllNodes
> all relationships in the database = 1,753,000 timedelta=1,347,472,686 ns
> counting all outgoing rels of all nodes ... via getAllNodes
> all relationships in the database = 1,753,000 timedelta=1,282,134,781 ns
> counting all outgoing rels of all nodes ... via getAllNodes
> all relationships in the database = 1,753,000 timedelta=1,309,284,532 ns
> counting all outgoing rels of all nodes ... via getAllNodes
> all relationships in the database = 1,753,000 timedelta=1,358,393,266 ns
> counting all outgoing rels of all nodes ... via getAllNodes
> all relationships in the database = 1,753,000 timedelta=1,219,737,032 ns
> Shutting down database ...
>
>
> On Sat, Jul 23, 2011 at 10:51 AM, Michael Hunger <
> [email protected]> wrote:
>
>> An internal implementation would probably be faster.
>>
>> If timing is that critical for you, you can have a look in
>> EmbeddedGraphDbImpl.getAllNodes() and implement a similar solution for
>> relationships.
>>
>> Cheers
>>
>> Michael
>>
>> On 23.07.2011 at 04:20, John cyuczieekc wrote:
>>
>>> Hey Jim,
>>> I am sort of glad to hear that, maybe in the future I could see a method
>>> like getAllRelationships(), or not, np :)
>>> Yes, using Michael's code works, but ...
>>> total relations count=100,011 timedelta=3,075,897,991 ns
>>> it kind of takes 3 seconds (when not cached) to count 100k relationships
>>> (considering there are 100k+2 unique nodes too)
>>> when cached:
>>> total relations count=100,011 timedelta=154,673,763 ns
>>>
>>> Still, it's pretty fast, but I have to wonder if it would be faster if
>>> using relationships directly :)
>>>
>>> Either way, wish y'all a great day!
>>>
>>>
>>> On Sat, Jul 23, 2011 at 3:57 AM, Jim Webber <[email protected]> wrote:
>>>
>>>> Hi John,
>>>>
>>>> Relationships are stored in a different store than nodes. This enables
>>>> Neo4j to manage lifecycle events (like caching) for nodes and
>>>> relationships separately.
>>>>
>>>> Neo4j really is a graph DB, not a triple store masquerading as a graph
>>>> DB.
>>>>
>>>> Nonetheless, that code Michael sent still works :-)
>>>>
>>>> Jim
>>>> _______________________________________________
>>>> Neo4j mailing list
>>>> [email protected]
>>>> https://lists.neo4j.org/mailman/listinfo/user
>>>>