Did GIRAPH-11 break vertex reactivation?

2011-11-15 Thread Sebastian Schelter
Hi,

I updated to the latest trunk (after the GIRAPH-11 commit) and wanted to
continue to work on GIRAPH-51 where I use a small toy graph to test
SimpleShortestPathVertex.

Unfortunately my code did not work anymore, and I think I tracked it
down to the fact that vertices that voted to halt are no longer
reactivated when new messages arrive.

In SimpleShortestPathVertex every vertex always votes to halt and only
gets reactivated when a shorter path to it has been found. However, my
test run always finished after superstep 0.
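
To make the expected behaviour concrete, here is a minimal
single-process sketch in Java. It is not Giraph code: the class
ReactivationSketch, the {from, to, weight} edge encoding and the
min-combined inbox are assumptions made for illustration, and edge
weights are assumed to be non-negative. Every vertex halts after each
compute step and only runs again when a message arrives; the run ends
once no messages were sent in a superstep.

```java
import java.util.Arrays;

/** Toy model of vote-to-halt and reactivation by messages; not Giraph code. */
final class ReactivationSketch {
    /** Marker for "no path found yet" and "no message received". */
    static final long UNREACHED = Long.MAX_VALUE;

    /**
     * Runs supersteps until no vertex sends a message.
     * Each edge is {from, to, weight}; weights are assumed non-negative.
     */
    static long[] shortestPaths(int vertexCount, int[][] edges, int source) {
        long[] dist = new long[vertexCount];
        Arrays.fill(dist, UNREACHED);
        // Smallest incoming message per vertex (acts like a min combiner).
        long[] inbox = new long[vertexCount];
        Arrays.fill(inbox, UNREACHED);

        for (long superstep = 0; ; superstep++) {
            long[] outbox = new long[vertexCount];
            Arrays.fill(outbox, UNREACHED);
            long sentMessages = 0;

            for (int v = 0; v < vertexCount; v++) {
                // All vertices are active in superstep 0. Afterwards a halted
                // vertex computes again only if a message reactivated it.
                boolean hasMessage = inbox[v] != UNREACHED;
                if (superstep > 0 && !hasMessage) {
                    continue;
                }
                long candidate = (v == source) ? 0 : UNREACHED;
                candidate = Math.min(candidate, inbox[v]);
                if (candidate < dist[v]) {
                    dist[v] = candidate;
                    for (int[] e : edges) {
                        if (e[0] == v) {
                            outbox[e[1]] = Math.min(outbox[e[1]], candidate + e[2]);
                            sentMessages++;
                        }
                    }
                }
                // The vertex votes to halt here, whether or not it improved.
            }

            // Everyone has halted; stop only if nothing is left in flight.
            if (sentMessages == 0) {
                return dist;
            }
            inbox = outbox;
        }
    }

    public static void main(String[] args) {
        int[][] edges = { {0, 1, 1}, {0, 2, 4}, {1, 2, 2}, {2, 3, 1} };
        System.out.println(Arrays.toString(shortestPaths(4, edges, 0)));
        // prints [0, 1, 3, 4]
    }
}
```

On this toy graph the run needs four supersteps (0 to 3). If it stopped
after superstep 0, only the source would have a distance.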

I don't know too much about Giraph's internals yet, but my guess is that
the number of sent messages is no longer tracked correctly. As a result,
Giraph finishes the algorithm (as all vertices voted to halt) although
there should still be messages in the pipeline.

I think I tracked it down to this behavior:

GraphMapper declares a variable workerSentMessages = 0 and never
increments it. This variable is passed to
BspServiceWorker.finishSuperstep(), which writes it to ZooKeeper. It is
later used to compute the GlobalStats, which decide whether a new
superstep has to be scheduled. As it is never incremented, the algorithm
always stops as soon as all vertices have voted to halt.
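
The decision described above can be sketched as follows. This is a
simplified model under stated assumptions, not Giraph's actual classes:
HaltDecisionSketch, WorkerReport and shouldHalt are hypothetical names,
and the rule "halt only if all vertices voted to halt and no messages
were sent" is my reading of how the aggregated stats are used.

```java
/** Toy model of the end-of-superstep halt decision; not Giraph code. */
final class HaltDecisionSketch {
    /** What one worker reports when it finishes a superstep (hypothetical type). */
    static final class WorkerReport {
        final long vertexCount;
        final long haltedVertexCount;
        final long sentMessageCount;

        WorkerReport(long vertexCount, long haltedVertexCount, long sentMessageCount) {
            this.vertexCount = vertexCount;
            this.haltedVertexCount = haltedVertexCount;
            this.sentMessageCount = sentMessageCount;
        }
    }

    /** Stop only if every vertex has halted AND no messages are in flight. */
    static boolean shouldHalt(WorkerReport[] reports) {
        long vertices = 0;
        long halted = 0;
        long messages = 0;
        for (WorkerReport report : reports) {
            vertices += report.vertexCount;
            halted += report.haltedVertexCount;
            messages += report.sentMessageCount;
        }
        return halted == vertices && messages == 0;
    }

    public static void main(String[] args) {
        // All vertices halted, but one worker sent two messages.
        WorkerReport[] tracked = { new WorkerReport(2, 2, 2), new WorkerReport(2, 2, 0) };
        // Same superstep, but the message count is stuck at 0.
        WorkerReport[] stuckAtZero = { new WorkerReport(2, 2, 0), new WorkerReport(2, 2, 0) };
        System.out.println("counts tracked:    halt=" + shouldHalt(tracked));
        System.out.println("counts stuck at 0: halt=" + shouldHalt(stuckAtZero));
    }
}
```

With a message count that is always 0, the second condition is always
true, so the job halts as soon as every vertex has voted to halt, which
matches the run finishing after superstep 0.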

It would be great if someone could confirm or disprove this speculation
and help me continue work on GIRAPH-51.

--sebastian


Re: Did GIRAPH-11 break vertex reactivation?

2011-11-15 Thread Avery Ching

Yes, I think I broke it.  Sorry.  Let me get you a diff to test quickly.

Avery


Re: Did GIRAPH-11 break vertex reactivation?

2011-11-15 Thread Avery Ching

This should fix it. It passed the local unit tests. Let me know.

Avery

Index: src/main/java/org/apache/giraph/graph/BspServiceWorker.java
===
--- src/main/java/org/apache/giraph/graph/BspServiceWorker.java (revision 1202424)
+++ src/main/java/org/apache/giraph/graph/BspServiceWorker.java (working copy)
@@ -548,7 +548,7 @@
         workerGraphPartitioner.finalizePartitionStats(
             partitionStatsList, workerPartitionMap);
 
-        finishSuperstep(partitionStatsList, 0);
+        finishSuperstep(partitionStatsList);
     }
 
     /**
@@ -773,8 +773,7 @@
     }
 
     @Override
-    public boolean finishSuperstep(List<PartitionStats> partitionStatsList,
-                                   long workersSentMessages) {
+    public boolean finishSuperstep(List<PartitionStats> partitionStatsList) {
         // This barrier blocks until success (or the master signals it to
         // restart).
         //
@@ -785,8 +784,9 @@
         // of this worker
         // 3. Let the master know it is finished.
         // 4. Then it waits for the master to say whether to stop or not.
+        long workerSentMessages = 0;
         try {
-            commService.flush(getContext());
+            workerSentMessages = commService.flush(getContext());
         } catch (IOException e) {
             throw new IllegalStateException(
                 "finishSuperstep: flush failed", e);
@@ -807,7 +807,7 @@
             workerFinishedInfoObj.put(JSONOBJ_PARTITION_STATS_KEY,
                                       Base64.encodeBytes(partitionStatsBytes));
             workerFinishedInfoObj.put(JSONOBJ_NUM_MESSAGES_KEY,
-                                      workersSentMessages);
+                                      workerSentMessages);
         } catch (JSONException e) {
             throw new RuntimeException(e);
         }
Index: src/main/java/org/apache/giraph/graph/GraphMapper.java
===
--- src/main/java/org/apache/giraph/graph/GraphMapper.java  (revision 1202424)
+++ src/main/java/org/apache/giraph/graph/GraphMapper.java  (working copy)
@@ -512,7 +512,6 @@
 
         List<PartitionStats> partitionStatsList =
             new ArrayList<PartitionStats>();
-        long workerSentMessages = 0;
         do {
             long superstep = serviceWorker.getSuperstep();
 
@@ -556,7 +555,6 @@
             context.progress();
 
             partitionStatsList.clear();
-            workerSentMessages = 0;
             for (Partition<I, V, E, M> partition :
                     serviceWorker.getPartitionMap().values()) {
                 PartitionStats partitionStats =
@@ -593,8 +591,7 @@
                          " maxMem=" + Runtime.getRuntime().maxMemory() +
                          " freeMem=" + Runtime.getRuntime().freeMemory());
             }
-        } while (!serviceWorker.finishSuperstep(partitionStatsList,
-                                                workerSentMessages));
+        } while (!serviceWorker.finishSuperstep(partitionStatsList));
         if (LOG.isInfoEnabled()) {
             LOG.info("map: BSP application done " +
                      "(global vertices marked done)");
Index: src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
===
--- src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java   

Re: Did GIRAPH-11 break vertex reactivation?

2011-11-15 Thread Sebastian Schelter
Yes, that fixes it. Thank you!

--sebastian
