[jira] [Updated] (GIRAPH-51) Provide unit testing tool for Giraph algorithms

2011-11-16 Thread Sebastian Schelter (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-51?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Schelter updated GIRAPH-51:
-

Attachment: GIRAPH-51-2.patch

Hi Jakob,

I moved InternalVertexRunner to the src as you suggested.

I think we have to separate the unit tests into two categories:

The first one would be testing single methods, usually invoking compute() and 
verifying the behavior of the vertex (I suppose that's what you aimed at with 
your last comment). To accomplish that, one need not actually run the system; it 
should be sufficient to inject mocked dependencies. I added two tests for 
SimpleShortestPathVertex as examples of such tests. I created a helper class 
for conveniently mocking dependencies such as the Hadoop configuration and 
for configuring the vertex.

The second category, which InternalVertexRunner aims at, would be something like 
a local integration test on toy data. It runs the system in a single JVM and 
executes the whole lifecycle of an algorithm (reading input from disk, running 
the supersteps, writing output, etc.). Although these tests are no substitute 
for real integration testing, they are often very helpful in finding subtle 
bugs that normal unit testing cannot discover.
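
To make the first category concrete, here is a minimal sketch of a compute()-level test with an injected fake dependency. It does not use the Giraph API or the helper class from the patch: ToyShortestPathVertex, MessageSink and RecordingSink are hypothetical names invented for this illustration, and a hand-written recording fake stands in for a mocking library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: none of these types exist in Giraph.
final class ToyVertexTestSketch {

  // Hypothetical stand-in for the messaging dependency of a vertex.
  interface MessageSink {
    void send(long targetVertexId, double message);
  }

  // Hand-written fake that records every message instead of sending it.
  static final class RecordingSink implements MessageSink {
    final List<String> sent = new ArrayList<String>();

    @Override
    public void send(long targetVertexId, double message) {
      sent.add(targetVertexId + ":" + message);
    }
  }

  // Toy shortest-path vertex: keeps the smallest distance seen so far and,
  // when it improves, forwards distance + 1.0 to every neighbour.
  static final class ToyShortestPathVertex {
    double distance = Double.MAX_VALUE;
    private final long[] neighbours;

    ToyShortestPathVertex(long... neighbours) {
      this.neighbours = neighbours;
    }

    void compute(List<Double> messages, MessageSink sink) {
      double min = distance;
      for (double m : messages) {
        min = Math.min(min, m);
      }
      if (min < distance) {
        distance = min;
        for (long n : neighbours) {
          sink.send(n, distance + 1.0);
        }
      }
    }
  }

  public static void main(String[] args) {
    RecordingSink sink = new RecordingSink();
    ToyShortestPathVertex vertex = new ToyShortestPathVertex(2L, 3L);
    // Invoke compute() directly; no cluster or job is started.
    vertex.compute(Arrays.asList(5.0, 3.0), sink);
    System.out.println(vertex.distance + " " + sink.sent);
  }
}
```

The point of the sketch is only that a single compute() call can be verified through what the injected dependency recorded, without running the system.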



 Provide unit testing tool for Giraph algorithms
 ---

 Key: GIRAPH-51
 URL: https://issues.apache.org/jira/browse/GIRAPH-51
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Sebastian Schelter
 Attachments: GIRAPH-51-2.patch, GIRAPH-51.patch


 It would be nice to have a little tool, similar to MRUnit, that would allow 
 Giraph application writers to quickly unit test their algorithms.  The tool 
 could take a Vertex implementation, a set of input and expected output and 
 verify that after the specified number of supersteps, we've gotten what we 
 expect.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-68) Implement a Graph Generator

2011-11-16 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151405#comment-13151405
 ] 

Avery Ching commented on GIRAPH-68:
---


Looks good Hyunsik, a few comments.

Probably want to add a javadoc comment for GraphGenerator
Lines 40-41: Should have 8 space indenting
Line 46: needs 4 more spaces
Line 58: Over 80 chars

So is the idea that PageRankBenchmark and RandomMessageBenchmark would use it?  
Would you like to modify them to do so?

 Implement a Graph Generator
 ---

 Key: GIRAPH-68
 URL: https://issues.apache.org/jira/browse/GIRAPH-68
 Project: Giraph
  Issue Type: New Feature
  Components: benchmark
Affects Versions: 0.70.0
Reporter: Hyunsik Choi
Assignee: Hyunsik Choi
 Attachments: GIRAPH-68_1.patch


 To provide users with benchmark environments and to deeply test the 
 input/output system of giraph, we need a graph generator. We will enable the 
 graph generator to generate various kinds of graph data sets by specifying a 
 VertexInputFormat and a VertexOutputFormat.





[jira] [Assigned] (GIRAPH-89) Remove debugging system.out from LongDoubleFloatDoubleVertex

2011-11-16 Thread Shaunak Kashyap (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaunak Kashyap reassigned GIRAPH-89:
-

Assignee: Shaunak Kashyap

 Remove debugging system.out from LongDoubleFloatDoubleVertex
 

 Key: GIRAPH-89
 URL: https://issues.apache.org/jira/browse/GIRAPH-89
 Project: Giraph
  Issue Type: Bug
Reporter: Jakob Homan
Assignee: Shaunak Kashyap
Priority: Minor
  Labels: newbie

 Line 137: {{System.out.println("in getNumVertices!");}}
 looks like a debugging line and should be removed.





Review Request: GIRAPH-89: Remove debugging system.out from LongDoubleFloatDoubleVertex

2011-11-16 Thread ycombinator

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2859/
---

Review request for giraph.


Summary
---

Removing System.out debugging statement.


This addresses bug GIRAPH-89.
https://issues.apache.org/jira/browse/GIRAPH-89


Diffs
-

  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java
 1202868 

Diff: https://reviews.apache.org/r/2859/diff


Testing
---

$ mvn test


Thanks,

shaunak



Re: Review Request: GIRAPH-89: Remove debugging system.out from LongDoubleFloatDoubleVertex

2011-11-16 Thread Avery Ching

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2859/#review3304
---

Ship it!


- Avery






[jira] [Commented] (GIRAPH-89) Remove debugging system.out from LongDoubleFloatDoubleVertex

2011-11-16 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151483#comment-13151483
 ] 

jirapos...@reviews.apache.org commented on GIRAPH-89:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2859/
---

Review request for giraph.


Summary
---

Removing System.out debugging statement.


This addresses bug GIRAPH-89.
https://issues.apache.org/jira/browse/GIRAPH-89


Diffs
-

  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java
 1202868 

Diff: https://reviews.apache.org/r/2859/diff


Testing
---

$ mvn test


Thanks,

shaunak







[jira] [Commented] (GIRAPH-89) Remove debugging system.out from LongDoubleFloatDoubleVertex

2011-11-16 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151484#comment-13151484
 ] 

jirapos...@reviews.apache.org commented on GIRAPH-89:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2859/#review3304
---

Ship it!


- Avery









[jira] [Resolved] (GIRAPH-89) Remove debugging system.out from LongDoubleFloatDoubleVertex

2011-11-16 Thread Shaunak Kashyap (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaunak Kashyap resolved GIRAPH-89.
---

Resolution: Fixed





[jira] [Commented] (GIRAPH-89) Remove debugging system.out from LongDoubleFloatDoubleVertex

2011-11-16 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151493#comment-13151493
 ] 

Hudson commented on GIRAPH-89:
--

Integrated in Giraph-trunk-Commit #35 (See 
[https://builds.apache.org/job/Giraph-trunk-Commit/35/])
GIRAPH-89: Remove debugging system.out from LongDoubleFloatDoubleVertex. 
(shaunak via aching)

aching : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1202875
Files : 
* /incubator/giraph/trunk/CHANGELOG
* 
/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java






[jira] [Updated] (GIRAPH-86) Simplify boolean expressions in ZooKeeperExt::createExt

2011-11-16 Thread Attila Csordas (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-86?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Csordas updated GIRAPH-86:
-

Attachment: GIRAPH-86.patch

mvn test with the hadoop_non_secure profile is successful, but apache-rat:check 
fails both before and after the change with: Failed to execute goal 
org.apache.rat:apache-rat-plugin:0.7:check (default-cli) on project giraph: 
Too many unapproved licenses: 1 -> [Help 1]

 Simplify boolean expressions in ZooKeeperExt::createExt
 ---

 Key: GIRAPH-86
 URL: https://issues.apache.org/jira/browse/GIRAPH-86
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.70.0
Reporter: Jakob Homan
Assignee: Attila Csordas
  Labels: newbie
 Attachments: GIRAPH-86.patch


 In ZooKeeperExt::createExt there are two instances of {{recursive==false}} 
 that can be simplified to !recursive.
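
As a minimal illustration of the simplification (this is not the actual ZooKeeperExt::createExt code; the class and method names below are hypothetical), comparing a boolean against false is equivalent to negating it:

```java
// Illustration only: hypothetical methods, not the ZooKeeperExt source.
final class RecursiveFlagSketch {

  // Verbose form of the kind the issue flags.
  static String planVerbose(boolean recursive) {
    if (recursive == false) {
      return "create node only";
    }
    return "create node and missing parents";
  }

  // Simplified form with identical behaviour.
  static String planSimplified(boolean recursive) {
    if (!recursive) {
      return "create node only";
    }
    return "create node and missing parents";
  }

  public static void main(String[] args) {
    for (boolean recursive : new boolean[] {false, true}) {
      System.out.println(recursive + ": " + planSimplified(recursive));
    }
  }
}
```

Both forms return the same result for either value of the flag, so the change is purely a readability improvement.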





[jira] [Created] (GIRAPH-91) Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)

2011-11-16 Thread Avery Ching (Created) (JIRA)
Large-memory improvements (Memory reduced vertex implementation, fast failure, 
added settings) 
---

 Key: GIRAPH-91
 URL: https://issues.apache.org/jira/browse/GIRAPH-91
 Project: Giraph
  Issue Type: Improvement
Reporter: Avery Ching


Current vertex implementation uses a HashMap for storing the edges, which is 
quite memory heavy for large graphs.  The default settings in Giraph need to be 
improved for large graphs and heaps of 20G.





[jira] [Commented] (GIRAPH-91) Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)

2011-11-16 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151587#comment-13151587
 ] 

jirapos...@reviews.apache.org commented on GIRAPH-91:
-


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2868/
---

Review request for giraph.


Summary
---

These general changes should support larger heap sizes (e.g. 20G)

- Added new EdgeListVertex that stores its edges in a compact pair of lists 
instead of Vertex's HashMap.

- Added unit test TestEdgeListVertex to test EdgeListVertex.

- Augmented PageRankBenchmark to choose between EdgeListVertex or Vertex 
(to try it out).

- Added failure cleanup so that failed workers quickly alert the master that 
they are dead by deleting their health ephemeral znodes.  This allows us to set 
higher ZooKeeper timeouts to deal with GC pauses and the like.  In a quick test 
of 3 nodes, I saw failure detected in 43 seconds instead of 1m 52 sec.

- Added a context.progress() call to flushing so that jobs are not killed by the 
task timeout during long stalls (GC or lots of messages).
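
To illustrate the "compact pair of lists" idea from the first bullet, here is a minimal sketch of adjacency storage in two parallel lists instead of a HashMap. It is not the EdgeListVertex code from the patch: the class name ParallelListEdges, the Long/Float types, and the choice to keep targets sorted for binary-search lookup are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustration only: hypothetical class, not the patch's EdgeListVertex.
final class ParallelListEdges {
  // targets.get(i) and values.get(i) together describe edge i.
  private final List<Long> targets = new ArrayList<Long>();
  private final List<Float> values = new ArrayList<Float>();

  /** Adds or overwrites the edge to target, keeping targets sorted. */
  void put(long target, float value) {
    int pos = Collections.binarySearch(targets, target);
    if (pos >= 0) {
      values.set(pos, value);
    } else {
      int insertAt = -(pos + 1);
      targets.add(insertAt, target);
      values.add(insertAt, value);
    }
  }

  /** Returns the edge value for target, or null if there is no such edge. */
  Float get(long target) {
    int pos = Collections.binarySearch(targets, target);
    return pos >= 0 ? values.get(pos) : null;
  }

  int size() {
    return targets.size();
  }

  long targetAt(int index) {
    return targets.get(index);
  }

  public static void main(String[] args) {
    ParallelListEdges edges = new ParallelListEdges();
    edges.put(5L, 1.0f);
    edges.put(2L, 2.0f);
    // Iterating is a plain index loop over contiguous lists.
    for (int i = 0; i < edges.size(); i++) {
      System.out.println(edges.targetAt(i) + " " + edges.get(edges.targetAt(i)));
    }
  }
}
```

The trade-off in this sketch is that lookups become O(log n) and inserts O(n), in exchange for dropping the per-entry objects a HashMap allocates; iteration over all edges stays a simple sequential scan.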


This addresses bug GIRAPH-91.
https://issues.apache.org/jira/browse/GIRAPH-91


Diffs
-

  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java
 1202898 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
 1202898 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
 1202898 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
 1202898 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/EdgeListVertex.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
 1202898 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
 1202898 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestJsonBase64Format.java
 1202898 
  
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/graph/TestEdgeListVertex.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/2868/diff


Testing
---

Local unittests, PageRankBenchmark on multiple machines with 20GB heaps.


Thanks,

Avery



 Large-memory improvements (Memory reduced vertex implementation, fast 
 failure, added settings) 
 ---

 Key: GIRAPH-91
 URL: https://issues.apache.org/jira/browse/GIRAPH-91
 Project: Giraph
  Issue Type: Improvement
Reporter: Avery Ching

 Current vertex implementation uses a HashMap for storing the edges, which is 
 quite memory heavy for large graphs.  The default settings in Giraph need to 
 be improved for large graphs and heaps of 20G.





[jira] [Updated] (GIRAPH-86) Simplify boolean expressions in ZooKeeperExt::createExt

2011-11-16 Thread Avery Ching (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-86?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching updated GIRAPH-86:
--

Attachment: pom.diff

See if this helps you...if so, please add to your patch.

 Simplify boolean expressions in ZooKeeperExt::createExt
 ---

 Key: GIRAPH-86
 URL: https://issues.apache.org/jira/browse/GIRAPH-86
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.70.0
Reporter: Jakob Homan
Assignee: Attila Csordas
  Labels: newbie
 Attachments: GIRAPH-86.patch, pom.diff


 In ZooKeeperExt::createExt there are two instances of {{recursive==false}} 
 that can be simplified to !recursive.





[jira] [Commented] (GIRAPH-91) Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)

2011-11-16 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151600#comment-13151600
 ] 

Jakob Homan commented on GIRAPH-91:
---

can you attach the patch to the jira, for non-rb review?

 Large-memory improvements (Memory reduced vertex implementation, fast 
 failure, added settings) 
 ---

 Key: GIRAPH-91
 URL: https://issues.apache.org/jira/browse/GIRAPH-91
 Project: Giraph
  Issue Type: Improvement
Reporter: Avery Ching
Assignee: Avery Ching

 Current vertex implementation uses a HashMap for storing the edges, which is 
 quite memory heavy for large graphs.  The default settings in Giraph need to 
 be improved for large graphs and heaps of 20G.





[jira] [Updated] (GIRAPH-91) Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)

2011-11-16 Thread Avery Ching (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-91?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching updated GIRAPH-91:
--

Attachment: GIRAPH-91.diff

Sure, sorry about that.

 Large-memory improvements (Memory reduced vertex implementation, fast 
 failure, added settings) 
 ---

 Key: GIRAPH-91
 URL: https://issues.apache.org/jira/browse/GIRAPH-91
 Project: Giraph
  Issue Type: Improvement
Reporter: Avery Ching
Assignee: Avery Ching
 Attachments: GIRAPH-91.diff


 Current vertex implementation uses a HashMap for storing the edges, which is 
 quite memory heavy for large graphs.  The default settings in Giraph need to 
 be improved for large graphs and heaps of 20G.





[jira] [Commented] (GIRAPH-91) Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)

2011-11-16 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151603#comment-13151603
 ] 

Avery Ching commented on GIRAPH-91:
---

By the way, rb allows you to download the diff directly (so you don't have to 
worry about them staying in sync).

https://reviews.apache.org/r/2868/diff/raw/





[jira] [Commented] (GIRAPH-91) Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)

2011-11-16 Thread Claudio Martella (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151643#comment-13151643
 ] 

Claudio Martella commented on GIRAPH-91:


The List-based adjacency list looks quite good to me. A couple of weeks ago I 
did a microbenchmark of iteration performance for ArrayList/array, TreeMap, 
HashMap and SkipList, and I was quite struck by the performance hit. I 
believe we not only save memory here (I would be curious to calculate the 
overhead precisely) but also gain speed in algorithms such as PR, where 
compute() has an iterator-based sendMsg pattern. Good!





[jira] [Updated] (GIRAPH-92) Need outputformat for just vertex ID and value

2011-11-16 Thread Jakob Homan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-92?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated GIRAPH-92:
--

Attachment: GIRAPH-92.patch

Patch with new format and unit test.  One can switch from Id<delim>Value to 
Value<delim>Id via a configuration parameter.
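
A minimal sketch of the two line layouts described above. It is not the code from the patch: the method formatLine and its parameters are hypothetical, with the reverseOutput flag standing in for the configuration parameter and a tab used as the example delimiter.

```java
// Illustration only: hypothetical helper, not the output format in the patch.
final class IdValueLineSketch {

  /**
   * Builds one output line from a vertex id and value.
   * With reverseOutput set, the value is written before the id.
   */
  static String formatLine(String id, String value, String delimiter,
      boolean reverseOutput) {
    if (reverseOutput) {
      return value + delimiter + id;
    }
    return id + delimiter + value;
  }

  public static void main(String[] args) {
    System.out.println(formatLine("index.html", "0.9423", "\t", false));
    System.out.println(formatLine("index.html", "0.9423", "\t", true));
  }
}
```

Either layout gives one vertex per line with no edge data, which is the shape a downstream tool such as Pig can load as a two-column relation.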

 Need outputformat for just vertex ID and value
 --

 Key: GIRAPH-92
 URL: https://issues.apache.org/jira/browse/GIRAPH-92
 Project: Giraph
  Issue Type: New Feature
  Components: lib
Affects Versions: 0.70.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.70.0

 Attachments: GIRAPH-92.patch


 We should have a text output format that just spits out the vertex id and 
 value without its edges:
 {noformat}index.html 0.9423{noformat}
 This would be particularly helpful for further processing by, for instance, 
 Pig.





[jira] [Created] (GIRAPH-92) Need outputformat for just vertex ID and value

2011-11-16 Thread Jakob Homan (Created) (JIRA)
Need outputformat for just vertex ID and value
--

 Key: GIRAPH-92
 URL: https://issues.apache.org/jira/browse/GIRAPH-92
 Project: Giraph
  Issue Type: New Feature
  Components: lib
Affects Versions: 0.70.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.70.0
 Attachments: GIRAPH-92.patch

We should have a text output format that just spits out the vertex id and value 
without its edges:
{noformat}index.html 0.9423{noformat}
This would be particularly helpful for further processing by, for instance, Pig.





[jira] [Commented] (GIRAPH-92) Need outputformat for just vertex ID and value

2011-11-16 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151736#comment-13151736
 ] 

Avery Ching commented on GIRAPH-92:
---

I agree this could be useful.

Couple of format errors:

if (expr)

+  if(reverseOutput) {

typo

+  public void testWithDifferentDelimieter()  throws IOException,

Interrupted needs one more space

+  public void testWithDifferentDelimieter()  throws IOException,
+ InterruptedException {
+Configuration conf = new Configuration();

Extra line break

+writer.writeVertex(vertex);
+
+
+verify(tw).write(expected, null);





[jira] [Commented] (GIRAPH-78) Be smarter about multiple instances of the same vertex

2011-11-16 Thread Claudio Martella (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151753#comment-13151753
 ] 

Claudio Martella commented on GIRAPH-78:


Yes, very nice, but how would you implement this? With a caching factory, or do 
you really want 100% re-use? That would require a per-worker index of I objects.

 Be smarter about multiple instances of the same vertex
 --

 Key: GIRAPH-78
 URL: https://issues.apache.org/jira/browse/GIRAPH-78
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan

 In a graph such as 
 {noformat}a -> b, z
 b -> c, z
 c -> a, z
 ...
 z{noformat}
 where vertices a, b and c are hosted on one worker and z is hosted on another, 
 it would be good to cache instances of z so a, b and c all point at the same 
 instance, rather than generating multiple copies of the same remote vertex 
 during vertex reading.  This is less important with primitive types and the 
 recent work done there, but very useful for more complex types.  Since the 
 vertex readers are in userland, it would be good to provide these facilities 
 as a library that implementing users can access.
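
One possible shape for such a facility is an interning cache that hands back a single canonical instance per distinct id. The sketch below is an assumption about how it could look, not a design from this issue: the class InternCacheSketch is hypothetical, and it relies on the id type implementing equals() and hashCode() consistently.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: hypothetical per-worker cache of canonical id instances.
final class InternCacheSketch<I> {
  // Maps each distinct id (by equals) to the first instance seen.
  private final Map<I, I> canonical = new HashMap<I, I>();

  /** Returns the canonical instance equal to id, registering id if new. */
  I intern(I id) {
    I existing = canonical.get(id);
    if (existing != null) {
      return existing;
    }
    canonical.put(id, id);
    return id;
  }

  int size() {
    return canonical.size();
  }

  public static void main(String[] args) {
    InternCacheSketch<String> cache = new InternCacheSketch<String>();
    // Two separately allocated but equal ids, as a vertex reader might create.
    String zFromA = cache.intern(new String("z"));
    String zFromB = cache.intern(new String("z"));
    System.out.println(zFromA == zFromB);
  }
}
```

A vertex reader would pass each destination id through intern() before storing it in an edge, so that equal ids read on the same worker share one object. The cache itself costs memory, so the benefit depends on how large the id objects are.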





[jira] [Commented] (GIRAPH-78) Be smarter about multiple instances of the same vertex

2011-11-16 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151782#comment-13151782
 ] 

Avery Ching commented on GIRAPH-78:
---

Actually the more I think about it, this might not be too useful unless you 
have large vertexId objects.  I guess the idea would be to keep a cache, maybe 
in the GraphState or the WorkerContext.





[jira] [Created] (GIRAPH-93) Hive input / output format

2011-11-16 Thread Avery Ching (Created) (JIRA)
Hive input / output format
--

 Key: GIRAPH-93
 URL: https://issues.apache.org/jira/browse/GIRAPH-93
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching


It would be great to be able to load/store data from/to Hive tables.  





[jira] [Commented] (GIRAPH-93) Hive input / output format

2011-11-16 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151805#comment-13151805
 ] 

Jakob Homan commented on GIRAPH-93:
---

Do you mean RCFile specifically?  Hive can handle data in any format there's a 
serde for.  I've been meaning to open a jira for handling Avro-encoded data as 
well (and possibly specifying a graph schema for it).  For directly loading 
tables in/out of Hive, it may be better to target HCatalog, as that will also 
give access to Pig (and whatever else HCatalog eventually supports).

 Hive input / output format
 --

 Key: GIRAPH-93
 URL: https://issues.apache.org/jira/browse/GIRAPH-93
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching

 It would be great to be able to load/store data from/to Hive tables.  





[jira] [Commented] (GIRAPH-93) Hive input / output format

2011-11-16 Thread Dmitriy V. Ryaboy (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151809#comment-13151809
 ] 

Dmitriy V. Ryaboy commented on GIRAPH-93:
-

FWIW I already have a Thrift one lying around, which both Hive and Pig read via 
Elephant-Bird. It might not compile against the current trunk, as I've been 
working on other stuff and you guys have been coding like mad, but I can post 
something over the Thanksgiving weekend.


 Hive input / output format
 --

 Key: GIRAPH-93
 URL: https://issues.apache.org/jira/browse/GIRAPH-93
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching

 It would be great to be able to load/store data from/to Hive tables.  





[jira] [Commented] (GIRAPH-91) Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)

2011-11-16 Thread Arun Suresh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151823#comment-13151823
 ] 

Arun Suresh commented on GIRAPH-91:
---

Avery, I see that you have used two sorted ArrayLists. Couldn't a LinkedHashMap 
have been an alternative? I understand that getEdgeValue and hasEdgeValue 
would be faster with a sorted ArrayList, and that ArrayLists are more compact. 
But I was just wondering: in the event that the graph is truly large (millions 
of edges for a vertex), would it make sense to have the entire edge list in 
memory in the first place? We might need a scheme where only a part of the 
list is in memory and chunks of the list are fetched on demand when the 
provided iterator calls next(). In that case we could have a hybrid array + 
linked list (a linked list of chunks of the edge list).
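
The on-demand chunk idea could look roughly like the iterator below. It is a
minimal sketch under stated assumptions: ChunkedEdgeIterator is a hypothetical
name, edges are reduced to plain long target ids, and the chunkLoader function
stands in for whatever would actually fetch a chunk (disk, network, and so on).
Only one chunk is held in memory at a time.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

/** Hypothetical sketch, not Giraph API: iterates edge target ids one chunk at a time. */
final class ChunkedEdgeIterator implements Iterator<Long> {
    private final int chunkCount;
    private final IntFunction<long[]> chunkLoader; // assumed to load chunk i on demand
    private long[] current = new long[0];          // the only chunk held in memory
    private int nextChunk = 0;
    private int pos = 0;

    ChunkedEdgeIterator(int chunkCount, IntFunction<long[]> chunkLoader) {
        this.chunkCount = chunkCount;
        this.chunkLoader = chunkLoader;
    }

    @Override
    public boolean hasNext() {
        // Move past exhausted or empty chunks, loading the next one lazily.
        while (pos >= current.length && nextChunk < chunkCount) {
            current = chunkLoader.apply(nextChunk++);
            pos = 0;
        }
        return pos < current.length;
    }

    @Override
    public Long next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current[pos++];
    }

    public static void main(String[] args) {
        long[][] stored = { {1L, 2L}, {}, {3L} }; // stands in for external storage
        ChunkedEdgeIterator it = new ChunkedEdgeIterator(stored.length, i -> stored[i]);
        while (it.hasNext()) {
            System.out.println(it.next()); // prints 1, 2, 3 on separate lines
        }
    }
}
```

The trade-off is that random lookups such as getEdgeValue would need an extra
index over the chunks, so this layout mainly suits sequential iteration.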

 Large-memory improvements (Memory reduced vertex implementation, fast 
 failure, added settings) 
 ---

 Key: GIRAPH-91
 URL: https://issues.apache.org/jira/browse/GIRAPH-91
 Project: Giraph
  Issue Type: Improvement
Reporter: Avery Ching
Assignee: Avery Ching
 Attachments: GIRAPH-91.diff


 Current vertex implementation uses a HashMap for storing the edges, which is 
 quite memory heavy for large graphs.  The default settings in Giraph need to 
 be improved for large graphs and heaps of 20G.





[jira] [Commented] (GIRAPH-76) Refactor worker logic from GraphMapper

2011-11-16 Thread Arun Suresh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-76?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151825#comment-13151825
 ] 

Arun Suresh commented on GIRAPH-76:
---

Yes, this does sound like a good idea. I could take a crack at it if you 
haven't already started.

 Refactor worker logic from GraphMapper
 --

 Key: GIRAPH-76
 URL: https://issues.apache.org/jira/browse/GIRAPH-76
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Reporter: Jakob Homan

 The plumbing around executing vertices is hosted within the mapper, but could 
 be extracted to its own class and executed from the Mapper directly.  This 
 would ease testing and make it easier to host in the new YARN infrastructure. 
  There's nothing mapper specific about this code.





[jira] [Commented] (GIRAPH-91) Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)

2011-11-16 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151844#comment-13151844
 ] 

Avery Ching commented on GIRAPH-91:
---

Arun, we can certainly try other data structures for other BasicVertex 
implementations.  This one is meant for pretty decent memory reduction.  I 
expect we will have a bunch of different implementations based on the 
requirements of the application.
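
For readers following the thread, the sorted-list layout under discussion can
be sketched as two parallel lists searched with binary search. This is not the
code from the attached patch: SortedEdgeList is a hypothetical class, and boxed
Long/Double elements are used only to keep the example short (a real
memory-reduced vertex would presumably avoid the boxing overhead).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch, not the GIRAPH-91 patch: edges kept in two parallel sorted lists. */
final class SortedEdgeList {
    private final List<Long> targetIds = new ArrayList<>();    // kept in ascending order
    private final List<Double> edgeValues = new ArrayList<>(); // edgeValues.get(i) belongs to targetIds.get(i)

    /** Inserts the edge at its sorted position, or overwrites the value if it exists. */
    void addEdge(long targetId, double value) {
        int idx = Collections.binarySearch(targetIds, targetId);
        if (idx >= 0) {
            edgeValues.set(idx, value);
            return;
        }
        int insertAt = -(idx + 1); // binarySearch encodes the insertion point this way
        targetIds.add(insertAt, targetId);
        edgeValues.add(insertAt, value);
    }

    /** O(log n) membership test. */
    boolean hasEdge(long targetId) {
        return Collections.binarySearch(targetIds, targetId) >= 0;
    }

    /** O(log n) lookup; returns null when there is no such edge. */
    Double getEdgeValue(long targetId) {
        int idx = Collections.binarySearch(targetIds, targetId);
        return idx >= 0 ? edgeValues.get(idx) : null;
    }

    public static void main(String[] args) {
        SortedEdgeList edges = new SortedEdgeList();
        edges.addEdge(5L, 2.0);
        edges.addEdge(1L, 0.5);
        edges.addEdge(3L, 1.5);
        System.out.println(edges.hasEdge(3L));      // true
        System.out.println(edges.getEdgeValue(1L)); // 0.5
        System.out.println(edges.hasEdge(2L));      // false
    }
}
```

Lookups are logarithmic, but each insertion shifts list elements, so the layout
favors graphs that are loaded once and then mostly read.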

 Large-memory improvements (Memory reduced vertex implementation, fast 
 failure, added settings) 
 ---

 Key: GIRAPH-91
 URL: https://issues.apache.org/jira/browse/GIRAPH-91
 Project: Giraph
  Issue Type: Improvement
Reporter: Avery Ching
Assignee: Avery Ching
 Attachments: GIRAPH-91.diff


 Current vertex implementation uses a HashMap for storing the edges, which is 
 quite memory heavy for large graphs.  The default settings in Giraph need to 
 be improved for large graphs and heaps of 20G.





[jira] [Commented] (GIRAPH-93) Hive input / output format

2011-11-16 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151848#comment-13151848
 ] 

Avery Ching commented on GIRAPH-93:
---

I think RCFile initially, then other formats in Hive.  HCatalog certainly is a 
good idea for the long term, though I'm not sure how ready it is now.

Dmitriy, could you send me your code (don't worry about getting it to compile)? 
I'd like to take a look at any examples.

I've been trying to find examples of loading/storing data between Hive tables 
and MapReduce jobs and can't find much, unfortunately.  I'd appreciate any 
pointers.

 Hive input / output format
 --

 Key: GIRAPH-93
 URL: https://issues.apache.org/jira/browse/GIRAPH-93
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching

 It would be great to be able to load/store data from/to Hive tables.  





[jira] [Commented] (GIRAPH-93) Hive input / output format

2011-11-16 Thread Arun Suresh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151851#comment-13151851
 ] 

Arun Suresh commented on GIRAPH-93:
---

Avery, this might not be an optimal solution, but just putting it out there. I 
understand Hive exposes a JDBC interface. One can use the JDBC interface and 
DBInputFormat 
(http://www.cloudera.com/blog/2009/03/database-access-with-hadoop/) to load data 
from a Hive table for a MapReduce job.

 Hive input / output format
 --

 Key: GIRAPH-93
 URL: https://issues.apache.org/jira/browse/GIRAPH-93
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching

 It would be great to be able to load/store data from/to Hive tables.  





[jira] [Commented] (GIRAPH-76) Refactor worker logic from GraphMapper

2011-11-16 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-76?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151860#comment-13151860
 ] 

Jakob Homan commented on GIRAPH-76:
---

I've not.  Please go for it.

 Refactor worker logic from GraphMapper
 --

 Key: GIRAPH-76
 URL: https://issues.apache.org/jira/browse/GIRAPH-76
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Reporter: Jakob Homan

 The plumbing around executing vertices is hosted within the mapper, but could 
 be extracted to its own class and executed from the Mapper directly.  This 
 would ease testing and make it easier to host in the new YARN infrastructure. 
  There's nothing mapper specific about this code.





[jira] [Commented] (GIRAPH-93) Hive input / output format

2011-11-16 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151866#comment-13151866
 ] 

Avery Ching commented on GIRAPH-93:
---

Specifically, we have a lot of data stored in Hive tables and I'd like to be 
able to do graph computation on them with Giraph and then store the results 
back in Hive tables so Hive queries can operate against them as well.

 Hive input / output format
 --

 Key: GIRAPH-93
 URL: https://issues.apache.org/jira/browse/GIRAPH-93
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching

 It would be great to be able to load/store data from/to Hive tables.  





[jira] [Commented] (GIRAPH-93) Hive input / output format

2011-11-16 Thread Dmitriy V. Ryaboy (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151871#comment-13151871
 ] 

Dmitriy V. Ryaboy commented on GIRAPH-93:
-

Going through HCat will be a bit gnarly (though I agree with Jakob that it's 
the only sensible way when you are dealing with Hive-managed tables).  Writing 
to a directory and having Hive treat it as an external table will be far 
easier. Oh, and Hive (and Pig) can read tab-delimited files, if it's just a 
matter of getting basic pipelining to happen initially.
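
As a rough illustration of the tab-delimited route, the sketch below formats
vertex results as one "id<TAB>value" line per vertex. TabDelimitedVertexWriter
and the long/double column types are assumptions made for the example, not an
existing Giraph output format. On the Hive side, an external table over the
output directory would need to declare tab as its field delimiter (ROW FORMAT
DELIMITED FIELDS TERMINATED BY '\t'), since that is not Hive's default.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch, not a Giraph output format: renders vertex results as tab-delimited text. */
final class TabDelimitedVertexWriter {
    /** One output record: vertex id, a tab, then the vertex value. */
    static String formatLine(long vertexId, double value) {
        return vertexId + "\t" + value;
    }

    /** Renders all records, one per line, in the map's iteration order. */
    static String formatAll(Map<Long, Double> vertexValues) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Long, Double> e : vertexValues.entrySet()) {
            out.append(formatLine(e.getKey(), e.getValue())).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<Long, Double> results = new LinkedHashMap<>();
        results.put(1L, 0.5);
        results.put(2L, 1.25);
        // In a real job these lines would be written to files in the table's directory.
        System.out.print(formatAll(results));
    }
}
```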

Jakob, maybe we can discuss on the list, but starting a Giraph job from Pig 
should be as simple as a mapreduce invocation 
(http://pig.apache.org/docs/r0.9.1/basic.html#mapreduce). 

 Hive input / output format
 --

 Key: GIRAPH-93
 URL: https://issues.apache.org/jira/browse/GIRAPH-93
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching

 It would be great to be able to load/store data from/to Hive tables.  





[jira] [Commented] (GIRAPH-78) Be smarter about multiple instances of the same vertex

2011-11-16 Thread Jake Mannix (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151873#comment-13151873
 ] 

Jake Mannix commented on GIRAPH-78:
---

Yeah, that's what I've been thinking too: each vertex has independent edge 
values to its destination, and doesn't keep a reference to the target vertex 
*value*, just its id.  So yeah, unless the I typed objects are big, I'm not 
sure what you can do here.

 Be smarter about multiple instances of the same vertex
 --

 Key: GIRAPH-78
 URL: https://issues.apache.org/jira/browse/GIRAPH-78
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan

 In a graph such as 
 {noformat}a - b, z
 b - c, z
 c - a, z
 ...
 z{noformat}
 where vertices a, b, and c are hosted on one worker and z is hosted on another, 
 it would be good to cache instances of z so a, b, and c all point at the same 
 instance, rather than generating multiple copies of the same remote vertex 
 during vertex reading.  This is less important with primitive types and the 
 recent work done there, but very useful for more complex types.  Since the 
 vertex readers are in userland, it would be good to provide these facilities 
 as a library that implementing users can access.
