[jira] [Commented] (GIRAPH-77) Coordinator should expose a web interface with progress, vertex region assignments, etc.

2011-11-17 Thread Arun Suresh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151910#comment-13151910
 ] 

Arun Suresh commented on GIRAPH-77:
---

This looks closely related to 
[GIRAPH-76|https://issues.apache.org/jira/browse/GIRAPH-76], since I will be 
refactoring GraphMapper. I can take this up as well if you haven't already 
started.

 Coordinator should expose a web interface with progress, vertex region 
 assignments, etc.
 

 Key: GIRAPH-77
 URL: https://issues.apache.org/jira/browse/GIRAPH-77
 Project: Giraph
  Issue Type: New Feature
Reporter: Jakob Homan

 It would be nice if the coordinator worker had a web interface that showed 
 progress, splits, etc. during job execution. Right now it would duplicate 
 information currently being exposed through task status, but with the move to 
 YARN, it will be a necessity.  It would be great if we could do this in a 
 modern way to avoid the screen-scraping, etc. currently used to get 
 information from most other Hadoop projects' web interfaces.  The coordinator 
 could announce its address at the beginning or via status updates.
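
 A minimal sketch of the kind of endpoint the description asks for, assuming
 the JDK's built-in com.sun.net.httpserver package and a JSON response so that
 clients do not need to screen-scrape. The class name, the /status path and
 the three status fields are hypothetical; this is not Giraph code.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

public class CoordinatorStatusServer {
    // Render the (hypothetical) progress fields as a small JSON object.
    static String toJson(long superstep, long verticesDone, long totalVertices) {
        return "{\"superstep\":" + superstep
                + ",\"verticesDone\":" + verticesDone
                + ",\"totalVertices\":" + totalVertices + "}";
    }

    // Start an HTTP server that answers /status with whatever the supplier returns.
    static HttpServer start(int port, Supplier<String> status) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/status", exchange -> {
            byte[] body = status.get().getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS for a free port; the coordinator could then announce it.
        HttpServer server = start(0, () -> toJson(3, 1200, 5000));
        System.out.println("status at http://localhost:"
                + server.getAddress().getPort() + "/status");
        server.stop(0);
    }
}
```

 Binding to port 0 and printing the chosen port mirrors the idea that the
 coordinator announces its address at startup or via status updates.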

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (GIRAPH-94) Loading vertex ranges from HBase

2011-11-17 Thread Claudio Martella (Created) (JIRA)
Loading vertex ranges from HBase


 Key: GIRAPH-94
 URL: https://issues.apache.org/jira/browse/GIRAPH-94
 Project: Giraph
  Issue Type: New Feature
Reporter: Claudio Martella
Assignee: Claudio Martella


Loading vertices from an HTable would be an option.

A possible schema for storing the graph would be Hexastore 
(http://www.vldb.org/pvldb/1/1453965.pdf). 
Also, since vertices that messages are sent to are created on the fly (if they 
don't already exist), we could potentially have an HBaseVertex that fetches the 
adjacency list and vertex value from HBase. That would be a kind of lazy-load 
approach, provided the initial split can be defined as an HBase query.
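
The lazy-load idea can be sketched without any HBase dependency. In the
following illustration the loader function stands in for an HBase lookup; the
class name and its methods are hypothetical and are not part of Giraph.

```java
import java.util.function.LongFunction;

public class LazyVertexSketch {
    private final long id;
    // Stands in for a fetch of the adjacency list from an external store.
    private final LongFunction<long[]> edgeLoader;
    // Remains null until the edges are first requested.
    private long[] edges;

    LazyVertexSketch(long id, LongFunction<long[]> edgeLoader) {
        this.id = id;
        this.edgeLoader = edgeLoader;
    }

    // Load the adjacency list on first access and cache it afterwards.
    long[] getEdges() {
        if (edges == null) {
            edges = edgeLoader.apply(id);
        }
        return edges;
    }

    boolean isLoaded() {
        return edges != null;
    }

    public static void main(String[] args) {
        LazyVertexSketch vertex =
                new LazyVertexSketch(7L, vertexId -> new long[] {vertexId + 1, vertexId + 2});
        System.out.println("loaded before access: " + vertex.isLoaded());
        System.out.println("edge count: " + vertex.getEdges().length);
        System.out.println("loaded after access: " + vertex.isLoaded());
    }
}
```

A vertex that never receives a message never triggers the loader, which is
the case where computations do not touch the whole graph.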






[jira] [Commented] (GIRAPH-91) Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)

2011-11-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151965#comment-13151965
 ] 

Hudson commented on GIRAPH-91:
--

Integrated in Giraph-trunk-Commit #36 (See 
[https://builds.apache.org/job/Giraph-trunk-Commit/36/])
GIRAPH-91: Large-memory improvements (Memory reduced vertex
implementation, fast failure, added settings). (aching)

aching : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1203130
Files : 
* /incubator/giraph/trunk/CHANGELOG
* 
/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java
* 
/incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
* 
/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
* 
/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
* 
/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/EdgeListVertex.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
* 
/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestJsonBase64Format.java
* /incubator/giraph/trunk/src/test/java/org/apache/giraph/graph
* 
/incubator/giraph/trunk/src/test/java/org/apache/giraph/graph/TestEdgeListVertex.java


 Large-memory improvements (Memory reduced vertex implementation, fast 
 failure, added settings) 
 ---

 Key: GIRAPH-91
 URL: https://issues.apache.org/jira/browse/GIRAPH-91
 Project: Giraph
  Issue Type: Improvement
Reporter: Avery Ching
Assignee: Avery Ching
 Attachments: GIRAPH-91.diff


 The current vertex implementation uses a HashMap to store the edges, which is 
 quite memory-heavy for large graphs.  The default settings in Giraph need to 
 be improved for large graphs and heaps of 20G.
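
 To illustrate why a HashMap is memory-heavy here, the following sketch keeps
 the edges of one vertex in two parallel primitive arrays instead of boxed map
 entries. It is an assumption-laden illustration with a hypothetical class
 name, not the EdgeListVertex from the attached patch.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

public class ArrayEdgeStore {
    private final long[] targetIds;   // target vertex IDs, sorted ascending
    private final float[] edgeValues; // edgeValues[i] belongs to targetIds[i]

    ArrayEdgeStore(long[] targets, float[] values) {
        if (targets.length != values.length) {
            throw new IllegalArgumentException("targets and values differ in length");
        }
        int n = targets.length;
        // Sort an index array so both arrays can be reordered consistently.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Long.compare(targets[a], targets[b]));
        targetIds = new long[n];
        edgeValues = new float[n];
        for (int i = 0; i < n; i++) {
            targetIds[i] = targets[order[i]];
            edgeValues[i] = values[order[i]];
        }
    }

    boolean hasEdge(long target) {
        return Arrays.binarySearch(targetIds, target) >= 0;
    }

    // Binary search replaces the hash lookup: O(log n) time, no per-edge objects.
    float edgeValue(long target) {
        int index = Arrays.binarySearch(targetIds, target);
        if (index < 0) {
            throw new NoSuchElementException("no edge to " + target);
        }
        return edgeValues[index];
    }

    int size() {
        return targetIds.length;
    }

    public static void main(String[] args) {
        ArrayEdgeStore edges =
                new ArrayEdgeStore(new long[] {30, 10, 20}, new float[] {3f, 1f, 2f});
        System.out.println(edges.size() + " edges, value to 10 = " + edges.edgeValue(10));
    }
}
```

 The trade-off is that lookups become logarithmic and mutation requires
 copying, in exchange for avoiding one boxed key, one boxed value and one map
 entry object per edge.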





[jira] [Created] (GIRAPH-95) vertex resolution expects MutableVertex instead of BasicVertex

2011-11-17 Thread Claudio Martella (Created) (JIRA)
vertex resolution expects MutableVertex instead of BasicVertex
--

 Key: GIRAPH-95
 URL: https://issues.apache.org/jira/browse/GIRAPH-95
 Project: Giraph
  Issue Type: Bug
  Components: graph
Reporter: Claudio Martella


At the beginning of a superstep, when a message has been sent to a non-existent 
vertex, a new vertex is created. The ID of this new vertex is set through 
setVertexId(), which belongs to MutableVertex. It should use initialize() instead.

See BspRPCCommunication:948 (on my local trunk)





[jira] [Created] (GIRAPH-96) Support for Graphs with Huge adjacency lists

2011-11-17 Thread Arun Suresh (Created) (JIRA)
Support for Graphs with Huge adjacency lists


 Key: GIRAPH-96
 URL: https://issues.apache.org/jira/browse/GIRAPH-96
 Project: Giraph
  Issue Type: Improvement
Reporter: Arun Suresh


Currently the vertex initialize() method is passed the complete adjacency list 
as a HashMap. All the current concrete implementations of Vertex iterate over 
the adjacency list and recreate new data structures within the Vertex instance 
to hold/manipulate the adjacency list. This would cease to be feasible once the 
size of the adjacency list becomes really huge.

I propose storing the adjacency list and all vertex information (and incoming 
messages?) in a distributed data store such as HBase. The adjacency list can 
be lazily loaded via HBase Scans. I was thinking of an HBase schema where the 
row ID is a concatenation of VertexID+OutboundVertexId with a single column 
containing the edge.
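
A rough sketch of the proposed row-key layout, using a sorted in-memory map
as a stand-in for an HBase table so that it runs without HBase. It assumes
fixed-width 8-byte vertex IDs and unsigned lexicographic key ordering; the
class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

public class EdgeRowKeySketch {
    // Row key = 8-byte source vertex ID followed by 8-byte target vertex ID.
    static byte[] rowKey(long sourceId, long targetId) {
        return ByteBuffer.allocate(16).putLong(sourceId).putLong(targetId).array();
    }

    // Sorted map ordered by unsigned byte comparison, standing in for the table.
    static TreeMap<byte[], Double> newTable() {
        return new TreeMap<byte[], Double>((a, b) -> Arrays.compareUnsigned(a, b));
    }

    // Range scan over every key that starts with the source vertex ID.
    static List<Long> outEdges(TreeMap<byte[], Double> table, long sourceId) {
        byte[] first = rowKey(sourceId, 0L);  // prefix followed by 0x00 bytes
        byte[] last = rowKey(sourceId, -1L);  // prefix followed by 0xFF bytes
        List<Long> targets = new ArrayList<>();
        for (byte[] key : table.subMap(first, true, last, true).keySet()) {
            targets.add(ByteBuffer.wrap(key).getLong(8));
        }
        return targets;
    }

    public static void main(String[] args) {
        TreeMap<byte[], Double> table = newTable();
        table.put(rowKey(1, 5), 0.5);
        table.put(rowKey(2, 1), 0.1);
        table.put(rowKey(1, 2), 0.2);
        table.put(rowKey(1, 3), 0.3);
        System.out.println("out-edges of vertex 1: " + outEdges(table, 1));
    }
}
```

Because all edges of a vertex share a key prefix, they are stored
contiguously and can be read with one range scan rather than one lookup per
edge.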





[jira] [Commented] (GIRAPH-96) Support for Graphs with Huge adjacency lists

2011-11-17 Thread Arun Suresh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-96?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152087#comment-13152087
 ] 

Arun Suresh commented on GIRAPH-96:
---

It looks like Claudio beat me to a similar suggestion in 
[GIRAPH-94|https://issues.apache.org/jira/browse/GIRAPH-94].

My proposal was more for a standard means of storing vertex/adjacency list 
information. The Giraph framework would handle the storage and would expose 
APIs that the Vertex reader can use to store the information as it reads the 
graph. The user would then not be required to subclass a Vertex class and 
implement the initialize() method. All adjacency list/vertex manipulation would 
go through the common data store.





[jira] [Commented] (GIRAPH-96) Support for Graphs with Huge adjacency lists

2011-11-17 Thread Gianmarco De Francisci Morales (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-96?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152101#comment-13152101
 ] 

Gianmarco De Francisci Morales commented on GIRAPH-96:
--

In my opinion this would make things too complex.
I wouldn't like to keep my graph in HBase to run Giraph on it.
Also, this makes HBase a dependency.

I agree that this is a nice option to have but I wouldn't make it the default.

Finally, a similar goal could be attained by streaming edges to disk and 
reading them with sequential scans when performing supersteps. This requires no 
network connection and should be much faster.
You just need good out-of-core data structures and algorithms.
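
As a rough illustration of the streaming alternative, the sketch below writes
edges to a local file and processes them in a single sequential pass. The
fixed 16-byte record format (two longs per edge) and all names are
assumptions made for the example.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

public class EdgeSpillSketch {
    static Path tempFile() {
        try {
            Path file = Files.createTempFile("edges", ".bin");
            file.toFile().deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stream edges to disk as (source, target) pairs of longs.
    static void writeEdges(Path file, long[][] edges) {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            for (long[] edge : edges) {
                out.writeLong(edge[0]);
                out.writeLong(edge[1]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // One sequential scan: count out-degrees without holding the edges in memory.
    static Map<Long, Integer> outDegrees(Path file) {
        Map<Long, Integer> degrees = new TreeMap<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            while (true) {
                long source;
                try {
                    source = in.readLong();
                } catch (EOFException endOfFile) {
                    break;
                }
                in.readLong(); // target ID, not needed for the degree count
                degrees.merge(source, 1, Integer::sum);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return degrees;
    }

    public static void main(String[] args) {
        Path file = tempFile();
        writeEdges(file, new long[][] {{1, 2}, {1, 3}, {2, 1}});
        System.out.println("out-degrees: " + outDegrees(file));
    }
}
```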

 Support for Graphs with Huge adjacency lists
 

 Key: GIRAPH-96
 URL: https://issues.apache.org/jira/browse/GIRAPH-96
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Affects Versions: 0.70.0
Reporter: Arun Suresh

 Currently the vertex initialize() method is passed the complete adjacency 
 list as a HashMap. All the current concrete implementations of Vertex iterate 
 over the adjacency list and recreate new data structures within the Vertex 
 instance to hold/manipulate the adjacency list. This would cease to be 
 feasible once the size of the adjacency list becomes really huge.
 I propose storing the adjacency list and all vertex information (and incoming 
 messages?) in a distributed data store such as HBase. The adjacency list can 
 be lazily loaded via HBase Scans. I was thinking of an HBase schema where the 
 row ID is a concatenation of VertexID+OutboundVertexId with a single column 
 containing the edge.





[jira] [Updated] (GIRAPH-68) Implement a Graph Generator

2011-11-17 Thread Hyunsik Choi (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-68?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunsik Choi updated GIRAPH-68:
---

Attachment: GIRAPH-68_2.patch

Avery,

Thank you for the review.

I think that the GraphGenerator is necessary to test the IO-related subsystems 
as a whole. For example, *InputFormat and Partitioners can be exercised with a 
generated data set instead of PseudoRandomVertexInputFormat.

As you mentioned, I modified PageRank/RandomMessageBenchmark to use a specified 
InputFormat and an input path. If the input format and input path are not 
given, the benchmarks behave as in the current implementation and use 
PseudoRandomVertexInputFormat.

 Implement a Graph Generator
 ---

 Key: GIRAPH-68
 URL: https://issues.apache.org/jira/browse/GIRAPH-68
 Project: Giraph
  Issue Type: New Feature
  Components: benchmark
Affects Versions: 0.70.0
Reporter: Hyunsik Choi
Assignee: Hyunsik Choi
 Attachments: GIRAPH-68_1.patch, GIRAPH-68_2.patch


 To provide users with benchmark environments and to test the input/output 
 system of Giraph thoroughly, we need a graph generator. We will enable the 
 graph generator to generate various kinds of graph data sets by specifying a 
 VertexInputFormat and a VertexOutputFormat.





[jira] [Commented] (GIRAPH-68) Implement a Graph Generator

2011-11-17 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152185#comment-13152185
 ] 

Hyunsik Choi commented on GIRAPH-68:


I missed the Javadoc. I will reattach the patch with Javadoc included.





[jira] [Resolved] (GIRAPH-91) Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)

2011-11-17 Thread Avery Ching (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-91?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching resolved GIRAPH-91.
---

Resolution: Fixed

Thanks for the quick review, Claudio.  Hudson +1'ed it as well, so I am resolving this.





[jira] [Commented] (GIRAPH-96) Support for Graphs with Huge adjacency lists

2011-11-17 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-96?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152203#comment-13152203
 ] 

Avery Ching commented on GIRAPH-96:
---

The general issue of overloading our memory and working out of core has been 
discussed a little in GIRAPH-45 as well.  I suppose you could implement a 
BasicVertex that loaded everything on demand from HBase, though I suspect it 
would be a little slow, depending on the application.





[jira] [Commented] (GIRAPH-86) Simplify boolean expressions in ZooKeeperExt::createExt

2011-11-17 Thread Attila Csordas (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-86?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152284#comment-13152284
 ] 

Attila Csordas commented on GIRAPH-86:
--

It didn't help yesterday, but I checked out the latest code today and now 
rat:check is OK. This time I cannot build the project because of this error:

Failed to execute goal on project giraph: Could not resolve dependencies for 
project org.apache.giraph:giraph:jar:0.70: Could not find artifact 
org.apache.hadoop:hadoop-common:jar:0.21.0-dev-SNAPSHOT -> [Help 1]

Do we need that at all?

 Simplify boolean expressions in ZooKeeperExt::createExt
 ---

 Key: GIRAPH-86
 URL: https://issues.apache.org/jira/browse/GIRAPH-86
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.70.0
Reporter: Jakob Homan
Assignee: Attila Csordas
  Labels: newbie
 Attachments: GIRAPH-86.patch, pom.diff


 In ZooKeeperExt::createExt there are two instances of {{recursive==false}} 
 that can be simplified to !recursive.
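
 The simplification can be illustrated with a small stand-alone example. The
 two methods below are hypothetical and do not reproduce the body of
 createExt; they only show that the two forms have the same truth table.

```java
public class RecursiveFlagExample {
    // Verbose form: explicit comparison against a boolean literal.
    static boolean mustFailVerbose(boolean recursive, boolean parentMissing) {
        return recursive == false && parentMissing;
    }

    // Simplified form: logical negation, same result for every input.
    static boolean mustFailSimplified(boolean recursive, boolean parentMissing) {
        return !recursive && parentMissing;
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        for (boolean recursive : values) {
            for (boolean parentMissing : values) {
                if (mustFailVerbose(recursive, parentMissing)
                        != mustFailSimplified(recursive, parentMissing)) {
                    throw new AssertionError("forms disagree");
                }
            }
        }
        System.out.println("both forms agree on all inputs");
    }
}
```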





[jira] [Commented] (GIRAPH-96) Support for Graphs with Huge adjacency lists

2011-11-17 Thread Claudio Martella (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-96?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152291#comment-13152291
 ] 

Claudio Martella commented on GIRAPH-96:


It is indeed a nice discussion. The amount of data to be read is the same after 
all, but we're talking about random I/O here. It's a possibility. Also, 
in reply to Gianmarco, the idea of having an HBase InputReader is the same as 
the current discussion on supporting Hive, Pig and HCatalog. If you store your 
data in HBase it can be quite useful, as much as it is not for MR. The 
lazy approach could be worth investigating, and in any case it would be 
necessary only with huge graphs or, as in my case, with computations that 
don't necessarily touch the whole graph.

Good out-of-core data structures/maps are difficult to find; maybe 
LinkedIn's Krati or LevelDB (but I guess we'd have license issues there).





[jira] [Commented] (GIRAPH-86) Simplify boolean expressions in ZooKeeperExt::createExt

2011-11-17 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-86?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152328#comment-13152328
 ] 

Jakob Homan commented on GIRAPH-86:
---

+1.  I've committed this.  Not sure why you ended up with a reference to Hadoop 
0.21.  That's an unstable, unsupported version.  I can't find a reference to it 
in our pom.  In terms of the rat check, rather than making rat ignore the file 
we should really just fix GIRAPH-20, which is triggering this.  Thanks for the 
contribution, Attila!

 Simplify boolean expressions in ZooKeeperExt::createExt
 ---

 Key: GIRAPH-86
 URL: https://issues.apache.org/jira/browse/GIRAPH-86
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.70.0
Reporter: Jakob Homan
Assignee: Attila Csordas
  Labels: newbie
 Fix For: 0.70.0

 Attachments: GIRAPH-86.patch, pom.diff


 In ZooKeeperExt::createExt there are two instances of {{recursive==false}} 
 that can be simplified to !recursive.





[jira] [Commented] (GIRAPH-86) Simplify boolean expressions in ZooKeeperExt::createExt

2011-11-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-86?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152337#comment-13152337
 ] 

Hudson commented on GIRAPH-86:
--

Integrated in Giraph-trunk-Commit #37 (See 
[https://builds.apache.org/job/Giraph-trunk-Commit/37/])
GIRAPH-86. Simplify boolean expressions in ZooKeeperExt::createExt. 
Contributed by Attila Csordas.

jghoman : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1203369
Files : 
* /incubator/giraph/trunk/CHANGELOG
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/zk/ZooKeeperExt.java






[jira] [Commented] (GIRAPH-77) Coordinator should expose a web interface with progress, vertex region assignments, etc.

2011-11-17 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152348#comment-13152348
 ] 

Jakob Homan commented on GIRAPH-77:
---

I've got a bit of code, but enough to block progress.  I can see this as 
separate work from 76. I was looking at starting up a webserver on the 
coordinator to track all the info it's normally difficult to get during job 
execution. If this comes up as part of 76, go for it. 





[jira] [Updated] (GIRAPH-92) Need outputformat for just vertex ID and value

2011-11-17 Thread Jakob Homan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-92?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated GIRAPH-92:
--

Attachment: GIRAPH-92-2.patch

Updated patch per Avery's comments. Will commit this.

 Need outputformat for just vertex ID and value
 --

 Key: GIRAPH-92
 URL: https://issues.apache.org/jira/browse/GIRAPH-92
 Project: Giraph
  Issue Type: New Feature
  Components: lib
Affects Versions: 0.70.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.70.0

 Attachments: GIRAPH-92-2.patch, GIRAPH-92.patch


 We should have a text output format that just writes out the vertex ID and 
 value without its edges:
 {noformat}index.html 0.9423{noformat}
 This would be particularly helpful for further processing by, for instance, 
 Pig.
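
 A minimal sketch of the line format described above. It assumes a single tab
 between ID and value (the issue only shows whitespace) and uses hypothetical
 names; it is not the code from the attached patch.

```java
public class IdValueLineSketch {
    // Assumption: one tab separates the vertex ID from its value.
    static String formatLine(String vertexId, double vertexValue) {
        return vertexId + "\t" + vertexValue;
    }

    // Split a line back into ID and value, as a downstream consumer might.
    static String[] parseLine(String line) {
        return line.split("\t", 2);
    }

    public static void main(String[] args) {
        System.out.println(formatLine("index.html", 0.9423));
    }
}
```

 A tab-delimited line is convenient because tools such as Pig can load it
 with their default text loaders.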





[jira] [Commented] (GIRAPH-83) Is Vertex correct yet?

2011-11-17 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152356#comment-13152356
 ] 

Jakob Homan commented on GIRAPH-83:
---

bq. I think it should probably actually be called Vertex, because everything 
is a BasicVertex currently, so it makes sense instead to say everything is 
a Vertex.
Absolutely agreed.

bq. We can factor off lots of stuff into other classes, but the question comes 
down to how does the user writing their algorithm get access to them? How is it 
all wired together? You want compute() to get passed some state that you have 
right when you need it, instead of either going with inheritance or 
composition? That could be nice, I think, as long as we package it all up into 
a minimal set of *Context-like objects to carry around.
Correct, this is what I'm getting at.

bq. In what way are the out edges of a vertex managed by the framework 
currently?
In that Vertex is responsible for maintaining the destEdgeMap for an 
implementation of Vertex, rather than implementers having to do this 
themselves.  For each compute invocation, the vertex shouldn't assume anything 
about its outgoing edges, as they may have been mutated since the last call.

 Is Vertex correct yet?
 --

 Key: GIRAPH-83
 URL: https://issues.apache.org/jira/browse/GIRAPH-83
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan

 I'm seeing a number of people run into oddities with Vertex and am thinking 
 we may not have it quite correct yet...





[jira] [Commented] (GIRAPH-45) Improve the way to keep outgoing messages

2011-11-17 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152360#comment-13152360
 ] 

Jakob Homan commented on GIRAPH-45:
---

On a side note, is it worth considering messages to be immutable (or provide a 
separate annotation for these)? This would help with message de-duplication, 
which could be a significant help in some algorithms.  One would only need to 
keep one copy of the message going to a particular worker, regardless of the 
number of vertices it is bound for.
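
A sketch of the de-duplication idea, assuming immutable message payloads (a
String here) and a hypothetical partitioning of vertices to workers by ID
modulo the worker count. None of these names come from Giraph.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MessageDedupSketch {
    // One outgoing packet per worker: a single payload plus all destinations there.
    static final class Packet {
        final String payload;
        final List<Long> destinations = new ArrayList<>();

        Packet(String payload) {
            this.payload = payload;
        }
    }

    // Hypothetical partitioning: vertex ID modulo the number of workers.
    static int workerFor(long vertexId, int numWorkers) {
        return (int) Math.floorMod(vertexId, (long) numWorkers);
    }

    // Group destinations by worker so the payload is kept once per worker.
    static Map<Integer, Packet> group(String payload, long[] destinations, int numWorkers) {
        Map<Integer, Packet> perWorker = new TreeMap<>();
        for (long destination : destinations) {
            perWorker
                    .computeIfAbsent(workerFor(destination, numWorkers), w -> new Packet(payload))
                    .destinations.add(destination);
        }
        return perWorker;
    }

    public static void main(String[] args) {
        Map<Integer, Packet> packets = group("rank=0.5", new long[] {1, 2, 3, 4, 5, 7}, 2);
        for (Map.Entry<Integer, Packet> entry : packets.entrySet()) {
            System.out.println("worker " + entry.getKey() + " <- " + entry.getValue().payload
                    + " for vertices " + entry.getValue().destinations);
        }
    }
}
```

Sharing one payload object is only safe because the payload cannot change
after it is queued, which is why immutability matters for this optimization.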

 Improve the way to keep outgoing messages
 -

 Key: GIRAPH-45
 URL: https://issues.apache.org/jira/browse/GIRAPH-45
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Hyunsik Choi
Assignee: Hyunsik Choi

 As discussed in GIRAPH-12 (http://goo.gl/CE32U), I think there is a 
 potential out-of-memory problem when the rate of message generation 
 is higher than the rate of message flushing (or the network bandwidth).
 To overcome this problem, we need a more eager strategy for message flushing or 
 some approach to spilling messages to disk.
 The link below is Dmitriy's suggestion.
 https://issues.apache.org/jira/browse/GIRAPH-12?focusedCommentId=13116253&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13116253
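
 One possible shape of an eager flushing strategy, sketched under the
 assumption that a simple message-count threshold is used. The sink stands in
 for a network send or a spill to disk; the class is hypothetical and is not
 Dmitriy's suggestion or Giraph code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class EagerFlushBuffer<M> {
    private final int maxBuffered;
    private final Consumer<List<M>> sink; // network send or disk spill in a real system
    private List<M> buffer = new ArrayList<>();

    EagerFlushBuffer(int maxBuffered, Consumer<List<M>> sink) {
        if (maxBuffered < 1) {
            throw new IllegalArgumentException("maxBuffered must be at least 1");
        }
        this.maxBuffered = maxBuffered;
        this.sink = sink;
    }

    // Buffer a message and flush as soon as the threshold is reached.
    void add(M message) {
        buffer.add(message);
        if (buffer.size() >= maxBuffered) {
            flush();
        }
    }

    // Hand the current batch to the sink and start a fresh buffer.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<M> batch = buffer;
        buffer = new ArrayList<>();
        sink.accept(batch);
    }

    int buffered() {
        return buffer.size();
    }

    public static void main(String[] args) {
        EagerFlushBuffer<String> out =
                new EagerFlushBuffer<>(2, batch -> System.out.println("flushing " + batch));
        for (int i = 0; i < 5; i++) {
            out.add("m" + i);
        }
        out.flush(); // flush the remainder at the end of the superstep
    }
}
```

 Memory use is bounded by the threshold only if the sink keeps up; a
 blocking sink would add back-pressure on message generation.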





[jira] [Commented] (GIRAPH-83) Is Vertex correct yet?

2011-11-17 Thread Jake Mannix (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152373#comment-13152373
 ] 

Jake Mannix commented on GIRAPH-83:
---

bq. In that Vertex is responsible for maintaining the destEdgeMap for an 
implementation of Vertex, rather than implementers having to do this 
themselves. For each compute invocation, the vertex shouldn't assume anything 
about its outgoing edges, as they may have been mutated since the last call.

You mean that in the current Vertex class, we have the map of edges right 
there?  It's not really in the framework, it's in the superclass, but ok, 
you're saying we *shouldn't* take care of the bookkeeping about this, and leave 
it always up to the implementations (like the way that 
LongDoubleFloatDoubleVertex does it with primitives)?  Or that there should be 
some other structure which handles them?





[jira] [Commented] (GIRAPH-83) Is Vertex correct yet?

2011-11-17 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152383#comment-13152383
 ] 

Jakob Homan commented on GIRAPH-83:
---

I'm saying we should be responsible for maintaining it (since we have to mutate 
it), but that _maybe_ it shouldn't be in Vertex itself, just to have a cleaner 
delineation. But Avery makes a good point and I'm not completely sold on this 
aspect myself. How many different memory-efficient implementations of Vertex 
can we expect to have?

 Is Vertex correct yet?
 --

 Key: GIRAPH-83
 URL: https://issues.apache.org/jira/browse/GIRAPH-83
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan

 I'm seeing a number of people run into oddities with Vertex and am thinking 
 we may not have it quite correct yet...





[jira] [Commented] (GIRAPH-77) Coordinator should expose a web interface with progress, vertex region assignments, etc.

2011-11-17 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152386#comment-13152386
 ] 

Jakob Homan commented on GIRAPH-77:
---

s/enough/not enough/g. oy.

 Coordinator should expose a web interface with progress, vertex region 
 assignments, etc.
 

 Key: GIRAPH-77
 URL: https://issues.apache.org/jira/browse/GIRAPH-77
 Project: Giraph
  Issue Type: New Feature
Reporter: Jakob Homan

 It would be nice if the coordinator worker had a web interface that showed 
 progress, splits, etc. during job execution. Right now it would duplicate 
 information currently being exposed through task status, but with the move to 
 YARN, it will be a necessity.  It would be great if we could do this in a 
 modern way to avoid the screen-scraping, etc. currently used to get 
 information from most other Hadoop projects' web interfaces.  The coordinator 
 could announce its address at the beginning or via status updates.





[jira] [Assigned] (GIRAPH-84) Simplify boolean expressions in BspRecordReader

2011-11-17 Thread Attila Csordas (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-84?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Csordas reassigned GIRAPH-84:


Assignee: Attila Csordas

 Simplify boolean expressions in BspRecordReader
 ---

 Key: GIRAPH-84
 URL: https://issues.apache.org/jira/browse/GIRAPH-84
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Attila Csordas
  Labels: newbie

 Twice in BspRecordReader boolean expressions are evaluated with == and can be 
 simplified to just one-liners or variable evaluation.





[jira] [Commented] (GIRAPH-84) Simplify boolean expressions in BspRecordReader

2011-11-17 Thread Shaunak Kashyap (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-84?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152469#comment-13152469
 ] 

Shaunak Kashyap commented on GIRAPH-84:
---

I believe that style goes against the coding conventions for this project (see 
http://svn.apache.org/repos/asf/incubator/giraph/trunk/CODE_CONVENTIONS).

I wonder if using the ternary operator like so is acceptable:

{code}
return (seenRecord ? 1f : 0f);
{code}
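
For reference, a minimal sketch of what the simplification might look like. The exact code in BspRecordReader is not quoted in this thread, so the "before" method below is only an assumed shape, and the class name ProgressSketch and the helper markSeen() are hypothetical names made up for illustration. Only the seenRecord field and the ternary return come from the snippet above.

```java
// Hypothetical stand-in for illustration; this is not the actual BspRecordReader source.
final class ProgressSketch {
    /** True once the single record has been consumed (field name taken from the snippet above). */
    private boolean seenRecord = false;

    /** Assumed "before" shape: a boolean compared against a literal with ==. */
    float getProgressVerbose() {
        if (seenRecord == true) {
            return 1f;
        } else {
            return 0f;
        }
    }

    /** Ternary form proposed in the comment above. */
    float getProgress() {
        return (seenRecord ? 1f : 0f);
    }

    /** Marks the record as consumed (hypothetical helper, not part of any real API). */
    void markSeen() {
        seenRecord = true;
    }

    public static void main(String[] args) {
        ProgressSketch reader = new ProgressSketch();
        // Both forms should agree before and after the record is consumed.
        System.out.println(reader.getProgressVerbose() == reader.getProgress());
        reader.markSeen();
        System.out.println(reader.getProgressVerbose() == reader.getProgress());
    }
}
```

Both methods return the same value for either state of seenRecord, so the change would be purely stylistic; whether the ternary fits the project's conventions is the open question here.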

 Simplify boolean expressions in BspRecordReader
 ---

 Key: GIRAPH-84
 URL: https://issues.apache.org/jira/browse/GIRAPH-84
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Attila Csordas
  Labels: newbie

 Twice in BspRecordReader boolean expressions are evaluated with == and can be 
 simplified to just one-liners or variable evaluation.





[jira] [Commented] (GIRAPH-45) Improve the way to keep outgoing messages

2011-11-17 Thread Claudio Martella (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152477#comment-13152477
 ] 

Claudio Martella commented on GIRAPH-45:


I was struggling with this and the annotation could actually be the elegant 
solution to this. Avery asked for a review one month ago about this. We could 
clone those that are not annotated as a general approach.

 Improve the way to keep outgoing messages
 -

 Key: GIRAPH-45
 URL: https://issues.apache.org/jira/browse/GIRAPH-45
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Hyunsik Choi
Assignee: Hyunsik Choi

 As discussed in GIRAPH-12 (http://goo.gl/CE32U), I think there is a potential 
 out-of-memory problem when the rate of message generation is higher than the 
 rate of message flush (or network bandwidth).
 To overcome this problem, we need a more eager strategy for message flushing 
 or some approach to spill messages to disk.
 The below link is Dmitriy's suggestion.
 https://issues.apache.org/jira/browse/GIRAPH-12?focusedCommentId=13116253&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13116253





[jira] [Commented] (GIRAPH-84) Simplify boolean expressions in BspRecordReader

2011-11-17 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-84?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152488#comment-13152488
 ] 

Avery Ching commented on GIRAPH-84:
---

Ternary is fine with me.  I think we use it in the codebase.  We should 
probably add it to the coding conventions...unless someone objects.

 Simplify boolean expressions in BspRecordReader
 ---

 Key: GIRAPH-84
 URL: https://issues.apache.org/jira/browse/GIRAPH-84
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Attila Csordas
  Labels: newbie

 Twice in BspRecordReader boolean expressions are evaluated with == and can be 
 simplified to just one-liners or variable evaluation.





[jira] [Commented] (GIRAPH-77) Coordinator should expose a web interface with progress, vertex region assignments, etc.

2011-11-17 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152583#comment-13152583
 ] 

Hyunsik Choi commented on GIRAPH-77:


I also think that this feature is necessary because we would not depend on 
MapReduce anymore after we port Giraph to YARN.

 Coordinator should expose a web interface with progress, vertex region 
 assignments, etc.
 

 Key: GIRAPH-77
 URL: https://issues.apache.org/jira/browse/GIRAPH-77
 Project: Giraph
  Issue Type: New Feature
Reporter: Jakob Homan

 It would be nice if the coordinator worker had a web interface that showed 
 progress, splits, etc. during job execution. Right now it would duplicate 
 information currently being exposed through task status, but with the move to 
 YARN, it will be a necessity.  It would be great if we could do this in a 
 modern way to avoid the screen-scraping, etc. currently used to get 
 information from most other Hadoop projects' web interfaces.  The coordinator 
 could announce its address at the beginning or via status updates.
