[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-04-02 Thread Brian Femiano (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244645#comment-13244645
 ] 

Brian Femiano commented on GIRAPH-153:
--

Avery and Jakob, here's what I've got set up. I wanted to double-check this 
before moving forward with the project template.

1) I have a subproject 'giraph-formats-contrib' under the giraph trunk that 
depends on giraph 0.2-SNAPSHOT. Since this is not yet hosted in maven central, 
I installed it to my local repo. This is only necessary if you wish to build 
the subproject. Note this is not a maven submodule that builds as a dependency. 
It's entirely standalone. 

2) The subproject hosts the Accumulo 1.4.0 and HBase 0.92.1 abstract 
input/output formats, and any future derived implementations. 

3) I copied the BspCase JUnit class into the subproject redundantly. The 
subproject builds and tests entirely standalone from the main giraph build, 
except for the dependency on giraph.jar. Unfortunately, the test classes are 
not included in the fat jar, so I copied one class into the build for future 
unit testing. 

I'm moving forward with the unit tests. If you guys think I should change 
anything I'll happily rework my structure. The main thing I strove for was 
total separation from the main build. It simply uses Giraph as a jar 
dependency. 

 HBase/Accumulo Input and Output formats
 ---

 Key: GIRAPH-153
 URL: https://issues.apache.org/jira/browse/GIRAPH-153
 Project: Giraph
  Issue Type: New Feature
  Components: bsp
Affects Versions: 0.1.0
 Environment: Single host OSX 10.6.8 2.2GHz Intel i7, 8GB
Reporter: Brian Femiano

 Four abstract classes that wrap their respective delegate input/output
 formats for easy hooks into vertex input format subclasses. I've included
 some sample programs that show two very simple graph algorithms. I have a
 graph generator that builds out a very simple directed structure, starting
 with a few 'root' nodes. Root nodes are defined as nodes which are not
 listed as a child anywhere in the graph.

 Algorithm 1) AccumuloRootMarker.java -- Accumulo as read/write source.
 Every vertex starts thinking it's a root. At superstep 0, send a message
 down to each child as a non-root notification. After superstep 1, only root
 nodes will have never been messaged.

 Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on A1
 by bundling the notification logic followed by root node propagation. Once
 we've marked the appropriate nodes as roots, tell every child which roots
 it can be traced back to via one or more spanning trees. This will take
 N + 2 supersteps where N is the maximum number of hops from any root to any
 leaf, plus 2 supersteps for the initial root flagging.

 I've included all relevant code plus DistributedCacheHelper.java for
 recursive cache file and archive searches. It is more hadoop centric than
 giraph, but these jobs use it so I figured why not commit here.

 These have been tested through local JobRunner, pseudo-distributed on the
 aforementioned hardware, and full distributed on EC2. More details in the
 comments.
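
 For readers who want to see the two algorithms above in miniature, here is a
 rough sketch of the idea. It is not the attached AccumuloRootMarker or
 TableRootMarker code and it does not touch the Giraph, Accumulo, or HBase
 APIs. It only simulates the supersteps over an in-memory adjacency map. The
 class name RootMarkerSketch, both method names, and the choice to have a root
 list itself as its own root are assumptions made for illustration.

```java
import java.util.*;

// Illustrative sketch only: simulates the supersteps in memory and does not
// use the Giraph, Accumulo, or HBase APIs. All names here are hypothetical.
class RootMarkerSketch {

    /** Root flagging: a vertex is a root if no parent ever messages it. */
    public static Set<String> findRoots(Map<String, List<String>> children) {
        Set<String> vertices = new TreeSet<>(children.keySet());
        Set<String> notified = new HashSet<>();
        for (List<String> kids : children.values()) {
            vertices.addAll(kids);    // include vertices that appear only as a child
            notified.addAll(kids);    // superstep 0: parent sends a non-root notification
        }
        vertices.removeAll(notified); // superstep 1: never messaged means root
        return vertices;
    }

    /** Propagation: tell every reachable vertex which roots it traces back to. */
    public static Map<String, Set<String>> traceRoots(Map<String, List<String>> children) {
        Map<String, Set<String>> rootsOf = new TreeMap<>();
        Map<String, Set<String>> frontier = new HashMap<>();
        for (String root : findRoots(children)) {
            // Assumption of this sketch: a root lists itself as its own root.
            rootsOf.computeIfAbsent(root, k -> new TreeSet<>()).add(root);
            frontier.computeIfAbsent(root, k -> new TreeSet<>()).add(root);
        }
        // Each loop iteration stands in for one superstep of message passing.
        while (!frontier.isEmpty()) {
            Map<String, Set<String>> next = new HashMap<>();
            for (Map.Entry<String, Set<String>> e : frontier.entrySet()) {
                List<String> kids =
                    children.getOrDefault(e.getKey(), Collections.emptyList());
                for (String child : kids) {
                    for (String root : e.getValue()) {
                        // Forward a root ID only the first time a child learns it.
                        if (rootsOf.computeIfAbsent(child, k -> new TreeSet<>()).add(root)) {
                            next.computeIfAbsent(child, k -> new TreeSet<>()).add(root);
                        }
                    }
                }
            }
            frontier = next;
        }
        return rootsOf;
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = new HashMap<>();
        g.put("a", Arrays.asList("b", "c"));
        g.put("b", Arrays.asList("d"));
        g.put("e", Arrays.asList("d"));
        System.out.println(findRoots(g));   // [a, e]
        System.out.println(traceRoots(g));  // {a=[a], b=[a], c=[a], d=[a, e], e=[e]}
    }
}
```

 In the sample graph, d is reachable from both a and e, so it ends up with
 both roots. Vertices that no root can reach (for example, a cycle with no
 incoming edge from outside) never receive a message and are absent from the
 result in this sketch.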

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Status report

2012-04-02 Thread Jakob Homan
I'll do it tonight.

On Mon, Apr 2, 2012 at 4:14 PM, Owen O'Malley omal...@apache.org wrote:
 All,
  We need a status report for the last quarter by Wednesday. Anyone
 want to take the first shot at it?

 -- Owen


Re: Status report

2012-04-02 Thread Avery Ching

Thanks Jakob.

Avery

On 4/2/12 4:31 PM, Jakob Homan wrote:

I'll do it tonight.

On Mon, Apr 2, 2012 at 4:14 PM, Owen O'Malley omal...@apache.org wrote:

All,
  We need a status report for the last quarter by Wednesday. Anyone
want to take the first shot at it?

-- Owen




[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-04-02 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244972#comment-13244972
 ] 

Jakob Homan commented on GIRAPH-153:


bq. I have a subproject 'giraph-formats-contrib'
This sounds like a good name as we can also stash the Hive work Avery has done 
there.

bq. Note this is not a maven submodule that builds as a dependency. It's 
entirely standalone. 
What are the advantages of this approach compared to a maven submodule (keeping 
in mind that I'm a Maven moron)? 





Re: Status report

2012-04-02 Thread Owen O'Malley
That looks great, Jakob. I've put that into the wiki for now until we
have further edits.

-- Owen


Re: Status report

2012-04-02 Thread Avery Ching

Looks good to me as well.

Avery

On 4/2/12 10:17 PM, Owen O'Malley wrote:

That looks great, Jakob. I've put that into the wiki for now until we
have further edits.

-- Owen