[jira] [Commented] (GIRAPH-159) Case insensitive file/directory name matching will produce errors on M/R jar unpack.

2012-03-21 Thread Brian Femiano (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13234479#comment-13234479
 ] 

Brian Femiano commented on GIRAPH-159:
--

Steps to reproduce:

1) Follow http://ssc.io/running-giraphs-unit-tests-in-pseudo-distributed-mode/ 
to run the single-node unit tests on OS X.
2) Before the M/R jobs can even begin, the JobTracker throws an IOException 
indicating it cannot mkdirs on 'license'. 
3) The operating system does not distinguish between the directory 'license' 
and the file 'LICENSE'. 


 Case insensitive file/directory name matching will produce errors on M/R jar 
 unpack. 
 -

 Key: GIRAPH-159
 URL: https://issues.apache.org/jira/browse/GIRAPH-159
 Project: Giraph
  Issue Type: Bug
  Components: build
Affects Versions: 0.2.0
 Environment: OSX 10.6.8
Reporter: Brian Femiano
Priority: Minor
 Attachments: GIRAPH-159.patch, compile.xml


 This only seems to affect platforms where file/directory naming conflicts 
 can arise from case-insensitive matches. 
  
 I was able to reproduce this by running the pseudo-distributed unit tests on OS X.
 This has affected other projects: 
 https://issues.apache.org/jira/browse/MAHOUT-780
 I've been able to reproduce this on my local OS X install with the following 
 error:
 https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/a201218000e956d3/cc6eca3ef9f80ff8
 Since LICENSE.txt contains the same content as the file LICENSE, I propose we 
 exclude any LICENSE matches found in the unpacked dependency jars
 when the Maven assembly phase hits 'jar-with-dependencies'. 
 I have a patch which moves the 'jar-with-dependencies' descriptor to an 
 external compile.xml file which has the proper excludes. This might also
 come in handy down the road should any additional tweaks be needed to the 
 compile phase. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-165) checkstyle error: 'conf' hides a field on line 154 of GraphRunner

2012-03-21 Thread Brian Femiano (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13234815#comment-13234815
 ] 

Brian Femiano commented on GIRAPH-165:
--

In GIRAPH-159 I found another workaround that avoids the local variable. Since 
the reference is only used once, I chain the call as 
job.getConfiguration().~ instead of conf.~

This would also work. 
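
To illustrate the workaround, here is a minimal sketch. FakeJob and FakeConfiguration are hypothetical stand-ins rather than the Hadoop or Giraph classes, and "example.key" is an invented setting; only the shape of the fix (chaining the getter instead of declaring a local named 'conf') comes from the comment above.

```java
import java.util.HashMap;
import java.util.Map;

public class HiddenFieldSketch {
    /** Hypothetical stand-in for a configuration object; not the Hadoop class. */
    static class FakeConfiguration {
        private final Map<String, String> values = new HashMap<>();
        void set(String key, String value) { values.put(key, value); }
        String get(String key) { return values.get(key); }
    }

    /** Hypothetical stand-in for a job that owns a configuration. */
    static class FakeJob {
        private final FakeConfiguration configuration = new FakeConfiguration();
        FakeJob() { configuration.set("example.key", "value"); }
        FakeConfiguration getConfiguration() { return configuration; }
    }

    /** The field that a same-named local variable would hide. */
    private final FakeConfiguration conf = new FakeConfiguration();

    /** Checkstyle's HiddenFieldCheck flags this: the local 'conf' hides the field. */
    String withLocalVariable(FakeJob job) {
        FakeConfiguration conf = job.getConfiguration();
        return conf.get("example.key");
    }

    /** Workaround: the reference is used once, so chain the call and skip the local. */
    String withChainedCall(FakeJob job) {
        return job.getConfiguration().get("example.key");
    }

    public static void main(String[] args) {
        HiddenFieldSketch sketch = new HiddenFieldSketch();
        System.out.println(sketch.withLocalVariable(new FakeJob()));
        System.out.println(sketch.withChainedCall(new FakeJob()));
    }
}
```

Both methods return the same value; only the first declares a local whose name collides with the field. Renaming the local would presumably satisfy the check as well.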

 checkstyle error:  'conf' hides a field on line 154 of GraphRunner
 

 Key: GIRAPH-165
 URL: https://issues.apache.org/jira/browse/GIRAPH-165
 Project: Giraph
  Issue Type: Bug
Reporter: Eugene Koontz
Priority: Minor
 Attachments: GIRAPH-165.patch


 full checkstyle error is 
 {code}
 <file name="/Users/ekoontz/giraph/src/main/java/org/apache/giraph/GiraphRunner.java">
 <error line="154" column="21" severity="error" message="&apos;conf&apos; hides a field." source="com.puppycrawl.tools.checkstyle.checks.coding.HiddenFieldCheck"/>
 </file>
 {code}





[jira] [Commented] (GIRAPH-159) Case insensitive file/directory name matching will produce errors on M/R jar unpack.

2012-03-23 Thread Brian Femiano (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13236660#comment-13236660
 ] 

Brian Femiano commented on GIRAPH-159:
--

Any luck recreating this? I have to keep this change local until it's 
committed. 

 Case insensitive file/directory name matching will produce errors on M/R jar 
 unpack. 
 -

 Key: GIRAPH-159
 URL: https://issues.apache.org/jira/browse/GIRAPH-159
 Project: Giraph
  Issue Type: Bug
  Components: build
Affects Versions: 0.2.0
 Environment: OSX 10.6.8
Reporter: Brian Femiano
Priority: Minor
 Attachments: GIRAPH-159.patch, compile.xml






[jira] [Commented] (GIRAPH-159) Case insensitive file/directory name matching will produce errors on M/R jar unpack.

2012-03-24 Thread Brian Femiano (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237609#comment-13237609
 ] 

Brian Femiano commented on GIRAPH-159:
--

I figured out what's causing it.

It's a result of adding my hbase dependency to the pom.xml 

<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase</artifactId>
  <version>0.92.1</version>
</dependency>

Compile the jar and you should see a new 'license' directory.

jar tvf giraph-0.2-SNAPSHOT-jar-with-dependencies.jar | grep -i 'license'

  1358 Mon Mar 16 00:31:16 EDT 2009 META-INF/LICENSE.txt
 11358 Mon Nov 19 00:16:46 EST 2007 META-INF/LICENSE
  1596 Mon Dec 20 14:42:08 EST 2010 LICENSE
 11560 Tue Aug 23 13:48:08 EDT 2011 META-INF/maven/org.xerial.snappy/snappy-java/LICENSE
     0 Mon Feb 07 21:38:56 EST 2011 META-INF/license/
  1592 Mon Feb 07 21:38:38 EST 2011 META-INF/license/LICENSE.base64.txt
 10174 Mon Feb 07 21:38:38 EST 2011 META-INF/license/LICENSE.commons-logging.txt
 10174 Mon Feb 07 21:38:38 EST 2011 META-INF/license/LICENSE.felix.txt
 26441 Mon Feb 07 21:38:38 EST 2011 META-INF/license/LICENSE.jboss-logging.txt
  1592 Mon Feb 07 21:38:38 EST 2011 META-INF/license/LICENSE.jsr166y.txt
  1465 Mon Feb 07 21:38:38 EST 2011 META-INF/license/LICENSE.jzlib.txt
 10174 Mon Feb 07 21:38:38 EST 2011 META-INF/license/LICENSE.log4j.txt
  1732 Mon Feb 07 21:38:38 EST 2011 META-INF/license/LICENSE.protobuf.txt
  1203 Mon Feb 07 21:38:38 EST 2011 META-INF/license/LICENSE.slf4j.txt
 11358 Fri Jan 21 17:06:30 EST 2011 LICENSE.txt
  1062 Tue Oct 25 10:29:02 EDT 2011 META-INF/jruby.home/lib/ruby/gems/1.8/gems/rake-0.8.7/MIT-LICENSE
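
A listing like the one above can be checked mechanically for names that would collide on a case-insensitive filesystem. The following is a rough sketch, not part of the attached patch: the class and method names are made up for illustration, and it only compares full entry paths, so it would miss a directory that is merely implied by its children.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class CaseCollisionSketch {
    /**
     * Returns "first <-> second" for every pair of entry names that differ
     * only by case once a trailing '/' (the directory marker) is ignored.
     */
    static List<String> findCollisions(List<String> entryNames) {
        Map<String, String> seen = new HashMap<>();
        List<String> collisions = new ArrayList<>();
        for (String name : entryNames) {
            // Strip the directory marker so 'license/' compares against 'LICENSE'.
            String path = name.endsWith("/")
                ? name.substring(0, name.length() - 1)
                : name;
            String key = path.toLowerCase(Locale.ROOT);
            String previous = seen.putIfAbsent(key, name);
            if (previous != null && !previous.equals(name)) {
                collisions.add(previous + " <-> " + name);
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        // A subset of the entries from the jar listing above.
        List<String> names = Arrays.asList(
            "META-INF/LICENSE.txt",
            "META-INF/LICENSE",
            "LICENSE",
            "META-INF/license/",
            "META-INF/license/LICENSE.base64.txt",
            "LICENSE.txt");
        for (String collision : findCollisions(names)) {
            System.out.println(collision);
        }
    }
}
```

For the sample entries, the only collision reported is between the file META-INF/LICENSE and the directory META-INF/license/.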


 Case insensitive file/directory name matching will produce errors on M/R jar 
 unpack. 
 -

 Key: GIRAPH-159
 URL: https://issues.apache.org/jira/browse/GIRAPH-159
 Project: Giraph
  Issue Type: Bug
  Components: build
Affects Versions: 0.2.0
 Environment: OSX 10.6.8
Reporter: Brian Femiano
 Attachments: GIRAPH-159.patch, compile.xml






[jira] [Commented] (GIRAPH-159) Case insensitive file/directory name matching will produce errors on M/R jar unpack.

2012-03-24 Thread Brian Femiano (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237616#comment-13237616
 ] 

Brian Femiano commented on GIRAPH-159:
--

giraph-0.2-SNAPSHOT-jar-with-dependencies.jar goes from being ~5MB in size to 
~34MB once all the
hbase dependencies are unpacked.  

mvn verify takes about 1.5 hours to run with the pseudo-distributed unit tests. 


 Case insensitive file/directory name matching will produce errors on M/R jar 
 unpack. 
 -

 Key: GIRAPH-159
 URL: https://issues.apache.org/jira/browse/GIRAPH-159
 Project: Giraph
  Issue Type: Bug
  Components: build
Affects Versions: 0.2.0
 Environment: OSX 10.6.8
Reporter: Brian Femiano
 Attachments: GIRAPH-159.patch, compile.xml






[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-03-25 Thread Brian Femiano (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237937#comment-13237937
 ] 

Brian Femiano commented on GIRAPH-153:
--

The Accumulo team is about to release a new version that should have a 
published Maven artifact. 

I'm concerned with how fat the jar becomes once the HBase core files are 
coalesced into the Giraph jar. 

It goes from a reasonable 5MB in size to 34MB. This causes quite a slowdown 
in the distributed unit tests. We may want to consider putting the HBase 
contrib code in a separate submodule, much the same way Hive does with the 
HBaseStorageHandler. Giraph users who want HBase support would then need the 
main giraph jar, the hbase-contrib jar, and any HBase dependencies. 

Thoughts?

 HBase/Accumulo Input and Output formats
 ---

 Key: GIRAPH-153
 URL: https://issues.apache.org/jira/browse/GIRAPH-153
 Project: Giraph
  Issue Type: New Feature
  Components: bsp
Affects Versions: 0.1.0
 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
Reporter: Brian Femiano

 Four abstract classes that wrap their respective delegate input/output 
 formats for easy hooks into vertex input format subclasses. I've included 
 some sample programs that show two very simple graph algorithms. I have a 
 graph generator that builds out a very simple directed structure, starting 
 with a few 'root' nodes. Root nodes are defined as nodes which are not listed 
 as a child anywhere in the graph. 
 Algorithm 1) AccumuloRootMarker.java  -- Accumulo as read/write source. 
 Every vertex starts thinking it's a root. At superstep 0, send a message down 
 to each child as a non-root notification. After superstep 1, only root nodes 
 will have never been messaged. 
 Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on A1 by 
 bundling the notification logic followed by root node propagation. Once we've 
 marked the appropriate nodes as roots, tell every child which roots it can be 
 traced back to via one or more spanning trees. This will take N + 2 
 supersteps where N is the maximum number of hops from any root to any leaf, 
 plus 2 supersteps for the initial root flagging. 
 I've included all relevant code plus DistributedCacheHelper.java for 
 recursive cache file and archive searches. It is more Hadoop-centric than 
 Giraph-specific, but these jobs use it so I figured why not commit here. 
 These have been tested through local JobRunner, pseudo-distributed on the 
 aforementioned hardware, and fully distributed on EC2. More details in the 
 comments.
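
The two algorithms described above can be sketched in plain Java, without the Giraph API. This is only an illustration under stated assumptions: the class and method names are hypothetical, the graph is an in-memory map from each vertex to its children (every vertex is assumed to appear as a key), and the message passing is simulated with ordinary loops rather than real supersteps.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class RootMarkerSketch {
    /** Algorithm 1: every vertex starts as a root; any vertex that is messaged by a parent is not. */
    static Set<String> findRoots(Map<String, List<String>> children) {
        Set<String> roots = new TreeSet<>(children.keySet());
        for (List<String> kids : children.values()) {
            roots.removeAll(kids); // a "non-root notification" reached these vertices
        }
        return roots;
    }

    /** Algorithm 2: tell every non-root vertex which roots it can be traced back to. */
    static Map<String, Set<String>> propagateRoots(Map<String, List<String>> children) {
        Map<String, Set<String>> known = new TreeMap<>();
        for (String vertex : children.keySet()) {
            known.put(vertex, new TreeSet<>());
        }
        // outgoing: vertex -> root ids it forwards to its children in this round
        Map<String, Set<String>> outgoing = new HashMap<>();
        for (String root : findRoots(children)) {
            outgoing.put(root, new TreeSet<>(Collections.singleton(root)));
        }
        while (!outgoing.isEmpty()) {
            Map<String, Set<String>> next = new HashMap<>();
            for (Map.Entry<String, Set<String>> sender : outgoing.entrySet()) {
                List<String> kids =
                    children.getOrDefault(sender.getKey(), Collections.<String>emptyList());
                for (String child : kids) {
                    for (String root : sender.getValue()) {
                        // Forward only newly learned roots, so the loop terminates.
                        if (known.computeIfAbsent(child, k -> new TreeSet<>()).add(root)) {
                            next.computeIfAbsent(child, k -> new TreeSet<>()).add(root);
                        }
                    }
                }
            }
            outgoing = next;
        }
        return known;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("a", Arrays.asList("b", "c"));
        graph.put("b", Arrays.asList("d"));
        graph.put("c", Collections.<String>emptyList());
        graph.put("d", Collections.<String>emptyList());
        graph.put("e", Arrays.asList("d"));
        System.out.println("roots: " + findRoots(graph));
        System.out.println("traced roots: " + propagateRoots(graph));
    }
}
```

In the sample graph, "a" and "e" are the roots, and "d" is reachable from both. Each pass of the while loop plays the role of one propagation superstep.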





[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-03-25 Thread Brian Femiano (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238111#comment-13238111
 ] 

Brian Femiano commented on GIRAPH-153:
--

Jakob, that's exactly along the lines of what I was thinking: a separate module 
that builds alongside the main giraph jar for extra functionality. Users can 
see which version of HBase we've compiled against. 

People can use this 'giraph-hbase-contrib.jar' by including giraph, a 
compatible version of HBase, and all related dependencies on the classpath.

To build, it will list giraph as a dependency in Maven. 

Let me finish up my unit tests this week and I'll post a patch along with the 
new files. 

The equivalent Accumulo support will take a little longer, pending their 
published Maven artifact. 



 HBase/Accumulo Input and Output formats
 ---

 Key: GIRAPH-153
 URL: https://issues.apache.org/jira/browse/GIRAPH-153
 Project: Giraph
  Issue Type: New Feature
  Components: bsp
Affects Versions: 0.1.0
 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
Reporter: Brian Femiano





[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-04-02 Thread Brian Femiano (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244645#comment-13244645
 ] 

Brian Femiano commented on GIRAPH-153:
--

Avery and Jakob, here's what I've got set up. I wanted to double-check this 
before moving forward with the project template.

1) I have a subproject 'giraph-formats-contrib' under the giraph trunk that 
depends on giraph 0.2-SNAPSHOT. Since this is not yet hosted in Maven Central, I 
installed it to my local repo. Note this is only necessary if you wish to build 
the subproject. Note also that this is not a Maven submodule that builds as a 
dependency. It's entirely standalone. 

2) The subproject hosts the Accumulo 1.4.0 and HBase 0.92.1 abstract 
input/output formats, and any future derived implementations. 

3) I copied the BspCase JUnit class into the subproject redundantly. The 
subproject builds and tests entirely standalone from the main giraph build, 
except for the dependency on giraph.jar. Unfortunately, the test classes are not 
included in the fat jar, so I copied one class into the build for future unit 
testing. 

I'm moving forward with the unit tests. If you guys think I should change 
anything, I'll happily rework my structure. The main thing I strove for was 
total separation from the main build. It simply uses Giraph as a jar 
dependency. 

 HBase/Accumulo Input and Output formats
 ---

 Key: GIRAPH-153
 URL: https://issues.apache.org/jira/browse/GIRAPH-153
 Project: Giraph
  Issue Type: New Feature
  Components: bsp
Affects Versions: 0.1.0
 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
Reporter: Brian Femiano





[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-04-11 Thread Brian Femiano (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252212#comment-13252212
 ] 

Brian Femiano commented on GIRAPH-153:
--

The patch contains the entire submodule, including HBase and Accumulo unit 
tests. It has been tested against Accumulo 1.4 (latest release) and HBase 
0.90.5 with ZooKeeper 3.3.3. It includes 4 abstract classes designed to be 
subclassed for reading from and writing to these datastores. 

The test package shows a few example subclasses which were needed to verify the 
behavior. For now they only run in local mode and will be disabled if the user 
supplies a jobtracker URI. 

It builds exactly as described in the earlier comments. Simply run 'mvn verify' 
and you'll get an isolated build. 

A few caveats:

1) Users must 'mvn install' the giraph artifact in their local repo, at least 
until we get something posted on Maven Central.
2) I modified the pom.xml to exclude the artifact from the rat plugin. I 
realize this is less than desirable, but I couldn't get anything running 
despite numerous attempts at fixing the 'too many unapproved licenses' 
issues. I'm interested to hear your thoughts.
3) BspCase is duplicated in my submodule, at least until Giraph has a test 
artifact. 
4) Initializing the AccumuloVertexInputFormat has some procedural limitations 
inherent in the format design when run with the GiraphJob. It really expects to 
have control of the Job instance. These can be difficult to track down. I tried 
to document these in my unit tests and provide some simple error wrapping to 
help notify users when they see these. 
5) No README.txt or any wiki entry yet. I figured I'd wait and see what 
feedback you guys had.

Hopefully people will find the submodule useful. 


 HBase/Accumulo Input and Output formats
 ---

 Key: GIRAPH-153
 URL: https://issues.apache.org/jira/browse/GIRAPH-153
 Project: Giraph
  Issue Type: New Feature
  Components: bsp
Affects Versions: 0.1.0
 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
Reporter: Brian Femiano
 Attachments: GIRAPH-153.patch






[jira] [Commented] (GIRAPH-180) Publish SNAPSHOTs and released artifacts in the Maven repository

2012-04-18 Thread Brian Femiano (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256716#comment-13256716
 ] 

Brian Femiano commented on GIRAPH-180:
--


It's not uncommon to see general release artifacts accompanied by supporting 
documentation stating which versions they've been tested against. The published 
artifact could reflect the most commonly used version it has been tested 
against (0.20.203). 
Special versions like FB Hadoop would still have to be built by hand from trunk.

Of course, since there are only a few variations for Giraph, it probably 
wouldn't hurt to host them all. 

 Publish SNAPSHOTs and released artifacts in the Maven repository
 

 Key: GIRAPH-180
 URL: https://issues.apache.org/jira/browse/GIRAPH-180
 Project: Giraph
  Issue Type: Improvement
  Components: build
Affects Versions: 0.1.0
Reporter: Paolo Castagna
Priority: Minor
   Original Estimate: 4h
  Remaining Estimate: 4h

 Currently Giraph uses Maven to drive its build.
 However, no Maven artifacts nor SNAPSHOTs are published in the Apache Maven 
 repository or Maven central.
 It would be useful to have Apache Giraph artifacts and SNAPSHOTs published 
 and enable people to use Giraph without recompiling themselves.
 Right now users can checkout Giraph, mvn install it and use this for their 
 dependency:
 <dependency>
   <groupId>org.apache.giraph</groupId>
   <artifactId>giraph</artifactId>
   <version>0.2-SNAPSHOT</version>
 </dependency>
 So, it's not that bad, but it can be better. :-)





[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-04-18 Thread Brian Femiano (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256750#comment-13256750
 ] 

Brian Femiano commented on GIRAPH-153:
--

Sorry HAMA-544

 HBase/Accumulo Input and Output formats
 ---

 Key: GIRAPH-153
 URL: https://issues.apache.org/jira/browse/GIRAPH-153
 Project: Giraph
  Issue Type: New Feature
  Components: bsp
Affects Versions: 0.1.0
 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
Reporter: Brian Femiano
 Attachments: GIRAPH-153.patch






[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-04-18 Thread Brian Femiano (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256849#comment-13256849
 ] 

Brian Femiano commented on GIRAPH-153:
--


| we'll always have a volunteer with hbase/accumulo knowledge to keep the code 
up to date

I will gladly do that for the foreseeable future, should this patch get 
accepted into Giraph. 

 HBase/Accumulo Input and Output formats
 ---

 Key: GIRAPH-153
 URL: https://issues.apache.org/jira/browse/GIRAPH-153
 Project: Giraph
  Issue Type: New Feature
  Components: bsp
Affects Versions: 0.1.0
 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
Reporter: Brian Femiano
 Attachments: GIRAPH-153.patch


 This patch adds four abstract classes that wrap their respective delegate 
 input/output formats for easy hooks into vertex input format subclasses. 
 I've included some sample programs that show two very simple graph 
 algorithms. I have a graph generator that builds out a very simple directed 
 structure, starting with a few 'root' nodes. Root nodes are defined as nodes 
 which are not listed as a child anywhere in the graph. 
 Algorithm 1) AccumuloRootMarker.java -- Accumulo as read/write source. 
 Every vertex starts out thinking it's a root. At superstep 0, each vertex 
 sends a message down to each child as a non-root notification. After 
 superstep 1, only root nodes will never have been messaged. 
 Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on 
 Algorithm 1 by bundling the notification logic with the root node 
 propagation that follows it. Once we've marked the appropriate nodes as 
 roots, tell every child which roots it can be traced back to via one or more 
 spanning trees. This will take N + 2 supersteps: N is the maximum number of 
 hops from any root to any leaf, and the extra 2 supersteps cover the initial 
 root flagging. 
 I've included all relevant code plus DistributedCacheHelper.java for 
 recursive cache file and archive searches. It is more Hadoop-centric than 
 Giraph-specific, but these jobs use it, so I figured why not commit it here. 
 These have been tested through the local JobRunner, in pseudo-distributed 
 mode on the aforementioned hardware, and fully distributed on EC2. More 
 details are in the comments.
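
 The two algorithms described above can be sketched in plain Java, roughly as 
 follows. This is only an illustration under stated assumptions, not the code 
 in GIRAPH-153.patch: it uses no Giraph, HBase, or Accumulo classes, it 
 simulates the message passing with in-memory maps, and the names 
 RootMarkerSketch, findRoots, and traceRoots are hypothetical. The graph is 
 assumed to be a map from each parent id to its list of child ids. 

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory simulation; not part of the patch or the Giraph API. */
final class RootMarkerSketch {

    /** Algorithm 1: every vertex starts as a root until a parent messages it. */
    static Set<String> findRoots(Map<String, List<String>> children) {
        // Every vertex, whether it appears as a parent or a child, starts as a root.
        Set<String> roots = new TreeSet<>(children.keySet());
        for (List<String> kids : children.values()) {
            roots.addAll(kids);
        }
        // Each parent sends a non-root notification to its children;
        // any vertex that receives one drops its root flag.
        for (List<String> kids : children.values()) {
            roots.removeAll(kids);
        }
        return roots;
    }

    /** Algorithm 2: push root ids down so each non-root learns its roots. */
    static Map<String, Set<String>> traceRoots(Map<String, List<String>> children) {
        Map<String, Set<String>> tracedTo = new HashMap<>();
        // Messages in flight: target vertex -> root ids sent to it last round.
        Map<String, Set<String>> inbox = new HashMap<>();
        for (String root : findRoots(children)) {
            for (String child : children.getOrDefault(root, Collections.emptyList())) {
                inbox.computeIfAbsent(child, k -> new TreeSet<>()).add(root);
            }
        }
        // Each loop iteration plays the role of one superstep.
        while (!inbox.isEmpty()) {
            Map<String, Set<String>> next = new HashMap<>();
            for (Map.Entry<String, Set<String>> entry : inbox.entrySet()) {
                String vertex = entry.getKey();
                Set<String> known = tracedTo.computeIfAbsent(vertex, k -> new TreeSet<>());
                for (String root : entry.getValue()) {
                    // Forward only root ids this vertex has not seen before,
                    // so the loop stops once no new information is flowing.
                    if (known.add(root)) {
                        for (String child
                                : children.getOrDefault(vertex, Collections.emptyList())) {
                            next.computeIfAbsent(child, k -> new TreeSet<>()).add(root);
                        }
                    }
                }
            }
            inbox = next;
        }
        return tracedTo;
    }
}
```

 For a graph with edges a->b, a->c, b->d, and e->d, findRoots returns a and e, 
 and traceRoots maps d to both roots. Root vertices themselves get no entry 
 in the traceRoots result, which is a choice made for this sketch only. 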





[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-04-19 Thread Brian Femiano (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257553#comment-13257553
 ] 

Brian Femiano commented on GIRAPH-153:
--

Sounds good, Avery. I'll add the checkstyle plugin and work on cleaning 
everything up a bit more.

The RAT integration could take me a little longer to troubleshoot. I thought I 
had included all the proper licenses, but maybe not. 

