[jira] [Commented] (GIRAPH-159) Case insensitive file/directory name matching will produce errors on M/R jar unpack.
[ https://issues.apache.org/jira/browse/GIRAPH-159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234479#comment-13234479 ]

Brian Femiano commented on GIRAPH-159:

Steps to run.
1) Follow http://ssc.io/running-giraphs-unit-tests-in-pseudo-distributed-mode/ to run the single-node unit tests on OS X.
2) Before the M/R jobs can even begin, the JobTracker will throw an IOException indicating it cannot mkdirs on 'license'.
3) The operating system does not distinguish between the directory 'license' and the file 'LICENSE'.

Case insensitive file/directory name matching will produce errors on M/R jar unpack.

Key: GIRAPH-159
URL: https://issues.apache.org/jira/browse/GIRAPH-159
Project: Giraph
Issue Type: Bug
Components: build
Affects Versions: 0.2.0
Environment: OSX 10.6.8
Reporter: Brian Femiano
Priority: Minor
Attachments: GIRAPH-159.patch, compile.xml

This only seems to affect platforms where file/directory naming conflicts can arise from case-insensitive matches. I was able to reproduce it by running the pseudo-distributed unit tests on OS X.

This has affected other projects: https://issues.apache.org/jira/browse/MAHOUT-780

I've been able to reproduce this on my local OS X install with the following error: https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/a201218000e956d3/cc6eca3ef9f80ff8

Since LICENSE.txt contains the same content as the file LICENSE, I propose we exclude any LICENSE matches found in the unpacked dependency jars when the maven assembly phase hits 'jar-with-dependencies'. I have a patch which moves the 'jar-with-dependencies' descriptor to an external compile.xml file which has the proper excludes. This might also come in handy down the road should any additional tweaks be needed to the compile phase.

-- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-165) checkstyle error: 'conf' hides a field on line 154 of GraphRunner
[ https://issues.apache.org/jira/browse/GIRAPH-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234815#comment-13234815 ]

Brian Femiano commented on GIRAPH-165:

In GIRAPH-159 I found another workaround that avoids the local variable. Since the reference is only used once, I chain the call together: job.getConfiguration().~ instead of conf.~

This would also work.

checkstyle error: 'conf' hides a field on line 154 of GraphRunner

Key: GIRAPH-165
URL: https://issues.apache.org/jira/browse/GIRAPH-165
Project: Giraph
Issue Type: Bug
Reporter: Eugene Koontz
Priority: Minor
Attachments: GIRAPH-165.patch

full checkstyle error is
{code}
<file name="/Users/ekoontz/giraph/src/main/java/org/apache/giraph/GiraphRunner.java">
<error line="154" column="21" severity="error" message="&apos;conf&apos; hides a field." source="com.puppycrawl.tools.checkstyle.checks.coding.HiddenFieldCheck"/>
</file>
{code}
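The workaround described in the comment can be sketched roughly as follows. This is only an illustration, not the contents of GIRAPH-165.patch: `Configuration` and `Job` here are small hypothetical stand-ins for the Hadoop classes so the example compiles on its own, and `RunnerSketch` and the key `example.vertex.class` are made-up names.

```java
import java.util.HashMap;
import java.util.Map;

class RunnerSketch {
    // A field named conf, similar to the one the checkstyle report refers to.
    private final Configuration conf = new Configuration();

    Job run(String vertexClassName) {
        Job job = new Job();
        // HiddenFieldCheck would flag the following, because the local hides the field:
        //   Configuration conf = job.getConfiguration();
        //   conf.set("example.vertex.class", vertexClassName);
        // Chaining the call avoids introducing the local variable at all.
        job.getConfiguration().set("example.vertex.class", vertexClassName);
        return job;
    }

    public static void main(String[] args) {
        Job job = new RunnerSketch().run("ExampleVertex");
        System.out.println(job.getConfiguration().get("example.vertex.class"));
    }
}

// Hypothetical stand-in for Hadoop's Configuration (assumption, not the real class).
class Configuration {
    private final Map<String, String> values = new HashMap<>();
    void set(String key, String value) { values.put(key, value); }
    String get(String key) { return values.get(key); }
}

// Hypothetical stand-in for Hadoop's Job (assumption, not the real class).
class Job {
    private final Configuration configuration = new Configuration();
    Configuration getConfiguration() { return configuration; }
}
```

Chaining only makes sense when the configuration is touched once, as the comment notes; with several uses, renaming the local variable is probably the more readable fix.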
[jira] [Commented] (GIRAPH-159) Case insensitive file/directory name matching will produce errors on M/R jar unpack.
[ https://issues.apache.org/jira/browse/GIRAPH-159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236660#comment-13236660 ]

Brian Femiano commented on GIRAPH-159:

Any luck recreating this? I have to keep this change local until it's committed.
[jira] [Commented] (GIRAPH-159) Case insensitive file/directory name matching will produce errors on M/R jar unpack.
[ https://issues.apache.org/jira/browse/GIRAPH-159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237609#comment-13237609 ]

Brian Femiano commented on GIRAPH-159:

I figured out what's causing it. It's a result of adding my hbase dependency to the pom.xml:

  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase</artifactId>
    <version>0.92.1</version>
  </dependency>

Compile the jar and you should see a new 'license' directory.

jar tvf giraph-0.2-SNAPSHOT-jar-with-dependencies.jar | grep -i 'license'

  1358 Mon Mar 16 00:31:16 EDT 2009 META-INF/LICENSE.txt
  11358 Mon Nov 19 00:16:46 EST 2007 META-INF/LICENSE
  1596 Mon Dec 20 14:42:08 EST 2010 LICENSE
  11560 Tue Aug 23 13:48:08 EDT 2011 META-INF/maven/org.xerial.snappy/snappy-java/LICENSE
  0 Mon Feb 07 21:38:56 EST 2011 META-INF/license/
  1592 Mon Feb 07 21:38:38 EST 2011 META-INF/license/LICENSE.base64.txt
  10174 Mon Feb 07 21:38:38 EST 2011 META-INF/license/LICENSE.commons-logging.txt
  10174 Mon Feb 07 21:38:38 EST 2011 META-INF/license/LICENSE.felix.txt
  26441 Mon Feb 07 21:38:38 EST 2011 META-INF/license/LICENSE.jboss-logging.txt
  1592 Mon Feb 07 21:38:38 EST 2011 META-INF/license/LICENSE.jsr166y.txt
  1465 Mon Feb 07 21:38:38 EST 2011 META-INF/license/LICENSE.jzlib.txt
  10174 Mon Feb 07 21:38:38 EST 2011 META-INF/license/LICENSE.log4j.txt
  1732 Mon Feb 07 21:38:38 EST 2011 META-INF/license/LICENSE.protobuf.txt
  1203 Mon Feb 07 21:38:38 EST 2011 META-INF/license/LICENSE.slf4j.txt
  11358 Fri Jan 21 17:06:30 EST 2011 LICENSE.txt
  1062 Tue Oct 25 10:29:02 EDT 2011 META-INF/jruby.home/lib/ruby/gems/1.8/gems/rake-0.8.7/MIT-LICENSE
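The kind of conflict visible in the listing (a file META-INF/LICENSE next to a directory META-INF/license/) can be detected ahead of time. Below is a minimal sketch, assuming the entry names have already been collected into a list, for example from `jar tvf` output. `CaseCollisionCheck` and `findCollisions` are hypothetical names for illustration and are not part of the GIRAPH-159 patch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

class CaseCollisionCheck {
    /**
     * Returns the lower-cased paths that are used both as a file and as a
     * directory once case is ignored. Entries ending in '/' are directories.
     */
    static SortedSet<String> findCollisions(List<String> entryNames) {
        Set<String> files = new HashSet<>();
        Set<String> dirs = new HashSet<>();
        for (String name : entryNames) {
            String lower = name.toLowerCase(Locale.ROOT);
            boolean isDir = lower.endsWith("/");
            String path = isDir ? lower.substring(0, lower.length() - 1) : lower;
            if (isDir) {
                dirs.add(path);
            } else {
                files.add(path);
            }
            // Every parent of an entry is implicitly a directory as well.
            for (int i = path.lastIndexOf('/'); i > 0; i = path.lastIndexOf('/', i - 1)) {
                dirs.add(path.substring(0, i));
            }
        }
        SortedSet<String> collisions = new TreeSet<>(files);
        collisions.retainAll(dirs);
        return collisions;
    }

    public static void main(String[] args) {
        // A subset of the entries from the listing above.
        List<String> entries = Arrays.asList(
            "META-INF/LICENSE.txt",
            "META-INF/LICENSE",
            "LICENSE",
            "META-INF/license/",
            "META-INF/license/LICENSE.base64.txt",
            "LICENSE.txt");
        System.out.println(findCollisions(entries)); // prints [meta-inf/license]
    }
}
```

On a case-sensitive file system the reported paths are harmless; on a case-insensitive one, unpacking would try to create a directory where a file of the same name already exists.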
[jira] [Commented] (GIRAPH-159) Case insensitive file/directory name matching will produce errors on M/R jar unpack.
[ https://issues.apache.org/jira/browse/GIRAPH-159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237616#comment-13237616 ]

Brian Femiano commented on GIRAPH-159:

giraph-0.2-SNAPSHOT-jar-with-dependencies.jar goes from being ~5MB in size to ~34MB once all the hbase dependencies are unpacked. mvn verify takes about 1.5 hours to run with the pseudo-distributed unit tests.
[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats
[ https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237937#comment-13237937 ]

Brian Femiano commented on GIRAPH-153:

The Accumulo team is about to release a new version that should have a published maven artifact.

I'm concerned with how fat the jar becomes once the HBase core files are coalesced into the Giraph jar. It goes from a reasonable 5MB in size to 34MB. This causes quite a slowdown with the distributed unit tests. We may want to consider having the HBase contrib in a separate submodule, much the same way Hive does with the HBaseStorageHandler. Giraph users that desire HBase support will need the main giraph jar, the hbase-contrib jar, and any hbase dependencies. Thoughts?

HBase/Accumulo Input and Output formats

Key: GIRAPH-153
URL: https://issues.apache.org/jira/browse/GIRAPH-153
Project: Giraph
Issue Type: New Feature
Components: bsp
Affects Versions: 0.1.0
Environment: Single host OSX 10.6.8 2.2GHz Intel i7, 8GB
Reporter: Brian Femiano

Four abstract classes that wrap their respective delegate input/output formats for easy hooks into vertex input format subclasses. I've included some sample programs that show two very simple graph algorithms. I have a graph generator that builds out a very simple directed structure, starting with a few 'root' nodes. Root nodes are defined as nodes which are not listed as a child anywhere in the graph.

Algorithm 1) AccumuloRootMarker.java -- Accumulo as read/write source. Every vertex starts thinking it's a root. At superstep 0, send a message down to each child as a non-root notification. After superstep 1, only root nodes will have never been messaged.

Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on A1 by bundling the notification logic followed by root node propagation. Once we've marked the appropriate nodes as roots, tell every child which roots it can be traced back to via one or more spanning trees. This will take N + 2 supersteps where N is the maximum number of hops from any root to any leaf, plus 2 supersteps for the initial root flagging.

I've included all relevant code plus DistributedCacheHelper.java for recursive cache file and archive searches. It is more Hadoop-centric than Giraph, but these jobs use it so I figured why not commit here.

These have been tested through local JobRunner, pseudo-distributed on the aforementioned hardware, and fully distributed on EC2. More details in the comments.
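The two algorithms in the description can be approximated outside Giraph with a small in-memory simulation. The sketch below is not AccumuloRootMarker or TableRootMarker and does not use the Giraph API; `RootMarkerSketch`, `findRoots` and `propagateRoots` are hypothetical names, the graph is assumed to be a map from each vertex to its children with every vertex present as a key, and each loop iteration in `propagateRoots` loosely plays the role of one superstep.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class RootMarkerSketch {
    /** Algorithm 1: a vertex is a root if no other vertex lists it as a child. */
    static Set<String> findRoots(Map<String, List<String>> children) {
        // Every vertex starts out assuming it is a root.
        Set<String> roots = new TreeSet<>(children.keySet());
        // Each vertex notifies its children; a notified vertex is not a root.
        for (List<String> kids : children.values()) {
            roots.removeAll(kids);
        }
        return roots;
    }

    /** Algorithm 2: tell every vertex which roots it can be traced back to. */
    static Map<String, Set<String>> propagateRoots(Map<String, List<String>> children) {
        Map<String, Set<String>> known = new TreeMap<>();
        for (String vertex : children.keySet()) {
            known.put(vertex, new TreeSet<>());
        }
        // Pending messages: target vertex -> root ids. Roots start by messaging themselves.
        Map<String, Set<String>> inbox = new HashMap<>();
        for (String root : findRoots(children)) {
            inbox.computeIfAbsent(root, k -> new TreeSet<>()).add(root);
        }
        while (!inbox.isEmpty()) {
            Map<String, Set<String>> next = new HashMap<>();
            for (Map.Entry<String, Set<String>> message : inbox.entrySet()) {
                String vertex = message.getKey();
                Set<String> fresh = new TreeSet<>(message.getValue());
                fresh.removeAll(known.get(vertex));
                if (fresh.isEmpty()) {
                    continue; // nothing new to forward
                }
                known.get(vertex).addAll(fresh);
                for (String child : children.get(vertex)) {
                    next.computeIfAbsent(child, k -> new TreeSet<>()).addAll(fresh);
                }
            }
            inbox = next;
        }
        return known;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("a", Arrays.asList("b", "c"));
        graph.put("b", Arrays.asList("d"));
        graph.put("c", Arrays.asList("d"));
        graph.put("d", Collections.<String>emptyList());
        graph.put("e", Arrays.asList("d"));
        System.out.println(findRoots(graph));      // prints [a, e]
        System.out.println(propagateRoots(graph)); // prints {a=[a], b=[a], c=[a], d=[a, e], e=[e]}
    }
}
```

In this sketch a root records itself as its own root, which is a simplification chosen here rather than something the description specifies. Forwarding only newly learned roots keeps the loop finite even if the input contains a cycle.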
[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats
[ https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238111#comment-13238111 ]

Brian Femiano commented on GIRAPH-153:

Jakob, that's exactly along the lines of what I was thinking. A separate module that builds alongside the main giraph jar for extra functionality. Users can see which version of HBase we've compiled against. People can use this 'giraph-hbase-contrib.jar' by including giraph, a compatible version of HBase, and all related dependencies on the classpath. To build, it will list giraph as a dependency in maven.

Let me finish up my unit tests this week and I'll post a patch along with the new files. The equivalent Accumulo support will take a little longer pending their published maven artifact.
[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats
[ https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244645#comment-13244645 ]

Brian Femiano commented on GIRAPH-153:

Avery and Jakob, here's what I've got set up. I wanted to double-check this before moving forward with the project template.

1) I have a subproject 'giraph-formats-contrib' under the giraph trunk that depends on giraph 0.2-SNAPSHOT. Since this is not yet hosted in maven central I installed it to my local repo. Note this is only necessary if you wish to build the subproject. Note also that this is not a maven submodule that builds as a dependency. It's entirely standalone.
2) The subproject hosts the Accumulo 1.4.0 and HBase 0.92.1 abstract input/output formats, and any future derived implementations.
3) I copied the BspCase JUnit class into the subproject redundantly. The subproject builds and tests entirely standalone from the main giraph build, except for its dependency on giraph.jar. Unfortunately, the test classes are not included in the fat jar, so I copied one class into the build for future unit testing.

I'm moving forward with the unit tests. If you think I should change anything I'll happily rework my structure. The main thing I strove for was total separation from the main build. It simply uses Giraph as a jar dependency.
[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats
[ https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252212#comment-13252212 ]

Brian Femiano commented on GIRAPH-153:

Patch contains the entire submodule including HBase and Accumulo unit tests. It has been tested against Accumulo 1.4 (latest release) and HBase 0.90.5 with Zookeeper 3.3.3. It includes 4 abstract classes designed to help subclass reading/writing to and from these datastores. The test package shows a few example subclasses which were needed to verify the behavior. For now they only run in local mode and will be disabled if the user supplies a jobtracker URI.

It builds exactly as described in the earlier comments. Simply run 'mvn verify' and you'll get an isolated build.

A few caveats:
1) Users must 'mvn install' the giraph artifact in their local repo, at least until we get something posted on maven central.
2) I modified the pom.xml to exclude the artifact from the rat plugin. I realize this is less than desirable, but I couldn't get anything running despite numerous attempts at fixing the 'too many unapproved licenses' issues. I'm interested to hear your thoughts.
3) Duplicate BspCase in my submodule, at least until Giraph has a test artifact.
4) Initializing the AccumuloVertexInputFormat has some procedural limitations inherent in the format design when run with the GiraphJob. It really expects to have control of the Job instance. These can be difficult to track down. I tried to document these in my unit tests and provide some simple error wrapping to help notify users when they see these.
5) No README.txt or any wiki entry yet. I figured I'd wait and see what feedback you guys had.

Hopefully people will find the submodule useful.

Attachments: GIRAPH-153.patch
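The idea of an abstract class that wraps a delegate reader and leaves only the row-to-vertex conversion to subclasses could look roughly like the sketch below. It is not the code from GIRAPH-153.patch and does not use the Giraph, HBase or Accumulo APIs: `AbstractVertexReader`, `ColonSeparatedVertexReader`, `SimpleVertex` and the "id:child1,child2" row format are all hypothetical, and a plain `Iterator` stands in for the datastore's record reader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class DelegateFormatSketch {
    /** Drains a reader built over the given rows into a list of vertices. */
    static List<SimpleVertex> readAll(List<String> rows) {
        AbstractVertexReader<String> reader = new ColonSeparatedVertexReader(rows.iterator());
        List<SimpleVertex> vertices = new ArrayList<>();
        while (reader.hasNext()) {
            vertices.add(reader.next());
        }
        return vertices;
    }

    public static void main(String[] args) {
        for (SimpleVertex v : readAll(Arrays.asList("a:b,c", "b:", "c:b"))) {
            System.out.println(v.id + " -> " + v.targets);
        }
    }
}

/** Hypothetical minimal vertex: an id plus the ids of its children. */
final class SimpleVertex {
    final String id;
    final List<String> targets;

    SimpleVertex(String id, List<String> targets) {
        this.id = id;
        this.targets = targets;
    }
}

/** Wraps a delegate row source; subclasses only supply the conversion. */
abstract class AbstractVertexReader<R> {
    private final Iterator<R> delegate; // stand-in for a datastore record reader

    protected AbstractVertexReader(Iterator<R> delegate) {
        this.delegate = delegate;
    }

    boolean hasNext() { return delegate.hasNext(); }

    SimpleVertex next() { return toVertex(delegate.next()); }

    /** The single hook a subclass has to implement. */
    protected abstract SimpleVertex toVertex(R row);
}

/** Example subclass for rows shaped like "id:child1,child2". */
class ColonSeparatedVertexReader extends AbstractVertexReader<String> {
    ColonSeparatedVertexReader(Iterator<String> rows) {
        super(rows);
    }

    @Override
    protected SimpleVertex toVertex(String row) {
        String[] parts = row.split(":", 2);
        List<String> targets = new ArrayList<>();
        if (parts.length == 2 && !parts[1].isEmpty()) {
            targets.addAll(Arrays.asList(parts[1].split(",")));
        }
        return new SimpleVertex(parts[0], targets);
    }
}
```

Keeping the delegate private means a subclass cannot interfere with iteration and only needs to know the row type, which is presumably what makes such wrappers convenient to extend.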
[jira] [Commented] (GIRAPH-180) Publish SNAPSHOTs and released artifacts in the Maven repository
[ https://issues.apache.org/jira/browse/GIRAPH-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256716#comment-13256716 ]

Brian Femiano commented on GIRAPH-180:

It's not uncommon to see general release artifacts and some supporting documentation telling you what versions they're tested against. The published artifact could reflect the most commonly used version it's been tested against (0.20.203). Special versions like FB Hadoop would still have to be built by hand from trunk. Of course, since there are only a few different variations for Giraph, it probably wouldn't hurt to host them all.

Publish SNAPSHOTs and released artifacts in the Maven repository

Key: GIRAPH-180
URL: https://issues.apache.org/jira/browse/GIRAPH-180
Project: Giraph
Issue Type: Improvement
Components: build
Affects Versions: 0.1.0
Reporter: Paolo Castagna
Priority: Minor
Original Estimate: 4h
Remaining Estimate: 4h

Currently Giraph uses Maven to drive its build. However, neither Maven artifacts nor SNAPSHOTs are published in the Apache Maven repository or Maven central. It would be useful to have Apache Giraph artifacts and SNAPSHOTs published, to enable people to use Giraph without recompiling it themselves.

Right now users can check out Giraph, mvn install it, and use this for their dependency:

  <dependency>
    <groupId>org.apache.giraph</groupId>
    <artifactId>giraph</artifactId>
    <version>0.2-SNAPSHOT</version>
  </dependency>

So, it's not that bad, but it can be better. :-)
[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats
[ https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256750#comment-13256750 ]

Brian Femiano commented on GIRAPH-153:

Sorry, HAMA-544
[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats
[ https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256849#comment-13256849 ]

Brian Femiano commented on GIRAPH-153:

| we'll always have a volunteer with hbase/accumulo knowledge to keep the code up to date

I will gladly do that for the foreseeable future, should this patch get accepted into Giraph.
[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats
[ https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257553#comment-13257553 ]

Brian Femiano commented on GIRAPH-153:

Sounds good, Avery. I'll add the checkstyle plugin and work on cleaning everything up a bit more. The rat integration could take me a little longer to troubleshoot. I thought I had included all the proper licenses, but maybe not.