[jira] [Commented] (GIRAPH-182) Provide SequenceFileVertexOutputFormat as an available OutputFormat
[ https://issues.apache.org/jira/browse/GIRAPH-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251884#comment-13251884 ]

Jakob Homan commented on GIRAPH-182:
---

Hey Pradeep. Thanks for the contribution. Review:
* Apache prohibits author tags to ensure that all the code is viewed as the whole community's responsibility.
* SimpleSequenceFileVertexOutputFormat: So far our convention has been to include the type names in the input/output format class names. This is a bit verbose and may not be the right approach, but it's probably best to follow it in this patch. Also, can you provide Javadoc for it?
* SequenceFileVertexOutputFormat: Any reason not to use the more standard M type variable? Some Javadoc for the class would be nice here too.
* Is it possible to add a unit test that just verifies we get out of the file what we put in?

Provide SequenceFileVertexOutputFormat as an available OutputFormat
---
Key: GIRAPH-182
URL: https://issues.apache.org/jira/browse/GIRAPH-182
Project: Giraph
Issue Type: New Feature
Components: lib
Reporter: Pradeep Gollakota
Assignee: Pradeep Gollakota
Priority: Minor
Attachments: GIRAPH-182-1.patch

SequenceFiles are heavily used in Hadoop. We should provide SequenceFileVertexOutputFormat. Since SequenceFileVertexInputFormat is already provided, it makes sense to also provide a mirroring OutputFormat.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-179) BspServiceMaster's PathFilter can be simplified
[ https://issues.apache.org/jira/browse/GIRAPH-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251940#comment-13251940 ]

Hudson commented on GIRAPH-179:
---

Integrated in Giraph-trunk-Commit #101 (See [https://builds.apache.org/job/Giraph-trunk-Commit/101/])
GIRAPH-179: BspServiceMaster's PathFilter can be simplified. Contributed by Devaraj K. (Revision 1324991)

Result = SUCCESS
jghoman : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1324991
Files :
* /incubator/giraph/trunk/CHANGELOG
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceMaster.java

BspServiceMaster's PathFilter can be simplified
---
Key: GIRAPH-179
URL: https://issues.apache.org/jira/browse/GIRAPH-179
Project: Giraph
Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Devaraj K
Priority: Trivial
Labels: newbie
Fix For: 0.2.0
Attachments: GIRAPH-179.patch

{code}
/**
 * Only get the finalized checkpoint files
 */
public static class FinalizedCheckpointPathFilter implements PathFilter {
  @Override
  public boolean accept(Path path) {
    if (path.getName().endsWith(
        BspService.CHECKPOINT_FINALIZED_POSTFIX)) {
      return true;
    }
    return false;
  }
}
{code}
We can simplify this by eliminating the if statement and returning the result of {{endsWith()}} directly.
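The simplification proposed in GIRAPH-179 amounts to returning the boolean from {{endsWith()}} instead of branching on it. A minimal sketch, with assumptions: so that it compiles without Hadoop on the classpath, it filters plain file names through a hypothetical NameFilter interface standing in for Hadoop's PathFilter/Path, and the suffix is a placeholder rather than the actual value of BspService.CHECKPOINT_FINALIZED_POSTFIX.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for Hadoop's org.apache.hadoop.fs.PathFilter.
 * It filters on a file name (what Path.getName() would return) so this
 * sketch compiles without Hadoop on the classpath.
 */
interface NameFilter {
  boolean accept(String name);
}

class FinalizedCheckpointFilterSketch {
  /** Placeholder suffix; not necessarily the real BspService.CHECKPOINT_FINALIZED_POSTFIX. */
  static final String CHECKPOINT_FINALIZED_POSTFIX = ".finalized";

  /** Only accept finalized checkpoint files: the endsWith() result is returned directly. */
  static class FinalizedCheckpointPathFilter implements NameFilter {
    @Override
    public boolean accept(String name) {
      return name.endsWith(CHECKPOINT_FINALIZED_POSTFIX);
    }
  }

  /** Keeps only the names the filter accepts, e.g. from a directory listing. */
  static List<String> finalizedOnly(List<String> names) {
    NameFilter filter = new FinalizedCheckpointPathFilter();
    List<String> kept = new ArrayList<>();
    for (String name : names) {
      if (filter.accept(name)) {
        kept.add(name);
      }
    }
    return kept;
  }
}
```

Against the real Hadoop interface, the method body would reduce to the single statement {{return path.getName().endsWith(BspService.CHECKPOINT_FINALIZED_POSTFIX);}}.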
[jira] [Commented] (GIRAPH-182) Provide SequenceFileVertexOutputFormat as an available OutputFormat
[ https://issues.apache.org/jira/browse/GIRAPH-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252127#comment-13252127 ]

Pradeep Gollakota commented on GIRAPH-182:
---

Thanks for the review, Jakob.
* I thought I had removed the author tags. My IDE must have inserted them back when I wasn't looking. I'll double-check before submitting further patches.
* I agree that the name is a bit too verbose. One option would be to abbreviate part of the name, maybe something like SimpleSeqFileVertexOF. I'll follow whatever convention you guys want to introduce.
* I used X instead of M because M is used for the message type and I wanted to distinguish it from that type. Since SequenceFileVertexInputFormat uses X, I decided to use the same variable.
* Absolutely. I wasn't sure how to test the abstract class, but I can definitely include a unit test for SimpleSequenceFileVertexOutputFormat. Any suggestions on how to test the abstract class?
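Both review points above (a "what goes in comes out" test, and how to test an abstract class) are commonly handled by exercising the abstract class through a minimal concrete subclass in the test. The sketch below illustrates only that pattern, under loud assumptions: AbstractRecordWriterSketch and LongDoubleWriterSketch are hypothetical names, not Giraph classes, and java.io data streams stand in for Hadoop's SequenceFile so the example runs without Hadoop. A real test would write through the output format and read the file back with SequenceFile.Reader.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical abstract writer: shared logic here, record encoding left to subclasses. */
abstract class AbstractRecordWriterSketch<I, V> {
  /** Template hook that concrete subclasses implement. */
  protected abstract void writeRecord(DataOutputStream out, I id, V value) throws IOException;

  /** Writes a record count followed by every (id, value) pair. */
  public byte[] writeAll(Map<I, V> vertices) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(vertices.size());
      for (Map.Entry<I, V> e : vertices.entrySet()) {
        writeRecord(out, e.getKey(), e.getValue());
      }
    }
    return bytes.toByteArray();
  }
}

/** Minimal concrete subclass used only so a test can exercise the abstract class. */
class LongDoubleWriterSketch extends AbstractRecordWriterSketch<Long, Double> {
  @Override
  protected void writeRecord(DataOutputStream out, Long id, Double value) throws IOException {
    out.writeLong(id);
    out.writeDouble(value);
  }

  /** Reads the records back so a test can compare input and output. */
  static Map<Long, Double> readAll(byte[] data) throws IOException {
    Map<Long, Double> result = new LinkedHashMap<>();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      int count = in.readInt();
      for (int i = 0; i < count; i++) {
        long id = in.readLong();
        result.put(id, in.readDouble());
      }
    }
    return result;
  }
}
```

The round-trip assertion is then simply that the map read back equals the map written.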
[jira] [Updated] (GIRAPH-153) HBase/Accumulo Input and Output formats
[ https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Femiano updated GIRAPH-153:
---

Attachment: GIRAPH-153.patch

HBase/Accumulo Input and Output formats
---
Key: GIRAPH-153
URL: https://issues.apache.org/jira/browse/GIRAPH-153
Project: Giraph
Issue Type: New Feature
Components: bsp
Affects Versions: 0.1.0
Environment: Single host OSX 10.6.8, 2.2 GHz Intel i7, 8 GB
Reporter: Brian Femiano
Attachments: GIRAPH-153.patch

Four abstract classes that wrap their respective delegate input/output formats, providing easy hooks for vertex input format subclasses.

I've included some sample programs that show two very simple graph algorithms. I have a graph generator that builds a very simple directed structure, starting with a few 'root' nodes. Root nodes are defined as nodes that are not listed as a child anywhere in the graph.

Algorithm 1) AccumuloRootMarker.java -- Accumulo as read/write source. Every vertex starts out assuming it's a root. At superstep 0, each vertex sends a message down to each of its children as a non-root notification. After superstep 1, only root nodes will never have been messaged.

Algorithm 2) TableRootMarker -- HBase as read/write source. Extends Algorithm 1 by bundling the notification logic with a subsequent root-node propagation phase. Once we've marked the appropriate nodes as roots, we tell every child which roots it can be traced back to via one or more spanning trees. This takes N + 2 supersteps, where N is the maximum number of hops from any root to any leaf; the extra 2 supersteps cover the initial root flagging.

I've included all relevant code plus DistributedCacheHelper.java for recursive cache file and archive searches. It is more Hadoop-centric than Giraph-specific, but these jobs use it, so I figured why not commit it here. These have been tested with the local JobRunner, in pseudo-distributed mode on the aforementioned hardware, and fully distributed on EC2. More details in the comments.
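The two root-marking algorithms described for GIRAPH-153 can be sketched as a plain in-memory simulation that runs one superstep per loop iteration. This is not Giraph's Vertex API and not the code in GIRAPH-153.patch: RootMarkerSketch and its fields are hypothetical, the graph is assumed to be a DAG given as an adjacency map in which every vertex (leaves included) is a key, and Algorithm 1's marking is folded into Algorithm 2's propagation. Each vertex forwards a root id only the first time it learns it, so on a DAG whose longest root-to-leaf path has N hops the run should finish within N + 2 supersteps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * In-memory, superstep-by-superstep simulation of root marking followed by
 * root propagation. Not Giraph's API; assumes a DAG in which every vertex,
 * leaves included, is a key of the adjacency map.
 */
class RootMarkerSketch {
  final Map<String, List<String>> children; // vertex -> its children
  final Map<String, Boolean> isRoot = new HashMap<>();
  final Map<String, Set<String>> reachableRoots = new HashMap<>();

  RootMarkerSketch(Map<String, List<String>> children) {
    this.children = children;
    for (String v : children.keySet()) {
      isRoot.put(v, true); // every vertex starts out assuming it is a root
      reachableRoots.put(v, new TreeSet<>());
    }
  }

  private static void send(Map<String, List<String>> box, String to, String msg) {
    box.computeIfAbsent(to, k -> new ArrayList<>()).add(msg);
  }

  /** Runs both phases and returns the number of supersteps executed. */
  int run() {
    // Superstep 0: every vertex notifies each child that it is not a root.
    Map<String, List<String>> inbox = new HashMap<>();
    for (String v : children.keySet()) {
      for (String c : children.get(v)) {
        send(inbox, c, "NOT_ROOT");
      }
    }
    int supersteps = 1;

    // Superstep 1: messaged vertices are not roots; roots start sending their own id.
    Map<String, List<String>> next = new HashMap<>();
    for (String v : children.keySet()) {
      if (inbox.containsKey(v)) {
        isRoot.put(v, false);
      } else {
        reachableRoots.get(v).add(v);
        for (String c : children.get(v)) {
          send(next, c, v);
        }
      }
    }
    supersteps++;
    inbox = next;

    // Supersteps 2+: record incoming root ids and forward only newly learned ones.
    while (!inbox.isEmpty()) {
      next = new HashMap<>();
      for (Map.Entry<String, List<String>> e : inbox.entrySet()) {
        String v = e.getKey();
        for (String root : e.getValue()) {
          if (reachableRoots.get(v).add(root)) {
            for (String c : children.get(v)) {
              send(next, c, root);
            }
          }
        }
      }
      supersteps++;
      inbox = next;
    }
    return supersteps;
  }
}
```

For example, with edges r→a, r→b, a→c, b→c and s→c, the sketch marks r and s as roots, vertex c ends up with {r, s}, and the run takes 4 supersteps (N = 2).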
[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats
[ https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252212#comment-13252212 ]

Brian Femiano commented on GIRAPH-153:
---

The patch contains the entire submodule, including HBase and Accumulo unit tests. It has been tested against Accumulo 1.4 (latest release) and HBase 0.90.5 with ZooKeeper 3.3.3.

It includes 4 abstract classes designed to help subclasses read from and write to these datastores. The test package shows a few example subclasses that were needed to verify the behavior. For now the tests only run in local mode and are disabled if the user supplies a jobtracker URI.

It builds exactly as described in the earlier comments: simply run 'mvn verify' and you'll get an isolated build.

A few caveats:
1) Users must 'mvn install' the giraph artifact into their local repo, at least until we get something posted on Maven Central.
2) I modified the pom.xml to exclude the artifact from the rat plugin. I realize this is less than desirable, but I couldn't get anything running despite numerous attempts at fixing the "too many unapproved licenses" issues. I'm interested to hear what you guys think.
3) BspCase is duplicated in my submodule, at least until Giraph has a test artifact.
4) Initializing the AccumuloVertexInputFormat has some procedural limitations, inherent in the format's design, when run with GiraphJob. It really expects to have control of the Job instance, and these problems can be difficult to track down. I tried to document them in my unit tests and to provide some simple error wrapping to help users recognize them when they occur.
5) No README.txt or wiki entry yet. I figured I'd wait and see what feedback you guys had.

Hopefully people will find the submodule useful.
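The structure described for GIRAPH-153 (abstract classes that own a delegate format, with subclasses supplying only the conversion hook) can be illustrated generically. This is a sketch of the pattern under stated assumptions, not the classes in GIRAPH-153.patch: DelegatingVertexReaderSketch, SimpleVertex and PipeRowVertexReader are hypothetical, and a plain Iterator of strings stands in for the HBase/Accumulo record readers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical abstract reader: it owns the delegate that pulls raw rows from
 * the datastore, and subclasses only implement the row-to-vertex hook.
 * A plain Iterator stands in for a real HBase/Accumulo record reader.
 */
abstract class DelegatingVertexReaderSketch<R, V> {
  private final Iterator<R> delegate;

  protected DelegatingVertexReaderSketch(Iterator<R> delegate) {
    this.delegate = delegate;
  }

  /** Hook for subclasses: convert one raw row into one vertex. */
  protected abstract V toVertex(R row);

  /** Shared plumbing: drain the delegate and convert every row. */
  public List<V> readAll() {
    List<V> vertices = new ArrayList<>();
    while (delegate.hasNext()) {
      vertices.add(toVertex(delegate.next()));
    }
    return vertices;
  }
}

/** Minimal hypothetical vertex: an id plus its child ids. */
class SimpleVertex {
  final String id;
  final List<String> children;

  SimpleVertex(String id, List<String> children) {
    this.id = id;
    this.children = children;
  }
}

/** Example subclass: a row "id|child1,child2" becomes a SimpleVertex. */
class PipeRowVertexReader extends DelegatingVertexReaderSketch<String, SimpleVertex> {
  PipeRowVertexReader(Iterator<String> rows) {
    super(rows);
  }

  @Override
  protected SimpleVertex toVertex(String row) {
    String[] parts = row.split("\\|", -1); // -1 keeps a trailing empty child list
    List<String> kids = new ArrayList<>();
    if (parts.length > 1 && !parts[1].isEmpty()) {
      kids.addAll(Arrays.asList(parts[1].split(",")));
    }
    return new SimpleVertex(parts[0], kids);
  }
}
```

Keeping the datastore plumbing in the abstract base means an example subclass, like the ones in the patch's test package, only has to decide how a row maps to a vertex.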