[jira] [Commented] (GIRAPH-182) Provide SequenceFileVertexOutputFormat as an available OutputFormat

2012-04-11 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251884#comment-13251884
 ] 

Jakob Homan commented on GIRAPH-182:


Hey Pradeep. Thanks for the contribution.
Review:
* Apache prohibits author tags to ensure that all the code is viewed as the 
whole community's responsibility.
* SimpleSequenceFileVertexOutputFormat: We've thus far had the convention of 
using the type names in the in/outputformats. This is a bit verbose and may not 
be the right approach, but it's probably best to keep it in this patch.  Also 
can you provide javadoc for it?
* SequenceFileVertexOutputFormat: Any reason not to use the more standard M 
type variable? Some Javadoc for the class would be nice here too.
* Is it possible to add a unit test just to verify we get out from the file 
what we put in?




 Provide SequenceFileVertexOutputFormat as an available OutputFormat
 ---

         Key: GIRAPH-182
         URL: https://issues.apache.org/jira/browse/GIRAPH-182
     Project: Giraph
  Issue Type: New Feature
  Components: lib
    Reporter: Pradeep Gollakota
    Assignee: Pradeep Gollakota
    Priority: Minor
 Attachments: GIRAPH-182-1.patch


 SequenceFiles are heavily used in Hadoop. We should provide 
 SequenceFileVertexOutputFormat. Since SequenceFileVertexInputFormat is 
 already provided, it makes sense to also provide a mirroring OutputFormat.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-179) BspServiceMaster's PathFilter can be simplified

2012-04-11 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251940#comment-13251940
 ] 

Hudson commented on GIRAPH-179:
---

Integrated in Giraph-trunk-Commit #101 (See 
[https://builds.apache.org/job/Giraph-trunk-Commit/101/])
GIRAPH-179: BspServiceMaster's PathFilter can be simplified. Contributed by 
Devaraj K. (Revision 1324991)

 Result = SUCCESS
jghoman : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1324991
Files : 
* /incubator/giraph/trunk/CHANGELOG
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceMaster.java


 BspServiceMaster's PathFilter can be simplified
 ---

              Key: GIRAPH-179
              URL: https://issues.apache.org/jira/browse/GIRAPH-179
          Project: Giraph
       Issue Type: Improvement
 Affects Versions: 0.2.0
         Reporter: Jakob Homan
         Assignee: Devaraj K
         Priority: Trivial
           Labels: newbie
          Fix For: 0.2.0
      Attachments: GIRAPH-179.patch


 {code}
 /**
  * Only get the finalized checkpoint files
  */
 public static class FinalizedCheckpointPathFilter implements PathFilter {
   @Override
   public boolean accept(Path path) {
     if (path.getName().endsWith(
         BspService.CHECKPOINT_FINALIZED_POSTFIX)) {
       return true;
     }
     return false;
   }
 }
 {code}
 We can simplify this by eliminating the if statement and just returning the 
 result of {{endsWith()}} directly.
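
 A minimal sketch of the simplified filter follows. It is not the committed patch. So that it compiles without Hadoop on the classpath, {{Path}} and {{PathFilter}} here are small stand-ins for the Hadoop types of the same names, and the value given to {{CHECKPOINT_FINALIZED_POSTFIX}} is a placeholder assumed for illustration only.

```java
public class PathFilterSketch {

  /** Stand-in for org.apache.hadoop.fs.Path; only getName() is modeled. */
  static final class Path {
    private final String name;

    Path(String name) {
      this.name = name;
    }

    String getName() {
      return name;
    }
  }

  /** Stand-in for org.apache.hadoop.fs.PathFilter. */
  interface PathFilter {
    boolean accept(Path path);
  }

  /** Placeholder value; the real constant lives in BspService. */
  static final String CHECKPOINT_FINALIZED_POSTFIX = ".finalized";

  /**
   * Only get the finalized checkpoint files.
   */
  public static class FinalizedCheckpointPathFilter implements PathFilter {
    @Override
    public boolean accept(Path path) {
      // endsWith() already yields the boolean we want, so no if/else is needed.
      return path.getName().endsWith(CHECKPOINT_FINALIZED_POSTFIX);
    }
  }

  public static void main(String[] args) {
    PathFilter filter = new FinalizedCheckpointPathFilter();
    System.out.println(filter.accept(new Path("checkpoint_3.finalized")));
    System.out.println(filter.accept(new Path("checkpoint_3.metadata")));
  }
}
```

 Behavior is unchanged from the original: the method returns true exactly when the file name ends with the postfix.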





[jira] [Commented] (GIRAPH-182) Provide SequenceFileVertexOutputFormat as an available OutputFormat

2012-04-11 Thread Pradeep Gollakota (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252127#comment-13252127
 ] 

Pradeep Gollakota commented on GIRAPH-182:
--

Thanks for the review, Jakob.
* I thought I had removed the author tags. My IDE must have inserted them back 
in when I wasn't looking. I'll double-check before submitting further patches.
* I agree that the name is a bit too verbose. An option could be to abbreviate 
part of the name. Maybe something like SimpleSeqFileVertexOF. I'll follow 
whatever convention you guys want to introduce.
* I used X instead of M because M is used for message type and I wanted to 
distinguish it from that type. Since SequenceFileVertexInputFormat used X, I 
decided to use the same variable.
* Absolutely. I wasn't sure how to test the abstract class but I can definitely 
include a unit test for SimpleSequenceFileVertexOutputFormat. Any suggestions 
on how to test the abstract class?
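
One common way to exercise an abstract class is to give it a minimal test-only subclass and check that what is written can be read back. The sketch below illustrates that idea only. It does not use Giraph or Hadoop: AbstractPairWriter is a hypothetical stand-in for the abstract output format, and plain java.io data streams stand in for a SequenceFile.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class AbstractFormatRoundTripSketch {

  /** Hypothetical abstract writer: subclasses decide how a vertex maps to a key and a value. */
  abstract static class AbstractPairWriter<V> {
    protected abstract String toKey(V vertex);

    protected abstract long toValue(V vertex);

    /** Concrete logic under test: writes one (key, value) record per vertex. */
    final void writeAll(Iterable<V> vertices, DataOutputStream out) throws IOException {
      for (V vertex : vertices) {
        out.writeUTF(toKey(vertex));
        out.writeLong(toValue(vertex));
      }
    }
  }

  /** Reads (key, value) records back until the buffer is exhausted. */
  static Map<String, Long> readAll(byte[] bytes) throws IOException {
    Map<String, Long> result = new LinkedHashMap<>();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
      while (in.available() > 0) {
        result.put(in.readUTF(), in.readLong());
      }
    }
    return result;
  }

  /** Writes the vertices through a minimal anonymous subclass, then reads them back. */
  static Map<String, Long> roundTrip(Map<String, Long> vertices) throws IOException {
    AbstractPairWriter<Map.Entry<String, Long>> writer =
        new AbstractPairWriter<Map.Entry<String, Long>>() {
          @Override
          protected String toKey(Map.Entry<String, Long> vertex) {
            return vertex.getKey();
          }

          @Override
          protected long toValue(Map.Entry<String, Long> vertex) {
            return vertex.getValue();
          }
        };
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      writer.writeAll(vertices.entrySet(), out);
    }
    return readAll(buffer.toByteArray());
  }

  public static void main(String[] args) throws IOException {
    Map<String, Long> input = new LinkedHashMap<>();
    input.put("v1", 10L);
    input.put("v2", 20L);
    // The round trip passes when the data read back equals the data written.
    System.out.println(input.equals(roundTrip(input)));
  }
}
```

The anonymous subclass keeps the abstract methods trivial, so a failure points at the shared write logic rather than at the subclass.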






[jira] [Updated] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-04-11 Thread Brian Femiano (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Femiano updated GIRAPH-153:
-

Attachment: GIRAPH-153.patch

 HBase/Accumulo Input and Output formats
 ---

              Key: GIRAPH-153
              URL: https://issues.apache.org/jira/browse/GIRAPH-153
          Project: Giraph
       Issue Type: New Feature
       Components: bsp
 Affects Versions: 0.1.0
      Environment: Single host OSX 10.6.8 2.2GHz Intel i7, 8GB
         Reporter: Brian Femiano
      Attachments: GIRAPH-153.patch


 Four abstract classes that wrap their respective delegate input/output 
 formats for easy hooks into vertex input format subclasses. I've included 
 some sample programs that show two very simple graph algorithms. I have a 
 graph generator that builds out a very simple directed structure, starting 
 with a few 'root' nodes. Root nodes are defined as nodes which are not listed 
 as a child anywhere in the graph. 
 Algorithm 1) AccumuloRootMarker.java -- Accumulo as read/write source. 
 Every vertex starts thinking it's a root. At superstep 0, each vertex sends a 
 message down to each child as a non-root notification. After superstep 1, 
 only root nodes will have never been messaged. 
 Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on A1 by 
 following the notification logic with root node propagation. Once we've 
 marked the appropriate nodes as roots, tell every child which roots it can be 
 traced back to via one or more spanning trees. This will take N + 2 
 supersteps, where N is the maximum number of hops from any root to any leaf 
 and the extra 2 supersteps cover the initial root flagging. 
 I've included all relevant code plus DistributedCacheHelper.java for 
 recursive cache file and archive searches. It is more Hadoop-centric than 
 Giraph-centric, but these jobs use it so I figured why not commit here. 
 These have been tested through local JobRunner, pseudo-distributed on the 
 aforementioned hardware, and fully distributed on EC2. More details in the 
 comments.
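
 The two algorithms can be illustrated with a small single-process simulation. This is only a sketch of the message-passing idea described above, not the code in the patch: it uses no Giraph, Accumulo or HBase APIs, and the class name, method names and sample graph are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class RootMarkerSketch {

  /** Algorithm 1: a vertex is a root if no parent ever sends it a non-root notification. */
  static Set<String> findRoots(Map<String, List<String>> children) {
    // Every vertex starts out thinking it is a root.
    Set<String> roots = new TreeSet<>(children.keySet());
    for (List<String> kids : children.values()) {
      roots.addAll(kids); // leaves may have no adjacency entry of their own
    }
    // Superstep 0: every vertex messages each of its children.
    Set<String> messaged = new HashSet<>();
    for (List<String> kids : children.values()) {
      messaged.addAll(kids);
    }
    // Superstep 1: any vertex that received a message is not a root.
    roots.removeAll(messaged);
    return roots;
  }

  /** Algorithm 2: tell every vertex which roots it can be traced back to (roots map to themselves). */
  static Map<String, Set<String>> traceRoots(Map<String, List<String>> children) {
    Map<String, Set<String>> tracedTo = new HashMap<>();
    // Pending messages for the next round: target vertex -> root ids sent to it.
    Map<String, Set<String>> inbox = new HashMap<>();
    for (String root : findRoots(children)) {
      inbox.computeIfAbsent(root, k -> new TreeSet<>()).add(root);
    }
    while (!inbox.isEmpty()) {
      Map<String, Set<String>> next = new HashMap<>();
      for (Map.Entry<String, Set<String>> message : inbox.entrySet()) {
        Set<String> known = tracedTo.computeIfAbsent(message.getKey(), k -> new TreeSet<>());
        Set<String> fresh = new TreeSet<>(message.getValue());
        fresh.removeAll(known);
        if (fresh.isEmpty()) {
          continue; // nothing new learned, so nothing to forward
        }
        known.addAll(fresh);
        // Forward only newly learned roots one hop further down.
        for (String child : children.getOrDefault(message.getKey(), Collections.<String>emptyList())) {
          next.computeIfAbsent(child, k -> new TreeSet<>()).addAll(fresh);
        }
      }
      inbox = next;
    }
    return tracedTo;
  }

  public static void main(String[] args) {
    // Hypothetical sample graph: a -> b, a -> c, b -> d, e -> d.
    Map<String, List<String>> graph = new HashMap<>();
    graph.put("a", Arrays.asList("b", "c"));
    graph.put("b", Arrays.asList("d"));
    graph.put("e", Arrays.asList("d"));
    System.out.println(findRoots(graph));
    System.out.println(traceRoots(graph).get("d"));
  }
}
```

 Each pass of the while loop plays the role of one propagation superstep, so the number of passes grows with the longest root-to-leaf path.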





[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-04-11 Thread Brian Femiano (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252212#comment-13252212
 ] 

Brian Femiano commented on GIRAPH-153:
--

The patch contains the entire submodule, including HBase and Accumulo unit 
tests. It has been tested against Accumulo 1.4 (latest release) and HBase 
0.90.5 with ZooKeeper 3.3.3. It includes 4 abstract classes designed to help 
subclasses read from and write to these datastores. 

The test package shows a few example subclasses which were needed to verify the 
behavior. For now they only run in local mode and will be disabled if the user 
supplies a jobtracker URI. 

It builds exactly as described in the earlier comments. Simply run 'mvn verify' 
and you'll get an isolated build. 

A few caveats:

1) Users must 'mvn install' the giraph artifact in their local repo, at least 
until we get something posted on Maven Central.
2) I modified the pom.xml to exclude the artifact from the rat plugin. I 
realize this is less than desirable, but I couldn't get anything running 
despite numerous attempts at fixing the "too many unapproved licenses" 
issues. I'm interested to hear your thoughts.
3) Duplicate BspCase in my submodule, at least until Giraph has a test 
artifact.
4) Initializing the AccumuloVertexInputFormat has some procedural limitations 
inherent in the format design when run with the GiraphJob. It really expects to 
have control of the Job instance. These can be difficult to track down. I tried 
to document these in my unit tests and provide some simple error wrapping to 
help notify users when they hit them. 
5) No README.txt or any wiki entry yet. I figured I'd wait and see what 
feedback you guys had.

Hopefully people will find the submodule useful. 


