date:20120411

[jira] [Commented] (GIRAPH-182) Provide SequenceFileVertexOutputFormat as an available OutputFormat

2012-04-11 Thread Jakob Homan (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/GIRAPH-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251884#comment-13251884
]

Jakob Homan commented on GIRAPH-182:

Hey Pradeep. Thanks for the contribution.
Review:
* Apache prohibits author tags to ensure that all the code is viewed as the
whole community's responsibility.
* SimpleSequenceFileVertexOutputFormat: We've thus far had the convention of
using the type names in the in/outputformats. This is a bit verbose and may not
be the right approach, but it's probably best to keep it in this patch. Also,
can you provide Javadoc for it?
* SequenceFileVertexOutputFormat: Any reason not to use the more standard M
type variable? Some Javadoc for the class would be nice here too.
* Is it possible to add a unit test just to verify we get out from the file
what we put in?

Provide SequenceFileVertexOutputFormat as an available OutputFormat
---

Key: GIRAPH-182
URL: https://issues.apache.org/jira/browse/GIRAPH-182
Project: Giraph
Issue Type: New Feature
Components: lib
Reporter: Pradeep Gollakota
Assignee: Pradeep Gollakota
Priority: Minor
Attachments: GIRAPH-182-1.patch

SequenceFiles are heavily used in Hadoop. We should provide
SequenceFileVertexOutputFormat. Since SequenceFileVertexInputFormat is
already provided, it makes sense to also provide a mirroring OutputFormat.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-179) BspServiceMaster's PathFilter can be simplified

2012-04-11 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/GIRAPH-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251940#comment-13251940
 ] 

Hudson commented on GIRAPH-179:
---

Integrated in Giraph-trunk-Commit #101 (See 
[https://builds.apache.org/job/Giraph-trunk-Commit/101/])
GIRAPH-179: BspServiceMaster's PathFilter can be simplified. Contributed by 
Devaraj K. (Revision 1324991)

Result = SUCCESS
jghoman : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1324991
Files :
* /incubator/giraph/trunk/CHANGELOG
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceMaster.java


BspServiceMaster's PathFilter can be simplified
---

Key: GIRAPH-179
URL: https://issues.apache.org/jira/browse/GIRAPH-179
Project: Giraph
Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Devaraj K
Priority: Trivial
Labels: newbie
Fix For: 0.2.0
Attachments: GIRAPH-179.patch


{code}
/**
 * Only get the finalized checkpoint files
 */
public static class FinalizedCheckpointPathFilter implements PathFilter {
  @Override
  public boolean accept(Path path) {
    if (path.getName().endsWith(
        BspService.CHECKPOINT_FINALIZED_POSTFIX)) {
      return true;
    }
    return false;
  }
}
{code}
We can simplify this by eliminating the if statement and just returning the
result of {{endsWith()}}.
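
The suggested simplification can be sketched as below. This is a minimal, self-contained illustration rather than the committed patch: NameFilter is a hypothetical stand-in for Hadoop's PathFilter that takes a plain file name, and the ".finalized" value is an assumed placeholder for BspService.CHECKPOINT_FINALIZED_POSTFIX.

```java
final class FinalizedCheckpointFilterSketch {
  // Assumed placeholder value; the real code would reference
  // BspService.CHECKPOINT_FINALIZED_POSTFIX instead.
  public static final String CHECKPOINT_FINALIZED_POSTFIX = ".finalized";

  /** Hypothetical stand-in for Hadoop's PathFilter, keyed on a file name. */
  public interface NameFilter {
    boolean accept(String name);
  }

  /** Only get the finalized checkpoint files. */
  public static final class FinalizedCheckpointNameFilter implements NameFilter {
    @Override
    public boolean accept(String name) {
      // endsWith() already yields the boolean we want, so no if/else is needed.
      return name.endsWith(CHECKPOINT_FINALIZED_POSTFIX);
    }
  }

  public static void main(String[] args) {
    NameFilter filter = new FinalizedCheckpointNameFilter();
    System.out.println(filter.accept("checkpoint_4.finalized"));
    System.out.println(filter.accept("checkpoint_4.metadata"));
  }
}
```

In the real class the body of accept(Path) would presumably reduce to a single return of path.getName().endsWith(...) on the same constant.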


[jira] [Commented] (GIRAPH-182) Provide SequenceFileVertexOutputFormat as an available OutputFormat

2012-04-11 Thread Pradeep Gollakota (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/GIRAPH-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252127#comment-13252127
]

Pradeep Gollakota commented on GIRAPH-182:
--

Thanks for the review Jakob.
* I thought I removed the author tags. My IDE must have inserted them back in
when I wasn't looking. I'll double check before submitting further patches.
* I agree that the name is a bit too verbose. An option could be to abbreviate
part of the name. Maybe something like SimpleSeqFileVertexOF. I'll follow
whatever convention you guys want to introduce.
* I used X instead of M because M is used for message type and I wanted to
distinguish it from that type. Since SequenceFileVertexInputFormat used X, I
decided to use the same variable.
* Absolutely. I wasn't sure how to test the abstract class but I can definitely
include a unit test for SimpleSequenceFileVertexOutputFormat. Any suggestions
on how to test the abstract class?
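
One common way to test an abstract class, offered here only as a sketch and not as something proposed in this thread, is to give it a minimal concrete subclass inside the test and check that what is written can be read back. The example below uses plain java.io streams instead of a Hadoop SequenceFile so that it stays self-contained; AbstractPairWriter and roundTrip are hypothetical names and are not part of the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class AbstractFormatTestSketch {
  /** Hypothetical abstract writer: subclasses map a vertex to a key/value pair. */
  public abstract static class AbstractPairWriter<V> {
    protected abstract String keyOf(V vertex);

    protected abstract long valueOf(V vertex);

    /** Shared logic under test: serialize the key, then the value. */
    public final void write(V vertex, DataOutputStream out) throws IOException {
      out.writeUTF(keyOf(vertex));
      out.writeLong(valueOf(vertex));
    }
  }

  /** Writes one vertex through a test-only subclass and reads it back as "key=value". */
  public static String roundTrip(final String id, final long value) {
    // Minimal anonymous subclass that exists only to exercise the abstract class.
    AbstractPairWriter<String> writer = new AbstractPairWriter<String>() {
      @Override
      protected String keyOf(String vertex) {
        return vertex;
      }

      @Override
      protected long valueOf(String vertex) {
        return value;
      }
    };
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (DataOutputStream out = new DataOutputStream(bytes)) {
        writer.write(id, out);
      }
      try (DataInputStream in =
          new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return in.readUTF() + "=" + in.readLong();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTrip("v1", 42L));
  }
}
```

The same shape would presumably carry over to the real classes: the test-only subclass supplies the key and value conversion, and the assertion compares what is read from the file with what was written.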

Provide SequenceFileVertexOutputFormat as an available OutputFormat
---

[jira] [Updated] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-04-11 Thread Brian Femiano (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Brian Femiano updated GIRAPH-153:
-

Attachment: GIRAPH-153.patch

HBase/Accumulo Input and Output formats
---

Key: GIRAPH-153
URL: https://issues.apache.org/jira/browse/GIRAPH-153
Project: Giraph
Issue Type: New Feature
Components: bsp
Affects Versions: 0.1.0
Environment: Single host, OSX 10.6.8, 2.2GHz Intel i7, 8GB
Reporter: Brian Femiano
Attachments: GIRAPH-153.patch

Four abstract classes that wrap their respective delegate input/output formats
for easy hooks into vertex input format subclasses. I've included some sample
programs that show two very simple graph algorithms. I have a graph generator
that builds out a very simple directed structure, starting with a few 'root'
nodes. Root nodes are defined as nodes which are not listed as a child anywhere
in the graph.

Algorithm 1) AccumuloRootMarker.java -- Accumulo as read/write source.
Every vertex starts thinking it's a root. At superstep 0, send a message down
to each child as a non-root notification. After superstep 1, only root nodes
will have never been messaged.

Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on A1 by
bundling the notification logic followed by root node propagation. Once we've
marked the appropriate nodes as roots, tell every child which roots it can be
traced back to via one or more spanning trees. This will take N + 2
supersteps where N is the maximum number of hops from any root to any leaf,
plus 2 supersteps for the initial root flagging.

I've included all relevant code plus DistributedCacheHelper.java for
recursive cache file and archive searches. It is more hadoop centric than
giraph, but these jobs use it so I figured why not commit here.

These have been tested through local JobRunner, pseudo-distributed on the
aforementioned hardware, and full distributed on EC2. More details in the
comments.
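
The two algorithms described above can be sketched as a single-process simulation. This is not the code in the patch and does not use the Giraph, Accumulo, or HBase APIs: the class name, method names, and the adjacency-map input are assumptions made for illustration, and each pass of the loop stands in for one superstep.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class RootMarkerSketch {
  /**
   * Algorithm 1: every vertex with an adjacency entry starts as a root;
   * any vertex that is messaged by a parent is a non-root.
   */
  public static Set<String> findRoots(Map<String, List<String>> children) {
    // "Superstep 0": each vertex messages all of its children.
    Set<String> messaged = new TreeSet<>();
    for (List<String> kids : children.values()) {
      messaged.addAll(kids);
    }
    // "Superstep 1": only vertices that were never messaged remain roots.
    Set<String> roots = new TreeSet<>(children.keySet());
    roots.removeAll(messaged);
    return roots;
  }

  /**
   * Algorithm 2: push root ids down one hop per pass until no vertex learns
   * a new root. In this sketch a root is recorded as reachable from itself.
   */
  public static Map<String, Set<String>> propagateRoots(Map<String, List<String>> children) {
    Map<String, Set<String>> known = new TreeMap<>();
    for (String root : findRoots(children)) {
      known.put(root, new TreeSet<>(Collections.singleton(root)));
    }
    boolean changed = true;
    while (changed) {
      changed = false;
      // Collect this pass's messages first so every vertex sees the same snapshot.
      Map<String, Set<String>> inbox = new HashMap<>();
      for (Map.Entry<String, List<String>> e : children.entrySet()) {
        Set<String> parentRoots = known.get(e.getKey());
        if (parentRoots == null) {
          continue; // this vertex has not heard from any root yet
        }
        for (String kid : e.getValue()) {
          inbox.computeIfAbsent(kid, k -> new TreeSet<>()).addAll(parentRoots);
        }
      }
      // Deliver the messages; keep going only if someone learned something new.
      for (Map.Entry<String, Set<String>> e : inbox.entrySet()) {
        if (known.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).addAll(e.getValue())) {
          changed = true;
        }
      }
    }
    return known;
  }

  public static void main(String[] args) {
    Map<String, List<String>> graph = new HashMap<>();
    graph.put("a", Arrays.asList("b", "c"));
    graph.put("d", Arrays.asList("c"));
    graph.put("c", Arrays.asList("e"));
    System.out.println("roots: " + findRoots(graph));
    System.out.println("reachable roots per vertex: " + propagateRoots(graph));
  }
}
```

The loop runs until a pass changes nothing, so the number of passes tracks the longest root-to-leaf hop count, which corresponds to the N in the description.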

[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-04-11 Thread Brian Femiano (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252212#comment-13252212
]

Brian Femiano commented on GIRAPH-153:
--

Patch contains the entire submodule including HBase and Accumulo unit tests. It
has been tested against Accumulo 1.4 (latest release) and HBase 0.90.5 with
Zookeeper 3.3.3. It includes 4 abstract classes designed to help subclass
reading/writing to and from these datastores.

The test package shows a few example subclasses which were needed to verify the
behavior. For now they only run in local mode and will be disabled if the user
supplies a jobtracker URI.

It builds exactly as described in the earlier comments. Simply run 'mvn verify'
and you'll get an isolated build.

A few caveats:

1) Users must 'mvn install' the giraph artifact in their local repo, at least
until we get something posted on maven central.
2) I modified the pom.xml to exclude the artifact from the rat plugin. I
realize this is less than desirable, but I couldn't get anything running
despite numerous attempts at fixing the 'too many unapproved licenses'
issues. I'm interested to hear your thoughts.
3) BspCase is duplicated in my submodule, at least until Giraph has a test
artifact.
4) Initializing the AccumuloVertexInputFormat has some procedural limitations
inherent in the format design when run with the GiraphJob. It really expects to
have control of the Job instance. These limitations can be difficult to track
down. I tried to document them in my unit tests and provide some simple error
wrapping to help notify users when they hit them.
5) No README.txt or any wiki entry yet. I figured I'd wait and see what
feedback you guys had.

Hopefully people will find the submodule useful.

HBase/Accumulo Input and Output formats
---
