Re: Status report
Is it worth mentioning the UC Irvine connection?

... ? Is that the low-budget sequel to the classic Gene Hackman film?

On Mon, Apr 2, 2012 at 10:20 PM, Avery Ching ach...@apache.org wrote:

Looks good to me as well.

Avery

On 4/2/12 10:17 PM, Owen O'Malley wrote:

That looks great, Jakob. I've put that into the wiki for now until we have further edits.

-- Owen
[jira] [Commented] (GIRAPH-141) mulitgraph support in giraph
[ https://issues.apache.org/jira/browse/GIRAPH-141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245145#comment-13245145 ]

Paolo Castagna commented on GIRAPH-141:
---------------------------------------

Just to add another example of a multigraph: the [RDF|http://en.wikipedia.org/wiki/Resource_Description_Framework] data model is a labelled directed multigraph.

mulitgraph support in giraph

Key: GIRAPH-141
URL: https://issues.apache.org/jira/browse/GIRAPH-141
Project: Giraph
Issue Type: Improvement
Components: graph
Reporter: André Kelpe

The current vertex API only supports simple graphs, meaning that there can only ever be one edge between two vertices. Many graphs, like the road network, are in fact multigraphs, where many edges can connect two vertices at the same time. Support for this could be added by introducing an Iterator<EdgeWritable> getEdgeValue() or a similar construct. Maybe introducing a slim object like a Connector between the edge and the vertex is also a good idea, so that you could do something like:

{code}
for (final Connector<EdgeWritable, VertexWritable> conn : getEdgeValues()) {
  final EdgeWritable edge = conn.getEdge();
  final VertexWritable otherVertex = conn.getOther();
  doInterestingStuff(otherVertex);
  doMoreInterestingStuff(edge);
}
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
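
To make the Connector idea from the issue description concrete, here is a minimal, self-contained sketch in plain Java. It is not Giraph code: `MultigraphSketch`, its nested `Vertex` class, `addEdge`, `countEdgesTo` and `roadExample` are hypothetical names invented for illustration, and plain strings stand in for the Writable types. Only `Connector`, `getEdge()`, `getOther()` and `getEdgeValues()` come from the proposal above.

```java
import java.util.ArrayList;
import java.util.List;

final class MultigraphSketch {
  /** Slim pairing of one edge value with the vertex at its far end (as proposed in the issue). */
  static final class Connector<E, V> {
    private final E edge;
    private final V other;

    Connector(E edge, V other) {
      this.edge = edge;
      this.other = other;
    }

    E getEdge() { return edge; }

    V getOther() { return other; }
  }

  /** Hypothetical vertex: out-edges live in a list, so one target may appear several times. */
  static final class Vertex<E, V> {
    private final List<Connector<E, V>> connectors = new ArrayList<>();

    void addEdge(E edge, V other) {
      connectors.add(new Connector<>(edge, other));
    }

    Iterable<Connector<E, V>> getEdgeValues() {
      return connectors;
    }
  }

  /** Counts the parallel edges that lead from the given vertex to the given target. */
  static <E, V> int countEdgesTo(Vertex<E, V> vertex, V target) {
    int count = 0;
    for (final Connector<E, V> conn : vertex.getEdgeValues()) {
      if (conn.getOther().equals(target)) {
        count++;
      }
    }
    return count;
  }

  /** Tiny road-network example: two different roads lead to B, one ferry leads to C. */
  static Vertex<String, String> roadExample() {
    Vertex<String, String> a = new Vertex<>();
    a.addEdge("highway", "B");
    a.addEdge("country-road", "B");
    a.addEdge("ferry", "C");
    return a;
  }

  public static void main(String[] args) {
    Vertex<String, String> a = roadExample();
    System.out.println("edges to B: " + countEdgesTo(a, "B"));
    System.out.println("edges to C: " + countEdgesTo(a, "C"));
  }
}
```

Keeping the connectors in a list rather than in a map keyed by target vertex is what allows parallel edges; a map keyed by target would collapse them back into a simple graph.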
Re: Status report
Hi Avery,

Yep, the Hadoop Summit talk is definitely worth mentioning and yeah I just wanted to indicate the connection with UCI b/c it was relevant to me (broader use within academia).

Cheers,
Chris

On Apr 3, 2012, at 9:31 AM, Avery Ching wrote:

I'm not sure if anything came out of those discussions, but maybe it's worth mentioning. One more thing worth mentioning is that the Giraph talk, Processing over a billion edges on Apache Giraph, was accepted for the Hadoop Summit 2012 (http://hadoopsummit.org/program/).

Avery

On 4/3/12 7:20 AM, Mattmann, Chris A (388J) wrote:

Hi Jakob,

On Apr 2, 2012, at 11:34 PM, Jakob Homan wrote:

Is it worth mentioning the UC Irvine connection?

... ? Is that the low-budget sequel to the classic Gene Hackman film?

LOL ummm no :) But, this is what I was talking about:

http://s.apache.org/ZUG

Looks like the students were doing something similar in that class, and there was some mailing list discussion about it in Jan 2012.

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++
[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats
[ https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245480#comment-13245480 ]

Avery Ching commented on GIRAPH-153:
------------------------------------

From what you've described, sounds good to me. In the worst case, we can change it to a submodule if that makes more sense in the future. I would like to use a similar approach for https://issues.apache.org/jira/browse/GIRAPH-93, as Jakob mentioned.

HBase/Accumulo Input and Output formats

Key: GIRAPH-153
URL: https://issues.apache.org/jira/browse/GIRAPH-153
Project: Giraph
Issue Type: New Feature
Components: bsp
Affects Versions: 0.1.0
Environment: Single host OSX 10.6.8 2.2 GHz Intel i7, 8GB
Reporter: Brian Femiano

Four abstract classes that wrap their respective delegate input/output formats for easy hooks into vertex input format subclasses. I've included some sample programs that show two very simple graph algorithms. I have a graph generator that builds out a very simple directed structure, starting with a few 'root' nodes. Root nodes are defined as nodes which are not listed as a child anywhere in the graph.

Algorithm 1) AccumuloRootMarker.java -- Accumulo as read/write source. Every vertex starts thinking it's a root. At superstep 0, send a message down to each child as a non-root notification. After superstep 1, only root nodes will have never been messaged.

Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on A1 by bundling the notification logic followed by root node propagation. Once we've marked the appropriate nodes as roots, tell every child which roots it can be traced back to via one or more spanning trees. This will take N + 2 supersteps, where N is the maximum number of hops from any root to any leaf, plus 2 supersteps for the initial root flagging.

I've included all relevant code plus DistributedCacheHelper.java for recursive cache file and archive searches. It is more Hadoop-centric than Giraph, but these jobs use it, so I figured why not commit here. These have been tested through the local JobRunner, pseudo-distributed on the aforementioned hardware, and fully distributed on EC2. More details in the comments.
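
The two root-marking algorithms described above can be sketched without any Giraph, HBase or Accumulo dependency. The following is a plain-Java simulation of the message passing, not the code attached to the issue: `RootMarkerSketch`, `findRoots`, `propagateRoots` and `sampleGraph` are hypothetical names, the graph is an in-memory map from each vertex id to its child ids, and roots are assumed to keep an empty "traced back to" set because nobody messages them.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class RootMarkerSketch {
  /** Algorithm 1: a vertex is a root exactly when no parent ever messages it. */
  static Set<String> findRoots(Map<String, List<String>> children) {
    // Superstep 0: every vertex assumes it is a root and notifies each child.
    Set<String> messaged = new HashSet<>();
    for (List<String> kids : children.values()) {
      messaged.addAll(kids);
    }
    // Superstep 1: every vertex that received a notification drops its root flag.
    Set<String> roots = new TreeSet<>(children.keySet());
    roots.removeAll(messaged);
    return roots;
  }

  /** Algorithm 2: push root ids downwards until no vertex learns anything new. */
  static Map<String, Set<String>> propagateRoots(Map<String, List<String>> children) {
    Map<String, Set<String>> tracedBackTo = new TreeMap<>();
    for (String v : children.keySet()) {
      tracedBackTo.put(v, new TreeSet<>());
    }
    // Messages for the next superstep: target vertex id -> root ids sent to it.
    Map<String, Set<String>> inbox = new HashMap<>();
    for (String root : findRoots(children)) {
      for (String kid : children.get(root)) {
        inbox.computeIfAbsent(kid, k -> new TreeSet<>()).add(root);
      }
    }
    while (!inbox.isEmpty()) {
      Map<String, Set<String>> next = new HashMap<>();
      for (Map.Entry<String, Set<String>> e : inbox.entrySet()) {
        Set<String> known = tracedBackTo.computeIfAbsent(e.getKey(), k -> new TreeSet<>());
        Set<String> fresh = new TreeSet<>(e.getValue());
        fresh.removeAll(known);
        if (fresh.isEmpty()) {
          continue; // nothing new arrived, so this vertex stays silent
        }
        known.addAll(fresh);
        List<String> kids = children.getOrDefault(e.getKey(), Collections.<String>emptyList());
        for (String kid : kids) {
          next.computeIfAbsent(kid, k -> new TreeSet<>()).addAll(fresh);
        }
      }
      inbox = next;
    }
    return tracedBackTo;
  }

  /** Sample graph: a -> b, c; d -> c; b -> e; c -> e. Roots are a and d. */
  static Map<String, List<String>> sampleGraph() {
    Map<String, List<String>> g = new HashMap<>();
    g.put("a", Arrays.asList("b", "c"));
    g.put("b", Arrays.asList("e"));
    g.put("c", Arrays.asList("e"));
    g.put("d", Arrays.asList("c"));
    g.put("e", Collections.<String>emptyList());
    return g;
  }

  public static void main(String[] args) {
    System.out.println(findRoots(sampleGraph()));      // [a, d]
    System.out.println(propagateRoots(sampleGraph())); // {a=[], b=[a], c=[a, d], d=[], e=[a, d]}
  }
}
```

Forwarding only root ids that a vertex has not seen before keeps the loop finite even if the input contains a cycle, and on the sample graph the propagation loop runs for two rounds, matching the longest root-to-leaf path of two hops.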
[jira] [Commented] (GIRAPH-141) mulitgraph support in giraph
[ https://issues.apache.org/jira/browse/GIRAPH-141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245484#comment-13245484 ]

Avery Ching commented on GIRAPH-141:
------------------------------------

Yes, I also think this is an important feature. Anyone want to work on it? =)

mulitgraph support in giraph

Key: GIRAPH-141
URL: https://issues.apache.org/jira/browse/GIRAPH-141
Project: Giraph
Issue Type: Improvement
Components: graph
Reporter: André Kelpe
[jira] [Commented] (GIRAPH-169) How to close all child when a job finished?
[ https://issues.apache.org/jira/browse/GIRAPH-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245488#comment-13245488 ]

Avery Ching commented on GIRAPH-169:
------------------------------------

This is a simple case. I'll try and see if I can replicate it sometime this week. Feel free to bug me if I forget. =)

How to close all child when a job finished?

Key: GIRAPH-169
URL: https://issues.apache.org/jira/browse/GIRAPH-169
Project: Giraph
Issue Type: Improvement
Components: mapreduce
Affects Versions: 0.2.0
Environment: sles 11 x64, jdk 1.6, hadoop 0.20.205.0, 1 Master and 8 slaves
Reporter: Jianfeng Qian
Priority: Minor

I ran pagerank on hadoop 0.20.205.0. When the job finished, the child processes on the slaves didn't quit immediately; sometimes they never quit and I have to kill them.
[jira] [Updated] (GIRAPH-168) Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than HADOOP_FACEBOOK) and remove usage of HADOOP
[ https://issues.apache.org/jira/browse/GIRAPH-168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koontz updated GIRAPH-168:
---------------------------------

Attachment: GIRAPH-168.patch

Latest patch flips the set of munge directives from {HADOOP_NEWRPC, HADOOP_SECURE} to {HADOOP_OLDRPC, HADOOP_NON_SECURE}. HADOOP_NON_SECURE is a flag used currently in trunk, so this is a return back to the current trunk state. Making old-RPC-signature and non-secure be the exceptional cases seems to me better because if we remove older Hadoop versions, we'll have also removed the need for having any munge directives.

Please see the flag/profile matrix for this patch below:

||profile||HADOOP_OLDRPC||HADOOP_NON_SECURE||
|hadoop_non_secure|x|x|
|hadoop_0.20.203|x| |
|hadoop_0.23| | |
|hadoop_trunk| | |
|hadoop_facebook| | |

Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than HADOOP_FACEBOOK) and remove usage of HADOOP

Key: GIRAPH-168
URL: https://issues.apache.org/jira/browse/GIRAPH-168
Project: Giraph
Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Eugene Koontz
Assignee: Eugene Koontz
Attachments: GIRAPH-168.patch, GIRAPH-168.patch, GIRAPH-168.patch

This JIRA relates to the mail thread here: http://mail-archives.apache.org/mod_mbox/incubator-giraph-dev/201203.mbox/browser

Currently we check for the munge flags HADOOP, HADOOP_FACEBOOK and HADOOP_NON_SECURE when using munge in a few places. Hopefully we can eliminate usage of munge in the future, but until then, we can mitigate the complexity by consolidating the number of flags checked. This JIRA renames HADOOP_FACEBOOK to HADOOP_SECURE, and removes usages of HADOOP, to handle the same conditional compilation requirements. It also makes it easier to add more maven profiles so that we can easily increase our hadoop version coverage.

This patch modifies the existing hadoop_facebook profile to use the new HADOOP_SECURE munge flag, rather than HADOOP_FACEBOOK. It also adds a new hadoop maven profile, hadoop_trunk, which also sets HADOOP_SECURE. Finally, it adds a default profile, hadoop_0.20.203. This is needed so that we can specify its dependencies separately from hadoop_trunk, because the hadoop dependencies have changed between trunk and 0.205.0 - the former requires hadoop-common, hadoop-mapreduce-client-core, and hadoop-mapreduce-client-common, whereas the latter requires hadoop-core.

With this patch, the following passes:

{code}
mvn clean verify
mvn -Phadoop_trunk clean verify
mvn -Phadoop_0.20.203 clean verify
{code}

Current problems:
* I left in place the usage of HADOOP_NON_SECURE, but note that the profile that uses this is hadoop_non_secure, which fails to compile on trunk: https://issues.apache.org/jira/browse/GIRAPH-167 .
* I couldn't get -Phadoop_facebook to work; does this work outside of Facebook?
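
For readers who have not seen munge before, the sketch below shows how a flag such as HADOOP_OLDRPC can gate code when old RPC is treated as the exceptional case. It is an illustration only: `MungeSketch` and `rpcStyle` are hypothetical names, and the `/*if[FLAG] ... else[FLAG]*/ ... /*end[FLAG]*/` comment form is how munge directives are understood to be written here, so check it against the munge tool actually configured in the build. Unprocessed, the file is ordinary Java and the default branch is the one that compiles.

```java
final class MungeSketch {
  /**
   * Reports which RPC style this build was compiled for.
   * Without preprocessing, everything from "if[" up to "else[...]" is one block
   * comment, so only the default (new-RPC) return statement is live code.
   * A munge run with HADOOP_OLDRPC defined is expected to activate the first
   * branch and comment out the second (assumption about the preprocessor).
   */
  static String rpcStyle() {
    /*if[HADOOP_OLDRPC]
    return "old-rpc";
    else[HADOOP_OLDRPC]*/
    return "new-rpc";
    /*end[HADOOP_OLDRPC]*/
  }

  public static void main(String[] args) {
    System.out.println(rpcStyle());
  }
}
```

Putting the old-RPC code in the commented-out branch mirrors the reasoning in the comment above: once the older Hadoop versions are dropped, the directives can be deleted and the remaining code needs no change.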
Re: Status report
Thanks Jakob, I signed off too!

Cheers,
Chris

On Apr 3, 2012, at 12:15 PM, Jakob Homan wrote:

I updated the text with Avery's talk and the UC Irvine work: http://wiki.apache.org/incubator/April2012#preview

On Tue, Apr 3, 2012 at 9:37 AM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote:

Hi Avery,

Yep, the Hadoop Summit talk is definitely worth mentioning and yeah I just wanted to indicate the connection with UCI b/c it was relevant to me (broader use within academia).

Cheers,
Chris

On Apr 3, 2012, at 9:31 AM, Avery Ching wrote:

I'm not sure if anything came out of those discussions, but maybe it's worth mentioning. One more thing worth mentioning is that the Giraph talk, Processing over a billion edges on Apache Giraph, was accepted for the Hadoop Summit 2012 (http://hadoopsummit.org/program/).

Avery

On 4/3/12 7:20 AM, Mattmann, Chris A (388J) wrote:

Hi Jakob,

On Apr 2, 2012, at 11:34 PM, Jakob Homan wrote:

Is it worth mentioning the UC Irvine connection?

... ? Is that the low-budget sequel to the classic Gene Hackman film?

LOL ummm no :) But, this is what I was talking about:

http://s.apache.org/ZUG

Looks like the students were doing something similar in that class, and there was some mailing list discussion about it in Jan 2012.

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++