[jira] [Commented] (GIRAPH-168) Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than HADOOP_FACEBOOK) and remove usage of HADOOP
[ https://issues.apache.org/jira/browse/GIRAPH-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246478#comment-13246478 ]

Jakob Homan commented on GIRAPH-168:

My understanding was that the RPC changes FB had made were backports of changes that are in later versions, so I'm not sure if OldRPC is the correct description. Also, within the Hadoop world there's not really talk of old versus new RPC (except for the PB-based stuff, which will make this really confusing...). Hadoop security is API-incompatible with Hadoop non-security (due to changes in UGI) and FB's distro is insecure and API incompatible due to new APIs backported from more modern versions.

Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than HADOOP_FACEBOOK) and remove usage of HADOOP

Key: GIRAPH-168
URL: https://issues.apache.org/jira/browse/GIRAPH-168
Project: Giraph
Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Eugene Koontz
Assignee: Eugene Koontz
Attachments: GIRAPH-168.patch, GIRAPH-168.patch, GIRAPH-168.patch

This JIRA relates to the mail thread here: http://mail-archives.apache.org/mod_mbox/incubator-giraph-dev/201203.mbox/browser

Currently we check for the munge flags HADOOP, HADOOP_FACEBOOK and HADOOP_NON_SECURE when using munge in a few places. Hopefully we can eliminate usage of munge in the future, but until then, we can mitigate the complexity by consolidating the number of flags checked. This JIRA renames HADOOP_FACEBOOK to HADOOP_SECURE, and removes usages of HADOOP, to handle the same conditional compilation requirements. It also makes it easier to add more maven profiles so that we can easily increase our hadoop version coverage.

This patch modifies the existing hadoop_facebook profile to use the new HADOOP_SECURE munge flag, rather than HADOOP_FACEBOOK. It also adds a new hadoop maven profile, hadoop_trunk, which also sets HADOOP_SECURE. Finally, it adds a default profile, hadoop_0.20.203. This is needed so that we can specify its dependencies separately from hadoop_trunk, because the hadoop dependencies have changed between trunk and 0.205.0 - the former requires hadoop-common, hadoop-mapreduce-client-core, and hadoop-mapreduce-client-common, whereas the latter requires hadoop-core.

With this patch, the following passes:

{code}
mvn clean verify
mvn -Phadoop_trunk clean verify
mvn -Phadoop_0.20.203 clean verify
{code}

Current problems:
* I left in place the usage of HADOOP_NON_SECURE, but note that the profile that uses this is hadoop_non_secure, which fails to compile on trunk: https://issues.apache.org/jira/browse/GIRAPH-167 .
* I couldn't get -Phadoop_facebook to work; does this work outside of Facebook?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
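For readers who have not met munge before, the conditional compilation discussed in GIRAPH-168 can be pictured with a small sketch. Munge directives live inside ordinary Java comments, so the file below compiles unchanged with the default (secure) branch active; as far as I understand munge, building with HADOOP_NON_SECURE defined would swap in the commented-out branch instead. Only the flag name comes from the thread. The class and method names are hypothetical and are not taken from the Giraph source or from the attached patch.

```java
/**
 * Hypothetical illustration of munge-style conditional compilation.
 * Only the flag name HADOOP_NON_SECURE comes from the thread; the class
 * and method names are invented for this sketch.
 */
final class MungeFlagSketch {
    private MungeFlagSketch() { }

    /** Returns which variant of the code was compiled in. */
    static String compiledVariant() {
        // Without preprocessing, everything from "if[" to "else[...]" is one
        // block comment, so only the second return statement is compiled.
/*if[HADOOP_NON_SECURE]
        return "non-secure";
else[HADOOP_NON_SECURE]*/
        return "secure";
/*end[HADOOP_NON_SECURE]*/
    }

    public static void main(String[] args) {
        System.out.println("Compiled variant: " + compiledVariant());
    }
}
```

Fewer flags means fewer such branches to keep consistent, which is the mitigation the issue description is after.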
[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats
[ https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246489#comment-13246489 ]

Jakob Homan commented on GIRAPH-153:

Sounds good to me as well. I'm fine with devs having to build/test against this subproject/module; this ensures we don't get out of synch with our adapters. My main goal is to make sure anyone wanting just Giraph doesn't need the hbase/accumulo stuff, and it sounds like this does that. Thanks for the hard work, Brian.

HBase/Accumulo Input and Output formats

Key: GIRAPH-153
URL: https://issues.apache.org/jira/browse/GIRAPH-153
Project: Giraph
Issue Type: New Feature
Components: bsp
Affects Versions: 0.1.0
Environment: Single host OSX 10.6.8 2.2GHz Intel i7, 8GB
Reporter: Brian Femiano

Four abstract classes that wrap their respective delegate input/output formats for easy hooks into vertex input format subclasses. I've included some sample programs that show two very simple graph algorithms. I have a graph generator that builds out a very simple directed structure, starting with a few 'root' nodes. Root nodes are defined as nodes which are not listed as a child anywhere in the graph.

Algorithm 1) AccumuloRootMarker.java -- Accumulo as read/write source. Every vertex starts thinking it's a root. At superstep 0, send a message down to each child as a non-root notification. After superstep 1, only root nodes will have never been messaged.

Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on A1 by bundling the notification logic followed by root node propagation. Once we've marked the appropriate nodes as roots, tell every child which roots it can be traced back to via one or more spanning trees. This will take N + 2 supersteps where N is the maximum number of hops from any root to any leaf, plus 2 supersteps for the initial root flagging.

I've included all relevant code plus DistributedCacheHelper.java for recursive cache file and archive searches. It is more Hadoop-centric than Giraph-centric, but these jobs use it, so I figured why not commit it here. These have been tested through local JobRunner, pseudo-distributed on the aforementioned hardware, and fully distributed on EC2. More details in the comments.
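The root-marking idea in Algorithm 1 of GIRAPH-153 can be sketched without any Giraph, HBase, or Accumulo code. The following is a plain-Java simulation of the two supersteps over an in-memory adjacency list, under the assumption that every vertex appears as a key in the map. All names here are hypothetical, and this is not the code attached to the issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Plain-Java simulation of the root-marking idea (Algorithm 1).
 * It does not use the Giraph, HBase, or Accumulo APIs; all names are hypothetical.
 */
final class RootMarkerSketch {
    private RootMarkerSketch() { }

    /**
     * @param children adjacency list: vertex id to the ids of its children;
     *                 every vertex is assumed to appear as a key
     * @return ids of the vertices that never received a non-root notification
     */
    static Set<String> findRoots(Map<String, List<String>> children) {
        // Every vertex starts out thinking it is a root.
        Set<String> roots = new TreeSet<>(children.keySet());

        // Superstep 0: each vertex sends a non-root notification to each child.
        List<String> notified = new ArrayList<>();
        for (List<String> kids : children.values()) {
            notified.addAll(kids);
        }

        // Superstep 1: every vertex that received a message clears its root flag.
        roots.removeAll(notified);
        return roots;
    }

    /** Small directed sample graph: a -> c, b -> c, b -> d, c -> e. */
    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("a", Arrays.asList("c"));
        graph.put("b", Arrays.asList("c", "d"));
        graph.put("c", Arrays.asList("e"));
        graph.put("d", Collections.<String>emptyList());
        graph.put("e", Collections.<String>emptyList());
        return graph;
    }

    public static void main(String[] args) {
        System.out.println(findRoots(sampleGraph())); // prints [a, b]
    }
}
```

In a real BSP job each vertex would only see its own inbox; collecting all notifications in one list is a simplification that keeps the sketch short.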
[jira] [Commented] (GIRAPH-168) Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than HADOOP_FACEBOOK) and remove usage of HADOOP
[ https://issues.apache.org/jira/browse/GIRAPH-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246496#comment-13246496 ]

Jakob Homan commented on GIRAPH-168:

bq. except for the PB-based stuff

Where PB = ProtocolBuffers and != FB, because this isn't quite confusing enough.
[jira] [Commented] (GIRAPH-77) Coordinator should expose a web interface with progress, vertex region assignments, etc.
[ https://issues.apache.org/jira/browse/GIRAPH-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246561#comment-13246561 ]

Avery Ching commented on GIRAPH-77:

Paolo, would you be interested in working on this? =)

Coordinator should expose a web interface with progress, vertex region assignments, etc.

Key: GIRAPH-77
URL: https://issues.apache.org/jira/browse/GIRAPH-77
Project: Giraph
Issue Type: New Feature
Reporter: Jakob Homan

It would be nice if the coordinator worker had a web interface that showed progress, splits, etc. during job execution. Right now it would duplicate information currently being exposed through task status, but with the move to YARN, it will be a necessity. It would be great if we could do this in a modern way to avoid the screen-scraping, etc. currently used to get information from most other Hadoop projects' web interfaces. The coordinator could announce its address at the beginning or via status updates.
[jira] [Commented] (GIRAPH-77) Coordinator should expose a web interface with progress, vertex region assignments, etc.
[ https://issues.apache.org/jira/browse/GIRAPH-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246581#comment-13246581 ]

Paolo Castagna commented on GIRAPH-77:

Hi Avery, I am still learning and stepping into the Apache Giraph source code (fortunately, it isn't that big) :-)

Do you or Jakob have a favorite stack to do that? Jetty/Netty?, JAX-RS?, etc. Any specific web framework and/or template engine? Something small, something to minimize dependencies, ... I tend to use Jetty with plain servlets and Velocity. But I am open to suggestions. Ideally, we could/should publish JSON and render HTML pages client side (once again, I accept suggestions on JavaScript frameworks).

I must warn you though, I am not a web|graphic designer (and I know my limits on the UI front). But, once the basic functionalities are in place and the correct data is available, I am sure some good web designer will fix that up.

Coming back to your question, with some guidance, yes. I would like to give it a shot and I have time to dedicate to Apache Giraph.
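The "publish JSON, render client side" idea from the GIRAPH-77 discussion can be sketched with nothing beyond the JDK. This is a minimal sketch that assumes the JDK's built-in com.sun.net.httpserver package is acceptable for illustration; it is not Jetty, Scalatra, or anything proposed in the thread. The endpoint path, the field names, and the hard-coded values are hypothetical placeholders.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical coordinator status endpoint that publishes JSON.
 * Uses the JDK's built-in HTTP server; path and field names are invented.
 */
final class CoordinatorStatusSketch {
    private CoordinatorStatusSketch() { }

    /** Renders a tiny status document by hand to avoid a JSON dependency. */
    static String statusJson(long superstep, int workersDone, int workersTotal) {
        return "{\"superstep\":" + superstep
            + ",\"workersDone\":" + workersDone
            + ",\"workersTotal\":" + workersTotal + "}";
    }

    /** Starts a server on the given port (0 picks a free one) serving /status. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/status", exchange -> {
            // Placeholder values; a real coordinator would report live state.
            byte[] body = statusJson(3, 7, 10).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start(0);
        // The coordinator could announce this address via a status update.
        System.out.println("Status at http://localhost:"
            + server.getAddress().getPort() + "/status");
        server.stop(0);
    }
}
```

Serving raw JSON keeps the coordinator free of template engines, and any client-side page or script can consume the same endpoint without screen-scraping.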
[jira] [Commented] (GIRAPH-77) Coordinator should expose a web interface with progress, vertex region assignments, etc.
[ https://issues.apache.org/jira/browse/GIRAPH-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246594#comment-13246594 ]

Jakob Homan commented on GIRAPH-77:

bq. Do you or Jakob have a favorite stack to do that?

Nope. My code was using Scalatra as a learning exercise (and a trojan horse to get Scala into the project) and I was liking it a lot. That may be worth taking a look at.
[jira] [Commented] (GIRAPH-77) Coordinator should expose a web interface with progress, vertex region assignments, etc.
[ https://issues.apache.org/jira/browse/GIRAPH-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246604#comment-13246604 ]

Paolo Castagna commented on GIRAPH-77:

Ok, I'll look at that tomorrow (our CTO likes Sinatra ;-)). At least Scala integrates seamlessly with Java (fingers crossed... and I need to double check dependencies and side effects on the Maven front). Where is your code? Have you already started on this?
Re: Giraph as Whirr service, see WHIRR-530
I don't use Whirr... I haven't heard it mentioned on this forum yet. Anyone?

Avery

On 4/4/12 9:30 PM, Paolo Castagna wrote:

Hi, seen this? WHIRR-530 - Add Giraph as a service https://issues.apache.org/jira/browse/WHIRR-530

This could be quite useful for users who want to give Giraph a spin on cloud infrastructure, just for testing or to run a few small experiments. My experience with Whirr on small 10-20 node clusters has been quite positive. Less so for larger clusters, but I think that is more a problem/limit with the cloud provider than with Whirr itself. Whirr makes it extremely easy and pleasant to deploy stuff on demand.

... and Whirr already supports YARN: https://issues.apache.org/jira/browse/WHIRR-391

Are any Giraph developers/users here also Whirr users?

Paolo
Re: On helping new contributors pitch in quickly...
Ack! I suck. Sorry. I hadn't realized we'd gone through most of them, which itself is a good thing. I'll get some new ones added first thing in the morning. Sorry.

-Jakob

On Wed, Apr 4, 2012 at 9:45 PM, Paolo Castagna castagna.li...@googlemail.com wrote:

To help new contributors pitch in quickly, we maintain a set of JIRAs [1] that focus on getting new contributors started with the mechanics of generating a patch (downloading the source, changing a couple of lines, creating a patch, verifying its correctness, uploading it to JIRA and working with the community) rather than deep technical issues within Giraph itself. These are good issues with which to join the community.

This is nice, a good idea indeed. Put more issues there (even if, at the moment, there does not seem to be much simple stuff around to get people started). Things such as porting Giraph to YARN or a new RPC layer are a bit scary for those just starting (like me). :-)

Perhaps another option is to increase the number of examples. You already have a few interesting ones; do you have one or two ideas for examples which could be added to Giraph?

Paolo

[1] http://bit.ly/newbie_apache_giraph_issues
Re: Giraph as Whirr service, see WHIRR-530
This is interesting. Whirr can already spin up Hadoop MR clusters, which can then run the Giraph jobs. Once Giraph is bootstrapped onto YARN, this will make more sense as a Whirr service.

On Wed, Apr 4, 2012 at 9:43 PM, Avery Ching ach...@apache.org wrote:

I don't use Whirr... I haven't heard it mentioned on this forum yet. Anyone?

Avery

On 4/4/12 9:30 PM, Paolo Castagna wrote:

Hi, seen this? WHIRR-530 - Add Giraph as a service https://issues.apache.org/jira/browse/WHIRR-530

This could be quite useful for users who want to give Giraph a spin on cloud infrastructure, just for testing or to run a few small experiments. My experience with Whirr on small 10-20 node clusters has been quite positive. Less so for larger clusters, but I think that is more a problem/limit with the cloud provider than with Whirr itself. Whirr makes it extremely easy and pleasant to deploy stuff on demand.

... and Whirr already supports YARN: https://issues.apache.org/jira/browse/WHIRR-391

Are any Giraph developers/users here also Whirr users?

Paolo
[jira] [Commented] (GIRAPH-77) Coordinator should expose a web interface with progress, vertex region assignments, etc.
[ https://issues.apache.org/jira/browse/GIRAPH-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247019#comment-13247019 ]

Jakob Homan commented on GIRAPH-77:

The code I've got is a bunch of messing with Scalatra and a few lines to bring in a new server per worker, but it's probably gone out of date. It's not worth your time really. I've got experience with integrating Scala into Java projects via Maven. Let me spin up a quick patch to demonstrate that, probably in the next day or so.