[jira] [Commented] (GIRAPH-168) Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than HADOOP_FACEBOOK) and remove usage of HADOOP

2012-04-04 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246478#comment-13246478
 ] 

Jakob Homan commented on GIRAPH-168:


My understanding was that the RPC changes FB had made were backports of changes 
that are in later versions, so I'm not sure if OldRPC is the correct 
description.  Also, within the Hadoop world there's not really talk of old 
versus new RPC (except for the PB-based stuff, which will make this really 
confusing...).  Hadoop security is API-incompatible with Hadoop non-security 
(due to changes in UGI) and FB's distro is insecure and API incompatible due to 
new APIs backported from more modern versions.

 Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than 
 HADOOP_FACEBOOK) and remove usage of HADOOP
 -

 Key: GIRAPH-168
 URL: https://issues.apache.org/jira/browse/GIRAPH-168
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Eugene Koontz
Assignee: Eugene Koontz
 Attachments: GIRAPH-168.patch, GIRAPH-168.patch, GIRAPH-168.patch


 This JIRA relates to the mail thread here: 
 http://mail-archives.apache.org/mod_mbox/incubator-giraph-dev/201203.mbox/browser
 Currently we check for the munge flags HADOOP, HADOOP_FACEBOOK and 
 HADOOP_NON_SECURE when using munge in a few places. Hopefully we can 
 eliminate usage of munge in the future, but until then, we can mitigate the 
 complexity by consolidating the number of flags checked. This JIRA renames 
 HADOOP_FACEBOOK to HADOOP_SECURE, and removes usages of HADOOP, to handle the 
 same conditional compilation requirements. It also makes it easier to add 
 more maven profiles so that we can easily increase our hadoop version 
 coverage.
 This patch modifies the existing hadoop_facebook profile to use the new 
 HADOOP_SECURE munge flag, rather than HADOOP_FACEBOOK.
 It also adds a new hadoop maven profile, hadoop_trunk, which also sets 
 HADOOP_SECURE. 
 Finally, it adds a default profile, hadoop_0.20.203. This is needed so that 
 we can specify its dependencies separately from hadoop_trunk, because the 
 hadoop dependencies have changed between trunk and 0.205.0 - the former 
 requires hadoop-common, hadoop-mapreduce-client-core, and 
 hadoop-mapreduce-client-common, whereas the latter requires hadoop-core. 
 With this patch, the following passes:
 {code}
 mvn clean verify
 mvn -Phadoop_trunk clean verify
 mvn -Phadoop_0.20.203 clean verify
 {code}
 Current problems: 
 * I left in place the usage of HADOOP_NON_SECURE, but note that the profile 
 that uses this is hadoop_non_secure, which fails to compile on trunk: 
 https://issues.apache.org/jira/browse/GIRAPH-167 .
 * I couldn't get -Phadoop_facebook to work; does this work outside of 
 Facebook?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-04-04 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246489#comment-13246489
 ] 

Jakob Homan commented on GIRAPH-153:


Sounds good to me as well.  I'm fine with devs having to build/test against 
this subproject/module; this ensures we don't get out of sync with our 
adapters.  My main goal is to make sure anyone wanting just Giraph doesn't need 
the hbase/accumulo stuff and it sounds like this does that.  Thanks for the 
hard work, Brian.

 HBase/Accumulo Input and Output formats
 ---

 Key: GIRAPH-153
 URL: https://issues.apache.org/jira/browse/GIRAPH-153
 Project: Giraph
  Issue Type: New Feature
  Components: bsp
Affects Versions: 0.1.0
 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
Reporter: Brian Femiano

 Four abstract classes that wrap their respective delegate input/output 
 formats for easy hooks into vertex input format subclasses. I've included 
 some sample programs that show two very simple graph algorithms. I have a 
 graph generator that builds out a very simple directed structure, starting 
 with a few 'root' nodes. Root nodes are defined as nodes which are not listed 
 as a child anywhere in the graph. 
 Algorithm 1) AccumuloRootMarker.java -- Accumulo as read/write source. 
 Every vertex starts thinking it's a root. At superstep 0, send a message down 
 to each child as a non-root notification. After superstep 1, only root nodes 
 will have never been messaged. 
 Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on A1 by 
 bundling the notification logic followed by root node propagation. Once we've 
 marked the appropriate nodes as roots, tell every child which roots it can be 
 traced back to via one or more spanning trees. This will take N + 2 
 supersteps where N is the maximum number of hops from any root to any leaf, 
 plus 2 supersteps for the initial root flagging. 
 I've included all relevant code plus DistributedCacheHelper.java for 
 recursive cache file and archive searches. It is more hadoop centric than 
 giraph, but these jobs use it so I figured why not commit here. 
 These have been tested through local JobRunner, pseudo-distributed on the 
 aforementioned hardware, and full distributed on EC2. More details in the 
 comments.
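
 The two algorithms described above can be sketched outside of Giraph as a 
 small message-passing simulation. This is only an illustration in plain 
 Python, not the code attached to the issue: the function names mark_roots and 
 trace_roots, the adjacency-dict input, and the "forward only newly learned 
 roots" rule are all assumptions made for the sketch.

```python
from collections import defaultdict


def mark_roots(children):
    """Algorithm 1 sketch: find vertices that are nobody's child.

    `children` maps a vertex id to a list of its child vertex ids
    (hypothetical input format, chosen for this example).
    """
    vertices = set(children)
    for kids in children.values():
        vertices.update(kids)

    # Every vertex starts out believing it is a root.
    is_root = {v: True for v in vertices}

    # Superstep 0: each vertex sends a non-root notification to each child.
    inbox = defaultdict(list)
    for v in vertices:
        for child in children.get(v, []):
            inbox[child].append("not-root")

    # Superstep 1: any vertex that received a message clears its flag.
    for v in vertices:
        if inbox[v]:
            is_root[v] = False

    return {v for v, flag in is_root.items() if flag}


def trace_roots(children):
    """Algorithm 2 sketch: tell each vertex which roots it descends from."""
    roots = mark_roots(children)
    traced = defaultdict(set)
    for r in roots:
        traced[r].add(r)

    # Roots start the propagation by announcing their own id.
    outbox = {r: {r} for r in roots}
    while outbox:
        inbox = defaultdict(set)
        for v, root_ids in outbox.items():
            for child in children.get(v, []):
                inbox[child] |= root_ids
        # Forward only root ids a vertex has not seen before, so the
        # loop stops once no vertex learns anything new.
        outbox = {}
        for v, received in inbox.items():
            fresh = received - traced[v]
            if fresh:
                traced[v] |= fresh
                outbox[v] = fresh
    return dict(traced)
```

 For a graph such as {"a": ["b", "c"], "b": ["d"], "e": ["d"]}, mark_roots 
 reports a and e as roots, and trace_roots records that d descends from both.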





[jira] [Commented] (GIRAPH-168) Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than HADOOP_FACEBOOK) and remove usage of HADOOP

2012-04-04 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246496#comment-13246496
 ] 

Jakob Homan commented on GIRAPH-168:


bq. except for the PB-based stuff
Where PB = ProtocolBuffers and != FB because this isn't quite confusing enough.





[jira] [Commented] (GIRAPH-77) Coordinator should expose a web interface with progress, vertex region assignments, etc.

2012-04-04 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246561#comment-13246561
 ] 

Avery Ching commented on GIRAPH-77:
---

Paolo, would you be interested in working on this? =)

 Coordinator should expose a web interface with progress, vertex region 
 assignments, etc.
 

 Key: GIRAPH-77
 URL: https://issues.apache.org/jira/browse/GIRAPH-77
 Project: Giraph
  Issue Type: New Feature
Reporter: Jakob Homan

 It would be nice if the coordinator worker had a web interface that showed 
 progress, splits, etc. during job execution. Right now it would duplicate 
 information currently being exposed through task status, but with the move to 
 YARN, it will be a necessity.  It would be great if we could do this in a 
 modern way to avoid the screen-scraping, etc. currently used to get 
 information from most other Hadoop projects' web interfaces.  The coordinator 
 could announce its address at the beginning or via status updates.





[jira] [Commented] (GIRAPH-77) Coordinator should expose a web interface with progress, vertex region assignments, etc.

2012-04-04 Thread Paolo Castagna (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246581#comment-13246581
 ] 

Paolo Castagna commented on GIRAPH-77:
--

Hi Avery, I am still learning and stepping into the Apache Giraph source code 
(fortunately, it isn't that big) :-)
Do you or Jakob have a favorite stack to do that? Jetty/Netty?, JAX-RS?, etc. 
Any specific web framework and/or template engine? Something small, something 
to minimize dependencies, ... I tend to use Jetty with plain servlets and 
Velocity. But I am open to suggestions.

Ideally, we could/should publish JSON and render HTML pages client side (once 
again, I accept suggestions on JavaScript frameworks).
I must warn you though, I am not a web|graphic designer (and I know my limits 
on the UI front). But, once the basic functionalities are in place and the 
correct data is available, I am sure some good web designer will fix that up.

Coming back to your question, with some guidance, yes. 
I would like to give it a shot and I have time to dedicate to Apache Giraph.
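
To make the "publish JSON, render client side" idea concrete, here is a minimal sketch of a status endpoint. It is written in Python with only the standard library purely for illustration (Giraph itself is Java, and no stack has been chosen in this thread); the field names superstep, workers_done and workers_total, the fixed sample values, and the port are all assumptions, not anything Giraph exposes today.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def status_payload(superstep, workers_done, workers_total):
    """Serialize a (hypothetical) coordinator progress snapshot as JSON."""
    return json.dumps(
        {
            "superstep": superstep,
            "workers_done": workers_done,
            "workers_total": workers_total,
        },
        sort_keys=True,
    )


class StatusHandler(BaseHTTPRequestHandler):
    """Answers every GET with the current status document."""

    def do_GET(self):
        # A real coordinator would read live state here; the numbers
        # below are placeholders for the sketch.
        body = status_payload(3, 4, 10).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Block forever serving the status document on the given port."""
    HTTPServer(("", port), StatusHandler).serve_forever()
```

Calling serve() and fetching any path returns the JSON document; an HTML page could then poll it and render progress in the browser, so nobody has to screen-scrape.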





[jira] [Commented] (GIRAPH-77) Coordinator should expose a web interface with progress, vertex region assignments, etc.

2012-04-04 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246594#comment-13246594
 ] 

Jakob Homan commented on GIRAPH-77:
---

bq. Do you or Jakob have a favorite stack to do that?
Nope.  My code was using Scalatra as a learning exercise (and a trojan horse to 
get Scala into the project) and I was liking it a lot.  That may be worth 
taking a look at.





[jira] [Commented] (GIRAPH-77) Coordinator should expose a web interface with progress, vertex region assignments, etc.

2012-04-04 Thread Paolo Castagna (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246604#comment-13246604
 ] 

Paolo Castagna commented on GIRAPH-77:
--

Ok, I'll look at that tomorrow (our CTO likes Sinatra ;-)). At least Scala 
integrates seamlessly with Java (fingers crossed... and I need to double check 
dependencies and side effects on the Maven front). Where is your code? Have you 
already started on this?





Re: Giraph as Whirr service, see WHIRR-530

2012-04-04 Thread Avery Ching

I don't use Whirr...I haven't heard it mentioned on this forum yet.  Anyone?

Avery

On 4/4/12 9:30 PM, Paolo Castagna wrote:

Hi,
seen this?

   WHIRR-530 - Add Giraph as a service
   https://issues.apache.org/jira/browse/WHIRR-530

This could be quite useful for users who want to give Giraph a spin on cloud
infrastructure, just for testing or to run a few small experiments.
My experience with Whirr on small 10-20 node clusters has been quite positive.
Less so for larger clusters, but it's more a problem/limit with the cloud
provider than with Whirr itself, I think.

Whirr makes it extremely easy and pleasant to deploy stuff on demand.

... and Whirr already supports YARN:
https://issues.apache.org/jira/browse/WHIRR-391

Are any Giraph developers/users here also Whirr users?

Paolo




Re: On helping new contributors pitch in quickly...

2012-04-04 Thread Jakob Homan
Ack!, I suck.  Sorry.  I hadn't realized we'd gone through most of
them, which itself is a good thing.  I'll get some new ones added
first thing in the morning.  Sorry.
-Jakob


On Wed, Apr 4, 2012 at 9:45 PM, Paolo Castagna
castagna.li...@googlemail.com wrote:
 
 To help new contributors pitch in quickly, we maintain a set of JIRAs [1] that
 focus on getting new contributors started with the mechanics of generating a
 patch (downloading the source, changing a couple lines, creating a patch,
 verifying its correctness, uploading it to JIRA and working with the
 community) rather than deep technical issues within Giraph itself. These are
 good issues with which to join the community.
 

 This is nice, good idea indeed.

 Put more issues there (even if, at the moment, there does not seem to be much
 simple stuff around to get people started). Things such as porting Giraph
 to YARN or a new RPC layer are a bit scary for those just starting (like me). :-)

 Perhaps another option is to increase the number of examples. You already have a
 few interesting ones; do you have one or two ideas for examples which
 could be added to Giraph?

 Paolo

  [1] http://bit.ly/newbie_apache_giraph_issues


Re: Giraph as Whirr service, see WHIRR-530

2012-04-04 Thread Jakob Homan
This is interesting.  Whirr can already spin up Hadoop MR clusters,
which can then run the Giraph jobs.  Once Giraph is bootstrapped onto
YARN, this will make more sense as a Whirr service.





[jira] [Commented] (GIRAPH-77) Coordinator should expose a web interface with progress, vertex region assignments, etc.

2012-04-04 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247019#comment-13247019
 ] 

Jakob Homan commented on GIRAPH-77:
---

The code I've got is a bunch of messing with Scalatra and a few lines to bring 
in a new server per worker, but it's probably gone out of date.  It's not worth 
your time really.  I've got experience with integrating Scala into Java 
projects via Maven.  Let me spin up a quick patch to demonstrate that, probably 
in the next day or so.
