All,

I have gotten past all of the blocking configuration hurdles between EC2 and EMR, and I am able to submit one of my jobs successfully. Unfortunately, I am running into a weird issue with my actions: each one takes exactly 11 minutes from start to finish even though the underlying MR job is nowhere near that long (one action's job is exactly 34 seconds and the other's is 5 minutes 21 seconds). I cannot guarantee that this is not an issue with overall resources on the node I am running the Oozie instance on, but I highly doubt it since this is an m1.large, so I am curious whether there are any changes to the site configuration that might help flush out what is causing this issue.
Alejandro: The actual setup and configuration of this is fairly straightforward, so I am happy to write up a wiki on this if you guys have a specific wiki in mind. I am not sure many people are keen on using EMR as a persistent cluster (I assume most persistent clusters are set up across EC2 nodes), but I am actually very pleased with it so far since it greatly reduces the amount of initial setup required to spin up a cluster.

--
Matt

On Tue, Dec 18, 2012 at 5:48 PM, Robert Kanter <[email protected]> wrote:

> Hi Matt,
>
> The oozie.service.ProxyUserService.proxyuser.hadoop.hosts and
> oozie.service.ProxyUserService.proxyuser.hadoop.groups
> properties are part of Oozie's configuration and would go in
> oozie-site.xml. This lets you impersonate users on the Oozie side of
> things. See
> http://oozie.apache.org/docs/3.3.0/AG_Install.html#User_ProxyUser_Configuration
> for more info.
>
> There are two similar properties for Hadoop that go into core-site.xml:
> hadoop.proxyuser.oozie.hosts
> and hadoop.proxyuser.oozie.groups
> I think this is what you need to fix your error. See
> http://hadoop.apache.org/docs/stable/Secure_Impersonation.html for more
> info.
>
> - Robert
>
> On Tue, Dec 18, 2012 at 3:38 PM, Matt Goeke <[email protected]>
> wrote:
>
> > All,
> >
> > Still working on getting Oozie 3.3 integrated with EMR with most of my time
> > so far spent resolving the security group config needed for VPC.
> > The EC2 configuration was pretty simple but the main blocker right now is
> > getting past the error below:
> >
> > Caused by: org.apache.hadoop.ipc.RemoteException: User: hadoop is not allowed to impersonate hadoop
> >     at org.apache.hadoop.ipc.Client.call(Client.java:1070)
> >     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
> >     at $Proxy24.getProtocolVersion(Unknown Source)
> >     at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
> >     at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
> >     at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
> >     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
> >     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
> >     at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
> >     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
> >     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
> >     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
> >     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
> >     at org.apache.oozie.service.HadoopAccessorService$2.run(HadoopAccessorService.java:411)
> >     at org.apache.oozie.service.HadoopAccessorService$2.run(HadoopAccessorService.java:409)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:415)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
> >     at org.apache.oozie.service.HadoopAccessorService.createFileSystem(HadoopAccessorService.java:409)
> >     ... 26 more
> >
> > I know this is usually related to not having the correct proxy configs in
> > the core site but my current core site proxy configs are below (and I have
> > bounced both the NN and the JT since applying them):
> >
> > <property><name>dfs.permissions</name><value>false</value></property>
> > <property><name>oozie.service.ProxyUserService.proxyuser.hadoop.hosts</name><value>*</value></property>
> > <property><name>oozie.service.ProxyUserService.proxyuser.hadoop.groups</name><value>*</value></property>
> >
> > If I recall correctly this authorization is only checked at the JT/NN
> > level and therefore shouldn't need to be pushed to the core site on the
> > slave machines, right? Also, would there be any reason the wildcard would be
> > incompatible across hadoop distros (we are currently using 1.0.3 from EMR)?
> > Lastly, just for the sake of clarity, is the proxy hosts config based on the
> > box submitting the oozie request (edge node) or based on the boxes actually
> > running the jobs (data/task nodes)?
> >
> > --
> > Matt
> >
> > On Thu, Dec 13, 2012 at 6:19 PM, Alejandro Abdelnur <[email protected]> wrote:
> >
> > > Matt,
> > >
> > > It is not a matter of bundling native code or not. Officially we are supposed
> > > to do source releases only. As a convenience we could do binaries, but there
> > > are discussions about that, such as whether they could be signed or not.
> > >
> > > Regarding installing/running Oozie in EC2: I have never done it. Would you
> > > mind writing up a wiki on it once you figure it out?
> > >
> > > Cheers
> > >
> > > On Thu, Dec 13, 2012 at 4:02 PM, Matt Goeke <[email protected]>
> > > wrote:
> > >
> > > > Thank you both for the follow-up.
> > > >
> > > > 2 other questions that pertain to this:
> > > > 1) I don't remember any natives being required for Oozie so is there a
> > > > reason why we don't release with a -bin like most other apache projects?
> > > > 2) Are there any issues I might expect to run into when trying to run this
> > > > on EC2 backed by EMR?
> > > >
> > > > --
> > > > Matt
> > > >
> > > > On Thu, Dec 13, 2012 at 5:48 PM, Alejandro Abdelnur <[email protected]> wrote:
> > > >
> > > > > Matt,
> > > > >
> > > > > Apache Oozie release artifacts are sources only. The easiest way to build
> > > > > the TARBALL is:
> > > > >
> > > > > * install Maven
> > > > > * run bin/mkdistro.sh -DskipTests
> > > > >
> > > > > Then follow the Quick Start instructions.
> > > > >
> > > > > I'll open a JIRA to add this to the docs.
> > > > >
> > > > > On Thu, Dec 13, 2012 at 3:36 PM, Matt Goeke <[email protected]> wrote:
> > > > >
> > > > > > All,
> > > > > >
> > > > > > I am falling back to Oozie 3.2 for now but can someone possibly explain how
> > > > > > Oozie 3.3 is supposed to be configured? I was hoping to just follow the
> > > > > > quick start guide but it seems like the packaging does not match up at all.
> > > > > >
> > > > > > Trying to work through it I ended up downloading maven and running a 'mvn
> > > > > > install' on the folder which built some of the hadooplibs but I am still
> > > > > > missing all of the bin scripts.
> > > > > >
> > > > > > --
> > > > > > Matt
> > > > >
> > > > > --
> > > > > Alejandro
> > >
> > > --
> > > Alejandro
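
For reference, the proxy-user settings discussed in this thread belong in two different files. The fragments below are only a sketch under stated assumptions, not a verified EMR configuration. The wildcard values mirror the ones quoted in the thread. The user segment of the Hadoop-side property names is written here as "hadoop" because the error reads "User: hadoop is not allowed to impersonate hadoop"; that is an assumption, and it should be whichever account the Oozie server actually runs as (Robert's reply uses the conventional "oozie" account).

```xml
<!-- oozie-site.xml (Oozie server): which hosts/groups may ask Oozie
     to act on behalf of other users. Property names as quoted in the thread. -->
<property>
  <name>oozie.service.ProxyUserService.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>
<property>
  <name>oozie.service.ProxyUserService.proxyuser.hadoop.groups</name>
  <value>*</value>
</property>

<!-- core-site.xml (NameNode and JobTracker): lets the account running
     Oozie impersonate other users on the Hadoop side.
     ASSUMPTION: the middle segment "hadoop" is the OS user running the
     Oozie server; replace it with "oozie" if Oozie runs as that user. -->
<property>
  <name>hadoop.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoop.groups</name>
  <value>*</value>
</property>
```

The NN and JT would presumably need to be restarted after changing core-site.xml, as was done earlier in the thread.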
