Re: Securing the Secondary Name Node

2013-09-12 Thread Christopher Penney
Does anyone have any suggestions or resources I might look at to resolve this? The documentation on setting up Kerberos seems pretty light. Chris On Tue, Sep 10, 2013 at 9:55 AM, Christopher Penney cpen...@gmail.com wrote: Hi, After hosting an insecure Hadoop environment for early

Container allocation fails randomly

2013-09-12 Thread Krishna Kishore Bonagiri
Hi, I am using 2.1.0-beta and have seen container allocation failing randomly even when running the same application in a loop. I know that the cluster has enough resources to give, because it gave the resources for the same application all the other times in the loop and ran it successfully.

chaining (the output of) jobs/ reducers

2013-09-12 Thread Adrian CAPDEFIER
Howdy, My application requires 2 distinct processing steps (reducers) to be performed on the input data. The first operation changes the key values, and records that had different keys in step 1 can end up having the same key in step 2. The heavy lifting of the operation is in step 1

Re: chaining (the output of) jobs/ reducers

2013-09-12 Thread Chris Curtin
If you want to stay in Java, look at Cascading. Pig is also helpful. I think there are others (Spring Integration maybe?) but I'm not familiar enough with them to make a recommendation. Note that with Cascading and Pig you don't write 'map reduce'; you write logic and they map it to the various

Re: Hadoop Metrics Issue in ganglia.

2013-09-12 Thread Artem Ervits
Check the firewall and /etc/hosts. Also make sure /etc/hosts lines up with the result of the hostname -f command. Both hostname -f and the hosts entries should use FQDNs. I use Ambari to install my cluster, including Ganglia metrics, and I had an identical issue. Once I corrected that, it started working. Artem

Re: chaining (the output of) jobs/ reducers

2013-09-12 Thread Adrian CAPDEFIER
Thank you, Chris. I will look at Cascading and Pig, but for starters I'd prefer to keep everything as close to the Hadoop libraries as possible. I am sure I am overlooking something basic, as repartitioning is a fairly common operation in MPP environments. On Thu, Sep 12, 2013 at 2:39 PM,

Re: chaining (the output of) jobs/ reducers

2013-09-12 Thread Shahab Yunus
The temporary file solution will work in a single node configuration, but I'm not sure about an MPP config. Let's say Job A runs on nodes 0 and 1 and job B runs on nodes 2 and 3 or both jobs run on all 4 nodes - will HDFS be able to redistribute automagically the records between nodes or does

Re: chaining (the output of) jobs/ reducers

2013-09-12 Thread Bryan Beaudreault
It really comes down to the following: In Job A set mapred.output.dir to some directory X. In Job B set mapred.input.dir to the same directory X. For Job A, do context.write() as normal, and each reducer will create an output file in mapred.output.dir. Then in Job B each of those will
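
The directory handoff described above can be illustrated with a small simulation. This is only a sketch in plain Python, not Hadoop code: `run_job`, `read_dir`, `chain_jobs`, the reducer count and the sample records are all hypothetical names and values chosen for illustration. It assumes each simulated reducer writes one part file into the job's output directory, and that the second job reads every part file in that directory and regroups the records by their new key during its own shuffle.

```python
import os
import tempfile
import zlib
from collections import defaultdict


def run_job(records, map_fn, reduce_fn, num_reducers, output_dir):
    """Simulate one job: map, shuffle by key, reduce, write one part file per reducer."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for record in records:
        for key, value in map_fn(record):
            # Deterministic hash partitioning: every value of a key lands in one partition.
            index = zlib.crc32(key.encode("utf-8")) % num_reducers
            partitions[index][key].append(value)
    os.makedirs(output_dir, exist_ok=True)
    for i, partition in enumerate(partitions):
        path = os.path.join(output_dir, "part-r-%05d" % i)
        with open(path, "w") as out:
            for key in sorted(partition):
                for out_key, out_value in reduce_fn(key, partition[key]):
                    out.write("%s\t%s\n" % (out_key, out_value))


def read_dir(directory):
    """Yield (key, value) string pairs from every part file in a directory."""
    for name in sorted(os.listdir(directory)):
        with open(os.path.join(directory, name)) as part:
            for line in part:
                key, value = line.rstrip("\n").split("\t")
                yield key, value


def chain_jobs(records, num_reducers=2):
    """Run job A, then feed its output directory to job B as input."""
    with tempfile.TemporaryDirectory() as work:
        dir_x = os.path.join(work, "X")      # output of job A, input of job B
        dir_out = os.path.join(work, "out")  # output of job B
        # Job A: sum counts per word, then emit under a NEW key (the first letter).
        run_job(records,
                lambda rec: [(rec[0], rec[1])],
                lambda key, values: [(key[0], sum(values))],
                num_reducers, dir_x)
        # Job B: its shuffle regroups records that now share a key, whichever
        # part file (that is, whichever job A reducer) they came from.
        run_job(read_dir(dir_x),
                lambda rec: [(rec[0], int(rec[1]))],
                lambda key, values: [(key, sum(values))],
                num_reducers, dir_out)
        return {key: int(value) for key, value in read_dir(dir_out)}


sample = [("apple", 3), ("avocado", 2), ("banana", 5), ("apple", 1), ("blueberry", 2)]
totals = chain_jobs(sample)
```

In the simulation, "apple" and "avocado" may be reduced by different job A reducers and written to different part files, yet job B still brings them together under the key "a", because regrouping happens in job B's shuffle rather than in the file layout of directory X.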

Re: Cloudera Vs Hortonworks Vs MapR

2013-09-12 Thread Marco Shaw
Hi, I don't think this is the appropriate place to discuss this. This list should be a vendor-neutral service. You are encouraged to do your own research or look through the popular search engines for others who may have already done such an

Re: chaining (the output of) jobs/ reducers

2013-09-12 Thread Adrian CAPDEFIER
Thanks Bryan. Yes, I am using hadoop + hdfs. If I understand your point, hadoop tries to start the mapping processes on nodes where the data is local and if that's not possible, then it is hdfs that replicates the data to the mapper nodes? I expected to have to set up this in the code and I

Re: Hadoop Metrics Issue in ganglia.

2013-09-12 Thread Yusaku Sako
I've had similar issues in the past, and what I had to do was list the FQDNs before the short names in /etc/hosts. Like so:
192.168.126.129 master.bigmix.com master loghost
192.168.126.130 clone1.bigmix.com clone1
192.168.126.133 clone2.bigmix.com clone2
Yusaku On Thu, Sep 12, 2013
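
A quick way to spot entries that break this ordering is to scan the hosts file for lines whose first name has no dot in it. The following is a rough sketch: the function name `short_name_first` is hypothetical, and treating "contains a dot" as "is an FQDN" and skipping loopback entries are simplifying assumptions, not rules taken from Hadoop or Ganglia.

```python
def short_name_first(hosts_text):
    """Return (line_number, name) for entries whose first name is not an FQDN."""
    problems = []
    for number, line in enumerate(hosts_text.splitlines(), start=1):
        # Drop trailing comments, then split into address and names.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # blank line, comment-only line, or address without names
        address, first_name = fields[0], fields[1]
        if address.startswith("127.") or address == "::1":
            continue  # assumption: loopback entries are left alone
        if "." not in first_name:
            problems.append((number, first_name))
    return problems


good = """192.168.126.129 master.bigmix.com master loghost
192.168.126.130 clone1.bigmix.com clone1
192.168.126.133 clone2.bigmix.com clone2
"""
bad = """127.0.0.1 localhost
192.168.126.130 clone1 clone1.bigmix.com
"""
good_report = short_name_first(good)
bad_report = short_name_first(bad)
```

To check a real machine, the contents of /etc/hosts can be read into a string and passed to the function; an empty list means every non-loopback entry lists a dotted name first.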

Re: Cloudera Vs Hortonworks Vs MapR

2013-09-12 Thread Suresh Srinivas
Raj, You can also use Apache Hadoop releases. Bigtop does a fine job as well of putting together a consumable Hadoop stack. As regards vendor solutions, this is not the right forum. There are other forums for this. Please refrain from this type of discussion on the Apache forum. Regards, Suresh On

Re: chaining (the output of) jobs/ reducers

2013-09-12 Thread Venkata K Pisupat
Cascading would be a good option in case you have a complex flow. However, in your case, you are trying to chain only two jobs. I would suggest you follow these steps. 1. The output directory of Job1 would be set as the input directory for Job2. 2. Launch Job1 using the new API. In launcher

Cloudera Vs Hortonworks Vs MapR

2013-09-12 Thread Hadoop Raj
Hi, We are trying to evaluate different implementations of Hadoop for our big data enterprise project. Can the forum members advise on what are the advantages and disadvantages of each implementation i.e. Cloudera Vs Hortonworks Vs MapR. Thanks in advance. Regards, Raj

HDFS Federation and multiple clusters

2013-09-12 Thread LAT
Reading the HDFS Federation documentation, it seems that it provides support for multiple NameNodes in a single cluster. The DataNodes are shared across all NameNodes. It seems one can take a single hadoop cluster, and add HDFS Federation, but I do not see any way to take multiple hadoop

Error in InitializingSharedEdits in NameNode HA

2013-09-12 Thread Sathish Kumar
Hi All, We are trying to set up NameNode HA and are getting the errors below. Can anyone lend a hand to solve this? 2013-09-12 18:20:53,771 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:org.apache.hadoop.ipc.StandbyException: Operation

RE: Cloudera Vs Hortonworks Vs MapR

2013-09-12 Thread Smith, Joshua D.
Cloudera has the widest distribution and distinguishes itself with Cloudera Impala, Cloudera Search and Sentry (all open source). It also comes with Cloudera Manager which is proprietary, but free for selected functionality. Hortonworks distinguishes itself as being pure open source (no

Re: Cloudera Vs Hortonworks Vs MapR

2013-09-12 Thread Xuri Nagarin
I understand it can be a contentious issue, especially given that a lot of contributors to this list work for one or the other vendor or have some stake in any kind of evaluation. But I see no reason why users should not be able to compare notes and share experiences. Over time, genuine pain points