Does anyone have any suggestions or resources I might look at to resolve
this? The documentation on setting up Kerberos seems pretty light.
Chris
On Tue, Sep 10, 2013 at 9:55 AM, Christopher Penney cpen...@gmail.com wrote:
Hi,
After hosting an insecure Hadoop environment for early
Hi,
I am using 2.1.0-beta and have seen container allocation fail randomly even
when running the same application in a loop. I know the cluster has enough
resources to give, because it allocated them for the same application on all
the other iterations of the loop and ran it successfully.
Howdy,
My application requires 2 distinct processing steps (reducers) to be
performed on the input data. The first operation changes the key values, and
records that had different keys in step 1 can end up having the same key in
step 2.
The heavy lifting of the operation is in step 1
If you want to stay in Java look at Cascading. Pig is also helpful. I think
there are others (Spring Integration, maybe?) but I'm not familiar enough with
them to make a recommendation.
Note that with Cascading and Pig you don't write 'map reduce'; you write
logic and they map it to the various
Check the firewall and /etc/hosts; also make sure the hosts entries line up
with the result of the hostname -f command. Both hostname -f and the hosts
entries should use FQDNs.
I used Ambari to install my cluster, including Ganglia metrics, and I had an
identical issue. Once I corrected that it started working.
Artem
Thank you, Chris. I will look at Cascading and Pig, but for starters I'd
prefer, if possible, to keep everything as close to the Hadoop libraries as I can.
I am sure I am overlooking something basic, as repartitioning is a fairly
common operation in MPP environments.
On Thu, Sep 12, 2013 at 2:39 PM,
The temporary file solution will work in a single node configuration, but
I'm not sure about an MPP config.
Let's say Job A runs on nodes 0 and 1 and Job B runs on nodes 2 and 3, or
both jobs run on all 4 nodes - will HDFS be able to redistribute the records
automagically between nodes, or does
It really comes down to the following:
In Job A set mapred.output.dir to some directory X.
In Job B set mapred.input.dir to the same directory X.
For Job A, do context.write() as you normally would, and each reducer will
create an output file in mapred.output.dir. Then in Job B each of those will
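For what it's worth, here is a rough sketch of that wiring with the new
(org.apache.hadoop.mapreduce) API; the class names, paths, and argument layout
are made up for illustration, not taken from this thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TwoStepDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path input = new Path(args[0]);
    Path intermediate = new Path(args[1]);   // "directory X" shared by both jobs
    Path output = new Path(args[2]);

    Job jobA = Job.getInstance(conf, "step 1");
    jobA.setJarByClass(TwoStepDriver.class);
    // jobA.setMapperClass(StepOneMapper.class);   // your step-1 classes (hypothetical names)
    // jobA.setReducerClass(StepOneReducer.class);
    jobA.setOutputKeyClass(Text.class);
    jobA.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(jobA, input);
    FileOutputFormat.setOutputPath(jobA, intermediate);  // each reducer writes a part file here
    if (!jobA.waitForCompletion(true)) {
      System.exit(1);
    }

    Job jobB = Job.getInstance(conf, "step 2");
    jobB.setJarByClass(TwoStepDriver.class);
    // jobB.setMapperClass(StepTwoMapper.class);   // re-emits records under the step-2 key
    // jobB.setReducerClass(StepTwoReducer.class);
    jobB.setOutputKeyClass(Text.class);
    jobB.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(jobB, intermediate);    // reads every part file Job A produced
    FileOutputFormat.setOutputPath(jobB, output);
    System.exit(jobB.waitForCompletion(true) ? 0 : 1);
  }
}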
Hi
I don't think this is the appropriate place to discuss this.
This list should be a vendor-neutral service.
You are encouraged to do your own research or look through the popular
search engines for others who may have already done such an
Thanks Bryan.
Yes, I am using Hadoop + HDFS.
If I understand your point, Hadoop tries to start the map processes on
nodes where the data is local, and if that's not possible, then it is HDFS
that replicates the data to the mapper nodes?
I expected to have to set this up in the code and I
I've had similar issues in the past, and what I had to do was list the FQDNs
before the short names in /etc/hosts. Like so:
192.168.126.129 master.bigmix.com master loghost
192.168.126.130 clone1.bigmix.com clone1
192.168.126.133 clone2.bigmix.com clone2
Yusaku
On Thu, Sep 12, 2013
Raj,
You can also use Apache Hadoop releases. Bigtop does a fine job as well of
putting together a consumable Hadoop stack.
As regards vendor solutions, this is not the right forum. There are
other forums for this. Please refrain from this type of discussion on the
Apache forum.
Regards,
Suresh
On
Cascading would be a good option if you have a complex flow. However, in your
case, you are only trying to chain two jobs. I would suggest you follow
these steps.
1. Set the output directory of Job1 as the input directory for Job2.
2. Launch Job1 using the new API. In the launcher
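Since the original question involved re-keying between the two steps, here is
a rough sketch of the kind of Job2 mapper that would re-emit each Job1 output
record under its new key so the shuffle regroups them; the tab-separated
record layout and the deriveStepTwoKey() helper are assumptions for
illustration only:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class StepTwoMapper extends Mapper<LongWritable, Text, Text, Text> {
  private final Text newKey = new Text();
  private final Text outValue = new Text();

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    // Assumes Job1 wrote "key<TAB>value" lines; adjust the parsing to the
    // real record layout.
    String[] parts = line.toString().split("\t", 2);
    if (parts.length < 2) {
      return;  // skip malformed lines
    }
    newKey.set(deriveStepTwoKey(parts[1]));
    outValue.set(parts[1]);
    context.write(newKey, outValue);  // shuffle now groups by the step-2 key
  }

  // Placeholder for whatever logic maps a step-1 record to its step-2 key.
  private String deriveStepTwoKey(String value) {
    return value.substring(0, Math.min(8, value.length()));
  }
}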
Hi,
We are trying to evaluate different implementations of Hadoop for our big data
enterprise project.
Can the forum members advise on the advantages and disadvantages of
each implementation, i.e. Cloudera vs Hortonworks vs MapR.
Thanks in advance.
Regards,
Raj
Reading the HDFS Federation documentation, it seems that it provides
support for multiple NameNodes in a single cluster. The DataNodes are
shared across all NameNodes. It seems one can take a single Hadoop
cluster and add HDFS Federation, but I do not see any way to take multiple
Hadoop
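For reference, a minimal hdfs-site.xml sketch of the per-nameservice
configuration the Federation documentation describes; the nameservice IDs
(ns1, ns2) and hostnames here are invented for the example:

<property>
  <name>dfs.nameservices</name>
  <value>ns1,ns2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns1</name>
  <value>nn1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.ns1</name>
  <value>nn1.example.com:50070</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns2</name>
  <value>nn2.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.ns2</name>
  <value>nn2.example.com:50070</value>
</property>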
Hi All,
We are trying to set up NameNode HA, and we are getting the below errors. Can
anyone lend a hand to solve this?
2013-09-12 18:20:53,771 ERROR
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:hdfs (auth:SIMPLE) cause:org.apache.hadoop.ipc.StandbyException:
Operation
Cloudera has the widest distribution and distinguishes itself with Cloudera
Impala, Cloudera Search and Sentry (all open source). It also comes with
Cloudera Manager which is proprietary, but free for selected functionality.
Hortonworks distinguishes itself as being pure open source (no
I understand it can be a contentious issue, especially given that a lot of
contributors to this list work for one or the other vendor or have some
stake in any kind of evaluation. But I see no reason why users should not
be able to compare notes and share experiences. Over time, genuine pain
points