Hi all,
Does anyone have experience with adding a new datanode to a rack-aware cluster
without restarting the namenode, on the CDH4 distribution?
It is said that adding a new datanode is a hot operation that can be
done while the cluster is online.
I tried it, but it did not seem to work.
Just FYI, you don't need to stop the job, update the host, and retry.
Just update the host while the job is running and it should retry and restart.
I had a similar issue with one of my nodes where the hosts file was
not updated. After the update it automatically resumed the work...
JM
Hi,
I have enabled the fair scheduler and everything is set to default with
only a few configuration changes. It is working fine and multiple users can
run queries simultaneously.
But I am not able to change the priority from *http://JobTracker
URL/scheduler* .
The Priority column is coming up as a
It is not data loss; the problem is that MultipleOutputs does not work
with the standard committer if you do not write into a subdirectory of the main
job output.
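Not code from this thread, but a minimal sketch of that pattern, assuming the
org.apache.hadoop.mapreduce.lib.output.MultipleOutputs API; the "stats/part"
base path and the Text types are made-up placeholders. The point is that the
extra output stays under the job output directory, so the standard
FileOutputCommitter can promote it on commit:

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class StatsReducer extends Reducer<Text, Text, Text, Text> {
  private MultipleOutputs<Text, Text> mos;

  @Override
  protected void setup(Context context) {
    mos = new MultipleOutputs<Text, Text>(context);
  }

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    for (Text value : values) {
      // "stats/part" is interpreted relative to the job's output directory,
      // i.e. the extra files end up in a subdirectory of the main job output.
      mos.write(key, value, "stats/part");
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    mos.close();
  }
}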
Hi
Thanks for the insights.
I noticed that these restarts of mappers were because in the shebang I had
usr/env/bin instead of #!/usr/bin/env python.
Any clue what was going on with reducers not starting but mappers being
executed again and again?
Probably a very naive question, but I am a newbie.
As Harsh suggested, you might want to check the task logs on the slaves (you
can do it through the web UI by clicking on the map/reduce task links) and see if
there are any exceptions.
Thanks Radim.
Yes, as you said, we are not writing into a sub-directory of the main job output. I will
try making them sub-directories of the output dir.
But one question: when I turn off speculative execution, it works
fine with the same multiple output directory structure. May I know how exactly
it
This is another problem with the FileOutputFormat committer; it is related to
yours:
https://issues.apache.org/jira/browse/MAPREDUCE-3772
It works like this: if the MultipleOutputs path is relative to the job output, then
there is a workaround to make it work with the committer, and the outputs from
multiple tasks do not
Guys,
I've read that increasing the above number (default 4 KB) to, say, 128 KB might speed
things up.
My input is 40 million serialised records coming from an RDBMS, and I noticed that with
the increased IO my job actually runs a tiny bit slower. Is that possible?
P.S. I have two questions:
1. During Sqoop import
By default the number of reducers is set to 1.
Is there a good way to guess the optimal number of reducers?
Or let's say I have TBs worth of data... mappers are of the order of 5000 or so...
But ultimately I am calculating, let's say, some average over the whole data...
say the average transaction occurring...
Jamal,
This is what I am using...
After you start your job, visit the JobTracker's web UI at ip-address:50030
and look for the Cluster Summary. The Reduce Task Capacity should hint at what to
optimally set your number to. I could be wrong, but it works for me. :)
Cluster Summary (Heap Size is *** MB/966.69 MB)
Hi Sasha
In general the number of reduce tasks is chosen mainly based on the data volume
reaching the reduce phase. In tools like Hive and Pig, by default there is one reducer
for every 1 GB of map output. So if you have 100 gigs of map output, then 100
reducers.
If your tasks are more CPU intensive
Bejoy,
I've read somewhere about keeping the number of mapred.reduce.tasks below the
reduce task capacity. Here is what I just tested:
Output 25 GB. 8-DN cluster with 16 Map and Reduce Task Capacity:
1 reducer - 22 mins
4 reducers - 11.5 mins
8 reducers - 5 mins
10 reducers - 7 mins
12 reducers -
Hello Jamal,
I use a different approach based on the number of cores. If you have, say, a
4-core machine then you can have (0.75 * number of cores) MR slots.
For example, if you have 4 physical cores or 8 virtual cores, then you can
have 0.75 * 8 = 6 MR slots. You can then set 3M+3R or 4M+2R and so on, as
Hi Amit,
There is a mention here about starting in the hadoop-20 parent path:
https://github.com/facebook/hadoop-20/wiki/Corona-Single-Node-Setup
Regards,
Rob
On Mon, Nov 12, 2012 at 8:01 AM, Amit Sela am...@infolinks.com wrote:
Hi everyone,
Does anyone know if the new Corona tools (Facebook just
Hi Manoj
If you intend to calculate the number of reducers based on the input size, then
in your driver class you should get the size of the input dir in HDFS; say
you intend to give n bytes to each reducer, then the number of reducers can be
computed as:
Total input size / bytes per reducer.
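Not from this thread, but a minimal driver sketch of that formula; the
1 GB-per-reducer figure, the class name, and the job name are made-up placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class ReducerCountDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path inputDir = new Path(args[0]);             // HDFS input directory
    long bytesPerReducer = 1024L * 1024L * 1024L;  // assumption: ~1 GB per reducer

    // Total size in bytes of everything under the input directory.
    long inputSize = fs.getContentSummary(inputDir).getLength();
    int numReducers = (int) Math.max(1L, inputSize / bytesPerReducer);

    Job job = new Job(conf, "reducer-count-example");
    job.setNumReduceTasks(numReducers);
    // ... set mapper, reducer, input and output paths as usual, then submit.
  }
}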
Hi,
When we run a MapReduce job, the logs are stored on all the tasktracker nodes.
Is there an easy way to aggregate all those logs together and see them
in a single place, instead of going to the tasks one by one and opening
the file?
Thanks,
JM
Hi,
We had a similar requirement and we built a small Java application which gets
information about the task nodes from the JobTracker and downloads the logs into one
file using the URLs of each TaskTracker.
For huge logs this becomes slow and time consuming.
Hope this helps.
Regards,
Dino Kečo
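That application's code is not in this thread, but a rough sketch of the same idea
could look like the following, assuming the stock MR1 TaskTracker tasklog servlet on
its default port 50060; the host names and attempt IDs are placeholders you would
normally obtain from the JobTracker.

import java.io.BufferedReader;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.URL;

public class TaskLogFetcher {
  public static void main(String[] args) throws Exception {
    // Placeholders: in practice, collect (tracker host, attempt id) pairs
    // from the JobTracker's job detail pages.
    String[][] attempts = {
      {"slave1.example.com", "attempt_201211211408_0001_m_000000_0"},
      {"slave2.example.com", "attempt_201211211408_0001_m_000001_0"},
    };

    PrintWriter out = new PrintWriter(new FileWriter("all-task-logs.txt"));
    for (String[] a : attempts) {
      // Assumption: default TaskTracker HTTP port 50060 and its tasklog servlet.
      URL url = new URL("http://" + a[0] + ":50060/tasklog?attemptid=" + a[1]
          + "&plaintext=true&filter=stdout");
      out.println("==== " + a[1] + " on " + a[0] + " ====");
      BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
      String line;
      while ((line = in.readLine()) != null) {
        out.println(line);
      }
      in.close();
    }
    out.close();
  }
}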
Thanks for the info.
I quickly drafted this bash script in case it can help someone...
You just need to make sure the IP inside is replaced.
To call it, you need to give it the job tasks page:
./showLogs.sh
"http://192.168.23.7:50030/jobtasks.jsp?jobid=job_201211211408_0001&type=map&pagenum=1"
Hello Jamal,
For efficient processing, all the values associated with the same key
get sorted and go to the same reducer. As a result the reducer gets a key and a
list of values as its input. To me your assumption seems correct.
Regards,
Mohammad Tariq
Hi Jamal
It is performed at the framework level: map emits key-value pairs and the
framework collects and groups all the values corresponding to a key from all
the map tasks. The reducer then takes as its input a key and a collection of
values only. The reduce method signature defines it.
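For reference, that signature in the new (org.apache.hadoop.mapreduce) API looks like
the sketch below; the Text/IntWritable types and the summing logic are only an example:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    // The framework has already grouped every value emitted for this key,
    // across all map tasks, into a single Iterable.
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();
    }
    context.write(key, new IntWritable(sum));
  }
}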
Got it.
Thanks for the clarification.
A better place to ask this is Pentaho's own community:
http://wiki.pentaho.com/display/BAD/Pentaho+Big+Data+Community+Home
At a glance, they have forums and IRC you could use to ask your
questions about their product.
IIRC, Facebook's own Hadoop branch (GitHub: facebook/hadoop, I guess)
does not support or carry any of the security features which Apache Hadoop
0.20.203 through 1.1.x now carries. So out of the box, I expect it to be
incompatible with any of the recent Apache releases.
Thank you for the info Bejoy.
Cheers!
Manoj.
Thanks Harsh. Any hints on how to give user.name in configuration files
for simple authentication? Is that given as a property?
On Wed, Nov 21, 2012 at 5:52 PM, Harsh J ha...@cloudera.com wrote:
Yes, see
http://hadoop.apache.org/docs/current/hadoop-auth/Configuration.html
and also see
start-all.sh will not carry any arguments to pass to the nodes.
Start with start-dfs.sh instead,
or start the namenode directly with the upgrade option: ./hadoop namenode -upgrade
Regards,
Uma