Hello James,
I am new to this group, and relatively new to hadoop.
Welcome to the group!!
I am looking at building a large cluster. I was wondering if anyone has best
practices for a cluster with hundreds of nodes? Also, has anyone had
experience with a cluster spanning multiple
Hello,
We would like to invite everyone interested in data storage, analysis, and
search to join us for two days on June 7/8th in Berlin for an in-depth,
technical, developer-focused conference located in the heart of Europe.
Presentations will range from beginner-friendly introductions on
Hi all,
As I understand it, Hadoop is mainly used for tasks that take a long
time to execute. I'm considering using Hadoop for tasks
whose lower bound in distributed execution is around 5 to 10
seconds, and I am wondering what the overhead of
using Hadoop would be.
Does anyone have an idea? Any link where I
By default, Hadoop creates a new JVM process for each task, which in my
opinion is the major cost. You can customize the configuration to let the
TaskTracker reuse JVMs, which eliminates that overhead to some extent.
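A minimal sketch of that setting, assuming the 0.20-era property name
mapred.job.reuse.jvm.num.tasks (-1 means a JVM may be reused for an unlimited
number of tasks of the same job):

import org.apache.hadoop.mapred.JobConf;

public class JvmReuse {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // -1 = reuse task JVMs for an unlimited number of tasks of the same job;
    // the default is 1, i.e. a fresh JVM per task.
    conf.setNumTasksToExecutePerJvm(-1);
    // Equivalent raw property: conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);
  }
}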
On Thu, Apr 8, 2010 at 8:55 PM, Aleksandar Stupar stupar.aleksan...@yahoo.com wrote:
If you have many short-duration jobs, you might want to keep an eye on the
JobTracker and tweak the number of heartbeats processed per second (the
out-of-band heartbeat option); the JobTracker might be bombarded with events
otherwise.
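If it helps, a sketch of that knob; I believe the property is
mapreduce.tasktracker.outofband.heartbeat (name assumed, so verify against
your version; it is a TaskTracker-side setting, so it belongs in
mapred-site.xml on the cluster rather than in per-job code):

import org.apache.hadoop.conf.Configuration;

public class OutOfBandHeartbeat {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Assumed property name: lets a TaskTracker heartbeat immediately when a
    // task completes instead of waiting for the next regular interval, which
    // cuts per-job latency but increases load on the JobTracker.
    conf.setBoolean("mapreduce.tasktracker.outofband.heartbeat", true);
  }
}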
On Thu, Apr 8, 2010 at 8:07 PM, Jeff Zhang zjf...@gmail.com wrote:
By default,
Packaging the job and config and sending them to the JobTracker and various
nodes also adds a few seconds of overhead.
On Thu, Apr 8, 2010 at 10:37 AM, Jeff Zhang zjf...@gmail.com wrote:
By default, Hadoop creates a new JVM process for each task, which in my
opinion is the major cost. You
Hi,
I'm commissioning a new Hadoop cluster with the following spec.
45 x data nodes:
- 2 x Quad-Core AMD Opteron(tm) Processor 2378
- 16GB RAM
- 4 x WDC WD1002FBYS 1TB SATA drives (configured as separate ext4 filesystems)
3 x name nodes:
- 2 x Quad-Core AMD Opteron(tm) Processor 2378
- 32GB
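With the four drives mounted as separate filesystems, presumably each gets its
own entry in dfs.data.dir so the DataNode spreads block writes across them; a
sketch of the idea, with hypothetical mount points:

import org.apache.hadoop.conf.Configuration;

public class DataNodeDirs {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // One directory per physical drive (mount points are hypothetical);
    // the DataNode round-robins new blocks across the listed directories.
    conf.set("dfs.data.dir",
        "/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn,/data/4/dfs/dn");
  }
}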
Doh, a couple more silly bugs in there. Don't use that version quite yet -
I'll put up a better patch later today. (Thanks to Kevin and Ted Yu for
pointing out the additional problems)
-Todd
On Wed, Apr 7, 2010 at 5:24 PM, Todd Lipcon t...@cloudera.com wrote:
For Dmitriy and anyone else who
OK, fixed, unit tests passing again. If anyone sees any more problems let
one of us know!
Thanks
-Todd
On Thu, Apr 8, 2010 at 10:39 AM, Todd Lipcon t...@cloudera.com wrote:
Doh, a couple more silly bugs in there. Don't use that version quite yet -
I'll put up a better patch later today.
On Apr 7, 2010, at 10:50 PM, James Seigel wrote:
I am new to this group, and relatively new to hadoop.
Welcome to the community, James. :)
I am looking at building a large cluster. I was wondering if anyone has
best practices for a cluster with hundreds of nodes?
Take a look at
Dear All,
I am trying to install HOD on a cluster. When I tried to allocate a new
Hadoop cluster, I got the following error:
[2010-04-08 13:47:25,304] CRITICAL/50 hadoop:303 - Cluster could not be
allocated because of the following errors.
Hodring at n0 failed with following errors:
JobTracker failed to initialise
Hi Kevin,
I am having the same error, but my critical error is:
[2010-04-08 13:47:25,304] CRITICAL/50 hadoop:303 - Cluster could not be
allocated because of the following errors.
Hodring at n0 failed with following errors:
JobTracker failed to initialise
Have you solved this? Thanks!
Boyu
On
Both Kevin's and Todd's branches now pass my tests. Thanks again Todd.
-D
On Thu, Apr 8, 2010 at 10:46 AM, Todd Lipcon t...@cloudera.com wrote:
OK, fixed, unit tests passing again. If anyone sees any more problems let
one of us know!
Thanks
-Todd
On Thu, Apr 8, 2010 at 10:39 AM, Todd
Yes Raghava,
I have experienced that issue before, and the solution that you mentioned also
solved my issue (calling context.progress() or context.setStatus() to tell the
JobTracker that my jobs are still running).
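To make that concrete, a minimal sketch of a reducer (new API) that reports
progress during a long computation; the class name and the per-10,000-values
cadence are illustrative:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class KeepAliveReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
  @Override
  protected void reduce(Text key, Iterable<LongWritable> values, Context context)
      throws IOException, InterruptedException {
    long sum = 0;
    long seen = 0;
    for (LongWritable v : values) {
      sum += v.get();            // stand-in for the real, expensive computation
      if (++seen % 10000 == 0) {
        context.progress();      // tells the JobTracker the task is still alive
        context.setStatus("processed " + seen + " values for " + key);
      }
    }
    context.write(key, new LongWritable(sum));
  }
}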
regards
Eric Arenas
From: Raghava Mutharaju
Dear Raghava,
I also faced this problem. It usually happens when the computation on the data
that the reduce received takes more time and cannot finish within the default
time-out of 600 seconds. You can also increase the time-out to ensure that all
reduces complete by setting the property
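A minimal sketch of raising that time-out per job, assuming the property in
this generation of Hadoop is mapred.task.timeout (in milliseconds):

import org.apache.hadoop.conf.Configuration;

public class TaskTimeout {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Assumed property name: mapred.task.timeout, in milliseconds.
    // The default is 600000 (10 minutes); raise it to 30 minutes here.
    conf.setLong("mapred.task.timeout", 30 * 60 * 1000);
  }
}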
Hi all,
I'm working on a larger application that uses Hadoop for some
crunching tasks through the new job API (Job/Configuration). I've noticed
that running/completed jobs are neither visible in the
JobTracker web view nor listed by 'hadoop job -list all' when
they are started
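For reference, a sketch of the new-API submission path being described; one
thing worth checking is that the Configuration actually points at the cluster,
because if mapred.job.tracker is left at its default of "local" the job runs
inside the LocalJobRunner and never appears on the JobTracker web view or in
'hadoop job -list'. Host names, ports, and paths below are hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SubmitViaNewApi {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical cluster addresses; without these (or the cluster's own
    // config files on the classpath) the job falls back to LocalJobRunner.
    conf.set("fs.default.name", "hdfs://namenode-host:8020");
    conf.set("mapred.job.tracker", "jobtracker-host:8021");

    Job job = new Job(conf, "crunch-task");  // identity map/reduce by default
    job.setJarByClass(SubmitViaNewApi.class);
    FileInputFormat.addInputPath(job, new Path("/user/me/in"));     // hypothetical
    FileOutputFormat.setOutputPath(job, new Path("/user/me/out"));  // hypothetical
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}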
Hi,
Thank you Eric, Prashant and Greg. Although the timeout problem was
resolved, the reduce phase is now stuck at 99%. As of now, it has been stuck
there for about 3 hrs, which is too long a wait for my task. Do you guys see
any reason for this?
Speculative execution is on by default
Hi,
I have also experienced this problem. Have you tried speculative execution?
Also, I have had jobs that took a long time for one mapper / reducer because of
a record that was significantly larger than those contained in the other
filesplits. Do you know if it always slows down for the same
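Since speculative execution keeps coming up in this thread, here is a sketch
of toggling it per job; I believe the 0.20-era property names are
mapred.map.tasks.speculative.execution and
mapred.reduce.tasks.speculative.execution, both true by default:

import org.apache.hadoop.conf.Configuration;

public class SpeculativeExecutionToggle {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Property names assumed from the 0.20 era; both default to true.
    // A straggler task can then be re-run on another node, and whichever
    // attempt finishes first wins.
    conf.setBoolean("mapred.map.tasks.speculative.execution", true);
    conf.setBoolean("mapred.reduce.tasks.speculative.execution", true);
  }
}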
Is it better to build the 1000 node cluster in a single data center?
yes.
Do you back one of these things up to a second data center or a different
1000 node cluster?
If you're building your cluster on the West Coast, yes, you had best concern
yourself with Earthquakes,
Thanks for the reply. I checked my logs further and found that
sometimes the HDFS address is the correct one.
But in the jobtracker log, there is an error:
file /data/mapredsys/zhang~~~/.info can only be replicated on 0 nodes
instead of 1
...
DFS is not ready...
Hi Ted,
Thank you for all the suggestions. I went through the job tracker
logs and I have attached the exceptions found in the logs. I found two
exceptions
1) org.apache.hadoop.ipc.RemoteException: java.io.IOException: Could not
complete write to file (DFS Client)
2)