Re: Distributed Clusters

2010-04-08 Thread Ravi Phulari
Hello James, > I am new to this group, and relatively new to Hadoop. Welcome to the group! > I am looking at building a large cluster. I was wondering if anyone has any best practices for a cluster in the hundreds of nodes? As well, has anyone had experience with a cluster spanning multiple ...

Berlin Buzzwords - early registration extended

2010-04-08 Thread Isabel Drost
Hello, we would like to invite everyone interested in data storage, analysis and search to join us for two days on June 7/8th in Berlin for an in-depth, technical, developer-focused conference located in the heart of Europe. Presentations will range from beginner-friendly introductions on ...

Hadoop overhead

2010-04-08 Thread Aleksandar Stupar
Hi all, as I understand it, Hadoop is mainly used for tasks that take a long time to execute. I'm considering using Hadoop for a task whose lower bound in distributed execution is around 5 to 10 seconds. I'm wondering what the overhead of using Hadoop would be. Does anyone have an idea? Any link where I ...

Re: Hadoop overhead

2010-04-08 Thread Jeff Zhang
By default, Hadoop creates a new JVM process for each task, which in my opinion is the major cost. You can adjust the configuration so the TaskTracker reuses JVMs, which eliminates this overhead to some extent. On Thu, Apr 8, 2010 at 8:55 PM, Aleksandar Stupar stupar.aleksan...@yahoo.com ...
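
A minimal sketch of the JVM-reuse setting Jeff describes, using the old mapred API in Hadoop 0.20 (it maps to the property mapred.job.reuse.jvm.num.tasks; -1 means unlimited reuse within one job):

    import org.apache.hadoop.mapred.JobConf;

    public class JvmReuseConf {
      public static JobConf configure() {
        JobConf conf = new JobConf();
        // -1 lets the TaskTracker run an unlimited number of tasks of the
        // same job in one JVM, avoiding the per-task JVM startup cost.
        conf.setNumTasksToExecutePerJvm(-1);
        return conf;
      }
    }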

Re: Hadoop overhead

2010-04-08 Thread Rajesh Balamohan
If it's many short-duration jobs, you might want to keep an eye on the JobTracker and tweak the number of heartbeats processed per second (the out-of-band heartbeat option); otherwise the JobTracker might be bombarded with events. On Thu, Apr 8, 2010 at 8:07 PM, Jeff Zhang zjf...@gmail.com wrote: > By default, ...
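
A sketch of the out-of-band heartbeat option Rajesh mentions. The property name below is an assumption on my part and varies across Hadoop releases, so check the defaults shipped with your version:

    import org.apache.hadoop.conf.Configuration;

    public class HeartbeatConf {
      public static Configuration configure() {
        Configuration conf = new Configuration();
        // Assumed property name: lets a TaskTracker send a heartbeat as soon
        // as a task finishes instead of waiting for the next regular
        // interval, which reduces latency for many short jobs.
        conf.setBoolean("mapreduce.tasktracker.outofband.heartbeat", true);
        return conf;
      }
    }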

Re: Hadoop overhead

2010-04-08 Thread Patrick Angeles
Packaging the job and config and sending them to the JobTracker and the various nodes also adds a few seconds of overhead. On Thu, Apr 8, 2010 at 10:37 AM, Jeff Zhang zjf...@gmail.com wrote: > By default, for each task Hadoop will create a new JVM process, which will be the major cost in my opinion. You ...

Re: Hadoop overhead

2010-04-08 Thread Edward Capriolo
On Thu, Apr 8, 2010 at 10:51 AM, Patrick Angeles patr...@cloudera.com wrote: > Packaging the job and config and sending them to the JobTracker and the various nodes also adds a few seconds of overhead. On Thu, Apr 8, 2010 at 10:37 AM, Jeff Zhang zjf...@gmail.com wrote: > By default, for each task Hadoop ...

Network problems with Hadoop 0.20.2 and Terasort on Debian 2.6.32 kernel

2010-04-08 Thread stephen mulcahy
Hi, I'm commissioning a new Hadoop cluster with the following spec:
45 x data nodes:
- 2 x Quad-Core AMD Opteron(tm) Processor 2378
- 16GB RAM
- 4 x WDC WD1002FBYS 1TB SATA drives (configured as separate ext4 filesystems)
3 x name nodes:
- 2 x Quad-Core AMD Opteron(tm) Processor 2378
- 32GB ...

Re: Errors reading lzo-compressed files from Hadoop

2010-04-08 Thread Todd Lipcon
Doh, a couple more silly bugs in there. Don't use that version quite yet; I'll put up a better patch later today. (Thanks to Kevin and Ted Yu for pointing out the additional problems.) -Todd On Wed, Apr 7, 2010 at 5:24 PM, Todd Lipcon t...@cloudera.com wrote: > For Dmitriy and anyone else who ...

Re: Errors reading lzo-compressed files from Hadoop

2010-04-08 Thread Todd Lipcon
OK, fixed; unit tests are passing again. If anyone sees any more problems, let one of us know! Thanks -Todd On Thu, Apr 8, 2010 at 10:39 AM, Todd Lipcon t...@cloudera.com wrote: > Doh, a couple more silly bugs in there. Don't use that version quite yet; I'll put up a better patch later today. ...

Re: Distributed Clusters

2010-04-08 Thread Allen Wittenauer
On Apr 7, 2010, at 10:50 PM, James Seigel wrote: > I am new to this group, and relatively new to Hadoop. Welcome to the community, James. :) > I am looking at building a large cluster. I was wondering if anyone has any best practices for a cluster in the hundreds of nodes? Take a look at ...

HOD: JobTracker failed to initialise

2010-04-08 Thread Boyu Zhang
Dear all, I am trying to install HOD on a cluster. When I tried to allocate a new Hadoop cluster, I got the following error: [2010-04-08 13:47:25,304] CRITICAL/50 hadoop:303 - Cluster could not be allocated because of the following errors. Hodring at n0 failed with following errors: JobTracker ...

Re: hadoop on demand setup: Failed to retrieve 'hdfs' service address

2010-04-08 Thread Boyu Zhang
Hi Kevin, I am having the same error, but my critical error is: [2010-04-08 13:47:25,304] CRITICAL/50 hadoop:303 - Cluster could not be allocated because of the following errors. Hodring at n0 failed with following errors: JobTracker failed to initialise. Have you solved this? Thanks! Boyu On ...

Re: Errors reading lzo-compressed files from Hadoop

2010-04-08 Thread Dmitriy Ryaboy
Both Kevin's and Todd's branches now pass my tests. Thanks again, Todd. -D On Thu, Apr 8, 2010 at 10:46 AM, Todd Lipcon t...@cloudera.com wrote: > OK, fixed; unit tests are passing again. If anyone sees any more problems, let one of us know! Thanks -Todd On Thu, Apr 8, 2010 at 10:39 AM, Todd ...

Re: Reduce gets stuck at 99%

2010-04-08 Thread Eric Arenas
Yes Raghava, I have experienced that issue before, and the solution that you mentioned also solved mine (adding a context.progress() or context.setStatus() call to tell the JobTracker that my jobs are still running). Regards, Eric Arenas From: Raghava Mutharaju ...
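
A minimal sketch of the keep-alive pattern Eric mentions, using the new mapreduce API: call context.progress() inside a long-running reduce so the framework sees the attempt as live. The class name and key/value types here are illustrative:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SlowReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
      @Override
      protected void reduce(Text key, Iterable<LongWritable> values, Context context)
          throws IOException, InterruptedException {
        long sum = 0;
        for (LongWritable v : values) {
          sum += v.get();
          // Report liveness so the framework does not time out this attempt
          // while it grinds through a very large key group.
          context.progress();
        }
        context.write(key, new LongWritable(sum));
      }
    }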

Re: Reduce gets stuck at 99%

2010-04-08 Thread prashant ullegaddi
Dear Raghava, I also faced this problem. It mostly happens when the computation over the data a reduce receives takes longer than the default time-out of 600s. You can increase the time-out to ensure that all reduces complete by setting the property ...
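
The preview truncates before the property name; in 0.20-era Hadoop the task time-out is mapred.task.timeout (in milliseconds, default 600000). A sketch, assuming that is the property Prashant means:

    import org.apache.hadoop.conf.Configuration;

    public class TaskTimeoutConf {
      public static Configuration configure() {
        Configuration conf = new Configuration();
        // 600s default = 600000 ms; raise it if a reduce legitimately
        // goes longer than that between progress reports.
        conf.setLong("mapred.task.timeout", 1800000L); // 30 minutes
        return conf;
      }
    }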

Job report on JobTracker

2010-04-08 Thread Sanel Zukan
Hi all, I'm working on a larger application that uses Hadoop for some crunching tasks via the new job API (Job/Configuration). I've noticed that running/completed jobs are neither visible in the JobTracker web view nor displayed via 'hadoop job -list all' when they are started ...
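
One common cause (an assumption on my part, not confirmed in this thread): if mapred.job.tracker is left at its default of "local", the new-API Job runs in-process via the LocalJobRunner and never registers with the JobTracker at all. A sketch with hypothetical host names:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ClusterSubmit {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical addresses; with the default value "local" the job
        // would run inside this process, invisible to the JobTracker UI
        // and to "hadoop job -list all".
        conf.set("mapred.job.tracker", "jobtracker.example.com:9001");
        conf.set("fs.default.name", "hdfs://namenode.example.com:9000");
        Job job = new Job(conf, "crunching-task");
        // ... set jar, mapper, reducer, input and output paths here ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }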

Re: Reduce gets stuck at 99%

2010-04-08 Thread Raghava Mutharaju
Hi, thank you Eric, Prashant and Greg. Although the timeout problem was resolved, the reduce is getting stuck at 99%. As of now, it has been stuck there for about 3 hrs; that is too long a wait for my task. Do you see any reason for this? Speculative execution is on by default ...

Re: Reduce gets stuck at 99%

2010-04-08 Thread Gregory Lawrence
Hi, I have also experienced this problem. Have you tried speculative execution? Also, I have had jobs that took a long time for one mapper/reducer because of a record that was significantly larger than those contained in the other filesplits. Do you know if it always slows down for the same ...
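
For reference, a sketch of enabling speculative execution explicitly with the 0.20-era property names (later releases renamed them to mapreduce.map.speculative and mapreduce.reduce.speculative):

    import org.apache.hadoop.conf.Configuration;

    public class SpeculativeConf {
      public static Configuration configure() {
        Configuration conf = new Configuration();
        // Launch backup attempts for straggling tasks; the first attempt
        // to finish wins and the others are killed.
        conf.setBoolean("mapred.map.tasks.speculative.execution", true);
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", true);
        return conf;
      }
    }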

Re: Distributed Clusters

2010-04-08 Thread Michael Segel
> Is it better to build the 1000-node cluster in a single data center? Yes. > Do you back one of these things up to a second data center or a different 1000-node cluster? If you're building your cluster on the West Coast, yes, you had best concern yourself with earthquakes, ...

Re: hadoop on demand setup: Failed to retrieve 'hdfs' service address

2010-04-08 Thread Kevin Van Workum
On Thu, Apr 8, 2010 at 2:23 PM, Boyu Zhang boyuzhan...@gmail.com wrote: > Hi Kevin, I am having the same error, but my critical error is: [2010-04-08 13:47:25,304] CRITICAL/50 hadoop:303 - Cluster could not be allocated because of the following errors. Hodring at n0 failed with following ...

Re: hadoop on demand setup: Failed to retrieve 'hdfs' service address

2010-04-08 Thread Boyu Zhang
Thanks for the reply. I checked my logs further and found that sometimes the HDFS address is the correct one. But in the JobTracker log there is an error: file /data/mapredsys/zhang~~~/.info can only be replicated to 0 nodes instead of 1 ... DFS is not ready ...

Re: Reduce gets stuck at 99%

2010-04-08 Thread Raghava Mutharaju
Hi Ted, thank you for all the suggestions. I went through the JobTracker logs and attached the exceptions found there. I found two exceptions: 1) org.apache.hadoop.ipc.RemoteException: java.io.IOException: Could not complete write to file (DFS Client) 2) ...