RE: Using hadoop as storage cluster?

2008-10-24 Thread David C. Kerber
Are there any tuning settings that can be adjusted to optimize for files of a given size range? There would be quite a few files in the 100kB to 2MB range, which are received and processed daily, with a smaller number of files ranging up to ~600MB or so, which are summarizations of many of the daily data
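
One knob that matters for a mix like this is the HDFS block size. Below is a minimal sketch of choosing it per file with the 0.18-era FileSystem API; the paths, sizes, and class name are made-up examples, not anything from the thread.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockSizeExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // The cluster-wide default lives in hadoop-site.xml as
            // dfs.block.size (64 MB stock). A small file only occupies the
            // bytes it holds, so the 100kB-2MB dailies are not padded out
            // to a full block.

            // Per-file override: this create() overload takes an explicit
            // block size, e.g. 128 MB for the large summary files.
            long blockSize = 128L * 1024 * 1024;
            short replication = 3;
            int bufferSize = conf.getInt("io.file.buffer.size", 4096);
            FSDataOutputStream out = fs.create(
                new Path("/data/summaries/2008-10-24.dat"),
                true, bufferSize, replication, blockSize);
            out.writeBytes("...");   // write the summary data here
            out.close();
        }
    }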

Re: Auto-shutdown for EC2 clusters

2008-10-24 Thread Chris K Wensel
FYI, the src/contrib/ec2 scripts do just what Paco suggests, minus the static IP stuff (you can use the scripts to log in via cluster name and spawn a tunnel for browsing nodes). That is, you can spawn any number of uniquely named, configured, and sized clusters, and you can increase their

no combiner in Hadoop-streaming?

2008-10-24 Thread mark meissonnier
Hi, when I look at the docs there's -combiner JavaClassName, so does that mean there is no stdin/stdout way to use a combiner? Very often the reducer and the combiner are the same, so it would be no extra work on the programmer's part. Is there a reason why we can't have -combiner mycomb
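
The docs quoted above do mean what they say: in that era -combiner has to name a Java class, and a streaming script cannot be used there. A minimal sketch of such a combiner against the old mapred API, assuming the streaming job passes Text keys and numeric Text values (class name invented for illustration):

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    // Sums the numeric values for each key on the map side, so much less
    // data crosses the network to the streaming reducer.
    public class SumCombiner extends MapReduceBase
            implements Reducer<Text, Text, Text, Text> {

        public void reduce(Text key, Iterator<Text> values,
                           OutputCollector<Text, Text> output,
                           Reporter reporter) throws IOException {
            long sum = 0;
            while (values.hasNext()) {
                sum += Long.parseLong(values.next().toString());
            }
            output.collect(key, new Text(Long.toString(sum)));
        }
    }

Passing it as -combiner SumCombiner (with the class on the task classpath) leaves the streaming mapper and reducer as ordinary stdin/stdout scripts.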

Re: Auto-shutdown for EC2 clusters

2008-10-24 Thread Steve Loughran
Paco NATHAN wrote: Hi Karl, Rather than using separate key pairs, you can use EC2 security groups to keep track of different clusters. Effectively, that requires a new security group for every cluster -- so just allocate a bunch of different ones in a config file, then have the launch scripts d

Re: Using hadoop as storage cluster?

2008-10-24 Thread Alex Loddengaard
What files do you expect to be storing? Generally speaking, HDFS (Hadoop's distributed file system) does not handle small files very efficiently. Instead it's optimized for large files, upwards of 64MB each. Alex On Fri, Oct 24, 2008 at 9:41 AM, David C. Kerber <[EMAIL PROTECTED]> wrote: > Hi
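
When the small files cannot be avoided, one common workaround is to pack them into a single SequenceFile keyed by file name, so the namenode tracks one large file instead of thousands of tiny ones. A rough sketch (the local inputs and the HDFS target path are made up):

    import java.io.File;
    import java.io.FileInputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    // Packs the local files named on the command line into one SequenceFile,
    // keyed by file name, with the raw bytes as the value.
    public class SmallFilePacker {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, new Path("/data/packed/2008-10-24.seq"),
                Text.class, BytesWritable.class);
            try {
                for (String name : args) {
                    File f = new File(name);
                    byte[] bytes = new byte[(int) f.length()];
                    FileInputStream in = new FileInputStream(f);
                    try {
                        int n, off = 0;
                        while (off < bytes.length
                               && (n = in.read(bytes, off, bytes.length - off)) > 0) {
                            off += n;
                        }
                    } finally {
                        in.close();
                    }
                    writer.append(new Text(name), new BytesWritable(bytes));
                }
            } finally {
                writer.close();
            }
        }
    }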

Re: Auto-shutdown for EC2 clusters

2008-10-24 Thread Paco NATHAN
Hi Karl, Rather than using separate key pairs, you can use EC2 security groups to keep track of different clusters. Effectively, that requires a new security group for every cluster -- so just allocate a bunch of different ones in a config file, then have the launch scripts draw from those. We al
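
A purely hypothetical illustration of the config-file side of that (the file name, key, and class are invented here, not taken from the EC2 scripts): keep the pool of pre-allocated security group names in a local properties file and have the launch script draw one per cluster.

    import java.io.FileInputStream;
    import java.util.Properties;

    // Hypothetical illustration only. A pool of pre-allocated EC2 security
    // groups lives in a local properties file; the launch script draws the
    // one assigned to a cluster by index.
    public class ClusterGroups {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.load(new FileInputStream("ec2-clusters.properties"));
            // e.g. groups=hadoop-cluster-a,hadoop-cluster-b,hadoop-cluster-c
            String[] pool = props.getProperty("groups").split(",");
            int clusterIndex = Integer.parseInt(args[0]);
            System.out.println(pool[clusterIndex]);  // hand this to the launch script
        }
    }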

Re: LHadoop Server simple Hadoop input and output

2008-10-24 Thread Pete Wyckoff
Chukwa could also be used here. On 10/24/08 11:47 AM, "Jeff Hammerbacher" <[EMAIL PROTECTED]> wrote: Hey Edward, The application we used at Facebook to transmit new data is open source now and available at http://sourceforge.net/projects/scribeserver/. Later, Jeff On Fri, Oct 24, 2008 at 10:

Re: Auto-shutdown for EC2 clusters

2008-10-24 Thread Karl Anderson
On 23-Oct-08, at 10:01 AM, Paco NATHAN wrote: This workflow could be initiated from a crontab -- totally automated. However, we still see occasional failures of the cluster and must restart it manually, though not often. Stability there has improved a lot since the 0.18 release. For us, it's get

Re: LHadoop Server simple Hadoop input and output

2008-10-24 Thread Jeff Hammerbacher
Hey Edward, The application we used at Facebook to transmit new data is open source now and available at http://sourceforge.net/projects/scribeserver/. Later, Jeff On Fri, Oct 24, 2008 at 10:14 AM, Edward Capriolo <[EMAIL PROTECTED]> wrote: > I came up with my line of thinking after reading this

Re: LHadoop Server simple Hadoop input and output

2008-10-24 Thread Edward Capriolo
I came up with my line of thinking after reading this article: http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data As a guy who was intrigued by the Java coffee cup in '95 and now lives as a data center/NOC jock/Unix guy, let's say I look at a log manageme
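
For the log-management angle, the simplest possible input path is a small program that streams lines straight into an HDFS file. A sketch follows; the HDFS path is made up, and since 0.18-era HDFS has no append, each run writes a new file.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Reads log lines from stdin (e.g. piped from a tail or syslog relay)
    // and writes them into a timestamped file in HDFS, one file per run.
    public class LogToHdfs {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path out = new Path("/logs/raw/" + System.currentTimeMillis() + ".log");
            FSDataOutputStream os = fs.create(out);
            BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
            String line;
            while ((line = in.readLine()) != null) {
                os.writeBytes(line);
                os.writeBytes("\n");
            }
            os.close();
        }
    }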

Using hadoop as storage cluster?

2008-10-24 Thread David C. Kerber
Hi - I'm a complete newbie to Hadoop, and am wondering if it's appropriate for configuring a bunch of older machines that have no other use, as a storage cluster on an otherwise Windows network, so that my Windows clients see their combined disk space as a single large share? If so, wi

Re: Task Random Fail

2008-10-24 Thread Mice
What maximum numbers of mappers and reducers did you configure? It seems your TaskRunner fails to get a response. Maybe you need to try increasing "mapred.job.tracker.handler.count". 2008/10/22, Zhou, Yunqing <[EMAIL PROTECTED]>: > Recently the tasks on our cluster randomly failed (both map tasks and reduce
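
mapred.job.tracker.handler.count is the number of RPC handler threads the JobTracker runs. The real change goes into hadoop-site.xml on the JobTracker node (followed by a restart), but a quick way to see what value a cluster is actually picking up is a tiny Configuration check like the sketch below; the stock default should be 10.

    import org.apache.hadoop.conf.Configuration;

    // Prints the RPC handler count that the local Hadoop configuration
    // specifies. Configuration loads hadoop-default.xml and hadoop-site.xml
    // from the classpath, the same files the JobTracker reads at startup.
    public class HandlerCount {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            int handlers = conf.getInt("mapred.job.tracker.handler.count", 10);
            System.out.println("mapred.job.tracker.handler.count = " + handlers);
        }
    }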

Re: LHadoop Server simple Hadoop input and output

2008-10-24 Thread Pete Wyckoff
Another way to do this is to make Thrift's Java compiler generate REST bindings like its PHP compiler does; there is also libhdfs and http://wiki.apache.org/hadoop/MountableHDFS. On 10/23/08 2:54 PM, "Edward Capriolo" <[EMAIL PROTECTED]> wrote: I had downloaded Thrift and ran the example appli
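
For completeness, plain Java against the FileSystem API is the other route, with no Thrift, REST, or FUSE layer in between. A minimal sketch that cats an HDFS file (the class name is made up):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // FileSystem.get() resolves whatever fs.default.name points at in
    // hadoop-site.xml, so the same code reads from HDFS or the local disk.
    public class CatHdfsFile {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(new Path(args[0]))));
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
            in.close();
        }
    }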

Re: Seeking Someone to Review Hadoop Article

2008-10-24 Thread Tom Wheeler
Thanks to such a helpful community, I have had several offers to review my article and won't need any more volunteers. I'll post a link to the article when it's published next week. On Thu, Oct 23, 2008 at 5:31 PM, Tom Wheeler <[EMAIL PROTECTED]> wrote: > Each month the developers at my company w

Re: Auto-shutdown for EC2 clusters

2008-10-24 Thread Steve Loughran
Paco NATHAN wrote: What seems to be emerging here is a pattern for another special node associated with a Hadoop cluster. The need is to have a machine which can: * handle setup and shutdown of a Hadoop cluster on remote server resources * manage loading and retrieving data via a storage g

HELP: Namenode Startup Failed with an OutofMemoryError

2008-10-24 Thread Yang Zhou
Hi everyone, I have a problem with Hadoop startup: I failed to start the namenode and got the following exception in the namenode log file: 2008-10-23 21:54:51,223 INFO org.mortbay.http.SocketListener: Started SocketListener on 0.0.0.0:50070 2008-10-23 21:54:51,224 INFO org.mortbay.util.Cont

Re: Auto-shutdown for EC2 clusters

2008-10-24 Thread Paco NATHAN
What seems to be emerging here is a pattern for another special node associated with a Hadoop cluster. The need is to have a machine which can: * handle setup and shutdown of a Hadoop cluster on remote server resources * manage loading and retrieving data via a storage grid * interact and
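
As a purely hypothetical skeleton, the responsibilities listed in that message could be captured in an interface like the one below; the names are invented to mirror that list, not any real API.

    // Hypothetical skeleton only: one controller node that owns the
    // lifecycle of a remote Hadoop cluster and the data that flows
    // through it.
    public interface ClusterController {
        void setupCluster(String clusterName, int nodes);    // launch Hadoop on remote/EC2 nodes
        void loadData(String source, String hdfsPath);       // push input in via the storage grid
        void runWorkflow(String jobJar, String[] jobArgs);   // submit the MapReduce jobs
        void retrieveResults(String hdfsPath, String dest);  // pull output back out
        void shutdownCluster(String clusterName);            // tear everything down when done
    }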

Re: HELP: Namenode Startup Failed with an OutofMemoryError

2008-10-24 Thread Steve Loughran
woody zhou wrote: Hi everyone, I have a problem about Hadoop startup. I failed to startup the namenode and I got the following exception in the namenode log file: 2008-10-23 21:54:51,232 ERROR org.apache.hadoop.dfs.NameNode: java.lang.OutOfMemoryError: unable to create new native thread