Are there any tuning settings that can be adjusted to optimize for files of a
given size range?
There would be quite a few files in the 100kB to 2MB range, received and
processed daily, plus smaller numbers of files ranging up to ~600MB or so,
which are summarizations of many of the daily data
FYI, the src/contrib/ec2 scripts do just what Paco suggests,
minus the static IP stuff (you can use the scripts to log in via
cluster name, and spawn a tunnel for browsing nodes).
That is, you can spawn any number of uniquely named, configured, and
sized clusters, and you can increase their
Hi,
When I look at the docs there's
-combiner JavaClassName
so does that mean there is no stdin/stdout possibility for a combiner?
Very often the reducer and the combiner are the same, so it would be no
extra work on the programmer's part.
Is there a reason why we can't have
-combiner mycomb
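For what it's worth, a combiner is only safe when the reduce operation is
associative and commutative (sums, counts, maxima), since the framework may
apply it zero or more times to each map's output. Here is a minimal
stdin/stdout reducer of that kind, sketched in Python (the script name and
the word-count framing are hypothetical), which could double as the combiner
wherever Streaming accepts a script for -combiner:

    #!/usr/bin/env python
    # sum_reducer.py -- hypothetical word-count reducer; safe to reuse as a
    # combiner because summing partial counts is associative and commutative.
    import sys

    current_key, total = None, 0
    for line in sys.stdin:
        # Streaming delivers "key<TAB>value" lines, grouped by key.
        key, sep, value = line.rstrip("\n").partition("\t")
        try:
            count = int(value)
        except ValueError:
            continue  # skip malformed lines
        if key != current_key and current_key is not None:
            sys.stdout.write("%s\t%d\n" % (current_key, total))
            total = 0
        current_key = key
        total += count
    if current_key is not None:
        sys.stdout.write("%s\t%d\n" % (current_key, total))

As of this thread, -combiner only accepted a Java class; newer Streaming
releases relaxed that to accept a command, at which point passing the same
script for both -combiner and -reducer does work.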
What files do you expect to be storing? Generally speaking, HDFS (Hadoop's
distributed file system) does not handle small files very efficiently.
Instead it's optimized for large files, upwards of 64MB each.
Alex
On Fri, Oct 24, 2008 at 9:41 AM, David C. Kerber <[EMAIL PROTECTED]> wrote:
> Hi
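If the small daily files can be batched or concatenated into larger ones
before loading, HDFS copes much better; for the large summary files, the
block size is tunable. A hadoop-site.xml sketch, with an illustrative value:

    <property>
      <!-- 128 MB instead of the 64 MB default, for mostly-large files -->
      <name>dfs.block.size</name>
      <value>134217728</value>
    </property>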
Hi Karl,
Rather than using separate key pairs, you can use EC2 security groups
to keep track of different clusters.
Effectively, that requires a new security group for every cluster --
so just allocate a bunch of different ones in a config file, then have
the launch scripts draw from those. We al
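For illustration, the bookkeeping might look like this Python sketch using
the boto EC2 library (the pool of names is hypothetical; in practice it
would come from the config file):

    #!/usr/bin/env python
    # allocate_groups.py -- sketch: pre-allocate one EC2 security group per
    # cluster so the launch scripts can draw names from a fixed pool.
    import boto

    # Hypothetical pool of cluster names.
    CLUSTER_POOL = ["hadoop-alpha", "hadoop-beta", "hadoop-gamma"]

    conn = boto.connect_ec2()  # credentials come from the environment
    existing = set(g.name for g in conn.get_all_security_groups())
    for name in CLUSTER_POOL:
        if name not in existing:
            conn.create_security_group(name, "Hadoop cluster " + name)
            print("allocated security group: " + name)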
Chukwa also could be used here.
On 23-Oct-08, at 10:01 AM, Paco NATHAN wrote:
This workflow could be initiated from a crontab -- totally automated.
However, we still see occasional failures of the cluster and must
restart manually, though not often. Stability has improved much
since the 0.18 release. For us, it's get
Hey Edward,
The application we used at Facebook to transmit new data is open
source now and available at
http://sourceforge.net/projects/scribeserver/.
Later,
Jeff
On Fri, Oct 24, 2008 at 10:14 AM, Edward Capriolo <[EMAIL PROTECTED]> wrote:
> I came up with my line of thinking after reading this
I came up with my line of thinking after reading this article:
http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
As a guy who was intrigued by the Java coffee cup in '95 and who now
lives as a data center/NOC jock/Unix guy: let's say I look at a log
manageme
Hi -
I'm a complete newbie to Hadoop, and am wondering whether it's appropriate
for turning a bunch of older machines that have no other use into a storage
cluster on an otherwise Windows network, so that my Windows clients see
their combined disk space as a single large share.
If so, wi
What maximums did you configure for mappers and reducers?
It seems your TaskRunner fails to get a response.
You may need to try increasing "mapred.job.tracker.handler.count".
2008/10/22, Zhou, Yunqing <[EMAIL PROTECTED]>:
> Recently the tasks on our cluster random failed (both map tasks and reduce
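For reference, that handler-count property lives in hadoop-site.xml; a
sketch with an illustrative value (the default is 10):

    <property>
      <name>mapred.job.tracker.handler.count</name>
      <!-- default is 10; raise it for larger clusters -->
      <value>20</value>
    </property>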
Another way to do this is to make Thrift's Java compiler generate REST
bindings the way its PHP compiler does. There are also libhdfs and
http://wiki.apache.org/hadoop/MountableHDFS
On 10/23/08 2:54 PM, "Edward Capriolo" <[EMAIL PROTECTED]> wrote:
I downloaded Thrift and ran the example appli
Thanks to such a helpful community, I have had several offers to
review my article and won't need any more volunteers.
I'll post a link to the article when it's published next week.
On Thu, Oct 23, 2008 at 5:31 PM, Tom Wheeler <[EMAIL PROTECTED]> wrote:
> Each month the developers at my company w
Hi everyone,
I have a problem with Hadoop startup.
I failed to start the namenode, and I got the following exception in the
namenode log file:
2008-10-23 21:54:51,223 INFO org.mortbay.http.SocketListener: Started
SocketListener on 0.0.0.0:50070
2008-10-23 21:54:51,224 INFO org.mortbay.util.Cont
What seems to be emerging here is a pattern for another special node
associated with a Hadoop cluster.
The need is to have a machine which can:
* handle setup and shutdown of a Hadoop cluster on remote server resources
* manage loading and retrieving data via a storage grid
* interact and
woody zhou wrote:
Hi everyone,
I have a problem with Hadoop startup.
I failed to start the namenode, and I got the following exception in the
namenode log file:
2008-10-23 21:54:51,232 ERROR org.apache.hadoop.dfs.NameNode:
java.lang.OutOfMemoryError: unable to create new native thread
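"unable to create new native thread" usually means the OS refused to spawn
another thread, not that the Java heap is full, so the usual remedies are
raising the user's process limit (ulimit) or shrinking the per-thread stack
so more threads fit in the address space. A hadoop-env.sh sketch, with an
illustrative stack size:

    # hadoop-env.sh -- sketch; the -Xss value is illustrative.
    # A smaller per-thread stack lets the JVM create more native threads
    # within the same address space.
    export HADOOP_NAMENODE_OPTS="-Xss256k $HADOOP_NAMENODE_OPTS"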