I'm not sure if this is useful info, but I used both the Sun and the IBM JDK
under Linux to run version 0.16.iForget of Hadoop, without any problems. I
did some brief performance testing, didn't see any significant difference,
then we switched over to the Sun JDK exclusively as per the
We'd been using text input and output exclusively, but eventually realized
some efficiency improvements by using slightly more sophisticated classes
specific to our application.
Our main use of Hadoop is processing activity logs from a fleet of servers.
We get about 6GB of compressed data per
We were getting this exact same problem in a really simple MR job, on input
produced from a known-working MR job.
It seemed to happen intermittently, and we couldn't figure out what was up.
In the end we solved the problem by increasing the number of maps (80 to
200, this is a 6 node, 12 core
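For reference, one way to raise the map count is in hadoop-site.xml (or per job via JobConf.setNumMapTasks()). This assumes 0.16-era property names, and note the value is only a hint to the framework; actual split counts still depend on input and block sizes:

```xml
<!-- hadoop-site.xml: suggested number of map tasks; a hint, not a hard limit -->
<property>
  <name>mapred.map.tasks</name>
  <value>200</value>
</property>
```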
Sorry if this is a dumb question, but in all my MR classes, I've only ever
called collect once, and now I find myself wanting to call collect multiple
times. Looking at the API it seems like there shouldn't be a problem with
that, but I just wanted to make sure. (...and to seed Google with the
The MR jobs I'm performing are not CPU intensive, so I've always assumed
that they're more IO bound. Maybe that's an exceptional situation, but I'm
not really sure.
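For what it's worth, the OutputCollector API does permit any number of collect() calls per map invocation, including zero. A minimal streaming-style sketch in Python (an illustrative analogue, not the Java API this thread is about) of a map function emitting multiple pairs per input record:

```python
# Illustrative analogue of a map function that "collects" more than once
# per input record: each appended pair mirrors one collect() call.
def map_words(line):
    """Emit one (word, 1) pair per token in the line."""
    pairs = []
    for word in line.split():
        pairs.append((word, 1))  # one emit per word, any number per record
    return pairs

print(map_words("the quick the"))
```

Emitting zero pairs for an uninteresting record is equally legal, which is what makes map-side filtering work.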
A good motherboard with a local IO channel per disk, feeding individual
cores, with memory partitioned up between them... and I've
I have a machine that stores a lot of the data I need to put into my
cluster's HDFS. It's on the same private network as the nodes, but it isn't
a node itself.
What is the easiest way to have it be able to directly inject the data files
into HDFS, without it acting as a datanode for replicas?
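One common answer in 0.16-era setups (hedged; details may vary by version): install the Hadoop distribution on that machine as a pure client, point its conf at the namenode, and use bin/hadoop dfs -put. Because the machine runs no DataNode daemon and isn't listed in conf/slaves, it will never hold replicas. The hostname below is a placeholder:

```xml
<!-- hadoop-site.xml on the client machine; "namenode-host" is a placeholder -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode-host:9000</value>
  </property>
</configuration>
```

With that in place, something like `bin/hadoop dfs -put /local/logs logs` writes straight into HDFS over the network.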
I
This has been my experience as well. This should be mentioned in the
Getting Started pages until resolved.
-colin
On Thu, Apr 10, 2008 at 10:54 AM, Michaela Buergle
[EMAIL PROTECTED] wrote:
Hi all,
on http://wiki.apache.org/hadoop/GettingStartedWithHadoop - it says:
Do not format a
running a job on my 5 node cluster, i get these intermittent exceptions in
my logs:
java.io.IOException: incorrect data check
at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
at
causing my jobs to fail,
rather than skipping the problematic input files. i've also looked through
the conf file and don't see anything similar about skipping bad files
without killing the job.
-colin
On Tue, Apr 8, 2008 at 11:53 AM, Colin Freas [EMAIL PROTECTED] wrote:
running a job on my 5
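I'm not aware of a built-in skip-bad-files switch in that conf; the usual workaround is to wrap decompression defensively. A self-contained Python sketch (illustrative only, not Hadoop code) of the skip-instead-of-fail idea, using zlib, whose "incorrect data check" error is the same one surfacing in the Java trace:

```python
import zlib

def decompress_or_skip(blobs):
    """Decompress each blob, skipping any that fail zlib's integrity
    check instead of letting one corrupt file kill the whole job."""
    results = []
    for blob in blobs:
        try:
            results.append(zlib.decompress(blob))
        except zlib.error:  # e.g. "Error -3 ...: incorrect data check"
            continue  # skip the corrupt input and keep going
    return results

good = zlib.compress(b"clean record")
bad = good[:-1] + bytes([good[-1] ^ 0xFF])  # flip a checksum byte
print(decompress_or_skip([good, bad]))
```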
Is the performance of Hadoop impacted by the underlying file system on the
nodes at all?
All my nodes are ext3. I'm wondering if using XFS, Reiser, or ZFS might
improve performance.
Does anyone have any offhand knowledge about this?
-Colin
I've set up a job to run on my small 4 (sometimes 5) node cluster on dual
processor server boxes with 2-8GB of memory.
My job processes 24 100-300MB files that are a day's worth of logs; total
data is about 6GB.
I've modified the word count example to do what I need, and it works fine on
small
I believe that this is exactly what happened.
I'm not sure exactly what happened, but the networking stack on the master
node was all screwed up somehow. All the machines serve double duty as
development boxes, and they're on two different networks. The master node
could contact the cluster
are there any issues with having the hadoop-site.xml in .../conf placed on
an nfs mounted dir that all my nodes have access to?
-colin
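In my experience the conf directory itself is fine on NFS, since the daemons only read it at startup, but make sure every path configured inside it (hadoop.tmp.dir, dfs.data.dir, log dirs) points at node-local disk, or the nodes will trample each other's state on the shared mount. A hedged example with a placeholder path:

```xml
<!-- keep per-node state off the NFS mount; /local/hadoop-tmp is a placeholder -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/local/hadoop-tmp</value>
</property>
```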
method, for each key, you do the
necessary processing on the collection based on the value object types.
The main point here is to keep track of the differences from the
beginning to the end, and process them accordingly.
Nathan
-----Original Message-----
From: Colin Freas [mailto:[EMAIL PROTECTED]
i'm working to set up a cluster across several machines where users' home
dirs are on an nfs mount.
i set up key authentication for the hadoop user, install all the software on
one node, get everything running, and move on to another node.
once there, however, my sshd complains because the host
ah, yes. that worked. thanks!
On Fri, Mar 21, 2008 at 12:48 PM, Natarajan, Senthil [EMAIL PROTECTED]
wrote:
I guess the following files might have a localhost entry; change it to the hostname:
HADOOP_INSTALL/conf/masters
HADOOP_INSTALL/conf/slaves
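For concreteness, a sketch of the two files with hypothetical hostnames; conf/slaves lists one datanode/tasktracker host per line, and adding the master's own hostname to it is what makes the master also run as a DataNode:

conf/masters:

```
master.example.com
```

conf/slaves:

```
master.example.com
slave1.example.com
slave2.example.com
```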
-----Original Message-----
From: Colin Freas [mailto:[EMAIL PROTECTED]
Sent: Friday, March 21, 2008 10:40 AM
To: core-user@hadoop.apache.org
Subject: Master as DataNode
setting up a simple hadoop cluster with two machines, i've gotten to the
point where the two machines can see each other, things seem fine, but
i'm
[EMAIL PROTECTED]
wrote:
Check your logs. That should work out of the box with the
configuration
steps you described.
Jeff
-----Original Message-----
From: Colin Freas [mailto:[EMAIL PROTECTED]
Sent: Friday, March 21, 2008 10:40 AM
To: core-user@hadoop.apache.org