Hi,
I just wanted to share a test we conducted in our small cluster of 3
datanodes and one namenode. Basically we have lots of data to process, and
we run a parsing script outside Hadoop that creates the key/value pairs.
This output, which is plain text files, is then imported into Hadoop.
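A minimal sketch, for context, of how such externally parsed key/value text files could be brought into HDFS and read back by a job. The paths, the tab-separated key/value layout, and the class name are assumptions for illustration, not details from the original post.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.KeyValueTextInputFormat;

    public class ImportParsedLogs {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Copy the parser's plain-text output (key<TAB>value per line) into HDFS.
            fs.copyFromLocalFile(new Path("/data/parsed"), new Path("/user/hadoop/parsed"));

            // A job can then read each line back as a (key, value) pair.
            JobConf job = new JobConf(conf, ImportParsedLogs.class);
            job.setInputFormat(KeyValueTextInputFormat.class);
            FileInputFormat.setInputPaths(job, new Path("/user/hadoop/parsed"));
            // ... set mapper, reducer and output path, then JobClient.runJob(job) ...
        }
    }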
Thanks for the tip. I'll look into it - it doesn't look too hard to do in my
case. -Marshall
Owen O'Malley wrote:
If you use custom key types, you really should be defining a
RawComparator. It will perform much much better.
-- Owen
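A minimal sketch of what Owen is suggesting. MyKey is a hypothetical key class, assumed to be a WritableComparable whose serialized form starts with a 4-byte big-endian int; the comparator then orders records directly on the serialized bytes, without deserializing a key object for every comparison.

    import org.apache.hadoop.io.WritableComparator;

    public class MyKeyRawComparator extends WritableComparator {

        protected MyKeyRawComparator() {
            super(MyKey.class, true); // MyKey is hypothetical, not from this thread
        }

        // Compare the leading int field straight from the serialized bytes.
        @Override
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            int k1 = readInt(b1, s1);
            int k2 = readInt(b2, s2);
            return (k1 < k2) ? -1 : ((k1 == k2) ? 0 : 1);
        }
    }

It would typically be registered once, e.g. with
WritableComparator.define(MyKey.class, new MyKeyRawComparator()), so the sort
and merge phases use the byte-level comparison.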
If I've got a sequence of streaming jobs, each of which depends on the
output of the previous one, is there a good way to launch that
sequence? Meaning, I want step B to start only once step A has finished.
From within Java JobClient code, I can do submitJob/runJob, but is
there any
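A small sketch of the blocking approach the question alludes to: JobClient.runJob submits a job and waits for it to complete (throwing an IOException if it fails), so calling it once per step, in order, gives the A-then-B behaviour. The configureStepA/configureStepB helpers are placeholders for however the two JobConfs are actually built.

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RunningJob;

    public class RunInSequence {
        public static void main(String[] args) throws Exception {
            JobConf stepA = configureStepA(); // placeholder: mapper, reducer, paths
            JobConf stepB = configureStepB(); // placeholder: reads step A's output

            // Blocks until step A is finished; step B is only submitted afterwards.
            RunningJob a = JobClient.runJob(stepA);
            System.out.println("Step A successful: " + a.isSuccessful());

            JobClient.runJob(stepB);
        }

        private static JobConf configureStepA() { return new JobConf(); }
        private static JobConf configureStepB() { return new JobConf(); }
    }

For streaming jobs launched from the shell, running the streaming commands one after another in a script has the same effect, since each command returns only once its job has finished.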
On Fri, May 1, 2009 at 4:22 AM, Usman Waheed usm...@opera.com wrote:
Hi Todd,
Thank you for your input. Our data is like any Apache log file(s): basic
logging info which we are parsing.
We have a lot of data, which is why we are using Hadoop :).
I will look into running TTs (TaskTrackers) on the HDFS clients just for job
processing and not to store any data locally. We can
Hello,
I am using Hadoop on a small storage cluster (x86_64, CentOS 5.3,
Hadoop-0.19.1). HDFS is mounted using FUSE and everything has seemed
to work just fine so far. However, I noticed that I cannot:
1) use svn to check out files on the
Thanks Aaron. That worked! However, when I run everything in local mode,
everything executes much faster than it does on a single node. Is there any
reason for that?
-Asim
On Thu, Apr 30, 2009 at 9:23 AM, Aaron Kimball aa...@cloudera.com wrote:
The first thing I would do is to run the
HDFS does not allow you to overwrite bytes of a file that have already been
written. The only operations it supports are read (an existing file), write
(a new file), and (in newer versions, not always enabled) append (to an
existing file).
-- Philip
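To make the three supported operations concrete, a minimal sketch against the FileSystem API; the path is illustrative, and the append call only works on clusters where append is enabled (dfs.support.append in versions of that era).

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WriteOnceDemo {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path p = new Path("/tmp/example.txt"); // illustrative path

            // write (a new file)
            FSDataOutputStream out = fs.create(p);
            out.writeBytes("first record\n");
            out.close();

            // read (an existing file)
            FSDataInputStream in = fs.open(p);
            byte[] buf = new byte[64];
            int n = in.read(buf);
            System.out.println(new String(buf, 0, n));
            in.close();

            // append (to an existing file) - only if the cluster enables it
            FSDataOutputStream app = fs.append(p);
            app.writeBytes("second record\n");
            app.close();

            // There is no call that seeks back and overwrites bytes already
            // written; changing existing content means re-creating the file.
        }
    }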
On Fri, May 1, 2009 at 5:56 PM, Robert Engel
In Hadoop 0.19.1 (and 0.19.0), libhdfs (which is used by the FUSE package for
HDFS access) explicitly denies open requests that pass O_RDWR.
If you have binary applications that pass the flag but would work correctly
given the limitations of HDFS, you may alter the code in
src/c++/libhdfs/hdfs.c to
Less work: it skips setting up the input splits, distributing the job jar
files, scheduling the map tasks on the task trackers, collecting the task
status results, then starting all the reduce tasks, collecting all the map
outputs, sorting them, feeding them to the reduce tasks, and then writing them
to
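That per-task overhead is also why the same job can be flipped into local mode for quick testing. A small sketch, assuming the classic (pre-YARN) configuration keys of that era: setting mapred.job.tracker to "local" runs the whole job in one JVM via the LocalJobRunner.

    import org.apache.hadoop.mapred.JobConf;

    public class LocalModeConf {
        public static JobConf localConf() {
            JobConf conf = new JobConf();
            // LocalJobRunner: one JVM, no TaskTrackers, no jar distribution,
            // no cross-machine shuffle.
            conf.set("mapred.job.tracker", "local");
            // Optional: read and write the local filesystem instead of HDFS.
            conf.set("fs.default.name", "file:///");
            return conf;
        }
    }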