Multiple HDFS clients

2009-05-01 Thread Usman Waheed
Hi, I just wanted to share a test we conducted on our small cluster of 3 datanodes and one namenode. Basically, we have lots of data to process, and we run a parsing script outside Hadoop that creates the key/value pairs. This output, which consists of plain text files, is then imported into Hadoop
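
A minimal sketch of that import step (paths and class name are hypothetical, not from the thread; this is the programmatic equivalent of hadoop fs -put):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Hypothetical import step: copy the parser's plain-text output
    // from an HDFS client's local disk into the cluster.
    public class ImportToHdfs {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            fs.copyFromLocalFile(new Path("/data/parsed/part-001.txt"),
                                 new Path("/user/hadoop/parsed/"));
        }
    }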

Re: Implementing compareTo in user-written keys where one extends the other is error prone

2009-05-01 Thread Marshall Schor
Thanks for the tip. I'll look into it - it doesn't look too hard to do in my case. -Marshall Owen O'Malley wrote: If you use custom key types, you really should be defining a RawComparator. It will perform much, much better. -- Owen
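
A minimal sketch of what Owen is suggesting (MyKey and its single-long serialized layout are hypothetical, not from the thread). WritableComparator implements RawComparator and compares the serialized bytes directly, so the sort never has to deserialize the keys:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;

    // Hypothetical custom key holding a single long, serialized as 8 bytes.
    public class MyKey implements WritableComparable<MyKey> {
        private long value;

        public void write(DataOutput out) throws IOException { out.writeLong(value); }
        public void readFields(DataInput in) throws IOException { value = in.readLong(); }
        public int compareTo(MyKey o) {
            return value < o.value ? -1 : (value == o.value ? 0 : 1);
        }

        // Compares the raw bytes, avoiding key deserialization during the sort.
        public static class Comparator extends WritableComparator {
            public Comparator() { super(MyKey.class); }
            @Override
            public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
                long v1 = readLong(b1, s1);
                long v2 = readLong(b2, s2);
                return v1 < v2 ? -1 : (v1 == v2 ? 0 : 1);
            }
        }

        static { WritableComparator.define(MyKey.class, new Comparator()); }
    }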

Sequence of Streaming Jobs

2009-05-01 Thread Dan Milstein
If I've got a sequence of streaming jobs, each of which depends on the output of the previous one, is there a good way to launch that sequence? Meaning, I want step B to only start once step A has finished. From within Java JobClient code, I can do submitJob/runJob, but is there any
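
In the old (0.19-era) API, JobClient.runJob() blocks until the job completes, so a sequence can be driven by simply calling it once per step. A minimal sketch (paths and job names are placeholders):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class ChainJobs {
        public static void main(String[] args) throws Exception {
            JobConf stepA = new JobConf(ChainJobs.class);
            stepA.setJobName("step-a");
            FileInputFormat.setInputPaths(stepA, new Path("in"));
            FileOutputFormat.setOutputPath(stepA, new Path("tmp"));
            JobClient.runJob(stepA);  // blocks until step A finishes

            JobConf stepB = new JobConf(ChainJobs.class);
            stepB.setJobName("step-b");
            FileInputFormat.setInputPaths(stepB, new Path("tmp"));
            FileOutputFormat.setOutputPath(stepB, new Path("out"));
            JobClient.runJob(stepB);  // starts only after A has succeeded
        }
    }

The org.apache.hadoop.mapred.jobcontrol.JobControl class can express the same dependency declaratively, if the chain grows beyond a couple of steps.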

Re: Multiple HDFS clients

2009-05-01 Thread Todd Lipcon
On Fri, May 1, 2009 at 4:22 AM, Usman Waheed usm...@opera.com wrote: Hi, I just wanted to share a test we conducted on our small cluster of 3 datanodes and one namenode. Basically, we have lots of data to process, and we run a parsing script outside Hadoop that creates the key/value pairs.

Re: Multiple HDFS clients

2009-05-01 Thread Usman Waheed
Hi Todd, Thank you for your input. Our data is like any Apache log file(s): basic logging info, which we are parsing. We have a lot of data, which is why we are using Hadoop :). I will look into running TTs on the HDFS clients just for job processing, not to store any data locally. We can

cannot open an hdfs file in O_RDWR mode

2009-05-01 Thread Robert Engel
Hello, I am using Hadoop on a small storage cluster (x86_64, CentOS 5.3, Hadoop-0.19.1). The HDFS is mounted using FUSE and everything has seemed to work just fine so far. However, I noticed that I cannot: 1) use svn to check out files on the

Re: unable to see anything in stdout

2009-05-01 Thread Asim
Thanks Aaron. That worked! However, when I run everything in local mode, I see everything executing much faster locally than on a single-node cluster. Is there a reason for this? -Asim On Thu, Apr 30, 2009 at 9:23 AM, Aaron Kimball aa...@cloudera.com wrote: First thing I would do is to run the

Re: cannot open an hdfs file in O_RDWR mode

2009-05-01 Thread Philip Zeyliger
HDFS does not allow you to overwrite bytes of a file that have already been written. The only operations it supports are read (an existing file), write (a new file), and (in newer versions, not always enabled) append (to an existing file). -- Philip On Fri, May 1, 2009 at 5:56 PM, Robert Engel
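
A sketch of those three operations through the FileSystem API (the path is a placeholder; append requires dfs.support.append to be enabled and was not considered stable in 0.19.x):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsOps {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path p = new Path("/tmp/example.txt");

            FSDataOutputStream out = fs.create(p);  // write a new file
            out.writeBytes("hello\n");
            out.close();

            FSDataInputStream in = fs.open(p);      // read an existing file
            System.out.println(in.readLine());
            in.close();

            FSDataOutputStream app = fs.append(p);  // extend, never overwrite in place
            app.writeBytes("world\n");
            app.close();
        }
    }

There is no open-for-update: create() always writes a new file, and append() only extends an existing one, which is why O_RDWR cannot be honored.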

Re: cannot open an hdfs file in O_RDWR mode

2009-05-01 Thread jason hadoop
In Hadoop 0.19.1 (and 0.19.0), libhdfs (which is used by the FUSE package for HDFS access) explicitly denies open requests that pass O_RDWR. If you have binary applications that pass the flag but would work correctly given the limitations of HDFS, you may alter the code in src/c++/libhdfs/hdfs.c to

Re: unable to see anything in stdout

2009-05-01 Thread jason hadoop
Less work: the local runner skips setting up the input splits, distributing the job jar files, scheduling the map tasks on the task trackers, collecting the task status results, then starting all the reduce tasks, collecting all the results, sorting them, feeding them to the reduce tasks, and then writing them to
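
For comparison, switching a job to the local runner in 0.19.x is a matter of configuration (a sketch, not from the thread):

    import org.apache.hadoop.mapred.JobConf;

    public class LocalModeConf {
        public static JobConf localConf() {
            JobConf conf = new JobConf();
            // The local runner executes maps and reduces in a single JVM,
            // skipping split distribution, task scheduling, and the shuffle.
            conf.set("mapred.job.tracker", "local");
            // Optionally read/write the local filesystem instead of HDFS.
            conf.set("fs.default.name", "file:///");
            return conf;
        }
    }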