Re: Is Hadoop compatible with IBM JDK 1.5 64-bit for AIX 5?

2008-07-18 Thread Colin Freas
I'm not sure if this is useful info, but I used both the Sun and the IBM JDK under Linux to run version 0.16.x of Hadoop (I forget the exact release) without any problems. I did some brief performance testing and didn't see any significant difference; then we switched over to the Sun JDK exclusively as per the

Re: Input/Output Formatters and FileTypes

2008-06-20 Thread Colin Freas
We'd been using text input and output exclusively, but eventually realized some efficiency improvements by using slightly more sophisticated classes specific to our application. Our main use of Hadoop is processing activity logs from a fleet of servers. We get about 6GB of compressed data per

Re: Stack Overflow When Running Job

2008-06-09 Thread Colin Freas
We were getting this exact same problem in a really simple MR job, on input produced from a known-working MR job. It seemed to happen intermittently, and we couldn't figure out what was up. In the end we solved the problem by increasing the number of maps (80 to 200; this is a 6-node, 12-core

Simple question: call collect multiple times?

2008-06-09 Thread Colin Freas
Sorry if this is a dumb question, but in all my MR classes, I've only ever called collect once, and now I find myself wanting to call collect multiple times. Looking at the API it seems like there shouldn't be a problem with that, but I just wanted to make sure. (...and to seed Google with the
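In the old `org.apache.hadoop.mapred` API, calling `collect` several times from a single `map()` call is ordinary usage; the stock WordCount example emits one pair per token inside a loop. The sketch below shows that pattern without Hadoop on the classpath. `Collector` is a hypothetical local stand-in shaped like `OutputCollector<K, V>`, and `MultiCollectSketch` is an illustrative name, not a Hadoop class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in with the same shape as Hadoop's old-API
// OutputCollector<K, V>, so this sketch compiles without Hadoop jars.
interface Collector<K, V> {
    void collect(K key, V value);
}

public class MultiCollectSketch {
    // A map() that emits one (word, 1) pair per token: collect is simply
    // called once per output record, as many times as the input needs.
    static void map(String line, Collector<String, Integer> out) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {          // a blank line splits to [""]
                out.collect(word, 1);
            }
        }
    }

    public static void main(String[] args) {
        List<String> emitted = new ArrayList<>();
        map("the quick the", (k, v) -> emitted.add(k + "=" + v));
        System.out.println(emitted); // [the=1, quick=1, the=1]
    }
}
```

Each `collect` call produces an independent output record, so emitting zero, one, or many pairs per input record is all the same to the framework.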

Re: Hadoop Distributed Virtualisation

2008-06-06 Thread Colin Freas
The MR jobs I'm performing are not CPU intensive, so I've always assumed that they're more IO bound. Maybe that's an exceptional situation, but I'm not really sure. A good motherboard with a local IO channel per disk, feeding individual cores, with memory partitioned up between them... and I've

hdfs injection node?

2008-04-16 Thread Colin Freas
I have a machine that stores a lot of the data I need to put into my cluster's HDFS. It's on the same private network as the nodes, but it isn't a node itself. What is the easiest way to have it be able to directly inject the data files into HDFS, without it acting as a datanode for replicas? I

Re: Formatting the file system: Misleading hint in Wiki?

2008-04-10 Thread Colin Freas
This has been my experience as well. This should be mentioned in the Getting Started pages until resolved. -colin On Thu, Apr 10, 2008 at 10:54 AM, Michaela Buergle [EMAIL PROTECTED] wrote: Hi all, on http://wiki.apache.org/hadoop/GettingStartedWithHadoop - it says: Do not format a

incorrect data check

2008-04-08 Thread Colin Freas
Running a job on my 5-node cluster, I get these intermittent exceptions in my logs: java.io.IOException: incorrect data check at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method) at

Re: incorrect data check

2008-04-08 Thread Colin Freas
causing my jobs to fail, rather than skipping the problematic input files. I've also looked through the conf file and don't see any similar setting for skipping bad files without killing the job. -colin On Tue, Apr 8, 2008 at 11:53 AM, Colin Freas [EMAIL PROTECTED] wrote: running a job on my 5
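The thread doesn't say whether this Hadoop version can skip corrupt compressed inputs. As a hedged illustration of the "skip and record" behavior being asked for, here is a plain `java.util.zip` sketch, not Hadoop's codec API: each gzip input is fully decompressed, and any `IOException` (bad header, corrupt deflate data, checksum mismatch, truncation) marks that input as skipped instead of aborting the whole run. `SkipBadGzipSketch`, `gunzip`, `gzip`, and `readAll` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class SkipBadGzipSketch {
    // Fully decompress one input; any decompression error surfaces as an
    // IOException thrown to the caller.
    static byte[] gunzip(byte[] compressed) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    // Decompress every input, recording the indexes of those that fail
    // instead of letting one bad file kill the whole run.
    static List<String> readAll(List<byte[]> inputs, List<Integer> skipped) {
        List<String> ok = new ArrayList<>();
        for (int i = 0; i < inputs.size(); i++) {
            try {
                ok.add(new String(gunzip(inputs.get(i)), StandardCharsets.UTF_8));
            } catch (IOException e) {
                skipped.add(i);
                System.err.println("skipping input " + i + ": " + e);
            }
        }
        return ok;
    }

    // Helper to build test inputs.
    static byte[] gzip(String s) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(s.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] good = gzip("log line\n");
        byte[] truncated = Arrays.copyOf(good, good.length / 2);
        List<Integer> skipped = new ArrayList<>();
        List<String> ok = readAll(Arrays.asList(good, truncated), skipped);
        System.out.println(ok.size() + " read, skipped " + skipped); // 1 read, skipped [1]
    }
}
```

Decompressing each input completely before using it is what makes the skip clean here; in a streaming job a corrupt file may only fail partway through, after some of its records have already been emitted.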

Performance impact of underlying file system?

2008-04-01 Thread Colin Freas
Is the performance of Hadoop impacted by the underlying file system on the nodes at all? All my nodes are ext3. I'm wondering if using XFS, ReiserFS, or ZFS might improve performance. Does anyone have any offhand knowledge about this? -Colin

reduce task hanging or just slow?

2008-03-31 Thread Colin Freas
I've set up a job to run on my small 4-node (sometimes 5-node) cluster of dual-processor server boxes with 2-8GB of memory. My job processes 24 files of 100-300MB each, a day's worth of logs; total data is about 6GB. I've modified the word count example to do what I need, and it works fine on small

Re: reduce task hanging or just slow?

2008-03-31 Thread Colin Freas
I believe that this is exactly what happened. I'm not sure exactly what happened, but the networking stack on the master node was all screwed up somehow. All the machines serve double duty as development boxes, and they're on two different networks. The master node could contact the cluster

nfs mount hadoop-site?

2008-03-27 Thread Colin Freas
Are there any issues with having the hadoop-site.xml in .../conf placed on an NFS-mounted dir that all my nodes have access to? -colin

Re: MapReduce with related data from disparate files

2008-03-25 Thread Colin Freas
method, for each key, you do necessary processing on the collection based on the value object types. The main point here is to keep track of the differences from the beginning to the end, and process them accordingly. Nathan -Original Message- From: Colin Freas [mailto:[EMAIL PROTECTED
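Nathan's suggestion amounts to a reduce-side join: records from the different files are tagged with their origin in the map phase, all values for a key arrive at one reduce call, and the reducer branches on the tag. A minimal sketch in plain Java, assuming two hypothetical sources named "users" and "events" and inner-join output; `TaggedJoinSketch` and `Tagged` are illustrative names, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TaggedJoinSketch {
    // A value tagged with the (hypothetical) source it came from, so the
    // reducer can tell the record types apart.
    static final class Tagged {
        final String source;
        final String payload;

        Tagged(String source, String payload) {
            this.source = source;
            this.payload = payload;
        }
    }

    // Reduce step for one key: split the grouped values by tag, then pair
    // every "users" record with every "events" record (an inner join).
    static List<String> reduce(String key, Iterable<Tagged> values) {
        List<String> users = new ArrayList<>();
        List<String> events = new ArrayList<>();
        for (Tagged v : values) {
            if (v.source.equals("users")) {
                users.add(v.payload);
            } else if (v.source.equals("events")) {
                events.add(v.payload);
            }
        }
        List<String> joined = new ArrayList<>();
        for (String u : users) {
            for (String e : events) {
                joined.add(key + "\t" + u + "\t" + e);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Tagged> grouped = Arrays.asList(
                new Tagged("events", "login"),
                new Tagged("users", "alice"),
                new Tagged("events", "logout"));
        // prints "42<TAB>alice<TAB>login", then "42<TAB>alice<TAB>logout"
        for (String row : reduce("42", grouped)) {
            System.out.println(row);
        }
    }
}
```

Buffering each side per key in memory keeps the sketch simple; it assumes the values for any single key fit in the reducer's heap.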

NFS mounted home, host RSA keys, localhost, strict sshds and bad mojo.

2008-03-21 Thread Colin Freas
I'm working to set up a cluster across several machines where users' home dirs are on an NFS mount. I set up key authentication for the hadoop user, install all the software on one node, get everything running, and move on to another node. Once there, however, my sshd complains because the host

Re: NFS mounted home, host RSA keys, localhost, strict sshds and bad mojo.

2008-03-21 Thread Colin Freas
Ah, yes. That worked. Thanks! On Fri, Mar 21, 2008 at 12:48 PM, Natarajan, Senthil [EMAIL PROTECTED] wrote: I guess the following files might have a localhost entry; change it to the hostname: HADOOP_INSTALL/conf/masters HADOOP_INSTALL/conf/slaves -Original Message- From: Colin Freas

Re: Master as DataNode

2008-03-21 Thread Colin Freas
: Colin Freas [mailto:[EMAIL PROTECTED] Sent: Friday, March 21, 2008 10:40 AM To: core-user@hadoop.apache.org Subject: Master as DataNode Setting up a simple hadoop cluster with two machines, I've gotten to the point where the two machines can see each other, things seem fine, but I'm

Re: Master as DataNode

2008-03-21 Thread Colin Freas
[EMAIL PROTECTED] wrote: Check your logs. That should work out of the box with the configuration steps you described. Jeff -Original Message- From: Colin Freas [mailto:[EMAIL PROTECTED] Sent: Friday, March 21, 2008 10:40 AM To: core-user@hadoop.apache.org