Re: too many open files? Isn't 4K enough???

2008-11-21 Thread Yuri Pradkin
We created a JIRA for this as well as provided a patch; please see http://issues.apache.org/jira/browse/HADOOP-4614. I hope it'll make it into svn soon (it's been kind of slow lately). Are you able to create a reproducible setup for this? I haven't been able to. Yes, we did see consistent

Re: too many open files? Isn't 4K enough???

2008-11-05 Thread Yuri Pradkin
On Wednesday 05 November 2008 15:27:34 Karl Anderson wrote: I am running into a similar issue. It seems to be affected by the number of simultaneous tasks. For me, while I generally allow up to 4 mappers per node, in this particular instance I had only one mapper reading from a single gzipped

too many open files? Isn't 4K enough???

2008-11-04 Thread Yuri Pradkin
Hi, I'm running the current snapshot (-r709609), doing a simple word count using python over streaming. I have a relatively moderate setup of 17 nodes. I'm getting this exception: java.io.FileNotFoundException:
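
For context, a minimal sketch of this kind of python-over-streaming word count (illustrative only; the actual mapper and reducer scripts from this job are not shown in the thread):

    #!/usr/bin/env python
    # mapper.py -- emit "word<TAB>1" for every whitespace-separated token
    import sys

    for line in sys.stdin:
        for word in line.split():
            print("%s\t%d" % (word, 1))

    #!/usr/bin/env python
    # reducer.py -- sum counts per word; streaming delivers the mapper output
    # sorted by key, so all lines for one word arrive contiguously
    import sys

    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t", 1)
        if word != current:
            if current is not None:
                print("%s\t%d" % (current, total))
            current, total = word, 0
        total += int(count)
    if current is not None:
        print("%s\t%d" % (current, total))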

Re: extracting input to a task from a (streaming) job?

2008-09-26 Thread Yuri Pradkin
August 2008 10:09:48 Yuri Pradkin wrote: On Thursday 07 August 2008 16:43:10 John Heidemann wrote: On Thu, 07 Aug 2008 19:42:05 +0200, Leon Mergen wrote: Hello John, On Thu, Aug 7, 2008 at 6:30 PM, John Heidemann [EMAIL PROTECTED] wrote: I have a large Hadoop streaming job that generally

IsolationRunner [was Re: extracting input to a task from a (streaming) job?]

2008-08-27 Thread Yuri Pradkin
Yuri Pradkin wrote: I believe you should set keep.failed.tasks.files to true -- this way, given a task id, you can see what input files it has in ~/taskTracker/${taskid}/work (source: http://hadoop.apache.org/core/docs/r0.17.0/mapred_tutorial.html#IsolationRunner ) I forgot to add: I set
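
For illustration, a small sketch of the inspection step described above; the <base>/<taskid>/work layout is taken from this message rather than from any particular release, and list_task_files.py is a made-up name:

    #!/usr/bin/env python
    # list_task_files.py -- given the taskTracker base directory and a task id,
    # list the files kept for that task when keep.failed.tasks.files=true
    # (layout assumed to be <base>/<taskid>/work, as described above).
    import os, sys

    if len(sys.argv) != 3:
        sys.exit("usage: list_task_files.py <taskTracker-dir> <task-id>")

    base, taskid = sys.argv[1], sys.argv[2]
    workdir = os.path.join(base, taskid, "work")
    for name in sorted(os.listdir(workdir)):
        print(os.path.join(workdir, name))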

Re: [Streaming] How to pass arguments to a map/reduce script

2008-08-21 Thread Yuri Pradkin
On Thursday 21 August 2008 00:14:56 Gopal Gandhi wrote: I am using Hadoop streaming and I need to pass arguments to my map/reduce script. Because a map/reduce script is triggered by hadoop, like hadoop -file MAPPER -mapper $MAPPER -file REDUCER -reducer $REDUCER ... How can I pass
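
For reference, a sketch of one common approach (not necessarily the answer given later in this thread): either quote the command so extra arguments survive, e.g. -mapper "mapper.py somearg", or pass values through the environment with streaming's -cmdenv NAME=VALUE, and read them in the script. MY_PARAM and somearg are made-up names for this sketch:

    #!/usr/bin/env python
    # mapper.py -- pick up a parameter from the command line (quoted -mapper)
    # or from an environment variable set via -cmdenv, falling back to a default
    import os, sys

    param = sys.argv[1] if len(sys.argv) > 1 else os.environ.get("MY_PARAM", "default")

    for line in sys.stdin:
        # use param however the job needs; here it is simply echoed as the key
        print("%s\t%s" % (param, line.rstrip("\n")))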

Re: extracting input to a task from a (streaming) job?

2008-08-08 Thread Yuri Pradkin
On Thursday 07 August 2008 16:43:10 John Heidemann wrote: On Thu, 07 Aug 2008 19:42:05 +0200, Leon Mergen wrote: Hello John, On Thu, Aug 7, 2008 at 6:30 PM, John Heidemann [EMAIL PROTECTED] wrote: I have a large Hadoop streaming job that generally works fine, but a few (2-4) of the ~3000

Re: secondary namenode web interface

2008-04-08 Thread Yuri Pradkin
interface. The null pointer message in the secondary Namenode log is a harmless bug but should be fixed. It would be nice if you can open a JIRA for it. Thanks, Dhruba -Original Message- From: Yuri Pradkin [mailto:[EMAIL PROTECTED] Sent: Friday, April 04, 2008 2:45 PM To: core-user

Re: secondary namenode web interface

2008-04-08 Thread Yuri Pradkin
On Tuesday 08 April 2008 11:54:35 am Konstantin Shvachko wrote: If you have anything in mind that can be displayed on the UI, please let us know. You can also find a jira for the issue; it would be good if this discussion were reflected in it. Well, I guess we could have an interface to browse the

Re: one key per output part file

2008-04-03 Thread Yuri Pradkin
Here is how we (attempt to) do it: Reducer (in streaming) writes one file for each different key it receives as input. Here's some example code in perl:

    my $envdir = $ENV{'mapred_output_dir'};
    my $fs = ($envdir =~ s/^file://);
    if ($fs) {  # output goes onto NFS
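
For comparison, the same idea sketched in Python, under the same assumptions the perl fragment makes (the reducer finds mapred_output_dir in its environment, a leading file: prefix means the output directory is on a locally mounted path such as NFS, and input arrives as sorted key<TAB>value lines; the HDFS case is not handled):

    #!/usr/bin/env python
    # reducer.py -- write one output file per key, named after the key
    # (keys are assumed to be filename-safe).
    import os, sys

    outdir = os.environ.get("mapred_output_dir", "")
    if not outdir.startswith("file:"):
        sys.exit("this sketch only handles file: output directories")
    outdir = outdir[len("file:"):]

    current, out = None, None
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current:               # keys arrive sorted, so switch files here
            if out:
                out.close()
            out = open(os.path.join(outdir, key), "w")
            current = key
        out.write(value + "\n")
    if out:
        out.close()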

secondary namenode web interface

2008-04-02 Thread Yuri Pradkin
Hi, I'm running Hadoop (latest snapshot) on several machines, and in our setup the namenode and secondarynamenode are on different systems. I see from the logs that the secondary namenode regularly checkpoints the fs from the primary namenode. But when I go to the secondary namenode HTTP

key/value after reduce

2008-02-12 Thread Yuri Pradkin
Hi, I'm relatively new to Hadoop and I have what I hope is a simple question: I don't understand why the key/value assumption is preserved AFTER the reduce operation; in other words, why is the output of a reducer expected as key,value pairs instead of arbitrary, possibly binary bytes? Why can't

Re: key/value after reduce

2008-02-12 Thread Yuri Pradkin
. Miles On 12/02/2008, Yuri Pradkin [EMAIL PROTECTED] wrote: Hi, I'm relatively new to Hadoop and I have what I hope is a simple question: I don't understand why the key/value assumption is preserved AFTER the reduce operation, in other words why the output of a reducer is expected