Accessing stderr with Hadoop Streaming

2009-06-23 Thread S D
Is there a way to access stderr when using Hadoop Streaming? I see how stdout is written to the log files but I'm more concerned about what happens when errors occur. Access to stderr would help debug when a run doesn't complete successfully but I haven't been able to figure out how to retrieve wha
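For reference, anything a streaming mapper or reducer writes to stderr is captured in that task attempt's stderr log on the tasktracker node and can be browsed from the JobTracker web UI (job -> task -> task logs). A minimal sketch, assuming the default log layout; the attempt id below is made up:

    # inside the mapper/reducer script: send diagnostics to stderr, not stdout
    echo "failed to parse record: $line" >&2

    # on the tasktracker that ran the attempt:
    cat $HADOOP_HOME/logs/userlogs/attempt_200906230000_0001_m_000000_0/stderr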

Re: Hadoop & Python

2009-05-20 Thread s d
Thanks. What would be the # of servers and file sizes for which the performance hit would be minor? I am concerned about implementing it all only to rewrite it later to scale economically. Thanks for all the information. On Tue, May 19, 2009 at 1:30 PM, Amr Awadallah wrote: > S d, >

Re: Hadoop & Python

2009-05-19 Thread s d
and small scale tests. On Tue, May 19, 2009 at 9:48 AM, Alex Loddengaard wrote: > Streaming is slightly slower than native Java jobs. Otherwise Python works > great in streaming. > > Alex > > On Tue, May 19, 2009 at 8:36 AM, s d wrote: > > > Hi, > > How robust

Hadoop & Python

2009-05-19 Thread s d
Hi, How robust is using Hadoop with Python over the streaming protocol? Any disadvantages (performance? flexibility?)? It just strikes me that Python is so much more convenient when it comes to deploying and crunching text files. Thanks,
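For anyone following along, a minimal sketch of a streaming job driven by Python scripts (the jar path may differ per install; script names and HDFS paths are placeholders):

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.19.0-streaming.jar \
      -input  /user/hadoop/input \
      -output /user/hadoop/output \
      -mapper  mapper.py \
      -reducer reducer.py \
      -file mapper.py -file reducer.py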

Duplicate Output Directories in S3

2009-03-22 Thread S D
I have a Hadoop Streaming program that crawls the web for data items, processes each retrieved item, and then stores the results on S3. For each processed item a directory on S3 is created to store the results produced by the processing. At the conclusion of a program run I've been getting a duplic

Re: Hadoop Streaming throw an exception with wget as the mapper

2009-03-13 Thread S D
I've used wget with Hadoop Streaming without any problems. Based on the error code you're getting, I suggest you make sure that you have the proper write permissions for the directory in which Hadoop does its processing (e.g., download, convert, ...) on each of the task tracker machines. The location wher

Re: Controlling maximum # of tasks per node on per-job basis?

2009-03-13 Thread S D
I ran into this problem as well and several people on this list provided a helpful response: once the tasktracker starts, the maximum number of tasks per node can not be changed. In my case, I've solved this challenge by stopping and starting mapred (stop-mapred.sh, start-mapred.sh) between jobs. T
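A sketch of that workaround, assuming a standard $HADOOP_HOME layout: the property is read by each tasktracker at startup, so it has to be set cluster-wide and MapReduce restarted.

    # in conf/hadoop-site.xml on every node, set mapred.tasktracker.map.tasks.maximum
    # to the desired per-node limit, then bounce MapReduce:
    $HADOOP_HOME/bin/stop-mapred.sh
    $HADOOP_HOME/bin/start-mapred.sh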

Re: Hadoop FS shell no longer working with S3 Native

2009-03-04 Thread S D
My fault on this one. I mistakenly thought the environment variables (AMAZON_ACCESS_KEY_ID and AMAZON_SECRET_ACCESS_KEY) would override values set in hadoop-site.xml; I now see that this is not the case for the Hadoop FS shell commands. John On Wed, Mar 4, 2009 at 5:18 PM, S D wrote: >

Hadoop FS shell no longer working with S3 Native

2009-03-04 Thread S D
I'm using Hadoop 0.19.0 with S3 Native. Up until a few days ago I was able to use the various shell functions successfully; e.g., hadoop dfs -ls . To ensure access to my Amazon S3 Native data store I set the following environment variables: AMAZON_ACCESS_KEY_ID and AMAZON_SECRET_A
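For reference, the FS shell picks up S3 Native credentials from the configuration rather than the AMAZON_* environment variables; a sketch of the relevant hadoop-site.xml entries (key values are placeholders):

    <property>
      <name>fs.s3n.awsAccessKeyId</name>
      <value>YOUR_ACCESS_KEY_ID</value>
    </property>
    <property>
      <name>fs.s3n.awsSecretAccessKey</name>
      <value>YOUR_SECRET_ACCESS_KEY</value>
    </property>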

Re: Can anyone verify Hadoop FS shell command return codes?

2009-02-25 Thread S D
> Instead, the output is again obtained with backticks. > > I don't know the way in which irb captures the return value: for analogy I > would say that backticks are used for capturing the output even in irb. > > Best > > Roldano > > > > On Mon, Feb 23, 2009 a

Can anyone verify Hadoop FS shell command return codes?

2009-02-23 Thread S D
I'm attempting to use the Hadoop FS shell (http://hadoop.apache.org/core/docs/current/hdfs_shell.html) within a Ruby script. My challenge is that I'm unable to get the return value of the commands I'm invoking. As an example, I try to run get as follows: hadoop fs -get /user/hadoop/testFile.t

Re: Overriding mapred.tasktracker.map.tasks.maximum with -jobconf

2009-02-18 Thread S D
ny changes you make will be ignored. > > > 2009/2/18 S D > > > Thanks for your response Rasit. You may have missed a portion of my post. > > > > > On a different note, when I attempt to pass params via -D I get a usage > > message; when I use > > &g

Re: Overriding mapred.tasktracker.map.tasks.maximum with -jobconf

2009-02-18 Thread S D
> For more details about these options: > Use $HADOOP_HOME/bin/hadoop jar build/hadoop-streaming.jar -info > > > > I think -jobconf is not used in v.0.19 . > > 2009/2/18 S D > > > I'm having trouble overriding the maximum number of map tasks that

Overriding mapred.tasktracker.map.tasks.maximum with -jobconf

2009-02-18 Thread S D
I'm having trouble overriding the maximum number of map tasks that run on a given machine in my cluster. The default value of mapred.tasktracker.map.tasks.maximum is set to 2 in hadoop-default.xml. When running my job I passed -jobconf mapred.tasktracker.map.tasks.maximum=1 to limit map tasks to

Re: How do you remove a machine from the cluster? Slaves file not working...

2009-02-17 Thread S D
unning. Have a look at the mapred.hosts.exclude property for how to > exclude tasktrackers. > > Tom > > On Tue, Feb 17, 2009 at 5:31 PM, S D wrote: > > Thanks for your response. For clarification, I'm using S3 Native instead > of > > HDFS. Hence, I'm not even calling st
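A sketch of the exclude-file approach mentioned above (the file location is a placeholder, and depending on the release the JobTracker may need a restart to pick the change up):

    # on the JobTracker node: list the host to remove, one per line
    echo "mystique" >> $HADOOP_HOME/conf/mapred.exclude
    # point mapred.hosts.exclude at that file in conf/hadoop-site.xml, then
    # restart MapReduce so the JobTracker stops assigning tasks to that node
    $HADOOP_HOME/bin/stop-mapred.sh && $HADOOP_HOME/bin/start-mapred.sh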

Re: How do you remove a machine from the cluster? Slaves file not working...

2009-02-17 Thread S D
te Student > University of California, Santa Cruz > > > On Tue, Feb 17, 2009 at 2:14 PM, S D wrote: > > > I have a Hadoop 0.19.0 cluster of 3 machines (storm, mystique, batman). > It > > seemed as if problems were occurring on mystique (I was noticing errors > &

How do you remove a machine from the cluster? Slaves file not working...

2009-02-17 Thread S D
I have a Hadoop 0.19.0 cluster of 3 machines (storm, mystique, batman). It seemed as if problems were occurring on mystique (I was noticing errors with tasks that executed on mystique). So I decided to remove mystique. I did so by calling stop-mapred.sh (I'm using S3 Native, not HDFS), removing mys

Re: Hostnames on MapReduce Web UI

2009-02-15 Thread S D
need the localhost localhost.localdomain line so I thought I better avoid removing it altogether. Thanks, John On Sun, Feb 15, 2009 at 10:38 AM, Nick Cen wrote: > Try comment out te localhost definition in your /etc/hosts file. > > 2009/2/14 S D > > > I'm reviewing t
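A sketch of the /etc/hosts layout being discussed (the IP address is a placeholder): keep the loopback line, but make sure the machine's real name maps to its real address so the tasktracker registers under its hostname instead of localhost.

    127.0.0.1     localhost localhost.localdomain
    192.168.0.10  storm.example.com storm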

Re: Race Condition?

2009-02-15 Thread S D
ss it > concurrently, and maybe one of them deletes it when it's done and one > doesn't. Normally each task should run in its own temp directory though. > > On Sun, Feb 15, 2009 at 2:51 PM, S D wrote: > > > I was not able to determine the command shell return value fo

Re: Race Condition?

2009-02-15 Thread S D
r: 'localdir' not found Any clues on what could be going on? Thanks, John On Sat, Feb 14, 2009 at 6:45 PM, Matei Zaharia wrote: > Have you logged the output of the dfs command to see whether it's always > succeeded the copy? > > On Sat, Feb 14, 2009 at 2:46

Re: can't edit the file that mounted by fuse_dfs by editor

2009-02-15 Thread S D
I followed these instructions http://wiki.apache.org/hadoop/MountableHDFS and was able to get things working with 0.19.0 on Fedora. The only problem I ran into was the AMD64 issue on one of my boxes (see the note on the above link); I edited the Makefile and set OSARCH as suggested but couldn't g

Race Condition?

2009-02-14 Thread S D
In my Hadoop 0.19.0 program each map function is assigned a directory (representing a data location in my S3 datastore). The first thing each map function does is copy the particular S3 data to the local machine that the map task is running on and then begin processing the data; e.g., command = "h

Hostnames on MapReduce Web UI

2009-02-13 Thread S D
I'm reviewing the task trackers on the web interface ( http://jobtracker-hostname:50030/) for my cluster of 3 machines. The names of the task trackers do not list real domain names; e.g., one of the task trackers is listed as: tracker_localhost:localhost/127.0.0.1:48167 I believe that the network

Weird Results with Streaming

2009-02-10 Thread S D
of /user/hadoop/base that listed only hadoopInput followed by a second refresh that listed the other subdirectories. Any clues on how this could be? Perhaps there is a leftover process still running? Thanks, John On Mon, Feb 2, 2009 at 9:38 PM, Amareshwari Sriramadasu < amar...@yahoo-inc.co

copyFromLocal *

2009-02-09 Thread S D
I'm using the Hadoop FS shell to move files into my data store (either HDFS or S3 Native). I'd like to use wildcards with copyFromLocal, but this doesn't seem to work. Is there any way I can get that kind of functionality? Thanks, John
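One workaround is to let the local shell expand the glob and copy the matches one at a time (bucket and paths below are placeholders); -put is also documented as accepting multiple local sources, which gets at the same thing.

    for f in adirectory/*.txt; do
      hadoop fs -copyFromLocal "$f" s3n://mybucket/dest/
    done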

Re: Reporter for Hadoop Streaming?

2009-02-05 Thread S D
This does it. Thanks! On Thu, Feb 5, 2009 at 9:14 PM, Arun C Murthy wrote: > > On Feb 5, 2009, at 1:40 PM, S D wrote: > > Is there a way to use the Reporter interface (or something similar such as >> Counters) with Hadoop streaming? Alternatively, can how could STDOUT be >

Reporter for Hadoop Streaming?

2009-02-05 Thread S D
Is there a way to use the Reporter interface (or something similar such as Counters) with Hadoop streaming? Alternatively, how could STDOUT be intercepted for the purpose of updates? If anyone could point me to documentation or examples that cover this I'd appreciate it. Thanks, John
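For the archives: streaming interprets specially formatted lines written to stderr as Reporter updates, so a script can bump counters or set status without any Java. A minimal sketch of a shell mapper doing this (group and counter names are arbitrary):

    #!/bin/sh
    # each matching stderr line becomes a Reporter call; normal output still goes to stdout
    while read line; do
      echo "reporter:counter:MyGroup,LinesSeen,1" >&2
      echo "$line"
    done
    echo "reporter:status:mapper finished" >&2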

Re: Regarding "Hadoop multi cluster" set-up

2009-02-04 Thread S D
Shefali, Is your firewall blocking port 54310 on the master? John On Wed, Feb 4, 2009 at 12:34 PM, shefali pawar wrote: > Hi, > > I am trying to set-up a two node cluster using Hadoop 0.19.0, with 1 > master (which should also work as a slave) and 1 slave node. > > But while running bin/start-dfs
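A quick way to check from the slave, assuming the hostname and port match whatever fs.default.name points at:

    telnet master 54310    # a refused or timed-out connection suggests a firewall or binding problem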

Re: Hadoop FS Shell - command overwrite capability

2009-02-04 Thread S D
p" file in bin directory. > > I think File System API also needs some improvement. I wonder if it's > considered by head developers. > > Hope this helps, > Rasit > > 2009/2/4 S D : > > I'm using the Hadoop FS commands to move files from my local machine

Re: hadoop dfs -test question (with a bit o' Ruby)

2009-02-03 Thread S D
Coincidentally I'm aware of the AWS::S3 package in Ruby but I'd prefer to avoid that... On Tue, Feb 3, 2009 at 5:02 PM, S D wrote: > I'm at my wit's end. I want to do a simple test for the existence of a file > on Hadoop. Here is the Ruby code I'm trying: >

hadoop dfs -test question (with a bit o' Ruby)

2009-02-03 Thread S D
I'm at my wit's end. I want to do a simple test for the existence of a file on Hadoop. Here is the Ruby code I'm trying: val = `hadoop dfs -test -e s3n://holeinthebucket/user/hadoop/file.txt` puts "Val: #{val}" if val == 1 # do one thing else # do another end I never get a return value for
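For what it's worth, backticks capture a command's stdout, while -test reports its result through the exit status (in Ruby that's $?.exitstatus after the backticks). A quick shell sketch of the same check, using the path from the example above:

    hadoop dfs -test -e s3n://holeinthebucket/user/hadoop/file.txt
    echo $?    # 0 when the file exists, non-zero otherwise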

Hadoop FS Shell - command overwrite capability

2009-02-03 Thread S D
I'm using the Hadoop FS commands to move files from my local machine into the Hadoop dfs. I'd like a way to force a write to the dfs even if a file of the same name exists. Ideally I'd like to use a "-force" switch or some such; e.g., hadoop dfs -copyFromLocal -force adirectory s3n://wholeinthe
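Absent a -force switch, one workaround is to remove any existing copy first (bucket name below is a placeholder):

    hadoop dfs -rmr s3n://mybucket/adirectory
    hadoop dfs -copyFromLocal adirectory s3n://mybucket/adirectory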

Re: Hadoop Streaming Semantics

2009-02-02 Thread S D
You > need not include it in your streaming jar. > -Amareshwari > > > S D wrote: > >> Thanks for your response Amareshwari. I'm unclear on how to take advantage >> of NLineInputFormat with Hadoop Streaming. Is the idea that I modify the >> streaming jar file (c

Re: Hadoop Streaming Semantics

2009-01-30 Thread S D
> You can use NLineInputFormat for this, which splits one line (N=1, by > default) as one split. > So, each map task processes one line. > See > http://hadoop.apache.org/core/docs/r0.19.0/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html > > -Amareshwari > > S D wrote:

Hadoop Streaming Semantics

2009-01-29 Thread S D
Hello, I have a clarifying question about Hadoop streaming. I'm new to the list and didn't see anything posted that covers my questions - my apologies if I overlooked a relevant post. I have an input file consisting of a list of files (one per line) that need to be processed independently of each
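As the replies above note, NLineInputFormat hands each map task a single input line (here, one file path) by default. A sketch of the streaming invocation, with jar path, script name, and HDFS paths as placeholders:

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.19.0-streaming.jar \
      -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat \
      -input  /user/hadoop/filelist.txt \
      -output /user/hadoop/output \
      -mapper process_one_file.sh \
      -file process_one_file.sh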