Re: next gen map reduce

2011-08-01 Thread Dieter Plaetinck
On Thu, 28 Jul 2011 06:13:01 -0700 Thomas Graves tgra...@yahoo-inc.com wrote: It's currently still on the MR279 branch - http://svn.apache.org/viewvc/hadoop/common/branches/MR-279/. It is planned to be merged to trunk soon. Tom On 7/28/11 7:31 AM, real great..

RE: Moving Files to Distributed Cache in MapReduce

2011-08-01 Thread Michael Segel
Yeah, I'll write something up and post it on my web site. Definitely not InfoQ stuff, just simple tips-and-tricks stuff. -Mike Subject: Re: Moving Files to Distributed Cache in MapReduce From: a...@apache.org Date: Sun, 31 Jul 2011 19:21:14 -0700 To: common-user@hadoop.apache.org We

Re: next gen map reduce

2011-08-01 Thread Thomas Graves
The jira has more details and an architecture doc attached. https://issues.apache.org/jira/browse/MAPREDUCE-279 Tom On 8/1/11 2:12 AM, Dieter Plaetinck dieter.plaeti...@intec.ugent.be wrote: On Thu, 28 Jul 2011 06:13:01 -0700 Thomas Graves tgra...@yahoo-inc.com wrote: It's currently still

Using -libjar option

2011-08-01 Thread Aquil H. Abdullah
Hello All, I am new to Hadoop, and I am trying to use the GenericOptionsParser class. In particular, I would like to use the -libjar option to specify additional jar files to include in the classpath. I've created a class that extends Configured and implements Tool: public class OptionDemo

Re: Using -libjar option

2011-08-01 Thread John Armstrong
On Mon, 1 Aug 2011 12:11:27 -0400, Aquil H. Abdullah aquil.abdul...@gmail.com wrote: but it still isn't clear to me how the -libjars option is parsed, whether or not I need to explicitly add it to the classpath inside my run method, or where it needs to be placed in the command-line? IIRC

Re: Using -libjar option

2011-08-01 Thread Harsh J
Aquil, On a side-note, if you use Tool, GenericOptsParser is automatically used internally (by ToolRunner), so you don't have to re-parse your args in your run(…) method. What you get as run(args) are the remnant args alone, if your application handles any. Would help, as John pointed out, if
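
Harsh's point about "remnant args" can be illustrated with a simplified, stdlib-only model of how leading generic options are split from the application's own arguments. This is not GenericOptionsParser itself: the class and method names are hypothetical, the "in"/"out" arguments are made up, and it assumes (as I understand the real parser to behave) that parsing stops at the first token that is not a recognized generic option. Under that assumption, a misspelled -libjar is not recognized, so it and everything after it are passed through to run(...) as remaining arguments, while -libjars is consumed before run(...) sees the rest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the generic-option split; NOT Hadoop's GenericOptionsParser.
public class GenericArgsSplitSketch {

    // Options this model treats as generic; each takes exactly one value.
    static final Set<String> GENERIC_OPTS =
            Set.of("-conf", "-D", "-fs", "-jt", "-files", "-libjars", "-archives");

    // Holds the recognized generic options and the arguments left for run(...).
    static final class Split {
        final Map<String, List<String>> generic = new LinkedHashMap<>();
        final List<String> remaining = new ArrayList<>();
    }

    static Split split(String[] args) {
        Split result = new Split();
        int i = 0;
        while (i < args.length) {
            String token = args[i];
            if (token.startsWith("-D") && token.length() > 2) {
                // Attached form, e.g. -Dkey=value
                result.generic.computeIfAbsent("-D", k -> new ArrayList<>()).add(token.substring(2));
                i += 1;
            } else if (GENERIC_OPTS.contains(token) && i + 1 < args.length) {
                // Separate form, e.g. -libjars a.jar,b.jar
                result.generic.computeIfAbsent(token, k -> new ArrayList<>()).add(args[i + 1]);
                i += 2;
            } else {
                // First unrecognized token: stop; everything from here on is passed through.
                break;
            }
        }
        result.remaining.addAll(Arrays.asList(args).subList(i, args.length));
        return result;
    }

    public static void main(String[] args) {
        String jar = "/home/test/hadoop/lib/mytestlib.jar";
        Split ok = split(new String[] {"-libjars", jar, "in", "out"});
        Split typo = split(new String[] {"-libjar", jar, "in", "out"});
        System.out.println("-libjars: generic=" + ok.generic + " remaining=" + ok.remaining);
        System.out.println("-libjar:  generic=" + typo.generic + " remaining=" + typo.remaining);
    }
}
```

In this model the generic options also have to come before the application's own arguments; anything placed after the first application argument is never examined.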

Re: Using -libjar option

2011-08-01 Thread Aquil H. Abdullah
[See Response Inline] I've tried invoking getLib On Mon, Aug 1, 2011 at 12:56 PM, Harsh J ha...@cloudera.com wrote: Aquil, On a side-note, if you use Tool, GenericOptsParser is automatically used internally (by ToolRunner), so you don't have to re-parse your args in your run(…) method. What

Re: Using -libjar option

2011-08-01 Thread John Armstrong
On Mon, 1 Aug 2011 13:21:27 -0400, Aquil H. Abdullah aquil.abdul...@gmail.com wrote: [AA] I am currently invoking my application as follows: hadoop jar /home/test/hadoop/test.option.demo.jar test.option.demo.OptionDemo -libjar /home/test/hadoop/lib/mytestlib.jar I believe the problem might

Mappers fail to initialize and are killed after 600 seconds

2011-08-01 Thread Stevens, Keith D.
Hi all, I'm running a simple mapreduce job that connects to an hbase table, reads each row, counts some co-occurrence frequencies, and writes everything out to hdfs at the end. Everything seems to be going smoothly until the last 5, out of 108, tasks run. The last 5 tasks seem to be stuck

Re: Using -libjar option

2011-08-01 Thread Aquil H. Abdullah
Don't I feel sheepish... OK, so I've hacked this sample code below, from the ConfigurationPrinter example in Hadoop: The Definitive Guide. If -libjars had been added to the configuration I would expect to see it when I iterate over the urls, however I see it as one of the remaining options:

Re: Using -libjar option

2011-08-01 Thread John Armstrong
On Mon, 1 Aug 2011 15:30:49 -0400, Aquil H. Abdullah aquil.abdul...@gmail.com wrote: Don't I feel sheepish... Happens to the best, or so they tell me. OK, so I've hacked this sample code below, from the ConfigurationPrinter example in Hadoop: The Definitive Guide. If -libjars had been added

Re: Mappers fail to initialize and are killed after 600 seconds

2011-08-01 Thread Harsh J
Are there no userlogs from the failed tasks? TaskTracker logs won't carry user-code (task) logs. Could you paste those syslog lines (from the task) to pastebin/etc. since the lists may not be accepting attachments? On Tue, Aug 2, 2011 at 12:51 AM, Stevens, Keith D. steven...@llnl.gov wrote: Hi

Re: Mappers fail to initialize and are killed after 600 seconds

2011-08-01 Thread Stevens, Keith D.
In short, there are no userlogs. stderr and stdout are both empty. I copied the output from syslog to the following pastebin: http://pastebin.com/0XXE9Jze. The first 22 lines look to be exactly the same as the syslogs for other, non-dying, tasks. The main departure is on line 23 where the

RE: Hadoop-streaming using binary executable c program

2011-08-01 Thread Daniel Yehdego
Hi Bobby, I have written a small Perl script which does the following job: Assume we have an output from the mapper MAP1 RNA-1 STRUCTURE-1 MAP2 RNA-2 STRUCTURE-2 MAP3 RNA-3 STRUCTURE-3 and what the script does is reduce in the following manner:

RE: Hadoop cluster network requirement

2011-08-01 Thread Michael Segel
Yeah, what he said. It's never a good idea. Forget about losing a NN or a rack; just lose connectivity between data centers (it happens more than you think) and your entire cluster in both data centers goes down. Boom! It's a bad design. You're better off doing two different clusters. Is

How to access contents of a Map Reduce job's working directory

2011-08-01 Thread Shrish Bajpai
I have just started to explore Hadoop, but I am stuck in a situation now. I want to run a MapReduce job in Hadoop which needs to create a setup folder in its working directory. During the execution the job will generate some additional text files within this setup folder. The problem is I don't know
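
Relative paths used by a task, such as the setup folder in the question, resolve against the task JVM's current working directory, which Java reports as the user.dir system property. A minimal stdlib-only sketch, assuming the job only needs to create that folder, write into it, and see what ended up there; the method name and the file name step1.txt are hypothetical. Printing the absolute path and the listing from inside the task (so it lands in the task's own logs) is one way to find out where the folder actually lives.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SetupFolderSketch {

    // Creates <baseDir>/setup, writes one hypothetical file into it,
    // and returns the sorted names of the files found there.
    static List<String> createAndList(Path baseDir) throws IOException {
        Path setup = baseDir.resolve("setup");
        Files.createDirectories(setup);
        Files.write(setup.resolve("step1.txt"),
                "written by the task\n".getBytes(StandardCharsets.UTF_8));
        try (Stream<Path> entries = Files.list(setup)) {
            return entries.map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Relative paths such as "setup" resolve against this directory.
        Path cwd = Paths.get(System.getProperty("user.dir")).toAbsolutePath();
        System.out.println("working directory: " + cwd);
        System.out.println("setup contents: " + createAndList(cwd));
    }
}
```

Note that Hadoop typically cleans up a task attempt's local working directory after the attempt finishes, so files needed after the job would have to be copied somewhere durable such as HDFS; check the cleanup behavior of your version before relying on the local copy.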

Re: Hadoop cluster network requirement

2011-08-01 Thread Mohit Anchlia
Assuming everything is up, this solution still will not scale given the latency, TCP/IP buffers, sliding window, etc. See BDP. On Aug 1, 2011, at 4:57 PM, Michael Segel michael_se...@hotmail.com wrote: Yeah what he said. It's never a good idea. Forget about losing a NN or a
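
For context on the BDP pointer: the bandwidth-delay product is bandwidth × round-trip time, i.e. how many bytes must be in flight to keep a path full. A single TCP connection can move at most one window of data per round trip, so its throughput is bounded by window / RTT; over a high-latency link between data centers, a window smaller than the BDP leaves most of the bandwidth unused. A small worked sketch; the 1 Gbit/s link, 50 ms RTT and 64 KiB window (no window scaling) are hypothetical figures, not numbers from the thread.

```java
public class BdpSketch {

    // Bandwidth-delay product in bytes: bits per second times RTT, divided by 8.
    static double bdpBytes(double bitsPerSecond, double rttSeconds) {
        return bitsPerSecond * rttSeconds / 8.0;
    }

    // Upper bound on one TCP connection's throughput, in bits per second,
    // when at most windowBytes can be unacknowledged per round trip.
    static double maxThroughputBits(long windowBytes, double rttSeconds) {
        return windowBytes * 8.0 / rttSeconds;
    }

    public static void main(String[] args) {
        double link = 1e9;       // hypothetical 1 Gbit/s inter-datacenter link
        double rtt = 0.050;      // hypothetical 50 ms round trip
        long window = 64 * 1024; // hypothetical 64 KiB window, no window scaling
        System.out.printf("BDP: %.0f bytes%n", bdpBytes(link, rtt));
        System.out.printf("Max single-stream throughput: %.2f Mbit/s%n",
                maxThroughputBits(window, rtt) / 1e6);
    }
}
```

With these figures the BDP is about 6.25 MB, while the 64 KiB window caps a single stream at roughly 10.5 Mbit/s, about 1% of the link.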

Hive-HBase Integration Jar Question

2011-08-01 Thread Neerja Bhatnagar
Hi, I am using hive-hbase-handler-0.7.0-cdh3u0.jar (under hive-0.7.0-cdh3u0/lib) thrift-fb303-0.5.0.jar (under hive-0.7.0-cdh3u0/lib) thrift-0.2.0.jar (under hbase-0.90.1-cdh3u0/lib) in my project. We use Maven; could anyone please tell me where I can get the pom information for these jars?

Re: Hive-HBase Integration Jar Question

2011-08-01 Thread Mayuresh
In our case we have our own Maven repo where we uploaded these jars. You can also install them in your local repo from the command line if you don't have your own Maven repo. On Aug 2, 2011 7:00 AM, Neerja Bhatnagar bnee...@gmail.com wrote: Hi, I am using hive-hbase-handler-0.7.0-cdh3u0.jar

Max Number of Open Connections

2011-08-01 Thread jagaran das
Hi, What is the max number of open connections to a namenode? I am using FSDataOutputStream out = dfs.create(src); Cheers, JD

Re: maprd vs mapreduce api

2011-08-01 Thread Roger Chen
Your reducer is writing IntWritable but your output format class is still Text. Change one of those so they match the other. On Mon, Aug 1, 2011 at 8:40 PM, garpinc garp...@hotmail.com wrote: I was following this tutorial on version 0.19.1

The best architecture for EC2/Hadoop interface?

2011-08-01 Thread Mark Kerzner
Hi, I want to give my users a GUI that would allow them to start Hadoop clusters and run applications that I will provide on the AMIs. What would be a good approach to make it simple for the user? Should I write a Java Swing app that will wrap around the EC2 commands? Should I use some more