hadoop on fedora 15

2011-08-05 Thread Manish
Hi,

Has anybody been able to run Hadoop in standalone mode on Fedora 15?
I have installed it correctly. It runs through the map phase but gets stuck in reduce.
It fails with the error mapred.JobClient Status : FAILED Too many
fetch-failures. I read several articles on the net about this problem; all of them
talk about /etc/hosts and some say it is a firewall issue.
I opened the firewall for the port range and also checked my /etc/hosts file. Its
content is localhost and that is the only line in it.

Is sun-java absolutely necessary, or will open-jdk work?

Can someone give me a suggestion for getting past this problem?

Thanks & regards

Manish



Re: one question in the book of hadoop:definitive guide 2 edition

2011-08-05 Thread John Armstrong
On Fri, 5 Aug 2011 08:50:02 +0800 (CST), Daniel,Wu hadoop...@163.com
wrote:
 The book also
 mentioned the value is mutable; I think the key might also be mutable,
 meaning as we loop over each value in Iterable<NullWritable>, the content of
 the key object is reset.

The mutability of the value is one of the weirdnesses of Hadoop you have
to get used to, and one of the few times it becomes important that Java
object semantics are pointer semantics.  Anyway, it wouldn't surprise me if
the key were also replaced on iteration, but I'd have to dig into the
Hadoop code to check on that.  Or you could!
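
To make the reuse issue concrete, something like this (an untested sketch;
the Text/IntWritable types and class name are just an example) copies the
values before buffering them, instead of holding references to the reused
objects:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class CopyingReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    List<IntWritable> buffered = new ArrayList<IntWritable>();
    for (IntWritable v : values) {
      // buffered.add(v) would leave N references to one reused object;
      // copy the current contents instead.
      buffered.add(new IntWritable(v.get()));
    }
    int sum = 0;
    for (IntWritable v : buffered) {
      sum += v.get();
    }
    // Copy the key too, in case it is also reused by the framework.
    context.write(new Text(key), new IntWritable(sum));
  }
}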


Re: hadoop on fedora 15

2011-08-05 Thread madhu phatak
disable iptables and try again

On Fri, Aug 5, 2011 at 2:20 PM, Manish manish.iitg...@gmail.com wrote:

 Hi,

 Has anybody been able to run Hadoop in standalone mode on Fedora 15?
 I have installed it correctly. It runs through the map phase but gets stuck in reduce.
 It fails with the error mapred.JobClient Status : FAILED Too many
 fetch-failures. I read several articles on the net about this problem; all of
 them
 talk about /etc/hosts and some say it is a firewall issue.
 I opened the firewall for the port range and also checked my /etc/hosts file.
 Its
 content is localhost and that is the only line in it.

 Is sun-java absolutely necessary, or will open-jdk work?

 Can someone give me a suggestion for getting past this problem?

 Thanks & regards

 Manish




-- 
Join me at http://hadoopworkshop.eventbrite.com/


Re: hadoop on fedora 15

2011-08-05 Thread Harsh J
Sun JDK is what it has been thoroughly tested on. You can run on
OpenJDK perhaps, but YMMV.

Hadoop has a strict requirement of having a proper network setup before use.

What port range did you open? TaskTracker would use 50060 for
intercommunication (over lo, if it's bound to that). Check if your
daemons are binding to the right interfaces and have proper name-IP
resolutions, and then check whether communication is allowed on that
port.
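
A quick way to check both (an untested sketch; the 50060 port is the
TaskTracker HTTP port mentioned above, and the 2s timeout is just an
example):

import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
  public static void main(String[] args) throws Exception {
    // Does the local hostname resolve to a sensible address?
    InetAddress self = InetAddress.getLocalHost();
    System.out.println(self.getHostName() + " resolves to " + self.getHostAddress());

    // Can we open a TCP connection to the TaskTracker HTTP port?
    Socket s = new Socket();
    try {
      s.connect(new InetSocketAddress("localhost", 50060), 2000); // 2s timeout
      System.out.println("port 50060 is reachable");
    } finally {
      s.close();
    }
  }
}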

On Fri, Aug 5, 2011 at 2:20 PM, Manish manish.iitg...@gmail.com wrote:
 Hi,

 Has anybody been able to run Hadoop in standalone mode on Fedora 15?
 I have installed it correctly. It runs through the map phase but gets stuck in reduce.
 It fails with the error mapred.JobClient Status : FAILED Too many
 fetch-failures. I read several articles on the net about this problem; all of them
 talk about /etc/hosts and some say it is a firewall issue.
 I opened the firewall for the port range and also checked my /etc/hosts file. Its
 content is localhost and that is the only line in it.

 Is sun-java absolutely necessary, or will open-jdk work?

 Can someone give me a suggestion for getting past this problem?

 Thanks & regards

 Manish





-- 
Harsh J


Re: Upload, then decompress archive on HDFS?

2011-08-05 Thread Harsh J
I suppose we could do with a simple identity mapping/identity reducing
example/tool that can easily be reutilized for purposes such as these.
Could you file a JIRA on this?

The -text option is like -cat but adds codec and some file-format detection.
Hopefully it should work for your case.
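
A rough sketch of such an identity job (untested; class names and path
handling are just an example): TextInputFormat picks a decompression codec
from the file extension (e.g. .gz), and a map-only pass writes the lines
back out as plain text.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DecompressCopy {
  public static class PassThroughMapper
      extends Mapper<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      // Emit only the line; TextOutputFormat omits NullWritable values.
      context.write(line, NullWritable.get());
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "decompress-copy");
    job.setJarByClass(DecompressCopy.class);
    job.setMapperClass(PassThroughMapper.class);
    job.setNumReduceTasks(0); // map-only: lines are written straight back out, uncompressed
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. the compressed files
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // plain-text copies
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}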

On Fri, Aug 5, 2011 at 8:44 PM, Keith Wiley kwi...@keithwiley.com wrote:
 I can envision an M/R job for the purpose of manipulating hdfs, such as 
 (de)compressing files and resaving them back to HDFS.  I just didn't think it 
 should be necessary to *write a program* to do something so seemingly 
 minimal.  This (tarring/compressing/etc.) seems like an obvious method for 
 moving data back and forth; I would expect the tools to support it.

 I'll read up on -text.  Maybe that really is what I wanted, although I'm 
 dubious since this has nothing to do with textual data at all.  Anyway, I'll 
 see what I can find on that.

 Thanks.

 On Aug 4, 2011, at 9:04 PM, Harsh J wrote:

 Keith,

 The 'hadoop fs -text' tool does decompress a file given to it if
 needed/able, but what you could also do is run a distributed mapreduce
 job that converts from compressed to decompressed, that'd be much
 faster.

 On Fri, Aug 5, 2011 at 4:58 AM, Keith Wiley kwi...@keithwiley.com wrote:
 Instead of hd fs -put hundreds of files of X megs, I want to do it once 
 on a gzipped (or zipped) archive, one file, much smaller total megs.  Then 
 I want to decompress the archive on HDFS.  I can't figure out what hd fs 
 type command would do such a thing.

 Thanks.


 
 Keith Wiley     kwi...@keithwiley.com     keithwiley.com    
 music.keithwiley.com

 It's a fine line between meticulous and obsessive-compulsive and a slippery
 rope between obsessive-compulsive and debilitatingly slow.
                                           --  Keith Wiley
 





-- 
Harsh J


streaming cacheArchive shared libraries

2011-08-05 Thread Keith Wiley
I can use cacheFile to load .so files into the distributed cache and it works 
fine (the streaming executable links against the .so and runs), but I can't get 
it to work with -cacheArchive.  It always says it can't find the .so file.  I 
realize that if you jar a directory, the directory will be recreated when you 
unjar it, but I've tried jarring a file directly.  It is easily verified that 
unjarring such a file reproduces the original file as a sibling of the jar file 
itself.  So it seems to me that cacheArchive should have transferred the jar 
file to the cwd of my task, unjarred it, and produced a .so file right there, 
but it doesn't link up with the executable.  Like I said, I know this basic 
approach works just fine with cacheFile.

What could be the problem here?  I can't easily see the files on the cluster 
since it is a remote cluster with limited access.  I don't believe I can ssh to 
any individual machine to investigate the files that are created for a 
task...but I think I have worked through the process logically and I'm not sure 
what I'm doing wrong.

Thoughts?


Keith Wiley     kwi...@keithwiley.com     keithwiley.com     music.keithwiley.com

Luminous beings are we, not this crude matter.
   --  Yoda




Re: streaming cacheArchive shared libraries

2011-08-05 Thread Keith Wiley
Quick followup.  I substituted a little python script for the true mapper; it 
just lists the cwd's contents and dumps them to the streaming output (stderr).  
Oddly, it doesn't look like the .jar file was unpacked.  I can see it there, 
but not the unpacked version, so it looks like -cacheArchive transferred the 
file but didn't unjar it.

Anyone ever seen something like this before?


Keith Wiley     kwi...@keithwiley.com     keithwiley.com     music.keithwiley.com

I do not feel obliged to believe that the same God who has endowed us with
sense, reason, and intellect has intended us to forgo their use.
   --  Galileo Galilei




Re: streaming cacheArchive shared libraries

2011-08-05 Thread Ramya Sunil
Hi Keith,

I have tried the exact use case you have mentioned and it works fine for me.
Below is the command line for the same:

[ramya]$ jar vxf samplelib.jar
 created: META-INF/
 inflated: META-INF/MANIFEST.MF
 inflated: libhdfs.so

[ramya]$ hadoop dfs -put samplelib.jar samplelib.jar

[ramya]$ hadoop jar hadoop-streaming.jar -input InputDir -mapper ls
testlink/libhdfs.so -reducer NONE -output out -cacheArchive
hdfs://namenode:port/user/ramya/samplelib.jar#testlink

[ramya]$ hadoop dfs -cat out/*
testlink/libhdfs.so
testlink/libhdfs.so
testlink/libhdfs.so


Hope it helps.

Thanks
Ramya

On 8/5/11 10:10 AM, Keith Wiley kwi...@keithwiley.com wrote:


I can use cacheFile to load .so files into the distributed cache and it
works fine (the streaming executable links against the .so and runs), but I
can't get it to work with -cacheArchive.  It always says it can't find the
.so file.  I realize that if you jar a directory, the directory will be
recreated when you unjar it, but I've tried jarring a file directly.  It is
easily verified that unjarring such a file reproduces the original file as a
sibling of the jar file itself.  So it seems to me that cacheArchive should
have transferred the jar file to the cwd of my task, unjarred it, and
produced a .so file right there, but it doesn't link up with the executable.
 Like I said, I know this basic approach works just fine with cacheFile.

What could be the problem here?  I can't easily see the files on the cluster
since it is a remote cluster with limited access.  I don't believe I can ssh
to any individual machine to investigate the files that are created for a
task...but I think I have worked through the process logically and I'm not
sure what I'm doing wrong.

Thoughts?


Keith Wiley kwi...@keithwiley.com keithwiley.com
music.keithwiley.com

Luminous beings are we, not this crude matter.
   --  Yoda



Re: streaming cacheArchive shared libraries

2011-08-05 Thread Keith Wiley
Okay, I think I understand.  The symlink name that follows the pound sign in 
the -cacheArchive directive isn't the name of the transferred jar file -- it is 
the name of a directory that the .jar file will be put into and then 
unjarred.  So it doesn't act like jar would on a local machine, where 
files are recreated at the current directory level.  Rather, everything is 
pushed down by one level.  With a corresponding cmdenv flag to point 
LD_LIBRARY_PATH at the correct location, I think I can get it to find the 
shared libraries now.
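
For reference, the same behavior can be requested through the Java API
(an untested sketch; the HDFS URI just reuses the placeholder from Ramya's
example):

import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapred.JobConf;

public class CacheArchiveExample {
  public static void configure(JobConf conf) throws Exception {
    // The fragment (#testlink) names a symlink in each task's working
    // directory; the archive is unpacked under that link, so a jar
    // containing libhdfs.so shows up as ./testlink/libhdfs.so.
    DistributedCache.addCacheArchive(
        new URI("hdfs://namenode:port/user/ramya/samplelib.jar#testlink"), conf);
    DistributedCache.createSymlink(conf);
  }
}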

On Aug 5, 2011, at 10:27 , Keith Wiley wrote:

 Quick followup.  I substituted a little python script for the true mapper; 
 it just lists the cwd's contents and dumps them to the streaming output 
 (stderr).  Oddly, it doesn't look like the .jar file was unpacked.  I can 
 see it there, but not the unpacked version, so it looks like -cacheArchive 
 transferred the file but didn't unjar it.
 
 Anyone ever seen something like this before?



Keith Wiley     kwi...@keithwiley.com     keithwiley.com     music.keithwiley.com

And what if we picked the wrong religion?  Every week, we're just making God
madder and madder!
   --  Homer Simpson




Re: streaming cacheArchive shared libraries

2011-08-05 Thread Keith Wiley
Right, so it was pushed down a level into the testlink directory.  That's why 
my shared libraries were not linking properly to my mapper executable.  I can 
fix that by using cmdenv to redirect LD_LIBRARY_PATH.  I think that'll work.

On Aug 5, 2011, at 10:44 , Ramya Sunil wrote:

 Hi Keith,
 
 I have tried the exact use case you have mentioned and it works fine for me.
 Below is the command line for the same:
 
 [ramya]$ jar vxf samplelib.jar
 created: META-INF/
 inflated: META-INF/MANIFEST.MF
 inflated: libhdfs.so
 
 [ramya]$ hadoop dfs -put samplelib.jar samplelib.jar
 
 [ramya]$ hadoop jar hadoop-streaming.jar -input InputDir -mapper ls
 testlink/libhdfs.so -reducer NONE -output out -cacheArchive
 hdfs://namenode:port/user/ramya/samplelib.jar#testlink
 
 [ramya]$ hadoop dfs -cat out/*
 testlink/libhdfs.so
 testlink/libhdfs.so
 testlink/libhdfs.so
 
 
 Hope it helps.
 
 Thanks
 Ramya



Keith Wiley     kwi...@keithwiley.com     keithwiley.com     music.keithwiley.com

I used to be with it, but then they changed what it was.  Now, what I'm with
isn't it, and what's it seems weird and scary to me.
   --  Abe (Grandpa) Simpson




Order of Operations

2011-08-05 Thread Premal Shah
Hi,
According to the attached image found on yahoo's hadoop tutorial
(http://developer.yahoo.com/hadoop/tutorial/module4.html),
the order of operations is map -> combine -> partition, which should be
followed by reduce.

Here is an example key emitted by the map operation:
LongValueSum:geo_US|1311722400|E 1

This should get combined with other keys as
geo_US|1311722400|E 100
(assuming there are 100 keys of the same type)

Then I'd like to partition the keys by the value before the first pipe (|), as described at
http://hadoop.apache.org/common/docs/r0.20.2/streaming.html#Working+with+the+Hadoop+Aggregate+Package+%28the+-reduce+aggregate+option%29
geo_US

so here's my streaming command

hadoop jar
/usr/local/hadoop/contrib/streaming/hadoop-streaming-0.20.203.0.jar \
-D mapred.reduce.tasks=8 \
-D stream.num.map.output.key.fields=1 \
-D mapred.text.key.partitioner.options=-k1,1 \
-D stream.map.output.field.separator=\| \
-file mapper.py \
-mapper mapper.py \
-file reducer.py \
-reducer reducer.py \
-combiner org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorReducer
\
-partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner \
-input input_file \
-output output_path


This is the error I get

java.lang.NumberFormatException: For input string: "1311722400|E 1"
at 
java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Long.parseLong(Long.java:419)
at java.lang.Long.parseLong(Long.java:468)
at org.apache.hadoop.mapred.lib.aggregate.LongValueSum.addNextValue(LongValueSum.java:48)
at 
org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorReducer.reduce(ValueAggregatorReducer.java:59)
at 
org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorReducer.reduce(ValueAggregatorReducer.java:35)
at
org.apache.hadoop.mapred.Task$OldCombinerRunner.combine(Task.java:1349)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1435)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1297)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:253)

I think it's because the partitioner is running before the combiner.
Any thoughts?

-- 
Regards,
Premal Shah.


cmdenv LD_LIBRARY_PATH

2011-08-05 Thread Keith Wiley
I know you can do something like this:

-cmdenv LD_LIBRARY_PATH=./my_libs

if you have shared libraries in a subdirectory under the cwd (such as occurs 
when using -cacheArchive to load and unpack a jar full of .so files into the 
distributed cache)...but this destroys the existing path.  I think I want 
something more like this:

-cmdenv LD_LIBRARY_PATH=./my_libs:$LD_LIBRARY_PATH

but the environment variable gets interpreted as the command is constructed.  It 
uses the local version of the variable and expands it as it builds the hadoop 
command; it doesn't send the $ version to hadoop to be expanded at that later 
time.

Is this something that can be fixed by some combination of single quotes, double 
quotes, and backslashes?  I'm uncertain of the proper sequence.


Keith Wiley     kwi...@keithwiley.com     keithwiley.com     music.keithwiley.com

The easy confidence with which I know another man's religion is folly teaches
me to suspect that my own is also.
   --  Mark Twain




Hadoop order of operations

2011-08-05 Thread Premal

According to the attached image found on yahoo's hadoop tutorial, the order
of operations is map -> combine -> partition, which should be followed by
reduce.

Here is an example key emitted by the map operation

LongValueSum:geo_US|1311722400|E 1

Assuming there are 100 keys of the same type, this should get combined as

geo_US|1311722400|E 100

Then I'd like to partition the keys by the value before the first pipe (|), as described at
http://hadoop.apache.org/common/docs/r0.20.2/streaming.html#A+Useful+Partitioner+Class+%28secondary+sort%2C+the+-partitioner+org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner+option%29

geo_US

Here's the streaming command

hadoop jar
/usr/local/hadoop/contrib/streaming/hadoop-streaming-0.20.203.0.jar \
-D mapred.reduce.tasks=8 \
-D stream.num.map.output.key.fields=1 \
-D mapred.text.key.partitioner.options=-k1,1 \
-D stream.map.output.field.separator=\| \
-file mapper.py \
-mapper mapper.py \
-file reducer.py \
-reducer reducer.py \
-combiner org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorReducer
\
-partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner \
-input input_file \
-output output_path


This is the error I get
java.lang.NumberFormatException: For input string: "1311722400|E 1"
at
java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Long.parseLong(Long.java:419)
at java.lang.Long.parseLong(Long.java:468)
at
org.apache.hadoop.mapred.lib.aggregate.LongValueSum.addNextValue(LongValueSum.java:48)
at
org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorReducer.reduce(ValueAggregatorReducer.java:59)
at
org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorReducer.reduce(ValueAggregatorReducer.java:35)
at 
org.apache.hadoop.mapred.Task$OldCombinerRunner.combine(Task.java:1349)
at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1435)
at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1297)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:253)

It looks like the partitioner is running before the combiner. Any thoughts?
-- 
View this message in context: 
http://old.nabble.com/Hadoop-order-of-operations-tp32205781p32205781.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: maprd vs mapreduce api

2011-08-05 Thread Stevens, Keith D.
The Mapper and Reducer classes in org.apache.hadoop.mapreduce implement the 
identity function.  So you should be able to just do 

job.setMapperClass(org.apache.hadoop.mapreduce.Mapper.class);
job.setReducerClass(org.apache.hadoop.mapreduce.Reducer.class);

without having to implement your own no-op classes.

I recommend reading the javadoc for the differences between the old API and the new 
API; for example, http://hadoop.apache.org/common/docs/r0.20.2/api/index.html 
describes the different functionality of Mapper in the new API and its dual 
use as the identity mapper.
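
A minimal driver sketch of that suggestion (untested; the "In" and "Out"
paths are the ones from your code, everything else is illustrative). Note
that with the default TextInputFormat the identity Mapper emits
(LongWritable offset, Text line), so the job's key/value classes have to
match:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IdentityDriver {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "testdriver");
    job.setJarByClass(IdentityDriver.class);
    job.setMapperClass(Mapper.class);    // new-API Mapper is the identity map
    job.setReducerClass(Reducer.class);  // new-API Reducer is the identity reduce
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path("In"));
    FileOutputFormat.setOutputPath(job, new Path("Out"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}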

Cheers,
--Keith

On Aug 5, 2011, at 1:15 PM, garpinc wrote:

 
 I was following this tutorial on version 0.19.1
 
 http://v-lad.org/Tutorials/Hadoop/23%20-%20create%20the%20project.html
 
 I however wanted to use the latest version of the API, 0.20.2.
 
 The original code in the tutorial had the following lines:
 conf.setMapperClass(org.apache.hadoop.mapred.lib.IdentityMapper.class);
 conf.setReducerClass(org.apache.hadoop.mapred.lib.IdentityReducer.class);
 
 Both Identity classes are deprecated, so it seemed the solution was to create
 mapper and reducer as follows:
 public static class NOOPMapper 
  extends Mapper<Text, IntWritable, Text, IntWritable> {
 
 
   public void map(Text key, IntWritable value, Context context
   ) throws IOException, InterruptedException {
 
   context.write(key, value);
 
   }
 }
 
 public static class NOOPReducer 
  extends Reducer<Text,IntWritable,Text,IntWritable> {
   private IntWritable result = new IntWritable();
 
    public void reduce(Text key, Iterable<IntWritable> values, 
  Context context
  ) throws IOException, InterruptedException {
 context.write(key, result);
   }
 }
 
 
 And then with code:
   Configuration conf = new Configuration();
    Job job = new Job(conf, "testdriver");
 
   job.setOutputKeyClass(Text.class);
   job.setOutputValueClass(IntWritable.class);
 
   job.setInputFormatClass(TextInputFormat.class);
   job.setOutputFormatClass(TextOutputFormat.class);
 
    FileInputFormat.addInputPath(job, new Path("In"));
    FileOutputFormat.setOutputPath(job, new Path("Out"));
 
   job.setMapperClass(NOOPMapper.class);
   job.setReducerClass(NOOPReducer.class);
 
   job.waitForCompletion(true);
 
 
 However I get this message
 java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
 cast to org.apache.hadoop.io.Text
   at TestDriver$NOOPMapper.map(TestDriver.java:1)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
 11/08/01 16:41:01 INFO mapred.JobClient:  map 0% reduce 0%
 11/08/01 16:41:01 INFO mapred.JobClient: Job complete: job_local_0001
 11/08/01 16:41:01 INFO mapred.JobClient: Counters: 0
 
 
 
 Can anyone tell me what I need for this to work.
 
 Attached is full code..
 http://old.nabble.com/file/p32174859/TestDriver.java TestDriver.java 
 -- 
 View this message in context: 
 http://old.nabble.com/maprd-vs-mapreduce-api-tp32174859p32174859.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.
 



Re: maprd vs mapreduce api

2011-08-05 Thread Mohit Anchlia
On Fri, Aug 5, 2011 at 3:42 PM, Stevens, Keith D. steven...@llnl.gov wrote:
 The Mapper and Reducer classes in org.apache.hadoop.mapreduce implement the 
 identity function.  So you should be able to just do

 job.setMapperClass(org.apache.hadoop.mapreduce.Mapper.class);
 job.setReducerClass(org.apache.hadoop.mapreduce.Reducer.class);

 without having to implement your own no-op classes.

 I recommend reading the javadoc for the differences between the old API and the 
 new API; for example, 
 http://hadoop.apache.org/common/docs/r0.20.2/api/index.html describes the 
 different functionality of Mapper in the new API and its dual use as the 
 identity mapper.

Sorry for asking on this thread :) Does Definitive Guide 2 cover the new api?

 Cheers,
 --Keith

 On Aug 5, 2011, at 1:15 PM, garpinc wrote:


 I was following this tutorial on version 0.19.1

 http://v-lad.org/Tutorials/Hadoop/23%20-%20create%20the%20project.html

 I however wanted to use the latest version of the API, 0.20.2.

 The original code in the tutorial had the following lines:
 conf.setMapperClass(org.apache.hadoop.mapred.lib.IdentityMapper.class);
 conf.setReducerClass(org.apache.hadoop.mapred.lib.IdentityReducer.class);

 Both Identity classes are deprecated, so it seemed the solution was to create
 mapper and reducer as follows:
 public static class NOOPMapper
       extends Mapper<Text, IntWritable, Text, IntWritable> {


   public void map(Text key, IntWritable value, Context context
                   ) throws IOException, InterruptedException {

       context.write(key, value);

   }
 }

 public static class NOOPReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {
   private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                      Context context
                      ) throws IOException, InterruptedException {
     context.write(key, result);
   }
 }


 And then with code:
               Configuration conf = new Configuration();
                Job job = new Job(conf, "testdriver");

               job.setOutputKeyClass(Text.class);
               job.setOutputValueClass(IntWritable.class);

               job.setInputFormatClass(TextInputFormat.class);
               job.setOutputFormatClass(TextOutputFormat.class);

                FileInputFormat.addInputPath(job, new Path("In"));
                FileOutputFormat.setOutputPath(job, new Path("Out"));

               job.setMapperClass(NOOPMapper.class);
               job.setReducerClass(NOOPReducer.class);

               job.waitForCompletion(true);


 However I get this message
 java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
 cast to org.apache.hadoop.io.Text
       at TestDriver$NOOPMapper.map(TestDriver.java:1)
       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
       at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
 11/08/01 16:41:01 INFO mapred.JobClient:  map 0% reduce 0%
 11/08/01 16:41:01 INFO mapred.JobClient: Job complete: job_local_0001
 11/08/01 16:41:01 INFO mapred.JobClient: Counters: 0



 Can anyone tell me what I need for this to work.

 Attached is full code..
 http://old.nabble.com/file/p32174859/TestDriver.java TestDriver.java
 --
 View this message in context: 
 http://old.nabble.com/maprd-vs-mapreduce-api-tp32174859p32174859.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.





java.io.IOException: config()

2011-08-05 Thread jagaran das
Hi,

I have been stuck with this exception:

java.io.IOException: config()
at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:211)
at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:198)
at org.apache.hadoop.hbase.HBaseConfiguration.create(HBaseConfiguration.java:99)
at test.TestApp.main(TestApp.java:19)


  
05Aug2011 20:08:53,303 DEBUG 
[LeaseChecker@DFSClient[clientName=DFSClient_-1591195062, 
ugi=jagarandas,staff,com.apple.sharepoint.group.1,_developer,_lpoperator,_lpadmin,_appserveradm,admin,_appserverusr,localaccounts,everyone,fmsadmin,com.apple.access_screensharing,com.apple.sharepoint.group.2,com.apple.sharepoint.group.3]:
 java.lang.Throwable: for testing

  
05Aug2011 20:08:53,315 DEBUG [listenerContainer-1] (DFSClient.java:3012) - 
DFSClient writeChunk allocating new packet seqno=0, 
src=/home/hadoop/listenerContainer-1jagaran-dass-macbook-pro.local_222812011-08-05-20-08-52,
 packetSize=65557, chunksPerPacket=127, bytesCurBlock=0

I saw the source code :

 public Configuration(boolean loadDefaults) {
    this.loadDefaults = loadDefaults;
    if (LOG.isDebugEnabled()) {
      LOG.debug(StringUtils.stringifyException(new IOException("config()")));
    }
    synchronized(Configuration.class) {
      REGISTRY.put(this, null);
    }
  }

Log is in debug mode.

Can anyone please help me on this??

Regards,
JD

Re: Hadoop order of operations

2011-08-05 Thread Harsh J
Premal,

Didn't go through your entire thread, but the right order is: map
(N) -> partition (N) -> combine (0…N).

On Sat, Aug 6, 2011 at 4:04 AM, Premal premal.j.s...@gmail.com wrote:

 According to the attached image found on yahoo's hadoop tutorial, the order
 of operations is map -> combine -> partition, which should be followed by
 reduce.

 Here is an example key emitted by the map operation

    LongValueSum:geo_US|1311722400|E        1

 Assuming there are 100 keys of the same type, this should get combined as

    geo_US|1311722400|E     100

 Then I'd like to partition the keys by the value before the first pipe (|), as described at
 http://hadoop.apache.org/common/docs/r0.20.2/streaming.html#A+Useful+Partitioner+Class+%28secondary+sort%2C+the+-partitioner+org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner+option%29

    geo_US

 Here's the streaming command

    hadoop jar
 /usr/local/hadoop/contrib/streaming/hadoop-streaming-0.20.203.0.jar \
    -D mapred.reduce.tasks=8 \
    -D stream.num.map.output.key.fields=1 \
    -D mapred.text.key.partitioner.options=-k1,1 \
    -D stream.map.output.field.separator=\| \
    -file mapper.py \
    -mapper mapper.py \
    -file reducer.py \
    -reducer reducer.py \
    -combiner org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorReducer
 \
    -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner \
    -input input_file \
    -output output_path


 This is the error I get
    java.lang.NumberFormatException: For input string: "1311722400|E    1"
        at
 java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
        at java.lang.Long.parseLong(Long.java:419)
        at java.lang.Long.parseLong(Long.java:468)
        at
 org.apache.hadoop.mapred.lib.aggregate.LongValueSum.addNextValue(LongValueSum.java:48)
        at
 org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorReducer.reduce(ValueAggregatorReducer.java:59)
        at
 org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorReducer.reduce(ValueAggregatorReducer.java:35)
        at 
 org.apache.hadoop.mapred.Task$OldCombinerRunner.combine(Task.java:1349)
        at
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1435)
        at
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1297)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.Child.main(Child.java:253)

 It looks like the partitioner is running before the combiner. Any thoughts?
 --
 View this message in context: 
 http://old.nabble.com/Hadoop-order-of-operations-tp32205781p32205781.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.





-- 
Harsh J


Re: java.io.IOException: config() IMP

2011-08-05 Thread jagaran das
Hi,

I am using CDH3.
I need to stream a huge amount of data from our application to hadoop.
I am opening a connection like
config.set("fs.default.name", hdfsURI);
FileSystem dfs = FileSystem.get(config);
String path = hdfsURI + connectionKey;
Path destPath = new Path(path);
logger.debug("Path -- " + destPath.getName());
outStream = dfs.create(destPath);
and keeping the outStream open for some time, writing continuously through 
it, and then closing it.
But it is throwing 

5Aug2011 21:36:48,550 DEBUG 
[LeaseChecker@DFSClient[clientName=DFSClient_218151655, ugi=jagarandas]: 
java.lang.Throwable: for testing
at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.toString(DFSClient.java:1181)
at org.apache.hadoop.util.Daemon.<init>(Daemon.java:38)
at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.put(DFSClient.java:1094)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:547)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:219)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:584)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:565)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:472)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:464)
at 
com.abc.ireporter.common.persistence.ConnectionManager.createConnection(ConnectionManager.java:66)
at 
com.abc.ireporter.common.persistence.HDPPersistor.writeToHDP(HDPPersistor.java:93)
at 
com.abc.ireporter.datatransformer.translator.HDFSTranslator.persistData(HDFSTranslator.java:41)
at 
com.abc.ireporter.datatransformer.adapter.TranslatorAdapter.processData(TranslatorAdapter.java:61)
at 
com.abc.ireporter.datatransformer.DefaultMessageListener.persistValidatedData(DefaultMessageListener.java:276)
at 
com.abc.ireporter.datatransformer.DefaultMessageListener.onMessage(DefaultMessageListener.java:93)
at 
org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:506)
at 
org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:463)
at 
org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:435)
at 
org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:322)
at 
org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:260)
at 
org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:944)
at 
org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:868)
at java.lang.Thread.run(Thread.java:680)
] (RPC.java:230) - Call: renewLease 4
05Aug2011 21:36:48,550 DEBUG [listenerContainer-1] (DFSClient.java:3274) - 
DFSClient writeChunk allocating new packet seqno=0, 
src=/home/hadoop/listenerContainer-1jagaran-dass-macbook-pro.local_247811312605307819,
 packetSize=65557, chunksPerPacket=127, bytesCurBlock=0
05Aug2011 21:36:48,551 DEBUG [Thread-11] (DFSClient.java:2499) - Allocating new 
block
05Aug2011 21:36:48,552 DEBUG [sendParams-0] (Client.java:761) - IPC Client (47) 
connection to localhost/127.0.0.1:8020 from jagarandas sending #3
05Aug2011 21:36:48,553 DEBUG [IPC Client (47) connection to 
localhost/127.0.0.1:8020 from jagarandas] (Client.java:815) - IPC Client (47) 
connection to localhost/127.0.0.1:8020 from jagarandas got value #3
05Aug2011 21:36:48,556 DEBUG [Thread-11] (RPC.java:230) - Call: addBlock 4
05Aug2011 21:36:48,557 DEBUG [Thread-11] (DFSClient.java:3094) - pipeline = 
127.0.0.1:50010
05Aug2011 21:36:48,557 DEBUG [Thread-11] (DFSClient.java:3102) - Connecting to 
127.0.0.1:50010
05Aug2011 21:36:48,559 DEBUG [Thread-11] (DFSClient.java:3109) - Send buf size 
131072
05Aug2011 21:36:48,635 DEBUG [DataStreamer for file 
/home/hadoop/listenerContainer-1jagaran-dass-macbook-pro.local_247811312605307819
 block blk_-5183404460805094255_1042] (DFSClient.java:2533) - DataStreamer 
block blk_-5183404460805094255_1042 wrote packet seqno:0 size:1522 
offsetInBlock:0 lastPacketInBlock:true
05Aug2011 21:36:48,638 DEBUG [ResponseProcessor for block 
blk_-5183404460805094255_1042] (DFSClient.java:2640) - DFSClient Replies for 
seqno 0 are SUCCESS
05Aug2011 21:36:48,639 DEBUG [DataStreamer for file 
/home/hadoop/listenerContainer-1jagaran-dass-macbook-pro.local_247811312605307819
 block blk_-5183404460805094255_1042] (DFSClient.java:2563) - Closing old block 
blk_-5183404460805094255_1042
05Aug2011 21:36:48,645 DEBUG [sendParams-0] (Client.java:761) - IPC Client (47) 
connection to localhost/127.0.0.1:8020 from jagarandas sending #4
05Aug2011 21:36:48,647 DEBUG [IPC Client (47) connection to 
localhost/127.0.0.1:8020 from jagarandas] (Client.java:815) - IPC Client (47) 
connection to