java.lang.NoClassDefFoundError: org/apache/hadoop/contrib/utils/join/DataJoinMapperBase

2012-05-04 Thread 唐方爽
Hi,

I am trying to run a Hadoop reduce-side join, and I get the following error:

java.lang.NoClassDefFoundError: 
org/apache/hadoop/contrib/utils/join/DataJoinMapperBase
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:791)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at DataJoin.run(DataJoin.java:105)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at DataJoin.main(DataJoin.java:119)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Caused by: java.lang.ClassNotFoundException: 
org.apache.hadoop.contrib.utils.join.DataJoinMapperBase
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
... 23 more

What's the problem?

The command I use: hadoop jar JoinHadoop.jar DataJoin
/group/asciaa/fst/input_test_join /group/asciaa/fst/out_test_join

The source is from *Hadoop in Action*, chapter 5, listing 5.3. I used Eclipse to
export it as a jar.
My Hadoop version is 0.19.2

Thanks!

The source code:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
//import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
//import org.apache.hadoop.mapred.KeyValueTextInputFormat;
//import org.apache.hadoop.mapred.MapReduceBase;
//import org.apache.hadoop.mapred.Mapper;
//import org.apache.hadoop.mapred.OutputCollector;
//import org.apache.hadoop.mapred.Reducer;
//import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import org.apache.hadoop.contrib.utils.join.DataJoinMapperBase;
import org.apache.hadoop.contrib.utils.join.DataJoinReducerBase;
import org.apache.hadoop.contrib.utils.join.TaggedMapOutput;

public class DataJoin extends Configured implements Tool {

    public static class MapClass extends DataJoinMapperBase {

        protected Text generateInputTag(String inputFile) {
            return new Text(inputFile);
        }

        protected Text generateGroupKey(TaggedMapOutput aRecord) {
            String line = ((Text) aRecord.getData()).toString();
            String[] tokens = line.split(",");
            String groupKey = tokens[0];
            return new Text(groupKey);
        }

        protected TaggedMapOutput generateTaggedMapOutput(Object value) {
            TaggedWritable retv = new TaggedWritable((Text) value);
            retv.setTag(this.inputTag);
            return retv;
        }
    }

    public static class Reduce extends DataJoinReducerBase {

        protected TaggedMapOutput combine(Object[] tags, Object[] values) {
            if (tags.length < 2) return null;
            String joinedStr = "";
            for (int i = 0; i < values.length; i++) {
                if (i > 0) joinedStr += ",";
                TaggedWritable tw = (TaggedWritable) values[i];
                String line = ((Text) tw.getData()).toString();
                String[] tokens = line.split(",", 2);
                joinedStr += tokens[1];
            }
            TaggedWritable retv = new TaggedWritable(new

Re: Splitting data input to Distcp

2012-05-04 Thread Pedro Figueiredo
On 3 May 2012, at 23:47, Himanshu Vijay wrote:

 Pedro,
 
 Thanks for the response. Unfortunately I am running it on in-house cluster
 and from there I need to upload to S3.
 

Hi,

Last night I was thinking about this... what happens if you copy

s3://region.elasticmapreduce/libs/s3distcp/1.0.1/s3distcp.jar

to your cluster and run

hadoop jar s3distcp.jar --src hdfs:///path/to/files --dest s3://bucket/path 
--outputCodec lzo (or what have you)

?

Alternatively, you could run the following Pig or Hive jobs (using output 
compression):

--- pig ---
local_data = load '/path/to/files' as ( ... );
store local_data into 's3://bucket/path' using ...;

--- hive ---
create external table foo (
  ...
)
[row format ... | serde]
location '/path/to/files';

create external table s3_foo (
  ...
)
[row format ... | serde]
location 's3://bucket/path';

insert overwrite table s3_foo
select * from foo;

Obviously an equivalent Native or Streaming job is trivial to write, too.
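
A minimal sketch of such a native job (map-only pass-through with gzip-compressed
output; the paths, bucket, and class name are placeholders, and S3 credentials are
assumed to be configured on the cluster):

 import java.io.IOException;

 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.NullWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.io.compress.GzipCodec;
 import org.apache.hadoop.mapred.FileInputFormat;
 import org.apache.hadoop.mapred.FileOutputFormat;
 import org.apache.hadoop.mapred.JobClient;
 import org.apache.hadoop.mapred.JobConf;
 import org.apache.hadoop.mapred.MapReduceBase;
 import org.apache.hadoop.mapred.Mapper;
 import org.apache.hadoop.mapred.OutputCollector;
 import org.apache.hadoop.mapred.Reporter;

 public class HdfsToS3Copy {

   // Pass each input line through unchanged; NullWritable values make
   // TextOutputFormat write just the line, without the byte-offset key.
   public static class PassThrough extends MapReduceBase
       implements Mapper<LongWritable, Text, Text, NullWritable> {
     public void map(LongWritable key, Text value,
         OutputCollector<Text, NullWritable> out, Reporter reporter) throws IOException {
       out.collect(value, NullWritable.get());
     }
   }

   public static void main(String[] args) throws Exception {
     JobConf conf = new JobConf(HdfsToS3Copy.class);
     conf.setJobName("hdfs-to-s3-copy");
     conf.setMapperClass(PassThrough.class);
     conf.setNumReduceTasks(0);                       // map-only copy
     conf.setOutputKeyClass(Text.class);
     conf.setOutputValueClass(NullWritable.class);
     FileInputFormat.setInputPaths(conf, new Path("hdfs:///path/to/files"));
     FileOutputFormat.setOutputPath(conf, new Path("s3://bucket/path"));
     FileOutputFormat.setCompressOutput(conf, true);  // compress the output files
     FileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class);
     JobClient.runJob(conf);
   }
 }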

Cheers,

Pedro Figueiredo
Skype: pfig.89clouds
http://89clouds.com/ - Big Data Consulting

Re: java.lang.NoClassDefFoundError: org/apache/hadoop/contrib/utils/join/DataJoinMapperBase

2012-05-04 Thread JunYong Li
Is there any other error log? Check:
1. that JoinHadoop.jar is correctly submitted to Hadoop
2. that DataJoinMapperBase is really in JoinHadoop.jar
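
(One quick way to check point 2 is to list the jar contents with the standard JDK
jar tool: jar tf JoinHadoop.jar | grep DataJoinMapperBase)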

2012/5/4 唐方爽 fstang...@gmail.com

 Hi,

 I try to run a Hadoop reduce-side join, then I get the following:

 java.lang.NoClassDefFoundError:
 org/apache/hadoop/contrib/utils/join/DataJoinMapperBase
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:791)
at
 java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at DataJoin.run(DataJoin.java:105)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at DataJoin.main(DataJoin.java:119)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at

 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at

 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
 Caused by: java.lang.ClassNotFoundException:
 org.apache.hadoop.contrib.utils.join.DataJoinMapperBase
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
... 23 more

 What's the problem?

 The command I use : hadoop jar JoinHadoop.jar DataJoin
 /group/asciaa/fst/input_test_join /group/asciaa/fst/out_test_join

 The source is from *Hadoop in action*, chapter5, listing 5.3. I use eclipse
 to export it as a jar
 My Hadoop is 0.19.2

 Thanks!

 The source code:


 import java.io.DataInput;
 import java.io.DataOutput;
 import java.io.IOException;
 //import java.util.Iterator;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.conf.Configured;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.io.Writable;
 import org.apache.hadoop.mapred.FileInputFormat;
 import org.apache.hadoop.mapred.FileOutputFormat;
 import org.apache.hadoop.mapred.JobClient;
 import org.apache.hadoop.mapred.JobConf;
 //import org.apache.hadoop.mapred.KeyValueTextInputFormat;
 //import org.apache.hadoop.mapred.MapReduceBase;
 //import org.apache.hadoop.mapred.Mapper;
 //import org.apache.hadoop.mapred.OutputCollector;
 //import org.apache.hadoop.mapred.Reducer;
 //import org.apache.hadoop.mapred.Reporter;
 import org.apache.hadoop.mapred.TextInputFormat;
 import org.apache.hadoop.mapred.TextOutputFormat;
 import org.apache.hadoop.util.Tool;
 import org.apache.hadoop.util.ToolRunner;

 import org.apache.hadoop.contrib.utils.join.DataJoinMapperBase;
 import org.apache.hadoop.contrib.utils.join.DataJoinReducerBase;
 import org.apache.hadoop.contrib.utils.join.TaggedMapOutput;

 public class DataJoin extends Configured implements Tool {

public static class MapClass extends DataJoinMapperBase {

protected Text generateInputTag(String inputFile) {
return new Text(inputFile);
}

protected Text generateGroupKey(TaggedMapOutput aRecord) {
String line = ((Text) aRecord.getData()).toString();
 String[] tokens = line.split(",");
String groupKey = tokens[0];
return new Text(groupKey);
}

protected TaggedMapOutput generateTaggedMapOutput(Object value) {
TaggedWritable retv = new TaggedWritable((Text) value);
retv.setTag(this.inputTag);
return retv;
}
}

public static class Reduce extends DataJoinReducerBase {

protected TaggedMapOutput combine(Object[] tags, Object[] values) {
 if (tags.length < 2) return null;
 String joinedStr = "";
 for (int i = 0; i < values.length; i++) {
 if (i > 0) joinedStr += ",";
 TaggedWritable tw = (TaggedWritable) values[i];
  

Re: java.lang.NoClassDefFoundError: org/apache/hadoop/contrib/utils/join/DataJoinMapperBase

2012-05-04 Thread 唐方爽
DataJoinMapperBase is not in JoinHadoop.jar.
When I added it and the related classes to JoinHadoop.jar, it worked!
(although I got an IOException at the reduce stage... maybe I should check the
code or input files)
Thanks!
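
(For reference, another way to package the dependency is to place the contrib
data-join jar under a lib/ directory inside JoinHadoop.jar; the hadoop jar launcher
unpacks the job jar and adds lib/ to the classpath, so DataJoinMapperBase resolves
without copying individual class files.)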

2012/5/4 JunYong Li lij...@gmail.com

 is any other error log, check
 1. JoinHadoop.jar collectly submit to hadoop
 2. DataJoinMapperBase really in the JoinHadoop.jar

 2012/5/4 唐方爽 fstang...@gmail.com

  Hi,
 
  I try to run a Hadoop reduce-side join, then I get the following:
 
  java.lang.NoClassDefFoundError:
  org/apache/hadoop/contrib/utils/join/DataJoinMapperBase
 at java.lang.ClassLoader.defineClass1(Native Method)
 at java.lang.ClassLoader.defineClass(ClassLoader.java:791)
 at
  java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
 at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
 at DataJoin.run(DataJoin.java:105)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at DataJoin.main(DataJoin.java:119)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 
 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
 
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:601)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
 at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
 at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
  Caused by: java.lang.ClassNotFoundException:
  org.apache.hadoop.contrib.utils.join.DataJoinMapperBase
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
 ... 23 more
 
  What's the problem?
 
  The command I use : hadoop jar JoinHadoop.jar DataJoin
  /group/asciaa/fst/input_test_join /group/asciaa/fst/out_test_join
 
  The source is from *Hadoop in action*, chapter5, listing 5.3. I use
 eclipse
  to export it as a jar
  My Hadoop is 0.19.2
 
  Thanks!
 
  The source code:
 
 
  import java.io.DataInput;
  import java.io.DataOutput;
  import java.io.IOException;
  //import java.util.Iterator;
 
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.conf.Configured;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.io.Writable;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;
  //import org.apache.hadoop.mapred.KeyValueTextInputFormat;
  //import org.apache.hadoop.mapred.MapReduceBase;
  //import org.apache.hadoop.mapred.Mapper;
  //import org.apache.hadoop.mapred.OutputCollector;
  //import org.apache.hadoop.mapred.Reducer;
  //import org.apache.hadoop.mapred.Reporter;
  import org.apache.hadoop.mapred.TextInputFormat;
  import org.apache.hadoop.mapred.TextOutputFormat;
  import org.apache.hadoop.util.Tool;
  import org.apache.hadoop.util.ToolRunner;
 
  import org.apache.hadoop.contrib.utils.join.DataJoinMapperBase;
  import org.apache.hadoop.contrib.utils.join.DataJoinReducerBase;
  import org.apache.hadoop.contrib.utils.join.TaggedMapOutput;
 
  public class DataJoin extends Configured implements Tool {
 
 public static class MapClass extends DataJoinMapperBase {
 
 protected Text generateInputTag(String inputFile) {
 return new Text(inputFile);
 }
 
 protected Text generateGroupKey(TaggedMapOutput aRecord) {
 String line = ((Text) aRecord.getData()).toString();
  String[] tokens = line.split(",");
 String groupKey = tokens[0];
 return new Text(groupKey);
 }
 
 protected TaggedMapOutput generateTaggedMapOutput(Object value) {
 TaggedWritable retv = new TaggedWritable((Text) value);
 retv.setTag(this.inputTag);
 return retv;
 }
 }
 
 public 

How to create an archive-file in Java to distribute a MapFile via Distributed Cache

2012-05-04 Thread i...@christianherta.de
Hello,
I have written a chain of map-reduce jobs which creates a MapFile. I want
to use the MapFile in a subsequent map-reduce job via the distributed cache.
Therefore I have to create an archive file of the folder which holds the
/data and /index files.

In the documentation and in the book Hadoop: The Definitive Guide there are
only examples of how this is done on the command line. Is this possible in
HDFS via the Hadoop Java API, too?

P.S.: Distributing the files separately is not a solution. They would go
into different temporary folders.

Thanks in advance
Christian

Re: How to create an archive-file in Java to distribute a MapFile via Distributed Cache

2012-05-04 Thread Harsh J
Hi,

The Java API offers a DistributedCache class which lets you do this.
The usage is detailed at
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html
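
A rough sketch of how that could look for the MapFile case (old mapred API; the
archive path, field, and reader wiring are illustrative assumptions, not from the
thread):

 // Driver side: register an archive (e.g. a zip of the MapFile directory)
 // that already exists in HDFS. The path is a made-up example.
 JobConf conf = new JobConf(MyJob.class);
 DistributedCache.addCacheArchive(new Path("/user/christian/mymapfile.zip").toUri(), conf);

 // Mapper side: locate the unpacked copy in configure()/setup().
 private MapFile.Reader reader;

 public void configure(JobConf job) {
   try {
     Path[] archives = DistributedCache.getLocalCacheArchives(job);
     // archives[0] is the local directory the archive was unpacked into;
     // the MapFile's data and index files sit underneath it.
     reader = new MapFile.Reader(FileSystem.getLocal(job), archives[0].toString(), job);
   } catch (IOException e) {
     throw new RuntimeException(e);
   }
 }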

On Fri, May 4, 2012 at 5:11 PM, i...@christianherta.de
i...@christianherta.de wrote:
 Hello,
 I have written a chain of map-reduce jobs which creates a Mapfile. I want
 to use the Mapfile in a proximate map-reduce job via distributed cache.
 Therefore I have to create an archive file of the folder with holds the
 /data and /index files.

 In the documentation and in the Book Hadoop the definite guide there are
 only examples how this is done on the command line. Is this possible in
 HDFS via the Hadoop Java Api, too?

 P.S.: To distribute the files separately is not a solution. They would go
 in different temporary folders.

 Thanks in advance
 Christian



-- 
Harsh J


Re: How to create an archive-file in Java to distribute a MapFile via Distributed Cache

2012-05-04 Thread Shi Yu

My humble experience: I would prefer specifying the files on the
command line using the -files option, then handling them explicitly in
the Mapper's configure or setup function using

File f1 = new File("file1name");
File f2 = new File("file2name");

because I am not 100% sure how the distributed cache determines
the order of the paths (archives) stored in the array. I once
messed up at this point, so from then on I have stuck with the old
method.


Bad connect ack with firstBadLink

2012-05-04 Thread madhu phatak
Hi,
We are running a three-node cluster. For the last two days, whenever we copy a file
to HDFS, it throws java.io.IOException: Bad connect ack with
firstBadLink. I searched the net but was not able to resolve the issue. The
following is the stack trace from the datanode log:

2012-05-04 18:08:08,868 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
blk_-7520371350112346377_50118 received exception java.net.SocketException:
Connection reset
2012-05-04 18:08:08,869 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
172.23.208.17:50010,
storageID=DS-1340171424-172.23.208.17-50010-1334672673051, infoPort=50075,
ipcPort=50020):DataXceiver
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:168)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at java.io.DataInputStream.read(DataInputStream.java:132)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:262)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:309)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:373)
at
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:525)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
at
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
at java.lang.Thread.run(Thread.java:662)


It will be great if some one can point to the direction how to solve this
problem.

-- 
https://github.com/zinnia-phatak-dev/Nectar


Re: Bad connect ack with firstBadLink

2012-05-04 Thread Mohit Anchlia
Please see:

http://hbase.apache.org/book.html#dfs.datanode.max.xcievers
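
(dfs.datanode.max.xcievers caps how many concurrent block readers/writers each
datanode will serve; it is set in hdfs-site.xml on the datanodes and needs a
datanode restart to take effect. 4096 is a commonly used value.)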

On Fri, May 4, 2012 at 5:46 AM, madhu phatak phatak@gmail.com wrote:

 Hi,
 We are running a three node cluster . From two days whenever we copy file
 to hdfs , it is throwing  java.IO.Exception Bad connect ack with
 firstBadLink . I searched in net, but not able to resolve the issue. The
 following is the stack trace from datanode log

 2012-05-04 18:08:08,868 INFO
 org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
 blk_-7520371350112346377_50118 received exception java.net.SocketException:
 Connection reset
 2012-05-04 18:08:08,869 ERROR
 org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
 172.23.208.17:50010,
 storageID=DS-1340171424-172.23.208.17-50010-1334672673051, infoPort=50075,
 ipcPort=50020):DataXceiver
 java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:168)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at java.io.DataInputStream.read(DataInputStream.java:132)
at

 org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:262)
at

 org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:309)
at

 org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:373)
at

 org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:525)
at

 org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
at

 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
at java.lang.Thread.run(Thread.java:662)


 It will be great if some one can point to the direction how to solve this
 problem.

 --
 https://github.com/zinnia-phatak-dev/Nectar



Re: Reduce Hangs at 66%

2012-05-04 Thread Michael Segel
Well 
That was one of the things I had asked. 
ulimit -a says it all. 

But you have to do this for the users... hdfs, mapred, and hadoop

(Which is why I asked about which flavor.)

On May 3, 2012, at 7:03 PM, Raj Vishwanathan wrote:

 Keith
 
 What is the output for ulimit -n? Your value for number of open files is 
 probably too low.
 
 Raj
 
 
 
 
 
 From: Keith Thompson kthom...@binghamton.edu
 To: common-user@hadoop.apache.org 
 Sent: Thursday, May 3, 2012 4:33 PM
 Subject: Re: Reduce Hangs at 66%
 
 I am not sure about ulimits, but I can answer the rest. It's a Cloudera
 distribution of Hadoop 0.20.2. The HDFS has 9 TB free. In the reduce step,
 I am taking keys in the form of (gridID, date), each with a value of 1. The
 reduce step just sums the 1's as the final output value for the key (It's
 counting how many people were in the gridID on a certain day).
 
 I have been running other more complicated jobs with no problem, so I'm not
 sure why this one is being peculiar. This is the code I used to execute the
 program from the command line (the source is a file on the hdfs):
 
 hadoop jar jarfile driver source /thompson/outputDensity/density1
 
 The program then executes the map and gets to 66% of the reduce, then stops
 responding. The cause of the error seems to be:
 
 Error from attempt_201202240659_6432_r_00_1: java.io.IOException:
 The temporary job-output directory
 hdfs://analytix1:9000/thompson/outputDensity/density1/_temporary
 doesn't exist!
 
 I don't understand what the _temporary is. I am assuming it's something
 Hadoop creates automatically.
 
 
 
 On Thu, May 3, 2012 at 5:02 AM, Michel Segel 
 michael_se...@hotmail.comwrote:
 
 Well...
 Lots of information but still missing some of the basics...
 
 Which release and version?
 What are your ulimits set to?
 How much free disk space do you have?
 What are you attempting to do?
 
 Stuff like that.
 
 
 
 Sent from a remote device. Please excuse any typos...
 
 Mike Segel
 
 On May 2, 2012, at 4:49 PM, Keith Thompson kthom...@binghamton.edu
 wrote:
 
 I am running a task which gets to 66% of the Reduce step and then hangs
 indefinitely. Here is the log file (I apologize if I am putting too much
 here but I am not exactly sure what is relevant):
 
 2012-05-02 16:42:52,975 INFO org.apache.hadoop.mapred.JobTracker:
 Adding task (REDUCE) 'attempt_201202240659_6433_r_00_0' to tip
 task_201202240659_6433_r_00, for tracker
 'tracker_analytix7:localhost.localdomain/127.0.0.1:56515'
 2012-05-02 16:42:53,584 INFO org.apache.hadoop.mapred.JobInProgress:
 Task 'attempt_201202240659_6433_m_01_0' has completed
 task_201202240659_6433_m_01 successfully.
 2012-05-02 17:00:47,546 INFO org.apache.hadoop.mapred.TaskInProgress:
 Error from attempt_201202240659_6432_r_00_0: Task
 attempt_201202240659_6432_r_00_0 failed to report status for 1800
 seconds. Killing!
 2012-05-02 17:00:47,546 INFO org.apache.hadoop.mapred.JobTracker:
 Removing task 'attempt_201202240659_6432_r_00_0'
 2012-05-02 17:00:47,546 INFO org.apache.hadoop.mapred.JobTracker:
 Adding task (TASK_CLEANUP) 'attempt_201202240659_6432_r_00_0' to
 tip task_201202240659_6432_r_00, for tracker
 'tracker_analytix4:localhost.localdomain/127.0.0.1:44204'
 2012-05-02 17:00:48,763 INFO org.apache.hadoop.mapred.JobTracker:
 Removing task 'attempt_201202240659_6432_r_00_0'
 2012-05-02 17:00:48,957 INFO org.apache.hadoop.mapred.JobTracker:
 Adding task (REDUCE) 'attempt_201202240659_6432_r_00_1' to tip
 task_201202240659_6432_r_00, for tracker
 'tracker_analytix5:localhost.localdomain/127.0.0.1:59117'
 2012-05-02 17:00:56,559 INFO org.apache.hadoop.mapred.TaskInProgress:
 Error from attempt_201202240659_6432_r_00_1: java.io.IOException:
 The temporary job-output directory
 hdfs://analytix1:9000/thompson/outputDensity/density1/_temporary
 doesn't exist!
 at
 org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
 at
 org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:240)
 at
 org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
 at
 org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:438)
 at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
 at org.apache.hadoop.mapred.Child.main(Child.java:262)
 
 2012-05-02 17:00:59,903 INFO org.apache.hadoop.mapred.JobTracker:
 Removing task 'attempt_201202240659_6432_r_00_1'
 2012-05-02 17:00:59,906 INFO org.apache.hadoop.mapred.JobTracker:
 Adding task (REDUCE) 'attempt_201202240659_6432_r_00_2' to tip
 task_201202240659_6432_r_00, for tracker
 

Re: Bad connect ack with firstBadLink

2012-05-04 Thread Mapred Learn
Check your number of blocks in the cluster.

This can also indicate that your datanodes are fuller than they should be.

Try deleting unnecessary files to free up blocks.
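
(The standard hadoop fsck / and hadoop dfsadmin -report commands show the total
block count and per-datanode usage, which helps confirm whether the nodes really
are running short.)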

On Fri, May 4, 2012 at 7:40 AM, Mohit Anchlia mohitanch...@gmail.comwrote:

 Please see:

 http://hbase.apache.org/book.html#dfs.datanode.max.xcievers

 On Fri, May 4, 2012 at 5:46 AM, madhu phatak phatak@gmail.com wrote:

  Hi,
  We are running a three node cluster . From two days whenever we copy file
  to hdfs , it is throwing  java.IO.Exception Bad connect ack with
  firstBadLink . I searched in net, but not able to resolve the issue. The
  following is the stack trace from datanode log
 
  2012-05-04 18:08:08,868 INFO
  org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
  blk_-7520371350112346377_50118 received exception
 java.net.SocketException:
  Connection reset
  2012-05-04 18:08:08,869 ERROR
  org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
  172.23.208.17:50010,
  storageID=DS-1340171424-172.23.208.17-50010-1334672673051,
 infoPort=50075,
  ipcPort=50020):DataXceiver
  java.net.SocketException: Connection reset
 at java.net.SocketInputStream.read(SocketInputStream.java:168)
 at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
 at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
 at java.io.DataInputStream.read(DataInputStream.java:132)
 at
 
 
 org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:262)
 at
 
 
 org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:309)
 at
 
 
 org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:373)
 at
 
 
 org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:525)
 at
 
 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357)
 at
 
 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
 at java.lang.Thread.run(Thread.java:662)
 
 
  It will be great if some one can point to the direction how to solve this
  problem.
 
  --
  https://github.com/zinnia-phatak-dev/Nectar
 



Re: Reduce Hangs at 66%

2012-05-04 Thread Keith Thompson
Thanks everyone for your help. It is running fine now.


On Fri, May 4, 2012 at 11:22 AM, Michael Segel michael_se...@hotmail.comwrote:

 Well
 That was one of the things I had asked.
 ulimit -a says it all.

 But you have to do this for the users... hdfs, mapred, and hadoop

 (Which is why I asked about which flavor.)

 On May 3, 2012, at 7:03 PM, Raj Vishwanathan wrote:

  Keith
 
  What is the output for ulimit -n? Your value for number of open
 files is probably too low.
 
  Raj
 
 
 
 
  
  From: Keith Thompson kthom...@binghamton.edu
  To: common-user@hadoop.apache.org
  Sent: Thursday, May 3, 2012 4:33 PM
  Subject: Re: Reduce Hangs at 66%
 
  I am not sure about ulimits, but I can answer the rest. It's a Cloudera
  distribution of Hadoop 0.20.2. The HDFS has 9 TB free. In the reduce
 step,
  I am taking keys in the form of (gridID, date), each with a value of 1.
 The
  reduce step just sums the 1's as the final output value for the key
 (It's
  counting how many people were in the gridID on a certain day).
 
  I have been running other more complicated jobs with no problem, so I'm
 not
  sure why this one is being peculiar. This is the code I used to execute
 the
  program from the command line (the source is a file on the hdfs):
 
  hadoop jar jarfile driver source /thompson/outputDensity/density1
 
  The program then executes the map and gets to 66% of the reduce, then
 stops
  responding. The cause of the error seems to be:
 
  Error from attempt_201202240659_6432_r_00_1: java.io.IOException:
  The temporary job-output directory
  hdfs://analytix1:9000/thompson/outputDensity/density1/_temporary
  doesn't exist!
 
  I don't understand what the _temporary is. I am assuming it's something
  Hadoop creates automatically.
 
 
 
  On Thu, May 3, 2012 at 5:02 AM, Michel Segel michael_se...@hotmail.com
 wrote:
 
  Well...
  Lots of information but still missing some of the basics...
 
  Which release and version?
  What are your ulimits set to?
  How much free disk space do you have?
  What are you attempting to do?
 
  Stuff like that.
 
 
 
  Sent from a remote device. Please excuse any typos...
 
  Mike Segel
 
  On May 2, 2012, at 4:49 PM, Keith Thompson kthom...@binghamton.edu
  wrote:
 
  I am running a task which gets to 66% of the Reduce step and then
 hangs
  indefinitely. Here is the log file (I apologize if I am putting too
 much
  here but I am not exactly sure what is relevant):
 
  2012-05-02 16:42:52,975 INFO org.apache.hadoop.mapred.JobTracker:
  Adding task (REDUCE) 'attempt_201202240659_6433_r_00_0' to tip
  task_201202240659_6433_r_00, for tracker
  'tracker_analytix7:localhost.localdomain/127.0.0.1:56515'
  2012-05-02 16:42:53,584 INFO org.apache.hadoop.mapred.JobInProgress:
  Task 'attempt_201202240659_6433_m_01_0' has completed
  task_201202240659_6433_m_01 successfully.
  2012-05-02 17:00:47,546 INFO org.apache.hadoop.mapred.TaskInProgress:
  Error from attempt_201202240659_6432_r_00_0: Task
  attempt_201202240659_6432_r_00_0 failed to report status for 1800
  seconds. Killing!
  2012-05-02 17:00:47,546 INFO org.apache.hadoop.mapred.JobTracker:
  Removing task 'attempt_201202240659_6432_r_00_0'
  2012-05-02 17:00:47,546 INFO org.apache.hadoop.mapred.JobTracker:
  Adding task (TASK_CLEANUP) 'attempt_201202240659_6432_r_00_0' to
  tip task_201202240659_6432_r_00, for tracker
  'tracker_analytix4:localhost.localdomain/127.0.0.1:44204'
  2012-05-02 17:00:48,763 INFO org.apache.hadoop.mapred.JobTracker:
  Removing task 'attempt_201202240659_6432_r_00_0'
  2012-05-02 17:00:48,957 INFO org.apache.hadoop.mapred.JobTracker:
  Adding task (REDUCE) 'attempt_201202240659_6432_r_00_1' to tip
  task_201202240659_6432_r_00, for tracker
  'tracker_analytix5:localhost.localdomain/127.0.0.1:59117'
  2012-05-02 17:00:56,559 INFO org.apache.hadoop.mapred.TaskInProgress:
  Error from attempt_201202240659_6432_r_00_1: java.io.IOException:
  The temporary job-output directory
  hdfs://analytix1:9000/thompson/outputDensity/density1/_temporary
  doesn't exist!
  at
 
 org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
  at
 
 org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:240)
  at
 
 org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
  at
  org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:438)
  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
  at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:396)
  at
 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
  at org.apache.hadoop.mapred.Child.main(Child.java:262)
 
  2012-05-02 17:00:59,903 INFO org.apache.hadoop.mapred.JobTracker:
  

Fwd: May 21 talk at Pasadena JUG

2012-05-04 Thread Mattmann, Chris A (388J)
(apologies for cross posting)

Hey Folks in the SoCal area -- if you're around on May 21st, I'll be speaking 
at the Pasadena JUG on Apache OODT,
Big Data and likely Apache Hadoop (in prep for my upcoming Hadoop Summit talk).

Info is below thanks to David Noble for setting this up!

Cheers,
Chris

Begin forwarded message:

The announcement is up on the Meetup site and the Pasadena JUG website, and has 
been sent to mailing lists for the Pasadena JUG, LA JUG, and OC JUG.

If you invite people, please do encourage them to RSVP on the Meetup site. It's 
useful to make sure we have enough food, but also to make sure we set up the 
right room. Last month's talk on Mule & MongoDB had 55 people RSVP (and 
probably more attend) and we had to bump up to a larger room than usual. 
Fortunately Idealab is equipped for that size group :-)

http://www.meetup.com/pasadenajug/
http://www.pasadenajug.org/

I'll follow up with the Apache lists in the next day or so, unless you beat me 
to it.



++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.govmailto:chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



Need to improve documentation for v 0.23.x ( v 2.x)

2012-05-04 Thread Jagat
Hello All,

As the Apache Hadoop community gets ready to release the next 2.0 alpha version
of Hadoop, I would like to draw attention to the need for better
documentation of its tutorials and examples.

Just one short example

See the Single Node Setup tutorials for v 1.x
(http://hadoop.apache.org/common/docs/r1.0.2/single_node_setup.html) and
v 0.23
(http://hadoop.apache.org/common/docs/r0.23.1/hadoop-yarn/hadoop-yarn-site/SingleCluster.html):
the 0.23 author seems to be in a hurry, assuming the reader already knows
what to do and where to do it.

We should spend some time on documentation. With so many great
features coming, it would be great if you could plan some special hackathon
meetings to improve the documentation and code examples, so that people can
understand how to use them effectively.

At present only two parties can understand 0.23: the people who wrote the code,
and the Java compiler that verifies it :)

*Tom White*, if you are reading this message, I request that you pick up
your pen again and write a 4th edition of Hadoop: The Definitive Guide dedicated
to the next release, for the greater benefit of the community.

Thanks


Re: How to add debugging to map- red code

2012-05-04 Thread Mapred Learn
Hi Harsh,
Could you show one sample of how to do this?

I have not seen/written any mapper code where people use a log4j logger or
a log4j properties file to set the log level.

Thanks in advance
-JJ

On Thu, May 3, 2012 at 4:32 PM, Harsh J ha...@cloudera.com wrote:

 Doing (ii) would be an isolated app-level config and wouldn't get
 affected by the toggling of
 (i). The feature from (i) is available already in CDH 4.0.0-b2 btw.

 On Fri, May 4, 2012 at 4:58 AM, Mapred Learn mapred.le...@gmail.com
 wrote:
  Hi Harsh,
 
  Does doing (ii) mess up with hadoop (i) level ?
 
  Or does it happen in both the options anyways ?
 
 
  Thanks,
  -JJ
 
  On Fri, Apr 20, 2012 at 8:28 AM, Harsh J ha...@cloudera.com wrote:
 
  Yes this is possible, and there's two ways to do this.
 
  1. Use a distro/release that carries the
  https://issues.apache.org/jira/browse/MAPREDUCE-336 fix. This will let
  you avoid work (see 2, which is same as your idea)
 
  2. Configure your implementation's logger object's level in the
  setup/setConf methods of the task, by looking at some conf prop to
  decide the level. This will work just as well - and will also avoid
  changing Hadoop's own Child log levels, unlike the (1) method.
 
  On Fri, Apr 20, 2012 at 8:47 PM, Mapred Learn mapred.le...@gmail.com
  wrote:
   Hi,
   I m trying to find out best way to add debugging in map- red code.
   I have System.out.println() statements that I keep on commenting and
  uncommenting so as not to increase stdout size
  
   But problem is anytime I need debug, I Hv to re-compile.
  
   If there a way, I can define log levels using log4j in map-red code
 and
  define log level as conf option ?
  
   Thanks,
   JJ
  
   Sent from my iPhone
 
 
 
  --
  Harsh J
 



 --
 Harsh J



Re: How to add debugging to map- red code

2012-05-04 Thread Nitin Pawar
Here is a sample from the log4j documentation. If you want to write the log to
a specific file, you can have a log4j properties file and add it to the classpath.

 import com.foo.Bar;

 // Import log4j classes.
 import org.apache.log4j.Logger;
 import org.apache.log4j.BasicConfigurator;

 public class MyApp {

   // Define a static logger variable so that it references the
   // Logger instance named "MyApp".
   static Logger logger = Logger.getLogger(MyApp.class);

   public static void main(String[] args) {

     // Set up a simple configuration that logs on the console.
     BasicConfigurator.configure();

     logger.info("Entering application.");
     Bar bar = new Bar();
     bar.doIt();
     logger.info("Exiting application.");
   }
 }
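
For the mapper-side case being asked about, a minimal sketch along the lines of
Harsh's option 2 (old mapred API; the conf property name myjob.log.level is just
an example I made up, not a standard Hadoop setting) could be:

 import java.io.IOException;

 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapred.JobConf;
 import org.apache.hadoop.mapred.MapReduceBase;
 import org.apache.hadoop.mapred.Mapper;
 import org.apache.hadoop.mapred.OutputCollector;
 import org.apache.hadoop.mapred.Reporter;
 import org.apache.log4j.Level;
 import org.apache.log4j.Logger;

 public class LoggingMapper extends MapReduceBase
     implements Mapper<LongWritable, Text, Text, Text> {

   private static final Logger LOG = Logger.getLogger(LoggingMapper.class);

   // Read the desired level from the job configuration, e.g. pass
   // -Dmyjob.log.level=DEBUG on the command line via ToolRunner.
   public void configure(JobConf conf) {
     LOG.setLevel(Level.toLevel(conf.get("myjob.log.level", "INFO")));
   }

   public void map(LongWritable key, Text value,
       OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
     // Only emitted when the level is DEBUG; no recompile needed to toggle it.
     LOG.debug("processing record: " + value);
     output.collect(new Text("line"), value);
   }
 }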


On Sat, May 5, 2012 at 3:40 AM, Mapred Learn mapred.le...@gmail.com wrote:

 Hi Harsh,
 Could you show one sample of how to do this ?

 I have not seen/written  any mapper code where people use log4j logger or
 log4j file to set the log level.

 Thanks in advance
 -JJ

 On Thu, May 3, 2012 at 4:32 PM, Harsh J ha...@cloudera.com wrote:

  Doing (ii) would be an isolated app-level config and wouldn't get
  affected by the toggling of
  (i). The feature from (i) is available already in CDH 4.0.0-b2 btw.
 
  On Fri, May 4, 2012 at 4:58 AM, Mapred Learn mapred.le...@gmail.com
  wrote:
   Hi Harsh,
  
   Does doing (ii) mess up with hadoop (i) level ?
  
   Or does it happen in both the options anyways ?
  
  
   Thanks,
   -JJ
  
   On Fri, Apr 20, 2012 at 8:28 AM, Harsh J ha...@cloudera.com wrote:
  
   Yes this is possible, and there's two ways to do this.
  
   1. Use a distro/release that carries the
   https://issues.apache.org/jira/browse/MAPREDUCE-336 fix. This will
 let
   you avoid work (see 2, which is same as your idea)
  
   2. Configure your implementation's logger object's level in the
   setup/setConf methods of the task, by looking at some conf prop to
   decide the level. This will work just as well - and will also avoid
   changing Hadoop's own Child log levels, unlike the (1) method.
  
   On Fri, Apr 20, 2012 at 8:47 PM, Mapred Learn mapred.le...@gmail.com
 
   wrote:
Hi,
I m trying to find out best way to add debugging in map- red code.
I have System.out.println() statements that I keep on commenting and
   uncommenting so as not to increase stdout size
   
But problem is anytime I need debug, I Hv to re-compile.
   
If there a way, I can define log levels using log4j in map-red code
  and
   define log level as conf option ?
   
Thanks,
JJ
   
Sent from my iPhone
  
  
  
   --
   Harsh J
  
 
 
 
  --
  Harsh J
 




-- 
Nitin Pawar


Re: How to add debugging to map- red code

2012-05-04 Thread Mapred Learn
Thanks Nitin, but I was asking in the context of mapper code.

Sent from my iPhone

On May 4, 2012, at 9:06 PM, Nitin Pawar nitinpawar...@gmail.com wrote:

 here is a sample code from log4j documentation
 if you want to specify a specific file where you want to write the log ..
 you can have a log4j properties file and add it to the classpath
 
  import com.foo.Bar;

  // Import log4j classes.
  import org.apache.log4j.Logger;
  import org.apache.log4j.BasicConfigurator;

  public class MyApp {

    // Define a static logger variable so that it references the
    // Logger instance named "MyApp".
    static Logger logger = Logger.getLogger(MyApp.class);

    public static void main(String[] args) {

      // Set up a simple configuration that logs on the console.
      BasicConfigurator.configure();

      logger.info("Entering application.");
      Bar bar = new Bar();
      bar.doIt();
      logger.info("Exiting application.");
    }
  }
 
 
 On Sat, May 5, 2012 at 3:40 AM, Mapred Learn mapred.le...@gmail.com wrote:
 
 Hi Harsh,
 Could you show one sample of how to do this ?
 
 I have not seen/written  any mapper code where people use log4j logger or
 log4j file to set the log level.
 
 Thanks in advance
 -JJ
 
 On Thu, May 3, 2012 at 4:32 PM, Harsh J ha...@cloudera.com wrote:
 
 Doing (ii) would be an isolated app-level config and wouldn't get
 affected by the toggling of
 (i). The feature from (i) is available already in CDH 4.0.0-b2 btw.
 
 On Fri, May 4, 2012 at 4:58 AM, Mapred Learn mapred.le...@gmail.com
 wrote:
 Hi Harsh,
 
 Does doing (ii) mess up with hadoop (i) level ?
 
 Or does it happen in both the options anyways ?
 
 
 Thanks,
 -JJ
 
 On Fri, Apr 20, 2012 at 8:28 AM, Harsh J ha...@cloudera.com wrote:
 
 Yes this is possible, and there's two ways to do this.
 
 1. Use a distro/release that carries the
 https://issues.apache.org/jira/browse/MAPREDUCE-336 fix. This will
 let
 you avoid work (see 2, which is same as your idea)
 
 2. Configure your implementation's logger object's level in the
 setup/setConf methods of the task, by looking at some conf prop to
 decide the level. This will work just as well - and will also avoid
 changing Hadoop's own Child log levels, unlike the (1) method.
 
 On Fri, Apr 20, 2012 at 8:47 PM, Mapred Learn mapred.le...@gmail.com
 
 wrote:
 Hi,
 I m trying to find out best way to add debugging in map- red code.
 I have System.out.println() statements that I keep on commenting and
 uncommenting so as not to increase stdout size
 
 But problem is anytime I need debug, I Hv to re-compile.
 
 If there a way, I can define log levels using log4j in map-red code
 and
 define log level as conf option ?
 
 Thanks,
 JJ
 
 Sent from my iPhone
 
 
 
 --
 Harsh J
 
 
 
 
 --
 Harsh J
 
 
 
 
 
 -- 
 Nitin Pawar