On Wed, Apr 11, 2012 at 5:14 AM, Tom Wilcox <[email protected]> wrote:
> 1) Removed all references to HADOOP_CLASSPATH in hadoop-env.sh and replaced
> with the following so that any initial HADOOP_CLASSPATH settings have
> precedence:
>
> # Extra Java CLASSPATH elements. Optional.
> export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$ZOOKEEPER_INSTALL/*"
> export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$PIGDIR/*"
>
Above you are including a version that is probably different from HBase's,
and it's being stuck ahead of ours on the classpath, IIRC. Not sure why this
would give you the behavior you are seeing; I'd have thought it'd have made
no difference. Could it be that your hbase is homed at different locations up
in zk and you are picking up an old home because you are picking up an old
config? (It doesn't look so when I look at your pastebins -- you seem to have
the same ensemble in each case w/ the same /zookeeper_data homedir.)
Different zk instances up for each test? I'm a little baffled.

> 2) Ran the job with the following (so that HADOOP_CLASSPATH contained all
> appropriate HBase API jars):
>
> HADOOP_CLASSPATH=`hbase classpath` hadoop jar SampleUploader.jar
> uk.org.cse.ingestion.SampleUploader sample.10.csv tomstable dat no
>
> We are now dealing with the following error:
>
> [sshexec] org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not
> find any valid local directory for
> taskTracker/hadoop1/distcache/-6735763131868259398_188156722_559071878/namenode/tmp/mapred/staging/hadoop1/.staging/job_201204111219_0013/libjars/hbase-0.95-SNAPSHOT.jar
> [sshexec]   at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
> [sshexec]   at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
> [sshexec]   at org.apache.hadoop.filecache.TrackerDistributedCacheManager.getLocalCache(TrackerDistributedCacheManager.java:172)
> [sshexec]   at org.apache.hadoop.filecache.TaskDistributedCacheManager.setupCache(TaskDistributedCacheManager.java:187)
> [sshexec]   at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1212)
> [sshexec]   at java.security.AccessController.doPrivileged(Native Method)
> [sshexec]   at javax.security.auth.Subject.doAs(Subject.java:396)
> [sshexec]   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
> [sshexec]   at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1203)
> [sshexec]   at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1118)
> [sshexec]   at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2430)
> [sshexec]   at java.lang.Thread.run(Thread.java:662)
>

Are these dirs set up out on your cluster? Google the exception -- there are
a couple of possible explanations. You might go review how to package a jar
for mapreduce; it can be a little tricky to get right. Best to ship all of a
job's dependencies inside the job jar and keep your cluster CLASSPATH clean.
See the trick where the hbase mapreduce jobs pull the jars they need off the
CLASSPATH, down in TableMapReduceUtil#addDependencyJars. Perhaps review too
the hbase story on mapreduce and CLASSPATHing:
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath

Good luck lads,
St.Ack
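
P.S. In case a concrete sketch helps, below is roughly the shape of job
setup I mean. It is untested, and the CsvUpload/CsvMapper class names, the
'd' column family, and the row-key scheme are all made up for illustration
-- it is not your SampleUploader, just the addDependencyJars idea:

  import java.io.IOException;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
  import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

  public class CsvUpload {
    // Toy mapper: first CSV field is the row key; the whole line lands in
    // one cell of the (made-up) 'd' family.
    static class CsvMapper
        extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
      @Override
      protected void map(LongWritable offset, Text line, Context ctx)
          throws IOException, InterruptedException {
        byte[] row = Bytes.toBytes(line.toString().split(",", 2)[0]);
        Put put = new Put(row);
        put.add(Bytes.toBytes("d"), Bytes.toBytes("line"),
            Bytes.toBytes(line.toString()));
        ctx.write(new ImmutableBytesWritable(row), put);
      }
    }

    public static Job configureJob(Configuration conf, String table,
        Path input) throws IOException {
      Job job = new Job(conf, "upload_" + table);
      job.setJarByClass(CsvUpload.class);   // your code ships in the job jar
      job.setInputFormatClass(TextInputFormat.class);
      FileInputFormat.setInputPaths(job, input);
      job.setMapperClass(CsvMapper.class);
      // initTableReducerJob calls addDependencyJars under the hood: it takes
      // the hbase, zookeeper, etc. jars it finds on the *client* CLASSPATH
      // and ships them via the distributed cache, so the cluster CLASSPATH
      // stays clean. Null reducer + zero reduces = map straight into HBase.
      TableMapReduceUtil.initTableReducerJob(table, null, job);
      job.setNumReduceTasks(0);
      return job;
    }
  }

You still launch with HADOOP_CLASSPATH=`hbase classpath` so the client can
find the hbase jars; the point is that the tasktrackers then get those jars
out of the distributed cache instead of out of hadoop-env.sh.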

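P.P.S. On the DiskErrorException itself: the usual suspect is
mapred.local.dir -- every directory it names has to exist on each
tasktracker, be writable by the user the tasktracker runs as, and have free
space. A quick check to run on each node (untested; /hadoop/mapred/local and
the conf path below are stand-ins for whatever your mapred-site.xml actually
says):

  grep -A 1 'mapred.local.dir' $HADOOP_HOME/conf/mapred-site.xml
  ls -ld /hadoop/mapred/local    # exists? writable by the mapred user?
  df -h /hadoop/mapred/local     # disk full?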