Re: Pastebin page - hadoop-hbase communications failure - the hbase*.jar classes apparently not being found by Hadoop

Ryan Rawson Mon, 20 Sep 2010 14:11:20 -0700

Ok that looks good.  Sometimes when you successively build and chain
classpaths you can accidentally overwrite the previous ones.  But we are
looking fine here.
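
For anyone who wants to double-check the end result after several rounds of classpath chaining, here is a minimal Java sketch. It assumes the final value is passed as the first argument or is available in the HADOOP_CLASSPATH environment variable, and it only reports entries that do not exist on the local disk. The names ClasspathCheck and missingEntries are made up for illustration and are not part of Hadoop or HBase. Wildcard entries such as lib/* are not expanded, so they would show up as missing.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Hadoop or HBase.
class ClasspathCheck {

    // Returns the classpath entries that do not exist on the local disk.
    static List<String> missingEntries(String classpath) {
        List<String> missing = new ArrayList<>();
        for (String entry : classpath.split(File.pathSeparator)) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty entries left by a leading or doubled separator
            }
            if (!new File(trimmed).exists()) {
                missing.add(trimmed);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Use the first argument if given, otherwise the HADOOP_CLASSPATH environment variable.
        String cp = args.length > 0 ? args[0] : System.getenv("HADOOP_CLASSPATH");
        if (cp == null) {
            System.out.println("HADOOP_CLASSPATH is not set");
            return;
        }
        for (String entry : missingEntries(cp)) {
            System.out.println("missing: " + entry);
        }
    }
}
```

Running this on each node is one way to see whether the paths referenced in hadoop-env.sh are present on all the machines.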


What version of java is hadoop running under?  We are compiling our
HBase jars using java6, so that is another source of potential
incompatibilities...
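
To compare the Java level a jar was compiled for with the JVM that Hadoop runs under, one option is to read the class-file header directly. The sketch below is an illustration only (ClassVersion and majorVersion are hypothetical names): it checks the 0xCAFEBABE magic number and returns the major version, where 49 corresponds to Java 5 and 50 to Java 6. The jar path and the entry name are assumed to be supplied on the command line.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.jar.JarFile;
import java.util.zip.ZipEntry;

// Hypothetical helper for illustration; not part of Hadoop or HBase.
class ClassVersion {

    // Returns the major version stored in the first 8 bytes of a .class file.
    static int majorVersion(byte[] header) {
        if (header.length < 8) {
            throw new IllegalArgumentException("need at least 8 header bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(header); // big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        buf.getShort();                 // minor version, ignored here
        return buf.getShort() & 0xFFFF; // major version: 49 = Java 5, 50 = Java 6
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: ClassVersion <jar> <entry, e.g. org/example/Foo.class>");
            return;
        }
        try (JarFile jar = new JarFile(args[0])) {
            ZipEntry entry = jar.getEntry(args[1]);
            if (entry == null) {
                System.out.println("entry not found: " + args[1]);
                return;
            }
            byte[] header = new byte[8];
            try (InputStream in = jar.getInputStream(entry)) {
                new DataInputStream(in).readFully(header);
            }
            System.out.println("class file major version: " + majorVersion(header));
        }
        System.out.println("running JVM: " + System.getProperty("java.version"));
    }
}
```

The same lookup also confirms that the entry is actually inside the jar, since a missing entry is reported before any version check.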

Do you have any custom changes to any of the bin/* scripts in hadoop?

What else can you tell us about your environment?


On Mon, Sep 20, 2010 at 2:00 PM, Taylor, Ronald C <[email protected]> wrote:
>
>
> Found it -
> http://pastebin.com/SfFYSLJy
>
>
> -----Original Message-----
> From: Ryan Rawson [mailto:[email protected]]
> Sent: Monday, September 20, 2010 1:50 PM
> To: Taylor, Ronald C
> Cc: [email protected]; [email protected]; [email protected]; 
> Ronald Taylor; Witteveen, Tim
> Subject: Re: Guava*.jar use - hadoop-hbase communications failure - the 
> hbase*.jar classes apparently not being found by Hadoop
>
> Hey,
>
> yes, the symlink is a pretty good way to be able to inplace upgrade easily.  
> But still, normally those other jars are in another subdir so their full path 
> should be:
> /home/hbase/hbase/lib/log4j-1.2.16.jar
>
> the hbase scripts rely on those paths to build the classpath, so don't 
> rearrange the dir layout too much.
>
> As for the pastebin, you will need to send us your direct link. So many 
> people post, and there isn't really a good search system, so it's generally 
> preferred to send the direct link to your pastebin.  If you ever interact 
> with us on IRC this is also how we get big dumps done as well.
>
> Thanks!
> -ryan
>
> On Mon, Sep 20, 2010 at 1:38 PM, Taylor, Ronald C <[email protected]> 
> wrote:
>> Ryan,
>>
>> The hbase*.jar is in the root hbase directory (at /home/hbase/hbase). Now, 
>> that is a symbolic link on all the nodes (as you can see below), but that 
>> should not matter, right?
>>
>> [rtay...@h01 hbase]$ pwd
>> /home/hbase
>> [rtay...@h01 hbase]$ ls -l
>> lrwxrwxrwx  1 root  hadoop    19 Aug 26 08:47 hbase -> hbase-0.89.20100726
>> drwxr-xr-x  9 hbase hadoop  4096 Sep 18 22:54 hbase-0.89.20100726
>> [rtay...@h01 hbase]$
>>
>>
>> Anyhoo, I just put the hadoop-env.sh file on pastebin.com. Please take a 
>> look. I posted it under the title:
>>
>> "Ronald Taylor / hadoop-env.sh file - HBase-Hadoop hbase*.jar problem"
>>
>> This is the first time I've used pastebin.com, so hopefully I uploaded 
>> properly. Please let me know if not.
>>
>> I don't think I misspelled anything on the HADOOP_CLASSPATH line (I just 
>> verified file existence based on those spellings, see the "ls" listings 
>> below), but very happy to have an expert take a look.
>>  Ron
>>
>>
>> export HADOOP_CLASSPATH=/home/hbase/hbase/conf:/home/hbase/hbase/hbase-0.89.20100726.jar:/home/hbase/hbase/log4j-1.2.16.jar:/home/hbase/hbase/zookeeper-3.3.1.jar
>>
>>
>> [rtay...@h01 conf]$ ls /home/hbase/hbase/conf
>> hadoop-metrics.properties  hbase-default_with_RT_mods.xml
>> hbase-env.sh    hbase-site.xml.psuedo-distributed.template
>> regionservers hbase-default_ORIG.xml     hbase-default.xml
>> hbase-site.xml  log4j.properties                            tohtml.xsl
>>
>> [rtay...@h01 conf]$ ls /home/hbase/hbase/hbase-0.89.20100726.jar
>> /home/hbase/hbase/hbase-0.89.20100726.jar
>> [rtay...@h01 conf]$
>>
>> [rtay...@h01 conf]$ ls /home/hbase/hbase/log4j-1.2.16.jar
>> /home/hbase/hbase/log4j-1.2.16.jar
>> [rtay...@h01 conf]$
>>
>> [rtay...@h01 conf]$ ls /home/hbase/hbase/zookeeper-3.3.1.jar
>> /home/hbase/hbase/zookeeper-3.3.1.jar
>> [rtay...@h01 conf]$
>>
>>
>>
>> -----Original Message-----
>> From: Ryan Rawson [mailto:[email protected]]
>> Sent: Monday, September 20, 2010 1:17 PM
>> To: Taylor, Ronald C
>> Cc: [email protected]; [email protected];
>> [email protected]; Ronald Taylor; Witteveen, Tim
>> Subject: Re: Guava*.jar use - hadoop-hbase communications failure -
>> the hbase*.jar classes apparently not being found by Hadoop
>>
>> Hey,
>>
>> If you could, perhaps you could paste up your hadoop-env.sh on pastebin.com? 
>>  That would help... sometimes I have made errors in the bash shell trickery, 
>> and it probably would help to get more eyes checking it out.
>>
>> Normally in the stock hbase distro the Hbase JAR is in the root hbase dir, 
>> and the other jars are in the lib/ subdirectory. Am I correct to assume 
>> you've moved the jars around a bit?
>>
>> Good luck,
>> -ryan
>>
>> On Mon, Sep 20, 2010 at 1:14 PM, Taylor, Ronald C <[email protected]> 
>> wrote:
>>>
>>> Hello Ryan, Dave, other developers,
>>>
>>> Have not fixed the problem. Here's where things stand:
>>>
>>> 1) As Ryan suggested, we have checked all the nodes to make sure that we 
>>> copied over the hadoop-env.sh file with the HADOOP_CLASSPATH setting, set 
>>> like so:
>>>
>>> export HADOOP_CLASSPATH=/home/hbase/hbase/conf:/home/hbase/hbase/hbase-0.89.20100726.jar:/home/hbase/hbase/log4j-1.2.16.jar:/home/hbase/hbase/zookeeper-3.3.1.jar
>>>
>>> Answer: yep, that was OK, the files are there. We also restarted Hadoop and 
>>> Hbase again. No change - program still fails on not finding  the 
>>> TableOutputFormat class.
>>>
>>> 2) Following Dave's advice of avoiding the problem by not using 
>>> TableOutputFormat (by skipping the Reducer stage), I tried a variant of 
>>> that. I kept the Reducer stage in, but changed it to output to a file, 
>>> instead of an Hbase table.
>>>
>>> That did not work either. I tried running the new program from the hadoop 
>>> acct and now get a msg (from the Mapper stage, I believe) saying that the 
>>> hbase.mapreduce.TableMapper class cannot be found. So - it is not just 
>>> TableOutputFormat class - it is all the classes in the hbase*.jar file that 
>>> are not being found.
>>>
>>> Does this have anything to do with the guava*.jar file that Ryan mentioned, 
>>> which (as far as I can tell) we don't have installed?
>>>
>>> Obviously, we need more help.
>>>
>>> In the meantime, as a stop-gap, I'm planning on writing our analysis 
>>> programs this way:
>>>
>>> 1) extract data from the source Hbase table and store in an HDFS
>>> file, all data needed for analysis contained independently on each
>>> row - this task to be done by a non-MapReduce class that can access
>>> Hbase tables
>>>
>>> 2) call a MapReduce class that will process the file in parallel and
>>> return a new file (well, a directory of files which I'll combine
>>> into one) as output
>>>
>>> 3) write the contents of the new results file back into an Hbase
>>> table using another non-MapReduce class
>>>
>>> I presume this will work, but again, obviously, it's not optimal and we 
>>> need to resolve this issue so MapReduce classes can access Hbase tables 
>>> directly on our cluster.
>>>
>>> Does anybody have any advice?
>>>  Cheers,
>>>   Ron
>>>
>>> ___________________________________________
>>> Ronald Taylor, Ph.D.
>>> Computational Biology & Bioinformatics Group Pacific Northwest
>>> National Laboratory
>>> 902 Battelle Boulevard
>>> P.O. Box 999, Mail Stop J4-33
>>> Richland, WA  99352 USA
>>> Office:  509-372-6568
>>> Email: [email protected]
>>>
>>>
>>> -----Original Message-----
>>> From: Buttler, David [mailto:[email protected]]
>>> Sent: Monday, September 20, 2010 10:17 AM
>>> To: [email protected]; '[email protected]'
>>> Subject: RE: hadoop-hbase failure - could use some help, a class is
>>> apparently not being found by Hadoop
>>>
>>> I find it is often faster to skip the reduce phase when updating rows in 
>>> hbase.  (A trick I picked up from Ryan) Essentially, you read a row from 
>>> hbase, do your processing, and write the row back to hbase.
>>> The only time you would want to do the reduce phase is if there is some 
>>> aggregation that you need, or if there is some output you want to skip 
>>> (e.g. you have a zipfian distribution and you want to ignore the low count 
>>> occurrences).
>>>
>>> Dave
>>>
>>> -----Original Message-----
>>> From: Taylor, Ronald C
>>> Sent: Sunday, September 19, 2010 9:59 PM
>>> To: 'Ryan Rawson'; [email protected];
>>> [email protected]
>>> Cc: Taylor, Ronald C; 'Ronald Taylor'; Witteveen, Tim
>>> Subject: RE: Guava*.jar use - hadoop-hbase failure - could use some
>>> help, a class is apparently not being found by Hadoop
>>>
>>>
>>> Ryan,
>>>
>>> Thanks for the quick feedback. I will check the other nodes on the cluster 
>>> to see if they have been properly updated.
>>>
>>> However, I am now really confused as to use of the guava*.jar file that you 
>>> talk about. This is the first time I've heard about this. I presume we are 
>>> talking about a jar file packaging the guava libraries from Google?
>>>
>>> I cannot find this guava*.jar in either the /home/hadoop/hadoop directory 
>>> or in the /home/hbase/hbase directories, where the Hadoop and Hbase 
>>> installs place the other *.jar files. I'm afraid that I don't even know 
>>> where we should have downloaded it. Does it come with Hbase, or with 
>>> Hadoop? Where should it have been placed, after installation? Should I now 
>>> download it - since we appear to be missing it - from here?
>>>  http://code.google.com/p/guava-libraries/downloads/list
>>>
>>> I Googled and found issue HBASE-2714 (Remove Guava as a client
>>> dependency, June 11 2010) here
>>>
>>> http://www.mail-archive.com/[email protected]/msg00950.html
>>> (see below, where I've included the text)
>>>
>>> which appears to say that Hbase (at least *some* release of Hbase -
>>> does this include 0.89?) has a dependency on Guava, in order to run a
>>> MapReduce job over Hbase. But nothing on Guava is mentioned at
>>>
>>>
>>> http://hbase.apache.org/docs/r0.20.4/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath
>>>
>>> (I cannot find anything in the Hbase 0.89 online documents on Guava
>>> or in how to set CLASSPATH or in what *.jar files to include so I can
>>> use MapReduce with Hbase; the best guidance I can find is in this
>>> earlier document.)
>>>
>>> So - I could really use further clarification in regard to Guava as to what 
>>> I should be doing to set up Hbase-MapReduce work.
>>>
>>>  Regards,
>>>   Ron
>>>
>>> %%%%%%%%%%%%%%%%%%%%%%%%
>>>
>>> From
>>>
>>> http://www.mail-archive.com/[email protected]/msg00950.html
>>>
>>>
>>> Todd Lipcon commented on HBASE-2714:
>>> ------------------------------------
>>>
>>> Why not?
>>>
>>> In theory, the new TableMapReduceUtil.addDependencyJars should take care of 
>>> shipping it in the distributedcache. Apparently it's not working?
>>>
>>> ryan rawson commented on HBASE-2714:
>>> ------------------------------------
>>>
>>> not everyone uses that mechanism to run map reduce jobs on hbase.  The 
>>> standard for a long time was to add hbase.jar and zookeeper-3.2.2.jar to 
>>> the hadoop classpath, thus not requiring every job include the hbase jars.
>>>
>>> Todd Lipcon commented on HBASE-2714:
>>> ------------------------------------
>>>
>>> Does this mean in general that we can't add more dependencies to the
>>> hbase client? I think instead we should make it easier to run hbase
>>> MR jobs *without* touching the Hadoop config (eg right now you have
>>> to restart MR to upgrade hbase, that's not going to fly for a lot of
>>> clusters)
>>>
>>> stack commented on HBASE-2714:
>>> ------------------------------
>>>
>>> So, we need to change our recommendations here:
>>> http://hbase.apache.org/docs/r0.20.4/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath?
>>>
>>>
>>>> Remove Guava as a client dependency
>>>> -----------------------------------
>>>>
>>>>                 Key: HBASE-2714
>>>>                 URL:
>>>> https://issues.apache.org/jira/browse/HBASE-2714
>>>>             Project: HBase
>>>>          Issue Type: Improvement
>>>>          Components: client
>>>>            Reporter: Jeff Hammerbacher
>>>>
>>>> We shouldn't need Guava on the classpath to run a MapReduce job over HBase.
>>>
>>>
>>> %%%%%%%%%%%%%%%%%%%%%%%%
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Ryan Rawson [mailto:[email protected]]
>>> Sent: Sunday, September 19, 2010 12:45 AM
>>> To: [email protected]
>>> Cc: [email protected]; Taylor, Ronald C
>>> Subject: Re: hadoop-hbase failure - could use some help, a class is
>>> apparently not being found by Hadoop
>>>
>>> hey,
>>>
>>> looks like you've done all the right things... you might want to double 
>>> check that all the 'slave' machines have the updated hadoop-env.sh and that 
>>> the path referenced therein is present _on all the machines_.
>>>
>>> You also need to include the guava*.jar as well.  the log4j is already 
>>> included by mapred by default, so no need there.
>>>
>>> -ryan
>>>
>>>
>>>
>>> On Fri, Sep 17, 2010 at 4:19 PM, Taylor, Ronald C <[email protected]> 
>>> wrote:
>>>>
>>>> Hi folks,
>>>>
>>>> Got a problem in basic Hadoop-Hbase communication. My small test
>>>> program ProteinCounter1.java - shown in full below - reports out
>>>> this error
>>>>
>>>>   java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>>>        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
>>>>
>>>> The full invocation and error msgs are shown at bottom.
>>>>
>>>> We are using Hadoop 0.20.2 with HBase 0.89.20100726 on a 24-node cluster. 
>>>> Hadoop and Hbase each appears to work fine separately. That is, I've 
>>>> created programs that run MapReduce on files, and programs that import 
>>>> data into Hbase tables and manipulate such. Both types of programs have 
>>>> gone quite smoothly.
>>>>
>>>> Now I want to combine the two - use MapReduce programs on data drawn from 
>>>> an Hbase table, with results placed back into an Hbase table.
>>>>
>>>> But my test program for such, as you see from the error msg, is not
>>>> working. Apparently the
>>>>   org.apache.hadoop.hbase.mapreduce.TableOutputFormat
>>>>  class is not found.
>>>>
>>>> However, I have added these paths, including the relevant Hbase *.jar, to 
>>>> HADOOP_CLASSPATH, so the missing class should have been found, as you can 
>>>> see:
>>>>
>>>>  export HADOOP_CLASSPATH=/home/hbase/hbase/conf:
>>>> /home/hbase/hbase/hbase-0.89.20100726.jar:
>>>> /home/rtaylor/HadoopWork/log4j-1.2.16.jar:
>>>> /home/rtaylor/HadoopWork/zookeeper-3.3.1.jar
>>>>
>>>>  This change was made in the ../hadoop/conf/hadoop-env.sh file.
>>>>
>>>> I checked the manifest of /home/hbase/hbase/hbase-0.89.20100726.jar
>>>> and
>>>>    org/apache/hadoop/hbase/mapreduce/TableOutputFormat.class
>>>>  is indeed present in that Hbase *.jar file.
>>>>
>>>> Also, I have restarted both Hbase and Hadoop after making this change.
>>>>
>>>> Don't understand why the TableOutputFormat class is not being found. Or is 
>>>> the error msg misleading, and something else is going wrong? I would very 
>>>> much appreciate any advice people have as to what is going wrong. Need to 
>>>> get this working very soon.
>>>>
>>>>   Regards,
>>>>     Ron T.
>>>>
>>>> ___________________________________________
>>>> Ronald Taylor, Ph.D.
>>>> Computational Biology & Bioinformatics Group Pacific Northwest
>>>> National Laboratory
>>>> 902 Battelle Boulevard
>>>> P.O. Box 999, Mail Stop J4-33
>>>> Richland, WA  99352 USA
>>>> Office:  509-372-6568
>>>> Email: [email protected]
>>>>
>>>>
>>>> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>>> %
>>>> %
>>>> %%%%%%%%%%%%
>>>>
>>>> contents of the "ProteinCounter1.java" file:
>>>>
>>>>
>>>>
>>>> //  to compile
>>>> // javac ProteinCounter1.java
>>>> // jar cf ProteinCounterTest.jar  *.class
>>>>
>>>> // to run
>>>> //   hadoop jar ProteinCounterTest.jar ProteinCounter1
>>>>
>>>>
>>>> import org.apache.hadoop.hbase.HBaseConfiguration;
>>>> import org.apache.hadoop.conf.Configuration;
>>>> import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
>>>> import org.apache.hadoop.mapreduce.Job;
>>>> import org.apache.hadoop.io.IntWritable;
>>>>
>>>> import java.util.*;
>>>> import java.io.*;
>>>> import org.apache.hadoop.hbase.*;
>>>> import org.apache.hadoop.hbase.client.*;
>>>> import org.apache.hadoop.hbase.io.*;
>>>> import org.apache.hadoop.hbase.util.*;
>>>> import org.apache.hadoop.hbase.mapreduce.*;
>>>>
>>>>
>>>> // %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>>>
>>>> /**
>>>>  * counts the number of times each protein appears in the proteinTable
>>>>  */
>>>> public class ProteinCounter1 {
>>>>
>>>>
>>>>    static class ProteinMapper1 extends TableMapper<ImmutableBytesWritable, IntWritable> {
>>>>
>>>>        private int numRecords = 0;
>>>>        private static final IntWritable one = new IntWritable(1);
>>>>
>>>>        @Override
>>>>        public void map(ImmutableBytesWritable row, Result values, Context context) throws IOException {
>>>>
>>>>            // retrieve the value of proteinID, which is the row key for each protein in the proteinTable
>>>>            ImmutableBytesWritable proteinID_Key = new ImmutableBytesWritable(row.get());
>>>>            try {
>>>>                context.write(proteinID_Key, one);
>>>>            } catch (InterruptedException e) {
>>>>                throw new IOException(e);
>>>>            }
>>>>            numRecords++;
>>>>            if ((numRecords % 100) == 0) {
>>>>                context.setStatus("mapper processed " + numRecords + " proteinTable records so far");
>>>>            }
>>>>        }
>>>>    }
>>>>
>>>>    public static class ProteinReducer1
>>>>            extends TableReducer<ImmutableBytesWritable, IntWritable, ImmutableBytesWritable> {
>>>>
>>>>        public void reduce(ImmutableBytesWritable proteinID_key, Iterable<IntWritable> values,
>>>>                           Context context)
>>>>            throws IOException, InterruptedException {
>>>>            int sum = 0;
>>>>            for (IntWritable val : values) {
>>>>                sum += val.get();
>>>>            }
>>>>
>>>>            Put put = new Put(proteinID_key.get());
>>>>            put.add(Bytes.toBytes("resultFields"), Bytes.toBytes("total"), Bytes.toBytes(sum));
>>>>            System.out.println(String.format("stats : proteinID_key : %d, count : %d",
>>>>                                             Bytes.toInt(proteinID_key.get()), sum));
>>>>            context.write(proteinID_key, put);
>>>>        }
>>>>    }
>>>>
>>>>    public static void main(String[] args) throws Exception {
>>>>
>>>>        org.apache.hadoop.conf.Configuration conf;
>>>>        conf = org.apache.hadoop.hbase.HBaseConfiguration.create();
>>>>
>>>>        Job job = new Job(conf, "HBaseTest_Using_ProteinCounter");
>>>>        job.setJarByClass(ProteinCounter1.class);
>>>>
>>>>        org.apache.hadoop.hbase.client.Scan scan = new Scan();
>>>>
>>>>        String colFamilyToUse = "proteinFields";
>>>>        String fieldToUse = "Protein_Ref_ID";
>>>>
>>>>        // retrieve this one column from the specified family
>>>>        scan.addColumn(Bytes.toBytes(colFamilyToUse), Bytes.toBytes(fieldToUse));
>>>>
>>>>        org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter filterToUse =
>>>>            new org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter();
>>>>        scan.setFilter(filterToUse);
>>>>
>>>>        TableMapReduceUtil.initTableMapperJob("proteinTable", scan, ProteinMapper1.class,
>>>>                                              ImmutableBytesWritable.class,
>>>>                                              IntWritable.class, job);
>>>>        TableMapReduceUtil.initTableReducerJob("testTable", ProteinReducer1.class, job);
>>>>        System.exit(job.waitForCompletion(true) ? 0 : 1);
>>>>    }
>>>> }
>>>>
>>>>
>>>> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>>>
>
