OK, I found a tutorial for this myself. For everybody who runs into the same problem: here is a tutorial explaining WritableComparable types.

http://developer.yahoo.com/hadoop/tutorial/module5.html


On 09.08.2012 17:14, Björn-Elmar Macek wrote:
Ah OK, I got the idea: I can use the abstract class instead of the low-level interface, though I am not sure how to use it. It would be nice if more complex mechanics like sorting had an up-to-date tutorial with some example code. If I find the time, I will write one, since I want to give a presentation on Hadoop anyway.

Thanks for your help! I will try to use the abstract class.


On 09.08.2012 17:03, Björn-Elmar Macek wrote:
Hi Bertrand,

I am using RawComparator because it was used in a tutorial by some famous (Hadoop) guy describing how to sort the input for the reducer. Is there an easier alternative?


On 09.08.2012 16:57, Bertrand Dechoux wrote:
I am just curious, but are you using Writable? If so, there is a WritableComparator... If you are going to interpret every byte (you create a String, so you do), there is no clear reason for choosing such a low-level API.

Regards

Bertrand

On Thu, Aug 9, 2012 at 4:47 PM, Björn-Elmar Macek <[email protected]> wrote:

    Hi again,

    this is a direct follow-up to my previous posting titled
    "Logs cannot be created", where logs could not be created (Spill
    failed). I got the hint to check the privileges, but that was
    not the problem, because I own the folders that were used for
    this.

    I finally found an important hint in a log saying:
    12/08/09 15:30:49 WARN mapred.JobClient: Error reading task outputhttp://its-cs229.its.uni-kassel.de:50060/tasklog?plaintext=true&attemptid=attempt_201208091516_0001_m_000048_0&filter=stdout
    12/08/09 15:30:49 WARN mapred.JobClient: Error reading task outputhttp://its-cs229.its.uni-kassel.de:50060/tasklog?plaintext=true&attemptid=attempt_201208091516_0001_m_000048_0&filter=stderr
    12/08/09 15:34:34 INFO mapred.JobClient: Task Id : attempt_201208091516_0001_m_000055_0, Status : FAILED
    java.io.IOException: Spill failed
            at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1029)
            at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:592)
            at uni.kassel.macek.rtprep.RetweetMapper.map(RetweetMapper.java:26)
            at uni.kassel.macek.rtprep.RetweetMapper.map(RetweetMapper.java:12)
            at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
            at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
            at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
            at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:396)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
            at org.apache.hadoop.mapred.Child.main(Child.java:249)
    Caused by: java.lang.NumberFormatException: For input string: ""
            at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
            at java.lang.Integer.parseInt(Integer.java:468)
            at java.lang.Integer.parseInt(Integer.java:497)
            at uni.kassel.macek.rtprep.Tweet.getRT(Tweet.java:126)
            at uni.kassel.macek.rtprep.TwitterValueGroupingComparator.compare(TwitterValueGroupingComparator.java:47)
            at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:1111)
            at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:95)
            at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:59)
            at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1399)
            at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:853)
            at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1344)



    corresponding to the following lines of code within the class
    TwitterValueGroupingComparator:

    public class TwitterValueGroupingComparator implements RawComparator<Text> {
    ...
        public int compare(byte[] text1, int start1, int length1,
                byte[] text2, int start2, int length2) {

            byte[] tweet1 = new byte[length1]; // length1-1 (???)
            byte[] tweet2 = new byte[length2]; // length2-1 (???)

            System.arraycopy(text1, start1, tweet1, 0, length1); // start1+1 (???)
            System.arraycopy(text2, start2, tweet2, 0, length2); // start2+1 (???)

            Tweet atweet1 = new Tweet(new String(tweet1));
            Tweet atweet2 = new Tweet(new String(tweet2));

            String key1 = atweet1.getAuthor();
            String key2 = atweet2.getAuthor();
            ////////////////////////////////////////////////////////////////
            // THE FOLLOWING LINE IS THE ONE MENTIONED IN THE LOG (47)
            ////////////////////////////////////////////////////////////////
            if (atweet1.getRT() > 0 && !atweet1.getMention().equals(""))
                key1 = atweet1.getMention();
            if (atweet2.getRT() > 0 && !atweet2.getMention().equals(""))
                key2 = atweet2.getMention();

            int realKeyCompare = key1.compareTo(key2);
            return realKeyCompare;
        }

    }
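
Independent of the leading-bytes question, the trace shows Integer.parseInt being called from Tweet.getRT with an empty string, and parseInt throws NumberFormatException for "". The Tweet class is not shown in this thread, so the following is only a sketch of a defensive parse under that assumption; SafeFields, parseIntOr and groupKey are hypothetical names, not part of the original code.

```java
/** Sketch: tolerate an empty or malformed numeric CSV field. */
final class SafeFields {

    /** Parses a trimmed integer field; returns fallback for null, blank or non-numeric input. */
    static int parseIntOr(String field, int fallback) {
        if (field == null) {
            return fallback;
        }
        String s = field.trim();
        if (s.isEmpty()) {
            return fallback;
        }
        try {
            return Integer.parseInt(s);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    /**
     * Hypothetical grouping-key choice mirroring the comparator above:
     * use the mention for retweets that have one, otherwise the author.
     */
    static String groupKey(String author, String mention, String rtField) {
        boolean isRetweet = parseIntOr(rtField, 0) > 0;
        boolean hasMention = mention != null && !mention.isEmpty();
        return (isRetweet && hasMention) ? mention : author;
    }
}
```

With such a helper an empty retweet field is treated as "not a retweet" instead of aborting the spill; whether 0 is the right fallback depends on the data.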

    Since I take the incoming bytes and interpret them as Tweets by
    recreating the corresponding CSV strings and tokenizing them, I
    was fairly sure that the problem lies in the leading bytes that
    Hadoop puts in front of the data being compared. Because I never
    really understood what Hadoop does to the strings before they
    reach the key comparator, I simply appended all strings to a
    file to see for myself.

    You can see the results here:

    ??2009-06-12 07:32:47, davedilbeck, tampabaycom, , ,
    http://tinyurl.com/nookw7, 1, 1, Lots of vivid imagery here but
    it's mostly bullshit   Alex Sink June Cleaver or Joan Crawford, null
    ??2009-06-12 07:32:47, davedilbeck, tampabaycom, , ,
    http://tinyurl.com/nookw7, 1, 1, Lots of vivid imagery here but
    it's mostly bullshit   Alex Sink June Cleaver or Joan Crawford, null
    I2009-06-12 04:33:19, ntmp, tsukunep, , , , 1, 0,
    ??????????????????, null
    ??2009-06-12 07:32:47, davedilbeck, tampabaycom, , ,
    http://tinyurl.com/nookw7, 1, 1, Lots of vivid imagery here but
    it's mostly bullshit   Alex Sink June Cleaver or Joan Crawford, null
    ^2009-06-12 04:33:20, aclouatre, , , , , 0, 0, Bored out of my
    mind Watching food network, null
    b2009-06-12 04:33:20, djnewera, adoremii369, , , , 1, 0, LOL
    WORDUP ANT NOTHING LIKE THE HOOD, null


    As you can see, there are different leading characters: sometimes
    it is "??", other times it is "b" or "^", etc.

    My question now is:
    How many bytes do I have to cut off to get back, as a String,
    the original Text that I put into the key position of my mapper
    output? What are the concepts behind this?
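
A possible answer, as a sketch: a serialized Text is its UTF-8 bytes preceded by the byte length written as a variable-length integer, so the prefix is one byte for lengths up to 127 and two or more bytes beyond that. That would be consistent with the single leading characters ("b", "^", "I") in front of the short lines and the two unprintable bytes ("??") in front of the long ones. The class below re-implements the size rule of Hadoop's WritableUtils.decodeVIntSize in plain Java so that it runs without Hadoop on the classpath; TextPrefix and payload are hypothetical names, and in a real comparator you would call WritableUtils.decodeVIntSize directly.

```java
import java.nio.charset.StandardCharsets;

/** Sketch: strip the variable-length length prefix in front of a serialized Text. */
final class TextPrefix {

    /**
     * Number of bytes occupied by the length prefix, given its first byte.
     * Mirrors the rule used by Hadoop's WritableUtils.decodeVIntSize(byte).
     */
    static int decodeVIntSize(byte first) {
        if (first >= -112) {
            return 1;             // small values fit into a single byte
        } else if (first < -120) {
            return -119 - first;  // marker byte of a negative multi-byte value
        }
        return -111 - first;      // marker byte of a positive multi-byte value
    }

    /** Decodes the string payload of one serialized Text at buf[start .. start+length). */
    static String payload(byte[] buf, int start, int length) {
        int n = decodeVIntSize(buf[start]);
        return new String(buf, start + n, length - n, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] body = "2009-06-12 04:33:19, ntmp".getBytes(StandardCharsets.UTF_8);
        byte[] record = new byte[body.length + 1];
        record[0] = (byte) body.length;  // one-byte prefix, since the length is <= 127
        System.arraycopy(body, 0, record, 1, body.length);
        System.out.println(payload(record, 0, record.length));
    }
}
```

In the comparator above this would mean skipping decodeVIntSize(text1[start1]) bytes at start1 (and likewise for text2) rather than a fixed one byte, and decoding with an explicit charset instead of the platform default.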

    Thanks for your help in advance!

    Best regards,
    Elmar Macek






--
Bertrand Dechoux





