Hi, thanks for your response.
Copying table A back to table A was the original plan, but that's not what
I am doing: I am copying table A to a new table B. Also, I am wondering: if
I were able to create such large rows from my Java client in the first
place, then how come MapReduce is erroring out on them? It doesn't make
sense to me.
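For what it's worth, the responseTooLarge warning in the region server log
below is on next(..., 50), which matches my scan caching of 50, so each
scanner RPC seems to be shipping ~50 of these big rows in one go. I am not
sure these are the right knobs, but I was considering dropping the caching
to 1 and capping how much of a wide row comes back per Result with
setBatch(), roughly like this (the 1000 is an arbitrary guess):

    Scan scan = new Scan();
    scan.setCaching(1);     // only one (potentially huge) row per next() RPC
    scan.setBatch(1000);    // arbitrary cap on columns returned per Result
    scan.setCacheBlocks(false);

With setBatch() a wide row would be split across several Result objects, so
my mapper would emit several Puts for the same new key, which should still
be fine for a straight copy.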
How do I ensure that each row is committed before moving on to the next
row? The commit to table B is happening in the reduce stage, right? (After
the code I have also sketched what I was thinking of trying for that.)
This is how I am setting up the MapReduce job:
public static void main(String args[]) throws Exception {
    Configuration config = HBaseConfiguration.create();
    Job job = new Job(config, "HBaseBackuper");
    job.setJarByClass(HBaseBackuper.class);

    Scan scan = new Scan();
    scan.setCaching(50);
    scan.setCacheBlocks(false);

    job.setOutputFormatClass(NullOutputFormat.class);
    TableMapReduceUtil.initTableMapperJob(TABLE_NAME, scan,
        BackuperMapper.class, ImmutableBytesWritable.class, Put.class, job);
    TableMapReduceUtil.initTableReducerJob(DEST_TABLE_NAME, null, job);
    job.setNumReduceTasks(0);

    boolean b = job.waitForCompletion(true);
    if (!b) {
        throw new IOException("error with job!");
    }
}

static class BackuperMapper extends TableMapper<ImmutableBytesWritable, Put> {

    @Override
    public void map(ImmutableBytesWritable row, Result result, Context context)
            throws IOException, InterruptedException {
        String rowKey = Bytes.toString(row.get());
        String newRowKey = transformRowKey(rowKey);
        // Emit the row under its new key; with zero reduce tasks the Put
        // goes straight to the job's output format.
        context.write(new ImmutableBytesWritable(Bytes.toBytes(newRowKey)),
            resultToPut(newRowKey, result));
        System.out.println("Remapping " + rowKey);
    }

    // Copy every family/qualifier/value from the scanned Result into a Put
    // keyed by the new row key.
    private static Put resultToPut(String rowKeyStr, Result result)
            throws IOException {
        Put put = new Put(Bytes.toBytes(rowKeyStr));
        NavigableMap<byte[], NavigableMap<byte[], byte[]>> familyQualifierMap =
            result.getNoVersionMap();
        for (byte[] familyBytes : familyQualifierMap.keySet()) {
            NavigableMap<byte[], byte[]> qualifierMap =
                familyQualifierMap.get(familyBytes);
            for (byte[] qualifier : qualifierMap.keySet()) {
                put.add(familyBytes, qualifier, qualifierMap.get(qualifier));
            }
        }
        return put;
    }
}
Is something wrong here?
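On making sure each row is committed before moving on: the only knob I
could find is the client-side write buffer used by TableOutputFormat's
HTable (it turns autoflush off, as far as I can tell, and the stack trace
in my original mail below fails inside HTable.flushCommits). I am not sure
this is the right approach, but I was considering shrinking the buffer in
the job config so every Put is flushed more or less immediately, e.g.:

    Configuration config = HBaseConfiguration.create();
    // Guessing at the knob: shrink the client write buffer (default ~2 MB)
    // so HTable flushes after nearly every Put instead of batching many
    // large rows into one RPC. The 1024 is an arbitrary small value.
    config.setLong("hbase.client.write.buffer", 1024L);
    Job job = new Job(config, "HBaseBackuper");

Does that sound reasonable, or is there a better way to force per-row
commits from TableOutputFormat?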
Thanks
On Mon, Jan 9, 2012 at 6:35 PM, Michael Segel <[email protected]> wrote:
>
> Uhmm...
> You're copying data from Table A back to Table A?
>
> Ok... you really want to disable your caching altogether and make sure
> each row as you write it is committed to the table.
>
> Try that... it will hurt your performance, but it may keep you afloat.
>
> HTH
>
> -Mike
>
>
> You've got a scanner and you're running through your table. You're co
> > Date: Mon, 9 Jan 2012 16:10:25 -0800
> > Subject: copy job for mapreduce failing due to large rows
> > From: [email protected]
> > To: [email protected]
> >
> > Hi,
> > I wrote a MapReduce job to copy rows from my table to the same table
> > since I want to change my row key schema, but the job is failing
> > consistently at the same point due to the presence of large rows. I
> > don't know how to unblock myself.
> >
> > Here is the error stack I see:
> >
> > attempt_201112151554_0028_m_000120_2: Remapping 165845033445190:1313884800:weekly:AudEng
> > attempt_201112151554_0028_m_000120_2: Remapping 165845033445190:1313884800:weekly:ContentEng
> > 12/01/10 00:01:01 INFO mapred.JobClient: Task Id : attempt_201112151554_0028_m_000121_2, Status : FAILED
> > org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: servers with issues: ip-10-68-145-124.ec2.internal:60020,
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1227)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchOfPuts(HConnectionManager.java:1241)
> >         at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:826)
> >         at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:682)
> >         at org.apache.hadoop.hbase.client.HTable.put(HTable.java:667)
> >         at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:127)
> >         at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:82)
> >         at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:531)
> >         at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
> >         at com.akanksh.information.hbasetest.HBaseBackuper$BackuperMapper.map(HBaseBackuper.java:68)
> >         at com.akanksh.information.hbasetest.HBaseBackuper$BackuperMapper.map(HBaseBackuper.java:34)
> >         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> >         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
> >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
> >         at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> >         at java.security.AccessController.doPrivileged(Native Method)
> >         at javax.security.auth.Subject.doAs(Subject.java:416)
> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> >         at org.apache.hadoop.mapred.Child.main(Child.java:264)
> >
> > When I open the region server log, I only see a warning here:
> >
> > 2012-01-10 00:00:13,745 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 59.84 MB of total=508.6 MB
> > 2012-01-10 00:00:13,793 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction completed; freed=59.88 MB, total=449.28 MB, single=130.23 MB, multi=352.69 MB, memory=21.27 MB
> > 2012-01-10 00:00:17,230 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=451.46 MB, free=146.87 MB, max=598.34 MB, blocks=9096, accesses=1663927726, hits=1565631235, hitRatio=94.09%%, cachingAccesses=1638666127, cachingHits=1563292171, cachingHitsRatio=95.40%%, evictions=83895, evicted=75364860, evictedPerRun=898.3236694335938
> > 2012-01-10 00:00:52,545 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60020, responseTooLarge for: next(-5685114053145855194, 50) from 10.68.145.124:44423: Size: 121.7m
> > 2012-01-10 00:01:06,229 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU eviction started; Attempting to free 59.89 MB of total=508.64 MB
> >
> > I saw a similar thread in the past where your suggestion was to use
> > bulk load, but I am essentially going through a schema change and doing
> > migrations, so how do I go about it? I tried decreasing the scan caching
> > size from 500 to 50, and I do setCacheBlocks(false) in my job.
> >
> >
> > http://mail-archives.apache.org/mod_mbox/hbase-user/201112.mbox/%[email protected]%3E
> >
> > Any suggestions? I need to get unblocked ASAP since this is affecting
> > my production.
> >
> > Thanks,
> > Vinod
>
>