Ted,

it seems it is due to HBASE-11083: throttle bandwidth during snapshot
export <https://issues.apache.org/jira/browse/HBASE-11083>. After I reverted
it, the job succeeded again. It seems even when I set the throttle
bandwidth high, like 200M, iftop shows a much lower value. Maybe the throttle
is sleeping longer than it is supposed to? But I am not clear why a slow copy
job would cause a LeaseExpiredException. Any idea?
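For what it's worth, a throttle of this general shape derives a sleep from the running byte count. The sketch below is hypothetical (not the actual HBASE-11083 code); it shows the usual calculation, and why the observed rate can only sit at or below the cap, so a bug that over-sleeps would push it even lower:

```java
// Minimal sketch of a bandwidth throttle (hypothetical; not the actual
// HBASE-11083 code). After each chunk, sleep just long enough that the
// average rate since the start stays at or under the configured cap.
public class ThrottleSketch {
    private final long bytesPerSec;  // configured bandwidth cap
    private final long startMs;      // when the copy began
    private long totalBytes = 0;     // bytes written so far

    public ThrottleSketch(long bytesPerSec, long startMs) {
        this.bytesPerSec = bytesPerSec;
        this.startMs = startMs;
    }

    /** How long (ms) to sleep after writing `chunk` bytes, given the current time. */
    public long sleepNeededMs(long chunk, long nowMs) {
        totalBytes += chunk;
        long minElapsedMs = totalBytes * 1000L / bytesPerSec; // time the cap requires
        long actualElapsedMs = nowMs - startMs;
        return Math.max(0, minElapsedMs - actualElapsedMs);
    }
}
```

If an implementation over-sleeps, e.g. by not crediting the time already spent writing, the effective rate falls well below the cap, which would match the low iftop numbers.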

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
No lease on /hbase/.archive/rich_pin_data_v1/b50ab10bb4812acc2e9fa6c564c9adef/d/bac3c661a897466aaf1706a9e1bd9e9a
File does not exist. Holder DFSClient_NONMAPREDUCE_-2096088484_1 does not have any open files.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2396)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2387)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2454)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2431)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:536)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:335)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$
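For reference, the message in that trace is the NameNode's lease check failing because the path no longer resolves to an open (under-construction) file — exactly what you would see if the file were deleted or moved out from under the writer. A simplified, hypothetical sketch of that check (not the actual FSNamesystem code):

```java
// Simplified, hypothetical sketch of the NameNode-side check seen in the
// stack trace above (not the actual FSNamesystem code). When the target
// path no longer resolves to an under-construction file -- e.g. it was
// deleted or renamed after the writer opened it -- the failure surfaces
// as a LeaseExpiredException rather than a plain file-not-found error.
class LeaseCheckSketch {
    static class LeaseExpiredException extends RuntimeException {
        LeaseExpiredException(String msg) { super(msg); }
    }

    /** fileUnderConstruction is null when the path was deleted or renamed. */
    static void checkLease(String src, String holder, Object fileUnderConstruction) {
        if (fileUnderConstruction == null) {
            throw new LeaseExpiredException("No lease on " + src
                + " File does not exist. Holder " + holder
                + " does not have any open files.");
        }
    }
}
```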


Thanks
Tian-Ying


On Wed, Apr 30, 2014 at 1:25 PM, Ted Yu <[email protected]> wrote:

> Tianying:
> Have you checked audit log on namenode for deletion event corresponding to
> the files involved in LeaseExpiredException ?
>
> Cheers
>
>
> On Wed, Apr 30, 2014 at 10:44 AM, Tianying Chang <[email protected]>
> wrote:
>
> > This time the re-run passed (although with many failed/retried tasks) with
> > my throttle bandwidth set to 200M (although per iftop, it never got close
> > to that number). Is there a way to increase the lease expiry time for
> > low-throttle-bandwidth individual export jobs?
> >
> > Thanks
> > Tian-Ying
> >
> >
> >
> > On Wed, Apr 30, 2014 at 10:17 AM, Tianying Chang <[email protected]>
> > wrote:
> >
> > > yes, I am using the bandwidth throttle feature. The export job for this
> > > table actually succeeded on its first run. When I rerun it (for my
> > > robustness testing) it seems to never pass. I am wondering if it has
> > > some weird state (I did clean up the target cluster and even removed
> > > the /hbase/.archive/rich_pin_data_v1 folder).
> > >
> > > It seems even if I set the throttle value really large, it still fails.
> > > And I think even after I replace the jar with the one without throttle,
> > > it still fails on re-run.
> > >
> > > Is there some way that I can increase the lease to be very large to
> > > test it out?
> > >
> > >
> > >
> > > On Wed, Apr 30, 2014 at 10:02 AM, Matteo Bertozzi <
> > [email protected]
> > > > wrote:
> > >
> > >> the file is the one being written by the export, so you are creating
> > >> that file. Do you have the bandwidth throttle on?
> > >>
> > >> I'm thinking that the file is being written slowly: e.g. write(few
> > >> bytes), wait, write(few bytes), and during the wait your lease expires.
> > >> Something like that can also happen if your MR job is stuck in some way
> > >> (slow machine or similar) and is not writing within the lease timeout.
> > >>
> > >> Matteo
> > >>
> > >>
> > >>
> > >> On Wed, Apr 30, 2014 at 9:53 AM, Tianying Chang <[email protected]>
> > >> wrote:
> > >>
> > >> > We are using
> > >> >
> > >> > Hadoop 2.0.0-cdh4.2.0 and HBase 0.94.7. We also backported several
> > >> > snapshot-related jiras, e.g. HBASE-10111 (verify snapshot) and
> > >> > HBASE-11083 (bandwidth throttle in exportSnapshot).
> > >> >
> > >> > I found that when the LeaseExpiredException was first reported, the
> > >> > file was indeed not there, and the map task retried. And I verified
> > >> > a couple of minutes later that the HFile does exist under /.archive.
> > >> > But the retried map task still complains with the same
> > >> > file-does-not-exist error...
> > >> >
> > >> > I will check the namenode log for the LeaseExpiredException.
> > >> >
> > >> >
> > >> > Thanks
> > >> >
> > >> > Tian-Ying
> > >> >
> > >> >
> > >> > On Wed, Apr 30, 2014 at 9:33 AM, Ted Yu <[email protected]>
> wrote:
> > >> >
> > >> > > Can you give us the hbase and hadoop releases you're using ?
> > >> > >
> > >> > > Can you check namenode log around the time LeaseExpiredException
> was
> > >> > > encountered ?
> > >> > >
> > >> > > Cheers
> > >> > >
> > >> > >
> > >> > > On Wed, Apr 30, 2014 at 9:20 AM, Tianying Chang <
> [email protected]>
> > >> > wrote:
> > >> > >
> > >> > > > Hi,
> > >> > > >
> > >> > > > When I export a large table with 460+ regions, I see the
> > >> > > > exportSnapshot job fail sometimes (not all the time). The error
> > >> > > > from the map task is below. But I verified the file highlighted
> > >> > > > below, and it does exist. Smaller tables always seem to pass.
> > >> > > > Any idea? Is it because the table is too big and the job gets a
> > >> > > > session timeout?
> > >> > > >
> > >> > > >
> > >> > > > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
> > >> > > > No lease on /hbase/.archive/rich_pin_data_v1/7713d5331180cb610834ba1c4ebbb9b3/d/eef3642f49244547bb6606d4d0f15f1f
> > >> > > > File does not exist. Holder DFSClient_NONMAPREDUCE_279781617_1 does not have any open files.
> > >> > > >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2396)
> > >> > > >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2387)
> > >> > > >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2183)
> > >> > > >         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:481)
> > >> > > >         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
> > >> > > >         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
> > >> > > >         at org.apache.hadoop.ipc.ProtobufR
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > Thanks
> > >> > > >
> > >> > > > Tian-Ying
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>
