Re: Lease exception

2017-01-11 Thread Rajeshkumar J
This is the log I got:
2017-01-05 11:41:49,629 DEBUG
[B.defaultRpcServer.handler=15,queue=0,port=16020] ipc.RpcServer:
B.defaultRpcServer.handler=15,queue=0,port=16020: callId: 3 service:
ClientService methodName: Scan size: 23 connection: xx.xx.xx.xx:x
org.apache.hadoop.hbase.regionserver.LeaseException: lease '706' does not
exist
at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:221)
at org.apache.hadoop.hbase.regionserver.Leases.cancelLease(Leases.java:206)
at
org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2491)
at
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at
org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:744)
2017-01-05 11:41:49,629 TRACE
[B.defaultRpcServer.handler=18,queue=0,port=16020] ipc.RpcServer: callId: 2
service: ClientService methodName: Scan size: 29 connection:
xx.xx.xx.xx:x param: scanner_id: 706 number_of_rows: 2147483647
close_scanner: false next_call_seq: 0 client_handles_partials: true
client_handles_heartbeats: true connection: xx.xx.xx.xx:x, response
scanner_id: 706 more_results: true stale: false more_results_in_region:
false queueTime: 1 processingTime: 60136 totalTime: 60137

I have an HBase scanner timeout of 6, but here the total time is greater
than that, so I am getting the lease exception. Can anyone suggest a way to
find out why the scan takes this long?

Thanks
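
The request logged above asks for number_of_rows: 2147483647, i.e.
Integer.MAX_VALUE rows per scan RPC. A minimal sketch, assuming the HBase 1.x
client API (the table name "t1", the caching value, and the open Connection
named "connection" are illustrative assumptions, not taken from this thread),
of capping how much work each next() call does so a single scan RPC stays
well under the lease period:

    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;

    Scan scan = new Scan();
    // Return at most 100 rows per scan RPC so each next() call is short.
    scan.setCaching(100);
    try (Table table = connection.getTable(TableName.valueOf("t1"));
         ResultScanner scanner = table.getScanner(scan)) {
        for (Result result : scanner) {
            // process each row here
        }
    }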

On Thu, Dec 22, 2016 at 3:13 PM, Phil Yang <ud1...@gmail.com> wrote:

> https://github.com/apache/hbase/blob/rel/1.1.1/hbase-
> server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.
> java#L2491
>
> There is a TTL for scanners at the server, to prevent leaks when a client
> does not close its scanners. The TTL is configured by
> hbase.client.scanner.timeout.period at the server and is refreshed whenever
> a scan RPC request arrives. The TTLs of all scanners are managed by Leases.
> Your error happens when the server closes a scanner whose lease has already
> expired. So I think you can try to increase
> hbase.client.scanner.timeout.period at the server, or decrease
> hbase.client.scanner.timeout.period at the client, to prevent the scanner
> from expiring before its scan is done.
> hbase.client.scanner.timeout.period is used at both the client and the
> server, and the two can differ if you change only one side.
>
> BTW, I still suggest that you upgrade your cluster and client; 1.1.1
> has some data-loss bugs in scanning.
>
> Thanks,
> Phil
>
>
> 2016-12-22 17:26 GMT+08:00 Rajeshkumar J <rajeshkumarit8...@gmail.com>:
>
> > Can you please explain what the cause of this lease exception is, and is
> > there any way to solve it in the current version?
> >
> > Thanks
> >
> > On Thu, Dec 22, 2016 at 2:54 PM, Phil Yang <ud1...@gmail.com> wrote:
> >
> > > In fact, at the client the RPC timeout of a scan request is also
> > > hbase.client.scanner.timeout.period, which replaces the
> > > deprecated hbase.regionserver.lease.period.
> > >
> > > The code path that throws this LeaseException was removed by
> > > HBASE-16604; maybe you can try to upgrade your cluster to 1.1.7? Your
> > > client can also upgrade to 1.1.7, which will ignore
> > > UnknownScannerException and retry when the lease has expired at the
> > > server.
> > >
> > > Thanks,
> > > Phil
> > >
> > >
> > > 2016-12-22 16:51 GMT+08:00 Rajeshkumar J <rajeshkumarit8...@gmail.com
> >:
> > >
> > > > Also, there is a solution I have found in the HBase user guide:
> > > > hbase.rpc.timeout must be greater than
> > > > hbase.client.scanner.timeout.period.
> > > > How do these two properties play a part in the above exception? Can
> > > > anyone explain?
> > > >
> > > > On Wed, Dec 21, 2016 at 9:39 PM, Rajeshkumar J <
> > > > rajeshkumarit8...@gmail.com>
> > > > wrote:
> > > >
> > > > > I am using HBase version 1.1.1.
> > > > > Also, I didn't understand something here. Whenever scanner.next()
> > > > > is called, it needs to return rows (based on the caching value)
> > > > > within the lease period, or else the server-side scanner will be
> > > > > closed, eventually throwing this exception. Correct me if I'm
> > > > > wrong, as I don't have a clear understanding of this issue.

Re: Lease exception

2016-12-26 Thread Rajeshkumar J
Also, how do I change the property hbase.client.scanner.timeout.period on
the client side? I only know how to change it in hbase-site.xml.
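
A minimal sketch, assuming the standard HBase 1.x client API (the 120000 ms
value is only an example), of overriding the property programmatically on
the client instead of editing hbase-site.xml:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    Configuration conf = HBaseConfiguration.create();
    // Client-side scanner timeout, in milliseconds (example value).
    conf.setInt("hbase.client.scanner.timeout.period", 120000);
    Connection connection = ConnectionFactory.createConnection(conf);

Note that this only changes the client's view of the timeout; the server
keeps whatever is configured in its own hbase-site.xml, as Phil's reply
below points out.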

On Thu, Dec 22, 2016 at 3:13 PM, Phil Yang <ud1...@gmail.com> wrote:

> https://github.com/apache/hbase/blob/rel/1.1.1/hbase-
> server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.
> java#L2491
>
> There is a TTL for scanners at the server, to prevent leaks when a client
> does not close its scanners. The TTL is configured by
> hbase.client.scanner.timeout.period at the server and is refreshed whenever
> a scan RPC request arrives. The TTLs of all scanners are managed by Leases.
> Your error happens when the server closes a scanner whose lease has already
> expired. So I think you can try to increase
> hbase.client.scanner.timeout.period at the server, or decrease
> hbase.client.scanner.timeout.period at the client, to prevent the scanner
> from expiring before its scan is done.
> hbase.client.scanner.timeout.period is used at both the client and the
> server, and the two can differ if you change only one side.
>
> BTW, I still suggest that you upgrade your cluster and client; 1.1.1
> has some data-loss bugs in scanning.
>
> Thanks,
> Phil
>
>
> 2016-12-22 17:26 GMT+08:00 Rajeshkumar J <rajeshkumarit8...@gmail.com>:
>
> > Can you please explain what the cause of this lease exception is, and is
> > there any way to solve it in the current version?
> >
> > Thanks
> >
> > On Thu, Dec 22, 2016 at 2:54 PM, Phil Yang <ud1...@gmail.com> wrote:
> >
> > > In fact, at the client the RPC timeout of a scan request is also
> > > hbase.client.scanner.timeout.period, which replaces the
> > > deprecated hbase.regionserver.lease.period.
> > >
> > > The code path that throws this LeaseException was removed by
> > > HBASE-16604; maybe you can try to upgrade your cluster to 1.1.7? Your
> > > client can also upgrade to 1.1.7, which will ignore
> > > UnknownScannerException and retry when the lease has expired at the
> > > server.
> > >
> > > Thanks,
> > > Phil
> > >
> > >
> > > 2016-12-22 16:51 GMT+08:00 Rajeshkumar J <rajeshkumarit8...@gmail.com
> >:
> > >
> > > > Also, there is a solution I have found in the HBase user guide:
> > > > hbase.rpc.timeout must be greater than
> > > > hbase.client.scanner.timeout.period.
> > > > How do these two properties play a part in the above exception? Can
> > > > anyone explain?
> > > >
> > > > On Wed, Dec 21, 2016 at 9:39 PM, Rajeshkumar J <
> > > > rajeshkumarit8...@gmail.com>
> > > > wrote:
> > > >
> > > > > I am using HBase version 1.1.1.
> > > > > Also, I didn't understand something here. Whenever scanner.next()
> > > > > is called, it needs to return rows (based on the caching value)
> > > > > within the lease period, or else the server-side scanner will be
> > > > > closed, eventually throwing this exception. Correct me if I'm
> > > > > wrong, as I don't have a clear understanding of this issue.
> > > > >
> > > > > On Wed, Dec 21, 2016 at 7:31 PM, Ted Yu <yuzhih...@gmail.com>
> wrote:
> > > > >
> > > > >> Which hbase release are you using ?
> > > > >>
> > > > >> There is heartbeat support when scanning.
> > > > >> Looks like the version you use doesn't have this support.
> > > > >>
> > > > >> Cheers
> > > > >>
> > > > >> > On Dec 21, 2016, at 4:02 AM, Rajeshkumar J <
> > > > rajeshkumarit8...@gmail.com>
> > > > >> wrote:
> > > > >> >
> > > > >> > Hi,
> > > > >> >
> > > > >> >   Thanks for the reply. I have properties as below
> > > > >> >
> > > > >> > <property>
> > > > >> >   <name>hbase.regionserver.lease.period</name>
> > > > >> >   <value>90</value>
> > > > >> > </property>
> > > > >> > <property>
> > > > >> >   <name>hbase.rpc.timeout</name>
> > > > >> >   <value>90>/value>
> > > > >> > </property>
> > > > >> >
> > > > >> >
> > > > >> > Correct me If I am wrong.
> > > > >> >
> > > > >> > I know hbase.regionserver.lease.period, which says how long a
> > > > >> > scanner lives between calls to scanner.next().

Re: Lease exception

2016-12-26 Thread Rajeshkumar J
Sorry for the delay. I didn't get the lease concept here: is it specific to
HBase, or is it like a lease in Hadoop?

On Thu, Dec 22, 2016 at 3:13 PM, Phil Yang <ud1...@gmail.com> wrote:

> https://github.com/apache/hbase/blob/rel/1.1.1/hbase-
> server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.
> java#L2491
>
> There is a TTL for scanners at the server, to prevent leaks when a client
> does not close its scanners. The TTL is configured by
> hbase.client.scanner.timeout.period at the server and is refreshed whenever
> a scan RPC request arrives. The TTLs of all scanners are managed by Leases.
> Your error happens when the server closes a scanner whose lease has already
> expired. So I think you can try to increase
> hbase.client.scanner.timeout.period at the server, or decrease
> hbase.client.scanner.timeout.period at the client, to prevent the scanner
> from expiring before its scan is done.
> hbase.client.scanner.timeout.period is used at both the client and the
> server, and the two can differ if you change only one side.
>
> BTW, I still suggest that you upgrade your cluster and client; 1.1.1
> has some data-loss bugs in scanning.
>
> Thanks,
> Phil
>
>
> 2016-12-22 17:26 GMT+08:00 Rajeshkumar J <rajeshkumarit8...@gmail.com>:
>
> > Can you please explain what the cause of this lease exception is, and is
> > there any way to solve it in the current version?
> >
> > Thanks
> >
> > On Thu, Dec 22, 2016 at 2:54 PM, Phil Yang <ud1...@gmail.com> wrote:
> >
> > > In fact, at the client the RPC timeout of a scan request is also
> > > hbase.client.scanner.timeout.period, which replaces the
> > > deprecated hbase.regionserver.lease.period.
> > >
> > > The code path that throws this LeaseException was removed by
> > > HBASE-16604; maybe you can try to upgrade your cluster to 1.1.7? Your
> > > client can also upgrade to 1.1.7, which will ignore
> > > UnknownScannerException and retry when the lease has expired at the
> > > server.
> > >
> > > Thanks,
> > > Phil
> > >
> > >
> > > 2016-12-22 16:51 GMT+08:00 Rajeshkumar J <rajeshkumarit8...@gmail.com
> >:
> > >
> > > > Also, there is a solution I have found in the HBase user guide:
> > > > hbase.rpc.timeout must be greater than
> > > > hbase.client.scanner.timeout.period.
> > > > How do these two properties play a part in the above exception? Can
> > > > anyone explain?
> > > >
> > > > On Wed, Dec 21, 2016 at 9:39 PM, Rajeshkumar J <
> > > > rajeshkumarit8...@gmail.com>
> > > > wrote:
> > > >
> > > > > I am using HBase version 1.1.1.
> > > > > Also, I didn't understand something here. Whenever scanner.next()
> > > > > is called, it needs to return rows (based on the caching value)
> > > > > within the lease period, or else the server-side scanner will be
> > > > > closed, eventually throwing this exception. Correct me if I'm
> > > > > wrong, as I don't have a clear understanding of this issue.
> > > > >
> > > > > On Wed, Dec 21, 2016 at 7:31 PM, Ted Yu <yuzhih...@gmail.com>
> wrote:
> > > > >
> > > > >> Which hbase release are you using ?
> > > > >>
> > > > >> There is heartbeat support when scanning.
> > > > >> Looks like the version you use doesn't have this support.
> > > > >>
> > > > >> Cheers
> > > > >>
> > > > >> > On Dec 21, 2016, at 4:02 AM, Rajeshkumar J <
> > > > rajeshkumarit8...@gmail.com>
> > > > >> wrote:
> > > > >> >
> > > > >> > Hi,
> > > > >> >
> > > > >> >   Thanks for the reply. I have properties as below
> > > > >> >
> > > > >> > <property>
> > > > >> >   <name>hbase.regionserver.lease.period</name>
> > > > >> >   <value>90</value>
> > > > >> > </property>
> > > > >> > <property>
> > > > >> >   <name>hbase.rpc.timeout</name>
> > > > >> >   <value>90>/value>
> > > > >> > </property>
> > > > >> >
> > > > >> >
> > > > >> > Correct me If I am wrong.
> > > > >> >
> > > > >> > I know hbase.regionserver.lease.period, which says how long a
> > > > >> > scanner lives between calls to scanner.next().

Re: Lease exception

2016-12-22 Thread Phil Yang
https://github.com/apache/hbase/blob/rel/1.1.1/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java#L2491

There is a TTL for scanners at the server, to prevent leaks when a client
does not close its scanners. The TTL is configured by
hbase.client.scanner.timeout.period at the server and is refreshed whenever
a scan RPC request arrives. The TTLs of all scanners are managed by Leases.
Your error happens when the server closes a scanner whose lease has already
expired. So I think you can try to increase
hbase.client.scanner.timeout.period at the server, or decrease
hbase.client.scanner.timeout.period at the client, to prevent the scanner
from expiring before its scan is done.
hbase.client.scanner.timeout.period is used at both the client and the
server, and the two can differ if you change only one side.

BTW, I still suggest that you upgrade your cluster and client; 1.1.1
has some data-loss bugs in scanning.
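
As a hedged illustration of the server-side half of that advice (the
120000 ms value is only an example, not a recommendation from this thread),
the property would go in the region server's hbase-site.xml:

    <property>
      <name>hbase.client.scanner.timeout.period</name>
      <value>120000</value>
    </property>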

Thanks,
Phil


2016-12-22 17:26 GMT+08:00 Rajeshkumar J <rajeshkumarit8...@gmail.com>:

> Can you please explain what the cause of this lease exception is, and is
> there any way to solve it in the current version?
>
> Thanks
>
> On Thu, Dec 22, 2016 at 2:54 PM, Phil Yang <ud1...@gmail.com> wrote:
>
> > In fact, at the client the RPC timeout of a scan request is also
> > hbase.client.scanner.timeout.period, which replaces the
> > deprecated hbase.regionserver.lease.period.
> >
> > The code path that throws this LeaseException was removed by
> > HBASE-16604; maybe you can try to upgrade your cluster to 1.1.7? Your
> > client can also upgrade to 1.1.7, which will ignore
> > UnknownScannerException and retry when the lease has expired at the
> > server.
> >
> > Thanks,
> > Phil
> >
> >
> > 2016-12-22 16:51 GMT+08:00 Rajeshkumar J <rajeshkumarit8...@gmail.com>:
> >
> > > Also, there is a solution I have found in the HBase user guide:
> > > hbase.rpc.timeout must be greater than
> > > hbase.client.scanner.timeout.period.
> > > How do these two properties play a part in the above exception? Can
> > > anyone explain?
> > >
> > > On Wed, Dec 21, 2016 at 9:39 PM, Rajeshkumar J <
> > > rajeshkumarit8...@gmail.com>
> > > wrote:
> > >
> > > > I am using HBase version 1.1.1.
> > > > Also, I didn't understand something here. Whenever scanner.next() is
> > > > called, it needs to return rows (based on the caching value) within
> > > > the lease period, or else the server-side scanner will be closed,
> > > > eventually throwing this exception. Correct me if I'm wrong, as I
> > > > don't have a clear understanding of this issue.
> > > >
> > > > On Wed, Dec 21, 2016 at 7:31 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > >
> > > >> Which hbase release are you using ?
> > > >>
> > > >> There is heartbeat support when scanning.
> > > >> Looks like the version you use doesn't have this support.
> > > >>
> > > >> Cheers
> > > >>
> > > >> > On Dec 21, 2016, at 4:02 AM, Rajeshkumar J <
> > > rajeshkumarit8...@gmail.com>
> > > >> wrote:
> > > >> >
> > > >> > Hi,
> > > >> >
> > > >> >   Thanks for the reply. I have properties as below
> > > >> >
> > > >> > <property>
> > > >> >   <name>hbase.regionserver.lease.period</name>
> > > >> >   <value>90</value>
> > > >> > </property>
> > > >> > <property>
> > > >> >   <name>hbase.rpc.timeout</name>
> > > >> >   <value>90>/value>
> > > >> > </property>
> > > >> >
> > > >> >
> > > >> > Correct me If I am wrong.
> > > >> >
> > > >> > I know hbase.regionserver.lease.period, which says how long a
> > scanner
> > > >> > lives between calls to scanner.next().
> > > >> >
> > > >> > As far as I understand, when scanner.next() is called it will
> > > >> > fetch the number of rows given by hbase.client.scanner.caching.
> > > >> > When this fetching process takes longer than the lease period, the
> > > >> > scanner object is closed. Is that why this exception is occurring?
> > > >> >
> > > >> >
> > > >> > Thanks,
> > > >> >
> > > >> > Rajeshkumar J
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Wed, Dec 21, 

Re: Lease exception

2016-12-22 Thread Rajeshkumar J
Can you please explain what the cause of this lease exception is, and is
there any way to solve it in the current version?

Thanks

On Thu, Dec 22, 2016 at 2:54 PM, Phil Yang <ud1...@gmail.com> wrote:

> In fact, at the client the RPC timeout of a scan request is also
> hbase.client.scanner.timeout.period, which replaces the
> deprecated hbase.regionserver.lease.period.
>
> The code path that throws this LeaseException was removed by HBASE-16604;
> maybe you can try to upgrade your cluster to 1.1.7? Your client can also
> upgrade to 1.1.7, which will ignore UnknownScannerException and retry when
> the lease has expired at the server.
>
> Thanks,
> Phil
>
>
> 2016-12-22 16:51 GMT+08:00 Rajeshkumar J <rajeshkumarit8...@gmail.com>:
>
> > Also, there is a solution I have found in the HBase user guide:
> > hbase.rpc.timeout must be greater than
> > hbase.client.scanner.timeout.period.
> > How do these two properties play a part in the above exception? Can
> > anyone explain?
> >
> > On Wed, Dec 21, 2016 at 9:39 PM, Rajeshkumar J <
> > rajeshkumarit8...@gmail.com>
> > wrote:
> >
> > > I am using HBase version 1.1.1.
> > > Also, I didn't understand something here. Whenever scanner.next() is
> > > called, it needs to return rows (based on the caching value) within
> > > the lease period, or else the server-side scanner will be closed,
> > > eventually throwing this exception. Correct me if I'm wrong, as I
> > > don't have a clear understanding of this issue.
> > >
> > > On Wed, Dec 21, 2016 at 7:31 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > >> Which hbase release are you using ?
> > >>
> > >> There is heartbeat support when scanning.
> > >> Looks like the version you use doesn't have this support.
> > >>
> > >> Cheers
> > >>
> > >> > On Dec 21, 2016, at 4:02 AM, Rajeshkumar J <
> > rajeshkumarit8...@gmail.com>
> > >> wrote:
> > >> >
> > >> > Hi,
> > >> >
> > >> >   Thanks for the reply. I have properties as below
> > >> >
> > >> > <property>
> > >> >   <name>hbase.regionserver.lease.period</name>
> > >> >   <value>90</value>
> > >> > </property>
> > >> > <property>
> > >> >   <name>hbase.rpc.timeout</name>
> > >> >   <value>90>/value>
> > >> > </property>
> > >> >
> > >> >
> > >> > Correct me If I am wrong.
> > >> >
> > >> > I know hbase.regionserver.lease.period, which says how long a
> scanner
> > >> > lives between calls to scanner.next().
> > >> >
> > >> > As far as I understand, when scanner.next() is called it will fetch
> > >> > the number of rows given by hbase.client.scanner.caching. When this
> > >> > fetching process takes longer than the lease period, the scanner
> > >> > object is closed. Is that why this exception is occurring?
> > >> >
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Rajeshkumar J
> > >> >
> > >> >
> > >> >
> > >> > On Wed, Dec 21, 2016 at 5:07 PM, Richard Startin <
> > >> richardstar...@outlook.com
> > >> >> wrote:
> > >> >
> > >> >> It means your lease on a region server has expired during a call to
> > >> >> resultscanner.next(). This happens on a slow call to next(). You
> can
> > >> either
> > >> >> embrace it or "fix" it by making sure hbase.rpc.timeout exceeds
> > >> >> hbase.regionserver.lease.period.
> > >> >>
> > >> >> https://richardstartin.com
> > >> >>
> > >> >> On 21 Dec 2016, at 11:30, Rajeshkumar J <
> rajeshkumarit8...@gmail.com
> > <
> > >> >> mailto:rajeshkumarit8...@gmail.com>> wrote:
> > >> >>
> > >> >> Hi,
> > >> >>
> > >> >>  I have faced below issue in our production cluster
> > >> >>
> > >> >> org.apache.hadoop.hbase.regionserver.LeaseException:
> > >> >> org.apache.hadoop.hbase.regionserver.LeaseException: lease
> '166881'
> > >> does
> > >> >> not exist
> > >> >> at org.apache.hadoop.hbase.regionserver.Leases.
> > >> >> removeLease(Leases.java:221)
> > >> >> at org.apache.hadoop.hbase.regionserver.Leases.
> > >> >> cancelLease(Leases.java:206)
> > >> >> at
> > >> >> org.apache.hadoop.hbase.regionserver.RSRpcServices.
> > >> >> scan(RSRpcServices.java:2491)
> > >> >> at
> > >> >> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$
> > >> ClientService$2.
> > >> >> callBlockingMethod(ClientProtos.java:32205)
> > >> >> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
> > >> >> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
> > >> >> at
> > >> >> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExec
> > >> utor.java:130)
> > >> >> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.
> > java:107)
> > >> >> at java.lang.Thread.run(Thread.java:744)
> > >> >>
> > >> >>
> > >> >> Can any one explain what is lease exception
> > >> >>
> > >> >> Thanks,
> > >> >> Rajeshkumar J
> > >> >>
> > >>
> > >
> > >
> >
>


Re: Lease exception

2016-12-22 Thread Phil Yang
In fact, at the client the RPC timeout of a scan request is also
hbase.client.scanner.timeout.period, which replaces the
deprecated hbase.regionserver.lease.period.

The code path that throws this LeaseException was removed by HBASE-16604;
maybe you can try to upgrade your cluster to 1.1.7? Your client can also
upgrade to 1.1.7, which will ignore UnknownScannerException and retry when
the lease has expired at the server.

Thanks,
Phil


2016-12-22 16:51 GMT+08:00 Rajeshkumar J <rajeshkumarit8...@gmail.com>:

> Also, there is a solution I have found in the HBase user guide:
> hbase.rpc.timeout must be greater than
> hbase.client.scanner.timeout.period.
> How do these two properties play a part in the above exception? Can
> anyone explain?
>
> On Wed, Dec 21, 2016 at 9:39 PM, Rajeshkumar J <
> rajeshkumarit8...@gmail.com>
> wrote:
>
> > I am using HBase version 1.1.1.
> > Also, I didn't understand something here. Whenever scanner.next() is
> > called, it needs to return rows (based on the caching value) within the
> > lease period, or else the server-side scanner will be closed, eventually
> > throwing this exception. Correct me if I'm wrong, as I don't have a
> > clear understanding of this issue.
> >
> > On Wed, Dec 21, 2016 at 7:31 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> >> Which hbase release are you using ?
> >>
> >> There is heartbeat support when scanning.
> >> Looks like the version you use doesn't have this support.
> >>
> >> Cheers
> >>
> >> > On Dec 21, 2016, at 4:02 AM, Rajeshkumar J <
> rajeshkumarit8...@gmail.com>
> >> wrote:
> >> >
> >> > Hi,
> >> >
> >> >   Thanks for the reply. I have properties as below
> >> >
> >> > <property>
> >> >   <name>hbase.regionserver.lease.period</name>
> >> >   <value>90</value>
> >> > </property>
> >> > <property>
> >> >   <name>hbase.rpc.timeout</name>
> >> >   <value>90>/value>
> >> > </property>
> >> >
> >> >
> >> > Correct me If I am wrong.
> >> >
> >> > I know hbase.regionserver.lease.period, which says how long a
> >> > scanner lives between calls to scanner.next().
> >> > As far as I understand, when scanner.next() is called it will fetch
> >> > the number of rows given by hbase.client.scanner.caching. When this
> >> > fetching process takes longer than the lease period, the scanner
> >> > object is closed. Is that why this exception is occurring?
> >> >
> >> >
> >> > Thanks,
> >> >
> >> > Rajeshkumar J
> >> >
> >> >
> >> >
> >> > On Wed, Dec 21, 2016 at 5:07 PM, Richard Startin <
> >> richardstar...@outlook.com
> >> >> wrote:
> >> >
> >> >> It means your lease on a region server has expired during a call to
> >> >> resultscanner.next(). This happens on a slow call to next(). You can
> >> either
> >> >> embrace it or "fix" it by making sure hbase.rpc.timeout exceeds
> >> >> hbase.regionserver.lease.period.
> >> >>
> >> >> https://richardstartin.com
> >> >>
> >> >> On 21 Dec 2016, at 11:30, Rajeshkumar J <rajeshkumarit8...@gmail.com
> <
> >> >> mailto:rajeshkumarit8...@gmail.com>> wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >>  I have faced below issue in our production cluster
> >> >>
> >> >> org.apache.hadoop.hbase.regionserver.LeaseException:
> >> >> org.apache.hadoop.hbase.regionserver.LeaseException: lease '166881'
> >> does
> >> >> not exist
> >> >> at org.apache.hadoop.hbase.regionserver.Leases.
> >> >> removeLease(Leases.java:221)
> >> >> at org.apache.hadoop.hbase.regionserver.Leases.
> >> >> cancelLease(Leases.java:206)
> >> >> at
> >> >> org.apache.hadoop.hbase.regionserver.RSRpcServices.
> >> >> scan(RSRpcServices.java:2491)
> >> >> at
> >> >> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$
> >> ClientService$2.
> >> >> callBlockingMethod(ClientProtos.java:32205)
> >> >> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
> >> >> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
> >> >> at
> >> >> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExec
> >> utor.java:130)
> >> >> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.
> java:107)
> >> >> at java.lang.Thread.run(Thread.java:744)
> >> >>
> >> >>
> >> >> Can any one explain what is lease exception
> >> >>
> >> >> Thanks,
> >> >> Rajeshkumar J
> >> >>
> >>
> >
> >
>


Re: Lease exception

2016-12-22 Thread Rajeshkumar J
Also, there is a solution I have found in the HBase user guide:
hbase.rpc.timeout must be greater than hbase.client.scanner.timeout.period.
How do these two properties play a part in the above exception? Can anyone
explain?

On Wed, Dec 21, 2016 at 9:39 PM, Rajeshkumar J <rajeshkumarit8...@gmail.com>
wrote:

> I am using HBase version 1.1.1.
> Also, I didn't understand something here. Whenever scanner.next() is
> called, it needs to return rows (based on the caching value) within the
> lease period, or else the server-side scanner will be closed, eventually
> throwing this exception. Correct me if I'm wrong, as I don't have a clear
> understanding of this issue.
>
> On Wed, Dec 21, 2016 at 7:31 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Which hbase release are you using ?
>>
>> There is heartbeat support when scanning.
>> Looks like the version you use doesn't have this support.
>>
>> Cheers
>>
>> > On Dec 21, 2016, at 4:02 AM, Rajeshkumar J <rajeshkumarit8...@gmail.com>
>> wrote:
>> >
>> > Hi,
>> >
>> >   Thanks for the reply. I have properties as below
>> >
>> > <property>
>> >   <name>hbase.regionserver.lease.period</name>
>> >   <value>90</value>
>> > </property>
>> > <property>
>> >   <name>hbase.rpc.timeout</name>
>> >   <value>90>/value>
>> > </property>
>> >
>> >
>> > Correct me If I am wrong.
>> >
>> > I know hbase.regionserver.lease.period, which says how long a scanner
>> > lives between calls to scanner.next().
>> >
>> > As far as I understand, when scanner.next() is called it will fetch
>> > the number of rows given by hbase.client.scanner.caching. When this
>> > fetching process takes longer than the lease period, the scanner
>> > object is closed. Is that why this exception is occurring?
>> >
>> >
>> > Thanks,
>> >
>> > Rajeshkumar J
>> >
>> >
>> >
>> > On Wed, Dec 21, 2016 at 5:07 PM, Richard Startin <
>> richardstar...@outlook.com
>> >> wrote:
>> >
>> >> It means your lease on a region server has expired during a call to
>> >> resultscanner.next(). This happens on a slow call to next(). You can
>> either
>> >> embrace it or "fix" it by making sure hbase.rpc.timeout exceeds
>> >> hbase.regionserver.lease.period.
>> >>
>> >> https://richardstartin.com
>> >>
>> >> On 21 Dec 2016, at 11:30, Rajeshkumar J <rajeshkumarit8...@gmail.com<
>> >> mailto:rajeshkumarit8...@gmail.com>> wrote:
>> >>
>> >> Hi,
>> >>
>> >>  I have faced below issue in our production cluster
>> >>
>> >> org.apache.hadoop.hbase.regionserver.LeaseException:
>> >> org.apache.hadoop.hbase.regionserver.LeaseException: lease '166881'
>> does
>> >> not exist
>> >> at org.apache.hadoop.hbase.regionserver.Leases.
>> >> removeLease(Leases.java:221)
>> >> at org.apache.hadoop.hbase.regionserver.Leases.
>> >> cancelLease(Leases.java:206)
>> >> at
>> >> org.apache.hadoop.hbase.regionserver.RSRpcServices.
>> >> scan(RSRpcServices.java:2491)
>> >> at
>> >> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$
>> ClientService$2.
>> >> callBlockingMethod(ClientProtos.java:32205)
>> >> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
>> >> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>> >> at
>> >> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExec
>> utor.java:130)
>> >> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>> >> at java.lang.Thread.run(Thread.java:744)
>> >>
>> >>
>> >> Can any one explain what is lease exception
>> >>
>> >> Thanks,
>> >> Rajeshkumar J
>> >>
>>
>
>


Re: Lease exception

2016-12-21 Thread Rajeshkumar J
I am using HBase version 1.1.1.
Also, I didn't understand something here. Whenever scanner.next() is called,
it needs to return rows (based on the caching value) within the lease
period, or else the server-side scanner will be closed, eventually throwing
this exception. Correct me if I'm wrong, as I don't have a clear
understanding of this issue.
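
For illustration, a minimal sketch of that loop, assuming the HBase 1.x
client API (the open Table named "table" and the process() helper are
hypothetical): next() issues a scan RPC whenever its client-side cache is
empty, and each such RPC refreshes the server-side lease, so it is a long
pause between RPCs, not the scan as a whole, that lets the lease expire:

    try (ResultScanner scanner = table.getScanner(new Scan())) {
        Result result;
        while ((result = scanner.next()) != null) {
            // If this processing stalls longer than the lease period
            // before the next scan RPC, the server may expire and close
            // the scanner, and a later call can hit a LeaseException.
            process(result);
        }
    }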

On Wed, Dec 21, 2016 at 7:31 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Which hbase release are you using ?
>
> There is heartbeat support when scanning.
> Looks like the version you use doesn't have this support.
>
> Cheers
>
> > On Dec 21, 2016, at 4:02 AM, Rajeshkumar J <rajeshkumarit8...@gmail.com>
> wrote:
> >
> > Hi,
> >
> >   Thanks for the reply. I have properties as below
> >
> > <property>
> >   <name>hbase.regionserver.lease.period</name>
> >   <value>90</value>
> > </property>
> > <property>
> >   <name>hbase.rpc.timeout</name>
> >   <value>90>/value>
> > </property>
> >
> >
> > Correct me If I am wrong.
> >
> > I know hbase.regionserver.lease.period, which says how long a scanner
> > lives between calls to scanner.next().
> >
> > As far as I understand, when scanner.next() is called it will fetch
> > the number of rows given by hbase.client.scanner.caching. When this
> > fetching process takes longer than the lease period, the scanner
> > object is closed. Is that why this exception is occurring?
> >
> >
> > Thanks,
> >
> > Rajeshkumar J
> >
> >
> >
> > On Wed, Dec 21, 2016 at 5:07 PM, Richard Startin <
> richardstar...@outlook.com
> >> wrote:
> >
> >> It means your lease on a region server has expired during a call to
> >> resultscanner.next(). This happens on a slow call to next(). You can
> either
> >> embrace it or "fix" it by making sure hbase.rpc.timeout exceeds
> >> hbase.regionserver.lease.period.
> >>
> >> https://richardstartin.com
> >>
> >> On 21 Dec 2016, at 11:30, Rajeshkumar J <rajeshkumarit8...@gmail.com<
> >> mailto:rajeshkumarit8...@gmail.com>> wrote:
> >>
> >> Hi,
> >>
> >>  I have faced below issue in our production cluster
> >>
> >> org.apache.hadoop.hbase.regionserver.LeaseException:
> >> org.apache.hadoop.hbase.regionserver.LeaseException: lease '166881'
> does
> >> not exist
> >> at org.apache.hadoop.hbase.regionserver.Leases.
> >> removeLease(Leases.java:221)
> >> at org.apache.hadoop.hbase.regionserver.Leases.
> >> cancelLease(Leases.java:206)
> >> at
> >> org.apache.hadoop.hbase.regionserver.RSRpcServices.
> >> scan(RSRpcServices.java:2491)
> >> at
> >> org.apache.hadoop.hbase.protobuf.generated.
> ClientProtos$ClientService$2.
> >> callBlockingMethod(ClientProtos.java:32205)
> >> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
> >> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
> >> at
> >> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(
> RpcExecutor.java:130)
> >> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> >> at java.lang.Thread.run(Thread.java:744)
> >>
> >>
> >> Can any one explain what is lease exception
> >>
> >> Thanks,
> >> Rajeshkumar J
> >>
>


Re: Lease exception

2016-12-21 Thread Ted Yu
Which HBase release are you using?

There is heartbeat support when scanning.
It looks like the version you are using doesn't have this support.

Cheers

> On Dec 21, 2016, at 4:02 AM, Rajeshkumar J <rajeshkumarit8...@gmail.com> 
> wrote:
> 
> Hi,
> 
>   Thanks for the reply. I have properties as below
> 
> <property>
>   <name>hbase.regionserver.lease.period</name>
>   <value>90</value>
> </property>
> <property>
>   <name>hbase.rpc.timeout</name>
>   <value>90>/value>
> </property>
> 
> 
> Correct me If I am wrong.
> 
> I know hbase.regionserver.lease.period, which says how long a scanner
> lives between calls to scanner.next().
> 
> As far as I understand, when scanner.next() is called it will fetch the
> number of rows given by hbase.client.scanner.caching. When this fetching
> process takes longer than the lease period, the scanner object is closed.
> Is that why this exception is occurring?
> 
> 
> Thanks,
> 
> Rajeshkumar J
> 
> 
> 
> On Wed, Dec 21, 2016 at 5:07 PM, Richard Startin <richardstar...@outlook.com
>> wrote:
> 
>> It means your lease on a region server has expired during a call to
>> resultscanner.next(). This happens on a slow call to next(). You can either
>> embrace it or "fix" it by making sure hbase.rpc.timeout exceeds
>> hbase.regionserver.lease.period.
>> 
>> https://richardstartin.com
>> 
>> On 21 Dec 2016, at 11:30, Rajeshkumar J <rajeshkumarit8...@gmail.com<
>> mailto:rajeshkumarit8...@gmail.com>> wrote:
>> 
>> Hi,
>> 
>>  I have faced below issue in our production cluster
>> 
>> org.apache.hadoop.hbase.regionserver.LeaseException:
>> org.apache.hadoop.hbase.regionserver.LeaseException: lease '166881' does
>> not exist
>> at org.apache.hadoop.hbase.regionserver.Leases.
>> removeLease(Leases.java:221)
>> at org.apache.hadoop.hbase.regionserver.Leases.
>> cancelLease(Leases.java:206)
>> at
>> org.apache.hadoop.hbase.regionserver.RSRpcServices.
>> scan(RSRpcServices.java:2491)
>> at
>> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.
>> callBlockingMethod(ClientProtos.java:32205)
>> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
>> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>> at
>> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>> at java.lang.Thread.run(Thread.java:744)
>> 
>> 
>> Can any one explain what is lease exception
>> 
>> Thanks,
>> Rajeshkumar J
>> 


Re: Lease exception

2016-12-21 Thread Richard Startin
If your client caching is set to a large value, you will need to do a long scan
occasionally, and the RPC itself will be expensive in terms of I/O. So it's
worth looking at hbase.client.scanner.caching to see if it is too large. If
you're scanning the whole table, check that you aren't churning the block cache.
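
A hedged sketch of both checks, assuming the HBase 1.x client API (the
caching value is illustrative):

    Scan scan = new Scan();
    scan.setCaching(500);        // rows fetched per scan RPC (example value)
    scan.setCacheBlocks(false);  // avoid churning the block cache on a full scan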

The XML below looks wrong; was that copied verbatim from your site file?

https://richardstartin.com

> On 21 Dec 2016, at 12:02, Rajeshkumar J <rajeshkumarit8...@gmail.com> wrote:
> 
> Hi,
> 
>   Thanks for the reply. I have properties as below
> 
> <property>
>   <name>hbase.regionserver.lease.period</name>
>   <value>90</value>
> </property>
> <property>
>   <name>hbase.rpc.timeout</name>
>   <value>90>/value>
> </property>
> 
> 
> Correct me If I am wrong.
> 
> I know hbase.regionserver.lease.period, which says how long a scanner
> lives between calls to scanner.next().
> 
> As far as I understand, when scanner.next() is called it will fetch the
> number of rows given by hbase.client.scanner.caching. When this fetching
> process takes longer than the lease period, the scanner object is closed.
> Is that why this exception is occurring?
> 
> 
> Thanks,
> 
> Rajeshkumar J
> 
> 
> 
> On Wed, Dec 21, 2016 at 5:07 PM, Richard Startin <richardstar...@outlook.com
>> wrote:
> 
>> It means your lease on a region server has expired during a call to
>> resultscanner.next(). This happens on a slow call to next(). You can either
>> embrace it or "fix" it by making sure hbase.rpc.timeout exceeds
>> hbase.regionserver.lease.period.
>> 
>> https://richardstartin.com
>> 
>> On 21 Dec 2016, at 11:30, Rajeshkumar J <rajeshkumarit8...@gmail.com<
>> mailto:rajeshkumarit8...@gmail.com>> wrote:
>> 
>> Hi,
>> 
>>  I have faced below issue in our production cluster
>> 
>> org.apache.hadoop.hbase.regionserver.LeaseException:
>> org.apache.hadoop.hbase.regionserver.LeaseException: lease '166881' does
>> not exist
>> at org.apache.hadoop.hbase.regionserver.Leases.
>> removeLease(Leases.java:221)
>> at org.apache.hadoop.hbase.regionserver.Leases.
>> cancelLease(Leases.java:206)
>> at
>> org.apache.hadoop.hbase.regionserver.RSRpcServices.
>> scan(RSRpcServices.java:2491)
>> at
>> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.
>> callBlockingMethod(ClientProtos.java:32205)
>> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
>> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>> at
>> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>> at java.lang.Thread.run(Thread.java:744)
>> 
>> 
>> Can any one explain what is lease exception
>> 
>> Thanks,
>> Rajeshkumar J
>> 


Re: Lease exception

2016-12-21 Thread Rajeshkumar J
Hi,

   Thanks for the reply. I have properties as below


<property>
  <name>hbase.regionserver.lease.period</name>
  <value>90</value>
</property>
<property>
  <name>hbase.rpc.timeout</name>
  <value>90>/value>
</property>


Correct me If I am wrong.

I know hbase.regionserver.lease.period, which says how long a scanner
lives between calls to scanner.next().

As far as I understand, when scanner.next() is called it will fetch the
number of rows given by hbase.client.scanner.caching. When this fetching
process takes longer than the lease period, the scanner object is closed.
Is that why this exception is occurring?


Thanks,

Rajeshkumar J



On Wed, Dec 21, 2016 at 5:07 PM, Richard Startin <richardstar...@outlook.com
> wrote:

> It means your lease on a region server has expired during a call to
> resultscanner.next(). This happens on a slow call to next(). You can either
> embrace it or "fix" it by making sure hbase.rpc.timeout exceeds
> hbase.regionserver.lease.period.
>
> https://richardstartin.com
>
> On 21 Dec 2016, at 11:30, Rajeshkumar J <rajeshkumarit8...@gmail.com<
> mailto:rajeshkumarit8...@gmail.com>> wrote:
>
> Hi,
>
>   I have faced below issue in our production cluster
>
> org.apache.hadoop.hbase.regionserver.LeaseException:
> org.apache.hadoop.hbase.regionserver.LeaseException: lease '166881' does
> not exist
> at org.apache.hadoop.hbase.regionserver.Leases.
> removeLease(Leases.java:221)
> at org.apache.hadoop.hbase.regionserver.Leases.
> cancelLease(Leases.java:206)
> at
> org.apache.hadoop.hbase.regionserver.RSRpcServices.
> scan(RSRpcServices.java:2491)
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.
> callBlockingMethod(ClientProtos.java:32205)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
> at
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> at java.lang.Thread.run(Thread.java:744)
>
>
> Can any one explain what is lease exception
>
> Thanks,
> Rajeshkumar J
>


Re: Lease exception

2016-12-21 Thread Richard Startin
It means your lease on a region server has expired during a call to 
resultscanner.next(). This happens on a slow call to next(). You can either 
embrace it or "fix" it by making sure hbase.rpc.timeout exceeds 
hbase.regionserver.lease.period.
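
A hedged example of that relationship in hbase-site.xml (the values are
illustrative only; the point is that hbase.rpc.timeout exceeds the lease
period):

    <property>
      <name>hbase.regionserver.lease.period</name>
      <value>120000</value>
    </property>
    <property>
      <name>hbase.rpc.timeout</name>
      <value>180000</value>
    </property>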

https://richardstartin.com

On 21 Dec 2016, at 11:30, Rajeshkumar J 
<rajeshkumarit8...@gmail.com<mailto:rajeshkumarit8...@gmail.com>> wrote:

Hi,

  I have faced below issue in our production cluster

org.apache.hadoop.hbase.regionserver.LeaseException:
org.apache.hadoop.hbase.regionserver.LeaseException: lease '166881' does
not exist
at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:221)
at org.apache.hadoop.hbase.regionserver.Leases.cancelLease(Leases.java:206)
at
org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2491)
at
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at
org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:744)


Can any one explain what is lease exception

Thanks,
Rajeshkumar J


Lease exception

2016-12-21 Thread Rajeshkumar J
Hi,

   I have faced the below issue in our production cluster:

org.apache.hadoop.hbase.regionserver.LeaseException:
org.apache.hadoop.hbase.regionserver.LeaseException: lease '166881' does
not exist
at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:221)
at org.apache.hadoop.hbase.regionserver.Leases.cancelLease(Leases.java:206)
at
org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2491)
at
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at
org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:744)


Can anyone explain what a lease exception is?

Thanks,
Rajeshkumar J


Re: Lease exception when I execute large scan with filters.

2014-04-12 Thread Michael Segel
Silly question… 

Why does the idea of using versioning to capture temporal changes to data keep 
being propagated? 

Seriously this issue keeps popping up… 

If you want to capture data over time… use a timestamp as part of the column 
name.  Don’t abuse the cell’s version.
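
A minimal sketch of that schema idea, assuming the HBase 1.x client API (the
row key "sensor42", the column family "d", the "temp_" qualifier prefix, and
the open Table named "table" are all hypothetical): the timestamp becomes
part of the column qualifier, so each reading is its own column rather than
another cell version:

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    long ts = System.currentTimeMillis();
    Put put = new Put(Bytes.toBytes("sensor42"));
    // One column per timestamp instead of one cell version per timestamp.
    put.addColumn(Bytes.toBytes("d"),
                  Bytes.toBytes("temp_" + ts),
                  Bytes.toBytes("21.5"));
    table.put(put);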



On Apr 11, 2014, at 11:03 AM, gortiz gor...@pragsis.com wrote:

 Yes, I have tried two different values for the maximum number of versions:
 1000, and the maximum integer value.
 
 But I want to keep those versions; I don't want to keep just 3 versions.
 Imagine that I want to record a new version each minute and store a day:
 that is 1440 versions.
 
 Why is HBase going to read all the versions? I thought that if you don't
 request any versions it just reads the newest and skips the rest. It
 doesn't make much sense to read all of them if the data is sorted, since
 the newest version is stored at the top.
 
 
 On 11/04/14 11:54, Anoop John wrote:
 What max-versions setting have you applied to your table's column family?
 When you set such a value, HBase has to keep all those versions, and
 during a scan it will read all of them. In the 0.94 version the default
 for max versions is 3. I guess you have set some bigger value. If you have
 not, would you mind testing after a major compaction?
 
 -Anoop-
 
 On Fri, Apr 11, 2014 at 1:01 PM, gortiz gor...@pragsis.com wrote:
 
 The last test I have done is to reduce the number of versions to 100.
 So, right now, I have 100 rows with 100 versions each.
 Times are as follows (I got the same times for block sizes of 64 KB and 1 MB):
 100 rows, 1000 versions + block cache: 80s.
 100 rows, 1000 versions + no block cache: 70s.
 
 100 rows, *100* versions + block cache: 7.3s.
 100 rows, *100* versions + no block cache: 6.1s.
 
 What's the reason for this? I guessed HBase was smart enough not to
 consider old versions, so that it just checks the newest. But I reduced
 the size (in versions) by 10x and I got a 10x gain in performance.
 
 The filter is scan 'filters', {FILTER => ValueFilter(=,
 'binary:5'), STARTROW => '10100101',
 STOPROW => '60100201'}
 
 
 
 On 11/04/14 09:04, gortiz wrote:
 
  Well, I guessed that, but it doesn't make much sense because it's so
  slow. Right now I only have 100 rows with 1000 versions each.
  I have checked the size of the dataset and each row is about 700 KB
  (around 7 GB, 100 rows x 1000 versions). So, it should only check 100
  rows x 700 KB = 70 MB, since it just checks the newest version. How can
  it spend so much time checking this quantity of data?
  
  I'm generating the dataset again with a bigger block size (previously it
  was 64 KB; now it's going to be 1 MB). I could try tuning the scanning
  and batching parameters, but I don't think they're going to affect it
  much.
  
  Another test I want to do is to generate the same dataset with just 100
  versions. It should take around the same time, right? Or am I wrong?
 
 On 10/04/14 18:08, Ted Yu wrote:
 
 It should be the newest version of each value.
 
 Cheers
 
 
 On Thu, Apr 10, 2014 at 9:55 AM, gortiz gor...@pragsis.com wrote:
 
  Another little question: with the filter I'm using, do I check all the
  versions or just the newest? Because I'm wondering whether, when I do a
  scan over the whole table, I look for the value 5 in the whole dataset
  or just in the newest version of each value.
 
 
 On 10/04/14 16:52, gortiz wrote:
 
  I was trying to check the behaviour of HBase. The cluster is a group of
  old computers: one master and five slaves, each one with 2 GB, so 12 GB
  in total.
  The table has a column family with 1000 columns, and each column has 100
  versions.
  There's another column family with four columns and one image of 100 KB.
  (I've tried without this column family as well.)
  The table is partitioned manually across all the slaves, so data are
  balanced in the cluster.
  
  I'm executing the statement *scan 'table1', {FILTER => ValueFilter(=,
  'binary:5')}* in HBase 0.94.6.
  My timeout for lease and RPC is three minutes.
  Since it's a full scan of the table, I have been playing with the
  BLOCKCACHE as well (just disabling and enabling it, not changing its
  size). I thought that it was going to cause too many calls to the GC.
  I'm not sure about this point.
  
  I know that it's not the best way to use HBase; it's just a test. I
  think that it's not working because the hardware isn't enough, although
  I would like to try some kind of tuning to improve it.
 
 
 
 
 
 
 
 
 On 10/04/14 14:21, Ted Yu wrote:
 
 Can you give us a bit more information:
 HBase release you're running
 What filters are used for the scan
 
 Thanks
 
 On Apr 10, 2014, at 2:36 AM, gortiz gor...@pragsis.com wrote:
 
  I got this error when I execute a full scan with filters over a table.
 
 Caused by: java.lang.RuntimeException: org.apache.hadoop.hbase.
 regionserver.LeaseException:
 org.apache.hadoop.hbase.regionserver.LeaseException: lease
 '-4165751462641113359' does not exist
  at 
 

Re: Lease exception when I execute large scan with filters.

2014-04-12 Thread Guillermo Ortiz
Well, it was just an example of why I might keep a thousand versions of a
cell. I didn't know that HBase was checking each version when I do a scan;
it's a little weird when data is sorted.

You got my attention with your comment that it's better to store data over
time with new columns than with versions. Why is it better?
Versions look very convenient for that use case. So, does a rowkey with
3600 columns work better than a rowkey with a column with 3600 versions?
What's the reason for avoiding a massive use of versions?


2014-04-12 15:07 GMT+02:00 Michael Segel michael_se...@hotmail.com:

 Silly question...

 Why does the idea of using versioning to capture temporal changes to data
 keep being propagated?

 Seriously this issue keeps popping up...

 If you want to capture data over time... use a timestamp as part of the
 column name.  Don't abuse the cell's version.



 On Apr 11, 2014, at 11:03 AM, gortiz gor...@pragsis.com wrote:

  Yes, I have tried two different values for the maximum number of
  versions: 1000, and the maximum integer value.
  
  But I want to keep those versions; I don't want to keep just 3 versions.
  Imagine that I want to record a new version each minute and store a day:
  that is 1440 versions.
  
  Why is HBase going to read all the versions? I thought that if you don't
  request any versions it just reads the newest and skips the rest. It
  doesn't make much sense to read all of them if the data is sorted, since
  the newest version is stored at the top.
 
 
  On 11/04/14 11:54, Anoop John wrote:
  What max-versions setting have you applied to your table's column family?
  When you set such a value, HBase has to keep all those versions, and
  during a scan it will read all of them. In the 0.94 version the default
  for max versions is 3. I guess you have set some bigger value. If you
  have not, would you mind testing after a major compaction?
 
  -Anoop-
 
  On Fri, Apr 11, 2014 at 1:01 PM, gortiz gor...@pragsis.com wrote:
 
  The last test I have done is to reduce the number of versions to 100.
  So, right now, I have 100 rows with 100 versions each.
  Times are as follows (I got the same times for block sizes of 64 KB and 1 MB):
  100 rows, 1000 versions + block cache: 80s.
  100 rows, 1000 versions + no block cache: 70s.
  
  100 rows, *100* versions + block cache: 7.3s.
  100 rows, *100* versions + no block cache: 6.1s.
  
  What's the reason for this? I guessed HBase was smart enough not to
  consider old versions, so that it just checks the newest. But I reduced
  the size (in versions) by 10x and I got a 10x gain in performance.
  
  The filter is scan 'filters', {FILTER => ValueFilter(=,
  'binary:5'), STARTROW => '10100101',
  STOPROW => '60100201'}
 
 
 
  On 11/04/14 09:04, gortiz wrote:
 
  Well, I guessed that, but it doesn't make much sense because it's so
  slow. Right now I only have 100 rows with 1000 versions each.
  I have checked the size of the dataset and each row is about 700 KB
  (around 7 GB, 100 rows x 1000 versions). So, it should only check 100
  rows x 700 KB = 70 MB, since it just checks the newest version. How can
  it spend so much time checking this quantity of data?
  
  I'm generating the dataset again with a bigger block size (previously it
  was 64 KB; now it's going to be 1 MB). I could try tuning the scanning
  and batching parameters, but I don't think they're going to affect it
  much.
  
  Another test I want to do is to generate the same dataset with just 100
  versions. It should take around the same time, right? Or am I wrong?
 
  On 10/04/14 18:08, Ted Yu wrote:
 
  It should be the newest version of each value.
 
  Cheers
 
 
  On Thu, Apr 10, 2014 at 9:55 AM, gortiz gor...@pragsis.com wrote:
 
  Another little question: with the filter I'm using, do I check all the
  versions or just the newest? Because I'm wondering whether, when I do a
  scan over the whole table, I look for the value 5 in the whole dataset
  or just in the newest version of each value.
 
 
  On 10/04/14 16:52, gortiz wrote:
 
  I was trying to check the behaviour of HBase. The cluster is a group of
  old computers: one master and five slaves, each one with 2 GB, so 12 GB
  in total.
  The table has a column family with 1000 columns, and each column has 100
  versions.
  There's another column family with four columns and one image of 100 KB.
  (I've tried without this column family as well.)
  The table is partitioned manually across all the slaves, so data are
  balanced in the cluster.
  
  I'm executing the statement *scan 'table1', {FILTER => ValueFilter(=,
  'binary:5')}* in HBase 0.94.6.
  My timeout for lease and RPC is three minutes.
  Since it's a full scan of the table, I have been playing with the
  BLOCKCACHE as well (just disabling and enabling it, not changing its
  size). I thought that it was going to cause too many calls to the GC.
  I'm not sure about this point.
  
  I know that it's not the best way to use HBase; it's just a test. I
  think that it's not working because the hardware isn't enough, although
  I would like to try some kind of tuning to improve it.

Re: Lease exception when I execute large scan with filters.

2014-04-12 Thread Michael Segel
Since you asked… 

Simplest answer… your schema should not rely upon internal features of the 
system.  Since you are tracking your data along the lines of a temporal 
attribute it should be part of the schema. In terms of a good design, by making 
it a part of the schema, you’re defining that the data has a temporal 
property/attribute. 

Cell versioning is an internal feature of HBase. It's there for a reason.
Perhaps one of the committers should expand on why it's there. (When I asked
this earlier, I never got an answer.)


Longer answer… review how HBase stores the rows, including the versions of the 
cell. 
You’re putting an unnecessary stress on the system. 

It's just not Zen… ;-)

The reason I’m a bit short on this topic is that it's an issue that keeps coming
up, over and over again because some idiot keeps looking to take a shortcut 
without understanding the implications of their decision. Just like salting the 
key. (Note:  prepending a truncated hash isn’t the same as using a salt.  
Salting has a specific meaning and the salt is orthogonal to the underlying 
key. Any relationship between the salt and the key is purely random luck.) 

Does that help? 
(BTW, this should be part of any schema design talk… yet somehow I think it's
not covered…)

-Mike

PS. It's not weird that the cell versions are checked. It makes perfect sense.

On Apr 12, 2014, at 2:55 PM, Guillermo Ortiz konstt2...@gmail.com wrote:

 Well, it was just an example of why I might keep a thousand versions of a
 cell. I didn't know that HBase was checking each version when I do a scan;
 it's a little weird when data is sorted.
 
 You got my attention with your comment that it's better to store data over
 time with new columns than with versions. Why is it better?
 Versions look very convenient for that use case. So, does a rowkey with
 3600 columns work better than a rowkey with a column with 3600 versions?
 What's the reason for avoiding a massive use of versions?
 
 
 2014-04-12 15:07 GMT+02:00 Michael Segel michael_se...@hotmail.com:
 
 Silly question...
 
 Why does the idea of using versioning to capture temporal changes to data
 keep being propagated?
 
 Seriously this issue keeps popping up...
 
 If you want to capture data over time... use a timestamp as part of the
 column name.  Don't abuse the cell's version.
 
 
 
 On Apr 11, 2014, at 11:03 AM, gortiz gor...@pragsis.com wrote:
 
 Yes, I have tried two different values for the maximum number of versions:
 1000, and the maximum integer value.
 
 But I want to keep those versions; I don't want to keep just 3 versions.
 Imagine that I want to record a new version each minute and store a day:
 that is 1440 versions.
 
 Why is HBase going to read all the versions? I thought that if you don't
 request any versions it just reads the newest and skips the rest. It
 doesn't make much sense to read all of them if the data is sorted, since
 the newest version is stored at the top.
 
 
 On 11/04/14 11:54, Anoop John wrote:
  What max-versions setting have you applied to your table's column family?
  When you set such a value, HBase has to keep all those versions, and
  during a scan it will read all of them. In the 0.94 version the default
  for max versions is 3. I guess you have set some bigger value. If you
  have not, would you mind testing after a major compaction?
 
 -Anoop-
 
 On Fri, Apr 11, 2014 at 1:01 PM, gortiz gor...@pragsis.com wrote:
 
  The last test I have done is to reduce the number of versions to 100.
  So, right now, I have 100 rows with 100 versions each.
  Times are as follows (I got the same times for block sizes of 64 KB and 1 MB):
  100 rows, 1000 versions + block cache: 80s.
  100 rows, 1000 versions + no block cache: 70s.
  
  100 rows, *100* versions + block cache: 7.3s.
  100 rows, *100* versions + no block cache: 6.1s.
  
  What's the reason for this? I guessed HBase was smart enough not to
  consider old versions, so that it just checks the newest. But I reduced
  the size (in versions) by 10x and I got a 10x gain in performance.
  
  The filter is scan 'filters', {FILTER => ValueFilter(=,
  'binary:5'), STARTROW => '10100101',
  STOPROW => '60100201'}
 
 
 
 On 11/04/14 09:04, gortiz wrote:
 
  Well, I guessed that, but it doesn't make much sense because it's so
  slow. Right now I only have 100 rows with 1000 versions each.
  I have checked the size of the dataset and each row is about 700 KB
  (around 7 GB, 100 rows x 1000 versions). So, it should only check 100
  rows x 700 KB = 70 MB, since it just checks the newest version. How can
  it spend so much time checking this quantity of data?
  
  I'm generating the dataset again with a bigger block size (previously it
  was 64 KB; now it's going to be 1 MB). I could try tuning the scanning
  and batching parameters, but I don't think they're going to affect it
  much.
  
  Another test I want to do is to generate the same dataset with just 100
  versions. It should take around the same time, right? Or am I wrong?

Re: Lease exception when I execute large scan with filters.

2014-04-12 Thread Brian Jeltema
I don't want to be argumentative here, but by definition it's not an internal
feature, because it's part of the public API. We use versioning in a way that
makes me somewhat uncomfortable, but it's been quite useful. I'd like to see a
clear explanation of why it exists and what use cases it was intended to
support.

Brian

 Since you asked… 
 
 Simplest answer… your schema should not rely upon internal features of the 
 system.  Since you are tracking your data along the lines of a temporal 
 attribute it should be part of the schema. In terms of a good design, by 
 making it a part of the schema, you’re defining that the data has a temporal 
 property/attribute. 
 
 Cell versioning is an internal feature of HBase. It's there for a reason.
 Perhaps one of the committers should expand on why it's there. (When I asked
 this earlier, I never got an answer.)
 
 
 Longer answer… review how HBase stores the rows, including the versions of 
 the cell. 
 You’re putting an unnecessary stress on the system. 
 
 It's just not Zen… ;-)
 
 The reason I’m a bit short on this topic is that it's an issue that keeps
 coming up, over and over again because some idiot keeps looking to take a 
 shortcut without understanding the implications of their decision. Just like 
 salting the key. (Note:  prepending a truncated hash isn’t the same as using 
 a salt.  Salting has a specific meaning and the salt is orthogonal to the 
 underlying key. Any relationship between the salt and the key is purely 
 random luck.) 
 
 Does that help? 
 (BTW, this should be part of any schema design talk… yet somehow I think its 
 not covered… ) 
 
 -Mike
 
 PS. Its not weird that the cell versions are checked. It makes perfect sense. 
 
 On Apr 12, 2014, at 2:55 PM, Guillermo Ortiz konstt2...@gmail.com wrote:
 
 Well, It was just a example why I could keep a thousand versions or a cell.
 I didn't know that HBase was checking each version when I do a scan, it's a
 little weird when data is sorted.
 
 You get my attention with your comment, that it's better to store data over
 time with new columns that with versions. Why is it better?
 Versions looks that there're very convenient for that use case. So, does it
 work better a rowkey with 3600 columns, that a rowkey with a column with
 3600 versions? What's the reason for avoiding a massive use of versions?
 
 
 2014-04-12 15:07 GMT+02:00 Michael Segel michael_se...@hotmail.com:
 
 Silly question...
 
 Why does the idea of using versioning to capture temporal changes to data
 keep being propagated?
 
 Seriously this issue keeps popping up...
 
 If you want to capture data over time... use a timestamp as part of the
 column name.  Don't abuse the cell's version.
 
 
 
 On Apr 11, 2014, at 11:03 AM, gortiz gor...@pragsis.com wrote:
 
 Yes, I have tried with two different values for that value of versions,
 1000 and maximum value for integers.
 
 But, I want to keep those versions. I don't want to keep just 3
 versions. Imagine that I want to record a new version each minute and store
 a day, those are 1440 versions.
 
 Why is HBase going to read all the versions?? , I thought, if you don't
 indicate any versions it's just read the newest and skip the rest. It
 doesn't make too much sense to read all of them if data is sorted, plus the
 newest version is stored in the top.
 
 
 On 11/04/14 11:54, Anoop John wrote:
 What is the max version setting u have done for ur table cf?  When u set
 some a value, HBase has to keep all those versions.  During a scan it
 will
 read all those versions. In 94 version the default value for the max
 versions is 3.  I guess you have set some bigger value.   If u have not,
 mind testing after a major compaction?
 
 -Anoop-
 
 On Fri, Apr 11, 2014 at 1:01 PM, gortiz gor...@pragsis.com wrote:
 
 Last test I have done it's to reduce the number of versions to 100.
 So, right now, I have 100 rows with 100 versions each one.
 Times are: (I got the same times for blocksize of 64Ks and 1Mb)
 100row-1000versions + blockcache- 80s.
 100row-1000versions + No blockcache- 70s.
 
 100row-*100*versions + blockcache- 7.3s.
 100row-*100*versions + No blockcache- 6.1s.
 
 What's the reasons of this? I guess HBase is enough smart for not
 consider
 old versions, so, it just checks the newest. But, I reduce 10 times the
 size (in versions) and I got a 10x of performance.
 
 The filter is scan 'filters', {FILTER = ValueFilter(=,
 'binary:5'),STARTROW = '10100101',
 STOPROW = '60100201'}
 
 
 
 On 11/04/14 09:04, gortiz wrote:
 
 Well, I guessed that, what it doesn't make too much sense because
 it's so
 slow. I only have right now 100 rows with 1000 versions each row.
 I have checked the size of the dataset and each row is about 700Kbytes
 (around 7Gb, 100rowsx1000versions). So, it should only check 100 rows
 x
 700Kbytes = 70Mb, since it just check the newest version. How can it
 spend
 too many time checking this quantity of data?

Re: Lease exception when I execute large scan with filters.

2014-04-12 Thread Michael Segel
You do realize that it is an internal feature and that the public API can
change to no longer present access to it.
However, that wouldn’t be a good idea, because you would want to be able to
change it and in some cases review the versions of a cell. How else do you
describe versioning, which is unique to HBase and/or other specific databases,
yet temporal modeling is not?

In fact, if memory serves… going back to 2009-10, IIRC, there was the ‘old
API’ vs the ‘new API’ for Hadoop, where the ‘new API’ had a subset of the
exposed classes/methods of the old API? (It was an attempt to simplify the
API…) So again, APIs can change.

The point is that you should be modeling your data on time if it is
time-sensitive data. Using versioning bypasses this, with bad consequences.

By all means keep abusing the cell’s versioning.
Just don’t complain about poor performance and your HBase tossing exceptions
left and right. I mean, I can’t stop you from mixing booze, coke and meth. All
I can do is tell you that it’s not a good idea and not recommended.

If you want a good definition of why HBase has versioning… go ask StAck, Ted,
Nick or one of the committers, since they are more familiar with the internal
workings of HBase than I. When you get a good answer, then have the online
HBase book updated.

-Mike

PS… if you want a really good example of why not to use versioning to store
temporal data…
What happens if you’re storing 100 versions of a cell and you find out that
you have a duplicate entry with the wrong timestamp and you want to delete
that one version?
How do you do that? Going from memory, and I could very well be wrong, but the
tombstone marker is on the cell, not the version, right?

If it is on the version, what happens to the versions of the cell that are
older than the tombstone marker?
Sorry, it’s been a while since I’ve been intimate with HBase. Doing a bit of
other things at the moment, and I’m already overtaxing my last remaining
living brain cell.  ;-)
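
For what it's worth, a single version can be targeted if you know its exact
timestamp; a minimal sketch against the 0.94 client API (table, family,
qualifier and timestamp are all invented):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;

    public class DeleteOneVersion {
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "table1"); // hypothetical table
        try {
          Delete d = new Delete(Bytes.toBytes("row1"));
          // deleteColumn(family, qualifier, ts) marks exactly the version
          // written at that timestamp; deleteColumns (plural) would mark
          // every version at or older than the timestamp.
          d.deleteColumn(Bytes.toBytes("cf"), Bytes.toBytes("col1"),
              1397222400000L); // the duplicate's (assumed) timestamp
          table.delete(d);
        } finally {
          table.close();
        }
      }
    }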


On Apr 12, 2014, at 9:14 PM, Brian Jeltema bdjelt...@gmail.com wrote:

 I don't want to be argumentative here, but by definition is's not an internal 
 feature because it's part of the
 public API. We use versioning in a way that makes me somewhat uncomfortable, 
 but it's been quite
 useful. I'd like to see a clear explanation of why it exists and what use 
 cases it was intended to support.
 
 Brian
 
 Since you asked… 
 
 Simplest answer… your schema should not rely upon internal features of the 
 system.  Since you are tracking your data along the lines of a temporal 
 attribute it should be part of the schema. In terms of a good design, by 
 making it a part of the schema, you’re defining that the data has a temporal 
 property/attribute. 
 
 Cell versioning is an internal feature of HBase. Its there for a reason. 
 Perhaps one of the committers should expand on why its there.  (When I asked 
 this earlier, never got an answer. ) 
 
 
 Longer answer… review how HBase stores the rows, including the versions of 
 the cell. 
 You’re putting an unnecessary stress on the system. 
 
 Its just not Zen… ;-) 
 
 The reason I’m a bit short on this topic is that its an issue that keeps 
 coming up, over and over again because some idiot keeps looking to take a 
 shortcut without understanding the implications of their decision. Just like 
 salting the key. (Note:  prepending a truncated hash isn’t the same as using 
 a salt.  Salting has a specific meaning and the salt is orthogonal to the 
 underlying key. Any relationship between the salt and the key is purely 
 random luck.) 
 
 Does that help? 
 (BTW, this should be part of any schema design talk… yet somehow I think its 
 not covered… ) 
 
 -Mike
 
 PS. Its not weird that the cell versions are checked. It makes perfect 
 sense. 
 
 On Apr 12, 2014, at 2:55 PM, Guillermo Ortiz konstt2...@gmail.com wrote:
 
 Well, It was just a example why I could keep a thousand versions or a cell.
 I didn't know that HBase was checking each version when I do a scan, it's a
 little weird when data is sorted.
 
 You get my attention with your comment, that it's better to store data over
 time with new columns that with versions. Why is it better?
 Versions looks that there're very convenient for that use case. So, does it
 work better a rowkey with 3600 columns, that a rowkey with a column with
 3600 versions? What's the reason for avoiding a massive use of versions?
 
 
 2014-04-12 15:07 GMT+02:00 Michael Segel michael_se...@hotmail.com:
 
 Silly question...
 
 Why does the idea of using versioning to capture temporal changes to data
 keep being propagated?
 
 Seriously this issue keeps popping up...
 
 If you want to capture data over time... use a timestamp as part of the
 column name.  Don't abuse the cell's version.
 
 
 
 On Apr 11, 2014, at 11:03 AM, gortiz gor...@pragsis.com wrote:
 
 Yes, I have tried with two different values for that value of versions,
 1000 and maximum value for integers.
 
 But, I 

Re: Lease exception when I execute large scan with filters.

2014-04-12 Thread Ted Yu
HBase refguide has some explanation on internals w.r.t. versions:
http://hbase.apache.org/book.html#versions

bq. why HBase has versioning

This came from Bigtable. See the paragraph on page 3 of the OSDI paper:
http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf

The example use case from the above paper was to store 3 versions (i.e.
timestamps) of the contents column. The timestamps are

bq. the times at which these page versions were actually crawled.
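
A rough sketch of that use case in HBase terms (all names invented): the crawl
time becomes the cell timestamp, and the column family caps retention at three
versions.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WebtableSketch {
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTableDescriptor desc = new HTableDescriptor("webtable");
        HColumnDescriptor contents = new HColumnDescriptor("contents");
        contents.setMaxVersions(3); // keep the three most recent crawls
        desc.addFamily(contents);
        admin.createTable(desc);
        admin.close();

        HTable table = new HTable(conf, "webtable");
        Put put = new Put(Bytes.toBytes("com.example/index.html"));
        // the crawl time is the cell timestamp, as in the Bigtable paper
        put.add(Bytes.toBytes("contents"), Bytes.toBytes("html"),
            System.currentTimeMillis(), Bytes.toBytes("<html>...</html>"));
        table.put(put);
        table.close();
      }
    }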

Cheers


On Sat, Apr 12, 2014 at 2:14 PM, Michael Segel michael_se...@hotmail.comwrote:

 You do realize that it is an internal feature and that the public API can
 change to not present access to it.
 However, that wouldn't be a good idea because you would want to be able to
 change it and in some cases review the versions of a cell.  How else do you
 describe versioning which is unique to HBase and/or other specific
 databases, yet temporal modeling is not?

 In fact if memory servers... going back to 2009-10 IIRC the 'old API' vs the
 'new API' for Hadoop where the 'new API' had a subset of the exposed
 classes / methods than the old API? (It was an attempt to simplify the API...
 ) So again, APIs can change.

 The point is that you should be modeling your data on time if it is time
 sensitive data. Using versioning bypasses this with bad consequences.

 By all means keep abusing the cell's versioning.
 Just don't complain about poor performance and your HBase tossing
 exceptions left and right. I mean I can't stop you from mixing booze, coke
 and meth. All I can do is tell you that its not a good idea and not
 recommended.

 If you want a good definition of why HBase has versioning... go ask StAck,
 Ted, Nick or one of the committers since they are more familiar with the
 internal workings of HBase than I. When you get a good answer, then have
 the online HBase book updated.

 -Mike

 PS... if you want a really good example of why not to use versioning to
 store temporal data...
 What happens if you're storing 100 versions of a cell and you find out
 that you have a duplicate entry with the wrong timestamp and you want to
 delete that one version.
 How do you do that? Going from memory, and I could very well be wrong, but
 the tombstone marker is on the cell, not the version, right?

 If it is on the version, what happens to the versions of the cell that are
 older than the tombstone marker?
 Sorry, its been a while since I've been intimate with HBase. Doing a bit
 of other things at the moment, and I'm already overtaxing my last remaining
 living brain cell.  ;-)


 On Apr 12, 2014, at 9:14 PM, Brian Jeltema bdjelt...@gmail.com wrote:

  I don't want to be argumentative here, but by definition is's not an
 internal feature because it's part of the
  public API. We use versioning in a way that makes me somewhat
 uncomfortable, but it's been quite
  useful. I'd like to see a clear explanation of why it exists and what
 use cases it was intended to support.
 
  Brian
 
  Since you asked...
 
  Simplest answer... your schema should not rely upon internal features of
 the system.  Since you are tracking your data along the lines of a temporal
 attribute it should be part of the schema. In terms of a good design, by
 making it a part of the schema, you're defining that the data has a
 temporal property/attribute.
 
  Cell versioning is an internal feature of HBase. Its there for a reason.
  Perhaps one of the committers should expand on why its there.  (When I
 asked this earlier, never got an answer. )
 
 
  Longer answer... review how HBase stores the rows, including the versions
 of the cell.
  You're putting an unnecessary stress on the system.
 
  Its just not Zen... ;-)
 
  The reason I'm a bit short on this topic is that its an issue that
 keeps coming up, over and over again because some idiot keeps looking to
 take a shortcut without understanding the implications of their decision.
 Just like salting the key. (Note:  prepending a truncated hash isn't the
 same as using a salt.  Salting has a specific meaning and the salt is
 orthogonal to the underlying key. Any relationship between the salt and the
 key is purely random luck.)
 
  Does that help?
  (BTW, this should be part of any schema design talk... yet somehow I
 think its not covered... )
 
  -Mike
 
  PS. Its not weird that the cell versions are checked. It makes perfect
 sense.
 
  On Apr 12, 2014, at 2:55 PM, Guillermo Ortiz konstt2...@gmail.com
 wrote:
 
  Well, It was just a example why I could keep a thousand versions or a
 cell.
  I didn't know that HBase was checking each version when I do a scan,
 it's a
  little weird when data is sorted.
 
  You get my attention with your comment, that it's better to store data
 over
  time with new columns that with versions. Why is it better?
  Versions looks that there're very convenient for that use case. So,
 does it
  work better a rowkey with 3600 columns, that a rowkey with a column
 with
  3600 versions? What's the reason for avoiding a massive use of versions?

Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread gortiz
Well, I guessed that, but it doesn't make too much sense because it's
so slow. Right now I only have 100 rows with 1000 versions each.
I have checked the size of the dataset and each row is about 700Kbytes
(around 7Gb, 100 rows x 1000 versions). So, it should only check 100 rows x
700Kbytes = 70Mb, since it just checks the newest version. How can it
spend so much time checking this quantity of data?


I'm generating the dataset again with a bigger blocksize (previously it was
64Kb; now it's going to be 1Mb). I could try tuning the scanning and
batching parameters, but I don't think they're going to affect too much.


Another test I want to do is to generate the same dataset with just
100 versions. It should spend around the same time, right? Or am I wrong?
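
If you do experiment with them, a sketch of where those knobs live on the
0.94 client (the values here are arbitrary):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class ScanTuning {
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "table1");
        Scan scan = new Scan();
        scan.setCaching(100); // rows fetched per next() RPC: fewer round
                              // trips, but each RPC must finish within
                              // the scanner lease period
        scan.setBatch(100);   // cap on columns per Result, for wide rows
        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result r : scanner) {
            // process r
          }
        } finally {
          scanner.close();
          table.close();
        }
      }
    }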


On 10/04/14 18:08, Ted Yu wrote:

It should be newest version of each value.

Cheers


On Thu, Apr 10, 2014 at 9:55 AM, gortiz gor...@pragsis.com wrote:


Another little question is, when the filter I'm using, Do I check all the
versions? or just the newest? Because, I'm wondering if when I do a scan
over all the table, I look for the value 5 in all the dataset or I'm just
looking for in one newest version of each value.


On 10/04/14 16:52, gortiz wrote:


I was trying to check the behaviour of HBase. The cluster is a group of
old computers, one master, five slaves, each one with 2Gb, so, 12gb in
total.
The table has a column family with 1000 columns and each column with 100
versions.
There's another column faimily with four columns an one image of 100kb.
  (I've tried without this column family as well.)
The table is partitioned manually in all the slaves, so data are balanced
in the cluster.

I'm executing this sentence *scan 'table1', {FILTER = ValueFilter(=,
'binary:5')* in HBase 0.94.6
My time for lease and rpc is three minutes.
Since, it's a full scan of the table, I have been playing with the
BLOCKCACHE as well (just disable and enable, not about the size of it). I
thought that it was going to have too much calls to the GC. I'm not sure
about this point.

I know that it's not the best way to use HBase, it's just a test. I think
that it's not working because the hardware isn't enough, although, I would
like to try some kind of tunning to improve it.








On 10/04/14 14:21, Ted Yu wrote:


Can you give us a bit more information:

HBase release you're running
What filters are used for the scan

Thanks

On Apr 10, 2014, at 2:36 AM, gortiz gor...@pragsis.com wrote:

  I got this error when I execute a full scan with filters about a table.

Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hbase.regionserver.LeaseException:
org.apache.hadoop.hbase.regionserver.LeaseException: lease
'-4165751462641113359' does not exist
 at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)

 at org.apache.hadoop.hbase.regionserver.HRegionServer.
next(HRegionServer.java:2482)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(
NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(
DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(
WritableRpcEngine.java:320)
 at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(
HBaseServer.java:1428)

I have read about increase the lease time and rpc time, but it's not
working.. what else could I try?? The table isn't too big. I have been
checking the logs from GC, HMaster and some RegionServers and I didn't see
anything weird. I tried as well to try with a couple of caching values.




--
*Guillermo Ortiz*
/Big Data Developer/

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain

_http://www.bidoop.es_




--
*Guillermo Ortiz*
/Big Data Developer/

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain

_http://www.bidoop.es_



Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread gortiz

The last test I have done is to reduce the number of versions to 100.
So, right now, I have 100 rows with 100 versions each.
Times are (I got the same times for a blocksize of 64Kb and 1Mb):
100 rows, 1000 versions + blockcache: 80s.
100 rows, 1000 versions + no blockcache: 70s.

100 rows, *100* versions + blockcache: 7.3s.
100 rows, *100* versions + no blockcache: 6.1s.

What's the reason for this? I guessed HBase was smart enough not to
consider old versions, so that it just checks the newest. But I reduced
the size (in versions) by 10 times and I got a 10x gain in performance.


The filter is scan 'filters', {FILTER => ValueFilter(=,
'binary:5'), STARTROW => '10100101',
STOPROW => '60100201'}



On 11/04/14 09:04, gortiz wrote:
Well, I guessed that, what it doesn't make too much sense because it's 
so slow. I only have right now 100 rows with 1000 versions each row.
I have checked the size of the dataset and each row is about 700Kbytes 
(around 7Gb, 100rowsx1000versions). So, it should only check 100 rows 
x 700Kbytes = 70Mb, since it just check the newest version. How can it 
spend too many time checking this quantity of data?


I'm generating again the dataset with a bigger blocksize (previously 
was 64Kb, now, it's going to be 1Mb). I could try tunning the scanning 
and baching parameters, but I don't think they're going to affect too 
much.


Another test I want to do, it's generate the same dataset with just 
100versions, It should spend around the same time, right? Or am I wrong?


On 10/04/14 18:08, Ted Yu wrote:

It should be newest version of each value.

Cheers


On Thu, Apr 10, 2014 at 9:55 AM, gortiz gor...@pragsis.com wrote:

Another little question is, when the filter I'm using, Do I check 
all the
versions? or just the newest? Because, I'm wondering if when I do a 
scan
over all the table, I look for the value 5 in all the dataset or 
I'm just

looking for in one newest version of each value.


On 10/04/14 16:52, gortiz wrote:

I was trying to check the behaviour of HBase. The cluster is a 
group of

old computers, one master, five slaves, each one with 2Gb, so, 12gb in
total.
The table has a column family with 1000 columns and each column 
with 100

versions.
There's another column faimily with four columns an one image of 
100kb.

  (I've tried without this column family as well.)
The table is partitioned manually in all the slaves, so data are 
balanced

in the cluster.

I'm executing this sentence *scan 'table1', {FILTER = ValueFilter(=,
'binary:5')* in HBase 0.94.6
My time for lease and rpc is three minutes.
Since, it's a full scan of the table, I have been playing with the
BLOCKCACHE as well (just disable and enable, not about the size of 
it). I
thought that it was going to have too much calls to the GC. I'm not 
sure

about this point.

I know that it's not the best way to use HBase, it's just a test. I 
think
that it's not working because the hardware isn't enough, although, 
I would

like to try some kind of tunning to improve it.








On 10/04/14 14:21, Ted Yu wrote:


Can you give us a bit more information:

HBase release you're running
What filters are used for the scan

Thanks

On Apr 10, 2014, at 2:36 AM, gortiz gor...@pragsis.com wrote:

  I got this error when I execute a full scan with filters about a 
table.
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hbase.regionserver.LeaseException:

org.apache.hadoop.hbase.regionserver.LeaseException: lease
'-4165751462641113359' does not exist
 at 
org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231) 



 at org.apache.hadoop.hbase.regionserver.HRegionServer.
next(HRegionServer.java:2482)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(
NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(
DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(
WritableRpcEngine.java:320)
 at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(
HBaseServer.java:1428)

I have read about increase the lease time and rpc time, but it's not
working.. what else could I try?? The table isn't too big. I have 
been
checking the logs from GC, HMaster and some RegionServers and I 
didn't see
anything weird. I tried as well to try with a couple of caching 
values.





--
*Guillermo Ortiz*
/Big Data Developer/

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain

_http://www.bidoop.es_







--
*Guillermo Ortiz*
/Big Data Developer/

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain

_http://www.bidoop.es_



Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread Anoop John
What is the max versions setting you have done for your table CF?  When you
set such a value, HBase has to keep all those versions.  During a scan it
will read all those versions. In the 0.94 version the default value for max
versions is 3.  I guess you have set some bigger value.  If you have not,
mind testing after a major compaction?
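
In case it helps, a sketch of lowering the setting and forcing that
compaction with the 0.94 admin API (table name 'table1' and family 'cf' are
assumptions):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ReduceMaxVersions {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        // fetch the existing descriptor so other CF settings are kept
        HTableDescriptor desc =
            admin.getTableDescriptor(Bytes.toBytes("table1"));
        HColumnDescriptor cf = desc.getFamily(Bytes.toBytes("cf"));
        cf.setMaxVersions(3); // back to the 0.94 default
        admin.disableTable("table1");
        admin.modifyColumn("table1", cf);
        admin.enableTable("table1");
        // excess versions are only physically dropped when store files
        // are rewritten, so force a rewrite
        admin.majorCompact("table1");
        admin.close();
      }
    }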

-Anoop-

On Fri, Apr 11, 2014 at 1:01 PM, gortiz gor...@pragsis.com wrote:

 Last test I have done it's to reduce the number of versions to 100.
 So, right now, I have 100 rows with 100 versions each one.
 Times are: (I got the same times for blocksize of 64Ks and 1Mb)
 100row-1000versions + blockcache- 80s.
 100row-1000versions + No blockcache- 70s.

 100row-*100*versions + blockcache- 7.3s.
 100row-*100*versions + No blockcache- 6.1s.

 What's the reasons of this? I guess HBase is enough smart for not consider
 old versions, so, it just checks the newest. But, I reduce 10 times the
 size (in versions) and I got a 10x of performance.

 The filter is scan 'filters', {FILTER = ValueFilter(=,
 'binary:5'),STARTROW = '10100101',
 STOPROW = '60100201'}



 On 11/04/14 09:04, gortiz wrote:

 Well, I guessed that, what it doesn't make too much sense because it's so
 slow. I only have right now 100 rows with 1000 versions each row.
 I have checked the size of the dataset and each row is about 700Kbytes
 (around 7Gb, 100rowsx1000versions). So, it should only check 100 rows x
 700Kbytes = 70Mb, since it just check the newest version. How can it spend
 too many time checking this quantity of data?

 I'm generating again the dataset with a bigger blocksize (previously was
 64Kb, now, it's going to be 1Mb). I could try tunning the scanning and
 baching parameters, but I don't think they're going to affect too much.

 Another test I want to do, it's generate the same dataset with just
 100versions, It should spend around the same time, right? Or am I wrong?

 On 10/04/14 18:08, Ted Yu wrote:

 It should be newest version of each value.

 Cheers


 On Thu, Apr 10, 2014 at 9:55 AM, gortiz gor...@pragsis.com wrote:

 Another little question is, when the filter I'm using, Do I check all the
 versions? or just the newest? Because, I'm wondering if when I do a scan
 over all the table, I look for the value 5 in all the dataset or I'm
 just
 looking for in one newest version of each value.


 On 10/04/14 16:52, gortiz wrote:

 I was trying to check the behaviour of HBase. The cluster is a group of
 old computers, one master, five slaves, each one with 2Gb, so, 12gb in
 total.
 The table has a column family with 1000 columns and each column with
 100
 versions.
 There's another column faimily with four columns an one image of 100kb.
   (I've tried without this column family as well.)
 The table is partitioned manually in all the slaves, so data are
 balanced
 in the cluster.

 I'm executing this sentence *scan 'table1', {FILTER = ValueFilter(=,
 'binary:5')* in HBase 0.94.6
 My time for lease and rpc is three minutes.
 Since, it's a full scan of the table, I have been playing with the
 BLOCKCACHE as well (just disable and enable, not about the size of
 it). I
 thought that it was going to have too much calls to the GC. I'm not
 sure
 about this point.

 I know that it's not the best way to use HBase, it's just a test. I
 think
 that it's not working because the hardware isn't enough, although, I
 would
 like to try some kind of tunning to improve it.








 On 10/04/14 14:21, Ted Yu wrote:

 Can you give us a bit more information:

 HBase release you're running
 What filters are used for the scan

 Thanks

 On Apr 10, 2014, at 2:36 AM, gortiz gor...@pragsis.com wrote:

   I got this error when I execute a full scan with filters about a
 table.

 Caused by: java.lang.RuntimeException: org.apache.hadoop.hbase.
 regionserver.LeaseException:
 org.apache.hadoop.hbase.regionserver.LeaseException: lease
 '-4165751462641113359' does not exist
  at 
 org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)


  at org.apache.hadoop.hbase.regionserver.HRegionServer.
 next(HRegionServer.java:2482)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(
 NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(
 DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(
 WritableRpcEngine.java:320)
  at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(
 HBaseServer.java:1428)

 I have read about increase the lease time and rpc time, but it's not
 working.. what else could I try?? The table isn't too big. I have
 been
 checking the logs from GC, HMaster and some RegionServers and I
 didn't see
 anything weird. I tried as well to try with a couple of caching
 values.


 --
 *Guillermo Ortiz*
 /Big Data Developer/

 

Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread gortiz
Yes, I have tried two different values for the max versions setting,
1000 and the maximum value for integers.


But, I want to keep those versions. I don't want to keep just 3
versions. Imagine that I want to record a new version each minute and
store a day's worth; that is 1440 versions.


Why is HBase going to read all the versions? I thought that if you don't
indicate any versions it just reads the newest and skips the rest. It
doesn't make much sense to read all of them if the data is sorted, plus
the newest version is stored at the top.
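
The alternative raised earlier in the thread, for comparison: fold the
minute into the column qualifier instead of the version counter. A sketch
(row key and qualifier scheme invented) that writes one column per minute,
1440 per daily row, into a family declared with VERSIONS => 1:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class TimeInQualifier {
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "metrics"); // hypothetical table
        long minute = System.currentTimeMillis() / 60000L;
        // one row per source per day, one column per minute of that day
        String rowKey = "sensor42-" + (minute / 1440);
        String qualifier = String.format("v%04d", minute % 1440);
        Put put = new Put(Bytes.toBytes(rowKey));
        put.add(Bytes.toBytes("d"), Bytes.toBytes(qualifier),
            Bytes.toBytes("5"));
        table.put(put);
        table.close();
      }
    }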



On 11/04/14 11:54, Anoop John wrote:

What is the max version setting u have done for ur table cf?  When u set
some a value, HBase has to keep all those versions.  During a scan it will
read all those versions. In 94 version the default value for the max
versions is 3.  I guess you have set some bigger value.   If u have not,
mind testing after a major compaction?

-Anoop-

On Fri, Apr 11, 2014 at 1:01 PM, gortiz gor...@pragsis.com wrote:


Last test I have done it's to reduce the number of versions to 100.
So, right now, I have 100 rows with 100 versions each one.
Times are: (I got the same times for blocksize of 64Ks and 1Mb)
100row-1000versions + blockcache- 80s.
100row-1000versions + No blockcache- 70s.

100row-*100*versions + blockcache- 7.3s.
100row-*100*versions + No blockcache- 6.1s.

What's the reasons of this? I guess HBase is enough smart for not consider
old versions, so, it just checks the newest. But, I reduce 10 times the
size (in versions) and I got a 10x of performance.

The filter is scan 'filters', {FILTER = ValueFilter(=,
'binary:5'),STARTROW = '10100101',
STOPROW = '60100201'}



On 11/04/14 09:04, gortiz wrote:


Well, I guessed that, what it doesn't make too much sense because it's so
slow. I only have right now 100 rows with 1000 versions each row.
I have checked the size of the dataset and each row is about 700Kbytes
(around 7Gb, 100rowsx1000versions). So, it should only check 100 rows x
700Kbytes = 70Mb, since it just check the newest version. How can it spend
too many time checking this quantity of data?

I'm generating again the dataset with a bigger blocksize (previously was
64Kb, now, it's going to be 1Mb). I could try tunning the scanning and
baching parameters, but I don't think they're going to affect too much.

Another test I want to do, it's generate the same dataset with just
100versions, It should spend around the same time, right? Or am I wrong?

On 10/04/14 18:08, Ted Yu wrote:


It should be newest version of each value.

Cheers


On Thu, Apr 10, 2014 at 9:55 AM, gortiz gor...@pragsis.com wrote:

Another little question is, when the filter I'm using, Do I check all the

versions? or just the newest? Because, I'm wondering if when I do a scan
over all the table, I look for the value 5 in all the dataset or I'm
just
looking for in one newest version of each value.


On 10/04/14 16:52, gortiz wrote:

I was trying to check the behaviour of HBase. The cluster is a group of

old computers, one master, five slaves, each one with 2Gb, so, 12gb in
total.
The table has a column family with 1000 columns and each column with
100
versions.
There's another column faimily with four columns an one image of 100kb.
   (I've tried without this column family as well.)
The table is partitioned manually in all the slaves, so data are
balanced
in the cluster.

I'm executing this sentence *scan 'table1', {FILTER = ValueFilter(=,
'binary:5')* in HBase 0.94.6
My time for lease and rpc is three minutes.
Since, it's a full scan of the table, I have been playing with the
BLOCKCACHE as well (just disable and enable, not about the size of
it). I
thought that it was going to have too much calls to the GC. I'm not
sure
about this point.

I know that it's not the best way to use HBase, it's just a test. I
think
that it's not working because the hardware isn't enough, although, I
would
like to try some kind of tunning to improve it.








On 10/04/14 14:21, Ted Yu wrote:

Can you give us a bit more information:

HBase release you're running
What filters are used for the scan

Thanks

On Apr 10, 2014, at 2:36 AM, gortiz gor...@pragsis.com wrote:

   I got this error when I execute a full scan with filters about a
table.


Caused by: java.lang.RuntimeException: org.apache.hadoop.hbase.
regionserver.LeaseException:
org.apache.hadoop.hbase.regionserver.LeaseException: lease
'-4165751462641113359' does not exist
  at 
org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)


  at org.apache.hadoop.hbase.regionserver.HRegionServer.
next(HRegionServer.java:2482)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(
NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(
DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at 

Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread Anoop John
In the storage layer (HFiles in HDFS) all versions of a particular cell
stay together.  (Yes, they have to be lexicographically ordered
KVs.) So during a scan we will have to read all the version data.  At this
storage layer it doesn't know about the versions stuff etc.
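
In other words, the version limit is applied while scanning, not while
reading blocks; a sketch of what the 0.94 client asks for by default (table
name assumed):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class DefaultVersionsScan {
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "table1");
        Scan scan = new Scan();
        scan.setMaxVersions(1); // explicit, but this is already the default
        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result r : scanner) {
            // only one version per cell arrives here, but the region
            // server still stepped over every older KeyValue stored
            // next to it inside the HFile blocks to find the newest one
          }
        } finally {
          scanner.close();
          table.close();
        }
      }
    }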

-Anoop-

On Fri, Apr 11, 2014 at 3:33 PM, gortiz gor...@pragsis.com wrote:

 Yes, I have tried with two different values for that value of versions,
 1000 and maximum value for integers.

 But, I want to keep those versions. I don't want to keep just 3 versions.
 Imagine that I want to record a new version each minute and store a day,
 those are 1440 versions.

 Why is HBase going to read all the versions?? , I thought, if you don't
 indicate any versions it's just read the newest and skip the rest. It
 doesn't make too much sense to read all of them if data is sorted, plus the
 newest version is stored in the top.



 On 11/04/14 11:54, Anoop John wrote:

  What is the max version setting u have done for ur table cf?  When u set
 some a value, HBase has to keep all those versions.  During a scan it will
 read all those versions. In 94 version the default value for the max
 versions is 3.  I guess you have set some bigger value.   If u have not,
 mind testing after a major compaction?

 -Anoop-

 On Fri, Apr 11, 2014 at 1:01 PM, gortiz gor...@pragsis.com wrote:

  Last test I have done it's to reduce the number of versions to 100.
 So, right now, I have 100 rows with 100 versions each one.
 Times are: (I got the same times for blocksize of 64Ks and 1Mb)
 100row-1000versions + blockcache- 80s.
 100row-1000versions + No blockcache- 70s.

 100row-*100*versions + blockcache- 7.3s.
 100row-*100*versions + No blockcache- 6.1s.

 What's the reasons of this? I guess HBase is enough smart for not
 consider
 old versions, so, it just checks the newest. But, I reduce 10 times the
 size (in versions) and I got a 10x of performance.

 The filter is scan 'filters', {FILTER = ValueFilter(=,
 'binary:5'),STARTROW = '10100101',
 STOPROW = '60100201'}



 On 11/04/14 09:04, gortiz wrote:

  Well, I guessed that, what it doesn't make too much sense because it's
 so
 slow. I only have right now 100 rows with 1000 versions each row.
 I have checked the size of the dataset and each row is about 700Kbytes
 (around 7Gb, 100rowsx1000versions). So, it should only check 100 rows x
 700Kbytes = 70Mb, since it just check the newest version. How can it
 spend
 too many time checking this quantity of data?

 I'm generating again the dataset with a bigger blocksize (previously was
 64Kb, now, it's going to be 1Mb). I could try tunning the scanning and
 baching parameters, but I don't think they're going to affect too much.

 Another test I want to do, it's generate the same dataset with just
 100versions, It should spend around the same time, right? Or am I wrong?

 On 10/04/14 18:08, Ted Yu wrote:

  It should be newest version of each value.

 Cheers


 On Thu, Apr 10, 2014 at 9:55 AM, gortiz gor...@pragsis.com wrote:

 Another little question is, when the filter I'm using, Do I check all
 the

  versions? or just the newest? Because, I'm wondering if when I do a
 scan
 over all the table, I look for the value 5 in all the dataset or I'm
 just
 looking for in one newest version of each value.


 On 10/04/14 16:52, gortiz wrote:

 I was trying to check the behaviour of HBase. The cluster is a group
 of

 old computers, one master, five slaves, each one with 2Gb, so, 12gb
 in
 total.
 The table has a column family with 1000 columns and each column with
 100
 versions.
 There's another column faimily with four columns an one image of
 100kb.
(I've tried without this column family as well.)
 The table is partitioned manually in all the slaves, so data are
 balanced
 in the cluster.

 I'm executing this sentence *scan 'table1', {FILTER =
 ValueFilter(=,
 'binary:5')* in HBase 0.94.6
 My time for lease and rpc is three minutes.
 Since, it's a full scan of the table, I have been playing with the
 BLOCKCACHE as well (just disable and enable, not about the size of
 it). I
 thought that it was going to have too much calls to the GC. I'm not
 sure
 about this point.

 I know that it's not the best way to use HBase, it's just a test. I
 think
 that it's not working because the hardware isn't enough, although, I
 would
 like to try some kind of tunning to improve it.








 On 10/04/14 14:21, Ted Yu wrote:

 Can you give us a bit more information:

 HBase release you're running
 What filters are used for the scan

 Thanks

 On Apr 10, 2014, at 2:36 AM, gortiz gor...@pragsis.com wrote:

I got this error when I execute a full scan with filters about a
 table.

 Caused by: java.lang.RuntimeException: org.apache.hadoop.hbase.
 regionserver.LeaseException:
 org.apache.hadoop.hbase.regionserver.LeaseException: lease
 '-4165751462641113359' does not exist
   at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
 

Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread gortiz
Sorry, I didn't get why it should read all the timestamps and not
just the newest if they're sorted and you didn't specify any timestamp
in your filter.



On 11/04/14 12:13, Anoop John wrote:

In the storage layer (HFiles in HDFS) all versions of a particular cell
will be staying together.  (Yes it has to be lexicographically ordered
KVs). So during a scan we will have to read all the version data.  At this
storage layer it doesn't know the versions stuff etc.

-Anoop-

On Fri, Apr 11, 2014 at 3:33 PM, gortiz gor...@pragsis.com wrote:


Yes, I have tried with two different values for that value of versions,
1000 and maximum value for integers.

But, I want to keep those versions. I don't want to keep just 3 versions.
Imagine that I want to record a new version each minute and store a day,
those are 1440 versions.

Why is HBase going to read all the versions?? , I thought, if you don't
indicate any versions it's just read the newest and skip the rest. It
doesn't make too much sense to read all of them if data is sorted, plus the
newest version is stored in the top.



On 11/04/14 11:54, Anoop John wrote:


  What is the max version setting u have done for ur table cf?  When u set
some a value, HBase has to keep all those versions.  During a scan it will
read all those versions. In 94 version the default value for the max
versions is 3.  I guess you have set some bigger value.   If u have not,
mind testing after a major compaction?

-Anoop-

On Fri, Apr 11, 2014 at 1:01 PM, gortiz gor...@pragsis.com wrote:

  Last test I have done it's to reduce the number of versions to 100.

So, right now, I have 100 rows with 100 versions each one.
Times are: (I got the same times for blocksize of 64Ks and 1Mb)
100row-1000versions + blockcache- 80s.
100row-1000versions + No blockcache- 70s.

100row-*100*versions + blockcache- 7.3s.
100row-*100*versions + No blockcache- 6.1s.

What's the reasons of this? I guess HBase is enough smart for not
consider
old versions, so, it just checks the newest. But, I reduce 10 times the
size (in versions) and I got a 10x of performance.

The filter is scan 'filters', {FILTER = ValueFilter(=,
'binary:5'),STARTROW = '10100101',
STOPROW = '60100201'}



On 11/04/14 09:04, gortiz wrote:

  Well, I guessed that, what it doesn't make too much sense because it's

so
slow. I only have right now 100 rows with 1000 versions each row.
I have checked the size of the dataset and each row is about 700Kbytes
(around 7Gb, 100rowsx1000versions). So, it should only check 100 rows x
700Kbytes = 70Mb, since it just check the newest version. How can it
spend
too many time checking this quantity of data?

I'm generating again the dataset with a bigger blocksize (previously was
64Kb, now, it's going to be 1Mb). I could try tunning the scanning and
baching parameters, but I don't think they're going to affect too much.

Another test I want to do, it's generate the same dataset with just
100versions, It should spend around the same time, right? Or am I wrong?

On 10/04/14 18:08, Ted Yu wrote:

  It should be newest version of each value.

Cheers


On Thu, Apr 10, 2014 at 9:55 AM, gortiz gor...@pragsis.com wrote:

Another little question is, when the filter I'm using, Do I check all
the


  versions? or just the newest? Because, I'm wondering if when I do a
scan
over all the table, I look for the value 5 in all the dataset or I'm
just
looking for in one newest version of each value.


On 10/04/14 16:52, gortiz wrote:

I was trying to check the behaviour of HBase. The cluster is a group
of


old computers, one master, five slaves, each one with 2Gb, so, 12gb
in
total.
The table has a column family with 1000 columns and each column with
100
versions.
There's another column faimily with four columns an one image of
100kb.
(I've tried without this column family as well.)
The table is partitioned manually in all the slaves, so data are
balanced
in the cluster.

I'm executing this sentence *scan 'table1', {FILTER =
ValueFilter(=,
'binary:5')* in HBase 0.94.6
My time for lease and rpc is three minutes.
Since, it's a full scan of the table, I have been playing with the
BLOCKCACHE as well (just disable and enable, not about the size of
it). I
thought that it was going to have too much calls to the GC. I'm not
sure
about this point.

I know that it's not the best way to use HBase, it's just a test. I
think
that it's not working because the hardware isn't enough, although, I
would
like to try some kind of tunning to improve it.








On 10/04/14 14:21, Ted Yu wrote:

Can you give us a bit more information:


HBase release you're running
What filters are used for the scan

Thanks

On Apr 10, 2014, at 2:36 AM, gortiz gor...@pragsis.com wrote:

I got this error when I execute a full scan with filters about a
table.

Caused by: java.lang.RuntimeException: org.apache.hadoop.hbase.

regionserver.LeaseException:

Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread Guillermo Ortiz
I read something interesting about it in HBase: The Definitive Guide.

Page 344:
"The StoreScanner class combines the store files and memstore that the
Store instance contains. It is also where the exclusion happens, based on
the Bloom filter, or the timestamp. If you are asking for versions that
are not more than 30 minutes old, for example, you can skip all storage
files that are older than one hour: they will not contain anything of
interest. See Key Design on page 357 for details on the exclusion, and
how to make use of it."

So, I guess that it doesn't have to read all the HFiles? But I don't know
whether HBase really uses the timestamp of each row or the date of the
file. I guess when I execute the scan it reads everything, but I don't
know why. I think there's something else here that I'm not seeing that
would make everything work for me.
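
A sketch of the kind of scan that lets that exclusion kick in: with an
explicit time range, store files whose cells all fall outside the range can
be ruled out wholesale (0.94 API; the table name and the 30-minute window
are invented):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class TimeRangeScan {
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "table1");
        long now = System.currentTimeMillis();
        Scan scan = new Scan();
        // only cells written in the last 30 minutes; older store files
        // can then be skipped without being read
        scan.setTimeRange(now - 30L * 60L * 1000L, now);
        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result r : scanner) {
            // process r
          }
        } finally {
          scanner.close();
          table.close();
        }
      }
    }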


2014-04-11 13:05 GMT+02:00 gortiz gor...@pragsis.com:

 Sorry, I didn't get it why it should read all the timestamps and not just
 the newest it they're sorted and you didn't specific any timestamp in your
 filter.



 On 11/04/14 12:13, Anoop John wrote:

 In the storage layer (HFiles in HDFS) all versions of a particular cell
 will be staying together.  (Yes it has to be lexicographically ordered
 KVs). So during a scan we will have to read all the version data.  At this
 storage layer it doesn't know the versions stuff etc.

 -Anoop-

 On Fri, Apr 11, 2014 at 3:33 PM, gortiz gor...@pragsis.com wrote:

  Yes, I have tried with two different values for that value of versions,
 1000 and maximum value for integers.

 But, I want to keep those versions. I don't want to keep just 3 versions.
 Imagine that I want to record a new version each minute and store a day,
 those are 1440 versions.

 Why is HBase going to read all the versions?? , I thought, if you don't
 indicate any versions it's just read the newest and skip the rest. It
 doesn't make too much sense to read all of them if data is sorted, plus
 the
 newest version is stored in the top.



 On 11/04/14 11:54, Anoop John wrote:

What is the max version setting u have done for ur table cf?  When u
 set
 some a value, HBase has to keep all those versions.  During a scan it
 will
 read all those versions. In 94 version the default value for the max
 versions is 3.  I guess you have set some bigger value.   If u have not,
 mind testing after a major compaction?

 -Anoop-

 On Fri, Apr 11, 2014 at 1:01 PM, gortiz gor...@pragsis.com wrote:

   Last test I have done it's to reduce the number of versions to 100.

 So, right now, I have 100 rows with 100 versions each one.
 Times are: (I got the same times for blocksize of 64Ks and 1Mb)
 100row-1000versions + blockcache- 80s.
 100row-1000versions + No blockcache- 70s.

 100row-*100*versions + blockcache- 7.3s.
 100row-*100*versions + No blockcache- 6.1s.

 What's the reasons of this? I guess HBase is enough smart for not
 consider
 old versions, so, it just checks the newest. But, I reduce 10 times the
 size (in versions) and I got a 10x of performance.

 The filter is scan 'filters', {FILTER = ValueFilter(=,
 'binary:5'),STARTROW = '10100101',
 STOPROW = '60100201'}



 On 11/04/14 09:04, gortiz wrote:

   Well, I guessed that, what it doesn't make too much sense because
 it's

 so
 slow. I only have right now 100 rows with 1000 versions each row.
 I have checked the size of the dataset and each row is about 700Kbytes
 (around 7Gb, 100rowsx1000versions). So, it should only check 100 rows
 x
 700Kbytes = 70Mb, since it just check the newest version. How can it
 spend
 too many time checking this quantity of data?

 I'm generating again the dataset with a bigger blocksize (previously
 was
 64Kb, now, it's going to be 1Mb). I could try tunning the scanning and
 baching parameters, but I don't think they're going to affect too
 much.

 Another test I want to do, it's generate the same dataset with just
 100versions, It should spend around the same time, right? Or am I
 wrong?

 On 10/04/14 18:08, Ted Yu wrote:

   It should be newest version of each value.

 Cheers


 On Thu, Apr 10, 2014 at 9:55 AM, gortiz gor...@pragsis.com wrote:

 Another little question is, when the filter I'm using, Do I check all
 the

versions? or just the newest? Because, I'm wondering if when I do
 a
 scan
 over all the table, I look for the value 5 in all the dataset or
 I'm
 just
 looking for in one newest version of each value.


 On 10/04/14 16:52, gortiz wrote:

 I was trying to check the behaviour of HBase. The cluster is a group
 of

  old computers, one master, five slaves, each one with 2Gb, so, 12gb
 in
 total.
 The table has a column family with 1000 columns and each column
 with
 100
 versions.
 There's another column faimily with four columns an one image of
 100kb.
 (I've tried without this column family as well.)
 The table is partitioned manually in all the slaves, so data are
 balanced
 in the cluster.

 I'm executing this sentence *scan 'table1', {FILTER = ValueFilter(=, 'binary:5')* in HBase 0.94.6

Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread Ted Yu
In your previous example:
scan 'table1', {FILTER = ValueFilter(=, 'binary:5')}

there was no expression w.r.t. timestamp. See the following javadoc from
Scan.java:

 * To only retrieve columns within a specific range of version timestamps,

 * execute {@link #setTimeRange(long, long) setTimeRange}.

 * <p>

 * To only retrieve columns with a specific timestamp, execute

 * {@link #setTimeStamp(long) setTimestamp}.

You can use one of the above methods to make your scan more selective.


ValueFilter#filterKeyValue(Cell) doesn't utilize the advanced features of
ReturnCode. You can refer to:

https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.ReturnCode.html

You can take a look at SingleColumnValueFilter#filterKeyValue() for example
of how various ReturnCode's are used to speed up scan.
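
For instance, when the value being matched lives in one known column,
something like this (family and qualifier invented) lets the filter hand
back seek hints instead of testing every KeyValue:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
    import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SelectiveScan {
      public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "table1");
        // keep rows whose cf:col1 equals "5"
        SingleColumnValueFilter filter = new SingleColumnValueFilter(
            Bytes.toBytes("cf"), Bytes.toBytes("col1"),
            CompareOp.EQUAL, Bytes.toBytes("5"));
        filter.setFilterIfMissing(true); // drop rows lacking the column
        Scan scan = new Scan();
        scan.setFilter(filter);
        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result r : scanner) {
            // matching rows only
          }
        } finally {
          scanner.close();
          table.close();
        }
      }
    }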

Cheers


On Fri, Apr 11, 2014 at 8:40 AM, Guillermo Ortiz konstt2...@gmail.comwrote:

 I read something interesting about it in HBase TDG.

 Page 344:
 The StoreScanner class combines the store files and memstore that the
 Store instance
 contains. It is also where the exclusion happens, based on the Bloom
 filter, or the timestamp. If you are asking for versions that are not more
 than 30 minutes old, for example, you can skip all storage files that are
 older than one hour: they will not contain anything of interest. See Key
 Design on page 357 for details on the exclusion, and how to make use of
 it.

 So, I guess that it doesn't have to read all the HFiles?? But, I don't know
 if HBase really uses the timestamp of each row or the date of the file. I
 guess when I execute the scan, it reads everything, but, I don't know why.
 I think there's something else that I don't see so that everything works to
 me.


 2014-04-11 13:05 GMT+02:00 gortiz gor...@pragsis.com:

  Sorry, I didn't get it why it should read all the timestamps and not just
  the newest it they're sorted and you didn't specific any timestamp in
 your
  filter.
 
 
 
  On 11/04/14 12:13, Anoop John wrote:
 
  In the storage layer (HFiles in HDFS) all versions of a particular cell
  will be staying together.  (Yes it has to be lexicographically ordered
  KVs). So during a scan we will have to read all the version data.  At
 this
  storage layer it doesn't know the versions stuff etc.
 
  -Anoop-
 
  On Fri, Apr 11, 2014 at 3:33 PM, gortiz gor...@pragsis.com wrote:
 
   Yes, I have tried with two different values for that value of versions,
  1000 and maximum value for integers.
 
  But, I want to keep those versions. I don't want to keep just 3
 versions.
  Imagine that I want to record a new version each minute and store a
 day,
  those are 1440 versions.
 
  Why is HBase going to read all the versions?? , I thought, if you don't
  indicate any versions it's just read the newest and skip the rest. It
  doesn't make too much sense to read all of them if data is sorted, plus
  the
  newest version is stored in the top.
 
 
 
  On 11/04/14 11:54, Anoop John wrote:
 
 What is the max version setting u have done for ur table cf?  When u
  set
  some a value, HBase has to keep all those versions.  During a scan it
  will
  read all those versions. In 94 version the default value for the max
  versions is 3.  I guess you have set some bigger value.   If u have
 not,
  mind testing after a major compaction?
 
  -Anoop-
 
  On Fri, Apr 11, 2014 at 1:01 PM, gortiz gor...@pragsis.com wrote:
 
Last test I have done it's to reduce the number of versions to 100.
 
  So, right now, I have 100 rows with 100 versions each one.
  Times are: (I got the same times for blocksize of 64Ks and 1Mb)
  100row-1000versions + blockcache- 80s.
  100row-1000versions + No blockcache- 70s.
 
  100row-*100*versions + blockcache- 7.3s.
  100row-*100*versions + No blockcache- 6.1s.
 
  What's the reasons of this? I guess HBase is enough smart for not
  consider
  old versions, so, it just checks the newest. But, I reduce 10 times
 the
  size (in versions) and I got a 10x of performance.
 
  The filter is scan 'filters', {FILTER = ValueFilter(=,
  'binary:5'),STARTROW = '10100101',
  STOPROW = '60100201'}
 
 
 
  On 11/04/14 09:04, gortiz wrote:
 
Well, I guessed that, what it doesn't make too much sense because
  it's
 
  so
  slow. I only have right now 100 rows with 1000 versions each row.
  I have checked the size of the dataset and each row is about
 700Kbytes
  (around 7Gb, 100rowsx1000versions). So, it should only check 100
 rows
  x
  700Kbytes = 70Mb, since it just check the newest version. How can it
  spend
  too many time checking this quantity of data?
 
  I'm generating again the dataset with a bigger blocksize (previously
  was
  64Kb, now, it's going to be 1Mb). I could try tunning the scanning
 and
  baching parameters, but I don't think they're going to affect too
  much.
 
  Another test I want to do, it's generate the same dataset with just
  100versions, It should spend around the same time, right? Or am I wrong?

Re: Lease exception when I execute large scan with filters.

2014-04-11 Thread Guillermo Ortiz
Okay, thank you, I'll check it this Monday. I didn't know that a Scan
checks all the versions.
So, I was checking each column and each version, although it just showed me
the newest version because I hadn't indicated anything with the VERSIONS
attribute. It makes sense that it takes so long.


2014-04-11 16:57 GMT+02:00 Ted Yu yuzhih...@gmail.com:

 In your previous example:
 scan 'table1', {FILTER = ValueFilter(=, 'binary:5')}

 there was no expression w.r.t. timestamp. See the following javadoc from
 Scan.java:

  * To only retrieve columns within a specific range of version timestamps,

  * execute {@link #setTimeRange(long, long) setTimeRange}.

  * p

  * To only retrieve columns with a specific timestamp, execute

  * {@link #setTimeStamp(long) setTimestamp}.

 You can use one of the above methods to make your scan more selective.


 ValueFilter#filterKeyValue(Cell) doesn't utilize advanced feature of
 ReturnCode. You can refer to:


 https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.ReturnCode.html

 You can take a look at SingleColumnValueFilter#filterKeyValue() for example
 of how various ReturnCode's are used to speed up scan.

 Cheers


 On Fri, Apr 11, 2014 at 8:40 AM, Guillermo Ortiz konstt2...@gmail.com
 wrote:

  I read something interesting about it in HBase TDG.
 
  Page 344:
  The StoreScanner class combines the store files and memstore that the
  Store instance
  contains. It is also where the exclusion happens, based on the Bloom
  filter, or the timestamp. If you are asking for versions that are not
 more
  than 30 minutes old, for example, you can skip all storage files that are
  older than one hour: they will not contain anything of interest. See Key
  Design on page 357 for details on the exclusion, and how to make use of
  it.
 
  So, I guess that it doesn't have to read all the HFiles?? But, I don't
 know
  if HBase really uses the timestamp of each row or the date of the file. I
  guess when I execute the scan, it reads everything, but, I don't know
 why.
  I think there's something else that I don't see so that everything works
 to
  me.
 
 
  2014-04-11 13:05 GMT+02:00 gortiz gor...@pragsis.com:
 
   Sorry, I didn't get it why it should read all the timestamps and not
 just
   the newest it they're sorted and you didn't specific any timestamp in
  your
   filter.
  
  
  
   On 11/04/14 12:13, Anoop John wrote:
  
   In the storage layer (HFiles in HDFS) all versions of a particular
 cell
   will be staying together.  (Yes it has to be lexicographically ordered
   KVs). So during a scan we will have to read all the version data.  At
  this
   storage layer it doesn't know the versions stuff etc.
  
   -Anoop-
  
   On Fri, Apr 11, 2014 at 3:33 PM, gortiz gor...@pragsis.com wrote:
  
Yes, I have tried with two different values for that value of
 versions,
   1000 and maximum value for integers.
  
   But, I want to keep those versions. I don't want to keep just 3
  versions.
   Imagine that I want to record a new version each minute and store a
  day,
   those are 1440 versions.
  
   Why is HBase going to read all the versions?? , I thought, if you
 don't
   indicate any versions it's just read the newest and skip the rest. It
   doesn't make too much sense to read all of them if data is sorted,
 plus
   the
   newest version is stored in the top.
  
  
  
   On 11/04/14 11:54, Anoop John wrote:
  
  What is the max version setting u have done for ur table cf?
  When u
   set
   some a value, HBase has to keep all those versions.  During a scan
 it
   will
   read all those versions. In 94 version the default value for the max
   versions is 3.  I guess you have set some bigger value.   If u have
  not,
   mind testing after a major compaction?
  
   -Anoop-
  
   On Fri, Apr 11, 2014 at 1:01 PM, gortiz gor...@pragsis.com wrote:
  
 The last test I have done is to reduce the number of versions to 100.
    So, right now, I have 100 rows with 100 versions each.
    Times are (I got the same times for block sizes of 64KB and 1MB):

    100 rows, 1000 versions + block cache: 80s
    100 rows, 1000 versions + no block cache: 70s

    100 rows, 100 versions + block cache: 7.3s
    100 rows, 100 versions + no block cache: 6.1s

    What's the reason for this? I guessed HBase was smart enough not to
    consider old versions and to just check the newest, but I reduced the
    size (in versions) by 10x and got a 10x performance gain.

    The filter is scan 'filters', {FILTER => ValueFilter(=, 'binary:5'),
    STARTROW => '10100101', STOPROW => '60100201'}
  
  
  
   On 11/04/14 09:04, gortiz wrote:
  
 Well, I guessed that, but it doesn't make much sense because it's so
    slow. Right now I only have 100 rows with 1000 versions each row.
    I have checked the size of the dataset and each row is about 700KB
    (around 7GB, 100 rows x 1000 versions). So, it should only check 100
 

Lease exception when I execute large scan with filters.

2014-04-10 Thread gortiz

I got this error when I execute a full scan with filters on a table.

Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hbase.regionserver.LeaseException: 
org.apache.hadoop.hbase.regionserver.LeaseException: lease 
'-4165751462641113359' does not exist
at 
org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2482)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1428)


I have read about increasing the lease time and the RPC time, but it's not 
working. What else could I try? The table isn't too big. I have been 
checking the GC, HMaster, and some RegionServer logs, and I didn't see 
anything weird. I also tried a couple of caching values.


Re: Lease exception when I execute large scan with filters.

2014-04-10 Thread Ted Yu
Can you give us a bit more information:

HBase release you're running
What filters are used for the scan

Thanks



Re: Lease exception when I execute large scan with filters.

2014-04-10 Thread gortiz
I was trying to check the behaviour of HBase. The cluster is a group of 
old computers: one master and five slaves, each one with 2GB, so 12GB in 
total.
The table has a column family with 1000 columns and each column with 100 
versions.
There's another column family with four columns and one image of 100KB.  
(I've tried without this column family as well.)
The table is manually partitioned across all the slaves, so data is 
balanced in the cluster.


I'm executing this statement in HBase 0.94.6: scan 'table1', {FILTER => 
ValueFilter(=, 'binary:5')}

My lease and RPC timeouts are three minutes.
Since it's a full scan of the table, I have been playing with the 
BLOCKCACHE as well (just disabling and enabling it, not changing its 
size). I thought that it was going to cause too many GC calls. I'm not 
sure about this point.

I know that it's not the best way to use HBase; it's just a test. I 
think that it's not working because the hardware isn't enough, although 
I would like to try some kind of tuning to improve it.












--
*Guillermo Ortiz*
/Big Data Developer/

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain

_http://www.bidoop.es_



Re: Lease exception when I execute large scan with filters.

2014-04-10 Thread gortiz
Another little question: with the filter I'm using, do I check all the 
versions or just the newest? When I do a scan over the whole table, am I 
looking for the value 5 in the whole dataset, or just in the newest 
version of each value?





Re: Lease exception when I execute large scan with filters.

2014-04-10 Thread Ted Yu
It should be the newest version of each value.

Cheers
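A small sketch for checking this behaviour (the table name and filter are the ones from the thread; the surrounding class is a placeholder): with the Scan's default max versions the filter only ever sees the newest version of each cell, while raising max versions makes it evaluate older versions too.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.ValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class ValueFilterVersions {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "table1");
    try {
      Scan scan = new Scan();
      scan.setFilter(new ValueFilter(CompareOp.EQUAL,
          new BinaryComparator(Bytes.toBytes("5"))));
      // Default: the filter is evaluated against the newest version only.
      // scan.setMaxVersions(Integer.MAX_VALUE); // evaluate every stored version
      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result result : scanner) {
          System.out.println(result);
        }
      } finally {
        scanner.close();
      }
    } finally {
      table.close();
    }
  }
}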






Lease Exception Errors When Running Heavy Map Reduce Job

2013-08-28 Thread Ameya Kanitkar
Hi all,

We have a very heavy MapReduce job that goes over an entire table with
over 1TB of data in HBase and exports all of it to HDFS (similar to the
Export job, but with some additional custom code built in).

However, this job is not very stable, and oftentimes we get the following
error and the job fails:

org.apache.hadoop.hbase.regionserver.LeaseException:
org.apache.hadoop.hbase.regionserver.LeaseException: lease
'-4456594242606811626' does not exist
at 
org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:231)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2429)
at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1400)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.


Here are more detailed logs on the RS: http://pastebin.com/xaHF4ksb

We have changed the following settings in HBase to counter this problem,
but the issue persists:

<property>
  <!-- Loaded from hbase-site.xml -->
  <name>hbase.regionserver.lease.period</name>
  <value>900000</value>
</property>

<property>
  <!-- Loaded from hbase-site.xml -->
  <name>hbase.rpc.timeout</name>
  <value>900000</value>
</property>
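For completeness, a minimal sketch of setting the same properties programmatically on the client side (assuming the 900000 ms value shown above; the property names are the 0.94-era ones, later replaced by hbase.client.scanner.timeout.period):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class LeaseTimeoutConfig {
  public static Configuration create() {
    Configuration conf = HBaseConfiguration.create();
    // Match the server-side hbase-site.xml values quoted above
    // (milliseconds; 900000 ms = 15 minutes).
    conf.setLong("hbase.regionserver.lease.period", 900000L);
    conf.setLong("hbase.rpc.timeout", 900000L);
    return conf;
  }
}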


We also reduced the number of mappers per RS to less than the available CPUs on the box.

We also observed that once the problem happens, it happens multiple times
on the same RS, while all other regions are unaffected. But different RSs
observe this problem on different days, and there is no particular region
causing it either.

We are running: 0.94.2 with cdh4.2.0

Any ideas?


Ameya


Re: Lease Exception Errors When Running Heavy Map Reduce Job

2013-08-28 Thread Dhaval Shah
A couple of things:
- Can you check the resources on the region server for which you get the
lease exception? It seems like the server is heavily thrashed.
- What are your values for scan.setCaching and scan.setBatch?



The "lease does not exist" exception generally happens when the client goes 
back to the region server after the lease has expired (in your case, 900000 
ms). If your setCaching value is really high, for example, the client gets 
enough data in one call to scanner.next() to keep processing for more than 
900000 ms, and by the time it eventually goes back to the region server, the 
lease there has already expired. Setting your setCaching value lower might 
help in this case.

Regards,
Dhaval
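A minimal sketch of Dhaval's suggestion (the numbers are illustrative, not tuned recommendations): fetch fewer rows per scanner.next() RPC so the client checks back in with the region server well inside the lease period.

import org.apache.hadoop.hbase.client.Scan;

public class ExportScanConfig {
  public static Scan conservativeScan() {
    Scan scan = new Scan();
    scan.setCaching(100);       // rows per scanner.next() RPC; lower means the
                                // client returns to the RS sooner
    scan.setBatch(1000);        // cap cells per Result for very wide rows
    scan.setCacheBlocks(false); // a full-table MR scan shouldn't churn the block cache
    return scan;
  }
}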





Re: Lease Exception Errors When Running Heavy Map Reduce Job

2013-08-28 Thread Ted Yu
From the log you posted on pastebin, I see the following.
Can you check the namenode log to see what went wrong?

Caused by: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
/hbase/.logs/smartdeals-hbase14-snc1.snc1,60020,1376944419197/smartdeals-hbase14-snc1.snc1%2C60020%2C1376944419197.1377699297514
File does not exist. [Lease.  Holder:
DFSClient_hb_rs_smartdeals-hbase14-snc1.snc1,60020,1376944419197_-413917755_25, pendingcreates: 1]





Re: Lease Exception Errors When Running Heavy Map Reduce Job

2013-08-28 Thread Ameya Kanitkar
Thanks for your response.

I checked namenode logs and I find following:

2013-08-28 15:25:24,025 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: recoverLease: recover
lease [Lease.  Holder:
DFSClient_hb_rs_smartdeals-hbase14-snc1.snc1,60020,1377700014053_-346895658_25,
pendingcreates: 1],
src=/hbase/.logs/smartdeals-hbase14-snc1.snc1,60020,1377700014053-splitting/smartdeals-hbase14-snc1.snc1%2C60020%2C1377700014053.1377700015413
from client
DFSClient_hb_rs_smartdeals-hbase14-snc1.snc1,60020,1377700014053_-346895658_25
2013-08-28 15:25:24,025 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering
lease=[Lease.  Holder:
DFSClient_hb_rs_smartdeals-hbase14-snc1.snc1,60020,1377700014053_-346895658_25,
pendingcreates: 1],
src=/hbase/.logs/smartdeals-hbase14-snc1.snc1,60020,1377700014053-splitting/smartdeals-hbase14-snc1.snc1%2C60020%2C1377700014053.1377700015413
2013-08-28 15:25:24,025 WARN org.apache.hadoop.hdfs.StateChange: BLOCK*
internalReleaseLease: All existing blocks are COMPLETE, lease removed, file
closed.

There are LeaseException errors on the namenode as well:
http://pastebin.com/4feVcL1F
Not sure why it's happening.

I do not think I am running into any timeouts, as my jobs fail within a
couple of minutes, while all my timeouts are 10 minutes+. Not sure why
the above would happen.

Ameya





Re: Lease Exception Errors When Running Heavy Map Reduce Job

2013-08-28 Thread Ameya Kanitkar
Any ideas? Anyone?

