Loading HBase table into HDFS
Can anyone please guide me on loading an HBase table into HDFS with a specific column family?

Thanks,
Karthik
Re: Loading HBase table into HDFS
You can use the Export command, but I'm not sure if you can export just one column family: http://hbase.apache.org/0.94/book/ops_mgt.html#export
Re: Loading HBase table into HDFS
By specifying "hbase.mapreduce.scan.column.family", you can export a selected column family.
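As a sketch of the suggestion above (the table name "mytable", family "cf1", and output path are placeholders, not from the thread), the property can be passed to the Export MapReduce job on the command line:

```shell
# Sketch only: export a single column family of "mytable" to an HDFS path.
# "mytable", "cf1", and "/user/karthik/export" are placeholder names.
hbase org.apache.hadoop.hbase.mapreduce.Export \
  -D hbase.mapreduce.scan.column.family=cf1 \
  mytable /user/karthik/export
```

This is a command-line fragment and assumes a running HBase cluster; the output directory must not already exist in HDFS.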
Re: Loading HBase table into HDFS
Can you clarify your scenario? Normally HBase is backed by HDFS, so the table is already stored on HDFS.

Cheers
Re: Loading HBase table into HDFS
Hey Karthik,

This blog post [1] by our very own JD Cryans is a good place to start understanding bulk load.

1. http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/

-- 
-Dima
Hbase throttling issues
Hi HBase experts,

Our application is unable to scan or read from HBase tables when throttling is set; we get a ThrottlingException every time. The error is seen more frequently when the number of HBase pre-splits is increased. The tables for which this error shows up are empty (during some runs they held very little data, on the order of a few KB). We have already tried both rate limiters, average and fixed. We can't understand why the read rate limit is exceeded when there is hardly any data in HBase. Has anyone faced this issue before?

Setup details:

HBase version: 1.1.2
Number of region servers: 4
Number of regions: 116
Heap memory per region server: 2GB

Quotas set:
TABLE => ns1:table1 TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, LIMIT => 10G/sec, SCOPE => MACHINE
TABLE => ns2:table2 TYPE => THROTTLE, THROTTLE_TYPE => REQUEST_SIZE, LIMIT => 10G/sec, SCOPE => MACHINE

Following is the error we faced (region server debug logs):

2016-09-17 22:35:40,674 DEBUG [B.defaultRpcServer.handler=55,queue=1,port=58526] quotas.RegionServerQuotaManager: Throttling exception for user=root table=ns1:table1 numWrites=0 numReads=0 numScans=1: read size limit exceeded - wait 0.00sec
2016-09-17 22:35:40,676 DEBUG [B.defaultRpcServer.handler=55,queue=1,port=58526] ipc.RpcServer: B.defaultRpcServer.handler=55,queue=1,port=58526: callId: 52 service: ClientService methodName: Scan size: 28 connection: 10.65.141.170:42806
org.apache.hadoop.hbase.quotas.ThrottlingException: read size limit exceeded - wait 0.00sec
    at org.apache.hadoop.hbase.quotas.ThrottlingException.throwThrottlingException(ThrottlingException.java:107)
    at org.apache.hadoop.hbase.quotas.ThrottlingException.throwReadSizeExceeded(ThrottlingException.java:101)
    at org.apache.hadoop.hbase.quotas.TimeBasedLimiter.checkQuota(TimeBasedLimiter.java:139)
    at org.apache.hadoop.hbase.quotas.DefaultOperationQuota.checkQuota(DefaultOperationQuota.java:59)
    at org.apache.hadoop.hbase.quotas.RegionServerQuotaManager.checkQuota(RegionServerQuotaManager.java:180)
    at org.apache.hadoop.hbase.quotas.RegionServerQuotaManager.checkQuota(RegionServerQuotaManager.java:125)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2265)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
    at java.lang.Thread.run(Thread.java:745)

Thanks,
Sumit
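For reference, quotas like those listed above would typically be created from the HBase shell along these lines (a sketch; the exact shell syntax may vary by version, and the table names are taken from the post):

```shell
hbase shell> set_quota TYPE => THROTTLE, TABLE => 'ns1:table1', LIMIT => '10G/sec'
hbase shell> set_quota TYPE => THROTTLE, TABLE => 'ns2:table2', LIMIT => '10G/sec'
# A size-based LIMIT ('10G/sec') shows up in list_quotas as
# THROTTLE_TYPE => REQUEST_SIZE; a request-count limit would be '10000req/sec'.
```

This is a shell transcript fragment, not a runnable script.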
Re: Hbase throttling issues
Did you try using REQUEST_NUMBER as the throttle type?
Re: Hbase throttling issues
No, I did not try REQUEST_NUMBER; I want to use size as my throttling factor.

Thanks again!
[ANNOUNCE] YCSB 0.11.0 released
On behalf of the development community, I'm pleased to announce the release of YCSB version 0.11.0.

Highlights:

* Support for ArangoDB. This is a new binding.
* Update to Apache Geode (incubating) to improve memory footprint.
* "couchbase" client deprecated in favor of "couchbase2".
* Capability to specify TTL for Couchbase2.
* Various Elasticsearch improvements.
* Kudu binding updated for version 0.9.0.
* Fix for issue with hdrhistogram+raw.
* Performance optimizations for BasicDB and RandomByteIterator.

Full release notes, including links to source and convenience binaries:
https://github.com/brianfrankcooper/YCSB/releases/tag/0.11.0

This release covers changes since the beginning of July.

Govind
Re: Hbase throttling issues
Hi Guanghao,

This throttling error shows up as soon as I start HBase, so ideally there shouldn't be too many prior operations in play here. Plus, the error shows up even when my table has hardly any data (possibly a few KB) and I have set the throttling limit to ~10GB.

Thanks,
Sumit
Re: Hbase throttling issues
Were all scan operations throttled? Currently it uses the average size of all previous operations to check the quota. Maybe a previous scan operation read too much data.
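To illustrate why an average-size estimate can throttle even a near-empty table, here is a toy model (this is NOT HBase's actual TimeBasedLimiter code, just a sketch of the idea Guanghao describes): one earlier large scan inflates the running average, so the next scan's estimated cost exceeds the remaining quota before it reads a single byte.

```python
# Toy model only -- not HBase's implementation. It checks a size quota
# using the average size of previously completed operations as the
# estimated cost of the next one.
class AvgSizeLimiter:
    DEFAULT_ESTIMATE = 100  # assumed cost before any operation completes

    def __init__(self, limit_bytes_per_sec):
        self.available = limit_bytes_per_sec  # bytes left in the window
        self.total_size = 0                   # bytes read so far
        self.op_count = 0                     # completed operations

    def _estimate(self):
        # Guess the next operation's cost from the running average.
        if self.op_count == 0:
            return self.DEFAULT_ESTIMATE
        return self.total_size / self.op_count

    def check(self):
        # Reject up front if the estimated size exceeds what is left of
        # the quota -- before any data is actually read.
        if self._estimate() > self.available:
            raise RuntimeError("read size limit exceeded")
        self.available -= self._estimate()

    def record(self, actual_size):
        # Called after the operation completes with the real size read.
        self.total_size += actual_size
        self.op_count += 1
```

With a 10,000-byte limit, a single scan that happened to read 50,000 bytes drives the average to 50,000, so the very next check() fails even if the table being scanned is empty.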
Re: Increased response time of hbase calls
Which HBase release are you using?

Can you tell us the values of handler-related config such as hbase.regionserver.handler.count?

How many regions does m7 have? How many servers does m7 span? Are the regions of m7 spread evenly?

Thanks
Increased response time of hbase calls
Hi all,

I am facing an issue while accessing data from an HBase m7 table which has about 50 million records. In a single API request, we make 3 calls to HBase m7:

1. A single multi-get to fetch about 30 records
2. A single multi-put to update about 500 records
3. A single multi-get to fetch about 15 records

We consistently get the response in less than 200 ms for approximately 99% of calls, with a TPS of about 200 across 8 VMs. But we hit an issue every day between 4 pm and 6 pm, when the API response time increases significantly from 200 ms to 7-8 sec. This happens because we have a daily batch load that runs between 4 and 6 pm and puts multiple entries into the same HBase table.

We are trying to understand why response time increases when the batch load runs. We cannot change the time of the batch job. Is there anything we could do to resolve this issue? Any help or pointers would be much appreciated.

Thanks