Re: Too many connections from / - max is 60
Thanks for sharing insights. Moving the hbase mailing list to cc. Sorry, forgot to mention that we are using Phoenix 4.7 (HDP 2.6.3). This cluster is mostly queried via Phoenix, apart from a few pure NoSQL use cases that use the raw HBase APIs. I looked further into the zk logs and found that only 6/15 RS are constantly running into max-connection problems (no other ips/hosts of our client apps were found). One of those RS is getting 3-4x the connection errors compared to the others; this RS is hosting hbase:meta, regions of Phoenix secondary indexes, and regions of Phoenix and HBase tables. I also looked into the other 5 RS that are getting max-connection errors; nothing really stands out to me, since all of them are hosting regions of Phoenix secondary indexes and regions of Phoenix and HBase tables. I also tried running netstat and tcpdump on the zk host to find anomalies, but couldn't find anything beyond the above analysis. I also ran hbck and it reported that things are fine. I am still unable to pinpoint the exact problem (maybe something with the Phoenix secondary index?). Any other pointers to further debug the problem would be appreciated.
Lastly, I constantly see the following zk connection-loss logs on the above-mentioned 6 RS:

2020-06-03 06:40:30,859 WARN [RpcServer.FifoWFPBQ.default.handler=123,queue=3,port=16020-SendThread(ip-10-74-0-120.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Session 0x0 for server ip-10-74-0-120.us-west-2.compute.internal/10.74.0.120:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2020-06-03 06:40:30,861 INFO [RpcServer.FifoWFPBQ.default.handler=137,queue=17,port=16020-SendThread(ip-10-74-9-182.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Opening socket connection to server ip-10-74-9-182.us-west-2.compute.internal/10.74.9.182:2181. Will not attempt to authenticate using SASL (unknown error)
2020-06-03 06:40:30,861 INFO [RpcServer.FifoWFPBQ.default.handler=137,queue=17,port=16020-SendThread(ip-10-74-9-182.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Socket connection established, initiating session, client: /10.74.10.228:60012, server: ip-10-74-9-182.us-west-2.compute.internal/10.74.9.182:2181
2020-06-03 06:40:30,861 WARN [RpcServer.FifoWFPBQ.default.handler=137,queue=17,port=16020-SendThread(ip-10-74-9-182.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Session 0x0 for server ip-10-74-9-182.us-west-2.compute.internal/10.74.9.182:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)

Thanks!

On Tue, Jun 2, 2020 at 6:57 AM Josh Elser wrote:

> HBase (daemons) try to use a single connection for themselves. A RS also
> does not need to mutate state in ZK to handle things like gets and puts.
>
> Phoenix is probably the thing you need to look at more closely
> (especially if you're using an old version of Phoenix that matches the
> old HBase 1.1 version). Internally, Phoenix acts like an HBase client
> which results in a new ZK connection. There have certainly been bugs
> like that in the past (speaking generally, not specifically).
>
> On 6/1/20 5:59 PM, anil gupta wrote:
> > Hi Folks,
> >
> > We are running into HBase problems due to hitting the limit of ZK
> > connections. This cluster is running HBase 1.1.x and ZK 3.4.6.x on I3en ec2
> > instance type in AWS. Almost all our RegionServers are listed in zk logs
> > with "Too many connec
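For anyone else chasing this down: a quick way to see which client hosts are eating the per-IP connection budget is ZooKeeper's `cons` four-letter command. A minimal sketch follows; the sample lines are inlined (IPs borrowed from the thread above) so the pipeline is self-contained, and the exact `cons` line format varies slightly across ZooKeeper versions.

```shell
# On a live ensemble you would capture real data instead, e.g.:
#   echo cons | nc ip-10-74-0-120.us-west-2.compute.internal 2181
cons_output='/10.74.10.228:60012[1](queued=0,recved=15,sent=15)
/10.74.10.228:60013[1](queued=0,recved=8,sent=8)
/10.74.5.153:42001[1](queued=0,recved=3,sent=3)'

# Keep only the client IP (strip the leading "/" and everything from the
# first ":"), then count per IP and sort descending: the host at the top
# is the one closest to the maxClientCnxns ceiling.
printf '%s\n' "$cons_output" \
  | sed 's#^ */##; s#:.*##' \
  | sort | uniq -c | sort -rn
```

Running this against each member of the ensemble (the limit is enforced per server, not ensemble-wide) shows where the 60-connection budget is going.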
Too many connections from / - max is 60
Hi Folks,

We are running into HBase problems due to hitting the limit of ZK connections. This cluster is running HBase 1.1.x and ZK 3.4.6.x on the I3en ec2 instance type in AWS. Almost all our RegionServers are listed in the zk logs with "Too many connections from / - max is 60":

2020-06-01 21:42:08,375 - WARN [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from / - max is 60

On average, each RegionServer hosts ~250 regions. We are also running Phoenix on this cluster. Most of the queries are short range scans, but sometimes we do full table scans too. It seems like one of the simple fixes is to increase the maxClientCnxns property in zoo.cfg to 300, 500, 700, etc. I will probably do that. But I am just curious to know: in what scenarios are these connections created/used (Scans/Puts/Deletes, or other RegionServer operations)? Are these also created by hbase clients/apps (my guess is no)? How can I calculate the optimal value of maxClientCnxns for my cluster/usage?

-- 
Thanks & Regards, Anil Gupta
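For reference, maxClientCnxns is a per-client-IP cap set in zoo.cfg on each ZooKeeper server. A minimal sketch of the change (the 300 here is just the first value floated above, not a tuned recommendation; each ZK server needs a restart to pick it up in 3.4.x):

```
# zoo.cfg -- maximum concurrent (socket-level) connections that a single
# client, identified by IP address, may hold to one member of the
# ensemble. 0 removes the limit entirely (not recommended).
maxClientCnxns=300
```

Since the limit is per client IP per server, a reasonable starting point is to measure the current per-IP peak (e.g. via ZooKeeper's `cons` command) and add headroom for the HBase daemon plus whatever Phoenix/HBase client threads run on the busiest host.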
Re: Does Dropping the Source HBase Table Affect Its Snapshots and Cloned Tables from Snapshots?
Cloned tables and snapshots should not be impacted if you drop the source table.

Sent from my iPhone

> On Nov 28, 2018, at 5:23 PM, William Shen wrote:
>
> Hi,
>
> I understand that changes made to tables cloned from a snapshot will not
> affect the snapshot nor the source data table the snapshot is based on.
> However, I could not find information on whether or not a snapshot or a
> cloned table will be affected by the source table getting dropped. Can
> someone chime in on the HBase behavior in this case?
>
> Thank you!
Re: question on reducing number of versions
You should see a smaller t2 after major compaction if your table actually had versions over 18k (as Ted mentioned).

Sent from my iPhone

> On Aug 26, 2018, at 5:20 PM, Ted Yu wrote:
>
> This depends on how far down you revise the max versions for table t2.
> If your data normally only reaches 15000 versions and you lower max
> versions to ~15000, there wouldn't be much saving.
>
> FYI
>
>> On Sun, Aug 26, 2018 at 3:52 PM Antonio Si wrote:
>>
>> Thanks Anil.
>>
>> We are using hbase on s3. Yes, I understand 18000 is very high. We are in
>> the process of reducing it.
>>
>> If I have a snapshot and I restore the table from this snapshot, let's call
>> this table t1. I then clone another table from the same snapshot, call it t2.
>>
>> If I reduce the max versions of t2 and run a major compaction on t2, will I
>> see the decrease in table size for t2? If I compare the size of t2 and t1,
>> should I see a smaller size for t2?
>>
>> Thanks.
>>
>> Antonio.
>>
>>> On Sun, Aug 26, 2018 at 3:33 PM Anil Gupta wrote:
>>>
>>> You will need to run a major compaction on the table for it to
>>> clean up/delete the extra versions.
>>> Btw, 18000 max versions is an unusually high value.
>>>
>>> Are you using hbase on s3 or hbase on hdfs?
>>>
>>> Sent from my iPhone
>>>
>>>> On Aug 26, 2018, at 2:34 PM, Antonio Si wrote:
>>>>
>>>> Hello,
>>>>
>>>> I have a hbase table whose definition has a max number of versions set to
>>>> 36000. I have verified that there are rows which have more than 2 versions
>>>> saved.
>>>>
>>>> Now, I change the definition of the table and reduce the max number of
>>>> versions to 18000. Will I see the size of the table being reduced, as I am
>>>> not seeing that?
>>>>
>>>> Also, after I reduce the max number of versions, I try to create a
>>>> snapshot, but I am getting a
>>>> com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception:
>>>> Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found;)
>>>>
>>>> What may be the cause of that?
>>>>
>>>> I am using s3 as my storage.
>>>>
>>>> Thanks in advance for your suggestions.
>>>>
>>>> Antonio.
>>>
>>
Re: question on reducing number of versions
You will need to run a major compaction on the table for it to clean up/delete the extra versions. Btw, 18000 max versions is an unusually high value. Are you using hbase on s3 or hbase on hdfs?

Sent from my iPhone

> On Aug 26, 2018, at 2:34 PM, Antonio Si wrote:
>
> Hello,
>
> I have a hbase table whose definition has a max number of versions set to
> 36000. I have verified that there are rows which have more than 2 versions
> saved.
>
> Now, I change the definition of the table and reduce the max number of
> versions to 18000. Will I see the size of the table being reduced, as I am
> not seeing that?
>
> Also, after I reduce the max number of versions, I try to create a
> snapshot, but I am getting a
> com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception:
> Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found;)
>
> What may be the cause of that?
>
> I am using s3 as my storage.
>
> Thanks in advance for your suggestions.
>
> Antonio.
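For anyone following along, the two steps look roughly like this in hbase shell ('cf' stands in for the real column family name, and these need a live cluster; major_compact is asynchronous, so the size reduction only shows up once the compaction actually finishes):

```
alter 't2', NAME => 'cf', VERSIONS => 15000
major_compact 't2'
```

Until the major compaction rewrites the store files, the extra versions remain on disk even though the schema no longer admits them.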
Re: Unable to read from Kerberised HBase
As per the error message, your scan ran for more than 1 minute but the timeout is set to 1 minute, hence the error. Try doing smaller scans or increasing the timeout. (PS: HBase is mostly good for short scans, not full table scans.)

On Mon, Jul 9, 2018 at 8:37 PM, Lalit Jadhav wrote:

> While connecting to a remote HBase cluster, I can create a Table and get the
> Table listing, but I am unable to scan a Table using the Java API. Below is the code:
>
> configuration.set("hbase.zookeeper.quorum", "QUARAM");
> configuration.set("hbase.master", "MASTER");
> configuration.set("hbase.zookeeper.property.clientPort", "2181");
> configuration.set("hadoop.security.authentication", "kerberos");
> configuration.set("hbase.security.authentication", "kerberos");
> configuration.set("zookeeper.znode.parent", "/hbase-secure");
> configuration.set("hbase.cluster.distributed", "true");
> configuration.set("hbase.rpc.protection", "authentication");
> configuration.set("hbase.regionserver.kerberos.principal", "hbase/Principal@realm");
> configuration.set("hbase.regionserver.keytab.file",
>     "/home/developers/Desktop/hbase.service.keytab3");
> configuration.set("hbase.master.kerberos.principal", "hbase/HbasePrincipal@realm");
> configuration.set("hbase.master.keytab.file",
>     "/etc/security/keytabs/hbase.service.keytab");
>
> System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
>
> String principal = System.getProperty("kerberosPrincipal", "hbase/HbasePrincipal@realm");
> String keytabLocation = System.getProperty("kerberosKeytab",
>     "/etc/security/keytabs/hbase.service.keytab");
> UserGroupInformation.setConfiguration(configuration);
> UserGroupInformation.loginUserFromKeytab(principal, keytabLocation);
> UserGroupInformation userGroupInformation =
>     UserGroupInformation.loginUserFromKeytabAndReturnUGI("hbase/HbasePrincipal@realm",
>         "/etc/security/keytabs/hbase.service.keytab");
> UserGroupInformation.setLoginUser(userGroupInformation);
>
> I am getting the below errors:
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=36, exceptions: Mon Jul 09 18:45:57 IST 2018, null,
> java.net.SocketTimeoutException: callTimeout=60000, callDuration=64965:
> row '' on table 'DEMO_TABLE' at
> region=DEMO_TABLE,,1529819280641.40f0e7dc4159937619da237915be8b11.,
> hostname=dn1-devup.mstorm.com,60020,1531051433899, seqNum=526190
>
> Exception : java.io.IOException: Failed to get result within timeout,
> timeout=60000ms
>
> --
> Regards,
> Lalit Jadhav
> Network Component Private Limited.

-- 
Thanks & Regards, Anil Gupta
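If raising the timeout is the route taken, the relevant client-side settings look like this (a sketch: property names are for HBase 1.x, the 120000 values are only illustrative, and both properties default to 60000 ms, which matches the callTimeout in the error above):

```xml
<!-- Client-side hbase-site.xml (or set the same keys on the
     Configuration object before creating the connection). -->
<property>
  <name>hbase.client.scanner.timeout.period</name>
  <value>120000</value>
</property>
<property>
  <name>hbase.rpc.timeout</name>
  <value>120000</value>
</property>
```

Lowering the scan's caching (Scan#setCaching) also helps, since each next() RPC then has less work to finish inside the timeout window.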
Re: can not write to HBase
It seems you might have a write hotspot. Are your writes evenly distributed across the cluster? Do you have more than 15-20 regions for that table?

Sent from my iPhone

> On May 22, 2018, at 9:52 PM, Kang Minwoo wrote:
>
> I think the hbase flush is too slow,
> so the memstore reached its upper limit.
>
> The flush took about 30 min.
> I don't know why the flush is so slow.
>
> Best regards,
> Minwoo Kang
>
> ________________________________
> From: 张铎 (Duo Zhang)
> Sent: Wednesday, May 23, 2018, 11:37
> To: hbase-user
> Subject: Re: can not write to HBase
>
> org.apache.hadoop.hbase.RegionTooBusyException:
> org.apache.hadoop.hbase.RegionTooBusyException:
> Above memstore limit, regionName={region}, server={server},
> memstoreSize=2600502128, blockingMemStoreSize=2600468480
>
> This means that you're writing too fast and the memstore has reached its
> upper limit. Are the flushes and compactions fine on the RS side?
>
> 2018-05-23 10:20 GMT+08:00 Kang Minwoo:
>
>> attach client exception and stacktrace.
>>
>> I've looked more. It seems to be the reason why it takes 1290 seconds to
>> flush in the RegionServer.
>>
>> 2018-05-23T07:24:31.202 [INFO] Call exception, tries=34, retries=35,
>> started=513393 ms ago, cancelled=false, msg=row '{row}' on table '{table}'
>> at region={region}, hostname={host}, seqNum=155455658
>> 2018-05-23T07:24:31.208 [ERROR]
>> java.lang.RuntimeException: com.google.protobuf.ServiceException: Error
>> calling method MultiRowMutationService.MutateRows
>>   at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[stormjar.jar:?]
>>   at ...
>>   at org.apache.storm.daemon.executor$fn__8058$tuple_action_fn__8060.invoke(executor.clj:731) [storm-core-1.0.2.jar:1.0.2]
>>   at org.apache.storm.daemon.executor$mk_task_receiver$fn__7979.invoke(executor.clj:464) [storm-core-1.0.2.jar:1.0.2]
>>   at org.apache.storm.disruptor$clojure_handler$reify__7492.onEvent(disruptor.clj:40) [storm-core-1.0.2.jar:1.0.2]
>>   at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:451) [storm-core-1.0.2.jar:1.0.2]
>>   at org.apache.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:430) [storm-core-1.0.2.jar:1.0.2]
>>   at org.apache.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:73) [storm-core-1.0.2.jar:1.0.2]
>>   at org.apache.storm.daemon.executor$fn__8058$fn__8071$fn__8124.invoke(executor.clj:850) [storm-core-1.0.2.jar:1.0.2]
>>   at org.apache.storm.util$async_loop$fn__624.invoke(util.clj:484) [storm-core-1.0.2.jar:1.0.2]
>>   at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
>>   at java.lang.Thread.run(Thread.java:745) [?:1.7.0_80]
>> Caused by: com.google.protobuf.ServiceException: Error calling method
>> MultiRowMutationService.MutateRows
>>   at org.apache.hadoop.hbase.ipc.CoprocessorRpcChannel.callBlockingMethod(CoprocessorRpcChannel.java:75) ~[stormjar.jar:?]
>>   at org.apache.hadoop.hbase.protobuf.generated.MultiRowMutationProtos$MultiRowMutationService$BlockingStub.mutateRows(MultiRowMutationProtos.java:2149) ~[stormjar.jar:?]
>>   at ...
>>   ...
>> 13 more
>> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException:
>> Failed after attempts=35, exceptions:
>> Wed May 23 07:15:57 KST 2018, RpcRetryingCaller{globalStartTime=1527027357808,
>> pause=100, retries=35}, org.apache.hadoop.hbase.RegionTooBusyException:
>> org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit,
>> regionName={region}, server={server}, memstoreSize=2600502128,
>> blockingMemStoreSize=2600468480
>>   at org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:3649)
>>   at org.apache.hadoop.hbase.regionserver.HRegion.processRowsWithLocks(HRegion.java:6935)
>>   at org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:6885)
>>   at org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint.mutateRows(MultiRowMutationEndpoint.java:116)
>>   at org.apache.hadoop.hbase.protobuf.generated.MultiRowMutationProtos$MultiRowMutationService.callMethod(MultiRowMutationProtos.java:2053)
>>   at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7875)
>>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2008)
>>
>> Best regards,
>> Minwoo Kang
>>
>> ________________________________
>> From: 张铎 (Duo Zhang)
>> Sent: Wednesday, May 23, 2018, 09:22
>> To: hbase-user
>> Subject: Re: can not write to HBase
>>
>> What is the exception? And the stacktrace?
>>
>> 2018-05-23 8:17 GMT+08:00 Kang Minwoo:
>>
>>> Hello, Users
>>>
>>> My HBase client does not work after printing the following logs.
>>>
Re: Want to change key structure
Hi Marcell,

Since the key is changing, you will need to rewrite the entire table. I think generating HFiles (rather than doing Puts) will be the most efficient approach here. IIRC, you will need to use HFileOutputFormat in your MR job. As for locality, I don't think you should worry that much, because major compaction usually takes care of it. If you want very high locality from the beginning, you can run a major compaction on the new table after your initial load.

HTH,
Anil Gupta

On Mon, Feb 19, 2018 at 11:46 PM, Marcell Ortutay <mortu...@23andme.com> wrote:

> I have a large HBase table (~10 TB) that has an existing key structure.
> Based on some recent analysis, the key structure is causing performance
> problems for our current query load. I would like to re-write the table
> with a new key structure that performs substantially better.
>
> What is the best way to go about re-writing this table? Since the key
> structure will change, it will affect locality, so all the data will have
> to move to a new location. If anyone can point to examples of code that
> does something like this, that would be very helpful.
>
> Thanks,
> Marcell

-- 
Thanks & Regards, Anil Gupta
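The overall pipeline would look something like this (a sketch: `rekey-job.jar`, `com.example.RekeyJob`, and the paths are hypothetical placeholders; the real MR job would call HFileOutputFormat2.configureIncrementalLoad() so the generated HFiles are partitioned to match the new table's region boundaries):

```
# 1. MR job: read old_table, emit cells under the NEW row key, write HFiles.
hadoop jar rekey-job.jar com.example.RekeyJob old_table new_table /tmp/rekey-out

# 2. Move the generated HFiles into the new table (HBase 1.x bulk-load tool).
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/rekey-out new_table
```

Pre-splitting new_table on the new key distribution before step 1 matters here, since the HFile partitioning follows the target table's region boundaries.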
Re: [Production Impacted] Any workaround for https://issues.apache.org/jira/browse/HBASE-16464?
Thanks Ted. Will try to do the clean-up. Unfortunately, we ran out of support for this cluster since it's nearing end-of-life. For our new clusters we are in the process of getting support.
PS: IMO, I agree that I should use the vendor forum/list for vendor-specific stuff, but I think it's appropriate to use this mailing list for Apache HBase questions/issues. As per my understanding, Apache projects are supposed to encourage collaboration rather than building boundaries around vendors. ("Collaboration and openness" is one of the reasons I like Apache projects.)

On Sat, Feb 10, 2018 at 10:11 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> You can clean up the oldwal directory beginning with the oldest data.
>
> Please open a support case with the vendor.
>
> On Sat, Feb 10, 2018 at 10:02 AM, anil gupta <anilgupt...@gmail.com> wrote:
>
> > Hi Ted,
> >
> > We cleaned up all the snapshots around Feb 7-8th. You were right that I
> > don't see the CorruptedSnapshotException since then. Nice observation!
> > So I am again back to square one. Not really sure why oldwals and
> > recovered.edits are not getting cleaned up. I have already removed all the
> > replication peers and deleted all the snapshots.
> > Is it ok if I just go ahead and clean up the oldwal directory manually?
> > Can I also clean up recovered.edits?
> >
> > Thanks,
> > Anil
> >
> > On Sat, Feb 10, 2018 at 9:37 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > Can you clarify whether /apps/hbase/data/.hbase-snapshot/.tmp/ became
> > > empty after 2018-02-07 09:10:08 ?
> > >
> > > Do you see CorruptedSnapshotException for files outside of
> > > /apps/hbase/data/.hbase-snapshot/.tmp/ ?
> > >
> > > Cheers
> >
> > --
> > Thanks & Regards,
> > Anil Gupta

-- 
Thanks & Regards, Anil Gupta
Re: [Production Impacted] Any workaround for https://issues.apache.org/jira/browse/HBASE-16464?
Hi Ted,

We cleaned up all the snapshots around Feb 7-8th. You were right that I don't see the CorruptedSnapshotException since then. Nice observation!
So I am again back to square one. Not really sure why oldwals and recovered.edits are not getting cleaned up. I have already removed all the replication peers and deleted all the snapshots.
Is it ok if I just go ahead and clean up the oldwal directory manually? Can I also clean up recovered.edits?

Thanks,
Anil

On Sat, Feb 10, 2018 at 9:37 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Can you clarify whether /apps/hbase/data/.hbase-snapshot/.tmp/ became
> empty after 2018-02-07 09:10:08 ?
>
> Do you see CorruptedSnapshotException for files outside of
> /apps/hbase/data/.hbase-snapshot/.tmp/ ?
>
> Cheers

-- 
Thanks & Regards, Anil Gupta
Re: [Production Impacted] Any workaround for https://issues.apache.org/jira/browse/HBASE-16464?
Hi Ted,

Thanks for your reply. I read the comments on the jira. But in my case "/apps/hbase/data/.hbase-snapshot/.tmp/" is already empty, so I am not really sure what I can sideline. Please let me know if I am missing something.

~Anil

On Sat, Feb 10, 2018 at 8:35 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Please see the first few review comments of HBASE-16464.
>
> You can sideline the corrupt snapshots (according to the master log).
>
> You can also contact the vendor for a HOTFIX.
>
> Cheers
>
> On Sat, Feb 10, 2018 at 8:13 AM, anil gupta <anilgupt...@gmail.com> wrote:
>
> > Hi Folks,
> >
> > We are running HBase 1.1.2. It seems like we are hitting
> > https://issues.apache.org/jira/browse/HBASE-16464 in our production
> > cluster. Our oldwals folder has grown to 9.5 TB. I am aware that this is
> > fixed in releases after 2016, but unfortunately we need to operate this
> > production cluster for a few more months. (We are already migrating to a
> > newer version of HBase.)
> >
> > I have verified that we don't have any snapshots in this cluster. Also, we
> > removed all the replication_peers from that cluster. We already restarted
> > the HBase master a few days ago, but it didn't help. We have TBs of
> > oldwals and tens of thousands of recovered.edits files (assuming
> > recovered.edits files are cleaned up by the chore cleaner). It seems like
> > the problem started happening around mid-December, but at that time we
> > didn't do any major thing on this cluster.
> >
> > I would like to know if there is a workaround for HBASE-16464. Are there
> > any references left to those deleted snapshots in hdfs or zk? If yes, how
> > can I clean them up?
> >
> > I keep on seeing this in the HMaster logs:
> > 2018-02-07 09:10:08,514 ERROR
> > [hdpmaster6.bigdataprod1.wh.truecarcorp.com,6,1517601353645_ChoreService_3]
> > snapshot.SnapshotHFileCleaner: Exception while checking if files were
> > valid, keeping them just in case.
> > org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read
> > snapshot info from:
> > hdfs://PRODNN/apps/hbase/data/.hbase-snapshot/.tmp/LEAD_SALES-1517979610/.snapshotinfo
> >   at org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(SnapshotDescriptionUtils.java:313)
> >   at org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.getHFileNames(SnapshotReferenceUtil.java:328)
> >   at org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner$1.filesUnderSnapshot(SnapshotHFileCleaner.java:85)
> >   at org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache.getSnapshotsInProgress(SnapshotFileCache.java:303)
> >   at org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache.getUnreferencedFiles(SnapshotFileCache.java:194)
> >   at org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner.getDeletableFiles(SnapshotHFileCleaner.java:62)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteFiles(CleanerChore.java:233)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:157)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:180)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:149)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:180)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:149)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:180)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:149)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:180)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:149)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:180)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:149)
> >   at
[Production Impacted] Any workaround for https://issues.apache.org/jira/browse/HBASE-16464?
(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)

  at sun.reflect.GeneratedConstructorAccessor22.newInstance(Unknown Source)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
  at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
  at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
  at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1242)
  at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1227)
  at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1215)
  at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:303)
  at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:269)
  at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:261)
  at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1540)
  at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:303)
  at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:299)
  at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
  at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:299)
  at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:767)
  at org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(SnapshotDescriptionUtils.java:306)
  ... 26 more

-- 
Thanks & Regards, Anil Gupta
Frequent Region Server Failures with namenode.LeaseExpiredException
Hi Folks,

We are running a 60-node MapReduce/HBase HDP cluster: HBase 1.1.2, HDP 2.3.4.0-3485, with Phoenix enabled. Each slave has ~120 GB RAM; each RS has a 20 GB heap, 12 disks of 2 TB each, and 24 cores. This cluster had been running OK for the last 2 years, but recently, after a few disk failures (we unmounted those disks), it hasn't been running fine. I have checked hbck and hdfs fsck; both of them report no inconsistencies. Some of our RegionServers keep aborting with the following errors:

1 ==> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /apps/hbase/data/data/default/DE.TABLE_NAME/35aa0de96715c33e1f0664aa4d9292ba/recovered.edits/03948161445.temp (inode 420864666): File does not exist. [Lease. Holder: DFSClient_NONMAPREDUCE_-64710857_1, pendingcreates: 1]

2 ==> 2018-02-08 03:09:51,653 ERROR [regionserver/hdpslave26.bigdataprod1.com/1.16.6.56:16020] regionserver.HRegionServer: Shutdown / close of WAL failed: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /apps/hbase/data/oldWALs/hdpslave26.bigdataprod1.com%2C16020%2C1518027416930.default.1518085177903 (inode 420996935): File is not open for writing. Holder DFSClient_NONMAPREDUCE_649736540_1 does not have any open files.

All the LeaseExpiredExceptions are happening for recovered.edits and oldWALs. HDFS is around 48% full; most of the DNs have 30-40% space left on them, and NN heap is at 60% use. I have tried googling around but can't find anything concrete to fix this problem. Currently, 15/60 nodes have gone down in the last 2 days. Can someone please point out what might be causing these RegionServer failures?

-- 
Thanks & Regards, Anil Gupta
Re: hbase data migration from one cluster to another cluster on different versions
> > >> > at org.apache.hadoop.mapred. > YarnChild.main(YarnChild.java: > > > 158) > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > can anyone suggest how to migrate data? > > > >> > > > > >> > Thanks > > > >> > Manjeet Singh > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > Hi All, > > > >> > > > > >> > I have query regarding hbase data migration from one cluster to > > > another > > > >> > cluster in same N/W, but with a different version of hbase one is > > > >> 0.94.27 > > > >> > (source cluster hbase) and another is destination cluster hbase > > > version > > > >> is > > > >> > 1.2.1. > > > >> > > > > >> > I have used below command to take backup of hbase table on source > > > >> cluster > > > >> > is: > > > >> > ./hbase org.apache.hadoop.hbase.mapreduce.Export SPDBRebuild > > > >> > /data/backupData/ > > > >> > > > > >> > below files were genrated by using above command:- > > > >> > > > > >> > > > > >> > drwxr-xr-x 3 root root4096 Dec 9 2016 _logs > > > >> > -rw-r--r-- 1 root root 788227695 Dec 16 2016 part-m-0 > > > >> > -rw-r--r-- 1 root root 1098757026 Dec 16 2016 part-m-1 > > > >> > -rw-r--r-- 1 root root 906973626 Dec 16 2016 part-m-2 > > > >> > -rw-r--r-- 1 root root 1981769314 Dec 16 2016 part-m-3 > > > >> > -rw-r--r-- 1 root root 2099785782 Dec 16 2016 part-m-4 > > > >> > -rw-r--r-- 1 root root 4118835540 Dec 16 2016 part-m-5 > > > >> > -rw-r--r-- 1 root root 14217981341 Dec 16 2016 part-m-6 > > > >> > -rw-r--r-- 1 root root 0 Dec 16 2016 _SUCCESS > > > >> > > > > >> > > > > >> > in order to restore these files I am assuming I have to move these > > > >> files in > > > >> > destination cluster and have to run below command > > > >> > > > > >> > hbase org.apache.hadoop.hbase.mapreduce.Import > > > >> > /data/backupData/ > > > >> > > > > >> > Please suggest if I am on correct direction, second if anyone have > > > >> another > > > >> > option. 
> > > >> > I have tryed this with test data but above command took very long > > time > > > >> and > > > >> > at end it gets fails > > > >> > > > > >> > 17/10/23 11:54:21 INFO mapred.JobClient: map 0% reduce 0% > > > >> > 17/10/23 12:04:24 INFO mapred.JobClient: Task Id : > > > >> > attempt_201710131340_0355_m_02_0, Status : FAILED > > > >> > Task attempt_201710131340_0355_m_02_0 failed to report status > > for > > > >> 600 > > > >> > seconds. Killing! > > > >> > > > > >> > > > > >> > Thanks > > > >> > Manjeet Singh > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > -- > > > >> > luv all > > > >> > > > > >> > > > > > > > > > > > > > > > > -- > > > > luv all > > > > > > > > > > > > > > > > -- > > > luv all > > > > > > -- > > > -- Enrico Olivelli > -- Thanks & Regards, Anil Gupta
Re: HBASE data been deleted! Please HELP
AFAIK, in order to recover data, the user has to react within minutes or seconds. But have you checked the ".Trash" folder in HDFS, under the hbase user or the user that issued the rmr command? On Thu, Sep 28, 2017 at 5:53 AM, hua beatls <bea...@gmail.com> wrote: > Hello, I have a big problem > We deleted hbase data with "hdfs dfs -rmr -skipTrash /hbase", > > Is there any way to recover the deleted data? > > Thanks a lot! > -- Thanks & Regards, Anil Gupta
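To make the suggestion above concrete, these are the places to look (paths assume the default HDFS trash layout; note that the `-skipTrash` flag in the quoted command bypasses the trash entirely, so this can only help for deletes issued without that flag):

```shell
# Trash of the hbase user (default layout: /user/<user>/.Trash/Current/<original path>)
hdfs dfs -ls /user/hbase/.Trash/Current/hbase
# Trash of whichever user actually ran the rm command
hdfs dfs -ls /user/$(whoami)/.Trash/Current/hbase
```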
Re: org.apache.hadoop.hbase.regionserver.wal.DamagedWALException: Append sequenceId=8689
Thanks for the pointers Aaron. We checked hdfs. Its reporting 0 underreplicated or corrupted blocks. @Ted: we are using Hadoop 2.7.3(EMR5.7.2) On Thu, Jul 6, 2017 at 4:49 PM, Ted Yu <yuzhih...@gmail.com> wrote: > Which hadoop release are you using ? > > In FSOutputSummer.java, I see the following around line 106: > > checkClosed(); > > if (off < 0 || len < 0 || off > b.length - len) { > throw new ArrayIndexOutOfBoundsException(); > > You didn't get ArrayIndexOutOfBoundsException - maybe b was null ? > > On Thu, Jul 6, 2017 at 2:08 PM, anil gupta <anilgupt...@gmail.com> wrote: > >> Hey Ted, >> >> This is what i see in one of region server log(NPE at the bottom): >> 2017-07-06 19:07:07,778 INFO >> [ip-10-74-5-153.us-west-2.compute.internal,16020,14993202605 >> 01_ChoreService_1] >> regionserver.HRegionServer: >> ip-10-74-5-153.us-west-2.compute.internal,16020,149932026050 >> 1-MemstoreFlusherChore >> requesting flush of >> SYSTEM.CATALOG,,1499317930655.a47ffa359aa4588f5f360790ac8e4561. because 0 >> has an old edit so flush to free WALs after random delay 155739ms >> 2017-07-06 19:07:17,853 INFO >> [ip-10-74-5-153.us-west-2.compute.internal,16020,14993202605 >> 01_ChoreService_1] >> regionserver.HRegionServer: >> ip-10-74-5-153.us-west-2.compute.internal,16020,149932026050 >> 1-MemstoreFlusherChore >> requesting flush of >> SYSTEM.CATALOG,,1499317930655.a47ffa359aa4588f5f360790ac8e4561. because 0 >> has an old edit so flush to free WALs after random delay 132731ms >> 2017-07-06 19:07:28,038 INFO >> [ip-10-74-5-153.us-west-2.compute.internal,16020,14993202605 >> 01_ChoreService_1] >> regionserver.HRegionServer: >> ip-10-74-5-153.us-west-2.compute.internal,16020,149932026050 >> 1-MemstoreFlusherChore >> requesting flush of >> SYSTEM.CATALOG,,1499317930655.a47ffa359aa4588f5f360790ac8e4561. 
because 0 >> has an old edit so flush to free WALs after random delay 4316ms >> 2017-07-06 19:07:37,819 INFO >> [ip-10-74-5-153.us-west-2.compute.internal,16020,14993202605 >> 01_ChoreService_1] >> regionserver.HRegionServer: >> ip-10-74-5-153.us-west-2.compute.internal,16020,149932026050 >> 1-MemstoreFlusherChore >> requesting flush of >> SYSTEM.CATALOG,,1499317930655.a47ffa359aa4588f5f360790ac8e4561. because 0 >> has an old edit so flush to free WALs after random delay 190960ms >> 2017-07-06 19:07:47,767 INFO >> [ip-10-74-5-153.us-west-2.compute.internal,16020,14993202605 >> 01_ChoreService_1] >> regionserver.HRegionServer: >> ip-10-74-5-153.us-west-2.compute.internal,16020,149932026050 >> 1-MemstoreFlusherChore >> requesting flush of >> SYSTEM.CATALOG,,1499317930655.a47ffa359aa4588f5f360790ac8e4561. because 0 >> has an old edit so flush to free WALs after random delay 41231ms >> 2017-07-06 19:07:57,767 INFO >> [ip-10-74-5-153.us-west-2.compute.internal,16020,14993202605 >> 01_ChoreService_1] >> regionserver.HRegionServer: >> ip-10-74-5-153.us-west-2.compute.internal,16020,149932026050 >> 1-MemstoreFlusherChore >> requesting flush of >> SYSTEM.CATALOG,,1499317930655.a47ffa359aa4588f5f360790ac8e4561. because 0 >> has an old edit so flush to free WALs after random delay 222748ms >> 2017-07-06 19:08:07,973 INFO >> [ip-10-74-5-153.us-west-2.compute.internal,16020,14993202605 >> 01_ChoreService_1] >> regionserver.HRegionServer: >> ip-10-74-5-153.us-west-2.compute.internal,16020,149932026050 >> 1-MemstoreFlusherChore >> requesting flush of >> SYSTEM.CATALOG,,1499317930655.a47ffa359aa4588f5f360790ac8e4561. 
because 0 >> has an old edit so flush to free WALs after random delay 245966ms >> 2017-07-06 19:08:18,669 INFO >> [ip-10-74-5-153.us-west-2.compute.internal,16020,14993202605 >> 01_ChoreService_1] >> regionserver.HRegionServer: >> ip-10-74-5-153.us-west-2.compute.internal,16020,149932026050 >> 1-MemstoreFlusherChore >> requesting flush of >> SYSTEM.CATALOG,,1499317930655.a47ffa359aa4588f5f360790ac8e4561. because 0 >> has an old edit so flush to free WALs after random delay 76257ms >> 2017-07-06 19:08:28,029 INFO >> [ip-10-74-5-153.us-west-2.compute.internal,16020,14993202605 >> 01_ChoreService_1] >> regionserver.HRegionServer: >> ip-10-74-5-153.us-west-2.compute.internal,16020,149932026050 >> 1-MemstoreFlusherChore >> requesting flush of >> SYSTEM.CATALOG,,1499317930655.a47ffa359aa4588f5f360790ac8e4561. because 0 >> has an old edit so flush to free WALs after random delay 78310ms >> 2017-07-06 19:08:38,459 INFO >> [ip-10-74-5-15
Re: org.apache.hadoop.hbase.regionserver.wal.DamagedWALException: Append sequenceId=8689
(BatchEventProcessor.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2017-07-06 19:48:39,255 WARN [regionserver/ip-10-74-5-153.us-west-2.compute.internal/10.74.5.153:16020.logRoller] wal.FSHLog: Failed sync-before-close but no outstanding appends; closing WAL: org.apache.hadoop.hbase.regionserver.wal.DamagedWALException: Append sequenceId=7846, requesting roll of WAL 2017-07-06 19:48:39,261 INFO [regionserver/ip-10-74-5-153.us-west-2.compute.internal/10.74.5.153:16020.logRoller] wal.FSHLog: Rolled WAL /user/hbase/WALs/ip-10-74-5-153.us-west-2.compute.internal,16020,1499320260501/ip-10-74-5-153.us-west-2.compute.internal%2C16020%2C1499320260501.default.1499370518086 with entries=0, filesize=174 B; new WAL /user/hbase/WALs/ip-10-74-5-153.us-west-2.compute.internal,16020,1499320260501/ip-10-74-5-153.us-west-2.compute.internal%2C16020%2C1499320260501.default.1499370519235 2017-07-06 19:48:39,261 INFO [regionserver/ip-10-74-5-153.us-west-2.compute.internal/10.74.5.153:16020.logRoller] wal.FSHLog: Archiving hdfs://ip-10-74-31-169.us-west-2.compute.internal:8020/user/hbase/WALs/ip-10-74-5-153.us-west-2.compute.internal,16020,1499320260501/ip-10-74-5-153.us-west-2.compute.internal%2C16020%2C1499320260501.default.1499370518086 to hdfs://ip-10-74-31-169.us-west-2.compute.internal:8020/user/hbase/oldWALs/ip-10-74-5-153.us-west-2.compute.internal%2C16020%2C1499320260501.default.1499370518086 2017-07-06 19:48:40,322 WARN [regionserver/ip-10-74-5-153.us-west-2.compute.internal/10.74.5.153:16020.append-pool1-t1] wal.FSHLog: Append sequenceId=7847, requesting roll of WAL java.lang.NullPointerException at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:106) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:60) at 
java.io.DataOutputStream.write(DataOutputStream.java:107) at org.apache.hadoop.hbase.KeyValue.oswrite(KeyValue.java:2571) at org.apache.hadoop.hbase.KeyValueUtil.oswrite(KeyValueUtil.java:623) at org.apache.hadoop.hbase.regionserver.wal.WALCellCodec$EnsureKvEncoder.write(WALCellCodec.java:338) at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.append(ProtobufLogWriter.java:122) at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.append(FSHLog.java:1909) at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1773) at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1695) at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) On Thu, Jul 6, 2017 at 1:55 PM, Ted Yu <yuzhih...@gmail.com> wrote: > HBASE-16960 mentioned the following : > > Caused by: java.net.SocketTimeoutException: 2 millis timeout while > waiting for channel to be ready for read > > Do you see similar line in region server log ? > > Cheers > > On Thu, Jul 6, 2017 at 1:48 PM, anil gupta <anilgupt...@gmail.com> wrote: > > > Hi All, > > > > We are running HBase/Phoenix on EMR5.2(HBase1.2.3 and Phoenix4.7) and we > running into following exception when we are trying to load data into one > of our Phoenix table: > > 2017-07-06 19:57:57,507 INFO [hconnection-0x60e5272-shared--pool2-t249] > org.apache.hadoop.hbase.client.AsyncProcess: #1, table=DE.CONFIG_DATA, > attempt=30/35 failed=38ops, last exception: org.apache.hadoop.hbase. > regionserver.wal.DamagedWALException: org.apache.hadoop.hbase. 
> regionserver.wal.DamagedWALException: Append sequenceId=8689, requesting > roll of WAL > > at org.apache.hadoop.hbase.regionserver.wal.FSHLog$ > RingBufferEventHandler.append(FSHLog.java:1921) > > at org.apache.hadoop.hbase.regionserver.wal.FSHLog$ > RingBufferEventHandler.onEvent(FSHLog.java:1773) > > at org.apache.hadoop.hbase.regionserver.wal.FSHLog$ > RingBufferEventHandler.onEvent(FSHLog.java:1695) > > at com.lmax.disruptor.BatchEventProcessor.run( > BatchEventProcessor.java:128) > > at java.util.concurrent.ThreadPoolExecutor.runWorker( > ThreadPoolExecutor.java:1142) > > at java.util.concurrent.ThreadPoolExecutor$Worker.run( > ThreadPoolExecutor.java:617) > > at java.lang.Thread.run(Thread.java:745) > > > > We are OK with wiping out this table and rebuilding the dataset. We > tried to drop the table and recreate the table but it didnt fix it. > > Can anyone please let us know how can we get rid of above problem? Are > we running into https://issues.apache.org/jira/browse/HBASE-16960? > > > > > > -- > > Thanks & Regards, > > Anil Gupta > > > -- Thanks & Regards, Anil Gupta
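Not a fix suggested in this thread, but one low-risk thing worth trying when a single WAL appears wedged is forcing a roll from the HBase shell, so the region server closes the damaged file and opens a fresh one (the server name below is copied from the log above; whether this clears the underlying NPE is an assumption):

```shell
# Force the affected region server to roll its WAL (HBase 1.x shell command);
# the old WAL is closed/archived and a new one is opened.
echo "wal_roll 'ip-10-74-5-153.us-west-2.compute.internal,16020,1499320260501'" | hbase shell
```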
org.apache.hadoop.hbase.regionserver.wal.DamagedWALException: Append sequenceId=8689
Hi All, We are running HBase/Phoenix on EMR5.2 (HBase 1.2.3 and Phoenix 4.7) and we are running into the following exception when trying to load data into one of our Phoenix tables: 2017-07-06 19:57:57,507 INFO [hconnection-0x60e5272-shared--pool2-t249] org.apache.hadoop.hbase.client.AsyncProcess: #1, table=DE.CONFIG_DATA, attempt=30/35 failed=38ops, last exception: org.apache.hadoop.hbase.regionserver.wal.DamagedWALException: org.apache.hadoop.hbase.regionserver.wal.DamagedWALException: Append sequenceId=8689, requesting roll of WAL at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.append(FSHLog.java:1921) at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1773) at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1695) at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) We are OK with wiping out this table and rebuilding the dataset. We tried to drop the table and recreate it, but that didn't fix it. Can anyone please let us know how we can get rid of the above problem? Are we running into https://issues.apache.org/jira/browse/HBASE-16960? -- Thanks & Regards, Anil Gupta
mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles
Cross posting since this seems to be an HBase issue. I think completeBulkLoad step is failing. Please refer to the mail below. -- Forwarded message -- From: anil gupta <anilgupt...@gmail.com> Date: Thu, May 25, 2017 at 4:38 PM Subject: [IndexTool NOT working] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles To: "u...@phoenix.apache.org" <u...@phoenix.apache.org> Hi, We are using HDP2.3.2(Phoenix 4.4 and HBase 1.1), we created a secondary index on an already existing table. We paused all writes to Primary table. Then we invoked IndexTool to populate secondary index table. We have tried same steps many times but we keep on getting following error(we have also tried drop the index and adding it again): 2017-05-24 18:00:10,281 WARN [LoadIncrementalHFiles-2] util.HeapMemorySizeUtil: hbase.regionserver.global.memstore.upperLimit is deprecated by hbase.regionserver.global.memstore.size 2017-05-24 18:00:10,340 WARN [LoadIncrementalHFiles-12] util.HeapMemorySizeUtil: hbase.regionserver.global.memstore.upperLimit is deprecated by hbase.regionserver.global.memstore.size 2017-05-24 18:00:10,342 INFO [LoadIncrementalHFiles-11] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/ c79ae6d27824424f99523dad586e86b1 first=JF2GPADC8GH331037\x00\ x80\x00\x1A0\x80\x00\x01Wj\x03r1defc4d301e4ec172b49be4a7ea33c2f7 last=JTHBK1GG4E2122477\x00\x80\x00$\xE4\x80\x00\x01[\xAD`{\ x17901d036d588292854ac5b1d4c29d8e1e 2017-05-24 18:00:10,343 INFO [LoadIncrementalHFiles-14] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/ f0e97b218aed4abf9949cf49a57e559b first=5NPEB4AC3DH620091\x00\ x80\x00\xE0\x16\x80\x00\x01X\xE5g\xD6\x0B81d210ac753ed281e8627e5edb7eb59f last=JF2GPADC8GH331037\x00\x80\x00\x1A0\x80\x00\x01W]&\ xE54f37d636104f6cd916b2b07bf3aa94d3f 2017-05-24 18:00:10,343 INFO [LoadIncrementalHFiles-2] mapreduce.LoadIncrementalHFiles: Trying to load 
hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/ 27c0a905ee174c9898d324acf1554bf9 first=WMWZP3C58FT708786\x00\ x80\x00\xE0\x16\x80\x00\x01Y\xB8\x95U\xA0d21d32aed18af976dd53735705c728cd last=`1GCRCPE05BZ430377\x00\x80\x00}\x05\x80\x00\x01[\xDEE\ x91L383768c6ac5f306fa99f68964b4f18aa 2017-05-24 18:00:10,343 INFO [LoadIncrementalHFiles-12] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/ d0a6c4b727bb416f840ed254658f3982 first=1N4BZ0CP4GC308715\x00\ x80\x01T\xFC\x80\x00\x01U\xE3\x7FL\x9A37b77d47941e99e430fcb0e0657f5558 last=2GKALMEK7H6220949\x00\x80\x00!\x1A\x80\x00\x01Y\x18\xE6\xB3\ xB42e72036f7e7e03078f41fc82712c5de7 2017-05-24 18:00:10,343 INFO [LoadIncrementalHFiles-0] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/ 11627a2861e3446e9d6f684ab534563e first=3C7WRNAJ6GG313342\x00\ x80\x00NB\x80\x00\x01V}\xFD\xE4+65bbebdd06dedd8466a31ebd33841a51 last=3N1CE2CP2FL407481\x00\x80\x00\xE0\x16\x80\x00\x01W\x1B\x0A\x02\ xC1fc95d4114d5e91197a5e41bf37c9e8c7 2017-05-24 18:00:10,343 INFO [LoadIncrementalHFiles-1] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/ 23df78bafd304ff887385a2b6becf06d first=1C4RJFLT6HC742023\x00\ x80\x00x\xFF\x80\x00\x01[J8\x0Ac8b65a80fe1662fb25d80798a66cc83dc last=1FMCU9J90EUB68140\x00\x80\x01X\xA4\x80\x00\x01[\x1B\xDD\xB2\ x1C577502512ec987844b0108738a9ec6ba 2017-05-24 18:00:10,343 INFO [LoadIncrementalHFiles-3] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/ 39dc73882bec49a0bdd5d787b06ac032 first=1G1JD5SB5H4136951\x00\ x80\x00!\x8A\x80\x00\x01Z%\xF6\x7Ffef0b8faeeeb4a10103e1a67ea5ebdbec last=1GNKVHKD7HJ275239\x00\x80\x00$\x87\x80\x00\x01Z%\xF6s\ xDC0961566a370af3b7da440e9705bc4c8c 2017-05-24 18:00:10,343 INFO [LoadIncrementalHFiles-8] mapreduce.LoadIncrementalHFiles: Trying to load 
hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/ a37a2a56ff5c48399cf1abd92f99662f first=###239824\x00\ x80\x01(\xFE\x80\x00\x01Z\xAE\xD6\xE0Xe5a45a2beab337228bdba90c06f34a12 last=1C4RJFLT6HC742023\x00\x80\x00x\xFF\x80\x00\x01[H\xF9w\ x8D60edb518c27ef80f8a751701926d9174 2017-05-24 18:00:10,343 INFO [LoadIncrementalHFiles-10] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/ b5843a62a6bd47fbbfc29303bee158e3 first=5FNRL5H4XFB033259\x00\ x80\x00\x1EZ\x80\x00\x01\x5C"\x87s\xF5ce24ec7e2a3698836386bccabc1265af last=5NPEB4AC3DH620091\x00\x80\x00\xE0\x16\x80\x00\x01X\xE4\x1B\ x9Dq95568f371c1ebd06c497df7129f248a2 2017-05-24 18:00:10,343 INFO [LoadIncrementalHFiles-5] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/ 595e89b2fae8494d8a878bc6ba306e2f first=JTHBK1GG4E2122477\x00\ x80\x00$\xE4\x80\x00\x01[\xAF\xF0]%d306ddb81ea3bc093c40efe9f198f03a last=KNDPMCAC1H7201793\x00\x80\x00\x
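For what it's worth, "Split occured while grouping HFiles" means a region split happened between the tool grouping the HFiles and loading them; the tool re-splits the files and retries a bounded number of times. The leftover index HFiles can be retried by hand (table name and path are taken from the log above; the retry knob is an assumption worth checking against your exact version):

```shell
# Re-run the bulk load against the index table, allowing more re-grouping
# retries in case regions keep splitting underneath the load.
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
  -Dhbase.bulkload.retries.number=20 \
  hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX BI.VIN_IDX
```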
Re: HBASE and MOB
Backporting MOB won't be a trivial task. AFAIK, Cloudera backported MOB to the HBase 1.x branch for CDH (it's not in the Apache HBase 1.x branch yet). It might be easier to just use CDH for MOB. On Fri, May 12, 2017 at 8:51 AM, Jean-Marc Spaggiari < jean-m...@spaggiari.org> wrote: > Thanks for those details. > > How big are you PDF? Are they all small size? If they are not above 1MB, > MOBs will not really be 100% mandatory. Even if few of them are above. > > If you want to apply the patch on another branch,this is what is called a > back port (like Ted said before) and will require a pretty good amount of > work. You can jump on that, but if you are not used to the HBase code, it > might be a pretty big challenge... > > Another way is to look for an HBase distribution that already includes the > MOB code already. > > JMS > > 2017-05-12 11:21 GMT-04:00 F. T. <bibo...@hotmail.fr>: > > > Hi Jean Marc > > > > I'm using a 1.2.3 version. I downloaded a "bin" version from Apache > > official web site. Maybe I've to install it from the "src" option with > mvn ? > > > > I would like index PDF into Hbase and use it in a Solr collection. > > > > In fact I would like reproduce this process : > > http://blog.cloudera.com/blog/2015/10/how-to-index-scanned- > > pdfs-at-scale-using-fewer-than-50-lines-of-code/ > > > > > > But maybe is there another solution to reproduce it . > > > > Fred > > > > > > > > De : Jean-Marc Spaggiari <jean-m...@spaggiari.org> > > Envoyé : vendredi 12 mai 2017 17:06 > > À : user > > Objet : Re: HBASE and MOB > > > > Hi Fred, > > > > Can you please confirm the following information? > > > > 1) What exact version of HBase are you using? From a distribution, build > by > > yourself, from the JARs, etc. > > 2) Why do you think you need the MOB feature > > 3) Is an upgrade an option for you or not really. 
> > > > Thanks, > > > > JMS > > > > > > 2017-05-12 11:02 GMT-04:00 Ted Yu <yuzhih...@gmail.com>: > > > > > It is defined here in > > > hbase-client/src/main/java/org/apache/hadoop/hbase/ > > HColumnDescriptor.java: > > > public static final String IS_MOB = "IS_MOB"; > > > > > > MOB feature hasn't been backported to branch-1 (or earlier releases). > > > > > > Looks like you're using a vendor's release. > > > > > > Consider contacting the corresponding mailing list if you are stuck. > > > > > > On Fri, May 12, 2017 at 7:59 AM, F. T. <bibo...@hotmail.fr> wrote: > > > > > > > Hi all, > > > > > > > > I'd like to use MOB in HBase to store PDF files. I'm using Hbase > 1.2.3 > > > but > > > > I'get this error creating a table with MOB column : NameError: > > > > uninitialized constant IS_MOB. > > > > > > > > A lot of web sites (including Apache official web site) talk about > the > > > > patch 11339 or HBase 2.0.0, but, I don't find any explanation about > the > > > way > > > > to install this patch and > > > > > > > > I can't find the 2.0.0 version anywhere. So I'm completly lost. Could > > you > > > > help me please ? > > > > > > > > > > > > > > -- Thanks & Regards, Anil Gupta
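For reference, on a build that does ship MOB (HBase 2.0+, or a vendor backport such as CDH's), the DDL from the original post looks like this in the HBase shell. The `IS_MOB` constant only exists on those versions, which is exactly why the stock 1.2.3 shell throws `uninitialized constant IS_MOB`:

```shell
# MOB-enabled column family: cell values larger than MOB_THRESHOLD bytes
# (100 KB here) are stored as MOBs rather than inline in regular HFiles.
create 'pdf_store', {NAME => 'f', IS_MOB => true, MOB_THRESHOLD => 102400}
```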
Re: limiting user threads on client
I think you need to set that property before you make HBaseConfiguration object. Have you tried that? On Mon, Mar 13, 2017 at 10:24 AM, Henning Blohm <henning.bl...@zfabrik.de> wrote: > Unfortunately it doesn't seem to make a difference. > > I see that the configuration has hbase.htable.threads.max=1 right before > setting up the Connection but then I still get hundreds of > > hconnection-*** > > threads. Is that actually Zookeeper? > > Thanks, > Henning > > On 13.03.2017 17:28, Ted Yu wrote: > >> Are you using Java client ? >> See the following in HTable : >> >>public static ThreadPoolExecutor getDefaultExecutor(Configuration >> conf) { >> >> int maxThreads = conf.getInt("hbase.htable.threads.max", Integer. >> MAX_VALUE); >> >> FYI >> >> On Mon, Mar 13, 2017 at 9:14 AM, Henning Blohm <henning.bl...@zfabrik.de> >> wrote: >> >> Hi, >>> >>> I am running an HBase client on a very resource limited machine. In >>> particular numproc is limited so that I frequently get "Cannot create >>> native thread" OOMs. I noticed that, in particular in write situations, >>> the >>> hconnection pool grows into the hundreds of threads - even when at most >>> writing with less than ten application threads. Threads are discarded >>> again >>> after some minutes. >>> >>> In conjunction with other programs running on that machine, this >>> sometimes >>> leads to an "overload" situation. >>> >>> Is there a way to keep thread pool usage limited - or in some closer >>> relation with the actual concurrency required? >>> >>> Thanks, >>> >>> Henning >>> >>> >>> >>> > -- Thanks & Regards, Anil Gupta
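A sketch of what "set it before" looks like in the Java client (assumes hbase-client 1.x on the classpath). The `hconnection-*` threads Henning sees come from the connection-wide pool rather than the per-table one, and that pool has its own knobs, so capping both is worth trying; the specific cap values are illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class LimitedClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Cap the per-HTable executor (the default is Integer.MAX_VALUE).
        conf.setInt("hbase.htable.threads.max", 8);
        // Cap the connection-wide pool that spawns the hconnection-* threads.
        conf.setInt("hbase.hconnection.threads.max", 8);
        conf.setInt("hbase.hconnection.threads.core", 2);
        // Both caps must be in place before the Connection is created.
        try (Connection connection = ConnectionFactory.createConnection(conf)) {
            // ... connection.getTable(...) as usual ...
        }
    }
}
```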
Re: Hbase Row key lock
In my experience, under normal conditions a lock won't be held for 60 seconds. How many writes/sec per node are you doing? It seems like there is some hotspotting in your use case, or the cluster might need some tuning/tweaking. Have you verified that your writes/reads are evenly spread out? Do you have a time component as the prefix of your rowkey? On Sun, Oct 23, 2016 at 7:01 PM, Manjeet Singh <manjeet.chand...@gmail.com> wrote: > Anil its written it can hold lock upto 60 second. In my case my job get > stuck and many update for same rowkey cause fir bead health of hbase in cdh > 5.8 > > On 24 Oct 2016 06:26, "anil gupta" <anilgupt...@gmail.com> wrote: > > Writes/Updates usually takes few milliseconds in HBase. So, in normal cases > lock wont be held for seconds. > > On Sun, Oct 23, 2016 at 12:57 PM, Manjeet Singh < > manjeet.chand...@gmail.com> > wrote: > > > Anil all information are correct I am talking about suppose I didn't set > > any version and I have very simple requirement to update if I found xyz > > record and if I hv few ETL process which are responsible for aggregate > the > > data which is very common. ... why my hbase stuck if I try to update same > > rowkey... its mean its hold the lock for few second > > > > On 24 Oct 2016 00:46, "anil gupta" <anilgupt...@gmail.com> wrote: > > > > > Writes within a HBase row are atomic. Now, whichever write becomes the > > > latest write(with the help of timestamp value) will prevail as the > > default > > > value. If you set versions to more than 1 in column family, then you > will > > > be able to see both the values if you query for multiple versions. 
> > > > > > HTH, > > > Anil Gupta > > > > > > On Sun, Oct 23, 2016 at 12:02 PM, Manjeet Singh < > > > manjeet.chand...@gmail.com> > > > wrote: > > > > > > > Till now what i understand their is no update > > > > > > > > if two different thread try to update same record what happen > > > > > > > > first record insert with some version > > > > second thread comes and change the version and its like a new insert > > with > > > > some version > > > > this process called MVCC > > > > > > > > If I am correct how hbase support MVCC mean any configuration for > > > handlling > > > > multiple thread at same time? > > > > > > > > On Mon, Oct 24, 2016 at 12:24 AM, Manjeet Singh < > > > > manjeet.chand...@gmail.com> > > > > wrote: > > > > > > > > > No I don't have 50 clients? I want to understand internal working > of > > > > Hbase > > > > > in my usecase I have bulk update operation from spark job we have 7 > > > > > different kafka pipeline and 7 spark job > > > > > it might happen that my 2 0r 3 spark job have same rowkey for > update > > > > > > > > > > > > > > > > > > > > On Mon, Oct 24, 2016 at 12:20 AM, Dima Spivak < > dimaspi...@apache.org > > > > > > > > wrote: > > > > > > > > > >> If your typical use case sees 50 clients simultaneously trying to > > > update > > > > >> the same row, then a strongly consistent data store that writes to > > > disk > > > > >> for > > > > >> fault tolerance may not be for you. That said, such a use case > seems > > > > >> extremely unusual to me and I'd ask why you're trying to update > the > > > same > > > > >> row in such a manner. > > > > >> > > > > >> On Sunday, October 23, 2016, Manjeet Singh < > > > manjeet.chand...@gmail.com> > > > > >> wrote: > > > > >> > > > > >> > Hi Dima, > > > > >> > > > > > >> > I didn't get ? point is assume I have 50 different client all > > having > > > > >> same > > > > >> > rowkey all want to update on same rowkey at same time now just > > tell > > > > what > > > > >> > will happen? 
who will get what value? > > > > >> > > > > > >> > Thanks > > > > >> > Manjeet > > > > >> > > > > > >> > On Mon, Oct 24, 2016 at 12:12 AM, Dima Spivak < > > > dimaspi...@apache.org > > > > >> > <javascript:;>> wrote: > >
Re: Hbase Row key lock
Writes/Updates usually takes few milliseconds in HBase. So, in normal cases lock wont be held for seconds. On Sun, Oct 23, 2016 at 12:57 PM, Manjeet Singh <manjeet.chand...@gmail.com> wrote: > Anil all information are correct I am talking about suppose I didn't set > any version and I have very simple requirement to update if I found xyz > record and if I hv few ETL process which are responsible for aggregate the > data which is very common. ... why my hbase stuck if I try to update same > rowkey... its mean its hold the lock for few second > > On 24 Oct 2016 00:46, "anil gupta" <anilgupt...@gmail.com> wrote: > > > Writes within a HBase row are atomic. Now, whichever write becomes the > > latest write(with the help of timestamp value) will prevail as the > default > > value. If you set versions to more than 1 in column family, then you will > > be able to see both the values if you query for multiple versions. > > > > HTH, > > Anil Gupta > > > > On Sun, Oct 23, 2016 at 12:02 PM, Manjeet Singh < > > manjeet.chand...@gmail.com> > > wrote: > > > > > Till now what i understand their is no update > > > > > > if two different thread try to update same record what happen > > > > > > first record insert with some version > > > second thread comes and change the version and its like a new insert > with > > > some version > > > this process called MVCC > > > > > > If I am correct how hbase support MVCC mean any configuration for > > handlling > > > multiple thread at same time? > > > > > > On Mon, Oct 24, 2016 at 12:24 AM, Manjeet Singh < > > > manjeet.chand...@gmail.com> > > > wrote: > > > > > > > No I don't have 50 clients? 
I want to understand internal working of > > > Hbase > > > > in my usecase I have bulk update operation from spark job we have 7 > > > > different kafka pipeline and 7 spark job > > > > it might happen that my 2 0r 3 spark job have same rowkey for update > > > > > > > > > > > > > > > > On Mon, Oct 24, 2016 at 12:20 AM, Dima Spivak <dimaspi...@apache.org > > > > > > wrote: > > > > > > > >> If your typical use case sees 50 clients simultaneously trying to > > update > > > >> the same row, then a strongly consistent data store that writes to > > disk > > > >> for > > > >> fault tolerance may not be for you. That said, such a use case seems > > > >> extremely unusual to me and I'd ask why you're trying to update the > > same > > > >> row in such a manner. > > > >> > > > >> On Sunday, October 23, 2016, Manjeet Singh < > > manjeet.chand...@gmail.com> > > > >> wrote: > > > >> > > > >> > Hi Dima, > > > >> > > > > >> > I didn't get ? point is assume I have 50 different client all > having > > > >> same > > > >> > rowkey all want to update on same rowkey at same time now just > tell > > > what > > > >> > will happen? who will get what value? > > > >> > > > > >> > Thanks > > > >> > Manjeet > > > >> > > > > >> > On Mon, Oct 24, 2016 at 12:12 AM, Dima Spivak < > > dimaspi...@apache.org > > > >> > <javascript:;>> wrote: > > > >> > > > > >> > > Unless told not to, HBase will always write to memory and append > > to > > > >> the > > > >> > WAL > > > >> > > on disk before returning and saying the write succeeded. That's > by > > > >> design > > > >> > > and the same write pattern that companies like Apple and > Facebook > > > have > > > >> > > found works for them at scale. So what's there to solve? 
> > > >> > > > > > >> > > On Sunday, October 23, 2016, Manjeet Singh < > > > >> manjeet.chand...@gmail.com > > > >> > <javascript:;>> > > > >> > > wrote: > > > >> > > > > > >> > > > Hi All, > > > >> > > > > > > >> > > > I have read below mention blog and it also said Hbase holds > the > > > >> lock on > > > >> > > > rowkey level > > > >> > > > h
Re: Hbase Row key lock
Writes within a HBase row are atomic. Now, whichever write becomes the latest write(with the help of timestamp value) will prevail as the default value. If you set versions to more than 1 in column family, then you will be able to see both the values if you query for multiple versions. HTH, Anil Gupta On Sun, Oct 23, 2016 at 12:02 PM, Manjeet Singh <manjeet.chand...@gmail.com> wrote: > Till now what i understand their is no update > > if two different thread try to update same record what happen > > first record insert with some version > second thread comes and change the version and its like a new insert with > some version > this process called MVCC > > If I am correct how hbase support MVCC mean any configuration for handlling > multiple thread at same time? > > On Mon, Oct 24, 2016 at 12:24 AM, Manjeet Singh < > manjeet.chand...@gmail.com> > wrote: > > > No I don't have 50 clients? I want to understand internal working of > Hbase > > in my usecase I have bulk update operation from spark job we have 7 > > different kafka pipeline and 7 spark job > > it might happen that my 2 0r 3 spark job have same rowkey for update > > > > > > > > On Mon, Oct 24, 2016 at 12:20 AM, Dima Spivak <dimaspi...@apache.org> > > wrote: > > > >> If your typical use case sees 50 clients simultaneously trying to update > >> the same row, then a strongly consistent data store that writes to disk > >> for > >> fault tolerance may not be for you. That said, such a use case seems > >> extremely unusual to me and I'd ask why you're trying to update the same > >> row in such a manner. > >> > >> On Sunday, October 23, 2016, Manjeet Singh <manjeet.chand...@gmail.com> > >> wrote: > >> > >> > Hi Dima, > >> > > >> > I didn't get ? point is assume I have 50 different client all having > >> same > >> > rowkey all want to update on same rowkey at same time now just tell > what > >> > will happen? who will get what value? 
> >> > > >> > Thanks > >> > Manjeet > >> > > >> > On Mon, Oct 24, 2016 at 12:12 AM, Dima Spivak <dimaspi...@apache.org > >> > <javascript:;>> wrote: > >> > > >> > > Unless told not to, HBase will always write to memory and append to > >> the > >> > WAL > >> > > on disk before returning and saying the write succeeded. That's by > >> design > >> > > and the same write pattern that companies like Apple and Facebook > have > >> > > found works for them at scale. So what's there to solve? > >> > > > >> > > On Sunday, October 23, 2016, Manjeet Singh < > >> manjeet.chand...@gmail.com > >> > <javascript:;>> > >> > > wrote: > >> > > > >> > > > Hi All, > >> > > > > >> > > > I have read below mention blog and it also said Hbase holds the > >> lock on > >> > > > rowkey level > >> > > > https://blogs.apache.org/hbase/entry/apache_hbase_ > >> > internals_locking_and > >> > > > (0) Obtain Row Lock > >> > > > (1) Write to Write-Ahead-Log (WAL) > >> > > > (2) Update MemStore: write each cell to the memstore > >> > > > (3) Release Row Lock > >> > > > > >> > > > > >> > > > SO question is how to solve this if I have very frequent update on > >> > Hbase > >> > > > > >> > > > Thanks > >> > > > Manjeet > >> > > > > >> > > > On Wed, Aug 17, 2016 at 9:54 AM, Manjeet Singh < > >> > > manjeet.chand...@gmail.com <javascript:;> > >> > > > <javascript:;>> > >> > > > wrote: > >> > > > > >> > > > > Hi All > >> > > > > > >> > > > > Can anyone help me about how and in which version of Hbase > support > >> > > Rowkey > >> > > > > lock ? > >> > > > > I have seen article about rowkey lock but it was about .94 > >> version it > >> > > > said > >> > > > > that if row key not exist and any update request come and that > >> rowkey > >> > > not > >> > > > > exist then in this case Hbase hold the lock for 60 sec. 
> >> > > > > > >> > > > > currently I am using Hbase 1.2.2 version > >> > > > > > >> > > > > Thanks > >> > > > > Manjeet > >> > > > > > >> > > > > > >> > > > > > >> > > > > -- > >> > > > > luv all > >> > > > > > >> > > > > >> > > > > >> > > > > >> > > > -- > >> > > > luv all > >> > > > > >> > > > >> > > > >> > > -- > >> > > -Dima > >> > > > >> > > >> > > >> > > >> > -- > >> > luv all > >> > > >> > >> > >> -- > >> -Dima > >> > > > > > > > > -- > > luv all > > > > > > -- > luv all > -- Thanks & Regards, Anil Gupta
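The last-write-wins behavior described above can be sketched without a cluster. The toy model below is plain Java, not HBase code (the class and method names are made up for illustration): it keeps one cell's versions ordered newest-first and trims them to a maximum, the way a column family with VERSIONS => n does.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of a single HBase cell: multiple timestamped versions,
// newest first, trimmed to maxVersions. Illustration only, not HBase code.
class VersionedCell {
    private final int maxVersions;
    // Descending order: the highest timestamp (latest write) comes first.
    private final NavigableMap<Long, String> versions =
            new TreeMap<>(Comparator.reverseOrder());

    VersionedCell(int maxVersions) { this.maxVersions = maxVersions; }

    // Concurrent writers both "succeed"; the later timestamp wins reads.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // drop the oldest version
        }
    }

    // Default read: only the latest version is visible.
    String get() { return versions.firstEntry().getValue(); }

    int versionCount() { return versions.size(); }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell(3);
        cell.put(100L, "writer-A");
        cell.put(101L, "writer-B"); // later write prevails on reads
        System.out.println(cell.get()); // writer-B
    }
}
```

Both writers' puts are kept (up to maxVersions); a plain read simply sees whichever version carries the highest timestamp, which is the behavior the thread is asking about.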
Re: CopyTable fails on copying between two secured clusters
Hi Frank, I don't know your exact use case, but I have successfully run CopyTable across *2 secure* clusters back in 2013-2014 on a CDH distro cluster. Unfortunately, I don't remember the settings or command that we ran to do that, since it was at my previous job. Thanks, Anil Gupta On Fri, Sep 9, 2016 at 10:22 AM, Esteban Gutierrez <este...@cloudera.com> wrote: > Hi Frank, > > doesn't looks like the you are pointing the znode base to /hbase-secure, > see the arguments that you provided initially: > > "--peer.adr=zookeeper1, zookeeper2:2181:/hbase", > "--new.name=TargetTable", > "SourceTable" > > if the destination cluster has the base znode under /hbase-secure then you > need to point to the right base znode in --peer.adr, e.g. something > like: --peer.adr=zookeeper1, zookeeper2:2181:/hbase-secure > > or is there something different you have as the arguments for CopyTable? > > esteban. > > > -- > Cloudera, Inc. > > > On Fri, Sep 9, 2016 at 9:40 AM, Frank Luo <j...@merkleinc.com> wrote: > > > I think I know the cause now. > > > > The code tries to get "baseZNode" from the config. and the latter is > > obtained from Connection#getConfiguration(). Now we have two connections, > > one from local hbase, the other remote. The local hbase's connection has > > the configuration set perfectly, while the one on the remote connection > > barely has anything, hence not able to get a value of "baseZNode". > > > > So based on this theory, CopyTable will never work if the remote is a > > secured cluster, is that a right assessment? Does anyone have luck to get > > it work? > > > > -Original Message- > > From: Frank Luo > > Sent: Thursday, September 08, 2016 6:45 PM > > To: user@hbase.apache.org > > Subject: RE: CopyTable fails on copying between two secured clusters > > > > I don't think they are pointing to different locations. Both of them > > should be /hbase-secure. 
> > > > However, the debugger shows that ConnectionManager#retrieveClusterId are > > called twice, the first time regards to the source cluster, which works > > fine, and watcher.clusterIdZNode=/hbase-secure/hbaseid, and it is > correct. > > > > The second time for the remote cluster, watcher.clusterIdZNode=/hbase/ > hbaseid, > > which should be incorrect. > > > > What I am suspecting is ZooKeeperWatcher, method setNodeNames. It reads: > > > > private void setNodeNames(Configuration conf) { > > baseZNode = conf.get(HConstants.ZOOKEEPER_ZNODE_PARENT, > > HConstants.DEFAULT_ZOOKEEPER_ZNODE_PARENT); > > > > I am not sure the conf is corrected fetched from the remote cluster. If > > not, the default value is given, which is /hbase and incorrect. > > > > By the way, below is the root znodes for zookeepers: > > > > The source cluster: > > [hbase-secure, hiveserver2, hive, hbase-unsecure, templeton-hadoop, > > hadoop-ha, zookeeper] > > > > The target cluster: > > [hbase-secure, hive, hiveserver2, hbase-unsecure, hadoop-ha, zookeeper] > > > > -Original Message- > > From: Esteban Gutierrez [mailto:este...@cloudera.com] > > Sent: Thursday, September 08, 2016 1:02 PM > > To: user@hbase.apache.org > > Subject: Re: CopyTable fails on copying between two secured clusters > > > > Is it possible that in your destination cluster zookeeper.znode.parent > > points to a different location than /hbase ? If both clusters are under > the > > same kerberos realm then there is no need to worry about > > zookeeper.security.auth_to_local. > > > > > > > > -- > > Cloudera, Inc. > > > > > > On Thu, Sep 8, 2016 at 10:50 AM, Frank Luo <j...@merkleinc.com> wrote: > > > > > Thanks Esteban for replying. > > > > > > The Kerberos realm is shared between the two clusters. > > > > > > I searched zookeeper config and couldn't find the rule, so where it > > > is set? > > > > > > Having said that, I looked at parameters passed to getData call, and > > > it doesn't look like security related. 
> > > > > > PS. I am using hbase 1.1.2. > > > > > > Here is the log: > > > > > > com.merkleinc.cr.hbase_maintenance.tableexport.CopyTableTest,testCopyT > > > able Connected to the target VM, address: '127.0.0.1:50669', > > > transport: > > > 'socket' > > > 0[main] WARN org.apache.hadoop.hbase.io.util.HeapMemorySizeUtil > - &
Re: How to backport MOB to Hbase 1.2.2
So, in that case, if someone really wants to use MOB without waiting for the HBase 2.0 release, they can take CDH 5.4+ for a spin. Right? ~Anil PS: I don't work for Cloudera. On Sun, Aug 21, 2016 at 8:45 AM, Dima Spivak <dspi...@cloudera.com> wrote: > Hey Anil, > > No, you're totally right; CDH 5.4 shipped with MOB, but on an HBase based > on the upstream 1.0 release. I can tell you firsthand that the time and > effort undertaken at Cloudera and Intel to make it production-ready (and > convince ourselves of that through rigorous testing) was pretty > significant, so someone looking to "roll their own" based on an Apache > release is in for some long nights. > > On Sunday, August 21, 2016, anil gupta <anilgupt...@gmail.com> wrote: > > > Hi Dima, > > > > I was under impression that some CDH5.x GA release shipped MOB. Is that > > wrong? > > > > Thanks, > > Anil > > > > On Sat, Aug 20, 2016 at 10:48 PM, Dima Spivak <dspi...@cloudera.com> wrote: > > > > > Nope, you'd be in uncharted territory there, my friend, and definitely > > not > > > in a place that would be production-ready. Sorry to be the bearer of > bad > > > news :(. > > > > > > On Saturday, August 20, 2016, Ascot Moss <ascot.m...@gmail.com> wrote: > > > > > > > I have read HBASE-15370. We have to wait quite a while for HBase > 2.0, > > > > this is the reason why I want to try out MOB now in HBase 1.2.2 in > my > > > test > > > > environment, any steps and guide to do the backport? > > > > > > > > > > > > On Sun, Aug 21, 2016 at 12:44 PM, Dima Spivak <dspi...@cloudera.com> > > > > wrote: > > > > > > > > > Hi Ascot, > > > > > > > > > > MOB won't be backported into any pre-2.0 HBase branch. 
HBASE-15370 > > > > tracked > > > > > the effort and an email thread on the dev list ("[DISCUSS] Criteria > > for > > > > > including MOB feature backport in branch-1" started by Ted Yu on > > March > > > > 3rd > > > > > of this year) has additional rationale as to why that is. > > > > > > > > > > Cheers, > > > > > > > > > > On Saturday, August 20, 2016, Ascot Moss <ascot.m...@gmail.com > > <javascript:;> > > > > <javascript:;> > > > > > <javascript:_e(%7B%7D,'cvml','ascot.m...@gmail.com <javascript:;> > > <javascript:;>');>> > > > > wrote: > > > > > > > > > > > Hi, > > > > > > > > > > > > I want to use MOB in Hbase 1.2.2, can anyone advise the step to > > > > backport > > > > > > MOB to HBase 1.2.2? > > > > > > > > > > > > Regards > > > > > > > > > > > > > > > > > > > > > -- > > > > > -Dima > > > > > > > > > > > > > > > > > > -- > > > -Dima > > > > > > > > > > > -- > > Thanks & Regards, > > Anil Gupta > > > > > -- > -Dima > -- Thanks & Regards, Anil Gupta
Re: How to backport MOB to Hbase 1.2.2
Hi Dima, I was under impression that some CDH5.x GA release shipped MOB. Is that wrong? Thanks, Anil On Sat, Aug 20, 2016 at 10:48 PM, Dima Spivak <dspi...@cloudera.com> wrote: > Nope, you'd be in uncharted territory there, my friend, and definitely not > in a place that would be production-ready. Sorry to be the bearer of bad > news :(. > > On Saturday, August 20, 2016, Ascot Moss <ascot.m...@gmail.com> wrote: > > > I have read HBASE-15370. We have to wait quite a while for HBase 2.0, > > this is the reason why I want to try out MOB now in HBase 1.2.2 in my > test > > environment, any steps and guide to do the backport? > > > > > > On Sun, Aug 21, 2016 at 12:44 PM, Dima Spivak <dspi...@cloudera.com > > <javascript:;>> wrote: > > > > > Hi Ascot, > > > > > > MOB won't be backported into any pre-2.0 HBase branch. HBASE-15370 > > tracked > > > the effort and an email thread on the dev list ("[DISCUSS] Criteria for > > > including MOB feature backport in branch-1" started by Ted Yu on March > > 3rd > > > of this year) has additional rationale as to why that is. > > > > > > Cheers, > > > > > > On Saturday, August 20, 2016, Ascot Moss <ascot.m...@gmail.com > > <javascript:;> > > > <javascript:_e(%7B%7D,'cvml','ascot.m...@gmail.com <javascript:;>');>> > > wrote: > > > > > > > Hi, > > > > > > > > I want to use MOB in Hbase 1.2.2, can anyone advise the step to > > backport > > > > MOB to HBase 1.2.2? > > > > > > > > Regards > > > > > > > > > > > > > -- > > > -Dima > > > > > > > > -- > -Dima > -- Thanks & Regards, Anil Gupta
Re: Is it ok to store all integers as Strings instead of byte[] in hbase?
Hi Mahesha, I think it's not a good idea to store numbers/dates as strings. If you store numbers as strings, then you won't be able to do numeric/date comparisons. HBase is data-type agnostic. IMO, you will be better off using Apache Phoenix (http://phoenix.apache.org/). Phoenix is a SQL layer on top of HBase and it is ANSI SQL compliant. Currently Phoenix is officially supported by HDP and it is also present in Cloudera Labs. HTH, Anil Gupta On Fri, Jul 8, 2016 at 5:18 AM, Dima Spivak <dspi...@cloudera.com> wrote: > Hey Mahesha, > > It might be worthwhile to read through the architecture section of our ref > guide: https://hbase.apache.org/book.html#_architecture > > Cheers, > Dima > > On Friday, July 8, 2016, Mahesha999 <abnav...@gmail.com> wrote: > > > I am trying out some hbase code. I realised that when I insert data > through > > hbase shell using put command, then everything (both numeric and string) > is > > put as string: > > > > hbase(main):001:0> create 'employee', {NAME => 'f'} > > hbase(main):003:0> put 'employee', 'ganesh','f:age',30 > > hbase(main):004:0> put 'employee', 'ganesh','f:desg','mngr' > > hbase(main):005:0> scan 'employee' > > ROW COLUMN+CELL > > ganesh column=f:age, timestamp=1467926618738, value=30 > > ganesh column=f:desg, timestamp=1467926639557, value=mngr > > > > However when I put data using Java API, non-string stuff gets serialized > as > > byte[]: > > > > Cluster lNodes = new Cluster(); > > lNodes.add("digitate-VirtualBox:8090"); > > Client lClient= new Client(lNodes); > > RemoteHTable remoteht = new RemoteHTable(lClient, "employee"); > > > > Put lPut = new Put(Bytes.toBytes("mahesh")); > > lPut.add(Bytes.toBytes("f"), Bytes.toBytes("age"), Bytes.toBytes(25)); > > lPut.add(Bytes.toBytes("f"), Bytes.toBytes("desg"), > Bytes.toBytes("dev")); > > remoteht.put(lPut); > > > > Scan in hbase shell shows age 25 of mahesh is stored as \x00\x00\x00\x19: > > > > hbase(main):006:0> scan 'employee' > > ROW COLUMN+CELL > > ganesh column=f:age, 
timestamp=1467926618738, value=30 > > ganesh column=f:desg, timestamp=1467926639557, value=mngr > > mahesh column=f:age, timestamp=1467926707712, > > value=\x00\x00\x00\x19 > > mahesh column=f:desg, timestamp=1467926707712, value=dev > > > > *1.* Considering I will be storing only numeric and string data in hbase, > > what benefits it does provide to store numeric data as byte[] (as in case > > of > > above) or as string: > > lPut.add(Bytes.toBytes("f"), Bytes.toBytes("age"), Bytes.toBytes("25")); > > //instead of toBytes(25) > > > > *2.*Also why strings are stored as is and are not serialized to byte[] > even > > when put using Java API? > > > > > > > > -- > > View this message in context: > > > http://apache-hbase.679495.n3.nabble.com/Is-it-ok-to-store-all-integers-as-Strings-instead-of-byte-in-hbase-tp4081100.html > > Sent from the HBase User mailing list archive at Nabble.com. > > > -- Thanks & Regards, Anil Gupta
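The value=\x00\x00\x00\x19 in the scan above is simply the big-endian 4-byte encoding of 25. A standalone sketch (using java.nio rather than HBase's Bytes class, which produces the same layout for ints) contrasts the two representations and shows why string-encoded numbers break byte-wise numeric ordering:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Contrast the two ways the thread stores the age 25:
// as a 4-byte big-endian int vs. as the UTF-8 string "25".
class IntVsStringBytes {
    // Same layout HBase's Bytes.toBytes(int) produces: 4 bytes, big-endian.
    static byte[] intBytes(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    static byte[] stringBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Unsigned lexicographic compare, the order HBase sorts raw bytes in.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(intBytes(25)));      // [0, 0, 0, 25]
        System.out.println(Arrays.toString(stringBytes("25"))); // [50, 53]

        // Fixed-width int encoding preserves numeric order
        // (for non-negative values; negatives need the sign bit flipped)...
        System.out.println(compare(intBytes(25), intBytes(100)) < 0);           // true
        // ...while string encoding does not: "100" sorts before "25".
        System.out.println(compare(stringBytes("100"), stringBytes("25")) < 0); // true
    }
}
```

So the byte[] form buys correct byte-wise ordering (useful in row keys and for comparators/filters) and a fixed 4-byte width, while the string form is human-readable in the shell but variable-width and mis-ordered numerically.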
Re: Table/column layout
My 2 cents: #1. The HBase version timestamp is primarily used for storing & purging historical data on the basis of TTL. If you try to build an app that toys with timestamps, you might run into issues, so you need to be very careful with that. #2. HBase usually suggests keeping column names around 5-6 chars because HBase stores data as key-values. But it's hard to keep doing that in **real world apps**. When you use block encoding/compression, the performance penalty of wide columns is reduced. For example, Apache Phoenix uses FAST_DIFF encoding by default due to its non-short column names. Here is a blog post that discusses the performance of encoding/compression: http://hadoop-hbase.blogspot.com/2016/02/hbase-compression-vs-blockencoding_17.html I have been using user-friendly column names (more readable rather than short abbreviations) and I still get decent performance in my apps. (Obviously, YMMV. My apps are performing within our SLA.) In prod, I have a table that has 1100+ columns, and the column names are not short. Hence, I would recommend you go ahead with your non-short column naming. You might need to try out different encodings/compressions to see what gives you the best performance. HTH, Anil Gupta On Fri, Jun 10, 2016 at 8:16 PM, Ken Hampson <hamps...@gmail.com> wrote: > I realize that was probably a bit of a wall of text... =) > > So, TL;DR: I'm wondering: > 1) If people have used and had good experiences with caller-specified > version timestamps (esp. given the caveats in the HBase book doc re: issues > with deletions and TTLs). > > 2) About suggestions for optimal column naming for potentially large > numbers of different column groupings for very wide tables. > > Thanks, > - Ken > > On Tue, Jun 7, 2016 at 10:52 PM Ken Hampson <hamps...@gmail.com> wrote: > > > Hi: > > > > I'm currently using HBase 1.1.2 and am in the process of determining how > > best to proceed with the column layout for an upcoming expansion of our > > data pipeline. 
> > > > Background: > > > > Table A: billions of rows, 1.3 TB (with snappy compression), rowkey is > sha1 > > Table B: billions of rows (more than Table A), 1.8 TB (with snappy > > compression), rowkey is sha1 > > > > > > These tables represent data obtained via a combination batch/streaming > > process. We want to expand our data pipeline to run an assortment of > > analyses on these tables (both batch and streaming) and be able to store > > the results in each table as appropriate. Table A is a set of unique > > entries with some example data, whereas Table B is correlated to Table A > > (via Table A's sha1), but is not de-duplicated (that is to say, it > contains > > contextual data). > > > > For the expansion of the data pipeline, we want to store the data either > > in Table A if context is not needed, and Table B if context is needed. > > Since we have a theoretically unlimited number of different analyses that > > we may want to perform and store the results for (that is to say, I need > to > > assume there will be a substantial number of data sets that need to be > > stored in these tables, which will grow over time and could each > themselves > > potentially be somewhat wide in terms of columns). > > > > Originally, I had considered storing these in column families, where each > > analysis is grouped together in a different column family. However, I > have > > read in the HBase book documentation that HBase does not perform well > with > > many column families (a few default, ~10 max), so I have discarded this > > option. > > > > The next two options both involve using wide tables with many columns in > a > > separate column family (e.g. "d"), where all the various analysis would > be > > grouped into the same family in a potentially wide amount of columns in > > total. Each of these analyses needs to maintain their own versions so we > > can correlate the data from each one. 
The variants which come to mind to > > accomplish that, and on which I would appreciate some feedback on are: > > > >1. Use HBase's native versioning to store the version of the analysis > >2. Encode a version in the column name itself > > > > I know the HBase native versions use the server's timestamp by default, > > but can take any long value. So we could assign a particular time value > to > > be a version of a particular analysis. However, the doc also warned that > > there could be negative ramifications of this because HBase uses the > > versions internally for things like TTL for deletes/maintenance. Do > people > > use versions in this way? Are the TTL issues of great concern? (We li
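If option 2 above (encoding a version in the column name itself) is chosen, one detail worth settling up front is zero-padding the version so qualifiers sort in version order as raw bytes. A minimal sketch; the qualifier scheme, names, and widths here are hypothetical, not from this thread:

```java
// Hypothetical column-qualifier scheme for option 2: embed the analysis
// version in the qualifier, zero-padded so qualifiers sort in version order.
class VersionedQualifier {
    static String qualifier(String analysis, int version) {
        // 5-digit padding is an arbitrary illustrative width.
        return String.format("%s:v%05d", analysis, version);
    }

    public static void main(String[] args) {
        System.out.println(qualifier("sentiment", 3)); // sentiment:v00003
        // Without padding, "v10" would sort before "v9"; padded, it doesn't.
        System.out.println(qualifier("sentiment", 10).compareTo(
                           qualifier("sentiment", 9)) > 0); // true
    }
}
```

The padding keeps lexicographic (byte) order equal to numeric version order, which matters for column-range scans; the tradeoff versus native versioning is extra bytes per cell and no TTL/version-based cleanup.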
Re: [ANNOUNCE] PhoenixCon 2016 on Wed, May 25th 9am-1pm
Cool, Thanks. Let me send the talk proposal to higher management. On Wed, Apr 27, 2016 at 8:16 AM, James Taylor <jamestay...@apache.org> wrote: > Yes, that sounds great - please let me know when I can add you to the > agenda. > > James > > On Tuesday, April 26, 2016, Anil Gupta <anilgupt...@gmail.com> wrote: > > > Hi James, > > I spoke to my manager and he is fine with the idea of giving the talk. > > Now, he is gonna ask higher management for final approval. I am assuming > > there is still a slot for my talk in use case srction. I should go ahead > > with my approval process. Correct? > > > > Thanks, > > Anil Gupta > > Sent from my iPhone > > > > > On Apr 26, 2016, at 5:56 PM, James Taylor <jamestay...@apache.org > > <javascript:;>> wrote: > > > > > > We invite you to attend the inaugural PhoenixCon on Wed, May 25th > 9am-1pm > > > (the day after HBaseCon) hosted by Salesforce.com in San Francisco. > There > > > will be two tracks: one for use cases and one for internals. Drop me a > > note > > > if you're interested in giving a talk. To RSVP and for more details, > see > > > here[1]. > > > > > > Thanks, > > > James > > > > > > [1] > > http://www.meetup.com/SF-Bay-Area-Apache-Phoenix-Meetup/events/230545182 > > > -- Thanks & Regards, Anil Gupta
Re: [ANNOUNCE] PhoenixCon 2016 on Wed, May 25th 9am-1pm
Hi James, I spoke to my manager and he is fine with the idea of giving the talk. Now he is going to ask higher management for final approval. I am assuming there is still a slot for my talk in the use case section, so I should go ahead with my approval process. Correct? Thanks, Anil Gupta Sent from my iPhone > On Apr 26, 2016, at 5:56 PM, James Taylor <jamestay...@apache.org> wrote: > > We invite you to attend the inaugural PhoenixCon on Wed, May 25th 9am-1pm > (the day after HBaseCon) hosted by Salesforce.com in San Francisco. There > will be two tracks: one for use cases and one for internals. Drop me a note > if you're interested in giving a talk. To RSVP and for more details, see > here[1]. > > Thanks, > James > > [1] http://www.meetup.com/SF-Bay-Area-Apache-Phoenix-Meetup/events/230545182
Re: is it a good idea to disable tables not currently hot?
ncer will be confused when regions come and go. And I cannot > > > > afford not to have it running in case of region server crashes and > > > > come back. So doesn’t anyone have good ideas how to handle it? > > > > > > > > I already doing compact myself so that is not an issue. > > > > > > > > Another related question, if a region is enabled but not active > > > > read/write, how much resources it takes in terms of region server? > > > > > > > > Thanks! > > > > > > > > Frank Luo > > > > > > > > > > Merkle was named a leader in Customer Insights Services Providers by > > > Forrester Research < > > > http://www.merkleinc.com/who-we-are-customer-relationship-marketing- > > > ag > > > ency/awards-recognition/merkle-named-leader-forrester?utm_source=ema > > > il footer_medium=email_campaign=2016MonthlyEmployeeFooter > > > > > > > > > > Forrester Research report names 500friends, a Merkle Company, a > > > leader in customer Loyalty Solutions for Midsize Organizations< > > > http://www.merkleinc.com/who-we-are-customer-relationship-marketing- > > > ag > > > ency/awards-recognition/500friends-merkle-company-named?utm_source=e > > > ma ilfooter_medium=email_campaign=2016MonthlyEmployeeFooter > > > > > > > This email and any attachments transmitted with it are intended for > > > use by the intended recipient(s) only. If you have received this > > > email in error, please notify the sender immediately and then delete > > > it. If you are not the intended recipient, you must not keep, use, > > > disclose, copy or distribute this email without the author’s prior > permission. > > > We take precautions to minimize the risk of transmitting software > > > viruses, but we advise you to perform your own virus checks on any > > > attachment to this message. We cannot accept liability for any loss > > > or damage caused by software viruses. The information contained in > > > this communication may be confidential and may be subject to the > > attorney-client privilege. 
-- Thanks & Regards, Anil Gupta
Re: Spark on Hbase
Apart from the Phoenix Spark connector, you can also have a look at: https://github.com/Huawei-Spark/Spark-SQL-on-HBase On Wed, Mar 9, 2016 at 4:58 PM, Divya Gehlot <divya.htco...@gmail.com> wrote: > I agree with Talat > As couldn't connect directly with Hbase > Connecting it through Phoenix . > If you are using Hortonworks distribution ,it comes with Phoenix. > > Thanks, > Divya > On Mar 10, 2016 3:04 AM, "Talat Uyarer" <ta...@uyarer.com> wrote: > > > Hi, > > > > Have you ever tried Apache phoenix ? They have spark solution[1]. I > > have just started to use on spark. I haven't tried it with spark > > streaming. > > > > [1] http://phoenix.apache.org/phoenix_spark.html > > > > 2016-03-08 22:04 GMT-08:00 Rachana Srivastava > > <rachanasrivas...@yahoo.com.invalid>: > > > I am trying to integrate SparkStreaming with HBase. I am calling > > following APIs to connect to HBase > > > > > > HConnection hbaseConnection = > > HConnectionManager.createConnection(conf);hBaseTable = > > hbaseConnection.getTable(hbaseTable); > > > Since I cannot get the connection and broadcast the connection each API > > call to get data from HBase is very expensive. I tried using > > JavaHBaseContext (JavaHBaseContext hbaseContext = new > JavaHBaseContext(jsc, > > conf)) by using hbase-spark library in CDH 5.5 but I cannot import the > > library from maven. Has anyone been able to successfully resolve this > > issue. > > > > > > > > -- > > Talat UYARER > > Websitesi: http://talat.uyarer.com > > Twitter: http://twitter.com/talatuyarer > > Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304 > > > -- Thanks & Regards, Anil Gupta
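The expensive-connection problem in the quoted message (an HConnection created per call) is typically solved by creating one connection per JVM or executor and sharing it. Stripped of the HBase and Spark types, the pattern is a lazily created shared singleton; the sketch below stands in a plain Supplier for the real connection factory (names are illustrative, not an actual HBase or Spark API):

```java
import java.util.function.Supplier;

// Pattern sketch: create an expensive resource (e.g. an HBase connection)
// at most once per JVM and share it across calls, instead of per-request.
class SharedResource<T> {
    private final Supplier<T> factory;
    private volatile T instance;

    SharedResource(Supplier<T> factory) { this.factory = factory; }

    // Double-checked locking; `volatile` makes the publication safe.
    T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    instance = local = factory.get(); // created exactly once
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        int[] created = {0};
        SharedResource<String> conn =
                new SharedResource<>(() -> { created[0]++; return "connection"; });
        conn.get();
        conn.get();
        System.out.println(created[0]); // 1: the factory ran once, not per call
    }
}
```

In a Spark job the same idea is usually expressed as a static/lazy holder on each executor (e.g. initialized inside foreachPartition), so every task on that executor reuses one connection instead of opening its own.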
Re: Ruby gem of Apache Phoenix: https://rubygems.org/gems/ruby-phoenix/versions/0.0.8
My bad. That's the second time in a week I've used the wrong mailing list. Please ignore. On Tue, Mar 8, 2016 at 5:34 PM, Sean Busbey <bus...@cloudera.com> wrote: > Hi Anil! > > You should contact the Apache Phoenix community for this question. > > Details on subscribing to their user list can be found here: > > http://mail-archives.apache.org/mod_mbox/phoenix-user/ > > On Tue, Mar 8, 2016 at 4:54 PM, anil gupta <anilgupt...@gmail.com> wrote: > > > Hi, > > > > One of our ruby apps might be using this ruby gem( > > https://rubygems.org/gems/ruby-phoenix/versions/0.0.8) to query > Phoenix. I > > dont know programming in Ruby. > > This gem is listing Phoenix4.2 as dependency. We are running Phoenix4.4. > > So, i am curious to know whether we would be able to connect to > Phoenix4.4 > > with a ruby gem of Phoenix4.2? If not, then what we would need to > > do?(upgrade ruby gem to Phoenix4.4?) > > > > Here is the git: https://github.com/wxianfeng/ruby-phoenix > > -- > > Thanks & Regards, > > Anil Gupta > > > > > > -- > busbey > -- Thanks & Regards, Anil Gupta
Re: Database browser tools for Phoenix on Mac
Oh, my bad. I'm on the wrong mailing list and didn't notice my mistake. Thanks for the reminder, Stack. On Tue, Mar 8, 2016 at 5:10 PM, Stack <st...@duboce.net> wrote: > On Tue, Mar 8, 2016 at 4:57 PM, anil gupta <anilgupt...@gmail.com> wrote: > > > Yeah, i have looked at that. Non-commercial only provides very basic > > feature. > > I have just tried DBeaver(http://dbeaver.jkiss.org/download/). Its based > > on > > Eclipse framework and its UI looks much better. > > DBeaver supports Cassandra and MongoDB out of the box. It would be great > if > > it start supporting Phoenix out of the box. > > > > > You pinged the Phoenix phellows Anil? > St.Ack > > > > > On Sat, Mar 5, 2016 at 12:04 PM, Rohit Jain <rohit.j...@esgyn.com> > wrote: > > > > > You probably already looked at dbVisualizer > > > > > > Rohit > > > > > > On Mar 5, 2016, at 1:25 PM, anil gupta <anilgupt...@gmail.com> wrote: > > > > > > Hi, > > > > > > I have been using SquirrelSql to query Phoenix. For oracle/sql server, i > > > have been using SQLDeveloper. > > > I feel like SquirrelSql has a lot of room for improvement when i > compare > > it > > > with SQLDeveloper GUI. > > > > > > > > > I tried to register Phoenix JDBC driver with SQLDeveloper, but i > haven't > > > been successful. Has anyone being successful. > > > > > > I would like to know what other Database browser tools people are using > > to > > > connect. > > > > > > -- > > > Thanks & Regards, > > > Anil Gupta > > > > > > PS: I would prefer to use Database browser tools to query a database > that > > > itself has Apache License. :) > > > > > > > > > > > -- > > Thanks & Regards, > > Anil Gupta > > > -- Thanks & Regards, Anil Gupta
Re: Database browser tools for Phoenix on Mac
Yeah, I have looked at that. The non-commercial edition only provides very basic features. I have just tried DBeaver (http://dbeaver.jkiss.org/download/). It's based on the Eclipse framework and its UI looks much better. DBeaver supports Cassandra and MongoDB out of the box; it would be great if it started supporting Phoenix out of the box too. On Sat, Mar 5, 2016 at 12:04 PM, Rohit Jain <rohit.j...@esgyn.com> wrote: > You probably already looked at dbVisualizer > > Rohit > > On Mar 5, 2016, at 1:25 PM, anil gupta <anilgupt...@gmail.com> wrote: > > Hi, > > I have been using SquirrelSql to query Phoenix. For oracle/sql server, i > have been using SQLDeveloper. > I feel like SquirrelSql has a lot of room for improvement when i compare it > with SQLDeveloper GUI. > > > I tried to register Phoenix JDBC driver with SQLDeveloper, but i haven't > been successful. Has anyone being successful. > > I would like to know what other Database browser tools people are using to > connect. > > -- > Thanks & Regards, > Anil Gupta > > PS: I would prefer to use Database browser tools to query a database that > itself has Apache License. :) > -- Thanks & Regards, Anil Gupta
Ruby gem of Apache Phoenix: https://rubygems.org/gems/ruby-phoenix/versions/0.0.8
Hi, One of our Ruby apps might be using this ruby gem (https://rubygems.org/gems/ruby-phoenix/versions/0.0.8) to query Phoenix. I don't know Ruby programming. This gem lists Phoenix 4.2 as a dependency, but we are running Phoenix 4.4. So, I am curious whether we would be able to connect to Phoenix 4.4 with a ruby gem built against Phoenix 4.2. If not, what would we need to do (upgrade the ruby gem to Phoenix 4.4)? Here is the git repo: https://github.com/wxianfeng/ruby-phoenix -- Thanks & Regards, Anil Gupta
Database browser tools for Phoenix on Mac
Hi, I have been using SquirrelSQL to query Phoenix. For Oracle/SQL Server, I have been using SQLDeveloper. I feel SquirrelSQL has a lot of room for improvement when I compare its GUI with SQLDeveloper's. I tried to register the Phoenix JDBC driver with SQLDeveloper, but I haven't been successful. Has anyone been successful? I would like to know what other database browser tools people are using to connect. -- Thanks & Regards, Anil Gupta PS: I would prefer a database browser tool that itself has an Apache License. :)
Re: Calling Coprocessor via HBase Thrift or RestService
I also came across this: https://issues.apache.org/jira/browse/HBASE-6790, which is likewise unresolved. On Sun, Feb 28, 2016 at 10:26 PM, anil gupta <anilgupt...@gmail.com> wrote: > Hi, > > A non java app would like to use AggregateImplementation( > https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.html > ) > Is it possible to use HBase Thrift gateway or Stargate(Rest gateway) to > make calls to AggregateImplementation coprocessor? If yes, can you also > tell me how to make calls. > I came across this: https://issues.apache.org/jira/browse/HBASE-5600 . > But, its unresolved. > > -- > Thanks & Regards, > Anil Gupta > -- Thanks & Regards, Anil Gupta
Calling Coprocessor via HBase Thrift or RestService
Hi, A non-Java app would like to use AggregateImplementation (https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.html). Is it possible to use the HBase Thrift gateway or Stargate (the REST gateway) to make calls to the AggregateImplementation coprocessor? If yes, could you also tell me how to make the calls? I came across https://issues.apache.org/jira/browse/HBASE-5600, but it's unresolved. -- Thanks & Regards, Anil Gupta
Re: Two questions about the maximum number of versions of a column family
If it's possible to make the timestamp a suffix of your rowkey (assuming the rowkey is composite), then you will not run into read/write hotspots. Have a look at the OpenTSDB data model, which scales really well. Sent from my iPhone > On Feb 21, 2016, at 10:28 AM, Stephen Durfey wrote: > > I personally don't deal with time series data, so I'm not going to make a > statement on which is better. I would think from a scanning viewpoint putting > the time stamp in the row key is easier, but that will introduce scanning > performance bottlenecks due to the row keys being stored lexicographically. > All data from the same date range will end up in the same region or regions > (this is causes hot spots) reducing the number of tasks you get for reads, > thus increasing extraction time. > One method to deal with this is salting your row keys to get an even > distribution of data around the cluster. Cloudera recently had a good post > about this on their blog: > http://blog.cloudera.com/blog/2015/06/how-to-scan-salted-apache-hbase-tables-with-region-specific-key-ranges-in-mapreduce/ > > On Sun, Feb 21, 2016 at 9:47 AM -0800, "Daniel" wrote: > > Thanks for your sharing, Stephen and Ted. The reference guide recommends > "rows" over "versions" concerning time series data. Are there advantages of > using "reversed timestamps" in row keys over the built-in "versions" with > regard to scanning performance? > > -- Original -- > From: "Ted Yu" > Date: Mon, Feb 22, 2016 01:02 AM > To: "user@hbase.apache.org"; > Subject: Re: Two questions about the maximum number of versions of a column > family > > > Thanks for sharing, Stephen. > > bq. scan performance on the region servers needing to scan over all that > data you may not need > > When number of versions is large, try to utilize Filters (where > appropriate) which implements: > > public Cell getNextCellHint(Cell currentKV) { > > See MultiRowRangeFilter for example. 
> > > Please see hbase-shell/src/main/ruby/shell/commands/alter.rb for syntax on > how to alter table. When "hbase.online.schema.update.enable" is true, the table > can stay online during the change. > > Cheers > >> On Sun, Feb 21, 2016 at 8:20 AM, Stephen Durfey wrote: >> >> Someone please correct me if I am wrong. >> I've looked into this recently due to some performance reasons with my >> tables in a production environment. Like the book says, I don't recommend >> keeping this many versions around unless you really need them. Telling >> HBase to keep around a very large number doesn't waste space; that's just a >> value in the table descriptor. So, I wouldn't worry about that. The >> problems are going to come in when you actually write out those versions. >> My tables currently have max_versions set and roughly 40% of the tables >> are due to historical versions. So, one table in particular is around 25 >> TB. I don't have a need to keep this many versions, so I am working on >> changing the max versions to the default of 3 (some cells are hundreds or >> thousands of cells deep). The issue you'll run into is scan performance on >> the region servers needing to scan over all that data you may not need (due >> to large store files). This could lead to increased scan time and >> potentially scanner timeouts, depending upon how large your batch size is >> set on the scan. >> I assume this has some performance impact on compactions, both minor and >> major, but I didn't investigate that, and potentially on the write path, >> but also not something I looked into. >> Changing the number of versions after the table has been created doesn't >> have a performance impact due to just being a metadata change. The table >> will need to be disabled, changed, and re-enabled again. If this is done >> through a script the table could be offline for a couple of seconds. The >> only concern around that are users of the table. 
If they have scheduled job >> runs that hit that table that would break if they try to read from it while >> the table is disabled. The only performance impact I can think of around >> this change would be major compaction of the table, but even that shouldn't >> be an issue. >> >> >>_ >> From: Daniel >> Sent: Sunday, February 21, 2016 9:22 AM >> Subject: Two questions about the maximum number of versions of a column >> family >> To: user >> >> >> Hi, I have two questions about the maximum number of versions of a column >> family: >> >> (1) Is it OK to set a very large (>100,000) maximum number of versions for >> a column family? >> >> The reference guide says "It is not recommended setting the number of max >> versions to an exceedingly high level (e.g., hundreds or more) unless those >> old values are very dear to you because this will greatly increase >> StoreFile size." (Chapter 36.1) >> >> I'm new to the Hadoop
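The salting approach described above can be sketched in plain Java. This is a hypothetical helper, not HBase API: the bucket count, key format, and class name are assumptions for illustration; in practice the bucket count should match your number of pre-split regions.

```java
// Sketch of row-key salting: prefix each key with a bucket id derived from a
// hash of the key, so rows from the same time range spread across N regions
// instead of hot-spotting on one.
public class SaltedKey {
    public static int saltFor(String rowKey, int buckets) {
        // Mask the sign bit rather than Math.abs, which overflows for
        // Integer.MIN_VALUE hash codes.
        return (rowKey.hashCode() & 0x7fffffff) % buckets;
    }

    // Zero-padded salt prefix keeps lexicographic ordering within a bucket.
    public static String salted(String rowKey, int buckets) {
        return String.format("%02d-%s", saltFor(rowKey, buckets), rowKey);
    }
}
```

The trade-off, as the Cloudera post linked above explains, is that a read of a key range now requires one scan per bucket, each with region-specific start/stop keys.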
Re: Rename tables or swap alias
I don't think there are any atomic operations in HBase to support DDL across two tables. But maybe you can use HBase snapshots: 1. Create an HBase snapshot. 2. Truncate the table. 3. Write data to the table. 4. Create a table from the snapshot taken in step #1 as table_old. Now you have two tables: one with the current run's data and the other with the last run's data. I think the above process will suffice, but keep in mind that it is not atomic. HTH, Anil Sent from my iPhone > On Feb 15, 2016, at 4:25 PM, Pat Ferrel wrote: > > Any other way to do what I was asking. With Spark this is a very normal thing > to treat a table as immutable and create another to replace the old. > > Can you lock two tables and rename them in 2 actions then unlock in a very > short period of time? > > Or an alias for table names? > > Didn’t see these in any docs or Googling, any help is appreciated. Writing > all this data back to the original table would be a huge load on a table > being written to by external processes and therefore under large load to > begin with. > >> On Feb 14, 2016, at 5:03 PM, Ted Yu wrote: >> >> There is currently no native support for renaming two tables in one atomic >> action. >> >> FYI >> >>> On Sun, Feb 14, 2016 at 4:18 PM, Pat Ferrel wrote: >>> >>> I use Spark to take an old table, clean it up to create an RDD of cleaned >>> data. What I’d like to do is write all of the data to a new table in HBase, >>> then rename the table to the old name. If possible it could be done by >>> changing an alias to point to the new table as long as all external code >>> uses the alias, or by a 2 table rename operation. But I don’t see how to do >>> this for HBase. I am dealing with a lot of data so don’t want to do table >>> modifications with deletes and upserts, this would be incredibly slow. >>> Furthermore I don’t want to disable the table for more than a tiny span of >>> time. 
>>> >>> Is it possible to have 2 tables and rename both in an atomic action, or >>> change some alias to point to the new table in an atomic action. If not >>> what is the quickest way to achieve this to minimize time disabled. >
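The four snapshot steps above can be sketched as an hbase shell session. This is an untested sketch: the table name 'events' and the snapshot name are hypothetical, and as noted the sequence is not atomic, so readers can observe the table mid-swap.

```
snapshot 'events', 'events-snap'            # 1. snapshot the current table
truncate 'events'                           # 2. truncate (disables, drops, recreates)
# 3. write the new run's data into 'events'
clone_snapshot 'events-snap', 'events_old'  # 4. last run's data becomes a new table
```

Dropping the snapshot afterwards (delete_snapshot) reclaims its metadata once 'events_old' no longer needs it.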
Re: org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish
I figured out the problem. We have phoenix.upsert.batch.size set to 10 in hbase-site.xml, but somehow that property is **not getting picked up in our oozie workflow**. When I explicitly set the phoenix.upsert.batch.size property in my oozie workflow, my job ran successfully. By default, phoenix.upsert.batch.size is 1000. Hence, the commits were failing with a huge batch size of 1000. Thanks, Anil Gupta On Sun, Feb 14, 2016 at 8:03 PM, Heng Chen <heng.chen.1...@gmail.com> wrote: > I am not sure whether "upsert batch size in phoenix" equals HBase Client > batch puts size or not. > > But as log shows, it seems there are 2000 actions send to hbase one time. > > 2016-02-15 11:38 GMT+08:00 anil gupta <anilgupt...@gmail.com>: > >> My phoenix upsert batch size is 50. You mean to say that 50 is also a lot? >> >> However, AsyncProcess is complaining about 2000 actions. >> >> I tried with upsert batch size of 5 also. But it didnt help. >> >> On Sun, Feb 14, 2016 at 7:37 PM, anil gupta <anilgupt...@gmail.com> >> wrote: >> >> > My phoenix upsert batch size is 50. You mean to say that 50 is also a >> lot? >> > >> > However, AsyncProcess is complaining about 2000 actions. >> > >> > I tried with upsert batch size of 5 also. But it didnt help. >> > >> > >> > On Sun, Feb 14, 2016 at 6:43 PM, Heng Chen <heng.chen.1...@gmail.com> >> > wrote: >> > >> >> 2016-02-14 12:34:23,593 INFO [main] >> >> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> >> actions to finish >> >> >> >> It means your writes are too many, please decrease the batch size of >> your >> >> puts, and balance your requests on each RS. >> >> >> >> 2016-02-15 4:53 GMT+08:00 anil gupta <anilgupt...@gmail.com>: >> >> >> >> > After a while we also get this error: >> >> > 2016-02-14 12:45:10,515 WARN [main] >> >> > org.apache.phoenix.execute.MutationState: Swallowing exception and >> >> > retrying after clearing meta cache on connection. 
>> >> > java.sql.SQLException: ERROR 2008 (INT10): Unable to find cached >> index >> >> > metadata. ERROR 2008 (INT10): ERROR 2008 (INT10): Unable to find >> >> > cached index metadata. key=-594230549321118802 >> >> > region=BI.SALES,,1455470578449.44e39179789041b5a8c03316730260e7. >> Index >> >> > update failed >> >> > >> >> > We have already set: >> >> > >> >> > >> >> >> phoenix.coprocessor.maxServerCacheTimeToLiveMs18 >> >> > >> >> > Upset batch size is 50. Write are quite frequent so the cache would >> >> > not timeout in 18ms >> >> > >> >> > >> >> > On Sun, Feb 14, 2016 at 12:44 PM, anil gupta <anilgupt...@gmail.com> >> >> > wrote: >> >> > >> >> > > Hi, >> >> > > >> >> > > We are using phoenix4.4, hbase 1.1(hdp2.3.4). >> >> > > I have a MR job that is using PhoenixOutputFormat. My job keeps on >> >> > failing >> >> > > due to following error: >> >> > > >> >> > > 2016-02-14 12:29:43,182 INFO [main] >> >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> >> actions >> >> > to finish >> >> > > 2016-02-14 12:29:53,197 INFO [main] >> >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> >> actions >> >> > to finish >> >> > > 2016-02-14 12:30:03,212 INFO [main] >> >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> >> actions >> >> > to finish >> >> > > 2016-02-14 12:30:13,225 INFO [main] >> >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> >> actions >> >> > to finish >> >> > > 2016-02-14 12:30:23,239 INFO [main] >> >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> >> actions >> >> > to finish >> >> > > 2016-02-14 12:30:33,253 INFO [main] >> >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> >> actions >> >> > to finish >> >> > > 2016-02-14 12:30:43,266 INFO [main] >> >> > org.apache.had
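The batch-size mechanics in this thread can be illustrated with a minimal, HBase-free buffer in plain Java: mutations accumulate until the batch size is hit and are then handed off in one submission, which is (loosely) why a large phoenix.upsert.batch.size leaves the client waiting on a correspondingly large number of in-flight actions. The class and the writer callback are hypothetical, for illustration only; they do not model Phoenix internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of commit batching: rows accumulate until batchSize is reached,
// then the whole batch goes to the writer in one submission.
public class UpsertBuffer {
    private final int batchSize;
    private final Consumer<List<String>> writer;
    private final List<String> pending = new ArrayList<>();
    public int flushes = 0;

    public UpsertBuffer(int batchSize, Consumer<List<String>> writer) {
        this.batchSize = batchSize;
        this.writer = writer;
    }

    public void upsert(String row) {
        pending.add(row);
        if (pending.size() >= batchSize) flush();
    }

    public void flush() {
        if (pending.isEmpty()) return;
        writer.accept(new ArrayList<>(pending)); // one submission per batch
        pending.clear();
        flushes++;
    }
}
```

With batchSize 1000 a single flush hands the underlying client 1000 mutations to wait on at once; with batchSize 10 each submission stays small, matching the fix Anil describes above.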
Re: org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish
My phoenix upsert batch size is 50. You mean to say that 50 is also a lot? However, AsyncProcess is complaining about 2000 actions. I tried with upsert batch size of 5 also. But it didnt help. On Sun, Feb 14, 2016 at 7:37 PM, anil gupta <anilgupt...@gmail.com> wrote: > My phoenix upsert batch size is 50. You mean to say that 50 is also a lot? > > However, AsyncProcess is complaining about 2000 actions. > > I tried with upsert batch size of 5 also. But it didnt help. > > > On Sun, Feb 14, 2016 at 6:43 PM, Heng Chen <heng.chen.1...@gmail.com> > wrote: > >> 2016-02-14 12:34:23,593 INFO [main] >> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> actions to finish >> >> It means your writes are too many, please decrease the batch size of your >> puts, and balance your requests on each RS. >> >> 2016-02-15 4:53 GMT+08:00 anil gupta <anilgupt...@gmail.com>: >> >> > After a while we also get this error: >> > 2016-02-14 12:45:10,515 WARN [main] >> > org.apache.phoenix.execute.MutationState: Swallowing exception and >> > retrying after clearing meta cache on connection. >> > java.sql.SQLException: ERROR 2008 (INT10): Unable to find cached index >> > metadata. ERROR 2008 (INT10): ERROR 2008 (INT10): Unable to find >> > cached index metadata. key=-594230549321118802 >> > region=BI.SALES,,1455470578449.44e39179789041b5a8c03316730260e7. Index >> > update failed >> > >> > We have already set: >> > >> > >> phoenix.coprocessor.maxServerCacheTimeToLiveMs18 >> > >> > Upset batch size is 50. Write are quite frequent so the cache would >> > not timeout in 18ms >> > >> > >> > On Sun, Feb 14, 2016 at 12:44 PM, anil gupta <anilgupt...@gmail.com> >> > wrote: >> > >> > > Hi, >> > > >> > > We are using phoenix4.4, hbase 1.1(hdp2.3.4). >> > > I have a MR job that is using PhoenixOutputFormat. 
My job keeps on >> > failing >> > > due to following error: >> > > >> > > 2016-02-14 12:29:43,182 INFO [main] >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> actions >> > to finish >> > > 2016-02-14 12:29:53,197 INFO [main] >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> actions >> > to finish >> > > 2016-02-14 12:30:03,212 INFO [main] >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> actions >> > to finish >> > > 2016-02-14 12:30:13,225 INFO [main] >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> actions >> > to finish >> > > 2016-02-14 12:30:23,239 INFO [main] >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> actions >> > to finish >> > > 2016-02-14 12:30:33,253 INFO [main] >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> actions >> > to finish >> > > 2016-02-14 12:30:43,266 INFO [main] >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> actions >> > to finish >> > > 2016-02-14 12:30:53,279 INFO [main] >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> actions >> > to finish >> > > 2016-02-14 12:31:03,293 INFO [main] >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> actions >> > to finish >> > > 2016-02-14 12:31:13,305 INFO [main] >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> actions >> > to finish >> > > 2016-02-14 12:31:23,318 INFO [main] >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> actions >> > to finish >> > > 2016-02-14 12:31:33,331 INFO [main] >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> actions >> > to finish >> > > 2016-02-14 12:31:43,345 INFO [main] >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> actions >> > to finish >> > > 2016-02-14 12:31:53,358 INFO [main] >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 
2000 >> actions >> > to finish >> > > 2016-02-14 12:32:03,371 INFO [main] >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> actions >> > to finish >> > > 2016-02-14 12:32:13,385 INFO [main] >> > org.apa
Re: org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish
My phoenix upsert batch size is 50. You mean to say that 50 is also a lot? However, AsyncProcess is complaining about 2000 actions. I tried with upsert batch size of 5 also. But it didnt help. On Sun, Feb 14, 2016 at 6:43 PM, Heng Chen <heng.chen.1...@gmail.com> wrote: > 2016-02-14 12:34:23,593 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions to finish > > It means your writes are too many, please decrease the batch size of your > puts, and balance your requests on each RS. > > 2016-02-15 4:53 GMT+08:00 anil gupta <anilgupt...@gmail.com>: > > > After a while we also get this error: > > 2016-02-14 12:45:10,515 WARN [main] > > org.apache.phoenix.execute.MutationState: Swallowing exception and > > retrying after clearing meta cache on connection. > > java.sql.SQLException: ERROR 2008 (INT10): Unable to find cached index > > metadata. ERROR 2008 (INT10): ERROR 2008 (INT10): Unable to find > > cached index metadata. key=-594230549321118802 > > region=BI.SALES,,1455470578449.44e39179789041b5a8c03316730260e7. Index > > update failed > > > > We have already set: > > > > > phoenix.coprocessor.maxServerCacheTimeToLiveMs18 > > > > Upset batch size is 50. Write are quite frequent so the cache would > > not timeout in 18ms > > > > > > On Sun, Feb 14, 2016 at 12:44 PM, anil gupta <anilgupt...@gmail.com> > > wrote: > > > > > Hi, > > > > > > We are using phoenix4.4, hbase 1.1(hdp2.3.4). > > > I have a MR job that is using PhoenixOutputFormat. 
My job keeps on > > failing > > > due to following error: > > > > > > 2016-02-14 12:29:43,182 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:29:53,197 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:30:03,212 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:30:13,225 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:30:23,239 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:30:33,253 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:30:43,266 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:30:53,279 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:31:03,293 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:31:13,305 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:31:23,318 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:31:33,331 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:31:43,345 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:31:53,358 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:32:03,371 
INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:32:13,385 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:32:23,399 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:32:33,412 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:32:43,428 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:32:53,443 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > >
org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish
Hi, We are using phoenix4.4, hbase 1.1(hdp2.3.4). I have a MR job that is using PhoenixOutputFormat. My job keeps on failing due to following error: 2016-02-14 12:29:43,182 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:29:53,197 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:30:03,212 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:30:13,225 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:30:23,239 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:30:33,253 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:30:43,266 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:30:53,279 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:31:03,293 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:31:13,305 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:31:23,318 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:31:33,331 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:31:43,345 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:31:53,358 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:32:03,371 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:32:13,385 INFO [main] 
org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:32:23,399 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:32:33,412 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:32:43,428 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:32:53,443 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:33:03,457 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:33:13,472 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:33:23,486 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:33:33,524 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:33:43,538 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:33:53,551 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:34:03,565 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:34:03,953 INFO [hconnection-0xe82ca6e-shared--pool2-t16] org.apache.hadoop.hbase.client.AsyncProcess: #1, table=BI.SALES, attempt=10/35 failed=2000ops, last exception: null on hdp3.truecar.com,16020,1455326291512, tracking started null, retrying after=10086ms, replay=2000ops 2016-02-14 12:34:13,578 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:34:23,593 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish I have never seen anything like this. 
Can anyone give me pointers about this problem? -- Thanks & Regards, Anil Gupta
Re: org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish
After a while we also get this error: 2016-02-14 12:45:10,515 WARN [main] org.apache.phoenix.execute.MutationState: Swallowing exception and retrying after clearing meta cache on connection. java.sql.SQLException: ERROR 2008 (INT10): Unable to find cached index metadata. ERROR 2008 (INT10): ERROR 2008 (INT10): Unable to find cached index metadata. key=-594230549321118802 region=BI.SALES,,1455470578449.44e39179789041b5a8c03316730260e7. Index update failed We have already set: phoenix.coprocessor.maxServerCacheTimeToLiveMs18 Upsert batch size is 50. Writes are quite frequent, so the cache would not time out in 18ms On Sun, Feb 14, 2016 at 12:44 PM, anil gupta <anilgupt...@gmail.com> wrote: > Hi, > > We are using phoenix4.4, hbase 1.1(hdp2.3.4). > I have a MR job that is using PhoenixOutputFormat. My job keeps on failing > due to following error: > > 2016-02-14 12:29:43,182 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:29:53,197 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:30:03,212 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:30:13,225 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:30:23,239 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:30:33,253 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:30:43,266 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:30:53,279 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:31:03,293 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 
2016-02-14 12:31:13,305 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:31:23,318 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:31:33,331 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:31:43,345 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:31:53,358 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:32:03,371 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:32:13,385 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:32:23,399 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:32:33,412 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:32:43,428 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:32:53,443 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:33:03,457 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:33:13,472 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:33:23,486 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:33:33,524 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:33:43,538 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 
actions to > finish > 2016-02-14 12:33:53,551 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:34:03,565 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:34:03,953 INFO [hconnection-0xe82ca6e-shared--pool2-t16] > org.apache.hadoop.hbase.client.AsyncProcess: #1, table=BI.SALES, > attempt=10/35 failed=2000ops, last exception: null on > hdp3.truecar.com,16020,1455326291512, tracking started null, retrying > after=10086ms, replay=2000ops > 2016-02-14 12:34:13,578 INFO [main] > org.
Re: Java API vs Hbase Thrift
You are not gonna gain much by using the REST service of HBase. You need to use the native Java API of HBase to gain performance. Similar to Thrift, the REST service also has an extra hop. Sent from my iPhone > On Jan 21, 2016, at 1:03 AM, Rajeshkumar J <rajeshkumarit8...@gmail.com> > wrote: > > Hi, > > As you all said I have tried Rest web service using Hbase Java API to > get data from Hbase table but it seems to be slower than that of one using > Hbase thrift server. > > can any one tell how ? > > Thanks > >> On Sat, Jan 16, 2016 at 5:41 PM, Zheng Shen <zhengshe...@outlook.com> wrote: >> >> Java API is at least 10 times faster than thrift on Hbase write >> operations based on my experience in production environment (cloudera >> 5.4.7, hbase 1.0.0) >> >> Zheng >> >> ---Original--- >> From: "Vladimir Rodionov "<vladrodio...@gmail.com> >> Date: 2016/1/15 06:31:34 >> To: "user@hbase.apache.org"<user@hbase.apache.org>; >> Subject: Re: Java API vs Hbase Thrift >> >> >>>> I have to access hbase using Java API will it be fast like thrift. >> >> Bear in mind that when you use Thrift Gateway/Thrift API you access HBase >> RegionServer through the single gateway server, >> when you use Java API - you access Region Server directly. >> Java API is much more scalable. >> >> -Vlad >> >>> On Tue, Jan 12, 2016 at 7:36 AM, Anil Gupta <anilgupt...@gmail.com> wrote: >>> >>> Java api should be same or better in performance as compared to Thrift >> api. >>> With Thrift api there is an extra hop. So, most of the time java api >> would >>> be better for performance. >>> >>> Sent from my iPhone >>> >>>> On Jan 12, 2016, at 4:29 AM, Rajeshkumar J < >> rajeshkumarit8...@gmail.com> >>> wrote: >>>> >>>> Hi, >>>> >>>> I am currently accessing records via Hbase thrift server and it is >> fast. >>>> If I have to access hbase using Java API will it be fast like thrift. >>>> >>>> Thanks >> >>
Re: Run hbase shell script from java
Hey Serega, Have you tried using the Java API of HBase to create the table? IMO, invoking a shell script from a Java program to create a table might not be the most elegant way. Have a look at https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html HTH, Anil Gupta On Wed, Jan 13, 2016 at 1:30 PM, Serega Sheypak <serega.shey...@gmail.com> wrote: > Hi, is there any easy way/example/howto to run 'create table' shell script > from java? > Usecase: I'm tired to write table DDL in shell script and in Java for > integration testing. I want to run shell script table DDL from java. > Thanks! > -- Thanks & Regards, Anil Gupta
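A minimal sketch of the Java-API route suggested above, using the HBase 1.x client (this requires the hbase-client dependency and a running cluster, so it is shown as a sketch rather than something runnable here; the table name 't1' and family 'cf1' are hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

// Roughly the DDL equivalent of `create 't1', 'cf1'` in the hbase shell.
public class CreateTableExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("t1"));
            desc.addFamily(new HColumnDescriptor("cf1"));
            if (!admin.tableExists(desc.getTableName())) {
                admin.createTable(desc);
            }
        }
    }
}
```

For integration testing this keeps the DDL in one place (Java) instead of duplicating it between a shell script and test code, which is the pain point Serega describes.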
Re: Java API vs Hbase Thrift
The Java API should be the same or better in performance compared to the Thrift API. With the Thrift API there is an extra hop, so most of the time the Java API will perform better. Sent from my iPhone > On Jan 12, 2016, at 4:29 AM, Rajeshkumar J wrote: > > Hi, > > I am currently accessing records via Hbase thrift server and it is fast. > If I have to access hbase using Java API will it be fast like thrift. > > Thanks
Re: Type of Scan to be used for real time analysis
Hi RajeshKumar, IMO, the type of scan is not decided on the basis of response time. It's decided on the basis of your query logic and data model. Also, response time cannot be directly correlated to any filter or scan. Response time is more about how much data needs to be read, CPU, network IO, etc. to satisfy your query. So, you will need to look at your data model and pick the best query. HTH, Anil On Thu, Dec 17, 2015 at 10:17 PM, Rajeshkumar J <rajeshkumarit8...@gmail.com > wrote: > Hi, > >My hbase table holds 10 million rows and I need to query it and I want > hbase to return the query within one or two seconds. Help me to choose > which type of scan do I have to use for this - range scan or rowfilter scan > > Thanks > -- Thanks & Regards, Anil Gupta
Re: Type of Scan to be used for real time analysis
If you know the exact rowkey of the row you need to fetch, then you just need to use a GET. If you know just the prefix of the rowkey, then you can use range scans in HBase. Do the above two scenarios cover your use case? On Fri, Dec 18, 2015 at 4:29 AM, Rajeshkumar J <rajeshkumarit8...@gmail.com> wrote: > Hi Anil, > >I have about 10 million rows with each rows having more than 10k > columns. I need to query this table based on row key and which will be the > apt query process for this > > Thanks > > On Fri, Dec 18, 2015 at 5:43 PM, anil gupta <anilgupt...@gmail.com> wrote: > > > Hi RajeshKumar, > > > > IMO, type of scan is not decided on the basis of response time. Its > decided > > on the basis of your query logic and data model. > > Also, Response time cannot be directly correlated to any filter or scan. > > Response time is more about how much data needs to read, cpu, network IO, > > etc to suffice requirement of your query. > > So, you will need to look at your data model and pick the best query. > > > > HTH, > > Anil > > > > On Thu, Dec 17, 2015 at 10:17 PM, Rajeshkumar J < > > rajeshkumarit8...@gmail.com > > > wrote: > > > > > Hi, > > > > > >My hbase table holds 10 million rows and I need to query it and I > want > > > hbase to return the query within one or two seconds. Help me to choose > > > which type of scan do I have to use for this - range scan or rowfilter > > scan > > > > > > Thanks > > > > > > > > > > > -- > > Thanks & Regards, > > Anil Gupta > > > -- Thanks & Regards, Anil Gupta
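For the prefix case, the exclusive stop row for the range scan is usually derived by incrementing the last non-0xFF byte of the prefix. A plain-Java sketch (no HBase dependency; HBase ships a similar helper, but this standalone version is for illustration):

```java
import java.util.Arrays;

// Computes the exclusive stop row for a prefix scan: the smallest byte
// array that sorts after every key starting with `prefix`. Trailing 0xFF
// bytes cannot be incremented, so they are dropped and the next byte up
// is incremented instead.
public class PrefixRange {
    public static byte[] stopRowFor(byte[] prefix) {
        byte[] stop = Arrays.copyOf(prefix, prefix.length);
        for (int i = stop.length - 1; i >= 0; i--) {
            if (stop[i] != (byte) 0xFF) {
                stop[i]++;
                return Arrays.copyOf(stop, i + 1); // drop trailing 0xFF bytes
            }
        }
        return new byte[0]; // all 0xFF: scan to the end of the table
    }
}
```

A scan with start row = prefix and stop row = stopRowFor(prefix) then returns exactly the rows sharing that prefix.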
Replicating only One column column family of HBase table
Hi, We have a requirement in which we want to replicate only one CF of a table, whereas that table has 2 CFs. I believe it's possible because replication_scope is set at the CF level (in my case, I'll set replication_scope=1 on only one CF). Unfortunately, I don't have access to infrastructure to test this hypothesis, so I would like to confirm it on the mailing list. Please let me know. -- Thanks & Regards, Anil Gupta
Re: Replicating only One column column family of HBase table
Hi Ted, So, as per the JIRA, the answer to my question is YES. We are running HDP 2.3.0. That JIRA was fixed in 0.98.1, so we should be fine. Thanks, Anil Gupta On Thu, Oct 29, 2015 at 12:27 PM, Ted Yu <yuzhih...@gmail.com> wrote: > Please take a look at: > https://issues.apache.org/jira/browse/HBASE-8751 > > On Thu, Oct 29, 2015 at 11:33 AM, anil gupta <anilgupt...@gmail.com> > wrote: > > > Hi, > > > > We have a requirement in which we want to replicate only one CF of a > table > > whereas that table has 2 CF. > > > > I believe, its possible because replication_scope is set on CF level(in > my > > case, i'll set replication_scope=1 on only one CF). Unfortunately, i dont > > have access to infrastructure to test this hypothesis. > > So, i would like to confirm this on mailing list. Please let me know. > > > > -- > > Thanks & Regards, > > Anil Gupta > > > -- Thanks & Regards, Anil Gupta
Re: Replicating only One column column family of HBase table
Update: We tried and it worked. On Thu, Oct 29, 2015 at 1:24 PM, anil gupta <anilgupt...@gmail.com> wrote: > Hi Ted, > > So, as per the jira, answer to my question is YES. > We are running HDP2.3.0. That jira got fixed in 0.98.1. So, we should be > fine. > > Thanks, > Anil Gupta > > On Thu, Oct 29, 2015 at 12:27 PM, Ted Yu <yuzhih...@gmail.com> wrote: > >> Please take a look at: >> https://issues.apache.org/jira/browse/HBASE-8751 >> >> On Thu, Oct 29, 2015 at 11:33 AM, anil gupta <anilgupt...@gmail.com> >> wrote: >> >> > Hi, >> > >> > We have a requirement in which we want to replicate only one CF of a >> table >> > whereas that table has 2 CF. >> > >> > I believe, its possible because replication_scope is set on CF level(in >> my >> > case, i'll set replication_scope=1 on only one CF). Unfortunately, i >> dont >> > have access to infrastructure to test this hypothesis. >> > So, i would like to confirm this on mailing list. Please let me know. >> > >> > -- >> > Thanks & Regards, >> > Anil Gupta >> > >> > > > > -- > Thanks & Regards, > Anil Gupta > -- Thanks & Regards, Anil Gupta
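For anyone finding this thread later, a sketch of the setup that was confirmed to work — setting REPLICATION_SCOPE on just one of the two CFs from the HBase shell (table and CF names are illustrative; on some 0.98 deployments the disable/enable around alter may not be required if online schema updates are enabled):

```
disable 'mytable'
alter 'mytable', {NAME => 'cf1', REPLICATION_SCOPE => 1}   # this CF is replicated
alter 'mytable', {NAME => 'cf2', REPLICATION_SCOPE => 0}   # this CF is not (the default)
enable 'mytable'
describe 'mytable'   # verify the scopes on both CFs
```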
Re: Opinions wanted: new site skin
Hi, The sample website does not look good on an iPhone 6. Its content is unreadable since the page layout is not using the width of the iPhone screen. Thanks, Anil On Tue, Oct 27, 2015 at 6:29 PM, Misty Stanley-Jones < mstanleyjo...@cloudera.com> wrote: > If you looked right away, please look again. I didn't realize that a weird > font was being used from Google Fonts, because it was not loading locally > for me. That's been fixed now and a more normal readable font (in my > opinion) is being used. > > On Wed, Oct 28, 2015 at 10:03 AM, Misty Stanley-Jones < > mstanleyjo...@cloudera.com> wrote: > > > All, > > > > Here is another version for your consideration. Please check it out at > > different resolutions and browser sizes if you can. > > http://mstanleyjones.github.io/hbase/reflow_update/index.html > > > > If you go to > > http://mstanleyjones.github.io/hbase/reflow_update/dependency-info.html > > and a few other parts of the site, you will notice the built-in syntax > > highlighting. > > > > This version does not have a site search, and I have no clue how to add > > the Hadoop site search, Stack. Maybe that can be a phase 2 where someone > > smarter can help me figure it out. > > > > Thanks for your help, > > Misty > > > > On Fri, Oct 23, 2015 at 3:17 PM, Misty Stanley-Jones < > > mstanleyjo...@cloudera.com> wrote: > > > >> Hi all, > >> > >> We are currently using the reFlow Maven site skin. I went looking around > >> and found Fluido, which seems to be a bit more extensible. I built and > >> staged a version of the site at > >> http://mstanleyjones.github.io/hbase/index.html. Note the Github ribbon > >> and the Google site search. I'm curious to know what you think. > >> > >> I also put the 0.94 docs menu as a submenu of the Documentation menu, to > >> see how it looked. > >> > >> Thanks, > >> Misty > >> > > > > > -- Thanks & Regards, Anil Gupta
Re: Opinions wanted: new site skin
Here u go: Sent from my iPhone > On Oct 28, 2015, at 3:40 PM, Misty Stanley-Jones <mstanleyjo...@cloudera.com> > wrote: > > You're looking at the wrong staged site. Please look at the one in the > reflow_update/ directory. > >> On Oct 29, 2015, at 8:38 AM, Andrew Purtell <apurt...@apache.org> wrote: >> >> Can we remove the "fork me on GitHub banner"? We're not currently accepting >> pull requests. Remove this and I'll be +1. Until then -1, although >> otherwise it looks great. >> >> >>> On Wed, Oct 28, 2015 at 2:54 PM, Elliott Clark <ecl...@apache.org> wrote: >>> >>> Looks great with the white. +1 >>> >>> On Wed, Oct 28, 2015 at 2:52 PM, Misty Stanley-Jones < >>> mstanleyjo...@cloudera.com> wrote: >>> >>>> The grey background was inadvertent and has now been changed to white, if >>>> you refresh. >>>> >>>> Please click around and try the menus etc, as well. >>>> >>>> By the way, I know that the docs don't look great on a mobile phone, but >>>> that's a totally different issue to solve, not related to the Maven site >>>> styling. >>>> >>>>> On Thu, Oct 29, 2015 at 4:13 AM, Stack <st...@duboce.net> wrote: >>>>> >>>>> It looks lovely on a nexus (smile). >>>>> >>>>> Site looks good to me. Not sure about background light grey but all the >>>>> rest I like. >>>>> >>>>> St.Ack >>>>> >>>>> >>>>> >>>>> On Wed, Oct 28, 2015 at 11:08 AM, anil gupta <anilgupt...@gmail.com> >>>>> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> Sample website does not looks good on Iphone6. Its content is >>>> unreadable >>>>>> since page layout is not using width of iphone screen. >>>>>> >>>>>> Thanks, >>>>>> Anil >>>>>> >>>>>> On Tue, Oct 27, 2015 at 6:29 PM, Misty Stanley-Jones < >>>>>> mstanleyjo...@cloudera.com> wrote: >>>>>> >>>>>>> If you looked right away, please look again. I didn't realize that >>> a >>>>>> weird >>>>>>> font was being used from Google Fonts, because it was not loading >>>>> locally >>>>>>> for me. 
That's been fixed now and a more normal readable font (in >>> my >>>>>>> opinion) is being used. >>>>>>> >>>>>>> On Wed, Oct 28, 2015 at 10:03 AM, Misty Stanley-Jones < >>>>>>> mstanleyjo...@cloudera.com> wrote: >>>>>>> >>>>>>>> All, >>>>>>>> >>>>>>>> Here is another version for your consideration. Please check it >>> out >>>>> at >>>>>>>> different resolutions and browser sizes if you can. >>>>>>>> http://mstanleyjones.github.io/hbase/reflow_update/index.html >>>>>>>> >>>>>>>> If you go to >>>> http://mstanleyjones.github.io/hbase/reflow_update/dependency-info.html >>>>>>>> and a few other parts of the site, you will notice the built-in >>>>> syntax >>>>>>>> highlighting. >>>>>>>> >>>>>>>> This version does not have a site search, and I have no clue how >>> to >>>>> add >>>>>>>> the Hadoop site search, Stack. Maybe that can be a phase 2 where >>>>>> someone >>>>>>>> smarter can help me figure it out. >>>>>>>> >>>>>>>> Thanks for your help, >>>>>>>> Misty >>>>>>>> >>>>>>>> On Fri, Oct 23, 2015 at 3:17 PM, Misty Stanley-Jones < >>>>>>>> mstanleyjo...@cloudera.com> wrote: >>>>>>>> >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> We are currently using the reFlow Maven site skin. I went >>> looking >>>>>> around >>>>>>>>> and found Fluido, which seems to be a bit more extensible. I >>> built >>>>> and >>>>>>>>> staged a version of the site at >>>>>>>>> http://mstanleyjones.github.io/hbase/index.html. Note the >>> Github >>>>>> ribbon >>>>>>>>> and the Google site search. I'm curious to know what you think. >>>>>>>>> >>>>>>>>> I also put the 0.94 docs menu as a submenu of the Documentation >>>>> menu, >>>>>> to >>>>>>>>> see how it looked. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Misty >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Thanks & Regards, >>>>>> Anil Gupta >> >> >> >> -- >> Best regards, >> >> - Andy >> >> Problems worthy of attack prove their worth by hitting back. - Piet Hein >> (via Tom White)
Re: start_replication command not available in hbase shell in HBase0.98
Hi Ashish, Sorry for such a late reply. We had a "-" in peer name so we ran into https://issues.apache.org/jira/browse/HBASE-11394. Thanks for offering help. ~Anil On Tue, Oct 13, 2015 at 8:40 PM, Ashish Singhi < ashish.singhi.apa...@gmail.com> wrote: > Hi Anil. > > I did not check this in 0.98. > By default when ever we add a peer, its state will be ENABLED. > > There is no child node for peer-state so its 'ls' output will be empty, you > can use ZK 'get' command to find its value but the output will not be in > human readable format. > > To check the peer-state value you can use zk_dump command in hbase shell or > from web UI. > > Did you find any errors in the RS logs for replication ? > > Regards, > Ashish Singhi > > On Wed, Oct 14, 2015 at 5:04 AM, anil gupta <anilgupt...@gmail.com> wrote: > > > I found that those command are deprecated as per this Jira: > > https://issues.apache.org/jira/browse/HBASE-8861 > > > > Still, after enabling peers the replication is not starting. We looked > into > > zk. Its peer state value is null/blank: > > zknode: ls /hbase-unsecure/replication/peers/prod-hbase/peer-state > > [] > > > > Can anyone tell me what is probably going on? > > > > On Tue, Oct 13, 2015 at 3:56 PM, anil gupta <anilgupt...@gmail.com> > wrote: > > > > > Hi All, > > > > > > I am using HBase 0.98(HDP2.2). > > > As per the documentation here: > > > > > > > > > http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-3-1/CDH4-Installation-Guide/cdh4ig_topic_20_11.html > > > > > > I am trying to run start_replication command. But, i m getting > following > > > error: > > > hbase(main):013:0> start_replication > > > NameError: undefined local variable or method `start_replication' for > > > # > > > > > > Is start_replication not a valid command in HBase0.98? If its > deprecated > > > then what is the alternate command? 
> > > > > > -- > > > Thanks & Regards, > > > Anil Gupta > > > > > > > > > > > -- > > Thanks & Regards, > > Anil Gupta > > > -- Thanks & Regards, Anil Gupta
Re: Re: transfer data from hbase0.98 to hbase1.1.0 using exportSnapShot
Hi, As far as I know, exporting a snapshot from 0.98 -> 1.0 should work. You can verify this by creating a test table, putting a couple of rows in it, exporting a snapshot of that table, and cloning the exported snapshot on the remote cluster. Thanks, Anil Gupta On Sat, Oct 17, 2015 at 12:30 AM, whodarewin2006 <whodarewin2...@126.com> wrote: > > > hi,Ted > I have read the web page you give,thanks a lot.But the page didn't > mention if we can use ExportSnapShot to transfer data between different > version of hbase(0.98.6->1.0.1.1),do you know this? > Thanks again! > > > > > > > > At 2015-10-15 23:06:04, "Ted Yu" <yuzhih...@gmail.com> wrote: > >See recent thread: http://search-hadoop.com/m/YGbbQfg0W1Onv5j > > > >On Thu, Oct 15, 2015 at 3:42 AM, whodarewin2006 <whodarewin2...@126.com> > >wrote: > > > >> sorry,the subject is wrong,we want to transfer data from hbase0.98.6 to > >> hbase 1.0.1.1 > >> > >> > >> > >> > >> > >> > >> > >> > >> At 2015-10-15 18:34:17, "whodarewin2006" <whodarewin2...@126.com> > wrote: > >> >hi, > >> >We upgrade our hbase cluster from hbase0.98.6 to hbase1.0.1.1,and > we > >> want to transfer our data from old cluster to new cluster using > >> ExportSnapshot,is this OK?Will this operation crash our new cluster down > >> cause different file format? > >> > -- Thanks & Regards, Anil Gupta
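The verification suggested above might look like the following (the table name, snapshot name, and remote NameNode address are placeholders):

```
# On the 0.98.6 source cluster: create a tiny table and snapshot it
hbase shell <<'EOF'
create 'snap_test', 'cf'
put 'snap_test', 'r1', 'cf:q', 'v1'
snapshot 'snap_test', 'snap_test-1'
EOF

# Copy the snapshot into the 1.0.1.1 cluster's hbase.rootdir
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot snap_test-1 -copy-to hdfs://remote-nn:8020/hbase -mappers 4

# On the destination cluster: the snapshot should be listed, then clone it
hbase shell <<'EOF'
list_snapshots
clone_snapshot 'snap_test-1', 'snap_test_clone'
EOF
```

If the clone succeeds and a scan of `snap_test_clone` returns the test row, the cross-version export path is safe for the real tables.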
Export Snapshot to remote cluster and then Clone_Snapshot from exported data
Hi, I exported a snapshot of a table to a remote cluster. Now I want to create a table on the remote cluster using that exported snapshot. I did this around 2 years ago (on 0.94) but unfortunately I don't remember the steps now. I tried to search the mailing list archive and the HBase documentation but I can't find steps to accomplish my task. Can anyone provide me the steps or point me to documentation? -- Thanks & Regards, Anil Gupta
Re: Export Snapshot to remote cluster and then Clone_Snapshot from exported data
I am using 0.98. I used those doc instructions to export the snapshot. What do you mean by not exporting it to the correct directory? I am using HDP. Do you mean that I just need to copy this exported data into the same directory structure as the other snapshots? > On Wed, Oct 14, 2015 at 11:36 AM, Samir Ahmic <ahmic.sa...@gmail.com> wrote: > What version of hbase you are using ? What did you use to export snapshots > to remote cluster? Please take look > http://hbase.apache.org/book.html#ops.snapshots. Maybe you did not exported > snapshots to correct directory. Check your hdfs directories to locate > snapshots. > > Regards > Samir > > On Wed, Oct 14, 2015 at 8:25 PM, anil gupta <anilgupt...@gmail.com> wrote: > > > I dont see the snapshot when i run "list_snapshot" on destination > > cluster.(i checked that initially but forgot to mention in my post) > > Is it supposed to be listed in output of "list_snapshots" command? > > > > On Wed, Oct 14, 2015 at 11:19 AM, Samir Ahmic <ahmic.sa...@gmail.com> > > wrote: > > > > > Hi, > > > Can you see snapshot on remote cluster? If you can see snapshot you can > > use > > > clone snapshot command from hbase shell to create table. > > > Regards > > > Samir > > > On Oct 14, 2015 6:38 PM, "anil gupta" <anilgupt...@gmail.com> wrote: > > > > > > > Hi, > > > > > > > > I exported snapshot of a table to remote cluster. Now, i want to create > > > > table on remote cluster using that exported snapshot. I have done this > > > > around 2 years ago(on 0.94) but unfortunately, i dont remember steps > > now. > > > > > > > > I tried to search mailing list archive and HBase documentation but i > > can > > > > find steps to accomplish my task. Can anyone provide me the steps or > > > point > > > > me to documentation? > > > > > > > > -- > > > > Thanks & Regards, > > > > Anil Gupta > > > > > > > > > > > > > > > -- > > Thanks & Regards, > > Anil Gupta > > -- Thanks & Regards, Anil Gupta
Re: Export Snapshot to remote cluster and then Clone_Snapshot from exported data
I dont see the snapshot when i run "list_snapshot" on destination cluster.(i checked that initially but forgot to mention in my post) Is it supposed to be listed in output of "list_snapshots" command? On Wed, Oct 14, 2015 at 11:19 AM, Samir Ahmic <ahmic.sa...@gmail.com> wrote: > Hi, > Can you see snapshot on remote cluster? If you can see snapshot you can use > clone snapshot command from hbase shell to create table. > Regards > Samir > On Oct 14, 2015 6:38 PM, "anil gupta" <anilgupt...@gmail.com> wrote: > > > Hi, > > > > I exported snapshot of a table to remote cluster. Now, i want to create > > table on remote cluster using that exported snapshot. I have done this > > around 2 years ago(on 0.94) but unfortunately, i dont remember steps now. > > > > I tried to search mailing list archive and HBase documentation but i can > > find steps to accomplish my task. Can anyone provide me the steps or > point > > me to documentation? > > > > -- > > Thanks & Regards, > > Anil Gupta > > > -- Thanks & Regards, Anil Gupta
Re: Export Snapshot to remote cluster and then Clone_Snapshot from exported data
Hi Samir, You are right. But, HBase documentation didnt mention strict requirement of correct hbase directory. So, i have to do few more trials to come up with correct destination directory. As per my analysis, export directory should be . In cdh, rootdir is "/hbase" while in HDP, its "/apps/hbase/data". Hence, i ran into this problem. I am going to open documentation bug in HBase. Thanks for your help. Anil On Wed, Oct 14, 2015 at 1:27 PM, Samir Ahmic <ahmic.sa...@gmail.com> wrote: > If you exported snapshot with ExportSnapshot tool you shoud have "archive" > and ".hbase-snapshot" directories on destination cluster in > hbase.root.dir(usually /hbase). Inside ".hbase-snapshot" directory you > should see your snapshot. If your snapshot data is copied somewhere else > you will not see snapshots with list_snapshots command. Try to locate > snapshot directories on destination cluster and move data to correct > locations. > > Regards > Samir > > On Wed, Oct 14, 2015 at 9:10 PM, anil gupta <anilgupt...@gmail.com> wrote: > > > I am using 0.98. I used that doc instructions to export the snapshot. > What > > do you mean by not exporting it to correct directory? > > I am using HDP. Do you mean to that i just need to copy this exported in > > same directory structure as other snapshots? > > > > > On Wed, Oct 14, 2015 at 11:36 AM, Samir Ahmic <ahmic.sa...@gmail.com> > > wrote: > > > What version of hbase you are using ? What did you use to export > > snapshots > > > to remote cluster? Please take look > > > http://hbase.apache.org/book.html#ops.snapshots. Maybe you did not > > exported > > > snapshots to correct directory. Check your hdfs directories to locate > > > snapshots. 
> > > > > > Regards > > > Samir > > > > > > On Wed, Oct 14, 2015 at 8:25 PM, anil gupta <anilgupt...@gmail.com> > > wrote: > > > > > > > I dont see the snapshot when i run "list_snapshot" on destination > > > > cluster.(i checked that initially but forgot to mention in my post) > > > > Is it supposed to be listed in output of "list_snapshots" command? > > > > > > > > On Wed, Oct 14, 2015 at 11:19 AM, Samir Ahmic <ahmic.sa...@gmail.com > > > > > > wrote: > > > > > > > > > Hi, > > > > > Can you see snapshot on remote cluster? If you can see snapshot you > > can > > > > use > > > > > clone snapshot command from hbase shell to create table. > > > > > Regards > > > > > Samir > > > > > On Oct 14, 2015 6:38 PM, "anil gupta" <anilgupt...@gmail.com> > wrote: > > > > > > > > > > > Hi, > > > > > > > > > > > > I exported snapshot of a table to remote cluster. Now, i want to > > create > > > > > > table on remote cluster using that exported snapshot. I have done > > this > > > > > > around 2 years ago(on 0.94) but unfortunately, i dont remember > > steps > > > > now. > > > > > > > > > > > > I tried to search mailing list archive and HBase documentation > but > > i > > > > can > > > > > > find steps to accomplish my task. Can anyone provide me the steps > > or > > > > > point > > > > > > me to documentation? > > > > > > > > > > > > -- > > > > > > Thanks & Regards, > > > > > > Anil Gupta > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > Thanks & Regards, > > > > Anil Gupta > > > > > > > > > > > > -- > > Thanks & Regards, > > Anil Gupta > > > -- Thanks & Regards, Anil Gupta
Re: Export Snapshot to remote cluster and then Clone_Snapshot from exported data
Created this: https://issues.apache.org/jira/browse/HBASE-14612 On Wed, Oct 14, 2015 at 10:18 PM, anil gupta <anilgupt...@gmail.com> wrote: > Hi Samir, > > You are right. But, HBase documentation didnt mention strict requirement > of correct hbase directory. So, i have to do few more trials to come up > with correct destination directory. As per my analysis, export directory > should be . > > In cdh, rootdir is "/hbase" while in HDP, its "/apps/hbase/data". Hence, i > ran into this problem. > I am going to open documentation bug in HBase. > Thanks for your help. > Anil > > On Wed, Oct 14, 2015 at 1:27 PM, Samir Ahmic <ahmic.sa...@gmail.com> > wrote: > >> If you exported snapshot with ExportSnapshot tool you shoud have "archive" >> and ".hbase-snapshot" directories on destination cluster in >> hbase.root.dir(usually /hbase). Inside ".hbase-snapshot" directory you >> should see your snapshot. If your snapshot data is copied somewhere else >> you will not see snapshots with list_snapshots command. Try to locate >> snapshot directories on destination cluster and move data to correct >> locations. >> >> Regards >> Samir >> >> On Wed, Oct 14, 2015 at 9:10 PM, anil gupta <anilgupt...@gmail.com> >> wrote: >> >> > I am using 0.98. I used that doc instructions to export the snapshot. >> What >> > do you mean by not exporting it to correct directory? >> > I am using HDP. Do you mean to that i just need to copy this exported in >> > same directory structure as other snapshots? >> > >> > > On Wed, Oct 14, 2015 at 11:36 AM, Samir Ahmic <ahmic.sa...@gmail.com> >> > wrote: >> > > What version of hbase you are using ? What did you use to export >> > snapshots >> > > to remote cluster? Please take look >> > > http://hbase.apache.org/book.html#ops.snapshots. Maybe you did not >> > exported >> > > snapshots to correct directory. Check your hdfs directories to locate >> > > snapshots. 
>> > > >> > > Regards >> > > Samir >> > > >> > > On Wed, Oct 14, 2015 at 8:25 PM, anil gupta <anilgupt...@gmail.com> >> > wrote: >> > > >> > > > I dont see the snapshot when i run "list_snapshot" on destination >> > > > cluster.(i checked that initially but forgot to mention in my post) >> > > > Is it supposed to be listed in output of "list_snapshots" command? >> > > > >> > > > On Wed, Oct 14, 2015 at 11:19 AM, Samir Ahmic < >> ahmic.sa...@gmail.com> >> > > > wrote: >> > > > >> > > > > Hi, >> > > > > Can you see snapshot on remote cluster? If you can see snapshot >> you >> > can >> > > > use >> > > > > clone snapshot command from hbase shell to create table. >> > > > > Regards >> > > > > Samir >> > > > > On Oct 14, 2015 6:38 PM, "anil gupta" <anilgupt...@gmail.com> >> wrote: >> > > > > >> > > > > > Hi, >> > > > > > >> > > > > > I exported snapshot of a table to remote cluster. Now, i want to >> > create >> > > > > > table on remote cluster using that exported snapshot. I have >> done >> > this >> > > > > > around 2 years ago(on 0.94) but unfortunately, i dont remember >> > steps >> > > > now. >> > > > > > >> > > > > > I tried to search mailing list archive and HBase documentation >> but >> > i >> > > > can >> > > > > > find steps to accomplish my task. Can anyone provide me the >> steps >> > or >> > > > > point >> > > > > > me to documentation? >> > > > > > >> > > > > > -- >> > > > > > Thanks & Regards, >> > > > > > Anil Gupta >> > > > > > >> > > > > >> > > > >> > > > >> > > > >> > > > -- >> > > > Thanks & Regards, >> > > > Anil Gupta >> > > > >> > >> > >> > >> > -- >> > Thanks & Regards, >> > Anil Gupta >> > >> > > > > -- > Thanks & Regards, > Anil Gupta > -- Thanks & Regards, Anil Gupta
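To summarize the fix discussed in this thread: the `-copy-to` target must be the destination cluster's `hbase.rootdir`, which differs by distribution. The paths below are the usual defaults for HDP and CDH and the NameNode address is a placeholder — check `hbase-site.xml` on the destination:

```
# HDP destination (hbase.rootdir = /apps/hbase/data)
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot my-snapshot -copy-to hdfs://remote-nn:8020/apps/hbase/data

# CDH / stock HBase destination (hbase.rootdir = /hbase)
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot my-snapshot -copy-to hdfs://remote-nn:8020/hbase

# Then, in the hbase shell on the destination cluster:
#   list_snapshots                          -- the snapshot should now appear
#   clone_snapshot 'my-snapshot', 'my_table'
```

Exporting under the correct rootdir is what makes the snapshot land in `.hbase-snapshot/` and `archive/`, so `list_snapshots` can see it and `clone_snapshot` can materialize the table.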
start_replication command not available in hbase shell in HBase0.98
Hi All, I am using HBase 0.98 (HDP 2.2). As per the documentation here: http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-3-1/CDH4-Installation-Guide/cdh4ig_topic_20_11.html I am trying to run the start_replication command, but I'm getting the following error: hbase(main):013:0> start_replication NameError: undefined local variable or method `start_replication' for # Is start_replication not a valid command in HBase 0.98? If it's deprecated, then what is the alternate command? -- Thanks & Regards, Anil Gupta
Re: start_replication command not available in hbase shell in HBase0.98
I found that those command are deprecated as per this Jira: https://issues.apache.org/jira/browse/HBASE-8861 Still, after enabling peers the replication is not starting. We looked into zk. Its peer state value is null/blank: zknode: ls /hbase-unsecure/replication/peers/prod-hbase/peer-state [] Can anyone tell me what is probably going on? On Tue, Oct 13, 2015 at 3:56 PM, anil gupta <anilgupt...@gmail.com> wrote: > Hi All, > > I am using HBase 0.98(HDP2.2). > As per the documentation here: > > http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-3-1/CDH4-Installation-Guide/cdh4ig_topic_20_11.html > > I am trying to run start_replication command. But, i m getting following > error: > hbase(main):013:0> start_replication > NameError: undefined local variable or method `start_replication' for > # > > Is start_replication not a valid command in HBase0.98? If its deprecated > then what is the alternate command? > > -- > Thanks & Regards, > Anil Gupta > -- Thanks & Regards, Anil Gupta
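For reference, on 0.98 replication is managed per peer from the shell rather than via the removed start/stop_replication commands. A sketch (the peer id and ZooKeeper quorum are placeholders — and per HBASE-11394, mentioned later in this thread, the peer id should not contain a "-"):

```
# In the hbase shell on the source cluster
add_peer 'peer1', 'zk1,zk2,zk3:2181:/hbase-unsecure'
list_peers              # newly added peers default to state ENABLED
enable_peer 'peer1'     # re-enable if it was previously disabled
status 'replication'    # per-RS replication source/sink metrics
```

`zk_dump` in the shell (or the master web UI) is the human-readable way to inspect the peer-state znode, since reading it directly with the ZK CLI returns non-readable bytes.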
Re: Does adding new columns cause compaction storm?
Hi Liren, In short, adding new columns will *not* trigger compaction. THanks, Anil Gupta On Sat, Oct 10, 2015 at 9:20 PM, Liren Ding <sky.gonna.bri...@gmail.com> wrote: > Thanks Ted. So far I don't see direct answer yet in any hbase books or > articles. all resources say that values are ordered by rowkey:cf:column, > but no one explains how new columns are stored after compaction. I think > after compaction the store files should still follow the same way to > organize data. So if a new column need to be added in all rows regularly, > the compaction might have to extra works I/O operations accordingly. Maybe > the schema design better to keep old data intact instead of keep adding new > columns into it. > > On Sat, Oct 10, 2015 at 7:55 PM, Ted Yu <yuzhih...@gmail.com> wrote: > > > Please take a look at: > > > > http://hbase.apache.org/book.html#_compaction > > http://hbase.apache.org/book.html#exploringcompaction.policy > > > > > http://hbase.apache.org/book.html#compaction.ratiobasedcompactionpolicy.algorithm > > > > FYI > > > > On Sat, Oct 10, 2015 at 6:53 PM, Liren Ding <sky.gonna.bri...@gmail.com> > > wrote: > > > > > Hi, > > > > > > I am trying to design a schema for time series events data. The row key > > is > > > eventId, and event data is added into new "date" columns daily. So in a > > > query I only need to set filter on columns to find all data for > specified > > > events. The table should look like following: > > > > > > rowkey | 09-01-2015 | 09-02-2015 | .. > > > > > > eventid1 data11 data12 > > > eventid2 data21 data22 > > > eventid3 ..,.. > > > ... > > > > > > I know during compaction the data with same row key will be stored > > > together. So with this design, will new columns cause compaction storm? > > Or > > > any other issues? > > > Appreciate! > > > > > > -- Thanks & Regards, Anil Gupta
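A sketch of the daily-column pattern being discussed, in the hbase shell (table, CF, and qualifier names are illustrative):

```
# One row per event; each day's data lands in a new column qualifier
put 'events', 'eventid1', 'cf:09-01-2015', 'data11'
put 'events', 'eventid1', 'cf:09-02-2015', 'data12'

# Read one event's data for a date window with a column range filter;
# both bounds are inclusive here (the boolean arguments)
scan 'events', {STARTROW => 'eventid1', STOPROW => 'eventid2',
  FILTER => "ColumnRangeFilter('09-01-2015', true, '09-15-2015', true)"}
```

New qualifiers are just new KeyValues; compaction rewrites store files on its normal schedule regardless of how many qualifiers a row has, which is why adding columns does not by itself trigger a compaction storm.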
Re: alter column family - possible operational impacts on big tables
Hi Nicolas, For a table with 5k regions, it should not take more than 10 min for alter table operations. Also, in HBase 1.0+, alter table operations does not require disabling the table. So, you are encouraged to upgrade. Sent from my iPhone > On Oct 9, 2015, at 1:15 AM, Nicolae Marasoiu> wrote: > > Hi, > > Indeed, we have tables with 1-5000 regions, distributed on 10-15 RSs. > > A few hours are sufficient to do the alter one a single such table, right? > > Thanks, > Nicu > > > From: Jean-Marc Spaggiari > Sent: Thursday, October 8, 2015 10:19 PM > To: user > Subject: Re: alter column family - possible operational impacts on big tables > > Hi Nicu, > > Indeed, with 0.94 you have to disable the table before doing the alter. > However, for 30 regions, it should be pretty fast. When you say 30+, are > you talking about like 1K regions? Or more like 32? The alter will only > update the meta table, so not that much impact on the servers. And no > compactions required for that. The ttl will only take effect at the next > compaction by, as you said, filtering out more records. > > JM > > 2015-10-08 10:49 GMT-04:00 Nicolae Marasoiu : > >> Hi, >> >> >> If we run at night an alter column family, set ttl, my understanding is >> that it will disable the table, make the alter, and re-enable the table, >> which can be some time for large tables with 30+ regions (hbase version >> 0.94 [image: ☹] ). >> >> >> Do you have any advice about this? How long can it take per region? What >> is the operational hit at the time of the alter command being issued, and >> what when compaction runs on the table? I imagine that compaction is not >> too affected by this, just by filtering out more records when re-writing >> the new HFiles, is this correct? >> >> >> Thanks, >> >> Nicu >>
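The 0.94-era flow being described looks roughly like this (the TTL value and names are illustrative; on 1.0+ the disable/enable pair is unnecessary):

```
# hbase shell, 0.94: the table must be offline for the alter
disable 'mytable'
alter 'mytable', {NAME => 'cf', TTL => 2592000}   # 30 days, in seconds
enable 'mytable'

# Expired cells are only physically dropped when compaction rewrites HFiles,
# so disk space is reclaimed lazily; force it if needed:
major_compact 'mytable'
```

The alter itself is a metadata change, which is why it is fast even across thousands of regions; the real I/O cost shows up later, spread across the normal compaction cycle.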
Re: Exporting a snapshot to external cluster
Hi Akmal, It will be better if you use name service value. You will not need to worry about which NN is active. I believe you can find that property in Hadoop's core-site.xml file. Sent from my iPhone On Sep 24, 2015, at 7:23 AM, Akmal Abbasovwrote: >> My suggestion is different. You should put remote NN HA configuration in >> hdfs-site.xml. > ok, in case I’ll put it, still how I can determine which of those 2 namenodes > is active? > >> On 24 Sep 2015, at 15:56, Serega Sheypak wrote: >> >> Have no Idea, some guys try to use "curl" to determine active NN. >> My suggestion is different. You should put remote NN HA configuration in >> hdfs-site.xml. >> >> 2015-09-24 14:33 GMT+02:00 Akmal Abbasov : >> add remote cluster HA configuration to your "local" hdfs client configuration >>> I am using the following command in script >>> $HBASE_PATH/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot >>> -snapshot snapshot-name -copy-to hdfs://remote_hbase_master/hbase >>> >>> In this case how I can know which namenode is active? >>> >>> Thanks! >>> > On 23 Sep 2015, at 12:14, Serega Sheypak wrote: > 1. to know which of the HDFS namenode is active add remote cluster HA configuration to your "local" hdfs client configuration > Afaik, it should be done through zookeeper, but through which API it >>> will be more convenient? no,no,no use hdfs-site.xml configuration. You need to add configuration for remote NN HA and your local hdfs client would correctly resolve active NN. 2015-09-23 11:32 GMT+02:00 Akmal Abbasov : > Hi all, > I would like to know the best practice when exporting a snapshot to >>> remote > hbase cluster with ha configuration. > My assumption is: > 1. to know which of the HDFS namenode is active > 2. export snapshot to active namenode > > Since I need to do this programmatically what is the best way to know > which namenode is active? > Afaik, it should be done through zookeeper, but through which API it >>> will > be more convenient? > > Thanks. >
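The client-side HA configuration being suggested looks roughly like this in `hdfs-site.xml` (the nameservice id `remotens` and the hostnames are placeholders for the remote cluster's actual values):

```xml
<property><name>dfs.nameservices</name><value>remotens</value></property>
<property><name>dfs.ha.namenodes.remotens</name><value>nn1,nn2</value></property>
<property><name>dfs.namenode.rpc-address.remotens.nn1</name><value>nn1.example.com:8020</value></property>
<property><name>dfs.namenode.rpc-address.remotens.nn2</name><value>nn2.example.com:8020</value></property>
<property>
  <name>dfs.client.failover.proxy.provider.remotens</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

With this in place, `ExportSnapshot ... -copy-to hdfs://remotens/hbase` works regardless of which NameNode is active, because the HDFS client resolves the active NN itself.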
Re: Hbase import/export change number of rows
How many rows are expected? Can you do sanity checking in your data to make sure there are no duplicate rowkeys? Sent from my iPhone > On Sep 22, 2015, at 8:35 AM, OM PARKASH Nain> wrote: > > I using two methods for row count: > > hbase shell: > > count "Table1" > > another is: > > hbase org.apache.hadoop.hbase.mapreduce.RowCounter "Table1" > > Both give same number of row but export have different number of rows. > > hbase org.apache.hadoop.hbase.mapreduce.Export "Table1" "hdfs path" > > > > > On Tue, Sep 22, 2015 at 5:33 PM, OM PARKASH Nain > wrote: > >> I am using Hbase export using command. >> >> hbase org.apache.hadoop.hbase.mapreduce.Export "Table1" "hdfs path" >> >> Then I use import command from HDFS to Hbase Table; >> >> hbase org.apache.hadoop.hbase.mapreduce.Import "hdfs path" "Table2" >> >> Then I count number of row in both tables, I found mismatch number of rows >> >> Table1:8301 Table2:8032 >> >> Please define what goes wrong with my system. >>
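One thing worth checking in a case like this: Export copies a limited number of cell versions within an optional time range, and Import replays them as ordinary puts, so deletes, TTL expiry, or versioning between the export and the count can account for a mismatch. The tool accepts optional arguments (the output paths are placeholders):

```
# Usage: Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
hbase org.apache.hadoop.hbase.mapreduce.Export "Table1" "/export/Table1" 2147483647

# Re-count both tables with the same mechanism for an apples-to-apples check
hbase org.apache.hadoop.hbase.mapreduce.RowCounter "Table1"
hbase org.apache.hadoop.hbase.mapreduce.RowCounter "Table2"
```

Comparing the RowCounter output on both tables, run at the same quiet moment, rules out counting-method differences before digging into the export itself.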
Re: Problem with HBase + Kerberos
ext(Unknown Source)
>    at sun.security.jgss.GSSManagerImpl.getMechanismContext(Unknown Source)
>    at sun.security.jgss.GSSContextImpl.initSecContext(Unknown Source)
>    at sun.security.jgss.GSSContextImpl.initSecContext(Unknown Source)
>    ... 19 more
> 2015-08-31 10:15:27,911 WARN [regionserver60020] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
>
> Is there a kind of expiration limit for keytab credentials?
> Thanks for your help,
>
> Loïc
>
> Loïc CHANEL
> Engineering student at TELECOM Nancy
> Trainee at Worldline - Villeurbanne
>
> 2015-08-27 18:24 GMT+02:00 anil gupta <anilgupt...@gmail.com>:
>
>> Maybe this is related to some Ambari setup? Can you also ask on the Ambari
>> mailing list? IMO, secure HBase cluster connectivity has been working in
>> HBase for a very long time.
>>
>> On Thu, Aug 27, 2015 at 12:48 AM, Loïc Chanel <loic.cha...@telecomnancy.net> wrote:
>>
>>> I did not, but as I Kerberized my cluster with Ambari, it made the mandatory
>>> modifications.
>>>
>>> Loïc CHANEL
>>> Engineering student at TELECOM Nancy
>>> Trainee at Worldline - Villeurbanne
>>>
>>> 2015-08-27 1:17 GMT+02:00 Laurent H <laurent.hat...@gmail.com>:
>>>
>>>> Did you change some stuff in your hbase-site.xml when you installed
>>>> Kerberos?
>>>>
>>>> --
>>>> Laurent HATIER - Consultant Big Data & Business Intelligence chez CapGemini
>>>> fr.linkedin.com/pub/laurent-hatier/25/36b/a86/
>>>> <http://fr.linkedin.com/pub/laurent-h/25/36b/a86/>
>>>>
>>>> 2015-08-21 9:44 GMT+02:00 Loïc Chanel <loic.cha...@telecomnancy.net>:
>>>>
>>>>> Sorry if I didn't mention that, but yeah, I ran kinit before invoking hbase
>>>>> shell, and the klist command says that my user has a ticket.
>>>>>
>>>>> [root@host /]# klist
>>>>> Ticket cache: FILE:/tmp/krb5cc_0
>>>>> Default principal: testuser@REALM
>>>>>
>>>>> Valid starting       Expires              Service principal
>>>>> 08/21/15 09:39:33    08/22/15 09:39:33    krbtgt/REALM@REALM
>>>>>         renew until 08/21/15 09:39:33
>>>>>
>>>>> Loïc CHANEL
>>>>> Engineering student at TELECOM Nancy
>>>>> Trainee at Worldline - Villeurbanne
>>>>>
>>>>> 2015-08-21 6:12 GMT+02:00 anil gupta <anilgupt...@gmail.com>:
>>>>>
>>>>>> Did you run the kinit command before invoking "hbase shell"? What does
>>>>>> the klist command say?
>>>>>>
>>>>>> On Thu, Aug 20, 2015 at 6:47 AM, Loïc Chanel <loic.cha...@telecomnancy.net> wrote:
>>>>>>
>>>>>>> By the way, as this may help to find my issue, I just tested typing
>>>>>>> *whoami* in the HBase shell: this returned me exactly what it should:
>>>>>>> testuser@REALM (auth:KERBEROS)
>>>>>>> groups: nobody, toast
>>>>>>>
>>>>>>> Loïc CHANEL
>>>>>>> Engineering student at TELECOM Nancy
>>>>>>> Trainee at Worldline - Villeurbanne
>>>>>>>
>>>>>>> 2015-08-20 15:17 GMT+02:00 Loïc Chanel <loic.cha...@telecomnancy.net>:
>>>>>>>
>>>>>>>> Nothing more with your option :/
>>>>>>>>
>>>>>>>> Loïc CHANEL
>>>>>>>> Engineering student at TELECOM Nancy
>>>>>>>> Trainee at Worldline - Villeurbanne
>>>>>>>>
>>>>>>>> 2015-08-20 15:04 GMT+02:00 Loïc Chanel <loic.cha...@telecomnancy.net>:
>>>>>>>>
>>>>>>>>> I'm using HDP 2.2.4.2, with HBase 0.98.4.2.2.
>>>>>>>>> I have unlimited strength JCE installed.
>>>>>>>>>
>>>>>>>>> I'll try to have more clues with this option.
>>>>>>>>>
>>>>>>>>> Loïc CHANEL
>>>>>>>>> Engineering student at TELECOM Nancy
>>>>>>>
Re: Problem with HBase + Kerberos
Maybe this is related to some Ambari setup? Can you also ask on the Ambari mailing list? IMO, secure HBase cluster connectivity has been working in HBase for a very long time.

On Thu, Aug 27, 2015 at 12:48 AM, Loïc Chanel loic.cha...@telecomnancy.net wrote: I did not, but as I Kerberized my cluster with Ambari, it made the mandatory modifications. Loïc CHANEL Engineering student at TELECOM Nancy Trainee at Worldline - Villeurbanne

2015-08-27 1:17 GMT+02:00 Laurent H laurent.hat...@gmail.com: Did you change some stuff in your hbase-site.xml when you installed Kerberos? -- Laurent HATIER - Consultant Big Data & Business Intelligence chez CapGemini fr.linkedin.com/pub/laurent-hatier/25/36b/a86/ http://fr.linkedin.com/pub/laurent-h/25/36b/a86/

2015-08-21 9:44 GMT+02:00 Loïc Chanel loic.cha...@telecomnancy.net: Sorry if I didn't mention that, but yeah, I ran kinit before invoking hbase shell, and the klist command says that my user has a ticket.

[root@host /]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: testuser@REALM

Valid starting       Expires              Service principal
08/21/15 09:39:33    08/22/15 09:39:33    krbtgt/REALM@REALM
        renew until 08/21/15 09:39:33

Loïc CHANEL Engineering student at TELECOM Nancy Trainee at Worldline - Villeurbanne

2015-08-21 6:12 GMT+02:00 anil gupta anilgupt...@gmail.com: Did you run the kinit command before invoking hbase shell? What does the klist command say?
On Thu, Aug 20, 2015 at 6:47 AM, Loïc Chanel loic.cha...@telecomnancy.net wrote: By the way, as this may help to find my issue, I just tested typing *whoami* in the HBase shell: this returned me exactly what it should: testuser@REALM (auth:KERBEROS) groups: nobody, toast Loïc CHANEL Engineering student at TELECOM Nancy Trainee at Worldline - Villeurbanne

2015-08-20 15:17 GMT+02:00 Loïc Chanel loic.cha...@telecomnancy.net: Nothing more with your option :/ Loïc CHANEL Engineering student at TELECOM Nancy Trainee at Worldline - Villeurbanne

2015-08-20 15:04 GMT+02:00 Loïc Chanel loic.cha...@telecomnancy.net: I'm using HDP 2.2.4.2, with HBase 0.98.4.2.2. I have unlimited strength JCE installed. I'll try to have more clues with this option. Loïc CHANEL Engineering student at TELECOM Nancy Trainee at Worldline - Villeurbanne

2015-08-20 14:58 GMT+02:00 Ted Yu yuzhih...@gmail.com: Which hbase / hadoop release are you using? Running with -Dsun.security.krb5.debug=true will provide more clues. Do you have unlimited strength JCE installed? Cheers

On Thu, Aug 20, 2015 at 5:46 AM, Loïc Chanel loic.cha...@telecomnancy.net wrote: Hi all, Since I kerberized my cluster, it seems like I can't use HBase anymore... For example, executing create 'toto','titi' in the HBase shell results in the printing of this line endlessly: WARN [main] security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds before. And nothing else happens. I tried to restart HDFS and HBase, and to re-generate credentials and keytabs, but nothing changed.
As for the logs, they are not very explicit, as the only thing they say (and keep saying) is:

2015-08-20 13:50:12,697 DEBUG [RpcServer.reader=2,port=6] ipc.RpcServer: Created SASL server with mechanism = GSSAPI
2015-08-20 13:50:12,698 DEBUG [RpcServer.reader=2,port=6] ipc.RpcServer: Have read input token of size 650 for processing by saslServer.evaluateResponse()
2015-08-20 13:50:12,704 DEBUG [RpcServer.reader=2,port=6] ipc.RpcServer: Will send token of size 108 from saslServer.
2015-08-20 13:50:12,706 DEBUG [RpcServer.reader=2,port=6] ipc.RpcServer: Have read input token of size 0 for processing by saslServer.evaluateResponse()
2015-08-20 13:50:12,707 DEBUG [RpcServer.reader=2,port=6] ipc.RpcServer: Will send token of size 32 from saslServer.
2015-08-20 13:50:12,708 DEBUG [RpcServer.reader=2,port=6] ipc.RpcServer: RpcServer.listener,port=6: DISCONNECTING client 192.168.6.148:43014 because read count=-1. Number of active connections: 3

Does anyone have an idea about where this might come from, or how to solve it? Because I couldn't find much documentation about this. Thanks in advance for your help!

Loïc

Loïc CHANEL
Engineering student at TELECOM Nancy
Trainee at Worldline - Villeurbanne
Re: Using HBase with a shared filesystem (gluster, nfs, s3, etc)
AFAIK, region movement does not move the data of a region on the (distributed) FileSystem. It should only update HBase metadata. Did you check disk I/O stats during region movement?

On Tue, Aug 25, 2015 at 10:40 AM, Ted Yu yuzhih...@gmail.com wrote: Please see http://hbase.apache.org/book.html#regions.arch.assignment

On Tue, Aug 25, 2015 at 10:37 AM, donmai dood...@gmail.com wrote: NFS, 0.98.10. Will get to you as soon as I am able, on travel. Is my general understanding correct, though, that there shouldn't be any data movement from a region reassignment?

On Tue, Aug 25, 2015 at 12:40 PM, Ted Yu yuzhih...@gmail.com wrote: Can you give a bit more information: which filesystem you use, which hbase release you use, a master log snippet for the long region assignment. Thanks

On Tue, Aug 25, 2015 at 9:30 AM, donmai dood...@gmail.com wrote: Hi, I'm curious about how exactly region movement works with regard to data transfer. To my understanding from the docs, given an HDFS-backed cluster, a region movement / transition involves changing things in meta only; all data movement for locality is handled by HDFS. In the case where rootdir is a shared file system, there shouldn't be any data movement with a region reassignment, correct? I'm running into performance issues where region assignment takes a very long time and I'm trying to figure out why. Thanks!

--
Thanks & Regards,
Anil Gupta
Re: Problem with HBase + Kerberos
Did you run the kinit command before invoking hbase shell? What does the klist command say?

On Thu, Aug 20, 2015 at 6:47 AM, Loïc Chanel loic.cha...@telecomnancy.net wrote: By the way, as this may help to find my issue, I just tested typing *whoami* in the HBase shell: this returned me exactly what it should: testuser@REALM (auth:KERBEROS) groups: nobody, toast Loïc CHANEL Engineering student at TELECOM Nancy Trainee at Worldline - Villeurbanne

2015-08-20 15:17 GMT+02:00 Loïc Chanel loic.cha...@telecomnancy.net: Nothing more with your option :/ Loïc CHANEL Engineering student at TELECOM Nancy Trainee at Worldline - Villeurbanne

2015-08-20 15:04 GMT+02:00 Loïc Chanel loic.cha...@telecomnancy.net: I'm using HDP 2.2.4.2, with HBase 0.98.4.2.2. I have unlimited strength JCE installed. I'll try to have more clues with this option. Loïc CHANEL Engineering student at TELECOM Nancy Trainee at Worldline - Villeurbanne

2015-08-20 14:58 GMT+02:00 Ted Yu yuzhih...@gmail.com: Which hbase / hadoop release are you using? Running with -Dsun.security.krb5.debug=true will provide more clues. Do you have unlimited strength JCE installed? Cheers

On Thu, Aug 20, 2015 at 5:46 AM, Loïc Chanel loic.cha...@telecomnancy.net wrote: Hi all, Since I kerberized my cluster, it seems like I can't use HBase anymore... For example, executing create 'toto','titi' in the HBase shell results in the printing of this line endlessly: WARN [main] security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds before. And nothing else happens. I tried to restart HDFS and HBase, and to re-generate credentials and keytabs, but nothing changed.
As for the logs, they are not very explicit, as the only thing they say (and keep saying) is:

2015-08-20 13:50:12,697 DEBUG [RpcServer.reader=2,port=6] ipc.RpcServer: Created SASL server with mechanism = GSSAPI
2015-08-20 13:50:12,698 DEBUG [RpcServer.reader=2,port=6] ipc.RpcServer: Have read input token of size 650 for processing by saslServer.evaluateResponse()
2015-08-20 13:50:12,704 DEBUG [RpcServer.reader=2,port=6] ipc.RpcServer: Will send token of size 108 from saslServer.
2015-08-20 13:50:12,706 DEBUG [RpcServer.reader=2,port=6] ipc.RpcServer: Have read input token of size 0 for processing by saslServer.evaluateResponse()
2015-08-20 13:50:12,707 DEBUG [RpcServer.reader=2,port=6] ipc.RpcServer: Will send token of size 32 from saslServer.
2015-08-20 13:50:12,708 DEBUG [RpcServer.reader=2,port=6] ipc.RpcServer: RpcServer.listener,port=6: DISCONNECTING client 192.168.6.148:43014 because read count=-1. Number of active connections: 3

Does anyone have an idea about where this might come from, or how to solve it? Because I couldn't find much documentation about this. Thanks in advance for your help!

Loïc

Loïc CHANEL
Engineering student at TELECOM Nancy
Trainee at Worldline - Villeurbanne

--
Thanks & Regards,
Anil Gupta
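The -Dsun.security.krb5.debug=true flag Ted suggests is typically wired in through the HBase environment file. A sketch, assuming the conf/hbase-env.sh location of your distribution (on HDP it usually sits under /etc/hbase/conf):

```shell
# hbase-env.sh (path is distribution-specific): append the Kerberos debug
# flag so the JVM prints GSSAPI/krb5 negotiation details on the next restart.
export HBASE_OPTS="$HBASE_OPTS -Dsun.security.krb5.debug=true"
```

Remember to remove the flag afterwards; the debug output is very verbose.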
Re: hbase doubts
For #1, take a look at the following in hbase-default.xml:

<name>hbase.client.keyvalue.maxsize</name>
<value>10485760</value>

For #2, it would be easier to answer if you can outline the access patterns in your app.

For #3, the adjustment according to current region boundaries is done client side. Take a look at the javadoc for LoadQueueItem in LoadIncrementalHFiles.java. Cheers

On Mon, Aug 17, 2015 at 6:45 AM, Shushant Arora shushantaror...@gmail.com wrote: 1. Is there any max limit on the key size of an hbase table? 2. Multiple small tables vs one large table: which one is preferred? 3. For bulk load: when LoadIncrementalHFiles is run it recalculates the region splits based on region boundaries. Does this division happen on the client side, or on the server side (at the region server or hbase master), which then assigns the splits that cross a target region boundary to the desired regionserver?

--
Thanks & Regards,
Anil Gupta
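The property referenced above can be overridden in hbase-site.xml; a sketch with the shipped 10 MB default (per hbase-default.xml, a value of 0 or less disables the check):

```xml
<!-- hbase-site.xml: maximum allowed size of a single KeyValue (key + value).
     10485760 bytes = 10 MB is the shipped default. -->
<property>
  <name>hbase.client.keyvalue.maxsize</name>
  <value>10485760</value>
</property>
```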
Re: groupby(prefix(rowkey)) with multiple custom aggregated columns
Hi Nicu, Have you taken a look at Phoenix? It supports group by: https://phoenix.apache.org/language/index.html It will also provide you much more SQL-like querying on HBase.

On Fri, Aug 7, 2015 at 2:19 PM, Ted Yu yuzhih...@gmail.com wrote: Please take a look at hbase-client/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java which shows several other aggregations. BTW group by functionality would involve some more work, since rows for the same group may span multiple regions. Cheers

On Fri, Aug 7, 2015 at 9:42 AM, Nicolae Marasoiu nicolae.maras...@gmail.com wrote: Hi, I need to implement a limited SQL-like filter+group+order, where the group is on a fixed-length prefix of the rowkey (fixed per query), and the results are multiple metrics, including some custom ones like statistical unique counts. I noticed that the available tooling with coprocessors, like ColumnAggregationProtocol, involves just one metric, e.g. one sum(column). We collect many, and of course it is more efficient to scan the data once. Please advise, Nicu

--
Thanks & Regards,
Anil Gupta
Re: Disable Base64 encoding in Stargate request and Return as String
Thanks Andrew. I didn't intend to change the behavior of the hbase shell. I intend to provide an enhancement to HBase REST while not impacting its default behavior.

On Thu, Aug 6, 2015 at 5:29 PM, Andrew Purtell apurt...@apache.org wrote: "returned from the shell" Meant returned from the REST gateway.

On Thu, Aug 6, 2015 at 5:28 PM, Andrew Purtell apurt...@apache.org wrote: Unfortunately we can't change the current set of representations returned from the shell; that would be a backwards compatibility problem. We can however add new representations (selectable by way of the Accept header, e.g. Accept: text/plain). If you'd like to propose a patch we'd certainly look at it. Thanks.

On Wed, Aug 5, 2015 at 12:51 AM, anil gupta anilgupt...@gmail.com wrote: Hi Andrew, Thanks for sharing your thoughts. Sorry for the late reply, as I recently came back from vacation. I understand that HBase stores byte arrays, so it's hard for HBase to figure out the data type. What if the client knows that all the columns in the REST request are strings? In that case, can we give the option of setting a request header StringDecoding: true? By default, we can assume StringDecoding: false. Just some food for thought. Also, if we could replicate the encoding that we do in the HBase shell (where strings are shown in readable format and we hex-encode all binary data), that would be best. In that case, it would be really convenient to use the REST service rather than invoking the hbase shell. Right now, IMO, due to the lack of readability it's only good for fetching images (we store images in HBase). Provided my employer allows me to contribute, I am willing to work on this. Would HBase accept a patch? Thanks, Anil Gupta

On Fri, Jul 17, 2015 at 4:57 PM, Andrew Purtell apurt...@apache.org wrote: The closest you can get to just a string is to have your client use an accept header of Accept: application/octet-stream when making a query. This will return zero or one value in the response.
If a value is present in the table at the requested location, the response body will be the unencoded bytes. If you've stored a string, you'll get back a string. If you've stored an image, you'll get back the raw image bytes. Note that using an accept header of application/octet-stream implicitly limits you to queries that only return zero or one values. (Strictly speaking, per the package doc: If binary encoding is requested, only one cell can be returned, the first to match the resource specification. The row, column, and timestamp associated with the cell will be transmitted in X headers: X-Row, X-Column, and X-Timestamp, respectively. Depending on the precision of the resource specification, some of the X-headers may be elided as redundant.)

In general, the REST gateway supports several alternate encodings. See https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/rest/package-summary.html for some examples.

Note that HBase cell data is binary, not string. It does not make sense to turn off base64 encoding for the default response encoding, XML, because that would produce invalid XML if a value happens to include non-XML-safe bytes. HBase can't know that in advance. We need to encode keys and values in a safe manner to avoid blowing up your client's XML. The same is roughly true for JSON.

If your client sends an accept header of Accept: application/protobuf you'll get back a protobuf-encoded object. Your client will need to be prepared to handle that representation. This is probably not what you want.

Why are we even talking about using XML, JSON, or protobuf to encode responses? Because for many types of REST queries, HBase must return a structured response. The client has asked for more than simply one value, simply one string. The response must include keys, values, timestamps; maybe a whole row's worth of keys, values, and timestamps; maybe multiple rows. It depends on the query you issued. (See the 'Cell or Row Query (Multiple Values)' section in the package doc.)

On Fri, Jul 17, 2015 at 2:20 PM, anil gupta anilgupt...@gmail.com wrote: Hi All, We have a string rowkey. We have string values of cells. Still, Stargate returns the data with Base64 encoding, due to which a user can't read the data. Is there a way to disable Base64 encoding so that the REST request would just return strings? -- Thanks & Regards, Anil Gupta

--
Best regards,
- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

--
Thanks & Regards,
Anil Gupta
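Until a plain-text representation exists, a client can simply decode the base64 fields itself. A sketch in Ruby against a hand-constructed response body (not captured from a live cluster); the JSON shape, with row keys under "key", column names under "column", and cell values under "$", all base64-encoded, follows the REST package doc:

```ruby
require "json"
require "base64"

# Hand-constructed sample of a Stargate JSON response (illustration only).
body = '{"Row":[{"key":"cjE=","Cell":[{"column":"Zjp2YWw=","$":"aGVsbG8="}]}]}'

JSON.parse(body)["Row"].each do |row|
  key = Base64.decode64(row["key"])
  row["Cell"].each do |cell|
    column = Base64.decode64(cell["column"])
    value  = Base64.decode64(cell["$"])
    puts "#{key} #{column} => #{value}"
  end
end
```

This only round-trips cleanly when the stored bytes really are text, which is exactly the caveat Andrew raises above for binary cell data.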
Re: Disable Base64 encoding in Stargate request and Return as String
Hi Andrew, Thanks for sharing your thoughts. Sorry for late reply as i recently came back from vacation. I understand that HBase stores byte arrays, so its hard for HBase to figure out the data type. What if, the client knows that all the columns in the Rest request are Strings. In that case, can we give the option of setting a request header StringDecoding:True. By default, we can assume StringDecoding: false. Just some food for thought. Also, if we can replicate the Encoding that we do in HBase Shell(where string are shown in readable format and we hex encode all binary data). That would be best. In this case, it would be really convenient use of Rest service rather than invoking hbase shell. Right now, IMO, due to lack of readability its only good to fetch images.(we store images in HBase) Provided my employer allows me to contribute, I am willing to work on this. Would HBase accept a patch? Thanks, Anil Gupta On Fri, Jul 17, 2015 at 4:57 PM, Andrew Purtell apurt...@apache.org wrote: The closest you can get to just a string is have your client use an accept header of Accept: application/octet-stream with making a query. This will return zero or one value in the response. If a value is present in the table at the requested location, the response body will be the unencoded bytes. If you've stored a string, you'll get back a string. If you've stored an image, you'll get back the raw image bytes. Note that using an accept header of application/octet-stream implicitly limits you to queries that only return zero or one values. (Strictly speaking, per the package doc: If binary encoding is requested, only one cell can be returned, the first to match the resource specification. The row, column, and timestamp associated with the cell will be transmitted in X headers: X-Row, X-Column, and X-Timestamp, respectively. Depending on the precision of the resource specification, some of the X-headers may be elided as redundant.) 
In general, the REST gateway supports several alternate encodings. See https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/rest/package-summary.html for some examples.

Note that HBase cell data is binary, not string. It does not make sense to turn off base64 encoding for the default response encoding, XML, because that would produce invalid XML if a value happens to include non-XML-safe bytes. HBase can't know that in advance. We need to encode keys and values in a safe manner to avoid blowing up your client's XML. The same is roughly true for JSON.

If your client sends an accept header of Accept: application/protobuf you'll get back a protobuf-encoded object. Your client will need to be prepared to handle that representation. This is probably not what you want.

Why are we even talking about using XML, JSON, or protobuf to encode responses? Because for many types of REST queries, HBase must return a structured response. The client has asked for more than simply one value, simply one string. The response must include keys, values, timestamps; maybe a whole row's worth of keys, values, and timestamps; maybe multiple rows. It depends on the query you issued. (See the 'Cell or Row Query (Multiple Values)' section in the package doc.)

On Fri, Jul 17, 2015 at 2:20 PM, anil gupta anilgupt...@gmail.com wrote: Hi All, We have a string rowkey. We have string values of cells. Still, Stargate returns the data with Base64 encoding, due to which a user can't read the data. Is there a way to disable Base64 encoding so that the REST request would just return strings? -- Thanks & Regards, Anil Gupta

--
Best regards,
- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

--
Thanks & Regards,
Anil Gupta
Re: [DISCUSS] Split up the book again?
Hi All, Since we are talking about HBase documentation: is it possible to have docs for specific versions? Right now, the JavaDocs refer to 0.94 or HBase 2.0. It's not convenient to look at 2.0 docs while working on 0.98 or 1.0. I hope this should not be super difficult to accomplish. Apache Kafka, ElasticSearch, and many other products make the docs available for all the currently supported versions. It would be nice if we could just change the version in this url: http://hbase.apache.org/hbase_version/apidocs/index.html and look at the docs. That's how many Apache TLPs do it.

On Thu, Jul 30, 2015 at 9:41 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: +1 too. Even if cleaner and nicer, searching in it is a pain compared to before.

On 2015-07-30 07:17, Shane O'Donnell sha...@knownormal.com wrote: +1. One specific case where this is an issue is if you are entering the book with an anchor link. If you try this, it appears to just hang. Shane O.

On Thu, Jul 30, 2015 at 10:07 AM, Stack st...@duboce.net wrote: On Thu, Jul 30, 2015 at 2:06 PM, Lars Francke lars.fran...@gmail.com wrote: While I like the new and better layout of the book, it is painful to use - at least for me - because of its size. I've started to notice this too. It'd be sweet if it loaded more promptly. Thanks for starting the discussion. St.Ack

--
Thanks & Regards,
Anil Gupta
Disable Base64 encoding in Stargate request and Return as String
Hi All, We have a string rowkey. We have string values of cells. Still, Stargate returns the data with Base64 encoding, due to which a user can't read the data. Is there a way to disable Base64 encoding so that the REST request would just return strings?

--
Thanks & Regards,
Anil Gupta
Re: HBase co-processor performance
Yes, If possible, try to denormalize data and reduce number of calls. Its ok to store some redundant data with each row due to denormalization. On Thu, Jul 16, 2015 at 6:18 AM, Chandrashekhar Kotekar shekhar.kote...@gmail.com wrote: Hi, Thanks for the inputs. As you said, it is better to change database design than moving this business logic to co-processors, and sorry for duplicate mail. I guess duplicate mail was in my mobile's outbox and after syncing mobile that mail was sent. Regards, Chandrash3khar Kotekar Mobile - +91 8600011455 On Wed, Jul 15, 2015 at 12:40 PM, anil gupta anilgupt...@gmail.com wrote: Using coprocessor to make calls to other Tables or remote Regions is an ANTI-PATTERN. It will create cyclic dependency between RS in your cluster. Coprocessors should be strictly used for operation on local Regions. Search mailing archives for more detailed discussion on this topic. How about denormalizing the data and then just doing ONE call? Now, this becomes more of a data modeling question. Thanks, Anil Gupta On Tue, Jul 14, 2015 at 11:39 PM, Chandrashekhar Kotekar shekhar.kote...@gmail.com wrote: Hi, REST APIs of my project make 2-3 calls to different tables in HBase. These calls are taking 10s of milli seconds to finish. I would like to know 1) If moving business logic to HBase co-processors and/or observer will improve performance? Idea is like to pass all the related information to HBase co-processors and/or observer, co-processor will make those 2-3 calls to different HBase tables and return result to the client. 2) I wonder if this approach will reduce time to finish or is it a bad approach? 3) If co-processor running on one region server fetches data from other region server then it will be same as tomcat server fetching that data from HBase region server. Isn't it? Regards, Chandrash3khar Kotekar Mobile - +91 8600011455 -- Thanks Regards, Anil Gupta -- Thanks Regards, Anil Gupta
Re: HBase co-processor performance
Using a coprocessor to make calls to other tables or remote regions is an ANTI-PATTERN. It will create a cyclic dependency between the RegionServers in your cluster. Coprocessors should be strictly used for operations on local regions. Search the mailing archives for a more detailed discussion on this topic. How about denormalizing the data and then just doing ONE call? Now, this becomes more of a data modeling question. Thanks, Anil Gupta

On Tue, Jul 14, 2015 at 11:39 PM, Chandrashekhar Kotekar shekhar.kote...@gmail.com wrote: Hi, The REST APIs of my project make 2-3 calls to different tables in HBase. These calls take tens of milliseconds to finish. I would like to know: 1) Will moving the business logic to HBase coprocessors and/or observers improve performance? The idea is to pass all the related information to the coprocessor/observer, which will make those 2-3 calls to different HBase tables and return the result to the client. 2) I wonder if this approach will reduce the time to finish, or is it a bad approach? 3) If a coprocessor running on one region server fetches data from another region server, then it will be the same as the tomcat server fetching that data from the HBase region server, won't it? Regards, Chandrash3khar Kotekar Mobile - +91 8600011455

--
Thanks & Regards,
Anil Gupta
Re: Performance of co-processor and observer while fetching data from other RS
I think this is a duplicate post. Please avoid posting same questions. Please use previous thread where I replied. Sent from my iPhone On Jul 14, 2015, at 11:17 PM, Chandrashekhar Kotekar shekhar.kote...@gmail.com wrote: Hi, REST APIs of my project make 2-3 calls to different tables in HBase. These calls are taking 10s of milli seconds to finish. I would like to know 1) If moving business logic to HBase co-processors and/or observer will improve performance? Idea is like to pass all the related information to HBase co-processors and/or observer, co-processor will make those 2-3 calls to different HBase tables and return result to the client. 2) I wonder if this approach will reduce time to finish or is it a bad approach? 3) If co-processor running on one region server fetches data from other region server then it will be same as tomcat server fetching that data from HBase region server. Isn't it? Regards, Chandrash3khar Kotekar Mobile - +91 8600011455
Re: HConnection thread waiting on blocking queue indefinitely
I am also facing the same issue: the client connection thread is waiting at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1200). Any help is appreciated.

Regards,
Praneesh

--
Thanks & Regards,
Anil Gupta
Re: Fix Number of Regions per Node ?
Hi Rahul, I don't think there is anything like that. But you can effectively do that by setting the region size. However, if the hardware configuration varies across the cluster, then this property would not be helpful because, AFAIK, region size can be set on a per-table basis only (not per node). It would be best to avoid having differences in hardware across the cluster machines. Thanks, Anil Gupta

On Wed, Jun 17, 2015 at 5:12 PM, rahul malviya malviyarahul2...@gmail.com wrote: Hi, Is it possible to configure HBase to have only a fixed number of regions per node per table? For example, node1 serves 2 regions, node2 serves 3 regions, etc., for any table created? Thanks, Rahul

--
Thanks & Regards,
Anil Gupta
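The "setting region size" approach above works through the split threshold rather than a per-node region count. A sketch of the cluster-wide knob in hbase-site.xml (the 10 GB value is just an example; the same limit can be set per table with alter 'Table1', MAX_FILESIZE => '10737418240'):

```xml
<!-- hbase-site.xml: a region splits once its store files exceed this size,
     so regions per node roughly follows table size / this value / node count. -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>10737418240</value> <!-- 10 GB -->
</property>
```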
Re: HBase shell providing wrong results with startrow(with composite key having String and Ints)
Thanks Stack.

On Wed, Jun 10, 2015 at 8:06 AM, Stack st...@duboce.net wrote: On Mon, Jun 8, 2015 at 10:27 PM, anil gupta anilgupt...@gmail.com wrote: So, if we have to match against non-string data in the hbase shell, we should always use double quotes? Double quotes mean the shell (ruby) will interpret and undo any escaping -- e.g. showing as hex -- of binary characters. What we emit on the shell is a combo of ruby escaping and our running all through https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/Bytes.html#toStringBinary(byte[]) first. If you type 'help' in the shell, at the end we try to say this but could do a better job:

If you are using binary keys or values and need to enter them in the shell, use double-quote'd hexadecimal representation. For example:
hbase> get 't1', "key\x03\x3f\xcd"
hbase> get 't1', "key\003\023\011"
hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"

St.Ack

Even for matching values of cells?

On Mon, Jun 8, 2015 at 9:23 PM, Ted Yu yuzhih...@gmail.com wrote: Double quotes allow you to do string interpolation. Another difference (one pertinent to Anil's question) is that 'escape sequence' does not work using single quotes. Cheers

On Mon, Jun 8, 2015 at 9:11 PM, anil gupta anilgupt...@gmail.com wrote: Hi Jean, My bad. I gave a wrong illustration. This is the query I was trying on my composite key:

hbase(main):017:0> scan 'CAR_ARCHIVE', {COLUMNS => 'A', STARTROW => '110\x00', LIMIT => 1}
ROW                                                  COLUMN+CELL
12\x0010123\x0019XFB2F56CE026679\x00\x80\x00\x00\x00 column=A:BODYSTYLE, timestamp=1432899595317, value=SEDAN
12\x0010123\x0019XFB2F56CE026679\x00\x80\x00\x00\x00 column=A:BODYSTYLESLUG, timestamp=1432899595317, value=sedan

I do have this rowkey: 110\x0033078\x001C4AJWAG0CL260823\x00\x80\x00\x00 So, I was expecting to get that row.

Solution: scan 'CAR_ARCHIVE', {COLUMNS => 'A', STARTROW => "110\x00", LIMIT => 1}

I don't really know what the difference is between single quotes and double quotes in STARTROW. Can anyone explain?
Also, it would help others if it can be documented somewhere. Thanks, Anil

On Mon, Jun 8, 2015 at 4:07 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Anil, Can you please clarify what seems to be wrong for you? You asked for start row 33078, which means rows starting with a 3, followed by a 3, a 0, etc., and the first row returned starts with a 4, which is correct given the startrow you have specified. You seem to have a composite key. And you seem to scan without building the composite key. How have you created your table, and what is your key design? JM

2015-06-08 16:56 GMT-04:00 anil gupta anilgupt...@gmail.com: Hi All, I'm having a lot of trouble dealing with the HBase shell. I am running the following query:

scan 'CAR_ARCHIVE', {COLUMNS => 'A', STARTROW => '33078', LIMIT => 1}
ROW                                                  COLUMN+CELL
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF  column=A:BODYSTYLE, timestamp=1430280906358, value=SPORT UTILITY
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF  column=A:BODYSTYLESLUG, timestamp=1430280906358, value=sport-utility
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF  column=A:CARFAXREPORTAVAILABLE, timestamp=1430280906358, value=\x01
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF  column=A:CARTYPE, timestamp=1430280906358, value={isLuxury: false, isTruck: false, isSedan: false, isCoupe: false, isSuv: true, isConvertible: false, isVan: false, isWagon: false, isEasyCareQualified: true}

I specified STARTROW = '33078'. Then how come this result shows up? What's going on here?

--
Thanks & Regards,
Anil Gupta
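The single-quote versus double-quote behavior discussed above is plain Ruby string semantics (the HBase shell is JRuby): double quotes interpret \x escapes into real bytes, while single quotes keep them as literal backslash text, so the server is asked to start at the wrong key. A quick demonstration runnable in any Ruby:

```ruby
single = '110\x00'   # 7 characters: 1 1 0 \ x 0 0 -- the escape is NOT interpreted
double = "110\x00"   # 4 bytes: 1 1 0 plus a real NUL byte

raise unless single.bytesize == 7
raise unless double.bytesize == 4
raise unless double.bytes.last == 0
```

So STARTROW => '110\x00' sends the literal text "110\x00" to the server, while STARTROW => "110\x00" sends the intended 4-byte composite-key prefix.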
Re: Where can I find the apidoc for newer version of Hbase?
+1 on getting the docs of all current releases onto the HBase website. IMHO, it's not convenient to tell people to download stuff just to see docs, especially when you are trying to make people adopt/learn HBase (I have faced resistance from some of my colleagues on this). I like that the ElasticSearch website exposes this: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html It would be great if we could do something like this. ~Anil

On Sun, Jun 14, 2015 at 8:37 PM, guxiaobo1982 guxiaobo1...@qq.com wrote: version 1.0.1.1, but I'd like to read them online.

-- Original --
From: Sean Busbey bus...@cloudera.com
Send time: Sunday, Jun 14, 2015 9:55 AM
To: user user@hbase.apache.org
Subject: Re: Where can I find the apidoc for newer version of Hbase?

What version are you looking for, specifically? If you download a binary artifact, it will have a copy of the javadocs for that version. If you download a source artifact, you can build the javadocs using the site maven goal.

On Sat, Jun 13, 2015 at 8:33 PM, guxiaobo1982 guxiaobo1...@qq.com wrote: Hi, I can only find documentation for the 0.94 version of HBase at http://hbase.apache.org/0.94/apidocs/index.html, but where can I find the URL for a newer version? Thanks

--
Sean

--
Thanks & Regards,
Anil Gupta
Re: HBase shell providing wrong results with startrow(with composite key having String and Ints)
Yes. Let's say, from the HBase shell, I would like to filter (SingleColumnValueFilter) rows on the basis of a cell value that is stored as an int. Let's assume the column name and value to be USER:AGE=5.

On Tue, Jun 9, 2015 at 9:26 PM, Ted Yu yuzhih...@gmail.com wrote:
bq. if we have to match against non-string data in hbase shell. We should always use double quotes?
I think so.
bq. Even for matching values of cells?
Did you mean through use of some Filter? Cheers

On Mon, Jun 8, 2015 at 10:27 PM, anil gupta anilgupt...@gmail.com wrote: So, if we have to match against non-string data in the HBase shell, should we always use double quotes? Even for matching the values of cells?

On Mon, Jun 8, 2015 at 9:23 PM, Ted Yu yuzhih...@gmail.com wrote: Double quotes allow you to do string interpolation. Another difference (one pertinent to Anil's question) is that escape sequences do not work inside single quotes. Cheers

On Mon, Jun 8, 2015 at 9:11 PM, anil gupta anilgupt...@gmail.com wrote: Hi Jean, My bad, I gave a wrong illustration. This is the query I was trying on my composite key:

hbase(main):017:0> scan 'CAR_ARCHIVE', {COLUMNS => 'A', STARTROW => '110\x00', LIMIT => 1}
ROW COLUMN+CELL
12\x0010123\x0019XFB2F56CE026679\x00\x80\x00\x00\x00 column=A:BODYSTYLE, timestamp=1432899595317, value=SEDAN
12\x0010123\x0019XFB2F56CE026679\x00\x80\x00\x00\x00 column=A:BODYSTYLESLUG, timestamp=1432899595317, value=sedan

I do have this rowkey: 110\x0033078\x001C4AJWAG0CL260823\x00\x80\x00\x00, so I was expecting to get that row.

Solution: scan 'CAR_ARCHIVE', {COLUMNS => 'A', STARTROW => "110\x00", LIMIT => 1}

I don't really know what the difference between single quotes and double quotes in STARTROW is. Can anyone explain? Also, it would help others if it could be documented somewhere. Thanks, Anil

On Mon, Jun 8, 2015 at 4:07 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Anil, Can you please clarify what seems to be wrong for you? You asked for start row 33078.
Which means rows starting with a 3, followed by a 3, a 0, etc., and the first row returned starts with a 4, which is correct given the startrow you specified. You seem to have a composite key, and you seem to scan without building the composite key. How have you created your table, and what is your key design? JM

2015-06-08 16:56 GMT-04:00 anil gupta anilgupt...@gmail.com: Hi All, I'm having a lot of trouble dealing with the HBase shell. I am running the following query:

scan 'CAR_ARCHIVE', {COLUMNS => 'A', STARTROW => '33078', LIMIT => 1}
ROW COLUMN+CELL
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:BODYSTYLE, timestamp=1430280906358, value=SPORT UTILITY
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:BODYSTYLESLUG, timestamp=1430280906358, value=sport-utility
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:CARFAXREPORTAVAILABLE, timestamp=1430280906358, value=\x01
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:CARTYPE, timestamp=1430280906358, value={isLuxury: false, isTruck: false, isSedan: false, isCoupe: false, isSuv: true, isConvertible: false, isVan: false, isWagon: false, isEasyCareQualified: true}

I specified startRow='33078'. Then how come this result shows up? What's going on here?

-- Thanks & Regards, Anil Gupta
HBase shell providing wrong results with startrow(with composite key having String and Ints)
Hi All, I'm having a lot of trouble dealing with the HBase shell. I am running the following query:

scan 'CAR_ARCHIVE', {COLUMNS => 'A', STARTROW => '33078', LIMIT => 1}
ROW COLUMN+CELL
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:BODYSTYLE, timestamp=1430280906358, value=SPORT UTILITY
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:BODYSTYLESLUG, timestamp=1430280906358, value=sport-utility
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:CARFAXREPORTAVAILABLE, timestamp=1430280906358, value=\x01
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:CARTYPE, timestamp=1430280906358, value={isLuxury: false, isTruck: false, isSedan: false, isCoupe: false, isSuv: true, isConvertible: false, isVan: false, isWagon: false, isEasyCareQualified: true}

I specified startRow='33078'. Then how come this result shows up? What's going on here?

-- Thanks & Regards, Anil Gupta
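The surprising result is consistent with HBase's plain byte-wise row ordering, which is easy to check outside the shell. A sketch in plain Python, using hypothetical keys shaped like the ones in the scan output above:

```python
from bisect import bisect_left

# Hypothetical row keys mirroring the "<id>\x00<dealer>\x00<vin>\x00..."
# shape in the scan output, stored as raw bytes.
keys = sorted([
    b"110\x0033078\x001C4AJWAG0CL260823\x00",
    b"12\x0010123\x0019XFB2F56CE026679\x00",
    b"4\x0010135\x001C4BJWEG2CL117550\x00",
])

# HBase compares keys byte by byte, so b"33078" sorts after every key
# beginning with "1" but before every key beginning with "4".
start = b"33078"
first = keys[bisect_left(keys, start)]
print(first)  # the b"4\x00..." key, just like the shell returned
```

The fix, as JM points out, is to build the full composite key (prefix included) for STARTROW rather than scanning with just the middle component.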
Re: HBase shell providing wrong results with startrow(with composite key having String and Ints)
So, if we have to match against non-string data in the HBase shell, should we always use double quotes? Even for matching the values of cells?

On Mon, Jun 8, 2015 at 9:23 PM, Ted Yu yuzhih...@gmail.com wrote: Double quotes allow you to do string interpolation. Another difference (one pertinent to Anil's question) is that escape sequences do not work inside single quotes. Cheers

On Mon, Jun 8, 2015 at 9:11 PM, anil gupta anilgupt...@gmail.com wrote: Hi Jean, My bad, I gave a wrong illustration. This is the query I was trying on my composite key:

hbase(main):017:0> scan 'CAR_ARCHIVE', {COLUMNS => 'A', STARTROW => '110\x00', LIMIT => 1}
ROW COLUMN+CELL
12\x0010123\x0019XFB2F56CE026679\x00\x80\x00\x00\x00 column=A:BODYSTYLE, timestamp=1432899595317, value=SEDAN
12\x0010123\x0019XFB2F56CE026679\x00\x80\x00\x00\x00 column=A:BODYSTYLESLUG, timestamp=1432899595317, value=sedan

I do have this rowkey: 110\x0033078\x001C4AJWAG0CL260823\x00\x80\x00\x00, so I was expecting to get that row.

Solution: scan 'CAR_ARCHIVE', {COLUMNS => 'A', STARTROW => "110\x00", LIMIT => 1}

I don't really know what the difference between single quotes and double quotes in STARTROW is. Can anyone explain? Also, it would help others if it could be documented somewhere. Thanks, Anil

On Mon, Jun 8, 2015 at 4:07 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Anil, Can you please clarify what seems to be wrong for you? You asked for start row 33078. Which means rows starting with a 3, followed by a 3, a 0, etc., and the first row returned starts with a 4, which is correct given the startrow you specified. You seem to have a composite key, and you seem to scan without building the composite key. How have you created your table, and what is your key design? JM

2015-06-08 16:56 GMT-04:00 anil gupta anilgupt...@gmail.com: Hi All, I'm having a lot of trouble dealing with the HBase shell.
I am running the following query:

scan 'CAR_ARCHIVE', {COLUMNS => 'A', STARTROW => '33078', LIMIT => 1}
ROW COLUMN+CELL
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:BODYSTYLE, timestamp=1430280906358, value=SPORT UTILITY
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:BODYSTYLESLUG, timestamp=1430280906358, value=sport-utility
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:CARFAXREPORTAVAILABLE, timestamp=1430280906358, value=\x01
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:CARTYPE, timestamp=1430280906358, value={isLuxury: false, isTruck: false, isSedan: false, isCoupe: false, isSuv: true, isConvertible: false, isVan: false, isWagon: false, isEasyCareQualified: true}

I specified startRow='33078'. Then how come this result shows up? What's going on here?

-- Thanks & Regards, Anil Gupta
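Ted's single-vs-double-quote point is plain Ruby behavior (the HBase shell is built on JRuby), so it can be checked in irb or in the shell itself:

```ruby
# Single quotes keep "\x00" as four literal characters: backslash, x, 0, 0.
single = '110\x00'
# Double quotes interpret "\x00" as one NUL byte.
double = "110\x00"

puts single.length          # 7
puts double.length          # 4
puts double.bytes.inspect   # [49, 49, 48, 0]
```

So `STARTROW => '110\x00'` asks HBase for keys starting with the seven ASCII characters `110\x00`, while `STARTROW => "110\x00"` produces the intended four-byte prefix ending in a NUL.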
Re: HBase shell providing wrong results with startrow(with composite key having String and Ints)
Hi Jean, My bad, I gave a wrong illustration. This is the query I was trying on my composite key:

hbase(main):017:0> scan 'CAR_ARCHIVE', {COLUMNS => 'A', STARTROW => '110\x00', LIMIT => 1}
ROW COLUMN+CELL
12\x0010123\x0019XFB2F56CE026679\x00\x80\x00\x00\x00 column=A:BODYSTYLE, timestamp=1432899595317, value=SEDAN
12\x0010123\x0019XFB2F56CE026679\x00\x80\x00\x00\x00 column=A:BODYSTYLESLUG, timestamp=1432899595317, value=sedan

I do have this rowkey: 110\x0033078\x001C4AJWAG0CL260823\x00\x80\x00\x00, so I was expecting to get that row.

Solution: scan 'CAR_ARCHIVE', {COLUMNS => 'A', STARTROW => "110\x00", LIMIT => 1}

I don't really know what the difference between single quotes and double quotes in STARTROW is. Can anyone explain? Also, it would help others if it could be documented somewhere. Thanks, Anil

On Mon, Jun 8, 2015 at 4:07 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Anil, Can you please clarify what seems to be wrong for you? You asked for start row 33078. Which means rows starting with a 3, followed by a 3, a 0, etc., and the first row returned starts with a 4, which is correct given the startrow you specified. You seem to have a composite key, and you seem to scan without building the composite key. How have you created your table, and what is your key design? JM

2015-06-08 16:56 GMT-04:00 anil gupta anilgupt...@gmail.com: Hi All, I'm having a lot of trouble dealing with the HBase shell.
I am running the following query:

scan 'CAR_ARCHIVE', {COLUMNS => 'A', STARTROW => '33078', LIMIT => 1}
ROW COLUMN+CELL
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:BODYSTYLE, timestamp=1430280906358, value=SPORT UTILITY
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:BODYSTYLESLUG, timestamp=1430280906358, value=sport-utility
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:CARFAXREPORTAVAILABLE, timestamp=1430280906358, value=\x01
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:CARTYPE, timestamp=1430280906358, value={isLuxury: false, isTruck: false, isSedan: false, isCoupe: false, isSuv: true, isConvertible: false, isVan: false, isWagon: false, isEasyCareQualified: true}

I specified startRow='33078'. Then how come this result shows up? What's going on here?

-- Thanks & Regards, Anil Gupta
Re: Hbase vs Cassandra
Hey Ajay, Your topic of discussion is too broad. There are tons of comparisons of HBase vs Cassandra: https://www.google.com/search?q=hbase+vs+cassandra&ie=utf-8&oe=utf-8 Which one you should use boils down to your use case: strong consistency? range scans? need for deeper integration with the Hadoop ecosystem? etc. Please explain your use case and share your thoughts after doing some preliminary reading. Thanks, Anil Gupta

On Fri, May 29, 2015 at 12:20 PM, Lukáš Vlček lukas.vl...@gmail.com wrote: As for #4, you might be interested in reading https://aphyr.com/posts/294-call-me-maybe-cassandra Not sure if there is a comparable article about HBase (anybody know?), but it can give you another perspective on what else to keep an eye on regarding these systems. Regards, Lukas

On Fri, May 29, 2015 at 9:12 PM, Ajay ajay.ga...@gmail.com wrote: Hi, I need some info on HBase vs Cassandra as a data store (in general, plus specific to time series data). A comparison along the following lines would help: 1: features 2: deployment and monitoring 3: performance 4: anything else Thanks Ajay

-- Thanks & Regards, Anil Gupta
Re: HBase failing to restart in single-user mode
20:39:19,224 INFO [M:0;localhost:49807-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x14d651aaec2, negotiated timeout = 400
2015-05-17 20:39:19,249 INFO [M:0;localhost:49807] regionserver.HRegionServer: ClusterId : 6ad7eddd-2886-4ff0-b377-a2ff42c8632f
2015-05-17 20:39:49,208 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Master not active after 30 seconds
    at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:194)
    at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:197)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:139)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2002)

I noticed that this has something to do with the ZooKeeper data. If I rm -rf $TMPDIR/hbase-tsuna/zookeeper then I can start HBase again. But of course HBase won't work properly, because while some tables exist on the filesystem, they no longer exist in ZK, etc. Does anybody know what could be left behind in ZK that could make it hang during startup? I looked at a jstack output while it was paused during the 30s and didn't find anything noteworthy.

-- Benoit "tsuna" Sigoure

-- Thanks & Regards, Anil Gupta
Re: Upgrading from 0.94 (CDH4) to 1.0 (CDH5)
to separate them enough to not cause an issue. Thankfully we have not moved to secure HBase yet. That's actually on the to-do list, but I'm hoping to do it *after* the CDH upgrade.

Thanks again, guys. I'm expecting this will be a drawn-out process considering our scope, but I will be happy to keep posting updates here as I proceed.

On Tue, May 5, 2015 at 10:31 PM, Esteban Gutierrez este...@cloudera.com wrote: Just to add a little bit to what StAck said: -- Cloudera, Inc.

On Tue, May 5, 2015 at 3:53 PM, Stack st...@duboce.net wrote:

On Tue, May 5, 2015 at 8:58 AM, Bryan Beaudreault bbeaudrea...@hubspot.com wrote: Hello, I'm about to start tackling our upgrade path from 0.94 to 1.0+. We have 6 production HBase clusters, 2 Hadoop clusters, and hundreds of APIs/daemons/crons/etc. hitting all of these things. Many of these clients hit multiple clusters in the same process. Daunting, to say the least.

Nod.

We can't take full downtime on any of these, though we can take read-only, and ideally we could take read-only on each cluster in a staggered fashion. From a client perspective, all of our code currently assumes an HTableInterface, which gives me some wiggle room, I think. With that in mind, here's my current plan:

You've done a review of HTI in 1.0 vs 0.94 to make sure we've not mistakenly dropped anything you need? (I see that stuff has moved around, but HTI should still have everything from 0.94.)

- Shade CDH5 to something like org.apache.hadoop.cdh5.hbase.
- Create a shim implementation of HTableInterface. This shim would delegate to either the old CDH4 APIs or the new shaded CDH5 classes, depending on the cluster being talked to.
- Once the shim is in place across all clients, I will put each cluster into read-only (a client-side config of ours), migrate data to a new CDH5 cluster, then bounce affected services so they look there instead. I will do this for each cluster in sequence.
Sounds like you have experience copying tables in the background in a manner that minimally impinges on serving, given you have dev'd your own in-house cluster cloning tools? You will use the time while tables are read-only to 'catch up' the difference between the last table copy and data that has come in since?

This provides a great rollback strategy, and with our existing in-house cluster cloning tools we can minimize the read-only window to a few minutes if all goes well. There are a couple of gotchas I can think of with the shim, which I'm hoping some of you might have ideas/opinions on: 1) Since protobufs are used for communication, we will have to avoid shading those particular classes, as they need to match the package/classnames on the server side. I think this should be fine, as these are net-new, not conflicting with CDH4 artifacts. Any additions/concerns here?

CDH4 has pb 2.4.1 in it as opposed to pb 2.5.0 in CDH5? If your clients are interacting with HDFS then you need to go the route of shading around PB, and it's hard; but HBase-wise, only HBase 0.98 and 1.0 use PBs in the RPC protocol, so it shouldn't be any problem as long as you don't need security (this is mostly because the client does a UGI call, and it's easy to patch both 0.94 and 1.0 to avoid calling UGI). Another option is to move your application to asynchbase; it should be clever enough to handle both HBase versions.

I myself have little experience going the shading route, so I have little to contribute. Can you 'talk out loud' as you try stuff, Bryan? If we can't help high-level, perhaps we can help on specifics. St.Ack

cheers, esteban.

-- Thanks & Regards, Anil Gupta
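The "shade CDH5" step in Bryan's plan is typically done with the maven-shade-plugin's relocation feature. A minimal sketch — the artifact layout and package names here are illustrative, not taken from an actual build, and note that com.google.protobuf is deliberately left unrelocated per the PB discussion above:

```xml
<!-- Hypothetical maven-shade-plugin config: relocate the CDH5 HBase
     client classes so they can coexist with CDH4 classes on one classpath. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>org.apache.hadoop.hbase</pattern>
            <shadedPattern>org.apache.hadoop.cdh5.hbase</shadedPattern>
          </relocation>
        </relocations>
        <!-- com.google.protobuf is intentionally NOT relocated, so the
             generated wire classes still match what the server expects. -->
      </configuration>
    </execution>
  </executions>
</plugin>
```

The shim class would then import the relocated `org.apache.hadoop.cdh5.hbase` names when talking to upgraded clusters and the plain CDH4 names otherwise.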
Re: MR against snapshot causes High CPU usage on Datanodes
Inline.

On Wed, May 13, 2015 at 10:31 AM, rahul malviya malviyarahul2...@gmail.com wrote:

*How many mappers/reducers are running per node for this job?* I am running 7-8 mappers per node. The spike is seen in the mapper phase, so no reducers were running at that point in time.

*Also, how many mappers are running as data-local mappers?* How to determine this?

On the counter web page of your job, look for the Data-local map tasks counter.

*Is your load/data equally distributed?* Yes, as we use presplit hash keys in our HBase cluster, and data is pretty evenly distributed. Thanks, Rahul

On Wed, May 13, 2015 at 10:25 AM, Anil Gupta anilgupt...@gmail.com wrote: How many mappers/reducers are running per node for this job? Also, how many mappers are running as data-local mappers? Is your load/data equally distributed? Your disk/CPU ratio looks OK. Sent from my iPhone

On May 13, 2015, at 10:12 AM, rahul malviya malviyarahul2...@gmail.com wrote: *The high CPU may be WAIT IOs, which would mean that your CPU is waiting for reads from the local disks.* Yes, I think that's what is going on, but I am trying to understand why it happens only in the case of snapshot MR; if I run the same job without using a snapshot, everything is normal. What is the difference in the snapshot version that can cause such a spike? I am looking through the code for the snapshot version to see if I can find something. cores/disks == 24/12 or 40/12. We are using 10K SATA drives on our datanodes. Rahul

On Wed, May 13, 2015 at 10:00 AM, Michael Segel michael_se...@hotmail.com wrote: Without knowing your exact configuration… the high CPU may be WAIT IOs, which would mean that your CPU is waiting for reads from the local disks. What's the ratio of (physical) cores to disks? What type of disks are you using? That's going to be the most likely culprit.

On May 13, 2015, at 11:41 AM, rahul malviya malviyarahul2...@gmail.com wrote: Yes.

On Wed, May 13, 2015 at 9:40 AM, Ted Yu yuzhih...@gmail.com wrote: Have you enabled short circuit read?
Cheers

On Wed, May 13, 2015 at 9:37 AM, rahul malviya malviyarahul2...@gmail.com wrote: Hi, I have recently started running MR on HBase snapshots, but when the MR is running there is pretty high CPU usage on the datanodes, and I start seeing IO wait messages in the datanode logs; as soon as I kill the MR on the snapshot, everything comes back to normal. What could be causing this? I am running the cdh5.2.0 distribution. Thanks, Rahul

-- Thanks & Regards, Anil Gupta
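One quick way to confirm Michael's theory that the "high CPU" is really time spent in IO wait is to sample the aggregate counters in /proc/stat on a datanode (Linux only; a rough sketch — field layout per proc(5), where the fifth CPU field is iowait):

```python
import time

def cpu_iowait_fraction(interval=1.0):
    """Return the fraction of CPU jiffies spent in iowait over `interval` seconds."""
    def snapshot():
        with open("/proc/stat") as f:
            # First line: "cpu  user nice system idle iowait irq softirq ..."
            fields = [int(x) for x in f.readline().split()[1:]]
        return sum(fields), fields[4]  # (total jiffies, iowait jiffies)

    total1, wait1 = snapshot()
    time.sleep(interval)
    total2, wait2 = snapshot()
    elapsed = total2 - total1
    return (wait2 - wait1) / elapsed if elapsed else 0.0

if __name__ == "__main__":
    print("iowait fraction: %.1f%%" % (100 * cpu_iowait_fraction()))
```

A high value here during the snapshot MR job, with a low value during the normal (region-server-mediated) job, would point at disk reads rather than actual computation.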
Re: MR against snapshot causes High CPU usage on Datanodes
How many mappers/reducers are running per node for this job? Also, how many mappers are running as data-local mappers? Is your load/data equally distributed? Your disk/CPU ratio looks OK. Sent from my iPhone

On May 13, 2015, at 10:12 AM, rahul malviya malviyarahul2...@gmail.com wrote: *The high CPU may be WAIT IOs, which would mean that your CPU is waiting for reads from the local disks.* Yes, I think that's what is going on, but I am trying to understand why it happens only in the case of snapshot MR; if I run the same job without using a snapshot, everything is normal. What is the difference in the snapshot version that can cause such a spike? I am looking through the code for the snapshot version to see if I can find something. cores/disks == 24/12 or 40/12. We are using 10K SATA drives on our datanodes. Rahul

On Wed, May 13, 2015 at 10:00 AM, Michael Segel michael_se...@hotmail.com wrote: Without knowing your exact configuration… the high CPU may be WAIT IOs, which would mean that your CPU is waiting for reads from the local disks. What's the ratio of (physical) cores to disks? What type of disks are you using? That's going to be the most likely culprit.

On May 13, 2015, at 11:41 AM, rahul malviya malviyarahul2...@gmail.com wrote: Yes.

On Wed, May 13, 2015 at 9:40 AM, Ted Yu yuzhih...@gmail.com wrote: Have you enabled short circuit read? Cheers

On Wed, May 13, 2015 at 9:37 AM, rahul malviya malviyarahul2...@gmail.com wrote: Hi, I have recently started running MR on HBase snapshots, but when the MR is running there is pretty high CPU usage on the datanodes, and I start seeing IO wait messages in the datanode logs; as soon as I kill the MR on the snapshot, everything comes back to normal. What could be causing this? I am running the cdh5.2.0 distribution. Thanks, Rahul
Re: [VOTE] First release candidate for HBase 1.1.0 (RC0) is available.
time on 2015-05-06 as to whether we should release these bits as HBase 1.1.0. Thanks, Nick -- Thanks Regards, Anil Gupta
hbase.apache.org homepage looks weird on Chrome and Firefox
Hi, I am aware that recently there were some updates done to the HBase website. For the last few months, more often than not, the homepage is displayed in a weird way in Chrome and Firefox. Is there a bug on the homepage that is leading to this view:
https://www.dropbox.com/s/jcpfnu4jwim28zg/Screen%20Shot%202015-04-15%20at%2011.18.46%20PM.png?dl=0
https://www.dropbox.com/s/o7xminppnzll6x7/Screen%20Shot%202015-04-15%20at%2011.19.55%20PM.png?dl=0

IMO, if the homepage looks broken then it's hard to proceed to reading the docs. My two cents. Also, it would be nice if we could move the Stargate docs from https://wiki.apache.org/hadoop/Hbase/Stargate to hbase.apache.org.

-- Thanks & Regards, Anil Gupta
Re: hbase.apache.org homepage looks weird on Chrome and Firefox
In Chrome, I did Clear Browsing Data and then revisited http://hbase.apache.org/. It came up properly. Thanks for the pointer, Nick.

On Thu, Apr 16, 2015 at 11:05 AM, Andrew Purtell apurt...@apache.org wrote: Looks fine for me, Chrome and Firefox tested. As Nick says, it looks like the CSS asset didn't load at Anil's location for whatever reason.

On Thu, Apr 16, 2015 at 8:36 AM, Stack st...@duboce.net wrote: Are others running into the issue Anil sees? Thanks, St.Ack

On Thu, Apr 16, 2015 at 8:13 AM, anil gupta anilgupt...@gmail.com wrote: Chrome: Version 42.0.2311.90 (64-bit) on Mac. But Firefox (34.0.5) also displays the page the same way.

On Thu, Apr 16, 2015 at 12:58 AM, Ted Yu yuzhih...@gmail.com wrote: Which Chrome version do you use? I use 41.0.2272.104 (64-bit) (on Mac) and the page renders fine. Cheers

On Wed, Apr 15, 2015 at 11:27 PM, anil gupta anilgupt...@gmail.com wrote: Hi, I am aware that recently there were some updates done to the HBase website. For the last few months, more often than not, the homepage is displayed in a weird way in Chrome and Firefox. Is there a bug on the homepage that is leading to this view: https://www.dropbox.com/s/jcpfnu4jwim28zg/Screen%20Shot%202015-04-15%20at%2011.18.46%20PM.png?dl=0 https://www.dropbox.com/s/o7xminppnzll6x7/Screen%20Shot%202015-04-15%20at%2011.19.55%20PM.png?dl=0 IMO, if the homepage looks broken then it's hard to proceed to reading the docs. My two cents. Also, it would be nice if we could move the Stargate docs from https://wiki.apache.org/hadoop/Hbase/Stargate to hbase.apache.org.

-- Best regards, - Andy. Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

-- Thanks & Regards, Anil Gupta
Getting binary data from HBase rest server without Base64 encoding
Hi All, I want to fetch an image file from HBase using its REST server. Right now, I get an XML response where the image byte array (the cell value) is Base64-encoded, and I need to decode it from Base64 to binary to view the image. Is there a way to ask the REST server not to perform Base64 encoding and to just return the cell value (i.e., the image file)? If it's not there and we were to build it, what kind of effort would it take? Any pointers to the code I would need to modify would be appreciated.

-- Thanks & Regards, Anil Gupta
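Decoding client-side is a one-liner with the standard library, in case modifying the server turns out to be more trouble than it's worth. A sketch assuming the usual Stargate CellSet XML shape — the payload below is a made-up sample, with the Cell text holding the Base64-encoded cell value:

```python
import base64
import xml.etree.ElementTree as ET

# Made-up response in the CellSet schema the REST server returns.
xml_payload = """<CellSet>
  <Row key="cm93MQ==">
    <Cell column="YTppbWc=">aGVsbG8gaW1hZ2UgYnl0ZXM=</Cell>
  </Row>
</CellSet>"""

root = ET.fromstring(xml_payload)
cell = root.find("./Row/Cell")
image_bytes = base64.b64decode(cell.text)

# image_bytes now holds the raw cell value, ready to write out:
# open("photo.jpg", "wb").write(image_bytes)
print(image_bytes)  # b'hello image bytes'
```

Separately, it may be worth checking whether your REST server version honors an `Accept: application/octet-stream` header on a single-cell GET, which would return the raw bytes and skip the Base64 round-trip entirely; verify against your version's docs before relying on it.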