Re: Too many connections from / - max is 60
Thanks for sharing insights. Moving the hbase mailing list to cc. Sorry, forgot to mention that we are using Phoenix 4.7 (HDP 2.6.3). This cluster is mostly queried via Phoenix, apart from a few pure NoSQL use cases that use the raw HBase APIs. I looked further into the zk logs and found that only 6/15 RS are constantly running into max-connection problems (no other ips/hosts of our client apps were found). One of those RS is getting 3-4x the connection errors compared to the others; this RS is hosting hbase:meta, regions of Phoenix secondary indexes, and regions of Phoenix and HBase tables. I also looked into the other 5 RS that are getting max-connection errors; nothing really stands out to me, since all of them are hosting regions of Phoenix secondary indexes and regions of Phoenix and HBase tables. I also tried running netstat and tcpdump on the zk host to find anomalies, but couldn't find anything beyond the above analysis. I also ran hbck and it reported that things are fine. I am still unable to pinpoint the exact problem (maybe something with the Phoenix secondary index?). Any other pointers to further debug the problem would be appreciated.
Lastly, I constantly see the following zk connection-loss logs on the above-mentioned 6 RS:

2020-06-03 06:40:30,859 WARN [RpcServer.FifoWFPBQ.default.handler=123,queue=3,port=16020-SendThread(ip-10-74-0-120.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Session 0x0 for server ip-10-74-0-120.us-west-2.compute.internal/10.74.0.120:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2020-06-03 06:40:30,861 INFO [RpcServer.FifoWFPBQ.default.handler=137,queue=17,port=16020-SendThread(ip-10-74-9-182.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Opening socket connection to server ip-10-74-9-182.us-west-2.compute.internal/10.74.9.182:2181. Will not attempt to authenticate using SASL (unknown error)
2020-06-03 06:40:30,861 INFO [RpcServer.FifoWFPBQ.default.handler=137,queue=17,port=16020-SendThread(ip-10-74-9-182.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Socket connection established, initiating session, client: /10.74.10.228:60012, server: ip-10-74-9-182.us-west-2.compute.internal/10.74.9.182:2181
2020-06-03 06:40:30,861 WARN [RpcServer.FifoWFPBQ.default.handler=137,queue=17,port=16020-SendThread(ip-10-74-9-182.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Session 0x0 for server ip-10-74-9-182.us-west-2.compute.internal/10.74.9.182:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)

Thanks!

On Tue, Jun 2, 2020 at 6:57 AM Josh Elser wrote:

> HBase (daemons) try to use a single connection for themselves. A RS also
> does not need to mutate state in ZK to handle things like gets and puts.
>
> Phoenix is probably the thing you need to look at more closely
> (especially if you're using an old version of Phoenix that matches the
> old HBase 1.1 version). Internally, Phoenix acts like an HBase client
> which results in a new ZK connection. There have certainly been bugs
> like that in the past (speaking generally, not specifically).
>
> On 6/1/20 5:59 PM, anil gupta wrote:
> > Hi Folks,
> >
> > We are running into HBase problems due to hitting the limit of ZK
> > connections. This cluster is running HBase 1.1.x and ZK 3.4.6.x on I3en ec2
> > instance type in AWS. Almost all our RegionServers are listed in zk logs
> > with "Too many connec
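For anyone else chasing this down: a quick way to see which client hosts are eating the per-IP connection budget is ZooKeeper's `cons` four-letter command. A minimal sketch follows; the sample lines are inlined (IPs borrowed from the thread above) so the pipeline is self-contained, and the exact `cons` line format varies slightly across ZooKeeper versions.

```shell
# On a live ensemble you would capture real data instead, e.g.:
#   echo cons | nc ip-10-74-0-120.us-west-2.compute.internal 2181
cons_output='/10.74.10.228:60012[1](queued=0,recved=15,sent=15)
/10.74.10.228:60013[1](queued=0,recved=8,sent=8)
/10.74.5.153:42001[1](queued=0,recved=3,sent=3)'

# Keep only the client IP (strip the leading "/" and everything from the
# first ":"), then count per IP and sort descending: the host at the top
# is the one closest to the maxClientCnxns ceiling.
printf '%s\n' "$cons_output" \
  | sed 's#^ */##; s#:.*##' \
  | sort | uniq -c | sort -rn
```

Running this against each member of the ensemble (the limit is enforced per server, not ensemble-wide) shows where the 60-connection budget is going.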
Too many connections from / - max is 60
Hi Folks,

We are running into HBase problems due to hitting the limit of ZK connections. This cluster is running HBase 1.1.x and ZK 3.4.6.x on the I3en ec2 instance type in AWS. Almost all our RegionServers are listed in the zk logs with "Too many connections from / - max is 60":

2020-06-01 21:42:08,375 - WARN [NIOServerCxn.Factory: 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from / - max is 60

On average, each RegionServer hosts ~250 regions. We are also running Phoenix on this cluster. Most of the queries are short range scans, but sometimes we do full table scans too. It seems like one of the simple fixes is to increase the maxClientCnxns property in zoo.cfg to 300, 500, 700, etc. I will probably do that. But I am just curious to know: in what scenarios are these connections created/used (Scans/Puts/Deletes, or other RegionServer operations)? Are these also created by hbase clients/apps (my guess is no)? How can I calculate the optimal value of maxClientCnxns for my cluster/usage?

-- 
Thanks & Regards, Anil Gupta
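For reference, maxClientCnxns is a per-client-IP cap set in zoo.cfg on each ZooKeeper server. A minimal sketch of the change (the 300 here is just the first value floated above, not a tuned recommendation; each ZK server needs a restart to pick it up in 3.4.x):

```
# zoo.cfg -- maximum concurrent (socket-level) connections that a single
# client, identified by IP address, may hold to one member of the
# ensemble. 0 removes the limit entirely (not recommended).
maxClientCnxns=300
```

Since the limit is per client IP per server, a reasonable starting point is to measure the current per-IP peak (e.g. via ZooKeeper's `cons` command) and add headroom for the HBase daemon plus whatever Phoenix/HBase client threads run on the busiest host.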
Re: Does Dropping the Source HBase Table Affect Its Snapshots and Cloned Tables from Snapshots?
Cloned tables and snapshots should not be impacted if you drop the source table.

Sent from my iPhone

> On Nov 28, 2018, at 5:23 PM, William Shen wrote:
>
> Hi,
>
> I understand that changes made to tables cloned from a snapshot will not
> affect the snapshot nor the source data table the snapshot is based on.
> However, I could not find information on whether or not a snapshot or a
> cloned table will be affected by the source table getting dropped. Can
> someone chime in on the HBase behavior in this case?
>
> Thank you!
Re: question on reducing number of versions
You should see a smaller t2 after major compaction if your table actually had versions over 18k (as Ted mentioned).

Sent from my iPhone

> On Aug 26, 2018, at 5:20 PM, Ted Yu wrote:
>
> This depends on how far down you revise the max versions for table t2.
> If your data normally only reaches 15000 versions and you lower max
> versions to ~15000, there wouldn't be much saving.
>
> FYI
>
>> On Sun, Aug 26, 2018 at 3:52 PM Antonio Si wrote:
>>
>> Thanks Anil.
>>
>> We are using hbase on s3. Yes, I understand 18000 is very high. We are in
>> the process of reducing it.
>>
>> If I have a snapshot and I restore the table from this snapshot, let's call
>> this table t1. I then clone another table from the same snapshot, call it t2.
>>
>> If I reduce the max versions of t2 and run a major compaction on t2, will I
>> see the decrease in table size for t2? If I compare the size of t2 and t1,
>> should I see a smaller size for t2?
>>
>> Thanks.
>>
>> Antonio.
>>
>>> On Sun, Aug 26, 2018 at 3:33 PM Anil Gupta wrote:
>>>
>>> You will need to run a major compaction on the table for it to
>>> clean up/delete the extra versions.
>>> Btw, 18000 max versions is an unusually high value.
>>>
>>> Are you using hbase on s3 or hbase on hdfs?
>>>
>>> Sent from my iPhone
>>>
>>>> On Aug 26, 2018, at 2:34 PM, Antonio Si wrote:
>>>>
>>>> Hello,
>>>>
>>>> I have a hbase table whose definition has a max number of versions set to
>>>> 36000. I have verified that there are rows which have more than 2 versions
>>>> saved.
>>>>
>>>> Now, I change the definition of the table and reduce the max number of
>>>> versions to 18000. Will I see the size of the table being reduced, as I am
>>>> not seeing that?
>>>>
>>>> Also, after I reduce the max number of versions, I try to create a
>>>> snapshot, but I am getting a
>>>> com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception:
>>>> Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found;)
>>>>
>>>> What may be the cause of that?
>>>>
>>>> I am using s3 as my storage.
>>>>
>>>> Thanks in advance for your suggestions.
>>>>
>>>> Antonio.
>>>
>>
Re: question on reducing number of versions
You will need to run a major compaction on the table for it to clean up/delete the extra versions. Btw, 18000 max versions is an unusually high value. Are you using hbase on s3 or hbase on hdfs?

Sent from my iPhone

> On Aug 26, 2018, at 2:34 PM, Antonio Si wrote:
>
> Hello,
>
> I have a hbase table whose definition has a max number of versions set to
> 36000. I have verified that there are rows which have more than 2 versions
> saved.
>
> Now, I change the definition of the table and reduce the max number of
> versions to 18000. Will I see the size of the table being reduced, as I am
> not seeing that?
>
> Also, after I reduce the max number of versions, I try to create a
> snapshot, but I am getting a
> com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception:
> Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found;)
>
> What may be the cause of that?
>
> I am using s3 as my storage.
>
> Thanks in advance for your suggestions.
>
> Antonio.
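For anyone following along, the two steps look roughly like this in hbase shell ('cf' stands in for the real column family name, and these need a live cluster; major_compact is asynchronous, so the size reduction only shows up once the compaction actually finishes):

```
alter 't2', NAME => 'cf', VERSIONS => 15000
major_compact 't2'
```

Until the major compaction rewrites the store files, the extra versions remain on disk even though the schema no longer admits them.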
Re: Unable to read from Kerberised HBase
As per the error message, your scan ran for more than 1 minute but the timeout is set to 1 minute, hence the error. Try doing smaller scans or increasing the timeout. (PS: HBase is mostly good for short scans, not full table scans.)

On Mon, Jul 9, 2018 at 8:37 PM, Lalit Jadhav wrote:

> While connecting to a remote HBase cluster, I can create a Table and get the
> Table listing, but I am unable to scan a Table using the Java API. Below is the code:
>
> configuration.set("hbase.zookeeper.quorum", "QUARAM");
> configuration.set("hbase.master", "MASTER");
> configuration.set("hbase.zookeeper.property.clientPort", "2181");
> configuration.set("hadoop.security.authentication", "kerberos");
> configuration.set("hbase.security.authentication", "kerberos");
> configuration.set("zookeeper.znode.parent", "/hbase-secure");
> configuration.set("hbase.cluster.distributed", "true");
> configuration.set("hbase.rpc.protection", "authentication");
> configuration.set("hbase.regionserver.kerberos.principal", "hbase/Principal@realm");
> configuration.set("hbase.regionserver.keytab.file",
>     "/home/developers/Desktop/hbase.service.keytab3");
> configuration.set("hbase.master.kerberos.principal", "hbase/HbasePrincipal@realm");
> configuration.set("hbase.master.keytab.file",
>     "/etc/security/keytabs/hbase.service.keytab");
>
> System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
>
> String principal = System.getProperty("kerberosPrincipal", "hbase/HbasePrincipal@realm");
> String keytabLocation = System.getProperty("kerberosKeytab",
>     "/etc/security/keytabs/hbase.service.keytab");
> UserGroupInformation.setConfiguration(configuration);
> UserGroupInformation.loginUserFromKeytab(principal, keytabLocation);
> UserGroupInformation userGroupInformation =
>     UserGroupInformation.loginUserFromKeytabAndReturnUGI("hbase/HbasePrincipal@realm",
>         "/etc/security/keytabs/hbase.service.keytab");
> UserGroupInformation.setLoginUser(userGroupInformation);
>
> I am getting the below errors:
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=36, exceptions: Mon Jul 09 18:45:57 IST 2018, null,
> java.net.SocketTimeoutException: callTimeout=60000, callDuration=64965:
> row '' on table 'DEMO_TABLE' at
> region=DEMO_TABLE,,1529819280641.40f0e7dc4159937619da237915be8b11.,
> hostname=dn1-devup.mstorm.com,60020,1531051433899, seqNum=526190
>
> Exception : java.io.IOException: Failed to get result within timeout,
> timeout=60000ms
>
> --
> Regards,
> Lalit Jadhav
> Network Component Private Limited.

-- 
Thanks & Regards, Anil Gupta
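If raising the timeout is the route taken, the relevant client-side settings look like this (a sketch: property names are for HBase 1.x, the 120000 values are only illustrative, and both properties default to 60000 ms, which matches the callTimeout in the error above):

```xml
<!-- Client-side hbase-site.xml (or set the same keys on the
     Configuration object before creating the connection). -->
<property>
  <name>hbase.client.scanner.timeout.period</name>
  <value>120000</value>
</property>
<property>
  <name>hbase.rpc.timeout</name>
  <value>120000</value>
</property>
```

Lowering the scan's caching (Scan#setCaching) also helps, since each next() RPC then has less work to finish inside the timeout window.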
Re: can not write to HBase
It seems you might have a write hotspot. Are your writes evenly distributed across the cluster? Do you have more than 15-20 regions for that table?

Sent from my iPhone

> On May 22, 2018, at 9:52 PM, Kang Minwoo wrote:
>
> I think the hbase flush is too slow,
> so the memstore reached its upper limit.
>
> The flush took about 30 min.
> I don't know why the flush is so slow.
>
> Best regards,
> Minwoo Kang
>
> ________________________________
> From: 张铎 (Duo Zhang)
> Sent: Wednesday, May 23, 2018, 11:37
> To: hbase-user
> Subject: Re: can not write to HBase
>
> org.apache.hadoop.hbase.RegionTooBusyException:
> org.apache.hadoop.hbase.RegionTooBusyException:
> Above memstore limit, regionName={region}, server={server},
> memstoreSize=2600502128, blockingMemStoreSize=2600468480
>
> This means that you're writing too fast and the memstore has reached its
> upper limit. Are the flushes and compactions fine on the RS side?
>
> 2018-05-23 10:20 GMT+08:00 Kang Minwoo:
>
>> attach client exception and stacktrace.
>>
>> I've looked more. It seems to be the reason why it takes 1290 seconds to
>> flush in the RegionServer.
>>
>> 2018-05-23T07:24:31.202 [INFO] Call exception, tries=34, retries=35,
>> started=513393 ms ago, cancelled=false, msg=row '{row}' on table '{table}'
>> at region={region}, hostname={host}, seqNum=155455658
>> 2018-05-23T07:24:31.208 [ERROR]
>> java.lang.RuntimeException: com.google.protobuf.ServiceException: Error
>> calling method MultiRowMutationService.MutateRows
>>   at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[stormjar.jar:?]
>>   at ...
>>   at org.apache.storm.daemon.executor$fn__8058$tuple_action_fn__8060.invoke(executor.clj:731) [storm-core-1.0.2.jar:1.0.2]
>>   at org.apache.storm.daemon.executor$mk_task_receiver$fn__7979.invoke(executor.clj:464) [storm-core-1.0.2.jar:1.0.2]
>>   at org.apache.storm.disruptor$clojure_handler$reify__7492.onEvent(disruptor.clj:40) [storm-core-1.0.2.jar:1.0.2]
>>   at org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:451) [storm-core-1.0.2.jar:1.0.2]
>>   at org.apache.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:430) [storm-core-1.0.2.jar:1.0.2]
>>   at org.apache.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:73) [storm-core-1.0.2.jar:1.0.2]
>>   at org.apache.storm.daemon.executor$fn__8058$fn__8071$fn__8124.invoke(executor.clj:850) [storm-core-1.0.2.jar:1.0.2]
>>   at org.apache.storm.util$async_loop$fn__624.invoke(util.clj:484) [storm-core-1.0.2.jar:1.0.2]
>>   at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
>>   at java.lang.Thread.run(Thread.java:745) [?:1.7.0_80]
>> Caused by: com.google.protobuf.ServiceException: Error calling method
>> MultiRowMutationService.MutateRows
>>   at org.apache.hadoop.hbase.ipc.CoprocessorRpcChannel.callBlockingMethod(CoprocessorRpcChannel.java:75) ~[stormjar.jar:?]
>>   at org.apache.hadoop.hbase.protobuf.generated.MultiRowMutationProtos$MultiRowMutationService$BlockingStub.mutateRows(MultiRowMutationProtos.java:2149) ~[stormjar.jar:?]
>>   at ...
>>   ...
>> 13 more
>> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException:
>> Failed after attempts=35, exceptions:
>> Wed May 23 07:15:57 KST 2018, RpcRetryingCaller{globalStartTime=1527027357808,
>> pause=100, retries=35}, org.apache.hadoop.hbase.RegionTooBusyException:
>> org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit,
>> regionName={region}, server={server}, memstoreSize=2600502128,
>> blockingMemStoreSize=2600468480
>>   at org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:3649)
>>   at org.apache.hadoop.hbase.regionserver.HRegion.processRowsWithLocks(HRegion.java:6935)
>>   at org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:6885)
>>   at org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint.mutateRows(MultiRowMutationEndpoint.java:116)
>>   at org.apache.hadoop.hbase.protobuf.generated.MultiRowMutationProtos$MultiRowMutationService.callMethod(MultiRowMutationProtos.java:2053)
>>   at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7875)
>>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2008)
>>
>> Best regards,
>> Minwoo Kang
>>
>> ________________________________
>> From: 张铎 (Duo Zhang)
>> Sent: Wednesday, May 23, 2018, 09:22
>> To: hbase-user
>> Subject: Re: can not write to HBase
>>
>> What is the exception? And the stacktrace?
>>
>> 2018-05-23 8:17 GMT+08:00 Kang Minwoo:
>>
>>> Hello, Users
>>>
>>> My HBase client does not work after printing the following logs.
>>>
Re: Want to change key structure
Hi Marcell,

Since the key is changing, you will need to rewrite the entire table. I think generating HFiles (rather than doing Puts) will be the most efficient approach here. IIRC, you will need to use HFileOutputFormat in your MR job. As for locality, I don't think you should worry that much, because major compaction usually takes care of it. If you want very high locality from the beginning, you can run a major compaction on the new table after your initial load.

HTH,
Anil Gupta

On Mon, Feb 19, 2018 at 11:46 PM, Marcell Ortutay <mortu...@23andme.com> wrote:

> I have a large HBase table (~10 TB) that has an existing key structure.
> Based on some recent analysis, the key structure is causing performance
> problems for our current query load. I would like to re-write the table
> with a new key structure that performs substantially better.
>
> What is the best way to go about re-writing this table? Since the key
> structure will change, it will affect locality, so all the data will have
> to move to a new location. If anyone can point to examples of code that
> does something like this, that would be very helpful.
>
> Thanks,
> Marcell

-- 
Thanks & Regards, Anil Gupta
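The overall pipeline would look something like this (a sketch: `rekey-job.jar`, `com.example.RekeyJob`, and the paths are hypothetical placeholders; the real MR job would call HFileOutputFormat2.configureIncrementalLoad() so the generated HFiles are partitioned to match the new table's region boundaries):

```
# 1. MR job: read old_table, emit cells under the NEW row key, write HFiles.
hadoop jar rekey-job.jar com.example.RekeyJob old_table new_table /tmp/rekey-out

# 2. Move the generated HFiles into the new table (HBase 1.x bulk-load tool).
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/rekey-out new_table
```

Pre-splitting new_table on the new key distribution before step 1 matters here, since the HFile partitioning follows the target table's region boundaries.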
Re: [Production Impacted] Any workaround for https://issues.apache.org/jira/browse/HBASE-16464?
Thanks Ted. Will try to do the clean-up. Unfortunately, we ran out of support for this cluster since it's nearing end-of-life. For our new clusters we are in the process of getting support.
PS: IMO, I agree that I should use the vendor forum/list for vendor-specific stuff, but I think it's appropriate to use this mailing list for Apache HBase questions/issues. As per my understanding, Apache projects are supposed to encourage collaboration rather than building boundaries around vendors. ("Collaboration and openness" is one of the reasons I like Apache projects.)

On Sat, Feb 10, 2018 at 10:11 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> You can clean up the oldwal directory beginning with the oldest data.
>
> Please open a support case with the vendor.
>
> On Sat, Feb 10, 2018 at 10:02 AM, anil gupta <anilgupt...@gmail.com> wrote:
>
> > Hi Ted,
> >
> > We cleaned up all the snapshots around Feb 7-8th. You were right that I
> > don't see the CorruptedSnapshotException since then. Nice observation!
> > So I am again back to square one. Not really sure why oldwals and
> > recovered.edits are not getting cleaned up. I have already removed all the
> > replication peers and deleted all the snapshots.
> > Is it ok if I just go ahead and clean up the oldwal directory manually?
> > Can I also clean up recovered.edits?
> >
> > Thanks,
> > Anil
> >
> > On Sat, Feb 10, 2018 at 9:37 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > Can you clarify whether /apps/hbase/data/.hbase-snapshot/.tmp/ became
> > > empty after 2018-02-07 09:10:08 ?
> > >
> > > Do you see CorruptedSnapshotException for files outside of
> > > /apps/hbase/data/.hbase-snapshot/.tmp/ ?
> > >
> > > Cheers
> >
> > --
> > Thanks & Regards,
> > Anil Gupta

-- 
Thanks & Regards, Anil Gupta
Re: [Production Impacted] Any workaround for https://issues.apache.org/jira/browse/HBASE-16464?
Hi Ted,

We cleaned up all the snapshots around Feb 7-8th. You were right that I don't see the CorruptedSnapshotException since then. Nice observation!
So I am again back to square one. Not really sure why oldwals and recovered.edits are not getting cleaned up. I have already removed all the replication peers and deleted all the snapshots.
Is it ok if I just go ahead and clean up the oldwal directory manually? Can I also clean up recovered.edits?

Thanks,
Anil

On Sat, Feb 10, 2018 at 9:37 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Can you clarify whether /apps/hbase/data/.hbase-snapshot/.tmp/ became
> empty after 2018-02-07 09:10:08 ?
>
> Do you see CorruptedSnapshotException for files outside of
> /apps/hbase/data/.hbase-snapshot/.tmp/ ?
>
> Cheers

-- 
Thanks & Regards, Anil Gupta
Re: [Production Impacted] Any workaround for https://issues.apache.org/jira/browse/HBASE-16464?
Hi Ted,

Thanks for your reply. I read the comments on the jira. But in my case "/apps/hbase/data/.hbase-snapshot/.tmp/" is already empty, so I am not really sure what I can sideline. Please let me know if I am missing something.

~Anil

On Sat, Feb 10, 2018 at 8:35 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Please see the first few review comments of HBASE-16464.
>
> You can sideline the corrupt snapshots (according to the master log).
>
> You can also contact the vendor for a HOTFIX.
>
> Cheers
>
> On Sat, Feb 10, 2018 at 8:13 AM, anil gupta <anilgupt...@gmail.com> wrote:
>
> > Hi Folks,
> >
> > We are running HBase 1.1.2. It seems like we are hitting
> > https://issues.apache.org/jira/browse/HBASE-16464 in our production
> > cluster. Our oldwals folder has grown to 9.5 TB. I am aware that this is
> > fixed in releases after 2016, but unfortunately we need to operate this
> > production cluster for a few more months. (We are already migrating to a
> > newer version of HBase.)
> >
> > I have verified that we don't have any snapshots in this cluster. Also, we
> > removed all the replication_peers from that cluster. We already restarted
> > the HBase master a few days ago, but it didn't help. We have TBs of
> > oldwals and tens of thousands of recovered.edits files (assuming
> > recovered.edits files are cleaned up by the chore cleaner). It seems like
> > the problem started happening around mid-December, but at that time we
> > didn't do any major thing on this cluster.
> >
> > I would like to know if there is a workaround for HBASE-16464. Are there
> > any references left to those deleted snapshots in hdfs or zk? If yes, how
> > can I clean them up?
> >
> > I keep on seeing this in the HMaster logs:
> > 2018-02-07 09:10:08,514 ERROR
> > [hdpmaster6.bigdataprod1.wh.truecarcorp.com,6,1517601353645_ChoreService_3]
> > snapshot.SnapshotHFileCleaner: Exception while checking if files were
> > valid, keeping them just in case.
> > org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read
> > snapshot info from:
> > hdfs://PRODNN/apps/hbase/data/.hbase-snapshot/.tmp/LEAD_SALES-1517979610/.snapshotinfo
> >   at org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(SnapshotDescriptionUtils.java:313)
> >   at org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.getHFileNames(SnapshotReferenceUtil.java:328)
> >   at org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner$1.filesUnderSnapshot(SnapshotHFileCleaner.java:85)
> >   at org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache.getSnapshotsInProgress(SnapshotFileCache.java:303)
> >   at org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache.getUnreferencedFiles(SnapshotFileCache.java:194)
> >   at org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner.getDeletableFiles(SnapshotHFileCleaner.java:62)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteFiles(CleanerChore.java:233)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:157)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:180)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:149)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:180)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:149)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:180)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:149)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:180)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:149)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:180)
> >   at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:149)
> >   at
[Production Impacted] Any workaround for https://issues.apache.org/jira/browse/HBASE-16464?
(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)

  at sun.reflect.GeneratedConstructorAccessor22.newInstance(Unknown Source)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
  at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
  at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
  at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1242)
  at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1227)
  at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1215)
  at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:303)
  at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:269)
  at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:261)
  at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1540)
  at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:303)
  at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:299)
  at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
  at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:299)
  at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:767)
  at org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(SnapshotDescriptionUtils.java:306)
  ... 26 more

-- 
Thanks & Regards, Anil Gupta
Frequent Region Server Failures with namenode.LeaseExpiredException
Hi Folks,

We are running a 60-node MapReduce/HBase HDP cluster: HBase 1.1.2, HDP 2.3.4.0-3485, with Phoenix enabled. Each slave has ~120 GB RAM; each RS has a 20 GB heap, 12 disks of 2 TB each, and 24 cores. This cluster had been running OK for the last 2 years, but recently, after a few disk failures (we unmounted those disks), it hasn't been running fine. I have checked hbck and hdfs fsck; both of them report no inconsistencies. Some of our RegionServers keep aborting with the following errors:

1 ==> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /apps/hbase/data/data/default/DE.TABLE_NAME/35aa0de96715c33e1f0664aa4d9292ba/recovered.edits/03948161445.temp (inode 420864666): File does not exist. [Lease. Holder: DFSClient_NONMAPREDUCE_-64710857_1, pendingcreates: 1]

2 ==> 2018-02-08 03:09:51,653 ERROR [regionserver/hdpslave26.bigdataprod1.com/1.16.6.56:16020] regionserver.HRegionServer: Shutdown / close of WAL failed: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /apps/hbase/data/oldWALs/hdpslave26.bigdataprod1.com%2C16020%2C1518027416930.default.1518085177903 (inode 420996935): File is not open for writing. Holder DFSClient_NONMAPREDUCE_649736540_1 does not have any open files.

All the LeaseExpiredExceptions are happening for recovered.edits and oldWALs. HDFS is around 48% full; most of the DNs have 30-40% space left on them, and NN heap is at 60% use. I have tried googling around but can't find anything concrete to fix this problem. Currently, 15/60 nodes have gone down in the last 2 days. Can someone please point out what might be causing these RegionServer failures?

-- 
Thanks & Regards, Anil Gupta
Re: hbase data migration from one cluster to another cluster on different versions
> > >> > at org.apache.hadoop.mapred. > YarnChild.main(YarnChild.java: > > > 158) > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > can anyone suggest how to migrate data? > > > >> > > > > >> > Thanks > > > >> > Manjeet Singh > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > Hi All, > > > >> > > > > >> > I have query regarding hbase data migration from one cluster to > > > another > > > >> > cluster in same N/W, but with a different version of hbase one is > > > >> 0.94.27 > > > >> > (source cluster hbase) and another is destination cluster hbase > > > version > > > >> is > > > >> > 1.2.1. > > > >> > > > > >> > I have used below command to take backup of hbase table on source > > > >> cluster > > > >> > is: > > > >> > ./hbase org.apache.hadoop.hbase.mapreduce.Export SPDBRebuild > > > >> > /data/backupData/ > > > >> > > > > >> > below files were genrated by using above command:- > > > >> > > > > >> > > > > >> > drwxr-xr-x 3 root root4096 Dec 9 2016 _logs > > > >> > -rw-r--r-- 1 root root 788227695 Dec 16 2016 part-m-0 > > > >> > -rw-r--r-- 1 root root 1098757026 Dec 16 2016 part-m-1 > > > >> > -rw-r--r-- 1 root root 906973626 Dec 16 2016 part-m-2 > > > >> > -rw-r--r-- 1 root root 1981769314 Dec 16 2016 part-m-3 > > > >> > -rw-r--r-- 1 root root 2099785782 Dec 16 2016 part-m-4 > > > >> > -rw-r--r-- 1 root root 4118835540 Dec 16 2016 part-m-5 > > > >> > -rw-r--r-- 1 root root 14217981341 Dec 16 2016 part-m-6 > > > >> > -rw-r--r-- 1 root root 0 Dec 16 2016 _SUCCESS > > > >> > > > > >> > > > > >> > in order to restore these files I am assuming I have to move these > > > >> files in > > > >> > destination cluster and have to run below command > > > >> > > > > >> > hbase org.apache.hadoop.hbase.mapreduce.Import > > > >> > /data/backupData/ > > > >> > > > > >> > Please suggest if I am on correct direction, second if anyone have > > > >> another > > > >> > option. 
> > > >> > I have tryed this with test data but above command took very long > > time > > > >> and > > > >> > at end it gets fails > > > >> > > > > >> > 17/10/23 11:54:21 INFO mapred.JobClient: map 0% reduce 0% > > > >> > 17/10/23 12:04:24 INFO mapred.JobClient: Task Id : > > > >> > attempt_201710131340_0355_m_02_0, Status : FAILED > > > >> > Task attempt_201710131340_0355_m_02_0 failed to report status > > for > > > >> 600 > > > >> > seconds. Killing! > > > >> > > > > >> > > > > >> > Thanks > > > >> > Manjeet Singh > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > -- > > > >> > luv all > > > >> > > > > >> > > > > > > > > > > > > > > > > -- > > > > luv all > > > > > > > > > > > > > > > > -- > > > luv all > > > > > > -- > > > -- Enrico Olivelli > -- Thanks & Regards, Anil Gupta
Re: HBASE data been deleted! Please HELP
AFAIK, in order to recover data, the user has to react within minutes or seconds. But have you checked the ".Trash" folder in HDFS, under the hbase user or the user that issued the rmr command? On Thu, Sep 28, 2017 at 5:53 AM, hua beatls <bea...@gmail.com> wrote: > Hello, I have a big problem > We deleted hbase data with "hdfs dfs -rmr -skipTrash /hbase", > > Is there any way to recover the deleted data? > > Thanks a lot! > -- Thanks & Regards, Anil Gupta
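To make the suggestion above concrete, these are the places to look (paths assume the default HDFS trash layout; note that the `-skipTrash` flag in the quoted command bypasses the trash entirely, so this can only help for deletes issued without that flag):

```shell
# Trash of the hbase user (default layout: /user/<user>/.Trash/Current/<original path>)
hdfs dfs -ls /user/hbase/.Trash/Current/hbase
# Trash of whichever user actually ran the rm command
hdfs dfs -ls /user/$(whoami)/.Trash/Current/hbase
```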
Re: org.apache.hadoop.hbase.regionserver.wal.DamagedWALException: Append sequenceId=8689
Thanks for the pointers Aaron. We checked hdfs. Its reporting 0 underreplicated or corrupted blocks. @Ted: we are using Hadoop 2.7.3(EMR5.7.2) On Thu, Jul 6, 2017 at 4:49 PM, Ted Yu <yuzhih...@gmail.com> wrote: > Which hadoop release are you using ? > > In FSOutputSummer.java, I see the following around line 106: > > checkClosed(); > > if (off < 0 || len < 0 || off > b.length - len) { > throw new ArrayIndexOutOfBoundsException(); > > You didn't get ArrayIndexOutOfBoundsException - maybe b was null ? > > On Thu, Jul 6, 2017 at 2:08 PM, anil gupta <anilgupt...@gmail.com> wrote: > >> Hey Ted, >> >> This is what i see in one of region server log(NPE at the bottom): >> 2017-07-06 19:07:07,778 INFO >> [ip-10-74-5-153.us-west-2.compute.internal,16020,14993202605 >> 01_ChoreService_1] >> regionserver.HRegionServer: >> ip-10-74-5-153.us-west-2.compute.internal,16020,149932026050 >> 1-MemstoreFlusherChore >> requesting flush of >> SYSTEM.CATALOG,,1499317930655.a47ffa359aa4588f5f360790ac8e4561. because 0 >> has an old edit so flush to free WALs after random delay 155739ms >> 2017-07-06 19:07:17,853 INFO >> [ip-10-74-5-153.us-west-2.compute.internal,16020,14993202605 >> 01_ChoreService_1] >> regionserver.HRegionServer: >> ip-10-74-5-153.us-west-2.compute.internal,16020,149932026050 >> 1-MemstoreFlusherChore >> requesting flush of >> SYSTEM.CATALOG,,1499317930655.a47ffa359aa4588f5f360790ac8e4561. because 0 >> has an old edit so flush to free WALs after random delay 132731ms >> 2017-07-06 19:07:28,038 INFO >> [ip-10-74-5-153.us-west-2.compute.internal,16020,14993202605 >> 01_ChoreService_1] >> regionserver.HRegionServer: >> ip-10-74-5-153.us-west-2.compute.internal,16020,149932026050 >> 1-MemstoreFlusherChore >> requesting flush of >> SYSTEM.CATALOG,,1499317930655.a47ffa359aa4588f5f360790ac8e4561. 
because 0 >> has an old edit so flush to free WALs after random delay 4316ms >> 2017-07-06 19:07:37,819 INFO >> [ip-10-74-5-153.us-west-2.compute.internal,16020,14993202605 >> 01_ChoreService_1] >> regionserver.HRegionServer: >> ip-10-74-5-153.us-west-2.compute.internal,16020,149932026050 >> 1-MemstoreFlusherChore >> requesting flush of >> SYSTEM.CATALOG,,1499317930655.a47ffa359aa4588f5f360790ac8e4561. because 0 >> has an old edit so flush to free WALs after random delay 190960ms >> 2017-07-06 19:07:47,767 INFO >> [ip-10-74-5-153.us-west-2.compute.internal,16020,14993202605 >> 01_ChoreService_1] >> regionserver.HRegionServer: >> ip-10-74-5-153.us-west-2.compute.internal,16020,149932026050 >> 1-MemstoreFlusherChore >> requesting flush of >> SYSTEM.CATALOG,,1499317930655.a47ffa359aa4588f5f360790ac8e4561. because 0 >> has an old edit so flush to free WALs after random delay 41231ms >> 2017-07-06 19:07:57,767 INFO >> [ip-10-74-5-153.us-west-2.compute.internal,16020,14993202605 >> 01_ChoreService_1] >> regionserver.HRegionServer: >> ip-10-74-5-153.us-west-2.compute.internal,16020,149932026050 >> 1-MemstoreFlusherChore >> requesting flush of >> SYSTEM.CATALOG,,1499317930655.a47ffa359aa4588f5f360790ac8e4561. because 0 >> has an old edit so flush to free WALs after random delay 222748ms >> 2017-07-06 19:08:07,973 INFO >> [ip-10-74-5-153.us-west-2.compute.internal,16020,14993202605 >> 01_ChoreService_1] >> regionserver.HRegionServer: >> ip-10-74-5-153.us-west-2.compute.internal,16020,149932026050 >> 1-MemstoreFlusherChore >> requesting flush of >> SYSTEM.CATALOG,,1499317930655.a47ffa359aa4588f5f360790ac8e4561. 
because 0 >> has an old edit so flush to free WALs after random delay 245966ms >> 2017-07-06 19:08:18,669 INFO >> [ip-10-74-5-153.us-west-2.compute.internal,16020,14993202605 >> 01_ChoreService_1] >> regionserver.HRegionServer: >> ip-10-74-5-153.us-west-2.compute.internal,16020,149932026050 >> 1-MemstoreFlusherChore >> requesting flush of >> SYSTEM.CATALOG,,1499317930655.a47ffa359aa4588f5f360790ac8e4561. because 0 >> has an old edit so flush to free WALs after random delay 76257ms >> 2017-07-06 19:08:28,029 INFO >> [ip-10-74-5-153.us-west-2.compute.internal,16020,14993202605 >> 01_ChoreService_1] >> regionserver.HRegionServer: >> ip-10-74-5-153.us-west-2.compute.internal,16020,149932026050 >> 1-MemstoreFlusherChore >> requesting flush of >> SYSTEM.CATALOG,,1499317930655.a47ffa359aa4588f5f360790ac8e4561. because 0 >> has an old edit so flush to free WALs after random delay 78310ms >> 2017-07-06 19:08:38,459 INFO >> [ip-10-74-5-15
Re: org.apache.hadoop.hbase.regionserver.wal.DamagedWALException: Append sequenceId=8689
(BatchEventProcessor.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2017-07-06 19:48:39,255 WARN [regionserver/ip-10-74-5-153.us-west-2.compute.internal/10.74.5.153:16020.logRoller] wal.FSHLog: Failed sync-before-close but no outstanding appends; closing WAL: org.apache.hadoop.hbase.regionserver.wal.DamagedWALException: Append sequenceId=7846, requesting roll of WAL 2017-07-06 19:48:39,261 INFO [regionserver/ip-10-74-5-153.us-west-2.compute.internal/10.74.5.153:16020.logRoller] wal.FSHLog: Rolled WAL /user/hbase/WALs/ip-10-74-5-153.us-west-2.compute.internal,16020,1499320260501/ip-10-74-5-153.us-west-2.compute.internal%2C16020%2C1499320260501.default.1499370518086 with entries=0, filesize=174 B; new WAL /user/hbase/WALs/ip-10-74-5-153.us-west-2.compute.internal,16020,1499320260501/ip-10-74-5-153.us-west-2.compute.internal%2C16020%2C1499320260501.default.1499370519235 2017-07-06 19:48:39,261 INFO [regionserver/ip-10-74-5-153.us-west-2.compute.internal/10.74.5.153:16020.logRoller] wal.FSHLog: Archiving hdfs://ip-10-74-31-169.us-west-2.compute.internal:8020/user/hbase/WALs/ip-10-74-5-153.us-west-2.compute.internal,16020,1499320260501/ip-10-74-5-153.us-west-2.compute.internal%2C16020%2C1499320260501.default.1499370518086 to hdfs://ip-10-74-31-169.us-west-2.compute.internal:8020/user/hbase/oldWALs/ip-10-74-5-153.us-west-2.compute.internal%2C16020%2C1499320260501.default.1499370518086 2017-07-06 19:48:40,322 WARN [regionserver/ip-10-74-5-153.us-west-2.compute.internal/10.74.5.153:16020.append-pool1-t1] wal.FSHLog: Append sequenceId=7847, requesting roll of WAL java.lang.NullPointerException at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:106) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:60) at 
java.io.DataOutputStream.write(DataOutputStream.java:107) at org.apache.hadoop.hbase.KeyValue.oswrite(KeyValue.java:2571) at org.apache.hadoop.hbase.KeyValueUtil.oswrite(KeyValueUtil.java:623) at org.apache.hadoop.hbase.regionserver.wal.WALCellCodec$EnsureKvEncoder.write(WALCellCodec.java:338) at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.append(ProtobufLogWriter.java:122) at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.append(FSHLog.java:1909) at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1773) at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1695) at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) On Thu, Jul 6, 2017 at 1:55 PM, Ted Yu <yuzhih...@gmail.com> wrote: > HBASE-16960 mentioned the following : > > Caused by: java.net.SocketTimeoutException: 2 millis timeout while > waiting for channel to be ready for read > > Do you see similar line in region server log ? > > Cheers > > On Thu, Jul 6, 2017 at 1:48 PM, anil gupta <anilgupt...@gmail.com> wrote: > > > Hi All, > > > > We are running HBase/Phoenix on EMR5.2(HBase1.2.3 and Phoenix4.7) and we > running into following exception when we are trying to load data into one > of our Phoenix table: > > 2017-07-06 19:57:57,507 INFO [hconnection-0x60e5272-shared--pool2-t249] > org.apache.hadoop.hbase.client.AsyncProcess: #1, table=DE.CONFIG_DATA, > attempt=30/35 failed=38ops, last exception: org.apache.hadoop.hbase. > regionserver.wal.DamagedWALException: org.apache.hadoop.hbase. 
> regionserver.wal.DamagedWALException: Append sequenceId=8689, requesting > roll of WAL > > at org.apache.hadoop.hbase.regionserver.wal.FSHLog$ > RingBufferEventHandler.append(FSHLog.java:1921) > > at org.apache.hadoop.hbase.regionserver.wal.FSHLog$ > RingBufferEventHandler.onEvent(FSHLog.java:1773) > > at org.apache.hadoop.hbase.regionserver.wal.FSHLog$ > RingBufferEventHandler.onEvent(FSHLog.java:1695) > > at com.lmax.disruptor.BatchEventProcessor.run( > BatchEventProcessor.java:128) > > at java.util.concurrent.ThreadPoolExecutor.runWorker( > ThreadPoolExecutor.java:1142) > > at java.util.concurrent.ThreadPoolExecutor$Worker.run( > ThreadPoolExecutor.java:617) > > at java.lang.Thread.run(Thread.java:745) > > > > We are OK with wiping out this table and rebuilding the dataset. We > tried to drop the table and recreate the table but it didnt fix it. > > Can anyone please let us know how can we get rid of above problem? Are > we running into https://issues.apache.org/jira/browse/HBASE-16960? > > > > > > -- > > Thanks & Regards, > > Anil Gupta > > > -- Thanks & Regards, Anil Gupta
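Not a fix suggested in this thread, but one low-risk thing worth trying when a single WAL appears wedged is forcing a roll from the HBase shell, so the region server closes the damaged file and opens a fresh one (the server name below is copied from the log above; whether this clears the underlying NPE is an assumption):

```shell
# Force the affected region server to roll its WAL (HBase 1.x shell command);
# the old WAL is closed/archived and a new one is opened.
echo "wal_roll 'ip-10-74-5-153.us-west-2.compute.internal,16020,1499320260501'" | hbase shell
```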
org.apache.hadoop.hbase.regionserver.wal.DamagedWALException: Append sequenceId=8689
Hi All, We are running HBase/Phoenix on EMR5.2 (HBase 1.2.3 and Phoenix 4.7) and we are running into the following exception when trying to load data into one of our Phoenix tables: 2017-07-06 19:57:57,507 INFO [hconnection-0x60e5272-shared--pool2-t249] org.apache.hadoop.hbase.client.AsyncProcess: #1, table=DE.CONFIG_DATA, attempt=30/35 failed=38ops, last exception: org.apache.hadoop.hbase.regionserver.wal.DamagedWALException: org.apache.hadoop.hbase.regionserver.wal.DamagedWALException: Append sequenceId=8689, requesting roll of WAL at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.append(FSHLog.java:1921) at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1773) at org.apache.hadoop.hbase.regionserver.wal.FSHLog$RingBufferEventHandler.onEvent(FSHLog.java:1695) at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) We are OK with wiping out this table and rebuilding the dataset. We tried to drop the table and recreate it, but that didn't fix it. Can anyone please let us know how we can get rid of the above problem? Are we running into https://issues.apache.org/jira/browse/HBASE-16960? -- Thanks & Regards, Anil Gupta
mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles
Cross posting since this seems to be an HBase issue. I think completeBulkLoad step is failing. Please refer to the mail below. -- Forwarded message -- From: anil gupta <anilgupt...@gmail.com> Date: Thu, May 25, 2017 at 4:38 PM Subject: [IndexTool NOT working] mapreduce.LoadIncrementalHFiles: Split occured while grouping HFiles To: "u...@phoenix.apache.org" <u...@phoenix.apache.org> Hi, We are using HDP2.3.2(Phoenix 4.4 and HBase 1.1), we created a secondary index on an already existing table. We paused all writes to Primary table. Then we invoked IndexTool to populate secondary index table. We have tried same steps many times but we keep on getting following error(we have also tried drop the index and adding it again): 2017-05-24 18:00:10,281 WARN [LoadIncrementalHFiles-2] util.HeapMemorySizeUtil: hbase.regionserver.global.memstore.upperLimit is deprecated by hbase.regionserver.global.memstore.size 2017-05-24 18:00:10,340 WARN [LoadIncrementalHFiles-12] util.HeapMemorySizeUtil: hbase.regionserver.global.memstore.upperLimit is deprecated by hbase.regionserver.global.memstore.size 2017-05-24 18:00:10,342 INFO [LoadIncrementalHFiles-11] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/ c79ae6d27824424f99523dad586e86b1 first=JF2GPADC8GH331037\x00\ x80\x00\x1A0\x80\x00\x01Wj\x03r1defc4d301e4ec172b49be4a7ea33c2f7 last=JTHBK1GG4E2122477\x00\x80\x00$\xE4\x80\x00\x01[\xAD`{\ x17901d036d588292854ac5b1d4c29d8e1e 2017-05-24 18:00:10,343 INFO [LoadIncrementalHFiles-14] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/ f0e97b218aed4abf9949cf49a57e559b first=5NPEB4AC3DH620091\x00\ x80\x00\xE0\x16\x80\x00\x01X\xE5g\xD6\x0B81d210ac753ed281e8627e5edb7eb59f last=JF2GPADC8GH331037\x00\x80\x00\x1A0\x80\x00\x01W]&\ xE54f37d636104f6cd916b2b07bf3aa94d3f 2017-05-24 18:00:10,343 INFO [LoadIncrementalHFiles-2] mapreduce.LoadIncrementalHFiles: Trying to load 
hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/ 27c0a905ee174c9898d324acf1554bf9 first=WMWZP3C58FT708786\x00\ x80\x00\xE0\x16\x80\x00\x01Y\xB8\x95U\xA0d21d32aed18af976dd53735705c728cd last=`1GCRCPE05BZ430377\x00\x80\x00}\x05\x80\x00\x01[\xDEE\ x91L383768c6ac5f306fa99f68964b4f18aa 2017-05-24 18:00:10,343 INFO [LoadIncrementalHFiles-12] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/ d0a6c4b727bb416f840ed254658f3982 first=1N4BZ0CP4GC308715\x00\ x80\x01T\xFC\x80\x00\x01U\xE3\x7FL\x9A37b77d47941e99e430fcb0e0657f5558 last=2GKALMEK7H6220949\x00\x80\x00!\x1A\x80\x00\x01Y\x18\xE6\xB3\ xB42e72036f7e7e03078f41fc82712c5de7 2017-05-24 18:00:10,343 INFO [LoadIncrementalHFiles-0] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/ 11627a2861e3446e9d6f684ab534563e first=3C7WRNAJ6GG313342\x00\ x80\x00NB\x80\x00\x01V}\xFD\xE4+65bbebdd06dedd8466a31ebd33841a51 last=3N1CE2CP2FL407481\x00\x80\x00\xE0\x16\x80\x00\x01W\x1B\x0A\x02\ xC1fc95d4114d5e91197a5e41bf37c9e8c7 2017-05-24 18:00:10,343 INFO [LoadIncrementalHFiles-1] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/ 23df78bafd304ff887385a2b6becf06d first=1C4RJFLT6HC742023\x00\ x80\x00x\xFF\x80\x00\x01[J8\x0Ac8b65a80fe1662fb25d80798a66cc83dc last=1FMCU9J90EUB68140\x00\x80\x01X\xA4\x80\x00\x01[\x1B\xDD\xB2\ x1C577502512ec987844b0108738a9ec6ba 2017-05-24 18:00:10,343 INFO [LoadIncrementalHFiles-3] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/ 39dc73882bec49a0bdd5d787b06ac032 first=1G1JD5SB5H4136951\x00\ x80\x00!\x8A\x80\x00\x01Z%\xF6\x7Ffef0b8faeeeb4a10103e1a67ea5ebdbec last=1GNKVHKD7HJ275239\x00\x80\x00$\x87\x80\x00\x01Z%\xF6s\ xDC0961566a370af3b7da440e9705bc4c8c 2017-05-24 18:00:10,343 INFO [LoadIncrementalHFiles-8] mapreduce.LoadIncrementalHFiles: Trying to load 
hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/ a37a2a56ff5c48399cf1abd92f99662f first=###239824\x00\ x80\x01(\xFE\x80\x00\x01Z\xAE\xD6\xE0Xe5a45a2beab337228bdba90c06f34a12 last=1C4RJFLT6HC742023\x00\x80\x00x\xFF\x80\x00\x01[H\xF9w\ x8D60edb518c27ef80f8a751701926d9174 2017-05-24 18:00:10,343 INFO [LoadIncrementalHFiles-10] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/ b5843a62a6bd47fbbfc29303bee158e3 first=5FNRL5H4XFB033259\x00\ x80\x00\x1EZ\x80\x00\x01\x5C"\x87s\xF5ce24ec7e2a3698836386bccabc1265af last=5NPEB4AC3DH620091\x00\x80\x00\xE0\x16\x80\x00\x01X\xE4\x1B\ x9Dq95568f371c1ebd06c497df7129f248a2 2017-05-24 18:00:10,343 INFO [LoadIncrementalHFiles-5] mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX/I/ 595e89b2fae8494d8a878bc6ba306e2f first=JTHBK1GG4E2122477\x00\ x80\x00$\xE4\x80\x00\x01[\xAF\xF0]%d306ddb81ea3bc093c40efe9f198f03a last=KNDPMCAC1H7201793\x00\x80\x00\x
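For what it's worth, "Split occured while grouping HFiles" means a region split happened between the tool grouping the HFiles and loading them; the tool re-splits the files and retries a bounded number of times. The leftover index HFiles can be retried by hand (table name and path are taken from the log above; the retry knob is an assumption worth checking against your exact version):

```shell
# Re-run the bulk load against the index table, allowing more re-grouping
# retries in case regions keep splitting underneath the load.
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
  -Dhbase.bulkload.retries.number=20 \
  hdfs://HDFS1/user/hbase/idx_test_7/BI.VIN_IDX BI.VIN_IDX
```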
Re: HBASE and MOB
Backporting MOB won't be a trivial task. AFAIK, Cloudera backported MOB to the HBase 1.x branch for CDH (it's not in the Apache HBase 1.x branch yet). It might be easier to just use CDH for MOB. On Fri, May 12, 2017 at 8:51 AM, Jean-Marc Spaggiari < jean-m...@spaggiari.org> wrote: > Thanks for those details. > > How big are you PDF? Are they all small size? If they are not above 1MB, > MOBs will not really be 100% mandatory. Even if few of them are above. > > If you want to apply the patch on another branch,this is what is called a > back port (like Ted said before) and will require a pretty good amount of > work. You can jump on that, but if you are not used to the HBase code, it > might be a pretty big challenge... > > Another way is to look for an HBase distribution that already includes the > MOB code already. > > JMS > > 2017-05-12 11:21 GMT-04:00 F. T. <bibo...@hotmail.fr>: > > > Hi Jean Marc > > > > I'm using a 1.2.3 version. I downloaded a "bin" version from Apache > > official web site. Maybe I've to install it from the "src" option with > mvn ? > > > > I would like index PDF into Hbase and use it in a Solr collection. > > > > In fact I would like reproduce this process : > > http://blog.cloudera.com/blog/2015/10/how-to-index-scanned- > > pdfs-at-scale-using-fewer-than-50-lines-of-code/ > > > > > > But maybe is there another solution to reproduce it . > > > > Fred > > > > > > > > De : Jean-Marc Spaggiari <jean-m...@spaggiari.org> > > Envoyé : vendredi 12 mai 2017 17:06 > > À : user > > Objet : Re: HBASE and MOB > > > > Hi Fred, > > > > Can you please confirm the following information? > > > > 1) What exact version of HBase are you using? From a distribution, build > by > > yourself, from the JARs, etc. > > 2) Why do you think you need the MOB feature > > 3) Is an upgrade an option for you or not really. 
> > > > Thanks, > > > > JMS > > > > > > 2017-05-12 11:02 GMT-04:00 Ted Yu <yuzhih...@gmail.com>: > > > > > It is defined here in > > > hbase-client/src/main/java/org/apache/hadoop/hbase/ > > HColumnDescriptor.java: > > > public static final String IS_MOB = "IS_MOB"; > > > > > > MOB feature hasn't been backported to branch-1 (or earlier releases). > > > > > > Looks like you're using a vendor's release. > > > > > > Consider contacting the corresponding mailing list if you are stuck. > > > > > > On Fri, May 12, 2017 at 7:59 AM, F. T. <bibo...@hotmail.fr> wrote: > > > > > > > Hi all, > > > > > > > > I'd like to use MOB in HBase to store PDF files. I'm using Hbase > 1.2.3 > > > but > > > > I'get this error creating a table with MOB column : NameError: > > > > uninitialized constant IS_MOB. > > > > > > > > A lot of web sites (including Apache official web site) talk about > the > > > > patch 11339 or HBase 2.0.0, but, I don't find any explanation about > the > > > way > > > > to install this patch and > > > > > > > > I can't find the 2.0.0 version anywhere. So I'm completly lost. Could > > you > > > > help me please ? > > > > > > > > > > > > > > -- Thanks & Regards, Anil Gupta
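For reference, on a build that does ship MOB (HBase 2.0+, or a vendor backport such as CDH's), the DDL from the original post looks like this in the HBase shell. The `IS_MOB` constant only exists on those versions, which is exactly why the stock 1.2.3 shell throws `uninitialized constant IS_MOB`:

```shell
# MOB-enabled column family: cell values larger than MOB_THRESHOLD bytes
# (100 KB here) are stored as MOBs rather than inline in regular HFiles.
create 'pdf_store', {NAME => 'f', IS_MOB => true, MOB_THRESHOLD => 102400}
```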
Re: limiting user threads on client
I think you need to set that property before you make HBaseConfiguration object. Have you tried that? On Mon, Mar 13, 2017 at 10:24 AM, Henning Blohm <henning.bl...@zfabrik.de> wrote: > Unfortunately it doesn't seem to make a difference. > > I see that the configuration has hbase.htable.threads.max=1 right before > setting up the Connection but then I still get hundreds of > > hconnection-*** > > threads. Is that actually Zookeeper? > > Thanks, > Henning > > On 13.03.2017 17:28, Ted Yu wrote: > >> Are you using Java client ? >> See the following in HTable : >> >>public static ThreadPoolExecutor getDefaultExecutor(Configuration >> conf) { >> >> int maxThreads = conf.getInt("hbase.htable.threads.max", Integer. >> MAX_VALUE); >> >> FYI >> >> On Mon, Mar 13, 2017 at 9:14 AM, Henning Blohm <henning.bl...@zfabrik.de> >> wrote: >> >> Hi, >>> >>> I am running an HBase client on a very resource limited machine. In >>> particular numproc is limited so that I frequently get "Cannot create >>> native thread" OOMs. I noticed that, in particular in write situations, >>> the >>> hconnection pool grows into the hundreds of threads - even when at most >>> writing with less than ten application threads. Threads are discarded >>> again >>> after some minutes. >>> >>> In conjunction with other programs running on that machine, this >>> sometimes >>> leads to an "overload" situation. >>> >>> Is there a way to keep thread pool usage limited - or in some closer >>> relation with the actual concurrency required? >>> >>> Thanks, >>> >>> Henning >>> >>> >>> >>> > -- Thanks & Regards, Anil Gupta
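A sketch of what "set it before" looks like in the Java client (assumes hbase-client 1.x on the classpath). The `hconnection-*` threads Henning sees come from the connection-wide pool rather than the per-table one, and that pool has its own knobs, so capping both is worth trying; the specific cap values are illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class LimitedClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Cap the per-HTable executor (the default is Integer.MAX_VALUE).
        conf.setInt("hbase.htable.threads.max", 8);
        // Cap the connection-wide pool that spawns the hconnection-* threads.
        conf.setInt("hbase.hconnection.threads.max", 8);
        conf.setInt("hbase.hconnection.threads.core", 2);
        // Both caps must be in place before the Connection is created.
        try (Connection connection = ConnectionFactory.createConnection(conf)) {
            // ... connection.getTable(...) as usual ...
        }
    }
}
```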
Re: Hbase Row key lock
In my experience, under normal conditions a lock won't be held for 60 seconds. How many writes/sec per node are you doing? It seems like there is some hotspotting in your use case, or the cluster might need some tuning/tweaking. Have you verified that your writes/reads are evenly spread out? Do you have a time component as the prefix of your rowkey? On Sun, Oct 23, 2016 at 7:01 PM, Manjeet Singh <manjeet.chand...@gmail.com> wrote: > Anil its written it can hold lock upto 60 second. In my case my job get > stuck and many update for same rowkey cause fir bead health of hbase in cdh > 5.8 > > On 24 Oct 2016 06:26, "anil gupta" <anilgupt...@gmail.com> wrote: > > Writes/Updates usually takes few milliseconds in HBase. So, in normal cases > lock wont be held for seconds. > > On Sun, Oct 23, 2016 at 12:57 PM, Manjeet Singh < > manjeet.chand...@gmail.com> > wrote: > > > Anil all information are correct I am talking about suppose I didn't set > > any version and I have very simple requirement to update if I found xyz > > record and if I hv few ETL process which are responsible for aggregate > the > > data which is very common. ... why my hbase stuck if I try to update same > > rowkey... its mean its hold the lock for few second > > > > On 24 Oct 2016 00:46, "anil gupta" <anilgupt...@gmail.com> wrote: > > > > > Writes within a HBase row are atomic. Now, whichever write becomes the > > > latest write(with the help of timestamp value) will prevail as the > > default > > > value. If you set versions to more than 1 in column family, then you > will > > > be able to see both the values if you query for multiple versions. 
> > > > > > HTH, > > > Anil Gupta > > > > > > On Sun, Oct 23, 2016 at 12:02 PM, Manjeet Singh < > > > manjeet.chand...@gmail.com> > > > wrote: > > > > > > > Till now what i understand their is no update > > > > > > > > if two different thread try to update same record what happen > > > > > > > > first record insert with some version > > > > second thread comes and change the version and its like a new insert > > with > > > > some version > > > > this process called MVCC > > > > > > > > If I am correct how hbase support MVCC mean any configuration for > > > handlling > > > > multiple thread at same time? > > > > > > > > On Mon, Oct 24, 2016 at 12:24 AM, Manjeet Singh < > > > > manjeet.chand...@gmail.com> > > > > wrote: > > > > > > > > > No I don't have 50 clients? I want to understand internal working > of > > > > Hbase > > > > > in my usecase I have bulk update operation from spark job we have 7 > > > > > different kafka pipeline and 7 spark job > > > > > it might happen that my 2 0r 3 spark job have same rowkey for > update > > > > > > > > > > > > > > > > > > > > On Mon, Oct 24, 2016 at 12:20 AM, Dima Spivak < > dimaspi...@apache.org > > > > > > > > wrote: > > > > > > > > > >> If your typical use case sees 50 clients simultaneously trying to > > > update > > > > >> the same row, then a strongly consistent data store that writes to > > > disk > > > > >> for > > > > >> fault tolerance may not be for you. That said, such a use case > seems > > > > >> extremely unusual to me and I'd ask why you're trying to update > the > > > same > > > > >> row in such a manner. > > > > >> > > > > >> On Sunday, October 23, 2016, Manjeet Singh < > > > manjeet.chand...@gmail.com> > > > > >> wrote: > > > > >> > > > > >> > Hi Dima, > > > > >> > > > > > >> > I didn't get ? point is assume I have 50 different client all > > having > > > > >> same > > > > >> > rowkey all want to update on same rowkey at same time now just > > tell > > > > what > > > > >> > will happen? 
who will get what value? > > > > >> > > > > > >> > Thanks > > > > >> > Manjeet > > > > >> > > > > > >> > On Mon, Oct 24, 2016 at 12:12 AM, Dima Spivak < > > > dimaspi...@apache.org > > > > >> > <javascript:;>> wrote: > >
Re: Hbase Row key lock
Writes/Updates usually takes few milliseconds in HBase. So, in normal cases lock wont be held for seconds. On Sun, Oct 23, 2016 at 12:57 PM, Manjeet Singh <manjeet.chand...@gmail.com> wrote: > Anil all information are correct I am talking about suppose I didn't set > any version and I have very simple requirement to update if I found xyz > record and if I hv few ETL process which are responsible for aggregate the > data which is very common. ... why my hbase stuck if I try to update same > rowkey... its mean its hold the lock for few second > > On 24 Oct 2016 00:46, "anil gupta" <anilgupt...@gmail.com> wrote: > > > Writes within a HBase row are atomic. Now, whichever write becomes the > > latest write(with the help of timestamp value) will prevail as the > default > > value. If you set versions to more than 1 in column family, then you will > > be able to see both the values if you query for multiple versions. > > > > HTH, > > Anil Gupta > > > > On Sun, Oct 23, 2016 at 12:02 PM, Manjeet Singh < > > manjeet.chand...@gmail.com> > > wrote: > > > > > Till now what i understand their is no update > > > > > > if two different thread try to update same record what happen > > > > > > first record insert with some version > > > second thread comes and change the version and its like a new insert > with > > > some version > > > this process called MVCC > > > > > > If I am correct how hbase support MVCC mean any configuration for > > handlling > > > multiple thread at same time? > > > > > > On Mon, Oct 24, 2016 at 12:24 AM, Manjeet Singh < > > > manjeet.chand...@gmail.com> > > > wrote: > > > > > > > No I don't have 50 clients? 
I want to understand internal working of > > > Hbase > > > > in my usecase I have bulk update operation from spark job we have 7 > > > > different kafka pipeline and 7 spark job > > > > it might happen that my 2 0r 3 spark job have same rowkey for update > > > > > > > > > > > > > > > > On Mon, Oct 24, 2016 at 12:20 AM, Dima Spivak <dimaspi...@apache.org > > > > > > wrote: > > > > > > > >> If your typical use case sees 50 clients simultaneously trying to > > update > > > >> the same row, then a strongly consistent data store that writes to > > disk > > > >> for > > > >> fault tolerance may not be for you. That said, such a use case seems > > > >> extremely unusual to me and I'd ask why you're trying to update the > > same > > > >> row in such a manner. > > > >> > > > >> On Sunday, October 23, 2016, Manjeet Singh < > > manjeet.chand...@gmail.com> > > > >> wrote: > > > >> > > > >> > Hi Dima, > > > >> > > > > >> > I didn't get ? point is assume I have 50 different client all > having > > > >> same > > > >> > rowkey all want to update on same rowkey at same time now just > tell > > > what > > > >> > will happen? who will get what value? > > > >> > > > > >> > Thanks > > > >> > Manjeet > > > >> > > > > >> > On Mon, Oct 24, 2016 at 12:12 AM, Dima Spivak < > > dimaspi...@apache.org > > > >> > <javascript:;>> wrote: > > > >> > > > > >> > > Unless told not to, HBase will always write to memory and append > > to > > > >> the > > > >> > WAL > > > >> > > on disk before returning and saying the write succeeded. That's > by > > > >> design > > > >> > > and the same write pattern that companies like Apple and > Facebook > > > have > > > >> > > found works for them at scale. So what's there to solve? 
> > > >> > > > > > >> > > On Sunday, October 23, 2016, Manjeet Singh < > > > >> manjeet.chand...@gmail.com > > > >> > <javascript:;>> > > > >> > > wrote: > > > >> > > > > > >> > > > Hi All, > > > >> > > > > > > >> > > > I have read below mention blog and it also said Hbase holds > the > > > >> lock on > > > >> > > > rowkey level > > > >> > > > h
Re: Hbase Row key lock
Writes within a HBase row are atomic. Now, whichever write becomes the latest write(with the help of timestamp value) will prevail as the default value. If you set versions to more than 1 in column family, then you will be able to see both the values if you query for multiple versions. HTH, Anil Gupta On Sun, Oct 23, 2016 at 12:02 PM, Manjeet Singh <manjeet.chand...@gmail.com> wrote: > Till now what i understand their is no update > > if two different thread try to update same record what happen > > first record insert with some version > second thread comes and change the version and its like a new insert with > some version > this process called MVCC > > If I am correct how hbase support MVCC mean any configuration for handlling > multiple thread at same time? > > On Mon, Oct 24, 2016 at 12:24 AM, Manjeet Singh < > manjeet.chand...@gmail.com> > wrote: > > > No I don't have 50 clients? I want to understand internal working of > Hbase > > in my usecase I have bulk update operation from spark job we have 7 > > different kafka pipeline and 7 spark job > > it might happen that my 2 0r 3 spark job have same rowkey for update > > > > > > > > On Mon, Oct 24, 2016 at 12:20 AM, Dima Spivak <dimaspi...@apache.org> > > wrote: > > > >> If your typical use case sees 50 clients simultaneously trying to update > >> the same row, then a strongly consistent data store that writes to disk > >> for > >> fault tolerance may not be for you. That said, such a use case seems > >> extremely unusual to me and I'd ask why you're trying to update the same > >> row in such a manner. > >> > >> On Sunday, October 23, 2016, Manjeet Singh <manjeet.chand...@gmail.com> > >> wrote: > >> > >> > Hi Dima, > >> > > >> > I didn't get ? point is assume I have 50 different client all having > >> same > >> > rowkey all want to update on same rowkey at same time now just tell > what > >> > will happen? who will get what value? 
> >> > > >> > Thanks > >> > Manjeet > >> > > >> > On Mon, Oct 24, 2016 at 12:12 AM, Dima Spivak <dimaspi...@apache.org > >> > <javascript:;>> wrote: > >> > > >> > > Unless told not to, HBase will always write to memory and append to > >> the > >> > WAL > >> > > on disk before returning and saying the write succeeded. That's by > >> design > >> > > and the same write pattern that companies like Apple and Facebook > have > >> > > found works for them at scale. So what's there to solve? > >> > > > >> > > On Sunday, October 23, 2016, Manjeet Singh < > >> manjeet.chand...@gmail.com > >> > <javascript:;>> > >> > > wrote: > >> > > > >> > > > Hi All, > >> > > > > >> > > > I have read below mention blog and it also said Hbase holds the > >> lock on > >> > > > rowkey level > >> > > > https://blogs.apache.org/hbase/entry/apache_hbase_ > >> > internals_locking_and > >> > > > (0) Obtain Row Lock > >> > > > (1) Write to Write-Ahead-Log (WAL) > >> > > > (2) Update MemStore: write each cell to the memstore > >> > > > (3) Release Row Lock > >> > > > > >> > > > > >> > > > SO question is how to solve this if I have very frequent update on > >> > Hbase > >> > > > > >> > > > Thanks > >> > > > Manjeet > >> > > > > >> > > > On Wed, Aug 17, 2016 at 9:54 AM, Manjeet Singh < > >> > > manjeet.chand...@gmail.com <javascript:;> > >> > > > <javascript:;>> > >> > > > wrote: > >> > > > > >> > > > > Hi All > >> > > > > > >> > > > > Can anyone help me about how and in which version of Hbase > support > >> > > Rowkey > >> > > > > lock ? > >> > > > > I have seen article about rowkey lock but it was about .94 > >> version it > >> > > > said > >> > > > > that if row key not exist and any update request come and that > >> rowkey > >> > > not > >> > > > > exist then in this case Hbase hold the lock for 60 sec. 
> >> > > > > > >> > > > > currently I am using Hbase 1.2.2 version > >> > > > > > >> > > > > Thanks > >> > > > > Manjeet > >> > > > > > >> > > > > > >> > > > > > >> > > > > -- > >> > > > > luv all > >> > > > > > >> > > > > >> > > > > >> > > > > >> > > > -- > >> > > > luv all > >> > > > > >> > > > >> > > > >> > > -- > >> > > -Dima > >> > > > >> > > >> > > >> > > >> > -- > >> > luv all > >> > > >> > >> > >> -- > >> -Dima > >> > > > > > > > > -- > > luv all > > > > > > -- > luv all > -- Thanks & Regards, Anil Gupta
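The last-write-wins behavior described above can be sketched without a cluster. The toy model below is plain Java, not HBase code (the class and method names are made up for illustration): it keeps one cell's versions ordered newest-first and trims them to a maximum, the way a column family with VERSIONS => n does.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of a single HBase cell: multiple timestamped versions,
// newest first, trimmed to maxVersions. Illustration only, not HBase code.
class VersionedCell {
    private final int maxVersions;
    // Descending order: the highest timestamp (latest write) comes first.
    private final NavigableMap<Long, String> versions =
            new TreeMap<>(Comparator.reverseOrder());

    VersionedCell(int maxVersions) { this.maxVersions = maxVersions; }

    // Concurrent writers both "succeed"; the later timestamp wins reads.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // drop the oldest version
        }
    }

    // Default read: only the latest version is visible.
    String get() { return versions.firstEntry().getValue(); }

    int versionCount() { return versions.size(); }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell(3);
        cell.put(100L, "writer-A");
        cell.put(101L, "writer-B"); // later write prevails on reads
        System.out.println(cell.get()); // writer-B
    }
}
```

Both writers' puts are kept (up to maxVersions); a plain read simply sees whichever version carries the highest timestamp, which is the behavior the thread is asking about.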
Re: CopyTable fails on copying between two secured clusters
Hi Frank, I don't know your exact use case, but I have successfully run CopyTable across *2 secure* clusters back in 2013-2014 on a CDH distro cluster. Unfortunately, I don't remember the settings or command that we ran to do that, since it was at my previous job. Thanks, Anil Gupta On Fri, Sep 9, 2016 at 10:22 AM, Esteban Gutierrez <este...@cloudera.com> wrote: > Hi Frank, > > doesn't looks like the you are pointing the znode base to /hbase-secure, > see the arguments that you provided initially: > > "--peer.adr=zookeeper1, zookeeper2:2181:/hbase", > "--new.name=TargetTable", > "SourceTable" > > if the destination cluster has the base znode under /hbase-secure then you > need to point to the right base znode in --peer.adr, e.g. something > like: --peer.adr=zookeeper1, zookeeper2:2181:/hbase-secure > > or is there something different you have as the arguments for CopyTable? > > esteban. > > > -- > Cloudera, Inc. > > > On Fri, Sep 9, 2016 at 9:40 AM, Frank Luo <j...@merkleinc.com> wrote: > > > I think I know the cause now. > > > > The code tries to get "baseZNode" from the config. and the latter is > > obtained from Connection#getConfiguration(). Now we have two connections, > > one from local hbase, the other remote. The local hbase's connection has > > the configuration set perfectly, while the one on the remote connection > > barely has anything, hence not able to get a value of "baseZNode". > > > > So based on this theory, CopyTable will never work if the remote is a > > secured cluster, is that a right assessment? Does anyone have luck to get > > it work? > > > > -Original Message- > > From: Frank Luo > > Sent: Thursday, September 08, 2016 6:45 PM > > To: user@hbase.apache.org > > Subject: RE: CopyTable fails on copying between two secured clusters > > > > I don't think they are pointing to different locations. Both of them > > should be /hbase-secure. 
> > > > However, the debugger shows that ConnectionManager#retrieveClusterId are > > called twice, the first time regards to the source cluster, which works > > fine, and watcher.clusterIdZNode=/hbase-secure/hbaseid, and it is > correct. > > > > The second time for the remote cluster, watcher.clusterIdZNode=/hbase/ > hbaseid, > > which should be incorrect. > > > > What I am suspecting is ZooKeeperWatcher, method setNodeNames. It reads: > > > > private void setNodeNames(Configuration conf) { > > baseZNode = conf.get(HConstants.ZOOKEEPER_ZNODE_PARENT, > > HConstants.DEFAULT_ZOOKEEPER_ZNODE_PARENT); > > > > I am not sure the conf is corrected fetched from the remote cluster. If > > not, the default value is given, which is /hbase and incorrect. > > > > By the way, below is the root znodes for zookeepers: > > > > The source cluster: > > [hbase-secure, hiveserver2, hive, hbase-unsecure, templeton-hadoop, > > hadoop-ha, zookeeper] > > > > The target cluster: > > [hbase-secure, hive, hiveserver2, hbase-unsecure, hadoop-ha, zookeeper] > > > > -Original Message- > > From: Esteban Gutierrez [mailto:este...@cloudera.com] > > Sent: Thursday, September 08, 2016 1:02 PM > > To: user@hbase.apache.org > > Subject: Re: CopyTable fails on copying between two secured clusters > > > > Is it possible that in your destination cluster zookeeper.znode.parent > > points to a different location than /hbase ? If both clusters are under > the > > same kerberos realm then there is no need to worry about > > zookeeper.security.auth_to_local. > > > > > > > > -- > > Cloudera, Inc. > > > > > > On Thu, Sep 8, 2016 at 10:50 AM, Frank Luo <j...@merkleinc.com> wrote: > > > > > Thanks Esteban for replying. > > > > > > The Kerberos realm is shared between the two clusters. > > > > > > I searched zookeeper config and couldn't find the rule, so where it > > > is set? > > > > > > Having said that, I looked at parameters passed to getData call, and > > > it doesn't look like security related. 
> > > > > > PS. I am using hbase 1.1.2. > > > > > > Here is the log: > > > > > > com.merkleinc.cr.hbase_maintenance.tableexport.CopyTableTest,testCopyT > > > able Connected to the target VM, address: '127.0.0.1:50669', > > > transport: > > > 'socket' > > > 0[main] WARN org.apache.hadoop.hbase.io.util.HeapMemorySizeUtil > - &
Re: How to backport MOB to Hbase 1.2.2
So, in that case, if someone really wants to use MOB without waiting for the HBase 2.0 release, they can take CDH 5.4+ for a spin. Right? ~Anil PS: I don't work for Cloudera. On Sun, Aug 21, 2016 at 8:45 AM, Dima Spivak <dspi...@cloudera.com> wrote: > Hey Anil, > > No, you're totally right; CDH 5.4 shipped with MOB, but on an HBase based > on the upstream 1.0 release. I can tell you firsthand that the time and > effort undertaken at Cloudera and Intel to make it production-ready (and > convince ourselves of that through rigorous testing) was pretty > significant, so someone looking to "roll their own" based on an Apache > release is in for some long nights. > > On Sunday, August 21, 2016, anil gupta <anilgupt...@gmail.com> wrote: > > > Hi Dima, > > > > I was under impression that some CDH5.x GA release shipped MOB. Is that > > wrong? > > > > Thanks, > > Anil > > > > On Sat, Aug 20, 2016 at 10:48 PM, Dima Spivak <dspi...@cloudera.com> wrote: > > > > > Nope, you'd be in uncharted territory there, my friend, and definitely > > not > > > in a place that would be production-ready. Sorry to be the bearer of > bad > > > news :(. > > > > > > On Saturday, August 20, 2016, Ascot Moss <ascot.m...@gmail.com> wrote: > > > > > > > I have read HBASE-15370. We have to wait quite a while for HBase > 2.0, > > > > this is the reason why I want to try out MOB now in HBase 1.2.2 in > my > > > test > > > > environment, any steps and guide to do the backport? > > > > > > > > > > > > On Sun, Aug 21, 2016 at 12:44 PM, Dima Spivak <dspi...@cloudera.com> > > > > wrote: > > > > > > > > > Hi Ascot, > > > > > > > > > > MOB won't be backported into any pre-2.0 HBase branch. 
HBASE-15370 > > > > tracked > > > > > the effort and an email thread on the dev list ("[DISCUSS] Criteria > > for > > > > > including MOB feature backport in branch-1" started by Ted Yu on > > March > > > > 3rd > > > > > of this year) has additional rationale as to why that is. > > > > > > > > > > Cheers, > > > > > > > > > > On Saturday, August 20, 2016, Ascot Moss <ascot.m...@gmail.com > > <javascript:;> > > > > <javascript:;> > > > > > <javascript:_e(%7B%7D,'cvml','ascot.m...@gmail.com <javascript:;> > > <javascript:;>');>> > > > > wrote: > > > > > > > > > > > Hi, > > > > > > > > > > > > I want to use MOB in Hbase 1.2.2, can anyone advise the step to > > > > backport > > > > > > MOB to HBase 1.2.2? > > > > > > > > > > > > Regards > > > > > > > > > > > > > > > > > > > > > -- > > > > > -Dima > > > > > > > > > > > > > > > > > > -- > > > -Dima > > > > > > > > > > > -- > > Thanks & Regards, > > Anil Gupta > > > > > -- > -Dima > -- Thanks & Regards, Anil Gupta
Re: How to backport MOB to Hbase 1.2.2
Hi Dima, I was under impression that some CDH5.x GA release shipped MOB. Is that wrong? Thanks, Anil On Sat, Aug 20, 2016 at 10:48 PM, Dima Spivak <dspi...@cloudera.com> wrote: > Nope, you'd be in uncharted territory there, my friend, and definitely not > in a place that would be production-ready. Sorry to be the bearer of bad > news :(. > > On Saturday, August 20, 2016, Ascot Moss <ascot.m...@gmail.com> wrote: > > > I have read HBASE-15370. We have to wait quite a while for HBase 2.0, > > this is the reason why I want to try out MOB now in HBase 1.2.2 in my > test > > environment, any steps and guide to do the backport? > > > > > > On Sun, Aug 21, 2016 at 12:44 PM, Dima Spivak <dspi...@cloudera.com > > <javascript:;>> wrote: > > > > > Hi Ascot, > > > > > > MOB won't be backported into any pre-2.0 HBase branch. HBASE-15370 > > tracked > > > the effort and an email thread on the dev list ("[DISCUSS] Criteria for > > > including MOB feature backport in branch-1" started by Ted Yu on March > > 3rd > > > of this year) has additional rationale as to why that is. > > > > > > Cheers, > > > > > > On Saturday, August 20, 2016, Ascot Moss <ascot.m...@gmail.com > > <javascript:;> > > > <javascript:_e(%7B%7D,'cvml','ascot.m...@gmail.com <javascript:;>');>> > > wrote: > > > > > > > Hi, > > > > > > > > I want to use MOB in Hbase 1.2.2, can anyone advise the step to > > backport > > > > MOB to HBase 1.2.2? > > > > > > > > Regards > > > > > > > > > > > > > -- > > > -Dima > > > > > > > > -- > -Dima > -- Thanks & Regards, Anil Gupta
Re: Is it ok to store all integers as Strings instead of byte[] in hbase?
Hi Mahesha, I think it's not a good idea to store numbers/dates as strings. If you store numbers as strings, then you won't be able to do numeric/date comparisons. HBase is data-type agnostic. IMO, you will be better off using Apache Phoenix (http://phoenix.apache.org/). Phoenix is a SQL layer on top of HBase and it is ANSI SQL compliant. Currently Phoenix is officially supported by HDP and it is also present in Cloudera Labs. HTH, Anil Gupta On Fri, Jul 8, 2016 at 5:18 AM, Dima Spivak <dspi...@cloudera.com> wrote: > Hey Mahesha, > > It might be worthwhile to read through the architecture section of our ref > guide: https://hbase.apache.org/book.html#_architecture > > Cheers, > Dima > > On Friday, July 8, 2016, Mahesha999 <abnav...@gmail.com> wrote: > > > I am trying out some hbase code. I realised that when I insert data > through > > hbase shell using put command, then everything (both numeric and string) > is > > put as string: > > > > hbase(main):001:0> create 'employee', {NAME => 'f'} > > hbase(main):003:0> put 'employee', 'ganesh','f:age',30 > > hbase(main):004:0> put 'employee', 'ganesh','f:desg','mngr' > > hbase(main):005:0> scan 'employee' > > ROW COLUMN+CELL > > ganesh column=f:age, timestamp=1467926618738, value=30 > > ganesh column=f:desg, timestamp=1467926639557, value=mngr > > > > However when I put data using Java API, non-string stuff gets serialized > as > > byte[]: > > > > Cluster lNodes = new Cluster(); > > lNodes.add("digitate-VirtualBox:8090"); > > Client lClient= new Client(lNodes); > > RemoteHTable remoteht = new RemoteHTable(lClient, "employee"); > > > > Put lPut = new Put(Bytes.toBytes("mahesh")); > > lPut.add(Bytes.toBytes("f"), Bytes.toBytes("age"), Bytes.toBytes(25)); > > lPut.add(Bytes.toBytes("f"), Bytes.toBytes("desg"), > Bytes.toBytes("dev")); > > remoteht.put(lPut); > > > > Scan in hbase shell shows age 25 of mahesh is stored as \x00\x00\x00\x19: > > > > hbase(main):006:0> scan 'employee' > > ROW COLUMN+CELL > > ganesh column=f:age, 
timestamp=1467926618738, value=30 > > ganesh column=f:desg, timestamp=1467926639557, value=mngr > > mahesh column=f:age, timestamp=1467926707712, > > value=\x00\x00\x00\x19 > > mahesh column=f:desg, timestamp=1467926707712, value=dev > > > > *1.* Considering I will be storing only numeric and string data in hbase, > > what benefits it does provide to store numeric data as byte[] (as in case > > of > > above) or as string: > > lPut.add(Bytes.toBytes("f"), Bytes.toBytes("age"), Bytes.toBytes("25")); > > //instead of toBytes(25) > > > > *2.*Also why strings are stored as is and are not serialized to byte[] > even > > when put using Java API? > > > > > > > > -- > > View this message in context: > > > http://apache-hbase.679495.n3.nabble.com/Is-it-ok-to-store-all-integers-as-Strings-instead-of-byte-in-hbase-tp4081100.html > > Sent from the HBase User mailing list archive at Nabble.com. > > > -- Thanks & Regards, Anil Gupta
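The value=\x00\x00\x00\x19 in the scan above is simply the big-endian 4-byte encoding of 25. A standalone sketch (using java.nio rather than HBase's Bytes class, which produces the same layout for ints) contrasts the two representations and shows why string-encoded numbers break byte-wise numeric ordering:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Contrast the two ways the thread stores the age 25:
// as a 4-byte big-endian int vs. as the UTF-8 string "25".
class IntVsStringBytes {
    // Same layout HBase's Bytes.toBytes(int) produces: 4 bytes, big-endian.
    static byte[] intBytes(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    static byte[] stringBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Unsigned lexicographic compare, the order HBase sorts raw bytes in.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(intBytes(25)));      // [0, 0, 0, 25]
        System.out.println(Arrays.toString(stringBytes("25"))); // [50, 53]

        // Fixed-width int encoding preserves numeric order
        // (for non-negative values; negatives need the sign bit flipped)...
        System.out.println(compare(intBytes(25), intBytes(100)) < 0);           // true
        // ...while string encoding does not: "100" sorts before "25".
        System.out.println(compare(stringBytes("100"), stringBytes("25")) < 0); // true
    }
}
```

So the byte[] form buys correct byte-wise ordering (useful in row keys and for comparators/filters) and a fixed 4-byte width, while the string form is human-readable in the shell but variable-width and mis-ordered numerically.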
Re: Table/column layout
My 2 cents: #1. The HBase version timestamp is primarily used for storing & purging historical data on the basis of TTL. If you try to build an app that toys with timestamps, you might run into issues, so you need to be very careful with that. #2. HBase usually suggests keeping column names around 5-6 chars because HBase stores data as key-values. But it's hard to keep doing that in **real world apps**. When you use block encoding/compression, the performance penalty of wide columns is reduced. For example, Apache Phoenix uses FAST_DIFF encoding by default due to its non-short column names. Here is a blog post that discusses the performance of encoding/compression: http://hadoop-hbase.blogspot.com/2016/02/hbase-compression-vs-blockencoding_17.html I have been using user-friendly column names (more readable rather than short abbreviations) and I still get decent performance in my apps. (Obviously, YMMV. My apps are performing within our SLA.) In prod, I have a table that has 1100+ columns, and the column names are not short. Hence, I would recommend you go ahead with your non-short column naming. You might need to try out different encodings/compressions to see what gives you the best performance. HTH, Anil Gupta On Fri, Jun 10, 2016 at 8:16 PM, Ken Hampson <hamps...@gmail.com> wrote: > I realize that was probably a bit of a wall of text... =) > > So, TL;DR: I'm wondering: > 1) If people have used and had good experiences with caller-specified > version timestamps (esp. given the caveats in the HBase book doc re: issues > with deletions and TTLs). > > 2) About suggestions for optimal column naming for potentially large > numbers of different column groupings for very wide tables. > > Thanks, > - Ken > > On Tue, Jun 7, 2016 at 10:52 PM Ken Hampson <hamps...@gmail.com> wrote: > > > Hi: > > > > I'm currently using HBase 1.1.2 and am in the process of determining how > > best to proceed with the column layout for an upcoming expansion of our > > data pipeline. 
> > > > Background: > > > > Table A: billions of rows, 1.3 TB (with snappy compression), rowkey is > sha1 > > Table B: billions of rows (more than Table A), 1.8 TB (with snappy > > compression), rowkey is sha1 > > > > > > These tables represent data obtained via a combination batch/streaming > > process. We want to expand our data pipeline to run an assortment of > > analyses on these tables (both batch and streaming) and be able to store > > the results in each table as appropriate. Table A is a set of unique > > entries with some example data, whereas Table B is correlated to Table A > > (via Table A's sha1), but is not de-duplicated (that is to say, it > contains > > contextual data). > > > > For the expansion of the data pipeline, we want to store the data either > > in Table A if context is not needed, and Table B if context is needed. > > Since we have a theoretically unlimited number of different analyses that > > we may want to perform and store the results for (that is to say, I need > to > > assume there will be a substantial number of data sets that need to be > > stored in these tables, which will grow over time and could each > themselves > > potentially be somewhat wide in terms of columns). > > > > Originally, I had considered storing these in column families, where each > > analysis is grouped together in a different column family. However, I > have > > read in the HBase book documentation that HBase does not perform well > with > > many column families (a few default, ~10 max), so I have discarded this > > option. > > > > The next two options both involve using wide tables with many columns in > a > > separate column family (e.g. "d"), where all the various analysis would > be > > grouped into the same family in a potentially wide amount of columns in > > total. Each of these analyses needs to maintain their own versions so we > > can correlate the data from each one. 
The variants which come to mind to > > accomplish that, and on which I would appreciate some feedback on are: > > > >1. Use HBase's native versioning to store the version of the analysis > >2. Encode a version in the column name itself > > > > I know the HBase native versions use the server's timestamp by default, > > but can take any long value. So we could assign a particular time value > to > > be a version of a particular analysis. However, the doc also warned that > > there could be negative ramifications of this because HBase uses the > > versions internally for things like TTL for deletes/maintenance. Do > people > > use versions in this way? Are the TTL issues of great concern? (We li
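If option 2 above (encoding a version in the column name itself) is chosen, one detail worth settling up front is zero-padding the version so qualifiers sort in version order as raw bytes. A minimal sketch; the qualifier scheme, names, and widths here are hypothetical, not from this thread:

```java
// Hypothetical column-qualifier scheme for option 2: embed the analysis
// version in the qualifier, zero-padded so qualifiers sort in version order.
class VersionedQualifier {
    static String qualifier(String analysis, int version) {
        // 5-digit padding is an arbitrary illustrative width.
        return String.format("%s:v%05d", analysis, version);
    }

    public static void main(String[] args) {
        System.out.println(qualifier("sentiment", 3)); // sentiment:v00003
        // Without padding, "v10" would sort before "v9"; padded, it doesn't.
        System.out.println(qualifier("sentiment", 10).compareTo(
                           qualifier("sentiment", 9)) > 0); // true
    }
}
```

The padding keeps lexicographic (byte) order equal to numeric version order, which matters for column-range scans; the tradeoff versus native versioning is extra bytes per cell and no TTL/version-based cleanup.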
Re: [ANNOUNCE] PhoenixCon 2016 on Wed, May 25th 9am-1pm
Cool, Thanks. Let me send the talk proposal to higher management. On Wed, Apr 27, 2016 at 8:16 AM, James Taylor <jamestay...@apache.org> wrote: > Yes, that sounds great - please let me know when I can add you to the > agenda. > > James > > On Tuesday, April 26, 2016, Anil Gupta <anilgupt...@gmail.com> wrote: > > > Hi James, > > I spoke to my manager and he is fine with the idea of giving the talk. > > Now, he is gonna ask higher management for final approval. I am assuming > > there is still a slot for my talk in use case srction. I should go ahead > > with my approval process. Correct? > > > > Thanks, > > Anil Gupta > > Sent from my iPhone > > > > > On Apr 26, 2016, at 5:56 PM, James Taylor <jamestay...@apache.org > > <javascript:;>> wrote: > > > > > > We invite you to attend the inaugural PhoenixCon on Wed, May 25th > 9am-1pm > > > (the day after HBaseCon) hosted by Salesforce.com in San Francisco. > There > > > will be two tracks: one for use cases and one for internals. Drop me a > > note > > > if you're interested in giving a talk. To RSVP and for more details, > see > > > here[1]. > > > > > > Thanks, > > > James > > > > > > [1] > > http://www.meetup.com/SF-Bay-Area-Apache-Phoenix-Meetup/events/230545182 > > > -- Thanks & Regards, Anil Gupta
Re: [ANNOUNCE] PhoenixCon 2016 on Wed, May 25th 9am-1pm
Hi James, I spoke to my manager and he is fine with the idea of giving the talk. Now he is going to ask higher management for final approval. I am assuming there is still a slot for my talk in the use case section, so I should go ahead with my approval process. Correct? Thanks, Anil Gupta Sent from my iPhone > On Apr 26, 2016, at 5:56 PM, James Taylor <jamestay...@apache.org> wrote: > > We invite you to attend the inaugural PhoenixCon on Wed, May 25th 9am-1pm > (the day after HBaseCon) hosted by Salesforce.com in San Francisco. There > will be two tracks: one for use cases and one for internals. Drop me a note > if you're interested in giving a talk. To RSVP and for more details, see > here[1]. > > Thanks, > James > > [1] http://www.meetup.com/SF-Bay-Area-Apache-Phoenix-Meetup/events/230545182
Re: is it a good idea to disable tables not currently hot?
ncer will be confused when regions come and go. And I cannot > > > > afford not to have it running in case of region server crashes and > > > > come back. So doesn’t anyone have good ideas how to handle it? > > > > > > > > I already doing compact myself so that is not an issue. > > > > > > > > Another related question, if a region is enabled but not active > > > > read/write, how much resources it takes in terms of region server? > > > > > > > > Thanks! > > > > > > > > Frank Luo > > > > > > > > > > Merkle was named a leader in Customer Insights Services Providers by > > > Forrester Research < > > > http://www.merkleinc.com/who-we-are-customer-relationship-marketing- > > > ag > > > ency/awards-recognition/merkle-named-leader-forrester?utm_source=ema > > > il footer_medium=email_campaign=2016MonthlyEmployeeFooter > > > > > > > > > > Forrester Research report names 500friends, a Merkle Company, a > > > leader in customer Loyalty Solutions for Midsize Organizations< > > > http://www.merkleinc.com/who-we-are-customer-relationship-marketing- > > > ag > > > ency/awards-recognition/500friends-merkle-company-named?utm_source=e > > > ma ilfooter_medium=email_campaign=2016MonthlyEmployeeFooter > > > > > > > This email and any attachments transmitted with it are intended for > > > use by the intended recipient(s) only. If you have received this > > > email in error, please notify the sender immediately and then delete > > > it. If you are not the intended recipient, you must not keep, use, > > > disclose, copy or distribute this email without the author’s prior > permission. > > > We take precautions to minimize the risk of transmitting software > > > viruses, but we advise you to perform your own virus checks on any > > > attachment to this message. We cannot accept liability for any loss > > > or damage caused by software viruses. The information contained in > > > this communication may be confidential and may be subject to the > > attorney-client privilege. 
-- Thanks & Regards, Anil Gupta
Re: Spark on Hbase
Apart from the Phoenix Spark connector, you can also have a look at: https://github.com/Huawei-Spark/Spark-SQL-on-HBase On Wed, Mar 9, 2016 at 4:58 PM, Divya Gehlot <divya.htco...@gmail.com> wrote: > I agree with Talat > As couldn't connect directly with Hbase > Connecting it through Phoenix . > If you are using Hortonworks distribution ,it comes with Phoenix. > > Thanks, > Divya > On Mar 10, 2016 3:04 AM, "Talat Uyarer" <ta...@uyarer.com> wrote: > > > Hi, > > > > Have you ever tried Apache phoenix ? They have spark solution[1]. I > > have just started to use on spark. I haven't tried it with spark > > streaming. > > > > [1] http://phoenix.apache.org/phoenix_spark.html > > > > 2016-03-08 22:04 GMT-08:00 Rachana Srivastava > > <rachanasrivas...@yahoo.com.invalid>: > > > I am trying to integrate SparkStreaming with HBase. I am calling > > following APIs to connect to HBase > > > > > > HConnection hbaseConnection = > > HConnectionManager.createConnection(conf);hBaseTable = > > hbaseConnection.getTable(hbaseTable); > > > Since I cannot get the connection and broadcast the connection each API > > call to get data from HBase is very expensive. I tried using > > JavaHBaseContext (JavaHBaseContext hbaseContext = new > JavaHBaseContext(jsc, > > conf)) by using hbase-spark library in CDH 5.5 but I cannot import the > > library from maven. Has anyone been able to successfully resolve this > > issue. > > > > > > > > -- > > Talat UYARER > > Websitesi: http://talat.uyarer.com > > Twitter: http://twitter.com/talatuyarer > > Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304 > > > -- Thanks & Regards, Anil Gupta
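The expensive-connection problem in the quoted message (an HConnection created per call) is typically solved by creating one connection per JVM or executor and sharing it. Stripped of the HBase and Spark types, the pattern is a lazily created shared singleton; the sketch below stands in a plain Supplier for the real connection factory (names are illustrative, not an actual HBase or Spark API):

```java
import java.util.function.Supplier;

// Pattern sketch: create an expensive resource (e.g. an HBase connection)
// at most once per JVM and share it across calls, instead of per-request.
class SharedResource<T> {
    private final Supplier<T> factory;
    private volatile T instance;

    SharedResource(Supplier<T> factory) { this.factory = factory; }

    // Double-checked locking; `volatile` makes the publication safe.
    T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    instance = local = factory.get(); // created exactly once
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        int[] created = {0};
        SharedResource<String> conn =
                new SharedResource<>(() -> { created[0]++; return "connection"; });
        conn.get();
        conn.get();
        System.out.println(created[0]); // 1: the factory ran once, not per call
    }
}
```

In a Spark job the same idea is usually expressed as a static/lazy holder on each executor (e.g. initialized inside foreachPartition), so every task on that executor reuses one connection instead of opening its own.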
Re: Ruby gem of Apache Phoenix: https://rubygems.org/gems/ruby-phoenix/versions/0.0.8
My bad. That's the second time in a week I've used the wrong mailing list. Please ignore. On Tue, Mar 8, 2016 at 5:34 PM, Sean Busbey <bus...@cloudera.com> wrote: > Hi Anil! > > You should contact the Apache Phoenix community for this question. > > Details on subscribing to their user list can be found here: > > http://mail-archives.apache.org/mod_mbox/phoenix-user/ > > On Tue, Mar 8, 2016 at 4:54 PM, anil gupta <anilgupt...@gmail.com> wrote: > > > Hi, > > > > One of our ruby apps might be using this ruby gem( > > https://rubygems.org/gems/ruby-phoenix/versions/0.0.8) to query > Phoenix. I > > dont know programming in Ruby. > > This gem is listing Phoenix4.2 as dependency. We are running Phoenix4.4. > > So, i am curious to know whether we would be able to connect to > Phoenix4.4 > > with a ruby gem of Phoenix4.2? If not, then what we would need to > > do?(upgrade ruby gem to Phoenix4.4?) > > > > Here is the git: https://github.com/wxianfeng/ruby-phoenix > > -- > > Thanks & Regards, > > Anil Gupta > > > > > > -- > busbey > -- Thanks & Regards, Anil Gupta
Re: Database browser tools for Phoenix on Mac
Oh, my bad. I'm on the wrong mailing list and didn't notice my mistake. Thanks for the reminder, Stack. On Tue, Mar 8, 2016 at 5:10 PM, Stack <st...@duboce.net> wrote: > On Tue, Mar 8, 2016 at 4:57 PM, anil gupta <anilgupt...@gmail.com> wrote: > > > Yeah, i have looked at that. Non-commercial only provides very basic > > feature. > > I have just tried DBeaver(http://dbeaver.jkiss.org/download/). Its based > > on > > Eclipse framework and its UI looks much better. > > DBeaver supports Cassandra and MongoDB out of the box. It would be great > if > > it start supporting Phoenix out of the box. > > > > > You pinged the Phoenix phellows Anil? > St.Ack > > > > > On Sat, Mar 5, 2016 at 12:04 PM, Rohit Jain <rohit.j...@esgyn.com> > wrote: > > > > > You probably already looked at dbVisualizer > > > > > > Rohit > > > > > > On Mar 5, 2016, at 1:25 PM, anil gupta <anilgupt...@gmail.com> wrote: > > > > > > Hi, > > > > > > I have been using SquirrelSql to query Phoenix. For oracle/sql server, i > > > have been using SQLDeveloper. > > > I feel like SquirrelSql has a lot of room for improvement when i > compare > > it > > > with SQLDeveloper GUI. > > > > > > > > > I tried to register Phoenix JDBC driver with SQLDeveloper, but i > haven't > > > been successful. Has anyone being successful. > > > > > > I would like to know what other Database browser tools people are using > > to > > > connect. > > > > > > -- > > > Thanks & Regards, > > > Anil Gupta > > > > > > PS: I would prefer to use Database browser tools to query a database > that > > > itself has Apache License. :) > > > > > > > > > > > -- > > Thanks & Regards, > > Anil Gupta > > > -- Thanks & Regards, Anil Gupta
Re: Database browser tools for Phoenix on Mac
Yeah, I have looked at that. The non-commercial edition only provides very basic features. I have just tried DBeaver (http://dbeaver.jkiss.org/download/). It's based on the Eclipse framework and its UI looks much better. DBeaver supports Cassandra and MongoDB out of the box; it would be great if it started supporting Phoenix out of the box too. On Sat, Mar 5, 2016 at 12:04 PM, Rohit Jain <rohit.j...@esgyn.com> wrote: > You probably already looked at dbVisualizer > > Rohit > > On Mar 5, 2016, at 1:25 PM, anil gupta <anilgupt...@gmail.com> wrote: > > Hi, > > I have been using SquirrelSql to query Phoenix. For oracle/sql server, i > have been using SQLDeveloper. > I feel like SquirrelSql has a lot of room for improvement when i compare it > with SQLDeveloper GUI. > > > I tried to register Phoenix JDBC driver with SQLDeveloper, but i haven't > been successful. Has anyone being successful. > > I would like to know what other Database browser tools people are using to > connect. > > -- > Thanks & Regards, > Anil Gupta > > PS: I would prefer to use Database browser tools to query a database that > itself has Apache License. :) > -- Thanks & Regards, Anil Gupta
Ruby gem of Apache Phoenix: https://rubygems.org/gems/ruby-phoenix/versions/0.0.8
Hi, One of our Ruby apps might be using this ruby gem (https://rubygems.org/gems/ruby-phoenix/versions/0.0.8) to query Phoenix. I don't know Ruby programming. This gem lists Phoenix 4.2 as a dependency, but we are running Phoenix 4.4. So, I am curious whether we would be able to connect to Phoenix 4.4 with a ruby gem built against Phoenix 4.2. If not, what would we need to do (upgrade the ruby gem to Phoenix 4.4)? Here is the git repo: https://github.com/wxianfeng/ruby-phoenix -- Thanks & Regards, Anil Gupta
Database browser tools for Phoenix on Mac
Hi, I have been using SquirrelSQL to query Phoenix. For Oracle/SQL Server, I have been using SQLDeveloper. I feel SquirrelSQL has a lot of room for improvement when I compare its GUI with SQLDeveloper's. I tried to register the Phoenix JDBC driver with SQLDeveloper, but I haven't been successful. Has anyone been successful? I would like to know what other database browser tools people are using to connect. -- Thanks & Regards, Anil Gupta PS: I would prefer a database browser tool that itself has an Apache License. :)
Re: Calling Coprocessor via HBase Thrift or RestService
I also came across this: https://issues.apache.org/jira/browse/HBASE-6790, which is likewise unresolved. On Sun, Feb 28, 2016 at 10:26 PM, anil gupta <anilgupt...@gmail.com> wrote: > Hi, > > A non java app would like to use AggregateImplementation( > https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.html > ) > Is it possible to use HBase Thrift gateway or Stargate(Rest gateway) to > make calls to AggregateImplementation coprocessor? If yes, can you also > tell me how to make calls. > I came across this: https://issues.apache.org/jira/browse/HBASE-5600 . > But, its unresolved. > > -- > Thanks & Regards, > Anil Gupta > -- Thanks & Regards, Anil Gupta
Calling Coprocessor via HBase Thrift or RestService
Hi, A non-Java app would like to use AggregateImplementation (https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.html). Is it possible to use the HBase Thrift gateway or Stargate (the REST gateway) to make calls to the AggregateImplementation coprocessor? If yes, could you also tell me how to make the calls? I came across https://issues.apache.org/jira/browse/HBASE-5600, but it's unresolved. -- Thanks & Regards, Anil Gupta
Re: Two questions about the maximum number of versions of a column family
If it's possible to make the timestamp a suffix of your rowkey (assuming the rowkey is composite), then you will not run into read/write hotspots. Have a look at the OpenTSDB data model, which scales really well. Sent from my iPhone > On Feb 21, 2016, at 10:28 AM, Stephen Durfey wrote: > > I personally don't deal with time series data, so I'm not going to make a > statement on which is better. I would think from a scanning viewpoint putting > the time stamp in the row key is easier, but that will introduce scanning > performance bottlenecks due to the row keys being stored lexicographically. > All data from the same date range will end up in the same region or regions > (this is causes hot spots) reducing the number of tasks you get for reads, > thus increasing extraction time. > One method to deal with this is salting your row keys to get an even > distribution of data around the cluster. Cloudera recently had a good post > about this on their blog: > http://blog.cloudera.com/blog/2015/06/how-to-scan-salted-apache-hbase-tables-with-region-specific-key-ranges-in-mapreduce/ > > On Sun, Feb 21, 2016 at 9:47 AM -0800, "Daniel" wrote: > > Thanks for your sharing, Stephen and Ted. The reference guide recommends > "rows" over "versions" concerning time series data. Are there advantages of > using "reversed timestamps" in row keys over the built-in "versions" with > regard to scanning performance? > > -- Original -- > From: "Ted Yu" > Date: Mon, Feb 22, 2016 01:02 AM > To: "user@hbase.apache.org"; > Subject: Re: Two questions about the maximum number of versions of a column > family > > > Thanks for sharing, Stephen. > > bq. scan performance on the region servers needing to scan over all that > data you may not need > > When number of versions is large, try to utilize Filters (where > appropriate) which implements: > > public Cell getNextCellHint(Cell currentKV) { > > See MultiRowRangeFilter for example. 
> > > Please see hbase-shell/src/main/ruby/shell/commands/alter.rb for syntax on > how to alter table. When "hbase.online.schema.update.enable" is true, the table > can stay online during the change. > > Cheers > >> On Sun, Feb 21, 2016 at 8:20 AM, Stephen Durfey wrote: >> >> Someone please correct me if I am wrong. >> I've looked into this recently due to some performance reasons with my >> tables in a production environment. Like the book says, I don't recommend >> keeping this many versions around unless you really need them. Telling >> HBase to keep around a very large number doesn't waste space; that's just a >> value in the table descriptor. So, I wouldn't worry about that. The >> problems are going to come in when you actually write out those versions. >> My tables currently have max_versions set and roughly 40% of the tables >> are due to historical versions. So, one table in particular is around 25 >> TB. I don't have a need to keep this many versions, so I am working on >> changing the max versions to the default of 3 (some cells are hundreds or >> thousands of cells deep). The issue you'll run into is scan performance on >> the region servers needing to scan over all that data you may not need (due >> to large store files). This could lead to increased scan time and >> potentially scanner timeouts, depending upon how large your batch size is >> set on the scan. >> I assume this has some performance impact on compactions, both minor and >> major, but I didn't investigate that, and potentially on the write path, >> but also not something I looked into. >> Changing the number of versions after the table has been created doesn't >> have a performance impact due to just being a metadata change. The table >> will need to be disabled, changed, and re-enabled again. If this is done >> through a script the table could be offline for a couple of seconds. The >> only concern around that are users of the table. 
If they have scheduled job >> runs that hit that table that would break if they try to read from it while >> the table is disabled. The only performance impact I can think of around >> this change would be major compaction of the table, but even that shouldn't >> be an issue. >> >> >>_ >> From: Daniel >> Sent: Sunday, February 21, 2016 9:22 AM >> Subject: Two questions about the maximum number of versions of a column >> family >> To: user >> >> >> Hi, I have two questions about the maximum number of versions of a column >> family: >> >> (1) Is it OK to set a very large (>100,000) maximum number of versions for >> a column family? >> >> The reference guide says "It is not recommended setting the number of max >> versions to an exceedingly high level (e.g., hundreds or more) unless those >> old values are very dear to you because this will greatly increase >> StoreFile size." (Chapter 36.1) >> >> I'm new to the Hadoop
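The salting approach described above can be sketched in plain Java. This is a hypothetical helper, not HBase API: the bucket count, key format, and class name are assumptions for illustration; in practice the bucket count should match your number of pre-split regions.

```java
// Sketch of row-key salting: prefix each key with a bucket id derived from a
// hash of the key, so rows from the same time range spread across N regions
// instead of hot-spotting on one.
public class SaltedKey {
    public static int saltFor(String rowKey, int buckets) {
        // Mask the sign bit rather than Math.abs, which overflows for
        // Integer.MIN_VALUE hash codes.
        return (rowKey.hashCode() & 0x7fffffff) % buckets;
    }

    // Zero-padded salt prefix keeps lexicographic ordering within a bucket.
    public static String salted(String rowKey, int buckets) {
        return String.format("%02d-%s", saltFor(rowKey, buckets), rowKey);
    }
}
```

The trade-off, as the Cloudera post linked above explains, is that a read of a key range now requires one scan per bucket, each with region-specific start/stop keys.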
Re: Rename tables or swap alias
I don't think there are any atomic operations in HBase to support DDL across two tables. But maybe you can use HBase snapshots: 1. Create an HBase snapshot. 2. Truncate the table. 3. Write data to the table. 4. Create a table from the snapshot taken in step #1 as table_old. Now you have two tables: one with the current run's data and the other with the last run's data. I think the above process will suffice, but keep in mind that it is not atomic. HTH, Anil Sent from my iPhone > On Feb 15, 2016, at 4:25 PM, Pat Ferrel wrote: > > Any other way to do what I was asking. With Spark this is a very normal thing > to treat a table as immutable and create another to replace the old. > > Can you lock two tables and rename them in 2 actions then unlock in a very > short period of time? > > Or an alias for table names? > > Didn’t see these in any docs or Googling, any help is appreciated. Writing > all this data back to the original table would be a huge load on a table > being written to by external processes and therefore under large load to > begin with. > >> On Feb 14, 2016, at 5:03 PM, Ted Yu wrote: >> >> There is currently no native support for renaming two tables in one atomic >> action. >> >> FYI >> >>> On Sun, Feb 14, 2016 at 4:18 PM, Pat Ferrel wrote: >>> >>> I use Spark to take an old table, clean it up to create an RDD of cleaned >>> data. What I’d like to do is write all of the data to a new table in HBase, >>> then rename the table to the old name. If possible it could be done by >>> changing an alias to point to the new table as long as all external code >>> uses the alias, or by a 2 table rename operation. But I don’t see how to do >>> this for HBase. I am dealing with a lot of data so don’t want to do table >>> modifications with deletes and upserts, this would be incredibly slow. >>> Furthermore I don’t want to disable the table for more than a tiny span of >>> time. 
>>> >>> Is it possible to have 2 tables and rename both in an atomic action, or >>> change some alias to point to the new table in an atomic action. If not >>> what is the quickest way to achieve this to minimize time disabled. >
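The four snapshot steps above can be sketched as an hbase shell session. This is an untested sketch: the table name 'events' and the snapshot name are hypothetical, and as noted the sequence is not atomic, so readers can observe the table mid-swap.

```
snapshot 'events', 'events-snap'            # 1. snapshot the current table
truncate 'events'                           # 2. truncate (disables, drops, recreates)
# 3. write the new run's data into 'events'
clone_snapshot 'events-snap', 'events_old'  # 4. last run's data becomes a new table
```

Dropping the snapshot afterwards (delete_snapshot) reclaims its metadata once 'events_old' no longer needs it.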
Re: org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish
I figured out the problem. We have phoenix.upsert.batch.size set to 10 in hbase-site.xml, but somehow that property is **not getting picked up in our oozie workflow**. When I explicitly set the phoenix.upsert.batch.size property in my oozie workflow, my job ran successfully. By default, phoenix.upsert.batch.size is 1000. Hence, the commits were failing with a huge batch size of 1000. Thanks, Anil Gupta On Sun, Feb 14, 2016 at 8:03 PM, Heng Chen <heng.chen.1...@gmail.com> wrote: > I am not sure whether "upsert batch size in phoenix" equals HBase Client > batch puts size or not. > > But as log shows, it seems there are 2000 actions send to hbase one time. > > 2016-02-15 11:38 GMT+08:00 anil gupta <anilgupt...@gmail.com>: > >> My phoenix upsert batch size is 50. You mean to say that 50 is also a lot? >> >> However, AsyncProcess is complaining about 2000 actions. >> >> I tried with upsert batch size of 5 also. But it didnt help. >> >> On Sun, Feb 14, 2016 at 7:37 PM, anil gupta <anilgupt...@gmail.com> >> wrote: >> >> > My phoenix upsert batch size is 50. You mean to say that 50 is also a >> lot? >> > >> > However, AsyncProcess is complaining about 2000 actions. >> > >> > I tried with upsert batch size of 5 also. But it didnt help. >> > >> > >> > On Sun, Feb 14, 2016 at 6:43 PM, Heng Chen <heng.chen.1...@gmail.com> >> > wrote: >> > >> >> 2016-02-14 12:34:23,593 INFO [main] >> >> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> >> actions to finish >> >> >> >> It means your writes are too many, please decrease the batch size of >> your >> >> puts, and balance your requests on each RS. >> >> >> >> 2016-02-15 4:53 GMT+08:00 anil gupta <anilgupt...@gmail.com>: >> >> >> >> > After a while we also get this error: >> >> > 2016-02-14 12:45:10,515 WARN [main] >> >> > org.apache.phoenix.execute.MutationState: Swallowing exception and >> >> > retrying after clearing meta cache on connection. 
>> >> > java.sql.SQLException: ERROR 2008 (INT10): Unable to find cached >> index >> >> > metadata. ERROR 2008 (INT10): ERROR 2008 (INT10): Unable to find >> >> > cached index metadata. key=-594230549321118802 >> >> > region=BI.SALES,,1455470578449.44e39179789041b5a8c03316730260e7. >> Index >> >> > update failed >> >> > >> >> > We have already set: >> >> > >> >> > >> >> >> phoenix.coprocessor.maxServerCacheTimeToLiveMs18 >> >> > >> >> > Upset batch size is 50. Write are quite frequent so the cache would >> >> > not timeout in 18ms >> >> > >> >> > >> >> > On Sun, Feb 14, 2016 at 12:44 PM, anil gupta <anilgupt...@gmail.com> >> >> > wrote: >> >> > >> >> > > Hi, >> >> > > >> >> > > We are using phoenix4.4, hbase 1.1(hdp2.3.4). >> >> > > I have a MR job that is using PhoenixOutputFormat. My job keeps on >> >> > failing >> >> > > due to following error: >> >> > > >> >> > > 2016-02-14 12:29:43,182 INFO [main] >> >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> >> actions >> >> > to finish >> >> > > 2016-02-14 12:29:53,197 INFO [main] >> >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> >> actions >> >> > to finish >> >> > > 2016-02-14 12:30:03,212 INFO [main] >> >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> >> actions >> >> > to finish >> >> > > 2016-02-14 12:30:13,225 INFO [main] >> >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> >> actions >> >> > to finish >> >> > > 2016-02-14 12:30:23,239 INFO [main] >> >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> >> actions >> >> > to finish >> >> > > 2016-02-14 12:30:33,253 INFO [main] >> >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> >> actions >> >> > to finish >> >> > > 2016-02-14 12:30:43,266 INFO [main] >> >> > org.apache.had
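The batch-size mechanics in this thread can be illustrated with a minimal, HBase-free buffer in plain Java: mutations accumulate until the batch size is hit and are then handed off in one submission, which is (loosely) why a large phoenix.upsert.batch.size leaves the client waiting on a correspondingly large number of in-flight actions. The class and the writer callback are hypothetical, for illustration only; they do not model Phoenix internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of commit batching: rows accumulate until batchSize is reached,
// then the whole batch goes to the writer in one submission.
public class UpsertBuffer {
    private final int batchSize;
    private final Consumer<List<String>> writer;
    private final List<String> pending = new ArrayList<>();
    public int flushes = 0;

    public UpsertBuffer(int batchSize, Consumer<List<String>> writer) {
        this.batchSize = batchSize;
        this.writer = writer;
    }

    public void upsert(String row) {
        pending.add(row);
        if (pending.size() >= batchSize) flush();
    }

    public void flush() {
        if (pending.isEmpty()) return;
        writer.accept(new ArrayList<>(pending)); // one submission per batch
        pending.clear();
        flushes++;
    }
}
```

With batchSize 1000 a single flush hands the underlying client 1000 mutations to wait on at once; with batchSize 10 each submission stays small, matching the fix Anil describes above.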
Re: org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish
My phoenix upsert batch size is 50. You mean to say that 50 is also a lot? However, AsyncProcess is complaining about 2000 actions. I tried with upsert batch size of 5 also. But it didnt help. On Sun, Feb 14, 2016 at 7:37 PM, anil gupta <anilgupt...@gmail.com> wrote: > My phoenix upsert batch size is 50. You mean to say that 50 is also a lot? > > However, AsyncProcess is complaining about 2000 actions. > > I tried with upsert batch size of 5 also. But it didnt help. > > > On Sun, Feb 14, 2016 at 6:43 PM, Heng Chen <heng.chen.1...@gmail.com> > wrote: > >> 2016-02-14 12:34:23,593 INFO [main] >> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> actions to finish >> >> It means your writes are too many, please decrease the batch size of your >> puts, and balance your requests on each RS. >> >> 2016-02-15 4:53 GMT+08:00 anil gupta <anilgupt...@gmail.com>: >> >> > After a while we also get this error: >> > 2016-02-14 12:45:10,515 WARN [main] >> > org.apache.phoenix.execute.MutationState: Swallowing exception and >> > retrying after clearing meta cache on connection. >> > java.sql.SQLException: ERROR 2008 (INT10): Unable to find cached index >> > metadata. ERROR 2008 (INT10): ERROR 2008 (INT10): Unable to find >> > cached index metadata. key=-594230549321118802 >> > region=BI.SALES,,1455470578449.44e39179789041b5a8c03316730260e7. Index >> > update failed >> > >> > We have already set: >> > >> > >> phoenix.coprocessor.maxServerCacheTimeToLiveMs18 >> > >> > Upset batch size is 50. Write are quite frequent so the cache would >> > not timeout in 18ms >> > >> > >> > On Sun, Feb 14, 2016 at 12:44 PM, anil gupta <anilgupt...@gmail.com> >> > wrote: >> > >> > > Hi, >> > > >> > > We are using phoenix4.4, hbase 1.1(hdp2.3.4). >> > > I have a MR job that is using PhoenixOutputFormat. 
My job keeps on >> > failing >> > > due to following error: >> > > >> > > 2016-02-14 12:29:43,182 INFO [main] >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> actions >> > to finish >> > > 2016-02-14 12:29:53,197 INFO [main] >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> actions >> > to finish >> > > 2016-02-14 12:30:03,212 INFO [main] >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> actions >> > to finish >> > > 2016-02-14 12:30:13,225 INFO [main] >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> actions >> > to finish >> > > 2016-02-14 12:30:23,239 INFO [main] >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> actions >> > to finish >> > > 2016-02-14 12:30:33,253 INFO [main] >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> actions >> > to finish >> > > 2016-02-14 12:30:43,266 INFO [main] >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> actions >> > to finish >> > > 2016-02-14 12:30:53,279 INFO [main] >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> actions >> > to finish >> > > 2016-02-14 12:31:03,293 INFO [main] >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> actions >> > to finish >> > > 2016-02-14 12:31:13,305 INFO [main] >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> actions >> > to finish >> > > 2016-02-14 12:31:23,318 INFO [main] >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> actions >> > to finish >> > > 2016-02-14 12:31:33,331 INFO [main] >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> actions >> > to finish >> > > 2016-02-14 12:31:43,345 INFO [main] >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> actions >> > to finish >> > > 2016-02-14 12:31:53,358 INFO [main] >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 
2000 >> actions >> > to finish >> > > 2016-02-14 12:32:03,371 INFO [main] >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 >> actions >> > to finish >> > > 2016-02-14 12:32:13,385 INFO [main] >> > org.apa
Re: org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish
My phoenix upsert batch size is 50. You mean to say that 50 is also a lot? However, AsyncProcess is complaining about 2000 actions. I tried with upsert batch size of 5 also. But it didnt help. On Sun, Feb 14, 2016 at 6:43 PM, Heng Chen <heng.chen.1...@gmail.com> wrote: > 2016-02-14 12:34:23,593 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions to finish > > It means your writes are too many, please decrease the batch size of your > puts, and balance your requests on each RS. > > 2016-02-15 4:53 GMT+08:00 anil gupta <anilgupt...@gmail.com>: > > > After a while we also get this error: > > 2016-02-14 12:45:10,515 WARN [main] > > org.apache.phoenix.execute.MutationState: Swallowing exception and > > retrying after clearing meta cache on connection. > > java.sql.SQLException: ERROR 2008 (INT10): Unable to find cached index > > metadata. ERROR 2008 (INT10): ERROR 2008 (INT10): Unable to find > > cached index metadata. key=-594230549321118802 > > region=BI.SALES,,1455470578449.44e39179789041b5a8c03316730260e7. Index > > update failed > > > > We have already set: > > > > > phoenix.coprocessor.maxServerCacheTimeToLiveMs18 > > > > Upset batch size is 50. Write are quite frequent so the cache would > > not timeout in 18ms > > > > > > On Sun, Feb 14, 2016 at 12:44 PM, anil gupta <anilgupt...@gmail.com> > > wrote: > > > > > Hi, > > > > > > We are using phoenix4.4, hbase 1.1(hdp2.3.4). > > > I have a MR job that is using PhoenixOutputFormat. 
My job keeps on > > failing > > > due to following error: > > > > > > 2016-02-14 12:29:43,182 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:29:53,197 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:30:03,212 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:30:13,225 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:30:23,239 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:30:33,253 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:30:43,266 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:30:53,279 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:31:03,293 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:31:13,305 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:31:23,318 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:31:33,331 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:31:43,345 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:31:53,358 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:32:03,371 
INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:32:13,385 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:32:23,399 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:32:33,412 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:32:43,428 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > > to finish > > > 2016-02-14 12:32:53,443 INFO [main] > > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 > actions > >
org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish
Hi, We are using phoenix4.4, hbase 1.1(hdp2.3.4). I have a MR job that is using PhoenixOutputFormat. My job keeps on failing due to following error: 2016-02-14 12:29:43,182 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:29:53,197 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:30:03,212 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:30:13,225 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:30:23,239 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:30:33,253 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:30:43,266 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:30:53,279 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:31:03,293 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:31:13,305 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:31:23,318 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:31:33,331 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:31:43,345 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:31:53,358 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:32:03,371 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:32:13,385 INFO [main] 
org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:32:23,399 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:32:33,412 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:32:43,428 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:32:53,443 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:33:03,457 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:33:13,472 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:33:23,486 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:33:33,524 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:33:43,538 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:33:53,551 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:34:03,565 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:34:03,953 INFO [hconnection-0xe82ca6e-shared--pool2-t16] org.apache.hadoop.hbase.client.AsyncProcess: #1, table=BI.SALES, attempt=10/35 failed=2000ops, last exception: null on hdp3.truecar.com,16020,1455326291512, tracking started null, retrying after=10086ms, replay=2000ops 2016-02-14 12:34:13,578 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish 2016-02-14 12:34:23,593 INFO [main] org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish I have never seen anything like this. 
Can anyone give me pointers about this problem? -- Thanks & Regards, Anil Gupta
Re: org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish
After a while we also get this error: 2016-02-14 12:45:10,515 WARN [main] org.apache.phoenix.execute.MutationState: Swallowing exception and retrying after clearing meta cache on connection. java.sql.SQLException: ERROR 2008 (INT10): Unable to find cached index metadata. ERROR 2008 (INT10): ERROR 2008 (INT10): Unable to find cached index metadata. key=-594230549321118802 region=BI.SALES,,1455470578449.44e39179789041b5a8c03316730260e7. Index update failed We have already set: phoenix.coprocessor.maxServerCacheTimeToLiveMs18 Upsert batch size is 50. Writes are quite frequent, so the cache would not time out in 18ms On Sun, Feb 14, 2016 at 12:44 PM, anil gupta <anilgupt...@gmail.com> wrote: > Hi, > > We are using phoenix4.4, hbase 1.1(hdp2.3.4). > I have a MR job that is using PhoenixOutputFormat. My job keeps on failing > due to following error: > > 2016-02-14 12:29:43,182 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:29:53,197 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:30:03,212 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:30:13,225 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:30:23,239 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:30:33,253 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:30:43,266 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:30:53,279 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:31:03,293 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 
2016-02-14 12:31:13,305 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:31:23,318 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:31:33,331 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:31:43,345 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:31:53,358 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:32:03,371 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:32:13,385 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:32:23,399 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:32:33,412 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:32:43,428 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:32:53,443 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:33:03,457 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:33:13,472 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:33:23,486 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:33:33,524 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:33:43,538 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 
actions to > finish > 2016-02-14 12:33:53,551 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:34:03,565 INFO [main] > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to > finish > 2016-02-14 12:34:03,953 INFO [hconnection-0xe82ca6e-shared--pool2-t16] > org.apache.hadoop.hbase.client.AsyncProcess: #1, table=BI.SALES, > attempt=10/35 failed=2000ops, last exception: null on > hdp3.truecar.com,16020,1455326291512, tracking started null, retrying > after=10086ms, replay=2000ops > 2016-02-14 12:34:13,578 INFO [main] > org.
Re: Java API vs Hbase Thrift
You are not gonna gain much by using the REST service of HBase. You need to use the native Java API of HBase to gain performance. Similar to Thrift, the REST service also has an extra hop. Sent from my iPhone > On Jan 21, 2016, at 1:03 AM, Rajeshkumar J <rajeshkumarit8...@gmail.com> > wrote: > > Hi, > > As you all said I have tried Rest web service using Hbase Java API to > get data from Hbase table but it seems to be slower than that of one using > Hbase thrift server. > > can any one tell how ? > > Thanks > >> On Sat, Jan 16, 2016 at 5:41 PM, Zheng Shen <zhengshe...@outlook.com> wrote: >> >> Java API is at least 10 times faster than thrift on Hbase write >> operations based on my experience in production environment (cloudera >> 5.4.7, hbase 1.0.0) >> >> Zheng >> >> ---Original--- >> From: "Vladimir Rodionov "<vladrodio...@gmail.com> >> Date: 2016/1/15 06:31:34 >> To: "user@hbase.apache.org"<user@hbase.apache.org>; >> Subject: Re: Java API vs Hbase Thrift >> >> >>>> I have to access hbase using Java API will it be fast like thrift. >> >> Bear in mind that when you use Thrift Gateway/Thrift API you access HBase >> RegionServer through the single gateway server, >> when you use Java API - you access Region Server directly. >> Java API is much more scalable. >> >> -Vlad >> >>> On Tue, Jan 12, 2016 at 7:36 AM, Anil Gupta <anilgupt...@gmail.com> wrote: >>> >>> Java api should be same or better in performance as compared to Thrift >> api. >>> With Thrift api there is an extra hop. So, most of the time java api >> would >>> be better for performance. >>> >>> Sent from my iPhone >>> >>>> On Jan 12, 2016, at 4:29 AM, Rajeshkumar J < >> rajeshkumarit8...@gmail.com> >>> wrote: >>>> >>>> Hi, >>>> >>>> I am currently accessing records via Hbase thrift server and it is >> fast. >>>> If I have to access hbase using Java API will it be fast like thrift. >>>> >>>> Thanks >> >>
Re: Run hbase shell script from java
Hey Serega, Have you tried using the Java API of HBase to create the table? IMO, invoking a shell script from a Java program to create a table might not be the most elegant way. Have a look at https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html HTH, Anil Gupta On Wed, Jan 13, 2016 at 1:30 PM, Serega Sheypak <serega.shey...@gmail.com> wrote: > Hi, is there any easy way/example/howto to run 'create table' shell script > from java? > Usecase: I'm tired to write table DDL in shell script and in Java for > integration testing. I want to run shell script table DDL from java. > Thanks! > -- Thanks & Regards, Anil Gupta
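A minimal sketch of the Java-API route suggested above, using the HBase 1.x client (this requires the hbase-client dependency and a running cluster, so it is shown as a sketch rather than something runnable here; the table name 't1' and family 'cf1' are hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

// Roughly the DDL equivalent of `create 't1', 'cf1'` in the hbase shell.
public class CreateTableExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("t1"));
            desc.addFamily(new HColumnDescriptor("cf1"));
            if (!admin.tableExists(desc.getTableName())) {
                admin.createTable(desc);
            }
        }
    }
}
```

For integration testing this keeps the DDL in one place (Java) instead of duplicating it between a shell script and test code, which is the pain point Serega describes.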
Re: Java API vs Hbase Thrift
The Java API should be the same or better in performance compared to the Thrift API. With the Thrift API there is an extra hop, so most of the time the Java API will perform better. Sent from my iPhone > On Jan 12, 2016, at 4:29 AM, Rajeshkumar J wrote: > > Hi, > > I am currently accessing records via Hbase thrift server and it is fast. > If I have to access hbase using Java API will it be fast like thrift. > > Thanks
Re: Type of Scan to be used for real time analysis
Hi RajeshKumar, IMO, the type of scan is not decided on the basis of response time. It's decided on the basis of your query logic and data model. Also, response time cannot be directly correlated to any filter or scan. Response time is more about how much data needs to be read, CPU, network IO, etc. to satisfy your query. So, you will need to look at your data model and pick the best query. HTH, Anil On Thu, Dec 17, 2015 at 10:17 PM, Rajeshkumar J <rajeshkumarit8...@gmail.com > wrote: > Hi, > >My hbase table holds 10 million rows and I need to query it and I want > hbase to return the query within one or two seconds. Help me to choose > which type of scan do I have to use for this - range scan or rowfilter scan > > Thanks > -- Thanks & Regards, Anil Gupta
Re: Type of Scan to be used for real time analysis
If you know the exact rowkey of the row you need to fetch, then you just need to use a GET. If you know just the prefix of the rowkey, then you can use range scans in HBase. Do the above two scenarios cover your use case? On Fri, Dec 18, 2015 at 4:29 AM, Rajeshkumar J <rajeshkumarit8...@gmail.com> wrote: > Hi Anil, > >I have about 10 million rows with each rows having more than 10k > columns. I need to query this table based on row key and which will be the > apt query process for this > > Thanks > > On Fri, Dec 18, 2015 at 5:43 PM, anil gupta <anilgupt...@gmail.com> wrote: > > > Hi RajeshKumar, > > > > IMO, type of scan is not decided on the basis of response time. Its > decided > > on the basis of your query logic and data model. > > Also, Response time cannot be directly correlated to any filter or scan. > > Response time is more about how much data needs to read, cpu, network IO, > > etc to suffice requirement of your query. > > So, you will need to look at your data model and pick the best query. > > > > HTH, > > Anil > > > > On Thu, Dec 17, 2015 at 10:17 PM, Rajeshkumar J < > > rajeshkumarit8...@gmail.com > > > wrote: > > > > > Hi, > > > > > >My hbase table holds 10 million rows and I need to query it and I > want > > > hbase to return the query within one or two seconds. Help me to choose > > > which type of scan do I have to use for this - range scan or rowfilter > > scan > > > > > > Thanks > > > > > > > > > > > -- > > Thanks & Regards, > > Anil Gupta > > > -- Thanks & Regards, Anil Gupta
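For the prefix case, the exclusive stop row for the range scan is usually derived by incrementing the last non-0xFF byte of the prefix. A plain-Java sketch (no HBase dependency; HBase ships a similar helper, but this standalone version is for illustration):

```java
import java.util.Arrays;

// Computes the exclusive stop row for a prefix scan: the smallest byte
// array that sorts after every key starting with `prefix`. Trailing 0xFF
// bytes cannot be incremented, so they are dropped and the next byte up
// is incremented instead.
public class PrefixRange {
    public static byte[] stopRowFor(byte[] prefix) {
        byte[] stop = Arrays.copyOf(prefix, prefix.length);
        for (int i = stop.length - 1; i >= 0; i--) {
            if (stop[i] != (byte) 0xFF) {
                stop[i]++;
                return Arrays.copyOf(stop, i + 1); // drop trailing 0xFF bytes
            }
        }
        return new byte[0]; // all 0xFF: scan to the end of the table
    }
}
```

A scan with start row = prefix and stop row = stopRowFor(prefix) then returns exactly the rows sharing that prefix.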
Replicating only One column column family of HBase table
Hi, We have a requirement in which we want to replicate only one CF of a table, whereas that table has 2 CFs. I believe it's possible because replication_scope is set at the CF level (in my case, I'll set replication_scope=1 on only one CF). Unfortunately, I don't have access to infrastructure to test this hypothesis, so I would like to confirm it on the mailing list. Please let me know. -- Thanks & Regards, Anil Gupta
Re: Replicating only One column column family of HBase table
Hi Ted, So, as per the JIRA, the answer to my question is YES. We are running HDP 2.3.0. That JIRA was fixed in 0.98.1, so we should be fine. Thanks, Anil Gupta On Thu, Oct 29, 2015 at 12:27 PM, Ted Yu <yuzhih...@gmail.com> wrote: > Please take a look at: > https://issues.apache.org/jira/browse/HBASE-8751 > > On Thu, Oct 29, 2015 at 11:33 AM, anil gupta <anilgupt...@gmail.com> > wrote: > > > Hi, > > > > We have a requirement in which we want to replicate only one CF of a > table > > whereas that table has 2 CF. > > > > I believe, its possible because replication_scope is set on CF level(in > my > > case, i'll set replication_scope=1 on only one CF). Unfortunately, i dont > > have access to infrastructure to test this hypothesis. > > So, i would like to confirm this on mailing list. Please let me know. > > > > -- > > Thanks & Regards, > > Anil Gupta > > > -- Thanks & Regards, Anil Gupta
Re: Replicating only One column column family of HBase table
Update: We tried and it worked. On Thu, Oct 29, 2015 at 1:24 PM, anil gupta <anilgupt...@gmail.com> wrote: > Hi Ted, > > So, as per the jira, answer to my question is YES. > We are running HDP2.3.0. That jira got fixed in 0.98.1. So, we should be > fine. > > Thanks, > Anil Gupta > > On Thu, Oct 29, 2015 at 12:27 PM, Ted Yu <yuzhih...@gmail.com> wrote: > >> Please take a look at: >> https://issues.apache.org/jira/browse/HBASE-8751 >> >> On Thu, Oct 29, 2015 at 11:33 AM, anil gupta <anilgupt...@gmail.com> >> wrote: >> >> > Hi, >> > >> > We have a requirement in which we want to replicate only one CF of a >> table >> > whereas that table has 2 CF. >> > >> > I believe, its possible because replication_scope is set on CF level(in >> my >> > case, i'll set replication_scope=1 on only one CF). Unfortunately, i >> dont >> > have access to infrastructure to test this hypothesis. >> > So, i would like to confirm this on mailing list. Please let me know. >> > >> > -- >> > Thanks & Regards, >> > Anil Gupta >> > >> > > > > -- > Thanks & Regards, > Anil Gupta > -- Thanks & Regards, Anil Gupta
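For anyone finding this thread later, a sketch of the setup that was confirmed to work — setting REPLICATION_SCOPE on just one of the two CFs from the HBase shell (table and CF names are illustrative; on some 0.98 deployments the disable/enable around alter may not be required if online schema updates are enabled):

```
disable 'mytable'
alter 'mytable', {NAME => 'cf1', REPLICATION_SCOPE => 1}   # this CF is replicated
alter 'mytable', {NAME => 'cf2', REPLICATION_SCOPE => 0}   # this CF is not (the default)
enable 'mytable'
describe 'mytable'   # verify the scopes on both CFs
```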
Re: Opinions wanted: new site skin
Hi, The sample website does not look good on an iPhone 6. Its content is unreadable since the page layout is not using the width of the iPhone screen. Thanks, Anil On Tue, Oct 27, 2015 at 6:29 PM, Misty Stanley-Jones < mstanleyjo...@cloudera.com> wrote: > If you looked right away, please look again. I didn't realize that a weird > font was being used from Google Fonts, because it was not loading locally > for me. That's been fixed now and a more normal readable font (in my > opinion) is being used. > > On Wed, Oct 28, 2015 at 10:03 AM, Misty Stanley-Jones < > mstanleyjo...@cloudera.com> wrote: > > > All, > > > > Here is another version for your consideration. Please check it out at > > different resolutions and browser sizes if you can. > > http://mstanleyjones.github.io/hbase/reflow_update/index.html > > > > If you go to > > http://mstanleyjones.github.io/hbase/reflow_update/dependency-info.html > > and a few other parts of the site, you will notice the built-in syntax > > highlighting. > > > > This version does not have a site search, and I have no clue how to add > > the Hadoop site search, Stack. Maybe that can be a phase 2 where someone > > smarter can help me figure it out. > > > > Thanks for your help, > > Misty > > > > On Fri, Oct 23, 2015 at 3:17 PM, Misty Stanley-Jones < > > mstanleyjo...@cloudera.com> wrote: > > > >> Hi all, > >> > >> We are currently using the reFlow Maven site skin. I went looking around > >> and found Fluido, which seems to be a bit more extensible. I built and > >> staged a version of the site at > >> http://mstanleyjones.github.io/hbase/index.html. Note the Github ribbon > >> and the Google site search. I'm curious to know what you think. > >> > >> I also put the 0.94 docs menu as a submenu of the Documentation menu, to > >> see how it looked. > >> > >> Thanks, > >> Misty > >> > > > > > -- Thanks & Regards, Anil Gupta
Re: Opinions wanted: new site skin
Here u go: Sent from my iPhone > On Oct 28, 2015, at 3:40 PM, Misty Stanley-Jones <mstanleyjo...@cloudera.com> > wrote: > > You're looking at the wrong staged site. Please look at the one in the > reflow_update/ directory. > >> On Oct 29, 2015, at 8:38 AM, Andrew Purtell <apurt...@apache.org> wrote: >> >> Can we remove the "fork me on GitHub banner"? We're not currently accepting >> pull requests. Remove this and I'll be +1. Until then -1, although >> otherwise it looks great. >> >> >>> On Wed, Oct 28, 2015 at 2:54 PM, Elliott Clark <ecl...@apache.org> wrote: >>> >>> Looks great with the white. +1 >>> >>> On Wed, Oct 28, 2015 at 2:52 PM, Misty Stanley-Jones < >>> mstanleyjo...@cloudera.com> wrote: >>> >>>> The grey background was inadvertent and has now been changed to white, if >>>> you refresh. >>>> >>>> Please click around and try the menus etc, as well. >>>> >>>> By the way, I know that the docs don't look great on a mobile phone, but >>>> that's a totally different issue to solve, not related to the Maven site >>>> styling. >>>> >>>>> On Thu, Oct 29, 2015 at 4:13 AM, Stack <st...@duboce.net> wrote: >>>>> >>>>> It looks lovely on a nexus (smile). >>>>> >>>>> Site looks good to me. Not sure about background light grey but all the >>>>> rest I like. >>>>> >>>>> St.Ack >>>>> >>>>> >>>>> >>>>> On Wed, Oct 28, 2015 at 11:08 AM, anil gupta <anilgupt...@gmail.com> >>>>> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> Sample website does not looks good on Iphone6. Its content is >>>> unreadable >>>>>> since page layout is not using width of iphone screen. >>>>>> >>>>>> Thanks, >>>>>> Anil >>>>>> >>>>>> On Tue, Oct 27, 2015 at 6:29 PM, Misty Stanley-Jones < >>>>>> mstanleyjo...@cloudera.com> wrote: >>>>>> >>>>>>> If you looked right away, please look again. I didn't realize that >>> a >>>>>> weird >>>>>>> font was being used from Google Fonts, because it was not loading >>>>> locally >>>>>>> for me. 
That's been fixed now and a more normal readable font (in >>> my >>>>>>> opinion) is being used. >>>>>>> >>>>>>> On Wed, Oct 28, 2015 at 10:03 AM, Misty Stanley-Jones < >>>>>>> mstanleyjo...@cloudera.com> wrote: >>>>>>> >>>>>>>> All, >>>>>>>> >>>>>>>> Here is another version for your consideration. Please check it >>> out >>>>> at >>>>>>>> different resolutions and browser sizes if you can. >>>>>>>> http://mstanleyjones.github.io/hbase/reflow_update/index.html >>>>>>>> >>>>>>>> If you go to >>>> http://mstanleyjones.github.io/hbase/reflow_update/dependency-info.html >>>>>>>> and a few other parts of the site, you will notice the built-in >>>>> syntax >>>>>>>> highlighting. >>>>>>>> >>>>>>>> This version does not have a site search, and I have no clue how >>> to >>>>> add >>>>>>>> the Hadoop site search, Stack. Maybe that can be a phase 2 where >>>>>> someone >>>>>>>> smarter can help me figure it out. >>>>>>>> >>>>>>>> Thanks for your help, >>>>>>>> Misty >>>>>>>> >>>>>>>> On Fri, Oct 23, 2015 at 3:17 PM, Misty Stanley-Jones < >>>>>>>> mstanleyjo...@cloudera.com> wrote: >>>>>>>> >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> We are currently using the reFlow Maven site skin. I went >>> looking >>>>>> around >>>>>>>>> and found Fluido, which seems to be a bit more extensible. I >>> built >>>>> and >>>>>>>>> staged a version of the site at >>>>>>>>> http://mstanleyjones.github.io/hbase/index.html. Note the >>> Github >>>>>> ribbon >>>>>>>>> and the Google site search. I'm curious to know what you think. >>>>>>>>> >>>>>>>>> I also put the 0.94 docs menu as a submenu of the Documentation >>>>> menu, >>>>>> to >>>>>>>>> see how it looked. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Misty >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Thanks & Regards, >>>>>> Anil Gupta >> >> >> >> -- >> Best regards, >> >> - Andy >> >> Problems worthy of attack prove their worth by hitting back. - Piet Hein >> (via Tom White)
Re: start_replication command not available in hbase shell in HBase0.98
Hi Ashish, Sorry for such a late reply. We had a "-" in peer name so we ran into https://issues.apache.org/jira/browse/HBASE-11394. Thanks for offering help. ~Anil On Tue, Oct 13, 2015 at 8:40 PM, Ashish Singhi < ashish.singhi.apa...@gmail.com> wrote: > Hi Anil. > > I did not check this in 0.98. > By default when ever we add a peer, its state will be ENABLED. > > There is no child node for peer-state so its 'ls' output will be empty, you > can use ZK 'get' command to find its value but the output will not be in > human readable format. > > To check the peer-state value you can use zk_dump command in hbase shell or > from web UI. > > Did you find any errors in the RS logs for replication ? > > Regards, > Ashish Singhi > > On Wed, Oct 14, 2015 at 5:04 AM, anil gupta <anilgupt...@gmail.com> wrote: > > > I found that those command are deprecated as per this Jira: > > https://issues.apache.org/jira/browse/HBASE-8861 > > > > Still, after enabling peers the replication is not starting. We looked > into > > zk. Its peer state value is null/blank: > > zknode: ls /hbase-unsecure/replication/peers/prod-hbase/peer-state > > [] > > > > Can anyone tell me what is probably going on? > > > > On Tue, Oct 13, 2015 at 3:56 PM, anil gupta <anilgupt...@gmail.com> > wrote: > > > > > Hi All, > > > > > > I am using HBase 0.98(HDP2.2). > > > As per the documentation here: > > > > > > > > > http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-3-1/CDH4-Installation-Guide/cdh4ig_topic_20_11.html > > > > > > I am trying to run start_replication command. But, i m getting > following > > > error: > > > hbase(main):013:0> start_replication > > > NameError: undefined local variable or method `start_replication' for > > > # > > > > > > Is start_replication not a valid command in HBase0.98? If its > deprecated > > > then what is the alternate command? 
> > > > > > -- > > > Thanks & Regards, > > > Anil Gupta > > > > > > > > > > > -- > > Thanks & Regards, > > Anil Gupta > > > -- Thanks & Regards, Anil Gupta
Re: Re: transfer data from hbase0.98 to hbase1.1.0 using exportSnapShot
Hi, As far as I know, exporting a snapshot from 0.98 -> 1.0 should work. You can verify this by creating a test table, putting a couple of rows in it, exporting a snapshot of that table, and cloning the exported snapshot on the remote cluster. Thanks, Anil Gupta On Sat, Oct 17, 2015 at 12:30 AM, whodarewin2006 <whodarewin2...@126.com> wrote: > > > hi,Ted > I have read the web page you give,thanks a lot.But the page didn't > mention if we can use ExportSnapShot to transfer data between different > version of hbase(0.98.6->1.0.1.1),do you know this? > Thanks again! > > > > > > > > At 2015-10-15 23:06:04, "Ted Yu" <yuzhih...@gmail.com> wrote: > >See recent thread: http://search-hadoop.com/m/YGbbQfg0W1Onv5j > > > >On Thu, Oct 15, 2015 at 3:42 AM, whodarewin2006 <whodarewin2...@126.com> > >wrote: > > > >> sorry,the subject is wrong,we want to transfer data from hbase0.98.6 to > >> hbase 1.0.1.1 > >> > >> > >> > >> > >> > >> > >> > >> > >> At 2015-10-15 18:34:17, "whodarewin2006" <whodarewin2...@126.com> > wrote: > >> >hi, > >> >We upgrade our hbase cluster from hbase0.98.6 to hbase1.0.1.1,and > we > >> want to transfer our data from old cluster to new cluster using > >> ExportSnapshot,is this OK?Will this operation crash our new cluster down > >> cause different file format? > >> > -- Thanks & Regards, Anil Gupta
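The verification suggested above might look like the following (the table name, snapshot name, and remote NameNode address are placeholders):

```
# On the 0.98.6 source cluster: create a tiny table and snapshot it
hbase shell <<'EOF'
create 'snap_test', 'cf'
put 'snap_test', 'r1', 'cf:q', 'v1'
snapshot 'snap_test', 'snap_test-1'
EOF

# Copy the snapshot into the 1.0.1.1 cluster's hbase.rootdir
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot snap_test-1 -copy-to hdfs://remote-nn:8020/hbase -mappers 4

# On the destination cluster: the snapshot should be listed, then clone it
hbase shell <<'EOF'
list_snapshots
clone_snapshot 'snap_test-1', 'snap_test_clone'
EOF
```

If the clone succeeds and a scan of `snap_test_clone` returns the test row, the cross-version export path is safe for the real tables.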
Export Snapshot to remote cluster and then Clone_Snapshot from exported data
Hi, I exported a snapshot of a table to a remote cluster. Now I want to create a table on the remote cluster using that exported snapshot. I did this around 2 years ago (on 0.94) but unfortunately I don't remember the steps now. I tried to search the mailing list archive and the HBase documentation but I can't find steps to accomplish my task. Can anyone provide me the steps or point me to documentation? -- Thanks & Regards, Anil Gupta
Re: Export Snapshot to remote cluster and then Clone_Snapshot from exported data
I am using 0.98. I used those doc instructions to export the snapshot. What do you mean by not exporting it to the correct directory? I am using HDP. Do you mean that I just need to copy this exported data into the same directory structure as the other snapshots? > On Wed, Oct 14, 2015 at 11:36 AM, Samir Ahmic <ahmic.sa...@gmail.com> wrote: > What version of hbase you are using ? What did you use to export snapshots > to remote cluster? Please take look > http://hbase.apache.org/book.html#ops.snapshots. Maybe you did not exported > snapshots to correct directory. Check your hdfs directories to locate > snapshots. > > Regards > Samir > > On Wed, Oct 14, 2015 at 8:25 PM, anil gupta <anilgupt...@gmail.com> wrote: > > > I dont see the snapshot when i run "list_snapshot" on destination > > cluster.(i checked that initially but forgot to mention in my post) > > Is it supposed to be listed in output of "list_snapshots" command? > > > > On Wed, Oct 14, 2015 at 11:19 AM, Samir Ahmic <ahmic.sa...@gmail.com> > > wrote: > > > > > Hi, > > > Can you see snapshot on remote cluster? If you can see snapshot you can > > use > > > clone snapshot command from hbase shell to create table. > > > Regards > > > Samir > > > On Oct 14, 2015 6:38 PM, "anil gupta" <anilgupt...@gmail.com> wrote: > > > > > > > Hi, > > > > > > > > I exported snapshot of a table to remote cluster. Now, i want to create > > > > table on remote cluster using that exported snapshot. I have done this > > > > around 2 years ago(on 0.94) but unfortunately, i dont remember steps > > now. > > > > > > > > I tried to search mailing list archive and HBase documentation but i > > can > > > > find steps to accomplish my task. Can anyone provide me the steps or > > > point > > > > me to documentation? > > > > > > > > -- > > > > Thanks & Regards, > > > > Anil Gupta > > > > > > > > > > > > > > > -- > > Thanks & Regards, > > Anil Gupta > > -- Thanks & Regards, Anil Gupta
Re: Export Snapshot to remote cluster and then Clone_Snapshot from exported data
I dont see the snapshot when i run "list_snapshot" on destination cluster.(i checked that initially but forgot to mention in my post) Is it supposed to be listed in output of "list_snapshots" command? On Wed, Oct 14, 2015 at 11:19 AM, Samir Ahmic <ahmic.sa...@gmail.com> wrote: > Hi, > Can you see snapshot on remote cluster? If you can see snapshot you can use > clone snapshot command from hbase shell to create table. > Regards > Samir > On Oct 14, 2015 6:38 PM, "anil gupta" <anilgupt...@gmail.com> wrote: > > > Hi, > > > > I exported snapshot of a table to remote cluster. Now, i want to create > > table on remote cluster using that exported snapshot. I have done this > > around 2 years ago(on 0.94) but unfortunately, i dont remember steps now. > > > > I tried to search mailing list archive and HBase documentation but i can > > find steps to accomplish my task. Can anyone provide me the steps or > point > > me to documentation? > > > > -- > > Thanks & Regards, > > Anil Gupta > > > -- Thanks & Regards, Anil Gupta
Re: Export Snapshot to remote cluster and then Clone_Snapshot from exported data
Hi Samir, You are right. But, HBase documentation didnt mention strict requirement of correct hbase directory. So, i have to do few more trials to come up with correct destination directory. As per my analysis, export directory should be . In cdh, rootdir is "/hbase" while in HDP, its "/apps/hbase/data". Hence, i ran into this problem. I am going to open documentation bug in HBase. Thanks for your help. Anil On Wed, Oct 14, 2015 at 1:27 PM, Samir Ahmic <ahmic.sa...@gmail.com> wrote: > If you exported snapshot with ExportSnapshot tool you shoud have "archive" > and ".hbase-snapshot" directories on destination cluster in > hbase.root.dir(usually /hbase). Inside ".hbase-snapshot" directory you > should see your snapshot. If your snapshot data is copied somewhere else > you will not see snapshots with list_snapshots command. Try to locate > snapshot directories on destination cluster and move data to correct > locations. > > Regards > Samir > > On Wed, Oct 14, 2015 at 9:10 PM, anil gupta <anilgupt...@gmail.com> wrote: > > > I am using 0.98. I used that doc instructions to export the snapshot. > What > > do you mean by not exporting it to correct directory? > > I am using HDP. Do you mean to that i just need to copy this exported in > > same directory structure as other snapshots? > > > > > On Wed, Oct 14, 2015 at 11:36 AM, Samir Ahmic <ahmic.sa...@gmail.com> > > wrote: > > > What version of hbase you are using ? What did you use to export > > snapshots > > > to remote cluster? Please take look > > > http://hbase.apache.org/book.html#ops.snapshots. Maybe you did not > > exported > > > snapshots to correct directory. Check your hdfs directories to locate > > > snapshots. 
> > > > > > Regards > > > Samir > > > > > > On Wed, Oct 14, 2015 at 8:25 PM, anil gupta <anilgupt...@gmail.com> > > wrote: > > > > > > > I dont see the snapshot when i run "list_snapshot" on destination > > > > cluster.(i checked that initially but forgot to mention in my post) > > > > Is it supposed to be listed in output of "list_snapshots" command? > > > > > > > > On Wed, Oct 14, 2015 at 11:19 AM, Samir Ahmic <ahmic.sa...@gmail.com > > > > > > wrote: > > > > > > > > > Hi, > > > > > Can you see snapshot on remote cluster? If you can see snapshot you > > can > > > > use > > > > > clone snapshot command from hbase shell to create table. > > > > > Regards > > > > > Samir > > > > > On Oct 14, 2015 6:38 PM, "anil gupta" <anilgupt...@gmail.com> > wrote: > > > > > > > > > > > Hi, > > > > > > > > > > > > I exported snapshot of a table to remote cluster. Now, i want to > > create > > > > > > table on remote cluster using that exported snapshot. I have done > > this > > > > > > around 2 years ago(on 0.94) but unfortunately, i dont remember > > steps > > > > now. > > > > > > > > > > > > I tried to search mailing list archive and HBase documentation > but > > i > > > > can > > > > > > find steps to accomplish my task. Can anyone provide me the steps > > or > > > > > point > > > > > > me to documentation? > > > > > > > > > > > > -- > > > > > > Thanks & Regards, > > > > > > Anil Gupta > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > Thanks & Regards, > > > > Anil Gupta > > > > > > > > > > > > -- > > Thanks & Regards, > > Anil Gupta > > > -- Thanks & Regards, Anil Gupta
Re: Export Snapshot to remote cluster and then Clone_Snapshot from exported data
Created this: https://issues.apache.org/jira/browse/HBASE-14612 On Wed, Oct 14, 2015 at 10:18 PM, anil gupta <anilgupt...@gmail.com> wrote: > Hi Samir, > > You are right. But, HBase documentation didnt mention strict requirement > of correct hbase directory. So, i have to do few more trials to come up > with correct destination directory. As per my analysis, export directory > should be . > > In cdh, rootdir is "/hbase" while in HDP, its "/apps/hbase/data". Hence, i > ran into this problem. > I am going to open documentation bug in HBase. > Thanks for your help. > Anil > > On Wed, Oct 14, 2015 at 1:27 PM, Samir Ahmic <ahmic.sa...@gmail.com> > wrote: > >> If you exported snapshot with ExportSnapshot tool you shoud have "archive" >> and ".hbase-snapshot" directories on destination cluster in >> hbase.root.dir(usually /hbase). Inside ".hbase-snapshot" directory you >> should see your snapshot. If your snapshot data is copied somewhere else >> you will not see snapshots with list_snapshots command. Try to locate >> snapshot directories on destination cluster and move data to correct >> locations. >> >> Regards >> Samir >> >> On Wed, Oct 14, 2015 at 9:10 PM, anil gupta <anilgupt...@gmail.com> >> wrote: >> >> > I am using 0.98. I used that doc instructions to export the snapshot. >> What >> > do you mean by not exporting it to correct directory? >> > I am using HDP. Do you mean to that i just need to copy this exported in >> > same directory structure as other snapshots? >> > >> > > On Wed, Oct 14, 2015 at 11:36 AM, Samir Ahmic <ahmic.sa...@gmail.com> >> > wrote: >> > > What version of hbase you are using ? What did you use to export >> > snapshots >> > > to remote cluster? Please take look >> > > http://hbase.apache.org/book.html#ops.snapshots. Maybe you did not >> > exported >> > > snapshots to correct directory. Check your hdfs directories to locate >> > > snapshots. 
>> > > >> > > Regards >> > > Samir >> > > >> > > On Wed, Oct 14, 2015 at 8:25 PM, anil gupta <anilgupt...@gmail.com> >> > wrote: >> > > >> > > > I dont see the snapshot when i run "list_snapshot" on destination >> > > > cluster.(i checked that initially but forgot to mention in my post) >> > > > Is it supposed to be listed in output of "list_snapshots" command? >> > > > >> > > > On Wed, Oct 14, 2015 at 11:19 AM, Samir Ahmic < >> ahmic.sa...@gmail.com> >> > > > wrote: >> > > > >> > > > > Hi, >> > > > > Can you see snapshot on remote cluster? If you can see snapshot >> you >> > can >> > > > use >> > > > > clone snapshot command from hbase shell to create table. >> > > > > Regards >> > > > > Samir >> > > > > On Oct 14, 2015 6:38 PM, "anil gupta" <anilgupt...@gmail.com> >> wrote: >> > > > > >> > > > > > Hi, >> > > > > > >> > > > > > I exported snapshot of a table to remote cluster. Now, i want to >> > create >> > > > > > table on remote cluster using that exported snapshot. I have >> done >> > this >> > > > > > around 2 years ago(on 0.94) but unfortunately, i dont remember >> > steps >> > > > now. >> > > > > > >> > > > > > I tried to search mailing list archive and HBase documentation >> but >> > i >> > > > can >> > > > > > find steps to accomplish my task. Can anyone provide me the >> steps >> > or >> > > > > point >> > > > > > me to documentation? >> > > > > > >> > > > > > -- >> > > > > > Thanks & Regards, >> > > > > > Anil Gupta >> > > > > > >> > > > > >> > > > >> > > > >> > > > >> > > > -- >> > > > Thanks & Regards, >> > > > Anil Gupta >> > > > >> > >> > >> > >> > -- >> > Thanks & Regards, >> > Anil Gupta >> > >> > > > > -- > Thanks & Regards, > Anil Gupta > -- Thanks & Regards, Anil Gupta
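To summarize the fix discussed in this thread: the `-copy-to` target must be the destination cluster's `hbase.rootdir`, which differs by distribution. The paths below are the usual defaults for HDP and CDH and the NameNode address is a placeholder — check `hbase-site.xml` on the destination:

```
# HDP destination (hbase.rootdir = /apps/hbase/data)
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot my-snapshot -copy-to hdfs://remote-nn:8020/apps/hbase/data

# CDH / stock HBase destination (hbase.rootdir = /hbase)
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot my-snapshot -copy-to hdfs://remote-nn:8020/hbase

# Then, in the hbase shell on the destination cluster:
#   list_snapshots                          -- the snapshot should now appear
#   clone_snapshot 'my-snapshot', 'my_table'
```

Exporting under the correct rootdir is what makes the snapshot land in `.hbase-snapshot/` and `archive/`, so `list_snapshots` can see it and `clone_snapshot` can materialize the table.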
start_replication command not available in hbase shell in HBase0.98
Hi All, I am using HBase 0.98 (HDP 2.2). As per the documentation here: http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-3-1/CDH4-Installation-Guide/cdh4ig_topic_20_11.html I am trying to run the start_replication command, but I'm getting the following error: hbase(main):013:0> start_replication NameError: undefined local variable or method `start_replication' for # Is start_replication not a valid command in HBase 0.98? If it's deprecated, then what is the alternate command? -- Thanks & Regards, Anil Gupta
Re: start_replication command not available in hbase shell in HBase0.98
I found that those command are deprecated as per this Jira: https://issues.apache.org/jira/browse/HBASE-8861 Still, after enabling peers the replication is not starting. We looked into zk. Its peer state value is null/blank: zknode: ls /hbase-unsecure/replication/peers/prod-hbase/peer-state [] Can anyone tell me what is probably going on? On Tue, Oct 13, 2015 at 3:56 PM, anil gupta <anilgupt...@gmail.com> wrote: > Hi All, > > I am using HBase 0.98(HDP2.2). > As per the documentation here: > > http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-3-1/CDH4-Installation-Guide/cdh4ig_topic_20_11.html > > I am trying to run start_replication command. But, i m getting following > error: > hbase(main):013:0> start_replication > NameError: undefined local variable or method `start_replication' for > # > > Is start_replication not a valid command in HBase0.98? If its deprecated > then what is the alternate command? > > -- > Thanks & Regards, > Anil Gupta > -- Thanks & Regards, Anil Gupta
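For reference, on 0.98 replication is managed per peer from the shell rather than via the removed start/stop_replication commands. A sketch (the peer id and ZooKeeper quorum are placeholders — and per HBASE-11394, mentioned later in this thread, the peer id should not contain a "-"):

```
# In the hbase shell on the source cluster
add_peer 'peer1', 'zk1,zk2,zk3:2181:/hbase-unsecure'
list_peers              # newly added peers default to state ENABLED
enable_peer 'peer1'     # re-enable if it was previously disabled
status 'replication'    # per-RS replication source/sink metrics
```

`zk_dump` in the shell (or the master web UI) is the human-readable way to inspect the peer-state znode, since reading it directly with the ZK CLI returns non-readable bytes.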
Re: Does adding new columns cause compaction storm?
Hi Liren, In short, adding new columns will *not* trigger compaction. THanks, Anil Gupta On Sat, Oct 10, 2015 at 9:20 PM, Liren Ding <sky.gonna.bri...@gmail.com> wrote: > Thanks Ted. So far I don't see direct answer yet in any hbase books or > articles. all resources say that values are ordered by rowkey:cf:column, > but no one explains how new columns are stored after compaction. I think > after compaction the store files should still follow the same way to > organize data. So if a new column need to be added in all rows regularly, > the compaction might have to extra works I/O operations accordingly. Maybe > the schema design better to keep old data intact instead of keep adding new > columns into it. > > On Sat, Oct 10, 2015 at 7:55 PM, Ted Yu <yuzhih...@gmail.com> wrote: > > > Please take a look at: > > > > http://hbase.apache.org/book.html#_compaction > > http://hbase.apache.org/book.html#exploringcompaction.policy > > > > > http://hbase.apache.org/book.html#compaction.ratiobasedcompactionpolicy.algorithm > > > > FYI > > > > On Sat, Oct 10, 2015 at 6:53 PM, Liren Ding <sky.gonna.bri...@gmail.com> > > wrote: > > > > > Hi, > > > > > > I am trying to design a schema for time series events data. The row key > > is > > > eventId, and event data is added into new "date" columns daily. So in a > > > query I only need to set filter on columns to find all data for > specified > > > events. The table should look like following: > > > > > > rowkey | 09-01-2015 | 09-02-2015 | .. > > > > > > eventid1 data11 data12 > > > eventid2 data21 data22 > > > eventid3 ..,.. > > > ... > > > > > > I know during compaction the data with same row key will be stored > > > together. So with this design, will new columns cause compaction storm? > > Or > > > any other issues? > > > Appreciate! > > > > > > -- Thanks & Regards, Anil Gupta
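A sketch of the daily-column pattern being discussed, in the hbase shell (table, CF, and qualifier names are illustrative):

```
# One row per event; each day's data lands in a new column qualifier
put 'events', 'eventid1', 'cf:09-01-2015', 'data11'
put 'events', 'eventid1', 'cf:09-02-2015', 'data12'

# Read one event's data for a date window with a column range filter;
# both bounds are inclusive here (the boolean arguments)
scan 'events', {STARTROW => 'eventid1', STOPROW => 'eventid2',
  FILTER => "ColumnRangeFilter('09-01-2015', true, '09-15-2015', true)"}
```

New qualifiers are just new KeyValues; compaction rewrites store files on its normal schedule regardless of how many qualifiers a row has, which is why adding columns does not by itself trigger a compaction storm.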
Re: alter column family - possible operational impacts on big tables
Hi Nicolas, For a table with 5k regions, it should not take more than 10 min for alter table operations. Also, in HBase 1.0+, alter table operations does not require disabling the table. So, you are encouraged to upgrade. Sent from my iPhone > On Oct 9, 2015, at 1:15 AM, Nicolae Marasoiu> wrote: > > Hi, > > Indeed, we have tables with 1-5000 regions, distributed on 10-15 RSs. > > A few hours are sufficient to do the alter one a single such table, right? > > Thanks, > Nicu > > > From: Jean-Marc Spaggiari > Sent: Thursday, October 8, 2015 10:19 PM > To: user > Subject: Re: alter column family - possible operational impacts on big tables > > Hi Nicu, > > Indeed, with 0.94 you have to disable the table before doing the alter. > However, for 30 regions, it should be pretty fast. When you say 30+, are > you talking about like 1K regions? Or more like 32? The alter will only > update the meta table, so not that much impact on the servers. And no > compactions required for that. The ttl will only take effect at the next > compaction by, as you said, filtering out more records. > > JM > > 2015-10-08 10:49 GMT-04:00 Nicolae Marasoiu : > >> Hi, >> >> >> If we run at night an alter column family, set ttl, my understanding is >> that it will disable the table, make the alter, and re-enable the table, >> which can be some time for large tables with 30+ regions (hbase version >> 0.94 [image: ☹] ). >> >> >> Do you have any advice about this? How long can it take per region? What >> is the operational hit at the time of the alter command being issued, and >> what when compaction runs on the table? I imagine that compaction is not >> too affected by this, just by filtering out more records when re-writing >> the new HFiles, is this correct? >> >> >> Thanks, >> >> Nicu >>
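The 0.94-era flow being described looks roughly like this (the TTL value and names are illustrative; on 1.0+ the disable/enable pair is unnecessary):

```
# hbase shell, 0.94: the table must be offline for the alter
disable 'mytable'
alter 'mytable', {NAME => 'cf', TTL => 2592000}   # 30 days, in seconds
enable 'mytable'

# Expired cells are only physically dropped when compaction rewrites HFiles,
# so disk space is reclaimed lazily; force it if needed:
major_compact 'mytable'
```

The alter itself is a metadata change, which is why it is fast even across thousands of regions; the real I/O cost shows up later, spread across the normal compaction cycle.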
Re: Exporting a snapshot to external cluster
Hi Akmal, It will be better if you use name service value. You will not need to worry about which NN is active. I believe you can find that property in Hadoop's core-site.xml file. Sent from my iPhone On Sep 24, 2015, at 7:23 AM, Akmal Abbasovwrote: >> My suggestion is different. You should put remote NN HA configuration in >> hdfs-site.xml. > ok, in case I’ll put it, still how I can determine which of those 2 namenodes > is active? > >> On 24 Sep 2015, at 15:56, Serega Sheypak wrote: >> >> Have no Idea, some guys try to use "curl" to determine active NN. >> My suggestion is different. You should put remote NN HA configuration in >> hdfs-site.xml. >> >> 2015-09-24 14:33 GMT+02:00 Akmal Abbasov : >> add remote cluster HA configuration to your "local" hdfs client configuration >>> I am using the following command in script >>> $HBASE_PATH/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot >>> -snapshot snapshot-name -copy-to hdfs://remote_hbase_master/hbase >>> >>> In this case how I can know which namenode is active? >>> >>> Thanks! >>> > On 23 Sep 2015, at 12:14, Serega Sheypak wrote: > 1. to know which of the HDFS namenode is active add remote cluster HA configuration to your "local" hdfs client configuration > Afaik, it should be done through zookeeper, but through which API it >>> will be more convenient? no,no,no use hdfs-site.xml configuration. You need to add configuration for remote NN HA and your local hdfs client would correctly resolve active NN. 2015-09-23 11:32 GMT+02:00 Akmal Abbasov : > Hi all, > I would like to know the best practice when exporting a snapshot to >>> remote > hbase cluster with ha configuration. > My assumption is: > 1. to know which of the HDFS namenode is active > 2. export snapshot to active namenode > > Since I need to do this programmatically what is the best way to know > which namenode is active? > Afaik, it should be done through zookeeper, but through which API it >>> will > be more convenient? > > Thanks. >
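The client-side HA configuration being suggested looks roughly like this in `hdfs-site.xml` (the nameservice id `remotens` and the hostnames are placeholders for the remote cluster's actual values):

```xml
<property><name>dfs.nameservices</name><value>remotens</value></property>
<property><name>dfs.ha.namenodes.remotens</name><value>nn1,nn2</value></property>
<property><name>dfs.namenode.rpc-address.remotens.nn1</name><value>nn1.example.com:8020</value></property>
<property><name>dfs.namenode.rpc-address.remotens.nn2</name><value>nn2.example.com:8020</value></property>
<property>
  <name>dfs.client.failover.proxy.provider.remotens</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

With this in place, `ExportSnapshot ... -copy-to hdfs://remotens/hbase` works regardless of which NameNode is active, because the HDFS client resolves the active NN itself.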
Re: Hbase import/export change number of rows
How many rows are expected? Can you do sanity checking in your data to make sure there are no duplicate rowkeys? Sent from my iPhone > On Sep 22, 2015, at 8:35 AM, OM PARKASH Nain> wrote: > > I using two methods for row count: > > hbase shell: > > count "Table1" > > another is: > > hbase org.apache.hadoop.hbase.mapreduce.RowCounter "Table1" > > Both give same number of row but export have different number of rows. > > hbase org.apache.hadoop.hbase.mapreduce.Export "Table1" "hdfs path" > > > > > On Tue, Sep 22, 2015 at 5:33 PM, OM PARKASH Nain > wrote: > >> I am using Hbase export using command. >> >> hbase org.apache.hadoop.hbase.mapreduce.Export "Table1" "hdfs path" >> >> Then I use import command from HDFS to Hbase Table; >> >> hbase org.apache.hadoop.hbase.mapreduce.Import "hdfs path" "Table2" >> >> Then I count number of row in both tables, I found mismatch number of rows >> >> Table1:8301 Table2:8032 >> >> Please define what goes wrong with my system. >>
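One thing worth checking in a case like this: Export copies a limited number of cell versions within an optional time range, and Import replays them as ordinary puts, so deletes, TTL expiry, or versioning between the export and the count can account for a mismatch. The tool accepts optional arguments (the output paths are placeholders):

```
# Usage: Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
hbase org.apache.hadoop.hbase.mapreduce.Export "Table1" "/export/Table1" 2147483647

# Re-count both tables with the same mechanism for an apples-to-apples check
hbase org.apache.hadoop.hbase.mapreduce.RowCounter "Table1"
hbase org.apache.hadoop.hbase.mapreduce.RowCounter "Table2"
```

Comparing the RowCounter output on both tables, run at the same quiet moment, rules out counting-method differences before digging into the export itself.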
Re: Problem with HBase + Kerberos
ext(Unknown Source)
>    at sun.security.jgss.GSSManagerImpl.getMechanismContext(Unknown Source)
>    at sun.security.jgss.GSSContextImpl.initSecContext(Unknown Source)
>    at sun.security.jgss.GSSContextImpl.initSecContext(Unknown Source)
>    ... 19 more
> 2015-08-31 10:15:27,911 WARN [regionserver60020] regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
>
> Is there a kind of expiration limit for keytab credentials?
> Thanks for your help,
>
> Loïc
>
> Loïc CHANEL
> Engineering student at TELECOM Nancy
> Trainee at Worldline - Villeurbanne
>
> 2015-08-27 18:24 GMT+02:00 anil gupta <anilgupt...@gmail.com>:
>
>> Maybe this is related to some Ambari setup? Can you also ask on the Ambari
>> mailing list? IMO, secure HBase cluster connectivity has been working in
>> HBase for a very long time.
>>
>> On Thu, Aug 27, 2015 at 12:48 AM, Loïc Chanel <loic.cha...@telecomnancy.net> wrote:
>>
>>> I did not, but as I Kerberized my cluster with Ambari, it made the mandatory
>>> modifications.
>>>
>>> Loïc CHANEL
>>> Engineering student at TELECOM Nancy
>>> Trainee at Worldline - Villeurbanne
>>>
>>> 2015-08-27 1:17 GMT+02:00 Laurent H <laurent.hat...@gmail.com>:
>>>
>>>> Did you change some stuff in your hbase-site.xml when you installed
>>>> Kerberos?
>>>>
>>>> --
>>>> Laurent HATIER - Consultant Big Data & Business Intelligence chez CapGemini
>>>> fr.linkedin.com/pub/laurent-hatier/25/36b/a86/
>>>> <http://fr.linkedin.com/pub/laurent-h/25/36b/a86/>
>>>>
>>>> 2015-08-21 9:44 GMT+02:00 Loïc Chanel <loic.cha...@telecomnancy.net>:
>>>>
>>>>> Sorry if I didn't mention that, but yeah, I ran kinit before invoking hbase
>>>>> shell, and the klist command says that my user has a ticket.
>>>>>
>>>>> [root@host /]# klist
>>>>> Ticket cache: FILE:/tmp/krb5cc_0
>>>>> Default principal: testuser@REALM
>>>>>
>>>>> Valid starting       Expires              Service principal
>>>>> 08/21/15 09:39:33    08/22/15 09:39:33    krbtgt/REALM@REALM
>>>>>         renew until 08/21/15 09:39:33
>>>>>
>>>>> Loïc CHANEL
>>>>> Engineering student at TELECOM Nancy
>>>>> Trainee at Worldline - Villeurbanne
>>>>>
>>>>> 2015-08-21 6:12 GMT+02:00 anil gupta <anilgupt...@gmail.com>:
>>>>>
>>>>>> Did you run the kinit command before invoking "hbase shell"? What does
>>>>>> the klist command say?
>>>>>>
>>>>>> On Thu, Aug 20, 2015 at 6:47 AM, Loïc Chanel <loic.cha...@telecomnancy.net> wrote:
>>>>>>
>>>>>>> By the way, as this may help to find my issue, I just tested typing
>>>>>>> *whoami* in the HBase shell: this returned me exactly what it should:
>>>>>>> testuser@REALM (auth:KERBEROS)
>>>>>>> groups: nobody, toast
>>>>>>>
>>>>>>> Loïc CHANEL
>>>>>>> Engineering student at TELECOM Nancy
>>>>>>> Trainee at Worldline - Villeurbanne
>>>>>>>
>>>>>>> 2015-08-20 15:17 GMT+02:00 Loïc Chanel <loic.cha...@telecomnancy.net>:
>>>>>>>
>>>>>>>> Nothing more with your option :/
>>>>>>>>
>>>>>>>> Loïc CHANEL
>>>>>>>> Engineering student at TELECOM Nancy
>>>>>>>> Trainee at Worldline - Villeurbanne
>>>>>>>>
>>>>>>>> 2015-08-20 15:04 GMT+02:00 Loïc Chanel <loic.cha...@telecomnancy.net>:
>>>>>>>>
>>>>>>>>> I'm using HDP 2.2.4.2, with HBase 0.98.4.2.2.
>>>>>>>>> I have unlimited strength JCE installed.
>>>>>>>>>
>>>>>>>>> I'll try to have more clues with this option.
>>>>>>>>>
>>>>>>>>> Loïc CHANEL
>>>>>>>>> Engineering student at TELECOM Nancy
>>>>>>>
Re: Problem with HBase + Kerberos
Maybe this is related to some Ambari setup? Can you also ask on the Ambari mailing list? IMO, secure HBase cluster connectivity has been working in HBase for a very long time.

On Thu, Aug 27, 2015 at 12:48 AM, Loïc Chanel loic.cha...@telecomnancy.net wrote: I did not, but as I Kerberized my cluster with Ambari, it made the mandatory modifications. Loïc CHANEL Engineering student at TELECOM Nancy Trainee at Worldline - Villeurbanne

2015-08-27 1:17 GMT+02:00 Laurent H laurent.hat...@gmail.com: Did you change some stuff in your hbase-site.xml when you installed Kerberos? -- Laurent HATIER - Consultant Big Data & Business Intelligence chez CapGemini fr.linkedin.com/pub/laurent-hatier/25/36b/a86/ http://fr.linkedin.com/pub/laurent-h/25/36b/a86/

2015-08-21 9:44 GMT+02:00 Loïc Chanel loic.cha...@telecomnancy.net: Sorry if I didn't mention that, but yeah, I ran kinit before invoking hbase shell, and the klist command says that my user has a ticket.

[root@host /]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: testuser@REALM

Valid starting       Expires              Service principal
08/21/15 09:39:33    08/22/15 09:39:33    krbtgt/REALM@REALM
        renew until 08/21/15 09:39:33

Loïc CHANEL Engineering student at TELECOM Nancy Trainee at Worldline - Villeurbanne

2015-08-21 6:12 GMT+02:00 anil gupta anilgupt...@gmail.com: Did you run the kinit command before invoking hbase shell? What does the klist command say?
On Thu, Aug 20, 2015 at 6:47 AM, Loïc Chanel loic.cha...@telecomnancy.net wrote: By the way, as this may help to find my issue, I just tested typing *whoami* in the HBase shell: this returned me exactly what it should: testuser@REALM (auth:KERBEROS) groups: nobody, toast Loïc CHANEL Engineering student at TELECOM Nancy Trainee at Worldline - Villeurbanne

2015-08-20 15:17 GMT+02:00 Loïc Chanel loic.cha...@telecomnancy.net: Nothing more with your option :/ Loïc CHANEL Engineering student at TELECOM Nancy Trainee at Worldline - Villeurbanne

2015-08-20 15:04 GMT+02:00 Loïc Chanel loic.cha...@telecomnancy.net: I'm using HDP 2.2.4.2, with HBase 0.98.4.2.2. I have unlimited strength JCE installed. I'll try to have more clues with this option. Loïc CHANEL Engineering student at TELECOM Nancy Trainee at Worldline - Villeurbanne

2015-08-20 14:58 GMT+02:00 Ted Yu yuzhih...@gmail.com: Which hbase / hadoop release are you using? Running with -Dsun.security.krb5.debug=true will provide more clues. Do you have unlimited strength JCE installed? Cheers

On Thu, Aug 20, 2015 at 5:46 AM, Loïc Chanel loic.cha...@telecomnancy.net wrote: Hi all, Since I kerberized my cluster, it seems like I can't use HBase anymore... For example, executing create 'toto','titi' in the HBase shell results in the printing of this line endlessly: WARN [main] security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds before. And nothing else happens. I tried to restart HDFS and HBase, and to re-generate credentials and keytabs, but nothing changed.
As for the logs, they are not very explicit, as the only thing they say (and keep saying) is:

2015-08-20 13:50:12,697 DEBUG [RpcServer.reader=2,port=6] ipc.RpcServer: Created SASL server with mechanism = GSSAPI
2015-08-20 13:50:12,698 DEBUG [RpcServer.reader=2,port=6] ipc.RpcServer: Have read input token of size 650 for processing by saslServer.evaluateResponse()
2015-08-20 13:50:12,704 DEBUG [RpcServer.reader=2,port=6] ipc.RpcServer: Will send token of size 108 from saslServer.
2015-08-20 13:50:12,706 DEBUG [RpcServer.reader=2,port=6] ipc.RpcServer: Have read input token of size 0 for processing by saslServer.evaluateResponse()
2015-08-20 13:50:12,707 DEBUG [RpcServer.reader=2,port=6] ipc.RpcServer: Will send token of size 32 from saslServer.
2015-08-20 13:50:12,708 DEBUG [RpcServer.reader=2,port=6] ipc.RpcServer: RpcServer.listener,port=6: DISCONNECTING client 192.168.6.148:43014 because read count=-1. Number of active connections: 3

Does anyone have an idea about where this might come from, or how to solve it? Because I couldn't find much documentation about this. Thanks in advance for your help!

Loïc

Loïc CHANEL
Engineering student at TELECOM Nancy
Trainee at Worldline - Villeurbanne
Re: Using HBase with a shared filesystem (gluster, nfs, s3, etc)
AFAIK, region movement does not move the data of a region on the (distributed) FileSystem. It should only update HBase metadata. Did you check disk I/O stats during region movement?

On Tue, Aug 25, 2015 at 10:40 AM, Ted Yu yuzhih...@gmail.com wrote: Please see http://hbase.apache.org/book.html#regions.arch.assignment

On Tue, Aug 25, 2015 at 10:37 AM, donmai dood...@gmail.com wrote: NFS, 0.98.10. Will get to you as soon as I am able, on travel. Is my general understanding correct, though, that there shouldn't be any data movement from a region reassignment?

On Tue, Aug 25, 2015 at 12:40 PM, Ted Yu yuzhih...@gmail.com wrote: Can you give a bit more information: which filesystem you use, which hbase release you use, a master log snippet for the long region assignment. Thanks

On Tue, Aug 25, 2015 at 9:30 AM, donmai dood...@gmail.com wrote: Hi, I'm curious about how exactly region movement works with regard to data transfer. To my understanding from the docs, given an HDFS-backed cluster, a region movement / transition involves changing things in meta only; all data movement for locality is handled by HDFS. In the case where rootdir is a shared file system, there shouldn't be any data movement with a region reassignment, correct? I'm running into performance issues where region assignment takes a very long time and I'm trying to figure out why. Thanks!

--
Thanks & Regards,
Anil Gupta
Re: Problem with HBase + Kerberos
Did you run the kinit command before invoking hbase shell? What does the klist command say?

On Thu, Aug 20, 2015 at 6:47 AM, Loïc Chanel loic.cha...@telecomnancy.net wrote: By the way, as this may help to find my issue, I just tested typing *whoami* in the HBase shell: this returned me exactly what it should: testuser@REALM (auth:KERBEROS) groups: nobody, toast Loïc CHANEL Engineering student at TELECOM Nancy Trainee at Worldline - Villeurbanne

2015-08-20 15:17 GMT+02:00 Loïc Chanel loic.cha...@telecomnancy.net: Nothing more with your option :/ Loïc CHANEL Engineering student at TELECOM Nancy Trainee at Worldline - Villeurbanne

2015-08-20 15:04 GMT+02:00 Loïc Chanel loic.cha...@telecomnancy.net: I'm using HDP 2.2.4.2, with HBase 0.98.4.2.2. I have unlimited strength JCE installed. I'll try to have more clues with this option. Loïc CHANEL Engineering student at TELECOM Nancy Trainee at Worldline - Villeurbanne

2015-08-20 14:58 GMT+02:00 Ted Yu yuzhih...@gmail.com: Which hbase / hadoop release are you using? Running with -Dsun.security.krb5.debug=true will provide more clues. Do you have unlimited strength JCE installed? Cheers

On Thu, Aug 20, 2015 at 5:46 AM, Loïc Chanel loic.cha...@telecomnancy.net wrote: Hi all, Since I kerberized my cluster, it seems like I can't use HBase anymore... For example, executing create 'toto','titi' in the HBase shell results in the printing of this line endlessly: WARN [main] security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds before. And nothing else happens. I tried to restart HDFS and HBase, and to re-generate credentials and keytabs, but nothing changed.
As for the logs, they are not very explicit, as the only thing they say (and keep saying) is:

2015-08-20 13:50:12,697 DEBUG [RpcServer.reader=2,port=6] ipc.RpcServer: Created SASL server with mechanism = GSSAPI
2015-08-20 13:50:12,698 DEBUG [RpcServer.reader=2,port=6] ipc.RpcServer: Have read input token of size 650 for processing by saslServer.evaluateResponse()
2015-08-20 13:50:12,704 DEBUG [RpcServer.reader=2,port=6] ipc.RpcServer: Will send token of size 108 from saslServer.
2015-08-20 13:50:12,706 DEBUG [RpcServer.reader=2,port=6] ipc.RpcServer: Have read input token of size 0 for processing by saslServer.evaluateResponse()
2015-08-20 13:50:12,707 DEBUG [RpcServer.reader=2,port=6] ipc.RpcServer: Will send token of size 32 from saslServer.
2015-08-20 13:50:12,708 DEBUG [RpcServer.reader=2,port=6] ipc.RpcServer: RpcServer.listener,port=6: DISCONNECTING client 192.168.6.148:43014 because read count=-1. Number of active connections: 3

Does anyone have an idea about where this might come from, or how to solve it? Because I couldn't find much documentation about this. Thanks in advance for your help!

Loïc

Loïc CHANEL
Engineering student at TELECOM Nancy
Trainee at Worldline - Villeurbanne

--
Thanks & Regards,
Anil Gupta
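The -Dsun.security.krb5.debug=true flag Ted suggests is typically wired in through the HBase environment file. A sketch, assuming the conf/hbase-env.sh location of your distribution (on HDP it usually sits under /etc/hbase/conf):

```shell
# hbase-env.sh (path is distribution-specific): append the Kerberos debug
# flag so the JVM prints GSSAPI/krb5 negotiation details on the next restart.
export HBASE_OPTS="$HBASE_OPTS -Dsun.security.krb5.debug=true"
```

Remember to remove the flag afterwards; the debug output is very verbose.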
Re: hbase doubts
For #1, take a look at the following in hbase-default.xml:

<name>hbase.client.keyvalue.maxsize</name>
<value>10485760</value>

For #2, it would be easier to answer if you can outline the access patterns in your app.

For #3, the adjustment according to current region boundaries is done client side. Take a look at the javadoc for LoadQueueItem in LoadIncrementalHFiles.java. Cheers

On Mon, Aug 17, 2015 at 6:45 AM, Shushant Arora shushantaror...@gmail.com wrote: 1. Is there any max limit on the key size of an hbase table? 2. Multiple small tables vs one large table: which one is preferred? 3. For bulk load: when LoadIncrementalHFiles is run it recalculates the region splits based on region boundaries. Does this division happen on the client side, or on the server side (at the region server or hbase master), which then assigns the splits that cross a target region boundary to the desired regionserver?

--
Thanks & Regards,
Anil Gupta
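The property referenced above can be overridden in hbase-site.xml; a sketch with the shipped 10 MB default (per hbase-default.xml, a value of 0 or less disables the check):

```xml
<!-- hbase-site.xml: maximum allowed size of a single KeyValue (key + value).
     10485760 bytes = 10 MB is the shipped default. -->
<property>
  <name>hbase.client.keyvalue.maxsize</name>
  <value>10485760</value>
</property>
```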
Re: groupby(prefix(rowkey)) with multiple custom aggregated columns
Hi Nicu, Have you taken a look at Phoenix? It supports group by: https://phoenix.apache.org/language/index.html It will also provide you much more SQL-like querying on HBase.

On Fri, Aug 7, 2015 at 2:19 PM, Ted Yu yuzhih...@gmail.com wrote: Please take a look at hbase-client/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java which shows several other aggregations. BTW group by functionality would involve some more work, since rows for the same group may span multiple regions. Cheers

On Fri, Aug 7, 2015 at 9:42 AM, Nicolae Marasoiu nicolae.maras...@gmail.com wrote: Hi, I need to implement a limited SQL-like filter+group+order, where the group is on a fixed-length prefix of the rowkey (fixed per query), and the results are multiple metrics, including some custom ones like statistical unique counts. I noticed that the available tooling with coprocessors, like ColumnAggregationProtocol, involves just one metric, e.g. one sum(column). We collect many, and of course it is more efficient to scan the data once. Please advise, Nicu

--
Thanks & Regards,
Anil Gupta
Re: Disable Base64 encoding in Stargate request and Return as String
Thanks Andrew. I didn't intend to change the behavior of the hbase shell. I intend to provide an enhancement to HBase REST while not impacting its default behavior.

On Thu, Aug 6, 2015 at 5:29 PM, Andrew Purtell apurt...@apache.org wrote: "returned from the shell" Meant returned from the REST gateway.

On Thu, Aug 6, 2015 at 5:28 PM, Andrew Purtell apurt...@apache.org wrote: Unfortunately we can't change the current set of representations returned from the shell; that would be a backwards compatibility problem. We can however add new representations (selectable by way of the Accept header, e.g. Accept: text/plain). If you'd like to propose a patch we'd certainly look at it. Thanks.

On Wed, Aug 5, 2015 at 12:51 AM, anil gupta anilgupt...@gmail.com wrote: Hi Andrew, Thanks for sharing your thoughts. Sorry for the late reply, as I recently came back from vacation. I understand that HBase stores byte arrays, so it's hard for HBase to figure out the data type. What if the client knows that all the columns in the REST request are strings? In that case, can we give the option of setting a request header StringDecoding: true? By default, we can assume StringDecoding: false. Just some food for thought. Also, if we could replicate the encoding that we do in the HBase shell (where strings are shown in readable format and we hex-encode all binary data), that would be best. In that case, it would be really convenient to use the REST service rather than invoking the hbase shell. Right now, IMO, due to the lack of readability it's only good for fetching images (we store images in HBase). Provided my employer allows me to contribute, I am willing to work on this. Would HBase accept a patch? Thanks, Anil Gupta

On Fri, Jul 17, 2015 at 4:57 PM, Andrew Purtell apurt...@apache.org wrote: The closest you can get to just a string is to have your client use an accept header of Accept: application/octet-stream when making a query. This will return zero or one value in the response.
If a value is present in the table at the requested location, the response body will be the unencoded bytes. If you've stored a string, you'll get back a string. If you've stored an image, you'll get back the raw image bytes. Note that using an accept header of application/octet-stream implicitly limits you to queries that only return zero or one values. (Strictly speaking, per the package doc: If binary encoding is requested, only one cell can be returned, the first to match the resource specification. The row, column, and timestamp associated with the cell will be transmitted in X headers: X-Row, X-Column, and X-Timestamp, respectively. Depending on the precision of the resource specification, some of the X-headers may be elided as redundant.)

In general, the REST gateway supports several alternate encodings. See https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/rest/package-summary.html for some examples.

Note that HBase cell data is binary, not string. It does not make sense to turn off base64 encoding for the default response encoding, XML, because that would produce invalid XML if a value happens to include non-XML-safe bytes. HBase can't know that in advance. We need to encode keys and values in a safe manner to avoid blowing up your client's XML. The same is roughly true for JSON.

If your client sends an accept header of Accept: application/protobuf you'll get back a protobuf-encoded object. Your client will need to be prepared to handle that representation. This is probably not what you want.

Why are we even talking about using XML, JSON, or protobuf to encode responses? Because for many types of REST queries, HBase must return a structured response. The client has asked for more than simply one value, simply one string. The response must include keys, values, timestamps; maybe a whole row's worth of keys, values, and timestamps; maybe multiple rows. It depends on the query you issued. (See the 'Cell or Row Query (Multiple Values)' section in the package doc.)

On Fri, Jul 17, 2015 at 2:20 PM, anil gupta anilgupt...@gmail.com wrote: Hi All, We have a string rowkey. We have string values of cells. Still, Stargate returns the data with Base64 encoding, due to which a user can't read the data. Is there a way to disable Base64 encoding so that the REST request would just return strings? -- Thanks & Regards, Anil Gupta

--
Best regards,
- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

--
Thanks & Regards,
Anil Gupta
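Until a plain-text representation exists, a client can simply decode the base64 fields itself. A sketch in Ruby against a hand-constructed response body (not captured from a live cluster); the JSON shape, with row keys under "key", column names under "column", and cell values under "$", all base64-encoded, follows the REST package doc:

```ruby
require "json"
require "base64"

# Hand-constructed sample of a Stargate JSON response (illustration only).
body = '{"Row":[{"key":"cjE=","Cell":[{"column":"Zjp2YWw=","$":"aGVsbG8="}]}]}'

JSON.parse(body)["Row"].each do |row|
  key = Base64.decode64(row["key"])
  row["Cell"].each do |cell|
    column = Base64.decode64(cell["column"])
    value  = Base64.decode64(cell["$"])
    puts "#{key} #{column} => #{value}"
  end
end
```

This only round-trips cleanly when the stored bytes really are text, which is exactly the caveat Andrew raises above for binary cell data.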
Re: Disable Base64 encoding in Stargate request and Return as String
Hi Andrew, Thanks for sharing your thoughts. Sorry for late reply as i recently came back from vacation. I understand that HBase stores byte arrays, so its hard for HBase to figure out the data type. What if, the client knows that all the columns in the Rest request are Strings. In that case, can we give the option of setting a request header StringDecoding:True. By default, we can assume StringDecoding: false. Just some food for thought. Also, if we can replicate the Encoding that we do in HBase Shell(where string are shown in readable format and we hex encode all binary data). That would be best. In this case, it would be really convenient use of Rest service rather than invoking hbase shell. Right now, IMO, due to lack of readability its only good to fetch images.(we store images in HBase) Provided my employer allows me to contribute, I am willing to work on this. Would HBase accept a patch? Thanks, Anil Gupta On Fri, Jul 17, 2015 at 4:57 PM, Andrew Purtell apurt...@apache.org wrote: The closest you can get to just a string is have your client use an accept header of Accept: application/octet-stream with making a query. This will return zero or one value in the response. If a value is present in the table at the requested location, the response body will be the unencoded bytes. If you've stored a string, you'll get back a string. If you've stored an image, you'll get back the raw image bytes. Note that using an accept header of application/octet-stream implicitly limits you to queries that only return zero or one values. (Strictly speaking, per the package doc: If binary encoding is requested, only one cell can be returned, the first to match the resource specification. The row, column, and timestamp associated with the cell will be transmitted in X headers: X-Row, X-Column, and X-Timestamp, respectively. Depending on the precision of the resource specification, some of the X-headers may be elided as redundant.) 
In general, the REST gateway supports several alternate encodings. See https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/rest/package-summary.html for some examples.

Note that HBase cell data is binary, not string. It does not make sense to turn off base64 encoding for the default response encoding, XML, because that would produce invalid XML if a value happens to include non-XML-safe bytes. HBase can't know that in advance. We need to encode keys and values in a safe manner to avoid blowing up your client's XML. The same is roughly true for JSON.

If your client sends an accept header of Accept: application/protobuf you'll get back a protobuf-encoded object. Your client will need to be prepared to handle that representation. This is probably not what you want.

Why are we even talking about using XML, JSON, or protobuf to encode responses? Because for many types of REST queries, HBase must return a structured response. The client has asked for more than simply one value, simply one string. The response must include keys, values, timestamps; maybe a whole row's worth of keys, values, and timestamps; maybe multiple rows. It depends on the query you issued. (See the 'Cell or Row Query (Multiple Values)' section in the package doc.)

On Fri, Jul 17, 2015 at 2:20 PM, anil gupta anilgupt...@gmail.com wrote: Hi All, We have a string rowkey. We have string values of cells. Still, Stargate returns the data with Base64 encoding, due to which a user can't read the data. Is there a way to disable Base64 encoding so that the REST request would just return strings? -- Thanks & Regards, Anil Gupta

--
Best regards,
- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

--
Thanks & Regards,
Anil Gupta
Re: [DISCUSS] Split up the book again?
Hi All, Since we are talking about HBase documentation: is it possible to have docs for specific versions? Right now, the JavaDocs refer to 0.94 or HBase 2.0. It's not convenient to look at 2.0 docs while working on 0.98 or 1.0. I hope this should not be super difficult to accomplish. Apache Kafka, ElasticSearch, and many other products make the docs available for all the currently supported versions. It would be nice if we could just change the version in this url: http://hbase.apache.org/hbase_version/apidocs/index.html and look at the docs. That's how many Apache TLPs do it.

On Thu, Jul 30, 2015 at 9:41 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: +1 too. Even if cleaner and nicer, searching in it is a pain compared to before.

On 2015-07-30 07:17, Shane O'Donnell sha...@knownormal.com wrote: +1. One specific case where this is an issue is if you are entering the book with an anchor link. If you try this, it appears to just hang. Shane O.

On Thu, Jul 30, 2015 at 10:07 AM, Stack st...@duboce.net wrote: On Thu, Jul 30, 2015 at 2:06 PM, Lars Francke lars.fran...@gmail.com wrote: While I like the new and better layout of the book, it is painful to use - at least for me - because of its size. I've started to notice this too. It'd be sweet if it loaded more promptly. Thanks for starting the discussion. St.Ack

--
Thanks & Regards,
Anil Gupta
Disable Base64 encoding in Stargate request and Return as String
Hi All, We have a string rowkey. We have string values of cells. Still, Stargate returns the data with Base64 encoding, due to which a user can't read the data. Is there a way to disable Base64 encoding so that the REST request would just return strings?

--
Thanks & Regards,
Anil Gupta
Re: HBase co-processor performance
Yes, If possible, try to denormalize data and reduce number of calls. Its ok to store some redundant data with each row due to denormalization. On Thu, Jul 16, 2015 at 6:18 AM, Chandrashekhar Kotekar shekhar.kote...@gmail.com wrote: Hi, Thanks for the inputs. As you said, it is better to change database design than moving this business logic to co-processors, and sorry for duplicate mail. I guess duplicate mail was in my mobile's outbox and after syncing mobile that mail was sent. Regards, Chandrash3khar Kotekar Mobile - +91 8600011455 On Wed, Jul 15, 2015 at 12:40 PM, anil gupta anilgupt...@gmail.com wrote: Using coprocessor to make calls to other Tables or remote Regions is an ANTI-PATTERN. It will create cyclic dependency between RS in your cluster. Coprocessors should be strictly used for operation on local Regions. Search mailing archives for more detailed discussion on this topic. How about denormalizing the data and then just doing ONE call? Now, this becomes more of a data modeling question. Thanks, Anil Gupta On Tue, Jul 14, 2015 at 11:39 PM, Chandrashekhar Kotekar shekhar.kote...@gmail.com wrote: Hi, REST APIs of my project make 2-3 calls to different tables in HBase. These calls are taking 10s of milli seconds to finish. I would like to know 1) If moving business logic to HBase co-processors and/or observer will improve performance? Idea is like to pass all the related information to HBase co-processors and/or observer, co-processor will make those 2-3 calls to different HBase tables and return result to the client. 2) I wonder if this approach will reduce time to finish or is it a bad approach? 3) If co-processor running on one region server fetches data from other region server then it will be same as tomcat server fetching that data from HBase region server. Isn't it? Regards, Chandrash3khar Kotekar Mobile - +91 8600011455 -- Thanks Regards, Anil Gupta -- Thanks Regards, Anil Gupta
Re: HBase co-processor performance
Using a coprocessor to make calls to other tables or remote regions is an ANTI-PATTERN. It will create a cyclic dependency between the RegionServers in your cluster. Coprocessors should be strictly used for operations on local regions. Search the mailing archives for a more detailed discussion on this topic. How about denormalizing the data and then just doing ONE call? Now, this becomes more of a data modeling question. Thanks, Anil Gupta

On Tue, Jul 14, 2015 at 11:39 PM, Chandrashekhar Kotekar shekhar.kote...@gmail.com wrote: Hi, The REST APIs of my project make 2-3 calls to different tables in HBase. These calls take tens of milliseconds to finish. I would like to know: 1) Will moving the business logic to HBase coprocessors and/or observers improve performance? The idea is to pass all the related information to the coprocessor/observer, which will make those 2-3 calls to different HBase tables and return the result to the client. 2) I wonder if this approach will reduce the time to finish, or is it a bad approach? 3) If a coprocessor running on one region server fetches data from another region server, then it will be the same as the tomcat server fetching that data from the HBase region server, won't it? Regards, Chandrash3khar Kotekar Mobile - +91 8600011455

--
Thanks & Regards,
Anil Gupta
Re: Performance of co-processor and observer while fetching data from other RS
I think this is a duplicate post. Please avoid posting same questions. Please use previous thread where I replied. Sent from my iPhone On Jul 14, 2015, at 11:17 PM, Chandrashekhar Kotekar shekhar.kote...@gmail.com wrote: Hi, REST APIs of my project make 2-3 calls to different tables in HBase. These calls are taking 10s of milli seconds to finish. I would like to know 1) If moving business logic to HBase co-processors and/or observer will improve performance? Idea is like to pass all the related information to HBase co-processors and/or observer, co-processor will make those 2-3 calls to different HBase tables and return result to the client. 2) I wonder if this approach will reduce time to finish or is it a bad approach? 3) If co-processor running on one region server fetches data from other region server then it will be same as tomcat server fetching that data from HBase region server. Isn't it? Regards, Chandrash3khar Kotekar Mobile - +91 8600011455
Re: HConnection thread waiting on blocking queue indefinitely
I am also facing the same issue: the client connection thread is waiting at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1200). Any help is appreciated.

Regards,
Praneesh

--
Thanks & Regards,
Anil Gupta
Re: Fix Number of Regions per Node ?
Hi Rahul, I don't think there is anything like that. But you can effectively do that by setting the region size. However, if the hardware configuration varies across the cluster, then this property would not be helpful because, AFAIK, region size can be set on a per-table basis only (not per node). It would be best to avoid having differences in hardware across the cluster machines. Thanks, Anil Gupta

On Wed, Jun 17, 2015 at 5:12 PM, rahul malviya malviyarahul2...@gmail.com wrote: Hi, Is it possible to configure HBase to have only a fixed number of regions per node per table? For example, node1 serves 2 regions, node2 serves 3 regions, etc., for any table created? Thanks, Rahul

--
Thanks & Regards,
Anil Gupta
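The "setting region size" approach above works through the split threshold rather than a per-node region count. A sketch of the cluster-wide knob in hbase-site.xml (the 10 GB value is just an example; the same limit can be set per table with alter 'Table1', MAX_FILESIZE => '10737418240'):

```xml
<!-- hbase-site.xml: a region splits once its store files exceed this size,
     so regions per node roughly follows table size / this value / node count. -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>10737418240</value> <!-- 10 GB -->
</property>
```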
Re: HBase shell providing wrong results with startrow(with composite key having String and Ints)
Thanks Stack.

On Wed, Jun 10, 2015 at 8:06 AM, Stack st...@duboce.net wrote: On Mon, Jun 8, 2015 at 10:27 PM, anil gupta anilgupt...@gmail.com wrote: So, if we have to match against non-string data in the hbase shell, we should always use double quotes? Double quotes mean the shell (ruby) will interpret and undo any escaping -- e.g. showing as hex -- of binary characters. What we emit on the shell is a combo of ruby escaping and our running all through https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/Bytes.html#toStringBinary(byte[]) first. If you type 'help' in the shell, at the end we try to say this but could do a better job:

If you are using binary keys or values and need to enter them in the shell, use double-quote'd hexadecimal representation. For example:
hbase> get 't1', "key\x03\x3f\xcd"
hbase> get 't1', "key\003\023\011"
hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"

St.Ack

Even for matching values of cells?

On Mon, Jun 8, 2015 at 9:23 PM, Ted Yu yuzhih...@gmail.com wrote: Double quotes allow you to do string interpolation. Another difference (one pertinent to Anil's question) is that 'escape sequence' does not work using single quotes. Cheers

On Mon, Jun 8, 2015 at 9:11 PM, anil gupta anilgupt...@gmail.com wrote: Hi Jean, My bad. I gave a wrong illustration. This is the query I was trying on my composite key:

hbase(main):017:0> scan 'CAR_ARCHIVE', {COLUMNS => 'A', STARTROW => '110\x00', LIMIT => 1}
ROW                                                  COLUMN+CELL
12\x0010123\x0019XFB2F56CE026679\x00\x80\x00\x00\x00 column=A:BODYSTYLE, timestamp=1432899595317, value=SEDAN
12\x0010123\x0019XFB2F56CE026679\x00\x80\x00\x00\x00 column=A:BODYSTYLESLUG, timestamp=1432899595317, value=sedan

I do have this rowkey: 110\x0033078\x001C4AJWAG0CL260823\x00\x80\x00\x00 So, I was expecting to get that row.

Solution: scan 'CAR_ARCHIVE', {COLUMNS => 'A', STARTROW => "110\x00", LIMIT => 1}

I don't really know what the difference is between single quotes and double quotes in STARTROW. Can anyone explain?
Also, it would help others if it can be documented somewhere. Thanks, Anil

On Mon, Jun 8, 2015 at 4:07 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Anil, Can you please clarify what seems to be wrong for you? You asked for start row 33078, which means rows starting with a 3, followed by a 3, a 0, etc., and the first row returned starts with a 4, which is correct given the startrow you have specified. You seem to have a composite key. And you seem to scan without building the composite key. How have you created your table, and what is your key design? JM

2015-06-08 16:56 GMT-04:00 anil gupta anilgupt...@gmail.com: Hi All, I'm having a lot of trouble dealing with the HBase shell. I am running the following query:

scan 'CAR_ARCHIVE', {COLUMNS => 'A', STARTROW => '33078', LIMIT => 1}
ROW                                                  COLUMN+CELL
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF  column=A:BODYSTYLE, timestamp=1430280906358, value=SPORT UTILITY
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF  column=A:BODYSTYLESLUG, timestamp=1430280906358, value=sport-utility
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF  column=A:CARFAXREPORTAVAILABLE, timestamp=1430280906358, value=\x01
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF  column=A:CARTYPE, timestamp=1430280906358, value={isLuxury: false, isTruck: false, isSedan: false, isCoupe: false, isSuv: true, isConvertible: false, isVan: false, isWagon: false, isEasyCareQualified: true}

I specified STARTROW = '33078'. Then how come this result shows up? What's going on here?

--
Thanks & Regards,
Anil Gupta
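The single-quote versus double-quote behavior discussed above is plain Ruby string semantics (the HBase shell is JRuby): double quotes interpret \x escapes into real bytes, while single quotes keep them as literal backslash text, so the server is asked to start at the wrong key. A quick demonstration runnable in any Ruby:

```ruby
single = '110\x00'   # 7 characters: 1 1 0 \ x 0 0 -- the escape is NOT interpreted
double = "110\x00"   # 4 bytes: 1 1 0 plus a real NUL byte

raise unless single.bytesize == 7
raise unless double.bytesize == 4
raise unless double.bytes.last == 0
```

So STARTROW => '110\x00' sends the literal text "110\x00" to the server, while STARTROW => "110\x00" sends the intended 4-byte composite-key prefix.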
Re: Where can I find the apidoc for newer version of Hbase?
+1 on getting the docs of all current releases onto the HBase website. IMHO, it's not convenient to tell people to download stuff just to see docs, especially when you are trying to make people adopt/learn HBase (I have faced resistance from some of my colleagues on this). I like that the ElasticSearch website exposes this: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html It would be great if we could do something like this. ~Anil

On Sun, Jun 14, 2015 at 8:37 PM, guxiaobo1982 guxiaobo1...@qq.com wrote: version 1.0.1.1, but I'd like to read them online.

-- Original --
From: Sean Busbey bus...@cloudera.com
Send time: Sunday, Jun 14, 2015 9:55 AM
To: user user@hbase.apache.org
Subject: Re: Where can I find the apidoc for newer version of Hbase?

What version are you looking for, specifically? If you download a binary artifact, it will have a copy of the javadocs for that version. If you download a source artifact, you can build the javadocs using the site maven goal.

On Sat, Jun 13, 2015 at 8:33 PM, guxiaobo1982 guxiaobo1...@qq.com wrote: Hi, I can only find documentation for the 0.94 version of HBase at http://hbase.apache.org/0.94/apidocs/index.html, but where can I find the URL for a newer version? Thanks

--
Sean

--
Thanks & Regards,
Anil Gupta
Re: HBase shell providing wrong results with startrow(with composite key having String and Ints)
Yes. Let's say, from the HBase shell, I would like to filter (SingleColumnValueFilter) rows on the basis of a cell value that is stored as an int. Let's assume the column name and value to be USER:AGE=5.

On Tue, Jun 9, 2015 at 9:26 PM, Ted Yu yuzhih...@gmail.com wrote:
bq. if we have to match against non-string data in hbase shell. We should always use double quotes?
I think so.
bq. Even for matching values of cells?
Did you mean through use of some Filter? Cheers

On Mon, Jun 8, 2015 at 10:27 PM, anil gupta anilgupt...@gmail.com wrote: So, if we have to match against non-string data in the HBase shell, should we always use double quotes? Even for matching the values of cells?

On Mon, Jun 8, 2015 at 9:23 PM, Ted Yu yuzhih...@gmail.com wrote: Double quotes allow you to do string interpolation. Another difference (one pertinent to Anil's question) is that escape sequences do not work inside single quotes. Cheers

On Mon, Jun 8, 2015 at 9:11 PM, anil gupta anilgupt...@gmail.com wrote: Hi Jean, My bad, I gave a wrong illustration. This is the query I was trying on my composite key:

hbase(main):017:0> scan 'CAR_ARCHIVE', {COLUMNS => 'A', STARTROW => '110\x00', LIMIT => 1}
ROW COLUMN+CELL
12\x0010123\x0019XFB2F56CE026679\x00\x80\x00\x00\x00 column=A:BODYSTYLE, timestamp=1432899595317, value=SEDAN
12\x0010123\x0019XFB2F56CE026679\x00\x80\x00\x00\x00 column=A:BODYSTYLESLUG, timestamp=1432899595317, value=sedan

I do have this rowkey: 110\x0033078\x001C4AJWAG0CL260823\x00\x80\x00\x00, so I was expecting to get that row.

Solution: scan 'CAR_ARCHIVE', {COLUMNS => 'A', STARTROW => "110\x00", LIMIT => 1}

I don't really know what the difference between single quotes and double quotes in STARTROW is. Can anyone explain? Also, it would help others if it could be documented somewhere. Thanks, Anil

On Mon, Jun 8, 2015 at 4:07 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Anil, Can you please clarify what seems to be wrong for you? You asked for start row 33078.
Which means rows starting with a 3, followed by a 3, a 0, etc., and the first row returned starts with a 4, which is correct given the startrow you specified. You seem to have a composite key, and you seem to scan without building the composite key. How have you created your table, and what is your key design? JM

2015-06-08 16:56 GMT-04:00 anil gupta anilgupt...@gmail.com: Hi All, I'm having a lot of trouble dealing with the HBase shell. I am running the following query:

scan 'CAR_ARCHIVE', {COLUMNS => 'A', STARTROW => '33078', LIMIT => 1}
ROW COLUMN+CELL
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:BODYSTYLE, timestamp=1430280906358, value=SPORT UTILITY
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:BODYSTYLESLUG, timestamp=1430280906358, value=sport-utility
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:CARFAXREPORTAVAILABLE, timestamp=1430280906358, value=\x01
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:CARTYPE, timestamp=1430280906358, value={isLuxury: false, isTruck: false, isSedan: false, isCoupe: false, isSuv: true, isConvertible: false, isVan: false, isWagon: false, isEasyCareQualified: true}

I specified startRow='33078'. Then how come this result shows up? What's going on here?

-- Thanks & Regards, Anil Gupta
HBase shell providing wrong results with startrow(with composite key having String and Ints)
Hi All, I'm having a lot of trouble dealing with the HBase shell. I am running the following query:

scan 'CAR_ARCHIVE', {COLUMNS => 'A', STARTROW => '33078', LIMIT => 1}
ROW COLUMN+CELL
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:BODYSTYLE, timestamp=1430280906358, value=SPORT UTILITY
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:BODYSTYLESLUG, timestamp=1430280906358, value=sport-utility
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:CARFAXREPORTAVAILABLE, timestamp=1430280906358, value=\x01
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:CARTYPE, timestamp=1430280906358, value={isLuxury: false, isTruck: false, isSedan: false, isCoupe: false, isSuv: true, isConvertible: false, isVan: false, isWagon: false, isEasyCareQualified: true}

I specified startRow='33078'. Then how come this result shows up? What's going on here?

-- Thanks & Regards, Anil Gupta
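The surprising result is consistent with HBase's plain byte-wise row ordering, which is easy to check outside the shell. A sketch in plain Python, using hypothetical keys shaped like the ones in the scan output above:

```python
from bisect import bisect_left

# Hypothetical row keys mirroring the "<id>\x00<dealer>\x00<vin>\x00..."
# shape in the scan output, stored as raw bytes.
keys = sorted([
    b"110\x0033078\x001C4AJWAG0CL260823\x00",
    b"12\x0010123\x0019XFB2F56CE026679\x00",
    b"4\x0010135\x001C4BJWEG2CL117550\x00",
])

# HBase compares keys byte by byte, so b"33078" sorts after every key
# beginning with "1" but before every key beginning with "4".
start = b"33078"
first = keys[bisect_left(keys, start)]
print(first)  # the b"4\x00..." key, just like the shell returned
```

The fix, as JM points out, is to build the full composite key (prefix included) for STARTROW rather than scanning with just the middle component.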
Re: HBase shell providing wrong results with startrow(with composite key having String and Ints)
So, if we have to match against non-string data in the HBase shell, should we always use double quotes? Even for matching the values of cells?

On Mon, Jun 8, 2015 at 9:23 PM, Ted Yu yuzhih...@gmail.com wrote: Double quotes allow you to do string interpolation. Another difference (one pertinent to Anil's question) is that escape sequences do not work inside single quotes. Cheers

On Mon, Jun 8, 2015 at 9:11 PM, anil gupta anilgupt...@gmail.com wrote: Hi Jean, My bad, I gave a wrong illustration. This is the query I was trying on my composite key:

hbase(main):017:0> scan 'CAR_ARCHIVE', {COLUMNS => 'A', STARTROW => '110\x00', LIMIT => 1}
ROW COLUMN+CELL
12\x0010123\x0019XFB2F56CE026679\x00\x80\x00\x00\x00 column=A:BODYSTYLE, timestamp=1432899595317, value=SEDAN
12\x0010123\x0019XFB2F56CE026679\x00\x80\x00\x00\x00 column=A:BODYSTYLESLUG, timestamp=1432899595317, value=sedan

I do have this rowkey: 110\x0033078\x001C4AJWAG0CL260823\x00\x80\x00\x00, so I was expecting to get that row.

Solution: scan 'CAR_ARCHIVE', {COLUMNS => 'A', STARTROW => "110\x00", LIMIT => 1}

I don't really know what the difference between single quotes and double quotes in STARTROW is. Can anyone explain? Also, it would help others if it could be documented somewhere. Thanks, Anil

On Mon, Jun 8, 2015 at 4:07 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Anil, Can you please clarify what seems to be wrong for you? You asked for start row 33078. Which means rows starting with a 3, followed by a 3, a 0, etc., and the first row returned starts with a 4, which is correct given the startrow you specified. You seem to have a composite key, and you seem to scan without building the composite key. How have you created your table, and what is your key design? JM

2015-06-08 16:56 GMT-04:00 anil gupta anilgupt...@gmail.com: Hi All, I'm having a lot of trouble dealing with the HBase shell.
I am running the following query:

scan 'CAR_ARCHIVE', {COLUMNS => 'A', STARTROW => '33078', LIMIT => 1}
ROW COLUMN+CELL
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:BODYSTYLE, timestamp=1430280906358, value=SPORT UTILITY
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:BODYSTYLESLUG, timestamp=1430280906358, value=sport-utility
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:CARFAXREPORTAVAILABLE, timestamp=1430280906358, value=\x01
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:CARTYPE, timestamp=1430280906358, value={isLuxury: false, isTruck: false, isSedan: false, isCoupe: false, isSuv: true, isConvertible: false, isVan: false, isWagon: false, isEasyCareQualified: true}

I specified startRow='33078'. Then how come this result shows up? What's going on here?

-- Thanks & Regards, Anil Gupta
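Ted's single-vs-double-quote point is plain Ruby behavior (the HBase shell is built on JRuby), so it can be checked in irb or in the shell itself:

```ruby
# Single quotes keep "\x00" as four literal characters: backslash, x, 0, 0.
single = '110\x00'
# Double quotes interpret "\x00" as one NUL byte.
double = "110\x00"

puts single.length          # 7
puts double.length          # 4
puts double.bytes.inspect   # [49, 49, 48, 0]
```

So `STARTROW => '110\x00'` asks HBase for keys starting with the seven ASCII characters `110\x00`, while `STARTROW => "110\x00"` produces the intended four-byte prefix ending in a NUL.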
Re: HBase shell providing wrong results with startrow(with composite key having String and Ints)
Hi Jean, My bad, I gave a wrong illustration. This is the query I was trying on my composite key:

hbase(main):017:0> scan 'CAR_ARCHIVE', {COLUMNS => 'A', STARTROW => '110\x00', LIMIT => 1}
ROW COLUMN+CELL
12\x0010123\x0019XFB2F56CE026679\x00\x80\x00\x00\x00 column=A:BODYSTYLE, timestamp=1432899595317, value=SEDAN
12\x0010123\x0019XFB2F56CE026679\x00\x80\x00\x00\x00 column=A:BODYSTYLESLUG, timestamp=1432899595317, value=sedan

I do have this rowkey: 110\x0033078\x001C4AJWAG0CL260823\x00\x80\x00\x00, so I was expecting to get that row.

Solution: scan 'CAR_ARCHIVE', {COLUMNS => 'A', STARTROW => "110\x00", LIMIT => 1}

I don't really know what the difference between single quotes and double quotes in STARTROW is. Can anyone explain? Also, it would help others if it could be documented somewhere. Thanks, Anil

On Mon, Jun 8, 2015 at 4:07 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Anil, Can you please clarify what seems to be wrong for you? You asked for start row 33078. Which means rows starting with a 3, followed by a 3, a 0, etc., and the first row returned starts with a 4, which is correct given the startrow you specified. You seem to have a composite key, and you seem to scan without building the composite key. How have you created your table, and what is your key design? JM

2015-06-08 16:56 GMT-04:00 anil gupta anilgupt...@gmail.com: Hi All, I'm having a lot of trouble dealing with the HBase shell.
I am running the following query:

scan 'CAR_ARCHIVE', {COLUMNS => 'A', STARTROW => '33078', LIMIT => 1}
ROW COLUMN+CELL
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:BODYSTYLE, timestamp=1430280906358, value=SPORT UTILITY
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:BODYSTYLESLUG, timestamp=1430280906358, value=sport-utility
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:CARFAXREPORTAVAILABLE, timestamp=1430280906358, value=\x01
4\x0010135\x001C4BJWEG2CL117550\x00\x7F\xFF\xFF\xFF column=A:CARTYPE, timestamp=1430280906358, value={isLuxury: false, isTruck: false, isSedan: false, isCoupe: false, isSuv: true, isConvertible: false, isVan: false, isWagon: false, isEasyCareQualified: true}

I specified startRow='33078'. Then how come this result shows up? What's going on here?

-- Thanks & Regards, Anil Gupta
Re: Hbase vs Cassandra
Hey Ajay, Your topic of discussion is too broad. There are tons of comparisons of HBase vs Cassandra: https://www.google.com/search?q=hbase+vs+cassandra&ie=utf-8&oe=utf-8 Which one you should use boils down to your use case: strong consistency? range scans? need for deeper integration with the Hadoop ecosystem? etc. Please explain your use case and share your thoughts after doing some preliminary reading. Thanks, Anil Gupta

On Fri, May 29, 2015 at 12:20 PM, Lukáš Vlček lukas.vl...@gmail.com wrote: As for #4, you might be interested in reading https://aphyr.com/posts/294-call-me-maybe-cassandra Not sure if there is a comparable article about HBase (anybody know?), but it can give you another perspective on what else to keep an eye on regarding these systems. Regards, Lukas

On Fri, May 29, 2015 at 9:12 PM, Ajay ajay.ga...@gmail.com wrote: Hi, I need some info on HBase vs Cassandra as a data store (in general, plus specific to time series data). A comparison along the following lines would help: 1: features 2: deployment and monitoring 3: performance 4: anything else Thanks Ajay

-- Thanks & Regards, Anil Gupta
Re: HBase failing to restart in single-user mode
20:39:19,224 INFO [M:0;localhost:49807-SendThread(localhost:2181)] zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x14d651aaec2, negotiated timeout = 400
2015-05-17 20:39:19,249 INFO [M:0;localhost:49807] regionserver.HRegionServer: ClusterId : 6ad7eddd-2886-4ff0-b377-a2ff42c8632f
2015-05-17 20:39:49,208 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Master not active after 30 seconds
    at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:194)
    at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:445)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:197)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:139)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2002)

I noticed that this has something to do with the ZooKeeper data. If I rm -rf $TMPDIR/hbase-tsuna/zookeeper then I can start HBase again. But of course HBase won't work properly, because while some tables exist on the filesystem, they no longer exist in ZK, etc. Does anybody know what could be left behind in ZK that could make it hang during startup? I looked at a jstack output while it was paused during the 30s and didn't find anything noteworthy.

-- Benoit "tsuna" Sigoure

-- Thanks & Regards, Anil Gupta
Re: Upgrading from 0.94 (CDH4) to 1.0 (CDH5)
to separate them enough to not cause an issue. Thankfully we have not moved to secure HBase yet. That's actually on the to-do list, but I'm hoping to do it *after* the CDH upgrade.

Thanks again, guys. I'm expecting this will be a drawn-out process considering our scope, but I will be happy to keep posting updates here as I proceed.

On Tue, May 5, 2015 at 10:31 PM, Esteban Gutierrez este...@cloudera.com wrote: Just to add a little bit to what StAck said: -- Cloudera, Inc.

On Tue, May 5, 2015 at 3:53 PM, Stack st...@duboce.net wrote:

On Tue, May 5, 2015 at 8:58 AM, Bryan Beaudreault bbeaudrea...@hubspot.com wrote: Hello, I'm about to start tackling our upgrade path from 0.94 to 1.0+. We have 6 production HBase clusters, 2 Hadoop clusters, and hundreds of APIs/daemons/crons/etc. hitting all of these things. Many of these clients hit multiple clusters in the same process. Daunting, to say the least.

Nod.

We can't take full downtime on any of these, though we can take read-only, and ideally we could take read-only on each cluster in a staggered fashion. From a client perspective, all of our code currently assumes an HTableInterface, which gives me some wiggle room, I think. With that in mind, here's my current plan:

You've done a review of HTI in 1.0 vs 0.94 to make sure we've not mistakenly dropped anything you need? (I see that stuff has moved around, but HTI should still have everything from 0.94.)

- Shade CDH5 to something like org.apache.hadoop.cdh5.hbase.
- Create a shim implementation of HTableInterface. This shim would delegate to either the old CDH4 APIs or the new shaded CDH5 classes, depending on the cluster being talked to.
- Once the shim is in place across all clients, I will put each cluster into read-only (a client-side config of ours), migrate data to a new CDH5 cluster, then bounce affected services so they look there instead. I will do this for each cluster in sequence.
Sounds like you have experience copying tables in the background in a manner that minimally impinges on serving, given you have dev'd your own in-house cluster cloning tools? You will use the time while tables are read-only to 'catch up' the difference between the last table copy and data that has come in since?

This provides a great rollback strategy, and with our existing in-house cluster cloning tools we can minimize the read-only window to a few minutes if all goes well. There are a couple of gotchas I can think of with the shim, which I'm hoping some of you might have ideas/opinions on: 1) Since protobufs are used for communication, we will have to avoid shading those particular classes, as they need to match the package/classnames on the server side. I think this should be fine, as these are net-new, not conflicting with CDH4 artifacts. Any additions/concerns here?

CDH4 has pb 2.4.1 in it as opposed to pb 2.5.0 in CDH5? If your clients are interacting with HDFS then you need to go the route of shading around PB, and it's hard; but HBase-wise, only HBase 0.98 and 1.0 use PBs in the RPC protocol, so it shouldn't be any problem as long as you don't need security (this is mostly because the client does a UGI call, and it's easy to patch both 0.94 and 1.0 to avoid calling UGI). Another option is to move your application to asynchbase; it should be clever enough to handle both HBase versions.

I myself have little experience going the shading route, so I have little to contribute. Can you 'talk out loud' as you try stuff, Bryan? If we can't help high-level, perhaps we can help on specifics. St.Ack

cheers, esteban.

-- Thanks & Regards, Anil Gupta
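The "shade CDH5" step in Bryan's plan is typically done with the maven-shade-plugin's relocation feature. A minimal sketch — the artifact layout and package names here are illustrative, not taken from an actual build, and note that com.google.protobuf is deliberately left unrelocated per the PB discussion above:

```xml
<!-- Hypothetical maven-shade-plugin config: relocate the CDH5 HBase
     client classes so they can coexist with CDH4 classes on one classpath. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>org.apache.hadoop.hbase</pattern>
            <shadedPattern>org.apache.hadoop.cdh5.hbase</shadedPattern>
          </relocation>
        </relocations>
        <!-- com.google.protobuf is intentionally NOT relocated, so the
             generated wire classes still match what the server expects. -->
      </configuration>
    </execution>
  </executions>
</plugin>
```

The shim class would then import the relocated `org.apache.hadoop.cdh5.hbase` names when talking to upgraded clusters and the plain CDH4 names otherwise.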
Re: MR against snapshot causes High CPU usage on Datanodes
Inline.

On Wed, May 13, 2015 at 10:31 AM, rahul malviya malviyarahul2...@gmail.com wrote:

*How many mappers/reducers are running per node for this job?* I am running 7-8 mappers per node. The spike is seen in the mapper phase, so no reducers were running at that point in time.

*Also, how many mappers are running as data-local mappers?* How to determine this?

On the counter web page of your job, look for the Data-local map tasks counter.

*Is your load/data equally distributed?* Yes, as we use presplit hash keys in our HBase cluster, and data is pretty evenly distributed. Thanks, Rahul

On Wed, May 13, 2015 at 10:25 AM, Anil Gupta anilgupt...@gmail.com wrote: How many mappers/reducers are running per node for this job? Also, how many mappers are running as data-local mappers? Is your load/data equally distributed? Your disk/CPU ratio looks OK. Sent from my iPhone

On May 13, 2015, at 10:12 AM, rahul malviya malviyarahul2...@gmail.com wrote: *The high CPU may be WAIT IOs, which would mean that your CPU is waiting for reads from the local disks.* Yes, I think that's what is going on, but I am trying to understand why it happens only in the case of snapshot MR; if I run the same job without using a snapshot, everything is normal. What is the difference in the snapshot version that can cause such a spike? I am looking through the code for the snapshot version to see if I can find something. cores/disks == 24/12 or 40/12. We are using 10K SATA drives on our datanodes. Rahul

On Wed, May 13, 2015 at 10:00 AM, Michael Segel michael_se...@hotmail.com wrote: Without knowing your exact configuration… the high CPU may be WAIT IOs, which would mean that your CPU is waiting for reads from the local disks. What's the ratio of (physical) cores to disks? What type of disks are you using? That's going to be the most likely culprit.

On May 13, 2015, at 11:41 AM, rahul malviya malviyarahul2...@gmail.com wrote: Yes.

On Wed, May 13, 2015 at 9:40 AM, Ted Yu yuzhih...@gmail.com wrote: Have you enabled short circuit read?
Cheers

On Wed, May 13, 2015 at 9:37 AM, rahul malviya malviyarahul2...@gmail.com wrote: Hi, I have recently started running MR on HBase snapshots, but when the MR is running there is pretty high CPU usage on the datanodes, and I start seeing IO wait messages in the datanode logs; as soon as I kill the MR on the snapshot, everything comes back to normal. What could be causing this? I am running the cdh5.2.0 distribution. Thanks, Rahul

-- Thanks & Regards, Anil Gupta
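One quick way to confirm Michael's theory that the "high CPU" is really time spent in IO wait is to sample the aggregate counters in /proc/stat on a datanode (Linux only; a rough sketch — field layout per proc(5), where the fifth CPU field is iowait):

```python
import time

def cpu_iowait_fraction(interval=1.0):
    """Return the fraction of CPU jiffies spent in iowait over `interval` seconds."""
    def snapshot():
        with open("/proc/stat") as f:
            # First line: "cpu  user nice system idle iowait irq softirq ..."
            fields = [int(x) for x in f.readline().split()[1:]]
        return sum(fields), fields[4]  # (total jiffies, iowait jiffies)

    total1, wait1 = snapshot()
    time.sleep(interval)
    total2, wait2 = snapshot()
    elapsed = total2 - total1
    return (wait2 - wait1) / elapsed if elapsed else 0.0

if __name__ == "__main__":
    print("iowait fraction: %.1f%%" % (100 * cpu_iowait_fraction()))
```

A high value here during the snapshot MR job, with a low value during the normal (region-server-mediated) job, would point at disk reads rather than actual computation.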
Re: MR against snapshot causes High CPU usage on Datanodes
How many mappers/reducers are running per node for this job? Also, how many mappers are running as data-local mappers? Is your load/data equally distributed? Your disk/CPU ratio looks OK. Sent from my iPhone

On May 13, 2015, at 10:12 AM, rahul malviya malviyarahul2...@gmail.com wrote: *The high CPU may be WAIT IOs, which would mean that your CPU is waiting for reads from the local disks.* Yes, I think that's what is going on, but I am trying to understand why it happens only in the case of snapshot MR; if I run the same job without using a snapshot, everything is normal. What is the difference in the snapshot version that can cause such a spike? I am looking through the code for the snapshot version to see if I can find something. cores/disks == 24/12 or 40/12. We are using 10K SATA drives on our datanodes. Rahul

On Wed, May 13, 2015 at 10:00 AM, Michael Segel michael_se...@hotmail.com wrote: Without knowing your exact configuration… the high CPU may be WAIT IOs, which would mean that your CPU is waiting for reads from the local disks. What's the ratio of (physical) cores to disks? What type of disks are you using? That's going to be the most likely culprit.

On May 13, 2015, at 11:41 AM, rahul malviya malviyarahul2...@gmail.com wrote: Yes.

On Wed, May 13, 2015 at 9:40 AM, Ted Yu yuzhih...@gmail.com wrote: Have you enabled short circuit read? Cheers

On Wed, May 13, 2015 at 9:37 AM, rahul malviya malviyarahul2...@gmail.com wrote: Hi, I have recently started running MR on HBase snapshots, but when the MR is running there is pretty high CPU usage on the datanodes, and I start seeing IO wait messages in the datanode logs; as soon as I kill the MR on the snapshot, everything comes back to normal. What could be causing this? I am running the cdh5.2.0 distribution. Thanks, Rahul
Re: [VOTE] First release candidate for HBase 1.1.0 (RC0) is available.
time on 2015-05-06 as to whether we should release these bits as HBase 1.1.0. Thanks, Nick -- Thanks Regards, Anil Gupta
hbase.apache.org homepage looks weird on Chrome and Firefox
Hi, I am aware that recently there were some updates done to the HBase website. For the last few months, more often than not, the homepage is displayed in a weird way in Chrome and Firefox. Is there a bug on the homepage that is leading to this view:
https://www.dropbox.com/s/jcpfnu4jwim28zg/Screen%20Shot%202015-04-15%20at%2011.18.46%20PM.png?dl=0
https://www.dropbox.com/s/o7xminppnzll6x7/Screen%20Shot%202015-04-15%20at%2011.19.55%20PM.png?dl=0

IMO, if the homepage looks broken then it's hard to proceed to reading the docs. My two cents. Also, it would be nice if we could move the Stargate docs from https://wiki.apache.org/hadoop/Hbase/Stargate to hbase.apache.org.

-- Thanks & Regards, Anil Gupta
Re: hbase.apache.org homepage looks weird on Chrome and Firefox
In Chrome, I did Clear Browsing Data and then revisited http://hbase.apache.org/. It came up properly. Thanks for the pointer, Nick.

On Thu, Apr 16, 2015 at 11:05 AM, Andrew Purtell apurt...@apache.org wrote: Looks fine for me, Chrome and Firefox tested. As Nick says, it looks like the CSS asset didn't load at Anil's location for whatever reason.

On Thu, Apr 16, 2015 at 8:36 AM, Stack st...@duboce.net wrote: Are others running into the issue Anil sees? Thanks, St.Ack

On Thu, Apr 16, 2015 at 8:13 AM, anil gupta anilgupt...@gmail.com wrote: Chrome: Version 42.0.2311.90 (64-bit) on Mac. But Firefox (34.0.5) also displays the page the same way.

On Thu, Apr 16, 2015 at 12:58 AM, Ted Yu yuzhih...@gmail.com wrote: Which Chrome version do you use? I use 41.0.2272.104 (64-bit) (on Mac) and the page renders fine. Cheers

On Wed, Apr 15, 2015 at 11:27 PM, anil gupta anilgupt...@gmail.com wrote: Hi, I am aware that recently there were some updates done to the HBase website. For the last few months, more often than not, the homepage is displayed in a weird way in Chrome and Firefox. Is there a bug on the homepage that is leading to this view: https://www.dropbox.com/s/jcpfnu4jwim28zg/Screen%20Shot%202015-04-15%20at%2011.18.46%20PM.png?dl=0 https://www.dropbox.com/s/o7xminppnzll6x7/Screen%20Shot%202015-04-15%20at%2011.19.55%20PM.png?dl=0 IMO, if the homepage looks broken then it's hard to proceed to reading the docs. My two cents. Also, it would be nice if we could move the Stargate docs from https://wiki.apache.org/hadoop/Hbase/Stargate to hbase.apache.org.

-- Best regards, - Andy. Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

-- Thanks & Regards, Anil Gupta
Getting binary data from HBase rest server without Base64 encoding
Hi All, I want to fetch an image file from HBase using its REST server. Right now, I get an XML response where the image byte array (the cell value) is Base64-encoded, and I need to decode it from Base64 to binary to view the image. Is there a way to ask the REST server not to perform Base64 encoding and to just return the cell value (i.e., the image file)? If it's not there and we were to build it, what kind of effort would it take? Any pointers to the code I would need to modify would be appreciated.

-- Thanks & Regards, Anil Gupta
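Decoding client-side is a one-liner with the standard library, in case modifying the server turns out to be more trouble than it's worth. A sketch assuming the usual Stargate CellSet XML shape — the payload below is a made-up sample, with the Cell text holding the Base64-encoded cell value:

```python
import base64
import xml.etree.ElementTree as ET

# Made-up response in the CellSet schema the REST server returns.
xml_payload = """<CellSet>
  <Row key="cm93MQ==">
    <Cell column="YTppbWc=">aGVsbG8gaW1hZ2UgYnl0ZXM=</Cell>
  </Row>
</CellSet>"""

root = ET.fromstring(xml_payload)
cell = root.find("./Row/Cell")
image_bytes = base64.b64decode(cell.text)

# image_bytes now holds the raw cell value, ready to write out:
# open("photo.jpg", "wb").write(image_bytes)
print(image_bytes)  # b'hello image bytes'
```

Separately, it may be worth checking whether your REST server version honors an `Accept: application/octet-stream` header on a single-cell GET, which would return the raw bytes and skip the Base64 round-trip entirely; verify against your version's docs before relying on it.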