Re: HBase Region Size of 2.5 TB

2016-08-26 Thread Ted Yu
From IncreasingToUpperBoundRegionSplitPolicy#configureForRegion(): initialSize = conf.getLong("hbase.increasing.policy.initial.size", -1); ... if (initialSize <= 0) { initialSize = 2 * conf.getLong(HConstants.HREGION_MEMSTORE_FLUSH_SIZE, HTab
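The snippet above shows only how the initial size is resolved. Below is a minimal sketch of how that value could feed the split threshold. It assumes the HBase 1.x rule in which the threshold grows with the cube of the number of this table's regions on the same region server and is capped at hbase.hregion.max.filesize, with 0 or more than 100 regions meaning "use the cap". It also assumes the default settings of 128 MB for the memstore flush size and 10 GB for the max file size, and it ignores any jitter the real policy may apply. The class and method names are hypothetical, not the HBase class itself.

```java
// Hypothetical sketch of the split-threshold arithmetic, assuming HBase 1.x
// IncreasingToUpperBoundRegionSplitPolicy semantics; not the real class.
class SplitSizeSketch {
    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;

    // initialSize falls back to 2 * memstore flush size when
    // hbase.increasing.policy.initial.size is unset (<= 0).
    static long initialSize(long configuredInitialSize, long memstoreFlushSize) {
        return configuredInitialSize <= 0 ? 2 * memstoreFlushSize : configuredInitialSize;
    }

    // Assumed rule: min(maxFileSize, initialSize * n^3), where n is the number of
    // this table's regions on the region server; n == 0 or n > 100 uses maxFileSize.
    static long sizeToCheck(int tableRegionsCount, long initialSize, long maxFileSize) {
        if (tableRegionsCount == 0 || tableRegionsCount > 100) {
            return maxFileSize;
        }
        long n = tableRegionsCount;
        return Math.min(maxFileSize, initialSize * n * n * n);
    }

    public static void main(String[] args) {
        long init = initialSize(-1, 128 * MB);             // assumed default flush size: 128 MB
        for (int n = 1; n <= 5; n++) {
            long size = sizeToCheck(n, init, 10 * GB);      // assumed default max file size: 10 GB
            System.out.println(n + " region(s): split above " + size / MB + " MB");
        }
    }
}
```

Under those assumed defaults, the threshold is 256 MB with one region of the table on a server, 2 GB with two, 6.75 GB with three, and it reaches the 10 GB cap from four regions on.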

HBase Region Size of 2.5 TB

2016-08-26 Thread yeshwanth kumar
Hi, we are using CDH 5.7 (HBase 1.2) and are running performance tests on HBase under regular load, on a cluster with 4 region servers. The input data is around 2 TB of compressed binary files, which we process and write to HBase as key-value pairs. The output data size in HBase is almost 4 times larger, around 8 TB,

Re: HBase - Count Rows in Regions and Region Servers

2016-08-26 Thread Manish Maheshwari
Hey, we have approximately 294 regions across 42 region servers. Manish On Fri, Aug 26, 2016 at 3:05 PM, Ted Yu wrote: > I currently don't have concrete numbers but the impact is not big. > > How many regions are there in the table(s) ? > > Cheers > > On Fri, Aug 26, 2016 at 2:57 PM, Manish Maheshwari >

Re: HBase - Count Rows in Regions and Region Servers

2016-08-26 Thread Manish Maheshwari
Thanks Ted. On Fri, Aug 26, 2016 at 3:16 PM, Ted Yu wrote: > For #1, please look at the following method in HTable.java : > > public NavigableMap getRegionLocations() throws > IOException { > > Cheers > > On Fri, Aug 26, 2016 at 3:06 PM, Manish Maheshwari > wrote: > > > Thanks Rahul. > > > >

Re: HBase - Count Rows in Regions and Region Servers

2016-08-26 Thread Ted Yu
For #1, please look at the following method in HTable.java : public NavigableMap getRegionLocations() throws IOException { Cheers On Fri, Aug 26, 2016 at 3:06 PM, Manish Maheshwari wrote: > Thanks Rahul. > > 1 - I understand the idea of listing the usage on each of the disks that we > have H
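The map returned by getRegionLocations() pairs each region with the server that hosts it, so inverting it answers "which regions live on which node". Below is a sketch of that inversion on plain strings. It assumes the region and server names have already been pulled out of that map, and the sample names are made up. The result is only a snapshot, because the load balancer or a server crash can move regions afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: invert a region -> server map into server -> regions.
// Region and server names are hypothetical strings, not HBase types.
class RegionsByServer {
    static Map<String, List<String>> group(Map<String, String> regionToServer) {
        Map<String, List<String>> byServer = new TreeMap<>();
        for (Map.Entry<String, String> e : regionToServer.entrySet()) {
            byServer.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return byServer;
    }

    public static void main(String[] args) {
        Map<String, String> locations = new TreeMap<>();   // hypothetical snapshot
        locations.put("region-a", "rs1.example.com");
        locations.put("region-b", "rs2.example.com");
        locations.put("region-c", "rs1.example.com");
        group(locations).forEach((server, regions) ->
                System.out.println(server + " hosts " + regions.size() + " region(s): " + regions));
    }
}
```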

Re: HBase - Count Rows in Regions and Region Servers

2016-08-26 Thread Manish Maheshwari
Thanks Rahul. 1 - I understand the idea of listing the usage on each of the disks that we have HBase running on for that table. However, how do I map the nodes to regions? I looked at RegionLocator - getStartEndKeys, but these just give me the values and not the hostnames where each region is curre

Re: HBase - Count Rows in Regions and Region Servers

2016-08-26 Thread Ted Yu
I currently don't have concrete numbers but the impact is not big. How many regions are there in the table(s) ? Cheers On Fri, Aug 26, 2016 at 2:57 PM, Manish Maheshwari wrote: > Thanks Ted. I looked into using JMX. Unfortunately it requires us to > restart HBase after the config changes. In t

Re: HBase - Count Rows in Regions and Region Servers

2016-08-26 Thread Manish Maheshwari
Thanks Ted. I looked into using JMX. Unfortunately, it requires us to restart HBase after the config changes, and we are unable to do so in the production environment. The table size is small, around 9.6 TB. We have around 42 nodes, each with 10 TB of storage. The scan will take time, but would need an HBase

Re: HBase - Count Rows in Regions and Region Servers

2016-08-26 Thread rahul gidwani
If you want to see which region servers are currently hot, JMX would be the best way to get that data. If you want to see what is hot overall, you can get a pretty decent estimate without using a scan by running: hdfs dfs -du /hbase/data/default// With that data you ca
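A sketch of turning the hdfs dfs -du output for a table directory into each region directory's share of the table's bytes. It assumes each data line starts with the size in bytes and ends with the path (some Hadoop releases print an extra column in between, which this ignores), and it skips hidden entries such as .tabledesc. The table name t1 and the region directory names in the sample are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: per-region share of a table's bytes from `hdfs dfs -du <table dir>` output.
// Assumes the first column is the size in bytes and the last column is the path;
// any columns in between (e.g. replicated size) are ignored.
class RegionSizeShares {
    static Map<String, Double> shares(List<String> duLines) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        long total = 0;
        for (String line : duLines) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 2 || !cols[0].matches("\\d+")) continue;  // skip non-data lines
            String path = cols[cols.length - 1];
            String name = path.substring(path.lastIndexOf('/') + 1);
            if (name.startsWith(".")) continue;  // skip .tabledesc, .tmp and similar
            long size = Long.parseLong(cols[0]);
            sizes.put(name, size);
            total += size;
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : sizes.entrySet()) {
            result.put(e.getKey(), total == 0 ? 0.0 : (double) e.getValue() / total);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical output for a table "t1" with three regions.
        List<String> sample = Arrays.asList(
                "300  /hbase/data/default/t1/.tabledesc",
                "600  /hbase/data/default/t1/aaaa",
                "200  /hbase/data/default/t1/bbbb",
                "200  /hbase/data/default/t1/cccc");
        shares(sample).forEach((region, share) ->
                System.out.printf("%s %.1f%%%n", region, share * 100));
    }
}
```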

Re: HBase - Count Rows in Regions and Region Servers

2016-08-26 Thread Ted Yu
Have you looked at /jmx endpoint on the servers ? Below is a sample w.r.t. the metrics that would be of interest to you: "Namespace_default_table_x_region_6659ba3fe42b4a196daaba9306b50551_metric_appendCount" : 0, "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_
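Per-region metrics in the /jmx output are flat keys that embed the namespace, table, region and metric name. A sketch that splits such a key apart with a regular expression, assuming the Namespace_<ns>_table_<table>_region_<encoded name>_metric_<metric> layout seen in the sample above and a 32-character hex encoded region name. Fetching and parsing the JSON document itself is left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: split a per-region JMX metric key into its parts.
// Assumed layout: Namespace_<ns>_table_<table>_region_<32 hex chars>_metric_<metric>
class RegionMetricKey {
    private static final Pattern KEY = Pattern.compile(
            "^Namespace_(.+?)_table_(.+)_region_([0-9a-f]{32})_metric_(.+)$");

    final String namespace, table, encodedRegion, metric;

    private RegionMetricKey(String ns, String table, String region, String metric) {
        this.namespace = ns;
        this.table = table;
        this.encodedRegion = region;
        this.metric = metric;
    }

    // Returns null when the key does not follow the assumed layout.
    static RegionMetricKey parse(String key) {
        Matcher m = KEY.matcher(key);
        if (!m.matches()) return null;
        return new RegionMetricKey(m.group(1), m.group(2), m.group(3), m.group(4));
    }

    public static void main(String[] args) {
        RegionMetricKey k = parse(
                "Namespace_default_table_x_region_6659ba3fe42b4a196daaba9306b50551_metric_appendCount");
        System.out.println(k.table + " / " + k.encodedRegion + " / " + k.metric);
    }
}
```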

Re: HBase - Count Rows in Regions and Region Servers

2016-08-26 Thread Manish Maheshwari
Hi Ted, I understand the region crash/migration/splitting impact. Currently we have hotspotting on a few region servers. I am trying to collect row stats at the region server and region levels to see how bad the data skew is. Manish On Fri, Aug 26, 2016 at 10:19 AM, Ted Yu wrote: > Can yo
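Once row counts (or sizes) have been collected per region server, a single number helps judge how bad the skew is. A sketch computing the max-to-mean ratio, where 1.0 means a perfectly even spread; the choice of this metric and the sample counts are assumptions, not something prescribed in this thread.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: max/mean ratio of per-server counts as a simple skew indicator.
class SkewRatio {
    static double maxToMean(Map<String, Long> countsByServer) {
        if (countsByServer.isEmpty()) return 0.0;
        long max = 0, sum = 0;
        for (long c : countsByServer.values()) {
            max = Math.max(max, c);
            sum += c;
        }
        double mean = (double) sum / countsByServer.size();
        return mean == 0 ? 0.0 : max / mean;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new TreeMap<>();
        counts.put("rs1", 900L);   // hypothetical hot server
        counts.put("rs2", 100L);
        counts.put("rs3", 200L);
        System.out.printf("max/mean = %.2f%n", maxToMean(counts));
    }
}
```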

[DISCUSS] 0.98 branch disposition

2016-08-26 Thread Andrew Purtell
Greetings, HBase 0.98.0 was released in February of 2014. We have had 21 releases in 2 1/2 years at a fairly regular cadence, a terrific run for any software product. However as 0.98 RM I think it's now time to discuss winding down 0.98. I want to give you notice of this as far in advance as possi

Re: HBase - Count Rows in Regions and Region Servers

2016-08-26 Thread Ted Yu
Can you elaborate on your use case ? Suppose row A is on server B, after you retrieve row A, the region for row A gets moved to server C (load balancer or server crash). Server B would no longer be relevant. Cheers On Fri, Aug 26, 2016 at 10:07 AM, Manish Maheshwari wrote: > Hi, > > I looked a

HBase - Count Rows in Regions and Region Servers

2016-08-26 Thread Manish Maheshwari
Hi, I looked at the HBase Count functionality to count rows in a table. Is there a way to count the number of rows in each region and region server? When we use an HBase scan, we don't get the region ID or region server of the row. Is there a way to do this via scans? Thanks, Manish
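A scan does not report which region a row came from, but the row can be assigned to a region client-side by comparing its key with the region start keys, for example those returned by RegionLocator.getStartEndKeys(), which is mentioned elsewhere in this thread. A sketch using a TreeMap floor lookup, assuming String keys for readability; real HBase row keys are byte[] and would need an unsigned lexicographic comparator such as Bytes.BYTES_COMPARATOR. The region names and split point are hypothetical.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Sketch: count rows per region by mapping each row key to the region whose
// start key is the greatest one <= the row key. String keys are a simplification.
class RowsPerRegion {
    static Map<String, Long> count(Map<String, String> startKeyToRegion, Iterable<String> rowKeys) {
        TreeMap<String, String> starts = new TreeMap<>(startKeyToRegion);
        Map<String, Long> counts = new TreeMap<>();
        for (String row : rowKeys) {
            Map.Entry<String, String> region = starts.floorEntry(row);
            if (region == null) continue;   // cannot happen if the first start key is ""
            counts.merge(region.getValue(), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> regions = new TreeMap<>();
        regions.put("", "region-1");     // first region: empty start key
        regions.put("m", "region-2");    // hypothetical split point
        System.out.println(count(regions, Arrays.asList("apple", "kiwi", "mango", "zebra")));
    }
}
```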

Re: Hbase Heap Size problem and Native API response is slow

2016-08-26 Thread Ted Yu
Looks like the image didn't go through. Can you pastebin the error ? Cheers On Fri, Aug 26, 2016 at 7:28 AM, Manjeet Singh wrote: > Adding > I am getting below error on truncating the table > > [image: Inline image 1] > > On Fri, Aug 26, 2016 at 7:56 PM, Manjeet Singh > wrote: > >> Hi All >>

Re: Hbase Heap Size problem and Native API response is slow

2016-08-26 Thread Manjeet Singh
Adding: I am getting the error below when truncating the table. [image: Inline image 1] On Fri, Aug 26, 2016 at 7:56 PM, Manjeet Singh wrote: > Hi All > > I am using wide table approach where I have might have more 1,00, > column qualifier > > I am getting problem as below > Heap size problem by u

Hbase Heap Size problem and Native API response is slow

2016-08-26 Thread Manjeet Singh
Hi All, I am using a wide-table approach where I might have more than 1,00, column qualifiers. I am facing the following problems. First, a heap size problem when using scan in the shell; as a solution, I increased the Java heap size to 4 GB using Cloudera Manager. Second, I have the Native API code below. It took very long

Re: Accessing different HBase versions from the same JVM

2016-08-26 Thread Ted Yu
Can you take a look at the replication bridge [0] Jeffrey wrote ? It used both client library versions through JarJar [1] to avoid name collision. [0]: https://github.com/hortonworks/HBaseReplicationBridgeServer [1]: https://code.google.com/p/jarjar/ On Fri, Aug 26, 2016 at 12:26 AM, Enrico Oliv

Re: adding a column to exiting tables

2016-08-26 Thread Ted Yu
Switching to user@ http://hbase.apache.org/book.html#datamodel By column, I guess you mean column qualifier. A new column qualifier can be added in future writes under the existing schema. On the application side, when a retrieved row doesn't contain the new column qualifier, you can inter
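Rows written before the new column qualifier existed simply lack it, so the application can substitute a default when reading. A sketch treating a fetched row as a map from qualifier to value; the qualifier name and default are hypothetical. With the HBase client, the lookup would typically go through Result.getValue(family, qualifier), which returns null when the cell is absent, and the same null check applies.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: read a qualifier that older rows may not have, falling back to a default.
class NewQualifierDefault {
    static String valueOrDefault(Map<String, String> row, String qualifier, String defaultValue) {
        String v = row.get(qualifier);   // null for rows written before the qualifier existed
        return v != null ? v : defaultValue;
    }

    public static void main(String[] args) {
        Map<String, String> oldRow = new HashMap<>();
        oldRow.put("name", "alice");
        Map<String, String> newRow = new HashMap<>(oldRow);
        newRow.put("status", "active");  // "status" is a hypothetical new qualifier
        System.out.println(valueOrDefault(oldRow, "status", "unknown"));  // unknown
        System.out.println(valueOrDefault(newRow, "status", "unknown"));  // active
    }
}
```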

Re: Hbase replication between 0.98.6 and 1.2.0 versions

2016-08-26 Thread sudhir patil
Great, thanks Ted. On Aug 26, 2016 7:29 PM, "Ted Yu" wrote: > Replication between 0.98.6 and 1.2.0 should work. > > Thanks > > > On Aug 26, 2016, at 1:59 AM, spats wrote: > > > > > > Does hbase replication works between different versions 0.98.6 and 1.2.0? > > > > We are in the process of upgra

Re: Hbase replication between 0.98.6 and 1.2.0 versions

2016-08-26 Thread Ted Yu
Replication between 0.98.6 and 1.2.0 should work. Thanks > On Aug 26, 2016, at 1:59 AM, spats wrote: > > > Does hbase replication works between different versions 0.98.6 and 1.2.0? > > We are in the process of upgrading our clusters & during that time we want > to make sure if replication

Hbase replication between 0.98.6 and 1.2.0 versions

2016-08-26 Thread spats
Does HBase replication work between versions 0.98.6 and 1.2.0? We are in the process of upgrading our clusters, and during that time we want to make sure replication will work across clusters. It would be really helpful if anyone could share information about HBase replication with different v

Re: Accessing different HBase versions from the same JVM

2016-08-26 Thread Dima Spivak
Sadly, there is no "easy" way to do it (blame filesystem changes and the rpc differences, among other things). A while back, someone posted about how he was able to do a snapshot export between 0.94 and 0.98 [1] but this is not officially supported. Perhaps someone else has ideas? 1. http://mail-

Re: Accessing different HBase versions from the same JVM

2016-08-26 Thread Enrico Olivelli - Diennea
Thank you Dima for your quick answer. Do you think that it would be possible to create a shaded version of the 0.94 client (with all the dependencies) and let it live inside the same JVM as a pure 1.2.2 client? My real need is to copy data from a 0.94 cluster to a new 1.2.2 installation, but

Re: Accessing different HBase versions from the same JVM

2016-08-26 Thread Dima Spivak
I would say no; 0.94 is not wire compatible with 1.2.2 because the former uses Hadoop IPC and the latter uses protocol buffers. Sorry, Enrico. On Friday, August 26, 2016, Enrico Olivelli - Diennea < enrico.olive...@diennea.com> wrote: > Hi, > I would like to connect to both a 0.94 hbase cluster

Accessing different HBase versions from the same JVM

2016-08-26 Thread Enrico Olivelli - Diennea
Hi, I would like to connect to both a 0.94 HBase cluster and a 1.2.2 HBase cluster from the same JVM. I think that the 0.94 client code is not compatible with 1.2.2. Do you think it is possible? Thank you -- Enrico Olivelli