Re: High CPU utilization by meta region

2016-11-21 Thread Stack
Can we see your configs -- encodings? -- and a thread dump?  Any I/O? If you
look in HDFS, are there many files under hbase:meta? Is it big? When was the
last time it was major compacted?
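
For reference, a quick way to gather that information (a sketch; it assumes
the default HBase root dir of /hbase and the 1.x directory layout):

# How many files back hbase:meta, and how big are they?
hdfs dfs -ls -R /hbase/data/hbase/meta

# Thread dump of the region server hosting meta (<pid> is a placeholder)
jstack <pid> > rs-threads.txt

# Trigger a major compaction of hbase:meta
echo "major_compact 'hbase:meta'" | hbase shell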

Thanks,
S

On Mon, Nov 21, 2016 at 5:50 PM, Timothy Brown  wrote:

> Hi,
>
> We are seeing about 80% CPU utilization on the Region Server that solely
> serves the meta table while other region servers typically have under 50%
> CPU utilization. Is this expected?
>
> Here's some more info about our cluster:
> HBase version 1.2
> Number of regions: 72
> Number of tables: 97
> Approx. requests per second to meta region server: 3k
> Approx. requests per second to entire HBase cluster: 90k
>
> Let me know what other information would be useful.
>
> Thanks for the help,
> Tim
>


High CPU utilization by meta region

2016-11-21 Thread Timothy Brown
Hi,

We are seeing about 80% CPU utilization on the Region Server that solely
serves the meta table while other region servers typically have under 50%
CPU utilization. Is this expected?

Here's some more info about our cluster:
HBase version 1.2
Number of regions: 72
Number of tables: 97
Approx. requests per second to meta region server: 3k
Approx. requests per second to entire HBase cluster: 90k

Let me know what other information would be useful.

Thanks for the help,
Tim


Re: WrongRowIOException

2016-11-21 Thread Julian Jaffe
Both clusters were running the same versions of HBase and Hadoop, with
matching compile dates and checksums. Also, `hbase hbck` showed no
inconsistencies in the source HBase instance.
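
For reference, the same consistency check can be pointed at a single table
on either cluster (a sketch; the table name is a placeholder):

hbase hbck -details table.name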


Stack trace:

org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of
TaskAttempt attempt_1470780584817_0448_m_000346_3 is : 0.17224467
2016-11-20 06:06:01,200 FATAL [IPC Server handler 28 on 39202]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task:
attempt_1470780584817_0448_m_000346_3 - exited :
org.apache.hadoop.hbase.client.WrongRowIOException: The row in
\x00\x00\x0710356613704525352\x00\x80\x00\x00\x00\x85B\xBCU/IN:crdAt/1462937192582/Put/vlen=12/seqid=0
doesn't match the original one
\x00\x00\x0710156613704525352\x00\x80\x00\x00\x00\x85B\xBCU
at org.apache.hadoop.hbase.client.Put.add(Put.java:321)
at org.apache.hadoop.hbase.mapreduce.Import$Importer.addPutToKv(Import.java:215)
at org.apache.hadoop.hbase.mapreduce.Import$Importer.processKV(Import.java:195)
at org.apache.hadoop.hbase.mapreduce.Import$Importer.writeResult(Import.java:158)
at org.apache.hadoop.hbase.mapreduce.Import$Importer.map(Import.java:143)
at org.apache.hadoop.hbase.mapreduce.Import$Importer.map(Import.java:126)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

(The identical stack trace is then logged twice more at 2016-11-20
06:06:01,200, in the diagnostics reports from TaskAttemptListenerImpl and
TaskAttemptImpl for attempt_1470780584817_0448_m_000346_3.)


On Mon, Nov 21, 2016 at 4:58 PM, Ted Yu  wrote:

> Can you give the whole stack trace 

Re: WrongRowIOException

2016-11-21 Thread Ted Yu
Can you give the whole stack trace for WrongRowIOException ?

Was the cluster running Export using the same version of hbase (
1.0.0-cdh5.5.2) ?

Thanks
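
A quick way to compare the builds on the two clusters (a sketch):

# Prints the version, revision, and compile date of the local install
hbase version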

On Mon, Nov 21, 2016 at 4:35 PM, Julian Jaffe 
wrote:

> Hbase Version: 1.0.0-cdh5.5.2
>
> We're importing the data using `hbase
> org.apache.hadoop.hbase.mapreduce.Import  'table.name' /path/to/backup`
> (The data was exported from an HBase instance on another cluster using
> `hbase org.apache.hadoop.hbase.mapreduce.Export` and then distcp'd between
> the clusters).
>
> On Mon, Nov 21, 2016 at 4:29 PM, Ted Yu  wrote:
>
> > I did a quick search - there was no relevant JIRA or discussion thread at
> > first glance.
> >
> > Which hbase release are you using ?
> >
> > How do you import the data ?
> >
> > More details would be helpful.
> >
> > Thanks
> >
> > On Mon, Nov 21, 2016 at 2:48 PM, Julian Jaffe 
> > wrote:
> >
> > > When importing data into a fresh HBase instance, after some time the
> > import
> > > throws the following exception:
> > >
> > > Error: org.apache.hadoop.hbase.client.WrongRowIOException: The row in
> > > \x00\x00\x0767341283611_10153807927108612\x00\x80\x00\x00\x00\x84)L\xA7/IN:nme/1461847340445/Put/vlen=42/seqid=0
> > > doesn't match the original one
> > > \x00\x00\x0767341283611_10153805927108612\x00\x80\x00\x00\x00\x84)L\xA7
> > >
> > > (The non-matching row differs on different runs).
> > >
> > > If the import is allowed to run to completion, the row count of the
> data
> > > imported is less than the row count of the source data.
> > >
> > > Googling for this error only turns up the source code that generates
> the
> > > error, so it doesn't seem to be a common problem.
> > >
> > > Can anyone provide any guidance?
> > >
> > > Julian Jaffe
> > >
> >
>


Re: WrongRowIOException

2016-11-21 Thread Ted Yu
I did a quick search - there was no relevant JIRA or discussion thread at
first glance.

Which hbase release are you using ?

How do you import the data ?

More details would be helpful.

Thanks

On Mon, Nov 21, 2016 at 2:48 PM, Julian Jaffe 
wrote:

> When importing data into a fresh HBase instance, after some time the import
> throws the following exception:
>
> Error: org.apache.hadoop.hbase.client.WrongRowIOException: The row in
> \x00\x00\x0767341283611_10153807927108612\x00\x80\x00\x00\x00\x84)L\xA7/IN:nme/1461847340445/Put/vlen=42/seqid=0
> doesn't match the original one
> \x00\x00\x0767341283611_10153805927108612\x00\x80\x00\x00\x00\x84)L\xA7
>
> (The non-matching row differs on different runs).
>
> If the import is allowed to run to completion, the row count of the data
> imported is less than the row count of the source data.
>
> Googling for this error only turns up the source code that generates the
> error, so it doesn't seem to be a common problem.
>
> Can anyone provide any guidance?
>
> Julian Jaffe
>


WrongRowIOException

2016-11-21 Thread Julian Jaffe
When importing data into a fresh HBase instance, after some time the import
throws the following exception:

Error: org.apache.hadoop.hbase.client.WrongRowIOException: The row in
\x00\x00\x0767341283611_10153807927108612\x00\x80\x00\x00\x00\x84)L\xA7/IN:nme/1461847340445/Put/vlen=42/seqid=0
doesn't match the original one
\x00\x00\x0767341283611_10153805927108612\x00\x80\x00\x00\x00\x84)L\xA7

(The non-matching row differs on different runs).

If the import is allowed to run to completion, the row count of the data
imported is less than the row count of the source data.

Googling for this error only turns up the source code that generates the
error, so it doesn't seem to be a common problem.

Can anyone provide any guidance?

Julian Jaffe
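
One way to quantify the shortfall (a sketch; 'table.name' is a placeholder)
is to run the bundled RowCounter job against the source and destination
tables and compare the resulting ROWS counters:

hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'table.name'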


Re: How Hbase perform distribution of rowkey in region server

2016-11-21 Thread Ted Yu
Manjeet:
With 3 regions (actually 4, considering the region with empty start key)
for the table, data wouldn't be distributed onto 100 nodes - there are not
enough regions to spread across all the region servers.

Assuming the table would receive much data, you can split the table so that
the regions spread more evenly across the nodes.

On Mon, Nov 21, 2016 at 12:48 AM, Manjeet Singh 
wrote:

> Hi Anoop, it's clear to me how HBase distributes data among regions.
>
> As you explained:
>
> Row keys in [a, b) will be assigned to region 1
>
> Row keys in [b, c) will be assigned to region 2
>
> Row keys in [c, ) will be assigned to region 3
>
>
> My question is: if I have 100 region servers, and by using this pre-split
> policy HBase creates an empty region on every data node (i.e. on every
> region server), then if I insert one row key as shown below
>
> a_data123
>
> how does HBase distribute this data across all 100 nodes? Does it make 100
> copies and put one on every RS?
>
>
> Thanks
>
> Manjeet
>
> On Mon, Nov 21, 2016 at 2:11 PM, Anoop John  wrote:
>
> > When you create the table this way, it will have 4 regions in total:
> > [ , a), [a, b), [b, c), [c, ). It is not like every RS will get 3
> > splits. The load balancer in the master will distribute these regions
> > onto the available RSs. So when a row key comes in, say a_data1, it
> > corresponds to exactly ONE region which is on exactly ONE RS. So the
> > client reaches that RS and region for this row key.
> >
> > -Anoop-
> >
> > On Mon, Nov 21, 2016 at 1:59 PM, Manjeet Singh
> >  wrote:
> > > Hi All
> > >
> > > My question is very simple. I have created my table using
> > > pre-splitting, as shown below:
> > >
> > >  create 'test_table', 'CF1', SPLITS => ['a', 'b', 'c']
> > >
> > > and I have 4 region servers.
> > > Can anyone explain how HBase distributes row keys across the region
> > > servers? Please note I am asking about region servers, not regions.
> > >
> > > What I am assuming:
> > >
> > > based on the pre-split, each region server will have 3 regions created
> > > on it, and if one row key comes in, say a_data1, it will go to every
> > > region server, into the region whose boundaries (start row key and end
> > > row key) contain it.
> > >
> > > Please correct me if my assumption is wrong.
> > >
> > > Thanks
> > > Manjeet
> > >
> > > --
> > > luv all
> >
>
>
>
> --
> luv all
>


Re: hbase/spark - Delegation Token can be issued only with kerberos or web authentication

2016-11-21 Thread Nkechi Achara
I am still convinced that it could be due to classpath issues, but I might
be missing something.

Just to make sure: have you checked using the principal/keytab on the
driver only, so you can confirm the TGT is valid?

I am using the same config but with CDH 5.5.2, and with a retrofit of the
Cloudera Labs HBase-on-Spark module.

Thanks
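
A minimal check of the keytab and TGT on the driver host (a sketch; the
principal is a placeholder, and the keytab path is taken from the submit
script below):

# List the principals and key timestamps in the keytab
klist -kt /opt/company/conf/hbase.keytab

# Obtain a ticket from the keytab, then confirm it is valid
kinit -kt /opt/company/conf/hbase.keytab <principal>
klist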

On 21 Nov 2016 5:32 p.m., "Abel Fernández"  wrote:

> I have included the krb5.conf and the jaas.conf in the spark-submit and on
> all NodeManagers and drivers, but I am still having the same problem.
>
> I think the problem is this piece of code: it is trying to execute a
> function on the executors, and for some reason the executors cannot get
> valid credentials.
>
> /**
>  * A simple enrichment of the traditional Spark RDD foreachPartition.
>  * This function differs from the original in that it offers the
>  * developer access to an already connected Connection object.
>  *
>  * Note: Do not close the Connection object.  All Connection
>  * management is handled outside this method.
>  *
>  * @param rdd  Original RDD with data to iterate over
>  * @param f    Function to be given an iterator to iterate through
>  *             the RDD values and a Connection object to interact
>  *             with HBase
>  */
> def foreachPartition[T](rdd: RDD[T],
>                         f: (Iterator[T], Connection) => Unit): Unit = {
>   rdd.foreachPartition(
>     it => hbaseForeachPartition(broadcastedConf, it, f))
> }
>
>
> The first thing hbaseForeachPartition tries to do is get the credentials,
> but I think this code is never executed:
>
> /**
>  * Underlying wrapper for all foreach functions in HBaseContext
>  */
> private def hbaseForeachPartition[T](
>     configBroadcast: Broadcast[SerializableWritable[Configuration]],
>     it: Iterator[T],
>     f: (Iterator[T], Connection) => Unit) = {
>
>   val config = getConf(configBroadcast)
>
>   applyCreds
>   // specify that this is a proxy user
>   val smartConn = HBaseConnectionCache.getConnection(config)
>   f(it, smartConn.connection)
>   smartConn.close()
> }
>
>
> This is the latest spark-submit I am using:
> #!/bin/bash
>
> SPARK_CONF_DIR=conf-hbase spark-submit --master yarn-cluster \
>   --executor-memory 6G \
>   --num-executors 10 \
>   --queue cards \
>   --executor-cores 4 \
>   --driver-java-options "-Dlog4j.configuration=file:log4j.properties" \
>   --driver-java-options "-Djava.security.krb5.conf=/etc/krb5.conf" \
>   --driver-java-options
> "-Djava.security.auth.login.config=/opt/company/conf/jaas.conf" \
>   --driver-class-path "$2" \
>   --jars file:/opt/company/lib/rocksdbjni-4.5.1.jar \
>   --conf
> "spark.driver.extraClassPath=/var/cloudera/parcels/CDH/lib/
> hbase/lib/htrace-core-3.2.0-incubating.jar:/var/cloudera/
> parcels/CDH/jars/hbase-server-1.0.0-cdh5.5.4.jar:/var/
> cloudera/parcels/CDH/jars/hbase-common-1.0.0-cdh5.5.4.
> jar:/var/cloudera/parcels/CDH/lib/hbase/lib/hbase-client-1.
> 0.0-cdh5.5.4.jar:/var/cloudera/parcels/CDH/lib/
> hbase/lib/hbase-protocol-1.0.0-cdh5.5.4.jar:/opt/orange/
> lib/rocksdbjni-4.5.1.jar:/var/cloudera/parcels/CLABS_
> PHOENIX-4.5.2-1.clabs_phoenix1.2.0.p0.774/lib/
> phoenix/lib/phoenix-core-1.2.0.jar:/var/cloudera/parcels/
> CDH/jars/hadoop-mapreduce-client-core-2.6.0-cdh5.5.4.jar"
> \
>   --conf
> "spark.executor.extraClassPath=/var/cloudera/parcels/CDH/lib/hbase/lib/
> htrace-core-3.2.0-incubating.jar:/var/cloudera/parcels/CDH/
> jars/hbase-server-1.0.0-cdh5.5.4.jar:/var/cloudera/parcels/
> CDH/jars/hbase-common-1.0.0-cdh5.5.4.jar:/var/cloudera/
> parcels/CDH/lib/hbase/lib/hbase-client-1.0.0-cdh5.5.4.
> jar:/var/cloudera/parcels/CDH/lib/hbase/lib/hbase-protocol-
> 1.0.0-cdh5.5.4.jar:/opt/orange/lib/rocksdbjni-4.5.1.
> jar:/var/cloudera/parcels/CLABS_PHOENIX-4.5.2-1.clabs_
> phoenix1.2.0.p0.774/lib/phoenix/lib/phoenix-core-1.2.
> 0.jar:/var/cloudera/parcels/CDH/jars/hadoop-mapreduce-
> client-core-2.6.0-cdh5.5.4.jar"\
>   --principal hb...@company.corp \
>   --keytab /opt/company/conf/hbase.keytab \
>   --files
> "owl.properties,conf-hbase/log4j.properties,conf-hbase/
> hbase-site.xml,conf-hbase/core-site.xml,$2"
> \
>   --class $1 \
>   cards-batch-$3-jar-with-dependencies.jar $2
>
>
>
> On Fri, 18 Nov 2016 at 16:37 Abel Fernández  wrote:
>
> > No worries.
> >
> > This is the spark version we are using:  1.5.0-cdh5.5.4
> >
> > I have to use HBaseContext; it is the first parameter of the method I am
> > using to generate the HFiles (HbaseRDDFunctions.hbaseBulkLoadThinRows).
> >
> > On Fri, 18 Nov 2016 at 16:06 Nkechi Achara 
> > wrote:
> >
> > Sorry, on my way to a flight.
> >
> > Read is required for a keytab to be permissioned properly. So that looks
> > fine in your case.
> >
> > I do not have my PC with me, but have you tried to use HBase without
> > using HBaseContext?
> >
> > Also which version of Spark 

Re: hbase/spark - Delegation Token can be issued only with kerberos or web authentication

2016-11-21 Thread Abel Fernández
I have included the krb5.conf and the jaas.conf in the spark-submit and on
all NodeManagers and drivers, but I am still having the same problem.

I think the problem is this piece of code: it is trying to execute a
function on the executors, and for some reason the executors cannot get
valid credentials.

/**
 * A simple enrichment of the traditional Spark RDD foreachPartition.
 * This function differs from the original in that it offers the
 * developer access to an already connected Connection object.
 *
 * Note: Do not close the Connection object.  All Connection
 * management is handled outside this method.
 *
 * @param rdd  Original RDD with data to iterate over
 * @param f    Function to be given an iterator to iterate through
 *             the RDD values and a Connection object to interact
 *             with HBase
 */
def foreachPartition[T](rdd: RDD[T],
                        f: (Iterator[T], Connection) => Unit): Unit = {
  rdd.foreachPartition(
    it => hbaseForeachPartition(broadcastedConf, it, f))
}


The first thing hbaseForeachPartition tries to do is get the credentials,
but I think this code is never executed:

/**
 * Underlying wrapper for all foreach functions in HBaseContext
 */
private def hbaseForeachPartition[T](
    configBroadcast: Broadcast[SerializableWritable[Configuration]],
    it: Iterator[T],
    f: (Iterator[T], Connection) => Unit) = {

  val config = getConf(configBroadcast)

  applyCreds
  // specify that this is a proxy user
  val smartConn = HBaseConnectionCache.getConnection(config)
  f(it, smartConn.connection)
  smartConn.close()
}
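
If applyCreds is never reached, the failure should be visible in the
executor logs; one way to check (a sketch; the application ID is a
placeholder):

# Pull the aggregated container logs and search for token/Kerberos errors
yarn logs -applicationId <application_id> | grep -iE 'token|GSS|kerberos'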


This is the latest spark-submit I am using:
#!/bin/bash

SPARK_CONF_DIR=conf-hbase spark-submit --master yarn-cluster \
  --executor-memory 6G \
  --num-executors 10 \
  --queue cards \
  --executor-cores 4 \
  --driver-java-options "-Dlog4j.configuration=file:log4j.properties" \
  --driver-java-options "-Djava.security.krb5.conf=/etc/krb5.conf" \
  --driver-java-options
"-Djava.security.auth.login.config=/opt/company/conf/jaas.conf" \
  --driver-class-path "$2" \
  --jars file:/opt/company/lib/rocksdbjni-4.5.1.jar \
  --conf
"spark.driver.extraClassPath=/var/cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.2.0-incubating.jar:/var/cloudera/parcels/CDH/jars/hbase-server-1.0.0-cdh5.5.4.jar:/var/cloudera/parcels/CDH/jars/hbase-common-1.0.0-cdh5.5.4.jar:/var/cloudera/parcels/CDH/lib/hbase/lib/hbase-client-1.0.0-cdh5.5.4.jar:/var/cloudera/parcels/CDH/lib/hbase/lib/hbase-protocol-1.0.0-cdh5.5.4.jar:/opt/orange/lib/rocksdbjni-4.5.1.jar:/var/cloudera/parcels/CLABS_PHOENIX-4.5.2-1.clabs_phoenix1.2.0.p0.774/lib/phoenix/lib/phoenix-core-1.2.0.jar:/var/cloudera/parcels/CDH/jars/hadoop-mapreduce-client-core-2.6.0-cdh5.5.4.jar"
\
  --conf
"spark.executor.extraClassPath=/var/cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.2.0-incubating.jar:/var/cloudera/parcels/CDH/jars/hbase-server-1.0.0-cdh5.5.4.jar:/var/cloudera/parcels/CDH/jars/hbase-common-1.0.0-cdh5.5.4.jar:/var/cloudera/parcels/CDH/lib/hbase/lib/hbase-client-1.0.0-cdh5.5.4.jar:/var/cloudera/parcels/CDH/lib/hbase/lib/hbase-protocol-1.0.0-cdh5.5.4.jar:/opt/orange/lib/rocksdbjni-4.5.1.jar:/var/cloudera/parcels/CLABS_PHOENIX-4.5.2-1.clabs_phoenix1.2.0.p0.774/lib/phoenix/lib/phoenix-core-1.2.0.jar:/var/cloudera/parcels/CDH/jars/hadoop-mapreduce-client-core-2.6.0-cdh5.5.4.jar"\
  --principal hb...@company.corp \
  --keytab /opt/company/conf/hbase.keytab \
  --files
"owl.properties,conf-hbase/log4j.properties,conf-hbase/hbase-site.xml,conf-hbase/core-site.xml,$2"
\
  --class $1 \
  cards-batch-$3-jar-with-dependencies.jar $2



On Fri, 18 Nov 2016 at 16:37 Abel Fernández  wrote:

> No worries.
>
> This is the spark version we are using:  1.5.0-cdh5.5.4
>
> I have to use HBaseContext; it is the first parameter of the method I am
> using to generate the HFiles (HbaseRDDFunctions.hbaseBulkLoadThinRows).
>
> On Fri, 18 Nov 2016 at 16:06 Nkechi Achara 
> wrote:
>
> Sorry, on my way to a flight.
>
> Read is required for a keytab to be permissioned properly. So that looks
> fine in your case.
>
> I do not have my PC with me, but have you tried to use HBase without using
> HBaseContext?
>
> Also which version of Spark are you using?
>
> On 18 Nov 2016 16:01, "Abel Fernández"  wrote:
>
> > Yep, the keytab is also on the driver, in the same location.
> >
> > -rw-r--r-- 1 hbase root  370 Nov 16 17:13 hbase.keytab
> >
> > Do you know what permissions the keytab should have?
> >
> >
> >
> > On Fri, 18 Nov 2016 at 14:19 Nkechi Achara 
> > wrote:
> >
> > > Sorry, just realised you had the submit command in the attached docs.
> > >
> > > Can I ask if the keytab is also on the driver in the same location?
> > >
> > > The spark option normally requires the keytab to be on the driver so
> > > it can pick it up and pass it to yarn 

Re: How Hbase perform distribution of rowkey in region server

2016-11-21 Thread Manjeet Singh
Hi Anoop, it's clear to me how HBase distributes data among regions.

As you explained:

Row keys in [a, b) will be assigned to region 1

Row keys in [b, c) will be assigned to region 2

Row keys in [c, ) will be assigned to region 3


My question is: if I have 100 region servers, and by using this pre-split
policy HBase creates an empty region on every data node (i.e. on every
region server), then if I insert one row key as shown below

a_data123

how does HBase distribute this data across all 100 nodes? Does it make 100
copies and put one on every RS?


Thanks

Manjeet

On Mon, Nov 21, 2016 at 2:11 PM, Anoop John  wrote:

> When you create the table this way, it will have 4 regions in total:
> [ , a), [a, b), [b, c), [c, ). It is not like every RS will get 3
> splits. The load balancer in the master will distribute these regions
> onto the available RSs. So when a row key comes in, say a_data1, it
> corresponds to exactly ONE region which is on exactly ONE RS. So the
> client reaches that RS and region for this row key.
>
> -Anoop-
>
> On Mon, Nov 21, 2016 at 1:59 PM, Manjeet Singh
>  wrote:
> > Hi All
> >
> > My question is very simple. I have created my table using
> > pre-splitting, as shown below:
> >
> >  create 'test_table', 'CF1', SPLITS => ['a', 'b', 'c']
> >
> > and I have 4 region servers.
> > Can anyone explain how HBase distributes row keys across the region
> > servers? Please note I am asking about region servers, not regions.
> >
> > What I am assuming:
> >
> > based on the pre-split, each region server will have 3 regions created
> > on it, and if one row key comes in, say a_data1, it will go to every
> > region server, into the region whose boundaries (start row key and end
> > row key) contain it.
> >
> > Please correct me if my assumption is wrong.
> >
> > Thanks
> > Manjeet
> >
> > --
> > luv all
>



-- 
luv all