[jira] [Comment Edited] (HBASE-17503) [C++] Configuration should be settable and used w/o XML files

2017-01-23 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15834162#comment-15834162
 ] 

Heng Chen edited comment on HBASE-17503 at 1/23/17 10:05 AM:
-

{code}
+void Configuration::SetInt(const std::string &key, const int32_t value) {
+  Set(key, boost::lexical_cast<std::string>(value));
+}
+
+void Configuration::SetLong(const std::string &key, const int64_t value) {
+  Set(key, boost::lexical_cast<std::string>(value));
+}
+
+void Configuration::SetDouble(const std::string &key, const double value) {
+  Set(key, boost::lexical_cast<std::string>(value));
+}
+
+void Configuration::SetBool(const std::string &key, const bool value) {
+  Set(key, boost::lexical_cast<std::string>(value));
+}
{code}

Hints: 
 * According to the Google C++ style guide, a pass-by-value parameter should not 
be declared "const"; only references need it. :)
 * Regarding the GTest EXPECT_EQ function, the first parameter should be the 
expected value; it seems the patch has the arguments in the wrong order. :)


was (Author: chenheng):
{code}
+void Configuration::SetInt(const std::string &key, const int32_t value) {
+  Set(key, boost::lexical_cast<std::string>(value));
+}
+
+void Configuration::SetLong(const std::string &key, const int64_t value) {
+  Set(key, boost::lexical_cast<std::string>(value));
+}
+
+void Configuration::SetDouble(const std::string &key, const double value) {
+  Set(key, boost::lexical_cast<std::string>(value));
+}
+
+void Configuration::SetBool(const std::string &key, const bool value) {
+  Set(key, boost::lexical_cast<std::string>(value));
+}
{code}

Hints: 
According to the Google C++ style guide, a pass-by-value parameter should not be 
declared "const"; only references need it. :)

> [C++] Configuration should be settable and used w/o XML files
> -
>
> Key: HBASE-17503
> URL: https://issues.apache.org/jira/browse/HBASE-17503
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: HBASE-14850
>
> Attachments: hbase-17503_v1.patch
>
>
> Configuration right now is read-only, and there is only an XML-based 
> configuration loader. 
> However, in testing, we need the Config object w/o the XML files, and we need 
> to be able to set specific values in the conf.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-17503) [C++] Configuration should be settable and used w/o XML files

2017-01-23 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15834162#comment-15834162
 ] 

Heng Chen commented on HBASE-17503:
---

{code}
+void Configuration::SetInt(const std::string &key, const int32_t value) {
+  Set(key, boost::lexical_cast<std::string>(value));
+}
+
+void Configuration::SetLong(const std::string &key, const int64_t value) {
+  Set(key, boost::lexical_cast<std::string>(value));
+}
+
+void Configuration::SetDouble(const std::string &key, const double value) {
+  Set(key, boost::lexical_cast<std::string>(value));
+}
+
+void Configuration::SetBool(const std::string &key, const bool value) {
+  Set(key, boost::lexical_cast<std::string>(value));
+}
{code}

Hints: 
According to the Google C++ style guide, a pass-by-value parameter should not be 
declared "const"; only references need it. :)

> [C++] Configuration should be settable and used w/o XML files
> -
>
> Key: HBASE-17503
> URL: https://issues.apache.org/jira/browse/HBASE-17503
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: HBASE-14850
>
> Attachments: hbase-17503_v1.patch
>
>
> Configuration right now is read-only, and there is only an XML-based 
> configuration loader. 
> However, in testing, we need the Config object w/o the XML files, and we need 
> to be able to set specific values in the conf.  





[jira] [Updated] (HBASE-17096) checkAndMutateApi doesn't work correctly on 0.98.19+

2016-11-15 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-17096:
--
Fix Version/s: 0.98.24
   Status: Patch Available  (was: Open)

> checkAndMutateApi doesn't work correctly on 0.98.19+
> 
>
> Key: HBASE-17096
> URL: https://issues.apache.org/jira/browse/HBASE-17096
> Project: HBase
>  Issue Type: Bug
>Reporter: Samarth Jain
> Fix For: 0.98.24
>
> Attachments: HBASE-17096-0.98.patch, HBASE-17096-0.98.v2.patch
>
>
> Below is the test case. It uses some Phoenix APIs for getting hold of the admin 
> and HConnection, but it should be easily adapted into an HBase IT test. The second 
> checkAndMutate should return false, but it is returning true. This test fails 
> with HBase 0.98.23 and works fine with HBase 0.98.17.
> {code}
> @Test
> public void testCheckAndMutateApi() throws Exception {
> byte[] row = Bytes.toBytes("ROW");
> byte[] tableNameBytes = Bytes.toBytes(generateUniqueName());
> byte[] family = Bytes.toBytes(generateUniqueName());
> byte[] qualifier = Bytes.toBytes("QUALIFIER");
> byte[] oldValue = null;
> byte[] newValue = Bytes.toBytes("VALUE");
> Put put = new Put(row);
> put.add(family, qualifier, newValue);
> try (Connection conn = DriverManager.getConnection(getUrl())) {
> PhoenixConnection phxConn = conn.unwrap(PhoenixConnection.class);
> try (HBaseAdmin admin = phxConn.getQueryServices().getAdmin()) {
> HTableDescriptor tableDesc = new HTableDescriptor(
> TableName.valueOf(tableNameBytes));
> HColumnDescriptor columnDesc = new HColumnDescriptor(family);
> columnDesc.setTimeToLive(120);
> tableDesc.addFamily(columnDesc);
> admin.createTable(tableDesc);
> HTableInterface tableDescriptor = 
> admin.getConnection().getTable(tableNameBytes);
> assertTrue(tableDescriptor.checkAndPut(row, family, 
> qualifier, oldValue, put));
> Delete delete = new Delete(row);
> RowMutations mutations = new RowMutations(row);
> mutations.add(delete);
> assertTrue(tableDescriptor.checkAndMutate(row, family, 
> qualifier, CompareOp.EQUAL, newValue, mutations));
> assertFalse(tableDescriptor.checkAndMutate(row, family, 
> qualifier, CompareOp.EQUAL, newValue, mutations));
> }
> }
> }
> {code}
> FYI, [~apurtell], [~jamestaylor], [~lhofhansl]. 





[jira] [Updated] (HBASE-17096) checkAndMutateApi doesn't work correctly on 0.98.19+

2016-11-15 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-17096:
--
Attachment: HBASE-17096-0.98.v2.patch

Really sorry for this issue; I missed the condition in the checkAndMutate 
request. 
Patch v2 fixes the problem.

> checkAndMutateApi doesn't work correctly on 0.98.19+
> 
>
> Key: HBASE-17096
> URL: https://issues.apache.org/jira/browse/HBASE-17096
> Project: HBase
>  Issue Type: Bug
>Reporter: Samarth Jain
> Fix For: 0.98.24
>
> Attachments: HBASE-17096-0.98.patch, HBASE-17096-0.98.v2.patch
>
>
> Below is the test case. It uses some Phoenix APIs for getting hold of the admin 
> and HConnection, but it should be easily adapted into an HBase IT test. The second 
> checkAndMutate should return false, but it is returning true. This test fails 
> with HBase 0.98.23 and works fine with HBase 0.98.17.
> {code}
> @Test
> public void testCheckAndMutateApi() throws Exception {
> byte[] row = Bytes.toBytes("ROW");
> byte[] tableNameBytes = Bytes.toBytes(generateUniqueName());
> byte[] family = Bytes.toBytes(generateUniqueName());
> byte[] qualifier = Bytes.toBytes("QUALIFIER");
> byte[] oldValue = null;
> byte[] newValue = Bytes.toBytes("VALUE");
> Put put = new Put(row);
> put.add(family, qualifier, newValue);
> try (Connection conn = DriverManager.getConnection(getUrl())) {
> PhoenixConnection phxConn = conn.unwrap(PhoenixConnection.class);
> try (HBaseAdmin admin = phxConn.getQueryServices().getAdmin()) {
> HTableDescriptor tableDesc = new HTableDescriptor(
> TableName.valueOf(tableNameBytes));
> HColumnDescriptor columnDesc = new HColumnDescriptor(family);
> columnDesc.setTimeToLive(120);
> tableDesc.addFamily(columnDesc);
> admin.createTable(tableDesc);
> HTableInterface tableDescriptor = 
> admin.getConnection().getTable(tableNameBytes);
> assertTrue(tableDescriptor.checkAndPut(row, family, 
> qualifier, oldValue, put));
> Delete delete = new Delete(row);
> RowMutations mutations = new RowMutations(row);
> mutations.add(delete);
> assertTrue(tableDescriptor.checkAndMutate(row, family, 
> qualifier, CompareOp.EQUAL, newValue, mutations));
> assertFalse(tableDescriptor.checkAndMutate(row, family, 
> qualifier, CompareOp.EQUAL, newValue, mutations));
> }
> }
> }
> {code}
> FYI, [~apurtell], [~jamestaylor], [~lhofhansl]. 





[jira] [Commented] (HBASE-17096) checkAndMutateApi doesn't work correctly on 0.98.19+

2016-11-15 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667090#comment-15667090
 ] 

Heng Chen commented on HBASE-17096:
---

Let me check it.  :)

> checkAndMutateApi doesn't work correctly on 0.98.19+
> 
>
> Key: HBASE-17096
> URL: https://issues.apache.org/jira/browse/HBASE-17096
> Project: HBase
>  Issue Type: Bug
>Reporter: Samarth Jain
> Attachments: HBASE-17096-0.98.patch
>
>
> Below is the test case. It uses some Phoenix APIs for getting hold of the admin 
> and HConnection, but it should be easily adapted into an HBase IT test. The second 
> checkAndMutate should return false, but it is returning true. This test fails 
> with HBase 0.98.23 and works fine with HBase 0.98.17.
> {code}
> @Test
> public void testCheckAndMutateApi() throws Exception {
> byte[] row = Bytes.toBytes("ROW");
> byte[] tableNameBytes = Bytes.toBytes(generateUniqueName());
> byte[] family = Bytes.toBytes(generateUniqueName());
> byte[] qualifier = Bytes.toBytes("QUALIFIER");
> byte[] oldValue = null;
> byte[] newValue = Bytes.toBytes("VALUE");
> Put put = new Put(row);
> put.add(family, qualifier, newValue);
> try (Connection conn = DriverManager.getConnection(getUrl())) {
> PhoenixConnection phxConn = conn.unwrap(PhoenixConnection.class);
> try (HBaseAdmin admin = phxConn.getQueryServices().getAdmin()) {
> HTableDescriptor tableDesc = new HTableDescriptor(
> TableName.valueOf(tableNameBytes));
> HColumnDescriptor columnDesc = new HColumnDescriptor(family);
> columnDesc.setTimeToLive(120);
> tableDesc.addFamily(columnDesc);
> admin.createTable(tableDesc);
> HTableInterface tableDescriptor = 
> admin.getConnection().getTable(tableNameBytes);
> assertTrue(tableDescriptor.checkAndPut(row, family, 
> qualifier, oldValue, put));
> Delete delete = new Delete(row);
> RowMutations mutations = new RowMutations(row);
> mutations.add(delete);
> assertTrue(tableDescriptor.checkAndMutate(row, family, 
> qualifier, CompareOp.EQUAL, newValue, mutations));
> assertFalse(tableDescriptor.checkAndMutate(row, family, 
> qualifier, CompareOp.EQUAL, newValue, mutations));
> }
> }
> }
> {code}
> FYI, [~apurtell], [~jamestaylor], [~lhofhansl]. 





[jira] [Commented] (HBASE-17092) Both LoadIncrementalHFiles#doBulkLoad() methods should set return value

2016-11-14 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665637#comment-15665637
 ] 

Heng Chen commented on HBASE-17092:
---

+1

> Both LoadIncrementalHFiles#doBulkLoad() methods should set return value
> ---
>
> Key: HBASE-17092
> URL: https://issues.apache.org/jira/browse/HBASE-17092
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
> Attachments: 17092.v1.txt
>
>
> Currently there are two LoadIncrementalHFiles#doBulkLoad() methods.
> One returns a Map:
> {code}
>   public Map doBulkLoad(Map 
> map, final Admin admin,
> {code}
> The other one is void return type.
> This issue makes both methods record return value which is used by the run() 
> method.





[jira] [Commented] (HBASE-17037) Enhance LoadIncrementalHFiles API to convey loaded files

2016-11-12 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15659254#comment-15659254
 ] 

Heng Chen commented on HBASE-17037:
---

+1 for it. 

> Enhance LoadIncrementalHFiles API to convey loaded files
> 
>
> Key: HBASE-17037
> URL: https://issues.apache.org/jira/browse/HBASE-17037
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
> Attachments: 17037.v2.txt, 17037.v3.txt
>
>
> When Map is passed to LoadIncrementalHFiles, we should 
> provide a means for the caller to get the collection of Paths for the loaded 
> hfiles.
> The functionality added by HBASE-16821 is preserved as shown by the modified 
> TestLoadIncrementalHFiles#testSimpleLoadWithMap





[jira] [Commented] (HBASE-16938) TableCFsUpdater maybe failed due to no write permission on peerNode

2016-10-31 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15624262#comment-15624262
 ] 

Heng Chen commented on HBASE-16938:
---

The patch looks OK to me.  It extracts the updater into a standalone tool, so we 
can run it on its own without affecting the current logic.  :)

> TableCFsUpdater maybe failed due to no write permission on peerNode
> ---
>
> Key: HBASE-16938
> URL: https://issues.apache.org/jira/browse/HBASE-16938
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Attachments: HBASE-16938.patch
>
>
> After HBASE-11393, replication table-cfs use a PB object, so the old string 
> config needs to be copied to the new PB object when upgrading a cluster. In our 
> use case, we have different Kerberos setups for different clusters, e.g. an 
> online serving cluster and an offline processing cluster, and we use a unified 
> global admin Kerberos principal for all clusters. The peer node is created by 
> the client, so only the global admin has write permission on it. When upgrading 
> a cluster, HMaster doesn't have write permission on the peer node, so it may 
> fail to copy the old table-cfs string to the new PB object. I think we need a 
> tool the client can run to do this copy job.





[jira] [Commented] (HBASE-16938) TableCFsUpdater maybe failed due to no write permission on peerNode

2016-10-31 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15624032#comment-15624032
 ] 

Heng Chen commented on HBASE-16938:
---

It is OK with me to make the updater a standalone tool.  

> TableCFsUpdater maybe failed due to no write permission on peerNode
> ---
>
> Key: HBASE-16938
> URL: https://issues.apache.org/jira/browse/HBASE-16938
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.0.0, 1.4.0
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Attachments: HBASE-16938.patch
>
>
> After HBASE-11393, replication table-cfs use a PB object, so the old string 
> config needs to be copied to the new PB object when upgrading a cluster. In our 
> use case, we have different Kerberos setups for different clusters, e.g. an 
> online serving cluster and an offline processing cluster, and we use a unified 
> global admin Kerberos principal for all clusters. The peer node is created by 
> the client, so only the global admin has write permission on it. When upgrading 
> a cluster, HMaster doesn't have write permission on the peer node, so it may 
> fail to copy the old table-cfs string to the new PB object. I think we need a 
> tool the client can run to do this copy job.





[jira] [Commented] (HBASE-16973) Revisiting default value for hbase.client.scanner.caching

2016-10-31 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15622248#comment-15622248
 ] 

Heng Chen commented on HBASE-16973:
---

Notice that HBASE-11544 has only been applied on branch-1.1.x, so the default 
value for branch-1.2.x is still 100?  Do our default values have any 
compatibility rules (and if not, should we add some)? This confused our users.  
In this case, I think we should keep the default value small, as [~carp84] 
mentioned, and respect all scanner-related configurations.

> Revisiting default value for hbase.client.scanner.caching
> -
>
> Key: HBASE-16973
> URL: https://issues.apache.org/jira/browse/HBASE-16973
> Project: HBase
>  Issue Type: Bug
>Reporter: Yu Li
>Assignee: Yu Li
> Attachments: Scan.next_p999.png
>
>
> We are observing below logs for a long-running scan:
> {noformat}
> 2016-10-30 08:51:41,692 WARN  
> [B.defaultRpcServer.handler=50,queue=12,port=16020] ipc.RpcServer:
> (responseTooSlow-LongProcessTime): {"processingtimems":24329,
> "call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)",
> "client":"11.251.157.108:50415","scandetails":"table: ae_product_image 
> region: ae_product_image,494:
> ,1476872321454.33171a04a683c4404717c43ea4eb8978.","param":"scanner_id: 
> 5333521 number_of_rows: 2147483647
> close_scanner: false next_call_seq: 8 client_handles_partials: true 
> client_handles_heartbeats: true",
> "starttimems":1477788677363,"queuetimems":0,"class":"HRegionServer","responsesize":818,"method":"Scan"}
> {noformat}
> From which we found the "number_of_rows" is as big as {{Integer.MAX_VALUE}}
> And we also observed a long filter list on the customized scan. After 
> checking application code we confirmed that there's no {{Scan.setCaching}} or 
> {{hbase.client.scanner.caching}} setting on client side, so it turns out 
> using the default value the caching for Scan will be Integer.MAX_VALUE, which 
> is really a big surprise.
> After checking code and commit history, I found it's HBASE-11544 which 
> changes {{HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING}} from 100 to 
> Integer.MAX_VALUE, and from the release note there I could see below notation:
> {noformat}
> Scan caching default has been changed to Integer.Max_Value 
> This value works together with the new maxResultSize value from HBASE-12976 
> (defaults to 2MB) 
> Results returned from server on basis of size rather than number of rows 
> Provides better use of network since row size varies amongst tables
> {noformat}
> And I'm afraid this lacks consideration of the case of a scan with filters, 
> which may scan many rows but return only a small result.
> What's more, we still have below comment/code in {{Scan.java}}
> {code}
>   /*
>* -1 means no caching
>*/
>   private int caching = -1;
> {code}
> But the implementation does not actually follow this (instead of no caching, we 
> are caching {{Integer.MAX_VALUE}}...).
> So here I'd like to bring up two points:
> 1. Change the default value of 
> HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING back to some small value like 128
> 2. Enforce the semantics of "no caching"





[jira] [Commented] (HBASE-16954) Unify HTable#checkAndDelete with AP

2016-10-30 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15620914#comment-15620914
 ] 

Heng Chen commented on HBASE-16954:
---

Committed to master.

> Unify HTable#checkAndDelete with AP
> ---
>
> Key: HBASE-16954
> URL: https://issues.apache.org/jira/browse/HBASE-16954
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: ChiaPing Tsai
>Assignee: ChiaPing Tsai
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-16954.v0.patch, HBASE-16954.v1.patch
>
>
> The HTable#checkAndDelete(byte[], byte[], byte[], byte[], Delete) can be 
> implemented by HTable#checkAndDelete(byte[], byte[], byte[], byte[], 
> CompareType.EQUAL, Delete). As a result, all HTable#checkAndDelete methods 
> can be unified with AP.





[jira] [Commented] (HBASE-16954) Unify HTable#checkAndDelete with AP

2016-10-30 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15620917#comment-15620917
 ] 

Heng Chen commented on HBASE-16954:
---

Thanks [~chia7712] for your patch.

> Unify HTable#checkAndDelete with AP
> ---
>
> Key: HBASE-16954
> URL: https://issues.apache.org/jira/browse/HBASE-16954
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: ChiaPing Tsai
>Assignee: ChiaPing Tsai
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-16954.v0.patch, HBASE-16954.v1.patch
>
>
> The HTable#checkAndDelete(byte[], byte[], byte[], byte[], Delete) can be 
> implemented by HTable#checkAndDelete(byte[], byte[], byte[], byte[], 
> CompareType.EQUAL, Delete). As a result, all HTable#checkAndDelete methods 
> can be unified with AP.





[jira] [Updated] (HBASE-16954) Unify HTable#checkAndDelete with AP

2016-10-30 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-16954:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> Unify HTable#checkAndDelete with AP
> ---
>
> Key: HBASE-16954
> URL: https://issues.apache.org/jira/browse/HBASE-16954
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: ChiaPing Tsai
>Assignee: ChiaPing Tsai
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-16954.v0.patch, HBASE-16954.v1.patch
>
>
> The HTable#checkAndDelete(byte[], byte[], byte[], byte[], Delete) can be 
> implemented by HTable#checkAndDelete(byte[], byte[], byte[], byte[], 
> CompareType.EQUAL, Delete). As a result, all HTable#checkAndDelete methods 
> can be unified with AP.





[jira] [Commented] (HBASE-16954) Unify HTable#checkAndDelete with AP

2016-10-27 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15613849#comment-15613849
 ] 

Heng Chen commented on HBASE-16954:
---

+1

> Unify HTable#checkAndDelete with AP
> ---
>
> Key: HBASE-16954
> URL: https://issues.apache.org/jira/browse/HBASE-16954
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>Reporter: ChiaPing Tsai
>Assignee: ChiaPing Tsai
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-16954.v0.patch, HBASE-16954.v1.patch
>
>
> The HTable#checkAndDelete(byte[], byte[], byte[], byte[], Delete) can be 
> implemented by HTable#checkAndDelete(byte[], byte[], byte[], byte[], 
> CompareType.EQUAL, Delete). As a result, all HTable#checkAndDelete methods 
> can be unified with AP.





[jira] [Commented] (HBASE-16610) Unify append, increment with AP

2016-10-27 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612286#comment-15612286
 ] 

Heng Chen commented on HBASE-16610:
---

Thanks [~chia7712] for your attention.  I am a little busy recently; I will do it 
later.  

> Unify append, increment with AP
> ---
>
> Key: HBASE-16610
> URL: https://issues.apache.org/jira/browse/HBASE-16610
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Heng Chen
>Assignee: Heng Chen
> Attachments: HBASE-16610.patch, HBASE-16610.v1.patch, 
> HBASE-16610.v1.patch
>
>






[jira] [Commented] (HBASE-16593) Unify HTable with AP

2016-10-27 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612277#comment-15612277
 ] 

Heng Chen commented on HBASE-16593:
---

Of course we could do it.  Go for it. [~chia7712] :)

> Unify HTable with AP
> 
>
> Key: HBASE-16593
> URL: https://issues.apache.org/jira/browse/HBASE-16593
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Heng Chen
>Assignee: Heng Chen
>
> Currently, HTable has two ways to deal with a request: one is to call RPC 
> directly, which is used to process single-action requests such as Get, Delete, 
> Append, and Increment.  The other goes through AP to deal with multi-action 
> requests such as batch, mutation, etc.
> This issue is to unify them with AP only. This has some benefits; for example, 
> we could implement an async interface easily with AP, and we could make the 
> client logic clearer by using only AP to communicate with the server.
> HBASE-14703 has done some of this work (unifying mutate and checkAndMutate with AP)





[jira] [Commented] (HBASE-16698) Performance issue: handlers stuck waiting for CountDownLatch inside WALKey#getWriteEntry under high writing workload

2016-10-16 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581074#comment-15581074
 ] 

Heng Chen commented on HBASE-16698:
---

The patch for branch-1 LGTM.  +1 
We will enable it by default on branch-1, right?  Just confirming with all of 
you. :)


> Performance issue: handlers stuck waiting for CountDownLatch inside 
> WALKey#getWriteEntry under high writing workload
> 
>
> Key: HBASE-16698
> URL: https://issues.apache.org/jira/browse/HBASE-16698
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Affects Versions: 1.2.3
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0
>
> Attachments: HBASE-16698.branch-1.patch, 
> HBASE-16698.branch-1.v2.patch, HBASE-16698.patch, HBASE-16698.v2.patch, 
> hadoop0495.et2.jstack
>
>
> As titled, on our production environment we observed 98 out of 128 handlers 
> get stuck waiting for the CountDownLatch {{seqNumAssignedLatch}} inside 
> {{WALKey#getWriteEntry}} under a high writing workload.
> After digging into the problem, we found that the problem is mainly caused by 
> advancing mvcc in the append logic. Below is some detailed analysis:
> Under current branch-1 code logic, all batch puts will call 
> {{WALKey#getWriteEntry}} after appending edit to WAL, and 
> {{seqNumAssignedLatch}} is only released when the relative append call is 
> handled by RingBufferEventHandler (see {{FSWALEntry#stampRegionSequenceId}}). 
> Because we're currently using a single event handler for the ringbuffer, the 
> append calls are handled one by one (in fact, a lot of our current logic 
> depends on this sequential handling), and this becomes a bottleneck 
> under a high writing workload.
> The worst part is that by default we only use one WAL per RS, so appends on 
> all regions are handled sequentially, which causes contention among 
> different regions...
> To fix this, we could make use of the "sequential appends" mechanism: grab the 
> WriteEntry before publishing the append onto the ringbuffer and use it as the 
> sequence id, except that we need to add a lock to make "grab WriteEntry" and 
> "append edit" a single transaction. This will still cause contention 
> inside a region but avoids contention between different regions. This 
> solution has already been verified in our online environment and proved to be 
> effective.
> Notice that for master (2.0) branch since we already change the write 
> pipeline to sync before writing memstore (HBASE-15158), this issue only 
> exists for the ASYNC_WAL writes scenario.





[jira] [Commented] (HBASE-16698) Performance issue: handlers stuck waiting for CountDownLatch inside WALKey#getWriteEntry under high writing workload

2016-10-16 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15581067#comment-15581067
 ] 

Heng Chen commented on HBASE-16698:
---

{quote}
so in my analysis, waiting for sync and waiting for latch should take the same 
time. Have no idea why waiting for sync is faster
{quote}

[~allan163] Not exactly.  Currently we wait for the seqId to be assigned in one 
handler, but we do the syncs in multiple threads in parallel (5 by default). 

> Performance issue: handlers stuck waiting for CountDownLatch inside 
> WALKey#getWriteEntry under high writing workload
> 
>
> Key: HBASE-16698
> URL: https://issues.apache.org/jira/browse/HBASE-16698
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Affects Versions: 1.2.3
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0
>
> Attachments: HBASE-16698.branch-1.patch, 
> HBASE-16698.branch-1.v2.patch, HBASE-16698.patch, HBASE-16698.v2.patch, 
> hadoop0495.et2.jstack
>
>
> As titled, on our production environment we observed 98 out of 128 handlers 
> get stuck waiting for the CountDownLatch {{seqNumAssignedLatch}} inside 
> {{WALKey#getWriteEntry}} under a high writing workload.
> After digging into the problem, we found that the problem is mainly caused by 
> advancing mvcc in the append logic. Below is some detailed analysis:
> Under current branch-1 code logic, all batch puts will call 
> {{WALKey#getWriteEntry}} after appending edit to WAL, and 
> {{seqNumAssignedLatch}} is only released when the relative append call is 
> handled by RingBufferEventHandler (see {{FSWALEntry#stampRegionSequenceId}}). 
> Because we're currently using a single event handler for the ringbuffer, the 
> append calls are handled one by one (in fact, a lot of our current logic 
> depends on this sequential handling), and this becomes a bottleneck 
> under a high writing workload.
> The worst part is that by default we only use one WAL per RS, so appends on 
> all regions are handled sequentially, which causes contention among 
> different regions...
> To fix this, we could make use of the "sequential appends" mechanism: grab the 
> WriteEntry before publishing the append onto the ringbuffer and use it as the 
> sequence id, except that we need to add a lock to make "grab WriteEntry" and 
> "append edit" a single transaction. This will still cause contention 
> inside a region but avoids contention between different regions. This 
> solution has already been verified in our online environment and proved to be 
> effective.
> Notice that for master (2.0) branch since we already change the write 
> pipeline to sync before writing memstore (HBASE-15158), this issue only 
> exists for the ASYNC_WAL writes scenario.





[jira] [Commented] (HBASE-16853) Regions are assigned to Region Servers in /hbase/draining after HBase Master failover

2016-10-16 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15580931#comment-15580931
 ] 

Heng Chen commented on HBASE-16853:
---

+1

> Regions are assigned to Region Servers in /hbase/draining after HBase Master 
> failover
> -
>
> Key: HBASE-16853
> URL: https://issues.apache.org/jira/browse/HBASE-16853
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer, Region Assignment
>Affects Versions: 2.0.0, 1.3.0
>Reporter: David Pope
>Assignee: David Pope
> Fix For: 2.0.0, 1.3.0, 1.4.0
>
> Attachments: 16853.v2.txt, HBASE-16853.branch-1.3-v1.patch, 
> HBASE-16853.branch-1.3-v2.patch
>
>
> h2. Problem
> If there are Region Servers registered as "draining", they will continue to 
> have "draining" znodes after a HMaster failover; however, the balancer will 
> assign regions to them.
> h2. How to reproduce (on hbase master):
> # Add regionserver to /hbase/draining: {{bin/hbase-jruby 
> bin/draining_servers.rb add server1:16205}}
> # Unload the regionserver:  {{bin/hbase-jruby bin/region_mover.rb unload 
> server1:16205}}
> # Kill the Active HMaster and failover to the Backup HMaster
> # Run the balancer: {{hbase shell <<< "balancer"}}
> # Notice regions get assigned on new Active Master to Region Servers in 
> /hbase/draining
> h2. Root Cause
> The Backup HMaster initializes the {{DrainingServerTracker}} before the 
> Region Servers are registered as "online" with the {{ServerManager}}.  As a 
> result, the {{ServerManager.drainingServers}} isn't populated with existing 
> Region Servers in draining when we have an HMaster failover.
> E.g., 
> # We have a region server in draining: {{server1,16205,1000}}
> # The {{RegionServerTracker}} starts up and adds a ZK watcher on the Znode 
> for this RegionServer: {{/hbase/rs/server1,16205,1000}}
> # The {{DrainingServerTracker}} starts and processes each Znode under 
> {{/hbase/draining}}, but the Region Server isn't registered as "online" so it 
> isn't added to the {{ServerManager.drainingServers}} list.
> # The Region Server is added to the {{DrainingServerTracker.drainingServers}} 
> list.
> # The Region Server's Znode watcher is triggered and the ZK watcher is 
> restarted.
> # The Region Server is registered with {{ServerManager}} as "online".
> *END STATE:* The Region Server has a Znode in {{/hbase/draining}}, but it is 
> registered as "online" and the Balancer will start assigning regions to it.
> {code}
> $ bin/hbase-jruby bin/draining_servers.rb list
> [1] server1,16205,1000
> $ grep server1,16205,1000 logs/master-server1.log
> 2016-10-14 16:02:47,713 DEBUG [server1:16001.activeMasterManager] 
> zookeeper.ZKUtil: master:16001-0x157c56adc810014, quorum=localhost:2181, 
> baseZNode=/hbase Set watcher on existing znode=/hbase/rs/server1,16205,1000
> [2] 2016-10-14 16:02:47,722 DEBUG [server1:16001.activeMasterManager] 
> zookeeper.RegionServerTracker: Added tracking of RS 
> /hbase/rs/server1,16205,1000
> 2016-10-14 16:02:47,730 DEBUG [server1:16001.activeMasterManager] 
> zookeeper.ZKUtil: master:16001-0x157c56adc810014, quorum=localhost:2181, 
> baseZNode=/hbase Set watcher on existing 
> znode=/hbase/draining/server1,16205,1000
> [3] 2016-10-14 16:02:47,731 WARN  [server1:16001.activeMasterManager] 
> master.ServerManager: Server server1,16205,1000 is not currently online. 
> Ignoring request to add it to draining list.
> [4] 2016-10-14 16:02:47,731 INFO  [server1:16001.activeMasterManager] 
> zookeeper.DrainingServerTracker: Draining RS node created, adding to list 
> [server1,16205,1000]
> 2016-10-14 16:02:47,971 DEBUG [main-EventThread] zookeeper.ZKUtil: 
> master:16001-0x157c56adc810014, quorum=localhost:2181, baseZNode=/hbase Set 
> watcher on existing 
> znode=/hbase/rs/dev6918.prn2.facebook.com,16205,1476486047114
> [5] 2016-10-14 16:02:47,976 DEBUG [main-EventThread] 
> zookeeper.RegionServerTracker: Added tracking of RS 
> /hbase/rs/server1,16205,1000
> [6] 2016-10-14 16:02:52,084 INFO  
> [RpcServer.FifoWFPBQ.default.handler=29,queue=2,port=16001] 
> master.ServerManager: Registering server=server1,16205,1000
> {code}
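The fix direction can be sketched with a toy model (hypothetical method names, not the actual HMaster classes): the root cause is that the draining znode is processed before the server registers as online, so the state is dropped; replaying the draining set when a server comes online preserves it across failover.

```java
import java.util.HashSet;
import java.util.Set;

public class DrainingTrackerSketch {
    final Set<String> drainingZnodes = new HashSet<>();    // mirrors /hbase/draining
    final Set<String> online = new HashSet<>();
    final Set<String> effectiveDraining = new HashSet<>(); // ServerManager.drainingServers

    // DrainingServerTracker sees the znode; the original code ignored the
    // request when the server was not yet registered as online.
    void znodeCreated(String server) {
        drainingZnodes.add(server);
        if (online.contains(server)) effectiveDraining.add(server);
    }

    // The sketched fix: on registration, re-check the draining znode set so
    // "znode seen before registration" no longer loses the state.
    void serverOnline(String server) {
        online.add(server);
        if (drainingZnodes.contains(server)) effectiveDraining.add(server);
    }

    public static void main(String[] args) {
        DrainingTrackerSketch m = new DrainingTrackerSketch();
        // failover ordering from the bug report: draining znode processed first
        m.znodeCreated("server1,16205,1000");
        m.serverOnline("server1,16205,1000");
        if (!m.effectiveDraining.contains("server1,16205,1000"))
            throw new AssertionError("draining state lost across failover");
        System.out.println("draining preserved across failover ordering");
    }
}
```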





[jira] [Commented] (HBASE-16653) Backport HBASE-11393 to all branches which support namespace

2016-10-14 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15574695#comment-15574695
 ] 

Heng Chen commented on HBASE-16653:
---

No, we should not do it. Old RSs can't parse the NS information because they 
don't support it. So if you create a peer with an NS during a rolling upgrade, 
replication on the old RSs will miss the namespace information.

> Backport HBASE-11393 to all branches which support namespace
> 
>
> Key: HBASE-16653
> URL: https://issues.apache.org/jira/browse/HBASE-16653
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.0, 1.0.5, 1.3.1, 0.98.22, 1.1.7, 1.2.4
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Fix For: 1.4.0
>
> Attachments: HBASE-16653-branch-1-v1.patch, 
> HBASE-16653-branch-1-v2.patch, HBASE-16653-branch-1-v3.patch
>
>
> As HBASE-11386 mentioned, the parse code about replication table-cfs config 
> will be wrong when table name contains namespace and we can only config the 
> default namespace's tables in the peer. It is a bug for all branches which 
> support namespace. HBASE-11393 resolved this by use a pb object but it was 
> only merged to master branch. Other branches still have this problem. I 
> thought we should fix this bug in all branches which support namespace.
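The parse bug described above can be demonstrated with a simplified stand-in for the old string-based table-cfs parsing (my own reduction, assuming the documented `table:cf1,cf2;table2` convention): the `:` that separates a table from its column families collides with the `:` inside a namespaced table name.

```java
public class TableCfsParseDemo {
    // Simplified old-style parse: the text before the first ':' is taken as
    // the table name, the rest as the column-family list.
    static String parseTableName(String entry) {
        int idx = entry.indexOf(':');
        return idx < 0 ? entry : entry.substring(0, idx);
    }

    public static void main(String[] args) {
        // default-namespace table: parsed as intended
        System.out.println(parseTableName("table1:cf1"));      // table1
        // namespaced table: "ns1:tableA" is misread as table "ns1"
        // with column family "tableA"
        System.out.println(parseTableName("ns1:tableA:cf1"));  // ns1
    }
}
```

Replacing the flat string with a pb object (as HBASE-11393 did) removes the ambiguity entirely, since the namespace, table, and column families become separate fields.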





[jira] [Commented] (HBASE-16821) Enhance LoadIncrementalHFiles to convey missing hfiles if any

2016-10-14 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15574683#comment-15574683
 ] 

Heng Chen commented on HBASE-16821:
---

Any difference between v1 and v2?

> Enhance LoadIncrementalHFiles to convey missing hfiles if any
> -
>
> Key: HBASE-16821
> URL: https://issues.apache.org/jira/browse/HBASE-16821
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 16821.v1.txt, 16821.v2.txt
>
>
> When map parameter of run() method is not null:
> {code}
>   public int run(String dirPath, Map map, TableName 
> tableName) throws Exception{
> {code}
> the caller knows the exact files to be bulk loaded.
> This issue is to enhance the run() API so that when certain hfiles turn out 
> to be missing, the return value should indicate the missing files.
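The enhancement could look roughly like the following hypothetical helper (the name and signature are mine for illustration, not the real LoadIncrementalHFiles API): compare the files the caller asked to bulk load against what is actually present, and surface the difference.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MissingHFilesSketch {
    // Report which of the requested hfiles are absent, so run() can convey
    // them to the caller instead of silently skipping.
    static List<String> missingFiles(Collection<String> requested, Set<String> present) {
        List<String> missing = new ArrayList<>();
        for (String f : requested) {
            if (!present.contains(f)) missing.add(f);
        }
        return missing;
    }

    public static void main(String[] args) {
        Set<String> present = new HashSet<>(Arrays.asList("hfile-a", "hfile-b"));
        List<String> missing =
            missingFiles(Arrays.asList("hfile-a", "hfile-c"), present);
        System.out.println("missing=" + missing);  // missing=[hfile-c]
    }
}
```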





[jira] [Commented] (HBASE-16785) We are not running all tests

2016-10-14 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15574538#comment-15574538
 ] 

Heng Chen commented on HBASE-16785:
---

It worked! Cheers!
+1 for the patch.

> We are not running all tests
> 
>
> Key: HBASE-16785
> URL: https://issues.apache.org/jira/browse/HBASE-16785
> Project: HBase
>  Issue Type: Bug
>  Components: build, test
>Reporter: stack
>Assignee: stack
> Attachments: HBASE-16785.master.001.patch, 
> HBASE-16785.master.002.patch, HBASE-16785.master.002.patch
>
>
> Noticed by [~mbertozzi]
> We have some modules where we tried to 'skip' the running of the second part 
> of tests -- medium and larges. That might have made sense once when the 
> module was originally added when there may have been just a few small tests 
> to run but as time goes by and the module accumulates more tests in a few 
> cases we've added mediums and larges but we've not removed the 'skip' config.
> Matteo noticed this happened in hbase-procedure.
> In hbase-client, there is at least a medium test that is being skipped.
> Let me try purging this trick everywhere. It doesn't seem to save us anything 
> going by build time.





[jira] [Commented] (HBASE-16653) Backport HBASE-11393 to all branches which support namespace

2016-10-14 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15574527#comment-15574527
 ] 

Heng Chen commented on HBASE-16653:
---

Old RSs can't parse peer information that contains a namespace. 

> Backport HBASE-11393 to all branches which support namespace
> 
>
> Key: HBASE-16653
> URL: https://issues.apache.org/jira/browse/HBASE-16653
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.0, 1.0.5, 1.3.1, 0.98.22, 1.1.7, 1.2.4
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Fix For: 1.4.0
>
> Attachments: HBASE-16653-branch-1-v1.patch, 
> HBASE-16653-branch-1-v2.patch, HBASE-16653-branch-1-v3.patch
>
>
> As HBASE-11386 mentioned, the parse code about replication table-cfs config 
> will be wrong when table name contains namespace and we can only config the 
> default namespace's tables in the peer. It is a bug for all branches which 
> support namespace. HBASE-11393 resolved this by use a pb object but it was 
> only merged to master branch. Other branches still have this problem. I 
> thought we should fix this bug in all branches which support namespace.





[jira] [Comment Edited] (HBASE-16698) Performance issue: handlers stuck waiting for CountDownLatch inside WALKey#getWriteEntry under high writing workload

2016-10-14 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15574517#comment-15574517
 ] 

Heng Chen edited comment on HBASE-16698 at 10/14/16 7:24 AM:
-

I think you are right, [~allan163].
The behavior differs between branch-1.1 and branch-1.2: on branch-1.1 we wait 
for the seqId to be assigned after sync, so the issue does not apply to branch-1.1.

It seems the CountDownLatch could be removed for SYNC_WAL durability? 

[~carp84] your online cluster runs branch-1.2, right?


was (Author: chenheng):
I think [~allan163] you are right.  
It is different between branch-1.1 and branch-1.2.   On branch-1.1,  we wait 
for the seqId assigned after sync.  So the issue is invalid for branch-1.1.  

It seems the CountDownLatch could be removed for SYNC_WAL durability? 

> Performance issue: handlers stuck waiting for CountDownLatch inside 
> WALKey#getWriteEntry under high writing workload
> 
>
> Key: HBASE-16698
> URL: https://issues.apache.org/jira/browse/HBASE-16698
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Affects Versions: 1.1.6, 1.2.3
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0
>
> Attachments: HBASE-16698.branch-1.patch, HBASE-16698.patch, 
> HBASE-16698.v2.patch, hadoop0495.et2.jstack
>
>
> As titled, on our production environment we observed 98 out of 128 handlers 
> get stuck waiting for the CountDownLatch {{seqNumAssignedLatch}} inside 
> {{WALKey#getWriteEntry}} under a high writing workload.
> After digging into the problem, we found that the problem is mainly caused by 
> advancing mvcc in the append logic. Below is some detailed analysis:
> Under current branch-1 code logic, all batch puts will call 
> {{WALKey#getWriteEntry}} after appending edit to WAL, and 
> {{seqNumAssignedLatch}} is only released when the relative append call is 
> handled by RingBufferEventHandler (see {{FSWALEntry#stampRegionSequenceId}}). 
> Because currently we're using a single event handler for the ringbuffer, the 
> append calls are handled one by one (actually lot's of our current logic 
> depending on this sequential dealing logic), and this becomes a bottleneck 
> under high writing workload.
> The worst part is that by default we only use one WAL per RS, so appends on 
> all regions are dealt with in sequential, which causes contention among 
> different regions...
> To fix this, we could also take use of the "sequential appends" mechanism, 
> that we could grab the WriteEntry before publishing append onto ringbuffer 
> and use it as sequence id, only that we need to add a lock to make "grab 
> WriteEntry" and "append edit" a transaction. This will still cause contention 
> inside a region but could avoid contention between different regions. This 
> solution is already verified in our online environment and proved to be 
> effective.
> Notice that for master (2.0) branch since we already change the write 
> pipeline to sync before writing memstore (HBASE-15158), this issue only 
> exists for the ASYNC_WAL writes scenario.





[jira] [Commented] (HBASE-16698) Performance issue: handlers stuck waiting for CountDownLatch inside WALKey#getWriteEntry under high writing workload

2016-10-14 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15574517#comment-15574517
 ] 

Heng Chen commented on HBASE-16698:
---

I think you are right, [~allan163].
The behavior differs between branch-1.1 and branch-1.2: on branch-1.1 we wait 
for the seqId to be assigned after sync, so the issue does not apply to branch-1.1.

It seems the CountDownLatch could be removed for SYNC_WAL durability? 

> Performance issue: handlers stuck waiting for CountDownLatch inside 
> WALKey#getWriteEntry under high writing workload
> 
>
> Key: HBASE-16698
> URL: https://issues.apache.org/jira/browse/HBASE-16698
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Affects Versions: 1.1.6, 1.2.3
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0
>
> Attachments: HBASE-16698.branch-1.patch, HBASE-16698.patch, 
> HBASE-16698.v2.patch, hadoop0495.et2.jstack
>
>
> As titled, on our production environment we observed 98 out of 128 handlers 
> get stuck waiting for the CountDownLatch {{seqNumAssignedLatch}} inside 
> {{WALKey#getWriteEntry}} under a high writing workload.
> After digging into the problem, we found that the problem is mainly caused by 
> advancing mvcc in the append logic. Below is some detailed analysis:
> Under current branch-1 code logic, all batch puts will call 
> {{WALKey#getWriteEntry}} after appending edit to WAL, and 
> {{seqNumAssignedLatch}} is only released when the relative append call is 
> handled by RingBufferEventHandler (see {{FSWALEntry#stampRegionSequenceId}}). 
> Because currently we're using a single event handler for the ringbuffer, the 
> append calls are handled one by one (actually lot's of our current logic 
> depending on this sequential dealing logic), and this becomes a bottleneck 
> under high writing workload.
> The worst part is that by default we only use one WAL per RS, so appends on 
> all regions are dealt with in sequential, which causes contention among 
> different regions...
> To fix this, we could also take use of the "sequential appends" mechanism, 
> that we could grab the WriteEntry before publishing append onto ringbuffer 
> and use it as sequence id, only that we need to add a lock to make "grab 
> WriteEntry" and "append edit" a transaction. This will still cause contention 
> inside a region but could avoid contention between different regions. This 
> solution is already verified in our online environment and proved to be 
> effective.
> Notice that for master (2.0) branch since we already change the write 
> pipeline to sync before writing memstore (HBASE-15158), this issue only 
> exists for the ASYNC_WAL writes scenario.





[jira] [Commented] (HBASE-16821) Enhance LoadIncrementalHFiles to convey missing hfiles if any

2016-10-13 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15574105#comment-15574105
 ] 

Heng Chen commented on HBASE-16821:
---

+1 for it. LGTM.

> Enhance LoadIncrementalHFiles to convey missing hfiles if any
> -
>
> Key: HBASE-16821
> URL: https://issues.apache.org/jira/browse/HBASE-16821
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 16821.v1.txt
>
>
> When map parameter of run() method is not null:
> {code}
>   public int run(String dirPath, Map map, TableName 
> tableName) throws Exception{
> {code}
> the caller knows the exact files to be bulk loaded.
> This issue is to enhance the run() API so that when certain hfiles turn out 
> to be missing, the return value should indicate the missing files.





[jira] [Updated] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-10-13 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-16664:
--
Fix Version/s: (was: 1.1.8)
   (was: 1.2.5)

> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Fix For: 2.0.0, 1.4.0, 1.3.1
>
> Attachments: 1.patch, HBASE-16664-branch-1-v1.patch, 
> HBASE-16664-branch-1-v1.patch, HBASE-16664-branch-1-v2.patch, 
> HBASE-16664-branch-1.1-v1.patch, HBASE-16664-branch-1.2-v1.patch, 
> HBASE-16664-branch-1.3-v1.patch, HBASE-16664-branch-1.3-v2.patch, 
> HBASE-16664-branch-1.3.v3.patch, HBASE-16664-branch-1.v3.patch, 
> HBASE-16664-v1.patch, HBASE-16664-v2.patch, HBASE-16664-v3.patch, 
> HBASE-16664-v4.patch, HBASE-16664-v5.patch, HBASE-16664-v6.patch, 
> HBASE-16664-v7.patch, testhcm.patch
>
>
> Rpc/operation timeout logic in AsyncProcess is broken. And Table's 
> set*Timeout does not take effect in its AP or BufferedMutator.





[jira] [Commented] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-10-13 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15574102#comment-15574102
 ] 

Heng Chen commented on HBASE-16664:
---

I see. If so, I suggest we just leave the operationTimeout behavior changed on 
branch-1.3+.

Of course, we could fix rpcTimeout in the AP if you want to. :)

> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.5, 1.1.8
>
> Attachments: 1.patch, HBASE-16664-branch-1-v1.patch, 
> HBASE-16664-branch-1-v1.patch, HBASE-16664-branch-1-v2.patch, 
> HBASE-16664-branch-1.1-v1.patch, HBASE-16664-branch-1.2-v1.patch, 
> HBASE-16664-branch-1.3-v1.patch, HBASE-16664-branch-1.3-v2.patch, 
> HBASE-16664-branch-1.3.v3.patch, HBASE-16664-branch-1.v3.patch, 
> HBASE-16664-v1.patch, HBASE-16664-v2.patch, HBASE-16664-v3.patch, 
> HBASE-16664-v4.patch, HBASE-16664-v5.patch, HBASE-16664-v6.patch, 
> HBASE-16664-v7.patch, testhcm.patch
>
>
> Rpc/operation timeout logic in AsyncProcess is broken. And Table's 
> set*Timeout does not take effect in its AP or BufferedMutator.





[jira] [Commented] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-10-13 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15574072#comment-15574072
 ] 

Heng Chen commented on HBASE-16664:
---

I think we should add the setXXXTimeout interface to Table for branch-1.1 and 
branch-1.2; otherwise we cannot control operationTimeout for batch and single 
requests.

> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.5, 1.1.8
>
> Attachments: 1.patch, HBASE-16664-branch-1-v1.patch, 
> HBASE-16664-branch-1-v1.patch, HBASE-16664-branch-1-v2.patch, 
> HBASE-16664-branch-1.1-v1.patch, HBASE-16664-branch-1.2-v1.patch, 
> HBASE-16664-branch-1.3-v1.patch, HBASE-16664-branch-1.3-v2.patch, 
> HBASE-16664-branch-1.3.v3.patch, HBASE-16664-branch-1.v3.patch, 
> HBASE-16664-v1.patch, HBASE-16664-v2.patch, HBASE-16664-v3.patch, 
> HBASE-16664-v4.patch, HBASE-16664-v5.patch, HBASE-16664-v6.patch, 
> HBASE-16664-v7.patch, testhcm.patch
>
>
> Rpc/operation timeout logic in AsyncProcess is broken. And Table's 
> set*Timeout does not take effect in its AP or BufferedMutator.





[jira] [Commented] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-10-13 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15574052#comment-15574052
 ] 

Heng Chen commented on HBASE-16664:
---

The branch-1.1 and branch-1.2 patches don't seem to fix "setXXXTimeout does not 
work for the AP in HTable"?

> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.5, 1.1.8
>
> Attachments: 1.patch, HBASE-16664-branch-1-v1.patch, 
> HBASE-16664-branch-1-v1.patch, HBASE-16664-branch-1-v2.patch, 
> HBASE-16664-branch-1.1-v1.patch, HBASE-16664-branch-1.2-v1.patch, 
> HBASE-16664-branch-1.3-v1.patch, HBASE-16664-branch-1.3-v2.patch, 
> HBASE-16664-branch-1.3.v3.patch, HBASE-16664-branch-1.v3.patch, 
> HBASE-16664-v1.patch, HBASE-16664-v2.patch, HBASE-16664-v3.patch, 
> HBASE-16664-v4.patch, HBASE-16664-v5.patch, HBASE-16664-v6.patch, 
> HBASE-16664-v7.patch, testhcm.patch
>
>
> Rpc/operation timeout logic in AsyncProcess is broken. And Table's 
> set*Timeout does not take effect in its AP or BufferedMutator.





[jira] [Updated] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-10-13 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-16664:
--
Fix Version/s: 1.1.8
   1.2.5

> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.5, 1.1.8
>
> Attachments: 1.patch, HBASE-16664-branch-1-v1.patch, 
> HBASE-16664-branch-1-v1.patch, HBASE-16664-branch-1-v2.patch, 
> HBASE-16664-branch-1.1-v1.patch, HBASE-16664-branch-1.2-v1.patch, 
> HBASE-16664-branch-1.3-v1.patch, HBASE-16664-branch-1.3-v2.patch, 
> HBASE-16664-branch-1.3.v3.patch, HBASE-16664-branch-1.v3.patch, 
> HBASE-16664-v1.patch, HBASE-16664-v2.patch, HBASE-16664-v3.patch, 
> HBASE-16664-v4.patch, HBASE-16664-v5.patch, HBASE-16664-v6.patch, 
> HBASE-16664-v7.patch, testhcm.patch
>
>
> Rpc/operation timeout logic in AsyncProcess is broken. And Table's 
> set*Timeout does not take effect in its AP or BufferedMutator.





[jira] [Commented] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-10-13 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15574033#comment-15574033
 ] 

Heng Chen commented on HBASE-16664:
---

Oh, sorry, I missed the two patches; let me commit them.

> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Fix For: 2.0.0, 1.4.0, 1.3.1
>
> Attachments: 1.patch, HBASE-16664-branch-1-v1.patch, 
> HBASE-16664-branch-1-v1.patch, HBASE-16664-branch-1-v2.patch, 
> HBASE-16664-branch-1.1-v1.patch, HBASE-16664-branch-1.2-v1.patch, 
> HBASE-16664-branch-1.3-v1.patch, HBASE-16664-branch-1.3-v2.patch, 
> HBASE-16664-branch-1.3.v3.patch, HBASE-16664-branch-1.v3.patch, 
> HBASE-16664-v1.patch, HBASE-16664-v2.patch, HBASE-16664-v3.patch, 
> HBASE-16664-v4.patch, HBASE-16664-v5.patch, HBASE-16664-v6.patch, 
> HBASE-16664-v7.patch, testhcm.patch
>
>
> Rpc/operation timeout logic in AsyncProcess is broken. And Table's 
> set*Timeout does not take effect in its AP or BufferedMutator.





[jira] [Commented] (HBASE-16832) Reduce the default number of versions in Meta table for branch-1

2016-10-13 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573956#comment-15573956
 ] 

Heng Chen commented on HBASE-16832:
---

Thanks  [~aoxiang].  +1 for it.

> Reduce the default number of versions in Meta table for branch-1
> 
>
> Key: HBASE-16832
> URL: https://issues.apache.org/jira/browse/HBASE-16832
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.3.0
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16832.branch-1.patch, rpc-handler.png, 
> rpc_processingCallTime.png, rpc_processingCallTimeV2.png, rpc_qps.png, 
> rpc_queueLength.png, rpc_queueSize.png, rpc_scan_latency.png, 
> rpc_scan_latencyV2.png, rpc_totalcalltime.png, rpc_totalcalltimeV2.png
>
>
> I find the DEFAULT_HBASE_META_VERSIONS is still 10 in branch-1, and in master 
> version DEFAULT_HBASE_META_VERSIONS is 3.





[jira] [Commented] (HBASE-16832) Reduce the default number of versions in Meta table for branch-1

2016-10-13 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573922#comment-15573922
 ] 

Heng Chen commented on HBASE-16832:
---

[~te...@apache.org] do you know why DEFAULT_HBASE_META_VERSIONS is 10 on 
branch-1 but was set back to 3 on master?

> Reduce the default number of versions in Meta table for branch-1
> 
>
> Key: HBASE-16832
> URL: https://issues.apache.org/jira/browse/HBASE-16832
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.3.0
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16832.branch-1.patch, rpc-handler.png, 
> rpc_processingCallTime.png, rpc_qps.png, rpc_scan_latency.png, 
> rpc_totalcalltime.png
>
>
> I find the DEFAULT_HBASE_META_VERSIONS is still 10 in branch-1, and in master 
> version DEFAULT_HBASE_META_VERSIONS is 3.





[jira] [Commented] (HBASE-16653) Backport HBASE-11393 to all branches which support namespace

2016-10-13 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573914#comment-15573914
 ] 

Heng Chen commented on HBASE-16653:
---

Because we modify the proto object of ReplicationPeerConfig (adding a tableCFs 
field), during a rolling upgrade we have to update the original 
ReplicationPeerConfig data on ZK first (see TableCFsUpdater.java). This means 
that if a peer with a namespace is added during the rolling upgrade, replication 
on old region servers will not work; we have to wait until the rolling upgrade 
completes. This is the big problem with HBASE-11393 for rolling upgrades. Can we 
accept that for already-released branches? 

As for patch v3, overall it looks good to me. But why weren't the changes to the 
shell's rb scripts backported?

> Backport HBASE-11393 to all branches which support namespace
> 
>
> Key: HBASE-16653
> URL: https://issues.apache.org/jira/browse/HBASE-16653
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.0, 1.0.5, 1.3.1, 0.98.22, 1.1.7, 1.2.4
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Fix For: 1.4.0
>
> Attachments: HBASE-16653-branch-1-v1.patch, 
> HBASE-16653-branch-1-v2.patch, HBASE-16653-branch-1-v3.patch
>
>
> As HBASE-11386 mentioned, the parse code about replication table-cfs config 
> will be wrong when table name contains namespace and we can only config the 
> default namespace's tables in the peer. It is a bug for all branches which 
> support namespace. HBASE-11393 resolved this by use a pb object but it was 
> only merged to master branch. Other branches still have this problem. I 
> thought we should fix this bug in all branches which support namespace.





[jira] [Commented] (HBASE-16832) Reduce the default number of versions in Meta table for branch-1

2016-10-13 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573810#comment-15573810
 ] 

Heng Chen commented on HBASE-16832:
---

Is the default major compaction interval on your cluster 7 days? 

> Reduce the default number of versions in Meta table for branch-1
> 
>
> Key: HBASE-16832
> URL: https://issues.apache.org/jira/browse/HBASE-16832
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.3.0
>Reporter: binlijin
>Assignee: binlijin
> Attachments: HBASE-16832.branch-1.patch, rpc-handler.png, 
> rpc_processingCallTime.png, rpc_qps.png, rpc_scan_latency.png, 
> rpc_totalcalltime.png
>
>
> I find the DEFAULT_HBASE_META_VERSIONS is still 10 in branch-1, and in master 
> version DEFAULT_HBASE_META_VERSIONS is 3.





[jira] [Commented] (HBASE-16807) RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover

2016-10-13 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572129#comment-15572129
 ] 

Heng Chen commented on HBASE-16807:
---

Committed to all branches.

> RegionServer will fail to report new active Hmaster until 
> HMaster/RegionServer failover
> ---
>
> Key: HBASE-16807
> URL: https://issues.apache.org/jira/browse/HBASE-16807
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.5, 0.98.24, 1.1.8
>
> Attachments: HBASE-16807-0.98.patch, HBASE-16807-branch-1.1.patch, 
> HBASE-16807-branch-1.2.patch, HBASE-16807-branch-1.3.patch, 
> HBASE-16807-branch-1.patch, HBASE-16807.patch
>
>
> It's a little weird, but it happened in the production environment that a few 
> RegionServers missed the master znode create notification on master failover. 
> In that case ZooKeeperNodeTracker will not refresh the cached data and 
> MasterAddressTracker will always return the old active HM details to the 
> Region Server on ServiceException.
> Though we create the region server stub on failure, we do so without 
> refreshing the MasterAddressTracker data.
> In HRegionServer.createRegionServerStatusStub()
> {code}
>   boolean refresh = false; // for the first time, use cached data
> RegionServerStatusService.BlockingInterface intf = null;
> boolean interrupted = false;
> try {
>   while (keepLooping()) {
> sn = this.masterAddressTracker.getMasterAddress(refresh);
> if (sn == null) {
>   if (!keepLooping()) {
> // give up with no connection.
> LOG.debug("No master found and cluster is stopped; bailing out");
> return null;
>   }
>   if (System.currentTimeMillis() > (previousLogTime + 1000)) {
> LOG.debug("No master found; retry");
> previousLogTime = System.currentTimeMillis();
>   }
>   refresh = true; // let's try pull it from ZK directly
>   if (sleep(200)) {
> interrupted = true;
>   }
>   continue;
> }
> {code}
> Here we refresh the node only when 'sn' is NULL; otherwise the same cached 
> data is used.
> So in the above case the RegionServer will never report to the active HMaster 
> successfully until an HMaster failover or a RegionServer restart.
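A toy model of the stale-cache behavior (hypothetical names, not the actual MasterAddressTracker): the bug is that the cache is only refreshed when no address is cached at all, so a stale-but-non-null address is returned forever; the sketch instead forces a refresh after a failed report.

```java
public class MasterStubSketch {
    static class Tracker {
        String cached = "oldMaster";   // address cached before failover
        String zk = "newMaster";       // current znode content after failover
        String getMasterAddress(boolean refresh) {
            if (refresh) cached = zk;  // re-read from ZK on request
            return cached;
        }
    }

    static String connect(Tracker tracker) {
        boolean refresh = false;       // first time, use cached data
        for (int attempt = 0; attempt < 3; attempt++) {
            String sn = tracker.getMasterAddress(refresh);
            if (sn.equals("newMaster")) return sn;  // report succeeded
            // Report failed: refresh the cached address too, not only when
            // sn was null (the condition the original code stopped at).
            refresh = true;
        }
        return null;
    }

    public static void main(String[] args) {
        if (!"newMaster".equals(connect(new Tracker())))
            throw new AssertionError("stuck on stale master address");
        System.out.println("reconnected to new active master");
    }
}
```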





[jira] [Updated] (HBASE-16807) RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover

2016-10-13 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-16807:
--
Fix Version/s: 1.1.8
   1.2.5

> RegionServer will fail to report new active Hmaster until 
> HMaster/RegionServer failover
> ---
>
> Key: HBASE-16807
> URL: https://issues.apache.org/jira/browse/HBASE-16807
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
> Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.5, 0.98.24, 1.1.8
>
> Attachments: HBASE-16807-0.98.patch, HBASE-16807-branch-1.1.patch, 
> HBASE-16807-branch-1.2.patch, HBASE-16807-branch-1.3.patch, 
> HBASE-16807-branch-1.patch, HBASE-16807.patch
>
>
> It's a little weird, but it happened in the production environment that a few 
> RegionServers missed the master znode create notification on master failover. In 
> that case ZooKeeperNodeTracker will not refresh the cached data and 
> MasterAddressTracker will always return the old active HM detail to the region 
> server on ServiceException.
> Though we create the region server stub on failure, we do so without refreshing 
> the MasterAddressTracker data.
> In HRegionServer.createRegionServerStatusStub():
> {code}
> boolean refresh = false; // for the first time, use cached data
> RegionServerStatusService.BlockingInterface intf = null;
> boolean interrupted = false;
> try {
>   while (keepLooping()) {
>     sn = this.masterAddressTracker.getMasterAddress(refresh);
>     if (sn == null) {
>       if (!keepLooping()) {
>         // give up with no connection.
>         LOG.debug("No master found and cluster is stopped; bailing out");
>         return null;
>       }
>       if (System.currentTimeMillis() > (previousLogTime + 1000)) {
>         LOG.debug("No master found; retry");
>         previousLogTime = System.currentTimeMillis();
>       }
>       refresh = true; // let's try pull it from ZK directly
>       if (sleep(200)) {
>         interrupted = true;
>       }
>       continue;
>     }
> {code}
> Here we refresh the node only when 'sn' is null; otherwise the same cached 
> data is used.
> So in the above case the RegionServer will never report to the active HMaster 
> successfully until an HMaster failover or RegionServer restart.
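The fix discussed in this issue can be sketched as follows. This is a simplified, hypothetical illustration, not the actual HBase patch: `FakeMasterAddressTracker` and `createStub` are stand-ins invented here, and the real `MasterAddressTracker` reads the master's address from a ZooKeeper znode. The point is the same, though: force a refresh of the cached address when the previous stub failed, instead of refreshing only when the cached value is null.

```java
import java.util.concurrent.atomic.AtomicReference;

public class StubRefreshSketch {

  /** Stand-in for MasterAddressTracker with a stale cache (missed znode event). */
  static class FakeMasterAddressTracker {
    private final AtomicReference<String> zkData =
        new AtomicReference<>("master-2:16000"); // current value "in ZooKeeper"
    private String cached = "master-1:16000";    // stale cached value

    String getMasterAddress(boolean refresh) {
      if (refresh) {
        cached = zkData.get(); // pull the current value from "ZooKeeper"
      }
      return cached;
    }
  }

  /** Returns the address a new stub would be built against. */
  static String createStub(FakeMasterAddressTracker tracker,
                           boolean previousAttemptFailed) {
    // Key point of the fix: refresh when the last attempt failed (e.g. on
    // ServiceException), not only when the cached address is null.
    boolean refresh = previousAttemptFailed;
    return tracker.getMasterAddress(refresh);
  }

  public static void main(String[] args) {
    FakeMasterAddressTracker tracker = new FakeMasterAddressTracker();
    String first = createStub(tracker, false); // uses the stale cache
    String second = createStub(tracker, true); // after a failure: re-reads ZK
    System.out.println(first + " -> " + second);
  }
}
```

With the old behavior, the second attempt would keep returning the stale address forever, which is exactly the reported symptom.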





[jira] [Updated] (HBASE-16807) RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover

2016-10-13 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-16807:
--
Fix Version/s: 0.98.24
   1.3.1
   1.4.0

> RegionServer will fail to report new active Hmaster until 
> HMaster/RegionServer failover
> ---
>
> Key: HBASE-16807
> URL: https://issues.apache.org/jira/browse/HBASE-16807
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
> Fix For: 2.0.0, 1.4.0, 1.3.1, 0.98.24
>
> Attachments: HBASE-16807-0.98.patch, HBASE-16807-branch-1.3.patch, 
> HBASE-16807-branch-1.patch, HBASE-16807.patch
>
>
> It's a little weird, but it happened in the production environment that a few 
> RegionServers missed the master znode create notification on master failover. In 
> that case ZooKeeperNodeTracker will not refresh the cached data and 
> MasterAddressTracker will always return the old active HM detail to the region 
> server on ServiceException.
> Though we create the region server stub on failure, we do so without refreshing 
> the MasterAddressTracker data.
> In HRegionServer.createRegionServerStatusStub():
> {code}
> boolean refresh = false; // for the first time, use cached data
> RegionServerStatusService.BlockingInterface intf = null;
> boolean interrupted = false;
> try {
>   while (keepLooping()) {
>     sn = this.masterAddressTracker.getMasterAddress(refresh);
>     if (sn == null) {
>       if (!keepLooping()) {
>         // give up with no connection.
>         LOG.debug("No master found and cluster is stopped; bailing out");
>         return null;
>       }
>       if (System.currentTimeMillis() > (previousLogTime + 1000)) {
>         LOG.debug("No master found; retry");
>         previousLogTime = System.currentTimeMillis();
>       }
>       refresh = true; // let's try pull it from ZK directly
>       if (sleep(200)) {
>         interrupted = true;
>       }
>       continue;
>     }
> {code}
> Here we refresh the node only when 'sn' is null; otherwise the same cached 
> data is used.
> So in the above case the RegionServer will never report to the active HMaster 
> successfully until an HMaster failover or RegionServer restart.





[jira] [Commented] (HBASE-16807) RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover

2016-10-13 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572030#comment-15572030
 ] 

Heng Chen commented on HBASE-16807:
---

Will you upload patches for branch-1.1 and branch-1.2?

> RegionServer will fail to report new active Hmaster until 
> HMaster/RegionServer failover
> ---
>
> Key: HBASE-16807
> URL: https://issues.apache.org/jira/browse/HBASE-16807
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
> Fix For: 2.0.0
>
> Attachments: HBASE-16807-0.98.patch, HBASE-16807-branch-1.3.patch, 
> HBASE-16807-branch-1.patch, HBASE-16807.patch
>
>
> It's a little weird, but it happened in the production environment that a few 
> RegionServers missed the master znode create notification on master failover. In 
> that case ZooKeeperNodeTracker will not refresh the cached data and 
> MasterAddressTracker will always return the old active HM detail to the region 
> server on ServiceException.
> Though we create the region server stub on failure, we do so without refreshing 
> the MasterAddressTracker data.
> In HRegionServer.createRegionServerStatusStub():
> {code}
> boolean refresh = false; // for the first time, use cached data
> RegionServerStatusService.BlockingInterface intf = null;
> boolean interrupted = false;
> try {
>   while (keepLooping()) {
>     sn = this.masterAddressTracker.getMasterAddress(refresh);
>     if (sn == null) {
>       if (!keepLooping()) {
>         // give up with no connection.
>         LOG.debug("No master found and cluster is stopped; bailing out");
>         return null;
>       }
>       if (System.currentTimeMillis() > (previousLogTime + 1000)) {
>         LOG.debug("No master found; retry");
>         previousLogTime = System.currentTimeMillis();
>       }
>       refresh = true; // let's try pull it from ZK directly
>       if (sleep(200)) {
>         interrupted = true;
>       }
>       continue;
>     }
> {code}
> Here we refresh the node only when 'sn' is null; otherwise the same cached 
> data is used.
> So in the above case the RegionServer will never report to the active HMaster 
> successfully until an HMaster failover or RegionServer restart.





[jira] [Commented] (HBASE-16807) RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover

2016-10-13 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572025#comment-15572025
 ] 

Heng Chen commented on HBASE-16807:
---

It's great that you have a patch for branch-1+; let me commit it.

> RegionServer will fail to report new active Hmaster until 
> HMaster/RegionServer failover
> ---
>
> Key: HBASE-16807
> URL: https://issues.apache.org/jira/browse/HBASE-16807
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
> Fix For: 2.0.0
>
> Attachments: HBASE-16807-0.98.patch, HBASE-16807-branch-1.3.patch, 
> HBASE-16807-branch-1.patch, HBASE-16807.patch
>
>
> It's a little weird, but it happened in the production environment that a few 
> RegionServers missed the master znode create notification on master failover. In 
> that case ZooKeeperNodeTracker will not refresh the cached data and 
> MasterAddressTracker will always return the old active HM detail to the region 
> server on ServiceException.
> Though we create the region server stub on failure, we do so without refreshing 
> the MasterAddressTracker data.
> In HRegionServer.createRegionServerStatusStub():
> {code}
> boolean refresh = false; // for the first time, use cached data
> RegionServerStatusService.BlockingInterface intf = null;
> boolean interrupted = false;
> try {
>   while (keepLooping()) {
>     sn = this.masterAddressTracker.getMasterAddress(refresh);
>     if (sn == null) {
>       if (!keepLooping()) {
>         // give up with no connection.
>         LOG.debug("No master found and cluster is stopped; bailing out");
>         return null;
>       }
>       if (System.currentTimeMillis() > (previousLogTime + 1000)) {
>         LOG.debug("No master found; retry");
>         previousLogTime = System.currentTimeMillis();
>       }
>       refresh = true; // let's try pull it from ZK directly
>       if (sleep(200)) {
>         interrupted = true;
>       }
>       continue;
>     }
> {code}
> Here we refresh the node only when 'sn' is null; otherwise the same cached 
> data is used.
> So in the above case the RegionServer will never report to the active HMaster 
> successfully until an HMaster failover or RegionServer restart.





[jira] [Updated] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-10-13 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-16664:
--
Release Note: 
This issue fixes three bugs:
1. The rpcTimeout configuration does not work for a single rpc call in AP.
2. The operationTimeout configuration does not work for multi-requests (batch, 
put) in AP.
3. setRpcTimeout and setOperationTimeout in HTable do not take effect for AP 
and BufferedMutator.



  was:
This issue fix three bugs:
1.  rpcTimeout configuration not work for one rpc call in AP
2.  operationTimeout configuration not work for multi-request (batch, put) in 
AP 
3.  setRpcTimeout and setOperationTimeout in HTable is not worked for ap and 
BufferedMutator. 




> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Fix For: 2.0.0, 1.4.0, 1.3.1
>
> Attachments: 1.patch, HBASE-16664-branch-1-v1.patch, 
> HBASE-16664-branch-1-v1.patch, HBASE-16664-branch-1-v2.patch, 
> HBASE-16664-branch-1.1-v1.patch, HBASE-16664-branch-1.2-v1.patch, 
> HBASE-16664-branch-1.3-v1.patch, HBASE-16664-branch-1.3-v2.patch, 
> HBASE-16664-branch-1.3.v3.patch, HBASE-16664-branch-1.v3.patch, 
> HBASE-16664-v1.patch, HBASE-16664-v2.patch, HBASE-16664-v3.patch, 
> HBASE-16664-v4.patch, HBASE-16664-v5.patch, HBASE-16664-v6.patch, 
> HBASE-16664-v7.patch, testhcm.patch
>
>
> Rpc/operation timeout logic in AsyncProcess is broken. And Table's 
> set*Timeout does not take effect in its AP or BufferedMutator.
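The rpcTimeout/operationTimeout distinction this issue fixes can be illustrated with a small budget calculation. This is a hypothetical sketch of the general idea, not AsyncProcess's actual code: rpcTimeout bounds each individual RPC, while operationTimeout bounds the whole operation across retries, so each retry's RPC timeout must be capped by whatever operation budget remains.

```java
class TimeoutSketch {
  /**
   * Returns the timeout to use for the next RPC attempt, given the overall
   * operation budget, the per-call rpcTimeout, and the time already spent.
   * A result of 0 means the operation timeout is exhausted: stop retrying.
   */
  static long remainingRpcTimeout(long operationTimeoutMs, long rpcTimeoutMs,
                                  long elapsedMs) {
    long remainingOp = operationTimeoutMs - elapsedMs; // overall budget left
    if (remainingOp <= 0) {
      return 0; // operation timeout exceeded
    }
    // Each call may use at most rpcTimeout, but never more than what is
    // left of the operation budget.
    return Math.min(rpcTimeoutMs, remainingOp);
  }
}
```

For example, with a 60 s operation timeout and a 10 s rpc timeout, an attempt starting 55 s into the operation would only be allowed 5 s.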





[jira] [Updated] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-10-13 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-16664:
--
Release Note: 
This issue fixes three bugs:
1. The rpcTimeout configuration does not work for a single rpc call in AP.
2. The operationTimeout configuration does not work for multi-requests (batch, 
put) in AP.
3. setRpcTimeout and setOperationTimeout in HTable do not take effect for AP 
and BufferedMutator.



  was:
This issue fix three bugs:
1.  rpcTimeout configuration not work for one rpc call in AP
2.  operationTimeout configuration not work for multi-request in AP 
3.  setRpcTimeout and setOperationTimeout in HTable is not worked for ap and 
BufferedMutator. 




> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Fix For: 2.0.0, 1.4.0, 1.3.1
>
> Attachments: 1.patch, HBASE-16664-branch-1-v1.patch, 
> HBASE-16664-branch-1-v1.patch, HBASE-16664-branch-1-v2.patch, 
> HBASE-16664-branch-1.1-v1.patch, HBASE-16664-branch-1.2-v1.patch, 
> HBASE-16664-branch-1.3-v1.patch, HBASE-16664-branch-1.3-v2.patch, 
> HBASE-16664-branch-1.3.v3.patch, HBASE-16664-branch-1.v3.patch, 
> HBASE-16664-v1.patch, HBASE-16664-v2.patch, HBASE-16664-v3.patch, 
> HBASE-16664-v4.patch, HBASE-16664-v5.patch, HBASE-16664-v6.patch, 
> HBASE-16664-v7.patch, testhcm.patch
>
>
> Rpc/operation timeout logic in AsyncProcess is broken. And Table's 
> set*Timeout does not take effect in its AP or BufferedMutator.





[jira] [Updated] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-10-13 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-16664:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
Release Note: 
This issue fixes three bugs:
1. The rpcTimeout configuration does not work for a single rpc call in AP.
2. The operationTimeout configuration does not work for multi-requests in AP.
3. setRpcTimeout and setOperationTimeout in HTable do not take effect for AP 
and BufferedMutator.


  Status: Resolved  (was: Patch Available)

> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Fix For: 2.0.0, 1.4.0, 1.3.1
>
> Attachments: 1.patch, HBASE-16664-branch-1-v1.patch, 
> HBASE-16664-branch-1-v1.patch, HBASE-16664-branch-1-v2.patch, 
> HBASE-16664-branch-1.1-v1.patch, HBASE-16664-branch-1.2-v1.patch, 
> HBASE-16664-branch-1.3-v1.patch, HBASE-16664-branch-1.3-v2.patch, 
> HBASE-16664-branch-1.3.v3.patch, HBASE-16664-branch-1.v3.patch, 
> HBASE-16664-v1.patch, HBASE-16664-v2.patch, HBASE-16664-v3.patch, 
> HBASE-16664-v4.patch, HBASE-16664-v5.patch, HBASE-16664-v6.patch, 
> HBASE-16664-v7.patch, testhcm.patch
>
>
> Rpc/operation timeout logic in AsyncProcess is broken. And Table's 
> set*Timeout does not take effect in its AP or BufferedMutator.





[jira] [Commented] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-10-13 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571706#comment-15571706
 ] 

Heng Chen commented on HBASE-16664:
---

Pushed to branch-1.3.

> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Fix For: 2.0.0, 1.4.0, 1.3.1
>
> Attachments: 1.patch, HBASE-16664-branch-1-v1.patch, 
> HBASE-16664-branch-1-v1.patch, HBASE-16664-branch-1-v2.patch, 
> HBASE-16664-branch-1.1-v1.patch, HBASE-16664-branch-1.2-v1.patch, 
> HBASE-16664-branch-1.3-v1.patch, HBASE-16664-branch-1.3-v2.patch, 
> HBASE-16664-branch-1.3.v3.patch, HBASE-16664-branch-1.v3.patch, 
> HBASE-16664-v1.patch, HBASE-16664-v2.patch, HBASE-16664-v3.patch, 
> HBASE-16664-v4.patch, HBASE-16664-v5.patch, HBASE-16664-v6.patch, 
> HBASE-16664-v7.patch, testhcm.patch
>
>
> Rpc/operation timeout logic in AsyncProcess is broken. And Table's 
> set*Timeout does not take effect in its AP or BufferedMutator.





[jira] [Commented] (HBASE-16807) RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover

2016-10-13 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571528#comment-15571528
 ] 

Heng Chen commented on HBASE-16807:
---

Pushed to master. Thanks, everyone!

> RegionServer will fail to report new active Hmaster until 
> HMaster/RegionServer failover
> ---
>
> Key: HBASE-16807
> URL: https://issues.apache.org/jira/browse/HBASE-16807
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
> Fix For: 2.0.0
>
> Attachments: HBASE-16807.patch
>
>
> It's a little weird, but it happened in the production environment that a few 
> RegionServers missed the master znode create notification on master failover. In 
> that case ZooKeeperNodeTracker will not refresh the cached data and 
> MasterAddressTracker will always return the old active HM detail to the region 
> server on ServiceException.
> Though we create the region server stub on failure, we do so without refreshing 
> the MasterAddressTracker data.
> In HRegionServer.createRegionServerStatusStub():
> {code}
> boolean refresh = false; // for the first time, use cached data
> RegionServerStatusService.BlockingInterface intf = null;
> boolean interrupted = false;
> try {
>   while (keepLooping()) {
>     sn = this.masterAddressTracker.getMasterAddress(refresh);
>     if (sn == null) {
>       if (!keepLooping()) {
>         // give up with no connection.
>         LOG.debug("No master found and cluster is stopped; bailing out");
>         return null;
>       }
>       if (System.currentTimeMillis() > (previousLogTime + 1000)) {
>         LOG.debug("No master found; retry");
>         previousLogTime = System.currentTimeMillis();
>       }
>       refresh = true; // let's try pull it from ZK directly
>       if (sleep(200)) {
>         interrupted = true;
>       }
>       continue;
>     }
> {code}
> Here we refresh the node only when 'sn' is null; otherwise the same cached 
> data is used.
> So in the above case the RegionServer will never report to the active HMaster 
> successfully until an HMaster failover or RegionServer restart.





[jira] [Updated] (HBASE-16807) RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover

2016-10-13 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-16807:
--
  Resolution: Fixed
Release Note: Pushed to master. Thanks, everyone!
  Status: Resolved  (was: Patch Available)

> RegionServer will fail to report new active Hmaster until 
> HMaster/RegionServer failover
> ---
>
> Key: HBASE-16807
> URL: https://issues.apache.org/jira/browse/HBASE-16807
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
> Fix For: 2.0.0
>
> Attachments: HBASE-16807.patch
>
>
> It's a little weird, but it happened in the production environment that a few 
> RegionServers missed the master znode create notification on master failover. In 
> that case ZooKeeperNodeTracker will not refresh the cached data and 
> MasterAddressTracker will always return the old active HM detail to the region 
> server on ServiceException.
> Though we create the region server stub on failure, we do so without refreshing 
> the MasterAddressTracker data.
> In HRegionServer.createRegionServerStatusStub():
> {code}
> boolean refresh = false; // for the first time, use cached data
> RegionServerStatusService.BlockingInterface intf = null;
> boolean interrupted = false;
> try {
>   while (keepLooping()) {
>     sn = this.masterAddressTracker.getMasterAddress(refresh);
>     if (sn == null) {
>       if (!keepLooping()) {
>         // give up with no connection.
>         LOG.debug("No master found and cluster is stopped; bailing out");
>         return null;
>       }
>       if (System.currentTimeMillis() > (previousLogTime + 1000)) {
>         LOG.debug("No master found; retry");
>         previousLogTime = System.currentTimeMillis();
>       }
>       refresh = true; // let's try pull it from ZK directly
>       if (sleep(200)) {
>         interrupted = true;
>       }
>       continue;
>     }
> {code}
> Here we refresh the node only when 'sn' is null; otherwise the same cached 
> data is used.
> So in the above case the RegionServer will never report to the active HMaster 
> successfully until an HMaster failover or RegionServer restart.





[jira] [Updated] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-10-13 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-16664:
--
Attachment: HBASE-16664-branch-1.3.v3.patch

> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Fix For: 2.0.0, 1.4.0, 1.3.1
>
> Attachments: 1.patch, HBASE-16664-branch-1-v1.patch, 
> HBASE-16664-branch-1-v1.patch, HBASE-16664-branch-1-v2.patch, 
> HBASE-16664-branch-1.1-v1.patch, HBASE-16664-branch-1.2-v1.patch, 
> HBASE-16664-branch-1.3-v1.patch, HBASE-16664-branch-1.3-v2.patch, 
> HBASE-16664-branch-1.3.v3.patch, HBASE-16664-branch-1.v3.patch, 
> HBASE-16664-v1.patch, HBASE-16664-v2.patch, HBASE-16664-v3.patch, 
> HBASE-16664-v4.patch, HBASE-16664-v5.patch, HBASE-16664-v6.patch, 
> HBASE-16664-v7.patch, testhcm.patch
>
>
> Rpc/operation timeout logic in AsyncProcess is broken. And Table's 
> set*Timeout does not take effect in its AP or BufferedMutator.





[jira] [Commented] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-10-13 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571399#comment-15571399
 ] 

Heng Chen commented on HBASE-16664:
---

The failed test cases of patch v2 on branch-1 are unrelated; most of them are 
caused by the MR cluster in the test case. Will open another issue for it. 

Key test cases have passed locally with patch v3. Pushed v3 to branch-1.

> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Fix For: 2.0.0, 1.4.0, 1.3.1
>
> Attachments: 1.patch, HBASE-16664-branch-1-v1.patch, 
> HBASE-16664-branch-1-v1.patch, HBASE-16664-branch-1-v2.patch, 
> HBASE-16664-branch-1.1-v1.patch, HBASE-16664-branch-1.2-v1.patch, 
> HBASE-16664-branch-1.3-v1.patch, HBASE-16664-branch-1.3-v2.patch, 
> HBASE-16664-branch-1.v3.patch, HBASE-16664-v1.patch, HBASE-16664-v2.patch, 
> HBASE-16664-v3.patch, HBASE-16664-v4.patch, HBASE-16664-v5.patch, 
> HBASE-16664-v6.patch, HBASE-16664-v7.patch, testhcm.patch
>
>
> Rpc/operation timeout logic in AsyncProcess is broken. And Table's 
> set*Timeout does not take effect in its AP or BufferedMutator.





[jira] [Updated] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-10-13 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-16664:
--
Attachment: HBASE-16664-branch-1.v3.patch

Uploaded patch v3 for branch-1; removed the unused tracker.start in 
HTable.mutate, HTable.checkAndMutate, and MultiServerCallable.call.

> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Fix For: 2.0.0, 1.4.0, 1.3.1
>
> Attachments: 1.patch, HBASE-16664-branch-1-v1.patch, 
> HBASE-16664-branch-1-v1.patch, HBASE-16664-branch-1-v2.patch, 
> HBASE-16664-branch-1.1-v1.patch, HBASE-16664-branch-1.2-v1.patch, 
> HBASE-16664-branch-1.3-v1.patch, HBASE-16664-branch-1.3-v2.patch, 
> HBASE-16664-branch-1.v3.patch, HBASE-16664-v1.patch, HBASE-16664-v2.patch, 
> HBASE-16664-v3.patch, HBASE-16664-v4.patch, HBASE-16664-v5.patch, 
> HBASE-16664-v6.patch, HBASE-16664-v7.patch, testhcm.patch
>
>
> Rpc/operation timeout logic in AsyncProcess is broken. And Table's 
> set*Timeout does not take effect in its AP or BufferedMutator.





[jira] [Updated] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-10-13 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-16664:
--
Fix Version/s: 1.3.1
   1.4.0
   2.0.0

> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Fix For: 2.0.0, 1.4.0, 1.3.1
>
> Attachments: 1.patch, HBASE-16664-branch-1-v1.patch, 
> HBASE-16664-branch-1-v1.patch, HBASE-16664-branch-1-v2.patch, 
> HBASE-16664-branch-1.1-v1.patch, HBASE-16664-branch-1.2-v1.patch, 
> HBASE-16664-branch-1.3-v1.patch, HBASE-16664-branch-1.3-v2.patch, 
> HBASE-16664-v1.patch, HBASE-16664-v2.patch, HBASE-16664-v3.patch, 
> HBASE-16664-v4.patch, HBASE-16664-v5.patch, HBASE-16664-v6.patch, 
> HBASE-16664-v7.patch, testhcm.patch
>
>
> Rpc/operation timeout logic in AsyncProcess is broken. And Table's 
> set*Timeout does not take effect in its AP or BufferedMutator.





[jira] [Commented] (HBASE-16807) RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover

2016-10-12 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570846#comment-15570846
 ] 

Heng Chen commented on HBASE-16807:
---

Got it. +1 for it. 

> RegionServer will fail to report new active Hmaster until 
> HMaster/RegionServer failover
> ---
>
> Key: HBASE-16807
> URL: https://issues.apache.org/jira/browse/HBASE-16807
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
> Fix For: 2.0.0
>
> Attachments: HBASE-16807.patch
>
>
> It's a little weird, but it happened in the production environment that a few 
> RegionServers missed the master znode create notification on master failover. In 
> that case ZooKeeperNodeTracker will not refresh the cached data and 
> MasterAddressTracker will always return the old active HM detail to the region 
> server on ServiceException.
> Though we create the region server stub on failure, we do so without refreshing 
> the MasterAddressTracker data.
> In HRegionServer.createRegionServerStatusStub():
> {code}
> boolean refresh = false; // for the first time, use cached data
> RegionServerStatusService.BlockingInterface intf = null;
> boolean interrupted = false;
> try {
>   while (keepLooping()) {
>     sn = this.masterAddressTracker.getMasterAddress(refresh);
>     if (sn == null) {
>       if (!keepLooping()) {
>         // give up with no connection.
>         LOG.debug("No master found and cluster is stopped; bailing out");
>         return null;
>       }
>       if (System.currentTimeMillis() > (previousLogTime + 1000)) {
>         LOG.debug("No master found; retry");
>         previousLogTime = System.currentTimeMillis();
>       }
>       refresh = true; // let's try pull it from ZK directly
>       if (sleep(200)) {
>         interrupted = true;
>       }
>       continue;
>     }
> {code}
> Here we refresh the node only when 'sn' is null; otherwise the same cached 
> data is used.
> So in the above case the RegionServer will never report to the active HMaster 
> successfully until an HMaster failover or RegionServer restart.





[jira] [Commented] (HBASE-16698) Performance issue: handlers stuck waiting for CountDownLatch inside WALKey#getWriteEntry under high writing workload

2016-10-12 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570840#comment-15570840
 ] 

Heng Chen commented on HBASE-16698:
---

The numbers seem to be for 20 regions on one RS. If you have time, please 
upload numbers for one region on one RS; I am very interested in that. As 
[~stack] said, setting it to off by default is fine with me.

BTW, the patch LGTM. +1 for it.

> Performance issue: handlers stuck waiting for CountDownLatch inside 
> WALKey#getWriteEntry under high writing workload
> 
>
> Key: HBASE-16698
> URL: https://issues.apache.org/jira/browse/HBASE-16698
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Affects Versions: 1.1.6, 1.2.3
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.3.0, 1.4.0
>
> Attachments: HBASE-16698.branch-1.patch, HBASE-16698.patch, 
> HBASE-16698.v2.patch, hadoop0495.et2.jstack
>
>
> As titled, in our production environment we observed 98 out of 128 handlers 
> getting stuck waiting for the CountDownLatch {{seqNumAssignedLatch}} inside 
> {{WALKey#getWriteEntry}} under a high writing workload.
> After digging into the problem, we found that it is mainly caused by 
> advancing mvcc in the append logic. Below is some detailed analysis:
> Under the current branch-1 code logic, all batch puts call 
> {{WALKey#getWriteEntry}} after appending the edit to the WAL, and 
> {{seqNumAssignedLatch}} is only released when the corresponding append call is 
> handled by RingBufferEventHandler (see {{FSWALEntry#stampRegionSequenceId}}). 
> Because we currently use a single event handler for the ringbuffer, the 
> append calls are handled one by one (actually a lot of our current logic 
> depends on this sequential handling), and this becomes a bottleneck 
> under high writing workload.
> The worst part is that by default we only use one WAL per RS, so appends on 
> all regions are dealt with sequentially, which causes contention among 
> different regions...
> To fix this, we could also make use of the "sequential appends" mechanism: 
> grab the WriteEntry before publishing the append onto the ringbuffer 
> and use it as the sequence id, except that we need to add a lock to make "grab 
> WriteEntry" and "append edit" a transaction. This will still cause contention 
> inside a region but avoids contention between different regions. This 
> solution has already been verified in our online environment and proved 
> effective.
> Notice that for the master (2.0) branch, since we already changed the write 
> pipeline to sync before writing the memstore (HBASE-15158), this issue only 
> exists for the ASYNC_WAL write scenario.
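The "grab the WriteEntry before publishing" idea can be sketched as follows. This is an illustrative, hypothetical simplification, not HBase's implementation: the real code uses MVCC WriteEntry objects and the disruptor ring buffer, while here a per-region lock simply makes "assign sequence id" and "enqueue edit" one atomic step, so writers get their sequence number immediately instead of blocking on a latch released by the single ring-buffer handler.

```java
import java.util.concurrent.atomic.AtomicLong;

class SequentialAppendSketch {
  private final AtomicLong sequenceId = new AtomicLong(0);
  private final Object appendLock = new Object(); // per-region lock

  /**
   * Assigns a sequence id and "publishes" the edit as one atomic step.
   * The caller can proceed with the returned id right away; it no longer
   * waits for a downstream handler to stamp the id.
   */
  long append(String edit) {
    synchronized (appendLock) {
      long seq = sequenceId.incrementAndGet(); // grab the "WriteEntry" up front
      publish(edit, seq);                      // enqueue for the WAL writer
      return seq;
    }
  }

  private void publish(String edit, long seq) {
    // In HBase this would place the entry on the ring buffer for the WAL
    // writer thread; it is a no-op in this sketch.
  }
}
```

The lock serializes appends within one region (matching the "still cause contention inside a region" caveat above), but two regions using separate instances never contend with each other.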





[jira] [Commented] (HBASE-16807) RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover

2016-10-12 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570829#comment-15570829
 ] 

Heng Chen commented on HBASE-16807:
---

Your patch seems to just skip the cache without considering ServiceException, right? 

> RegionServer will fail to report new active Hmaster until 
> HMaster/RegionServer failover
> ---
>
> Key: HBASE-16807
> URL: https://issues.apache.org/jira/browse/HBASE-16807
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
> Fix For: 2.0.0
>
> Attachments: HBASE-16807.patch
>
>
> It's a little weird, but it happened in the production environment that a few 
> RegionServers missed the master znode create notification on master failover. In 
> that case ZooKeeperNodeTracker will not refresh the cached data and 
> MasterAddressTracker will always return the old active HM detail to the region 
> server on ServiceException.
> Though we create the region server stub on failure, we do so without refreshing 
> the MasterAddressTracker data.
> In HRegionServer.createRegionServerStatusStub():
> {code}
> boolean refresh = false; // for the first time, use cached data
> RegionServerStatusService.BlockingInterface intf = null;
> boolean interrupted = false;
> try {
>   while (keepLooping()) {
>     sn = this.masterAddressTracker.getMasterAddress(refresh);
>     if (sn == null) {
>       if (!keepLooping()) {
>         // give up with no connection.
>         LOG.debug("No master found and cluster is stopped; bailing out");
>         return null;
>       }
>       if (System.currentTimeMillis() > (previousLogTime + 1000)) {
>         LOG.debug("No master found; retry");
>         previousLogTime = System.currentTimeMillis();
>       }
>       refresh = true; // let's try pull it from ZK directly
>       if (sleep(200)) {
>         interrupted = true;
>       }
>       continue;
>     }
> {code}
> Here we refresh the node only when 'sn' is null; otherwise the same cached 
> data is used.
> So in the above case the RegionServer will never report to the active HMaster 
> successfully until an HMaster failover or RegionServer restart.





[jira] [Commented] (HBASE-16653) Backport HBASE-11393 to all branches which support namespace

2016-10-12 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570822#comment-15570822
 ] 

Heng Chen commented on HBASE-16653:
---

Let me take a look at patch v3. I need some time.

> Backport HBASE-11393 to all branches which support namespace
> 
>
> Key: HBASE-16653
> URL: https://issues.apache.org/jira/browse/HBASE-16653
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.0, 1.0.5, 1.3.1, 0.98.22, 1.1.7, 1.2.4
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Fix For: 1.4.0
>
> Attachments: HBASE-16653-branch-1-v1.patch, 
> HBASE-16653-branch-1-v2.patch, HBASE-16653-branch-1-v3.patch
>
>
> As HBASE-11386 mentioned, the parse code about replication table-cfs config 
> will be wrong when table name contains namespace and we can only config the 
> default namespace's tables in the peer. It is a bug for all branches which 
> support namespace. HBASE-11393 resolved this by use a pb object but it was 
> only merged to master branch. Other branches still have this problem. I 
> thought we should fix this bug in all branches which support namespace.





[jira] [Commented] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-10-12 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570746#comment-15570746
 ] 

Heng Chen commented on HBASE-16664:
---

The failed test cases for v7 seem unrelated. If you are not in a hurry, I will 
commit it later today; otherwise, you could ask [~Apache9] to commit it. 
Thanks for your work, [~yangzhe1991]! :)

> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: 1.patch, HBASE-16664-branch-1-v1.patch, 
> HBASE-16664-branch-1-v1.patch, HBASE-16664-branch-1-v2.patch, 
> HBASE-16664-branch-1.1-v1.patch, HBASE-16664-branch-1.2-v1.patch, 
> HBASE-16664-branch-1.3-v1.patch, HBASE-16664-branch-1.3-v2.patch, 
> HBASE-16664-v1.patch, HBASE-16664-v2.patch, HBASE-16664-v3.patch, 
> HBASE-16664-v4.patch, HBASE-16664-v5.patch, HBASE-16664-v6.patch, 
> HBASE-16664-v7.patch, testhcm.patch
>
>
> Rpc/operation timeout logic in AsyncProcess is broken. And Table's 
> set*Timeout does not take effect in its AP or BufferedMutator.





[jira] [Commented] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-10-12 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567915#comment-15567915
 ] 

Heng Chen commented on HBASE-16664:
---

Please remove this line from MultiServerCallable.call in the branch-1 patch:
{code}
+this.tracker.start();
{code}

> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: 1.patch, HBASE-16664-branch-1-v1.patch, 
> HBASE-16664-branch-1-v1.patch, HBASE-16664-branch-1-v2.patch, 
> HBASE-16664-branch-1.1-v1.patch, HBASE-16664-branch-1.2-v1.patch, 
> HBASE-16664-branch-1.3-v1.patch, HBASE-16664-branch-1.3-v2.patch, 
> HBASE-16664-v1.patch, HBASE-16664-v2.patch, HBASE-16664-v3.patch, 
> HBASE-16664-v4.patch, HBASE-16664-v5.patch, HBASE-16664-v6.patch, 
> HBASE-16664-v7.patch, testhcm.patch
>
>
> Rpc/operation timeout logic in AsyncProcess is broken. And Table's 
> set*Timeout does not take effect in its AP or BufferedMutator.





[jira] [Commented] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-10-12 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567826#comment-15567826
 ] 

Heng Chen commented on HBASE-16664:
---

As for the meaning of operationTimeout: yes, the operation should span from the 
API call's start to its end, including time spent in the queue. So it is good to 
start the timer before enqueuing.

> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: 1.patch, HBASE-16664-branch-1-v1.patch, 
> HBASE-16664-branch-1-v1.patch, HBASE-16664-branch-1.1-v1.patch, 
> HBASE-16664-branch-1.2-v1.patch, HBASE-16664-branch-1.3-v1.patch, 
> HBASE-16664-v1.patch, HBASE-16664-v2.patch, HBASE-16664-v3.patch, 
> HBASE-16664-v4.patch, HBASE-16664-v5.patch, HBASE-16664-v6.patch, 
> testhcm.patch
>
>
> Rpc/operation timeout logic in AsyncProcess is broken. And Table's 
> set*Timeout does not take effect in its AP or BufferedMutator.





[jira] [Commented] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-10-12 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567778#comment-15567778
 ] 

Heng Chen commented on HBASE-16664:
---

Wouldn't it be better to move tracker.start() back into 
CancellableRegionServerCallable.call, as it was originally, since the request may 
be queued in the thread pool? The timing would be more accurate. As for the race 
condition on tracker.start(), is it OK to just make globalStartTime volatile?
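A minimal sketch of the volatile idea (hypothetical class and field names, not the actual RetryingTimeTracker code): the first caller to reach start() wins, and the volatile field makes the chosen start time visible to racing threads. The remaining race is benign because concurrent callers would store near-identical timestamps.

```java
// Sketch only: start-once semantics with a volatile field, as discussed above.
class StartOnceTracker {
    private volatile long globalStartTime = -1;  // volatile: publish the start time across threads

    // The time source is passed in to keep the sketch deterministic and testable.
    void start(long nowMs) {
        if (globalStartTime < 0) {  // benign race: concurrent callers store near-identical values
            globalStartTime = nowMs;
        }
    }

    long startTime() {
        return globalStartTime;
    }
}
```

Under this scheme a second start() call is a no-op, so it does not matter whether the tracker is started before enqueuing or inside the callable.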

> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: 1.patch, HBASE-16664-branch-1-v1.patch, 
> HBASE-16664-branch-1-v1.patch, HBASE-16664-branch-1.1-v1.patch, 
> HBASE-16664-branch-1.2-v1.patch, HBASE-16664-branch-1.3-v1.patch, 
> HBASE-16664-v1.patch, HBASE-16664-v2.patch, HBASE-16664-v3.patch, 
> HBASE-16664-v4.patch, HBASE-16664-v5.patch, HBASE-16664-v6.patch, 
> testhcm.patch
>
>
> Rpc/operation timeout logic in AsyncProcess is broken. And Table's 
> set*Timeout does not take effect in its AP or BufferedMutator.





[jira] [Commented] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-10-11 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567711#comment-15567711
 ] 

Heng Chen commented on HBASE-16664:
---

OK. If you all agree that the operation timeout should apply to batch, I will 
vote 0.

As for the patch, the tracker in AsyncRequestFutureImpl has a race condition; you 
could create it in the constructor when the callable is null. Otherwise patch v5 
is OK for me.

> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: 1.patch, HBASE-16664-branch-1-v1.patch, 
> HBASE-16664-branch-1-v1.patch, HBASE-16664-branch-1.1-v1.patch, 
> HBASE-16664-branch-1.2-v1.patch, HBASE-16664-branch-1.3-v1.patch, 
> HBASE-16664-v1.patch, HBASE-16664-v2.patch, HBASE-16664-v3.patch, 
> HBASE-16664-v4.patch, HBASE-16664-v5.patch, testhcm.patch
>
>
> Have not checked the root cause, but I think timeout of all operations in 
> AsyncProcess is broken





[jira] [Commented] (HBASE-16807) RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover

2016-10-11 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567648#comment-15567648
 ] 

Heng Chen commented on HBASE-16807:
---

Is it OK to just skip the cache here?

> RegionServer will fail to report new active Hmaster until 
> HMaster/RegionServer failover
> ---
>
> Key: HBASE-16807
> URL: https://issues.apache.org/jira/browse/HBASE-16807
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>
> It's little weird, but it happened in the product environment that few 
> RegionServer missed master znode create notification on master failover. In 
> that case ZooKeeperNodeTracker will not refresh the cached data and 
> MasterAddressTracker 
> will always return old active HM detail to Region server on ServiceException.
> Though We create region server stub on failure but without refreshing the 
> MasterAddressTracker data.
> In HRegionServer.createRegionServerStatusStub()
> {code}
> boolean refresh = false; // for the first time, use cached data
> RegionServerStatusService.BlockingInterface intf = null;
> boolean interrupted = false;
> try {
>   while (keepLooping()) {
>     sn = this.masterAddressTracker.getMasterAddress(refresh);
>     if (sn == null) {
>       if (!keepLooping()) {
>         // give up with no connection.
>         LOG.debug("No master found and cluster is stopped; bailing out");
>         return null;
>       }
>       if (System.currentTimeMillis() > (previousLogTime + 1000)) {
>         LOG.debug("No master found; retry");
>         previousLogTime = System.currentTimeMillis();
>       }
>       refresh = true; // let's try pull it from ZK directly
>       if (sleep(200)) {
>         interrupted = true;
>       }
>       continue;
>     }
> {code}
> Here we refresh node only when 'sn' is NULL otherwise it will use same cached 
> data. 
> So in above case RegionServer will never report active HMaster successfully 
> until HMaster failover or RegionServer restart.





[jira] [Commented] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-10-11 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567527#comment-15567527
 ] 

Heng Chen commented on HBASE-16664:
---

{quote}
Then just set operation timeout to Long.MAX_VALUE and use retry times to 
control the timeout under this scenario.
But what about the scenario that user knows how many entries he/she has?
{quote}
Although the user knows how many entries there are, they still do not know how 
many entries each RS will receive. So it is also hard to decide what the total 
timeout should be. Maybe you could use the worst-case timeout, but I am still not 
sure how effective that is, and you can still use the retry count and rpcTimeout 
to control it without a total timeout.
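To make the "worst-case timeout" idea concrete, here is a rough sketch of bounding total blocking time from only the retry count and per-call rpc timeout. The backoff multipliers here are illustrative placeholders, not HBase's actual RETRY_BACKOFF table.

```java
// Sketch: an upper bound on total blocking time when no total/operation timeout
// is set and only the retry count and per-call rpc timeout are configured.
class WorstCaseBound {
    // Illustrative backoff multipliers, NOT HBase's real RETRY_BACKOFF table.
    static final long[] BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100};

    static long boundMillis(int retries, long rpcTimeoutMs, long basePauseMs) {
        long total = 0;
        for (int i = 0; i < retries; i++) {
            long mult = BACKOFF[Math.min(i, BACKOFF.length - 1)];
            total += rpcTimeoutMs + basePauseMs * mult;  // each attempt: one rpc + one pause
        }
        return total;
    }
}
```

With exponential-style backoff this bound grows quickly, which is why relying on retry count alone can block callers far longer than a user might expect.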



> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: 1.patch, HBASE-16664-branch-1-v1.patch, 
> HBASE-16664-branch-1-v1.patch, HBASE-16664-branch-1.1-v1.patch, 
> HBASE-16664-branch-1.2-v1.patch, HBASE-16664-branch-1.3-v1.patch, 
> HBASE-16664-v1.patch, HBASE-16664-v2.patch, HBASE-16664-v3.patch, 
> HBASE-16664-v4.patch, HBASE-16664-v5.patch, testhcm.patch
>
>
> Have not checked the root cause, but I think timeout of all operations in 
> AsyncProcess is broken





[jira] [Commented] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-10-11 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567465#comment-15567465
 ] 

Heng Chen commented on HBASE-16664:
---

{quote}
We can not tell users "some operations don't support a total timeout"
{quote}

Normally, in a batch request, users may not know how many puts or gets are 
included, so it is hard for our users to define a total timeout.

> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: 1.patch, HBASE-16664-branch-1-v1.patch, 
> HBASE-16664-branch-1-v1.patch, HBASE-16664-branch-1.1-v1.patch, 
> HBASE-16664-branch-1.2-v1.patch, HBASE-16664-branch-1.3-v1.patch, 
> HBASE-16664-v1.patch, HBASE-16664-v2.patch, HBASE-16664-v3.patch, 
> HBASE-16664-v4.patch, HBASE-16664-v5.patch, testhcm.patch
>
>
> Have not checked the root cause, but I think timeout of all operations in 
> AsyncProcess is broken





[jira] [Commented] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-10-11 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567451#comment-15567451
 ] 

Heng Chen commented on HBASE-16664:
---

{quote}
And I think for a user, I call the batch method, or the multi get method, to me 
it is one operation. I can increase the operation timeout if I want.
{quote}

My understanding is that a single get is one operation and one API call, but a 
multi-get is multiple operations even though it is also just one API call.


> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: 1.patch, HBASE-16664-branch-1-v1.patch, 
> HBASE-16664-branch-1-v1.patch, HBASE-16664-branch-1.1-v1.patch, 
> HBASE-16664-branch-1.2-v1.patch, HBASE-16664-branch-1.3-v1.patch, 
> HBASE-16664-v1.patch, HBASE-16664-v2.patch, HBASE-16664-v3.patch, 
> HBASE-16664-v4.patch, HBASE-16664-v5.patch, testhcm.patch
>
>
> Have not checked the root cause, but I think timeout of all operations in 
> AsyncProcess is broken





[jira] [Comment Edited] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-10-11 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567420#comment-15567420
 ] 

Heng Chen edited comment on HBASE-16664 at 10/12/16 3:34 AM:
-

The typical case in our application is something like TestFromClientSide did

1.  get HTable object 
2.  call batch puts
3.  call get
4.  close HTable

It works fine now; steps 2 and 3 may run in different threads.






was (Author: chenheng):
The typical case in our application is something like TestFromClientSide did

1.  get HTable object 
2.  call batch puts
3.  call get
4.  close HTable

It is ok now. 





> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: 1.patch, HBASE-16664-branch-1-v1.patch, 
> HBASE-16664-branch-1-v1.patch, HBASE-16664-branch-1.1-v1.patch, 
> HBASE-16664-branch-1.2-v1.patch, HBASE-16664-branch-1.3-v1.patch, 
> HBASE-16664-v1.patch, HBASE-16664-v2.patch, HBASE-16664-v3.patch, 
> HBASE-16664-v4.patch, HBASE-16664-v5.patch, testhcm.patch
>
>
> Have not checked the root cause, but I think timeout of all operations in 
> AsyncProcess is broken





[jira] [Commented] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-10-11 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567420#comment-15567420
 ] 

Heng Chen commented on HBASE-16664:
---

The typical case in our application is something like TestFromClientSide did

1.  get HTable object 
2.  call batch puts
3.  call get
4.  close HTable

It is ok now. 





> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: 1.patch, HBASE-16664-branch-1-v1.patch, 
> HBASE-16664-branch-1-v1.patch, HBASE-16664-branch-1.1-v1.patch, 
> HBASE-16664-branch-1.2-v1.patch, HBASE-16664-branch-1.3-v1.patch, 
> HBASE-16664-v1.patch, HBASE-16664-v2.patch, HBASE-16664-v3.patch, 
> HBASE-16664-v4.patch, HBASE-16664-v5.patch, testhcm.patch
>
>
> Have not checked the root cause, but I think timeout of all operations in 
> AsyncProcess is broken





[jira] [Commented] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-10-11 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567379#comment-15567379
 ] 

Heng Chen commented on HBASE-16664:
---

Normally we share one HTable object in our application, so it is hard to avoid 
the race condition.
I think we should first define what the operation timeout means, to avoid 
conflicts.
This is our description for it in hbase-default.xml:
{code}
<property>
  <name>hbase.client.operation.timeout</name>
  <value>1200000</value>
  <description>Operation timeout is a top-level restriction (millisecond)
    that makes sure a blocking operation in Table will not be blocked more than
    this. In each operation, if rpc request fails because of timeout or other
    reason, it will retry until success or throw RetriesExhaustedException. But
    if the total time being blocking reach the operation timeout before retries
    exhausted, it will break early and throw SocketTimeoutException.</description>
</property>
{code}

So what is "one operation"? Is a batch one operation? [~stack]
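For illustration, this is roughly how a client-side reader of that key behaves. This is a sketch, not the real HBase client code: java.util.Properties stands in for Hadoop's Configuration, and the default of 1200000 ms is the stock hbase-default.xml value.

```java
import java.util.Properties;

// Sketch: reading hbase.client.operation.timeout with its stock default.
class OperationTimeoutConfig {
    static final String KEY = "hbase.client.operation.timeout";
    static final long DEFAULT_MS = 1200000L;  // 20 minutes, per hbase-default.xml

    static long getMillis(Properties conf) {
        // Fall back to the shipped default when the user has not overridden the key.
        return Long.parseLong(conf.getProperty(KEY, Long.toString(DEFAULT_MS)));
    }
}
```

The question in this thread is what scope that single value governs: one RPC, one API call such as get(), or an entire batch spanning many RPCs.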

> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: 1.patch, HBASE-16664-branch-1-v1.patch, 
> HBASE-16664-branch-1-v1.patch, HBASE-16664-branch-1.1-v1.patch, 
> HBASE-16664-branch-1.2-v1.patch, HBASE-16664-branch-1.3-v1.patch, 
> HBASE-16664-v1.patch, HBASE-16664-v2.patch, HBASE-16664-v3.patch, 
> HBASE-16664-v4.patch, HBASE-16664-v5.patch, testhcm.patch
>
>
> Have not checked the root cause, but I think timeout of all operations in 
> AsyncProcess is broken





[jira] [Commented] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-10-11 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567343#comment-15567343
 ] 

Heng Chen commented on HBASE-16664:
---

For example, in our production cluster we set a 3s operation timeout for single 
gets. So with this patch, batch operations will also have a 3s timeout? I don't 
think that is a good idea.

> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: 1.patch, HBASE-16664-branch-1-v1.patch, 
> HBASE-16664-branch-1-v1.patch, HBASE-16664-branch-1.1-v1.patch, 
> HBASE-16664-branch-1.2-v1.patch, HBASE-16664-branch-1.3-v1.patch, 
> HBASE-16664-v1.patch, HBASE-16664-v2.patch, HBASE-16664-v3.patch, 
> HBASE-16664-v4.patch, HBASE-16664-v5.patch, testhcm.patch
>
>
> Have not checked the root cause, but I think timeout of all operations in 
> AsyncProcess is broken





[jira] [Commented] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-10-11 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567319#comment-15567319
 ] 

Heng Chen commented on HBASE-16664:
---

I think the reason we did not set a total timeout for batch is that a batch 
contains an uncertain number of operations, which makes a total timeout hard to 
choose. But we can control the retry count.

> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: 1.patch, HBASE-16664-branch-1-v1.patch, 
> HBASE-16664-branch-1-v1.patch, HBASE-16664-branch-1.1-v1.patch, 
> HBASE-16664-branch-1.2-v1.patch, HBASE-16664-branch-1.3-v1.patch, 
> HBASE-16664-v1.patch, HBASE-16664-v2.patch, HBASE-16664-v3.patch, 
> HBASE-16664-v4.patch, HBASE-16664-v5.patch, testhcm.patch
>
>
> Have not checked the root cause, but I think timeout of all operations in 
> AsyncProcess is broken





[jira] [Commented] (HBASE-16653) Backport HBASE-11393 to all branches which support namespace

2016-09-29 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15534842#comment-15534842
 ] 

Heng Chen commented on HBASE-16653:
---

As [~enis] mentioned, we should keep the deprecated method in branch-1 due to our 
compatibility contract. And HBASE-11393 has some compatibility issues; if we 
backport it to branch-1, we should take care of them. Please upload the patch to 
the review board. [~zghaobac]

> Backport HBASE-11393 to all branches which support namespace
> 
>
> Key: HBASE-16653
> URL: https://issues.apache.org/jira/browse/HBASE-16653
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.0, 1.0.5, 1.3.1, 0.98.22, 1.1.7, 1.2.4
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
> Fix For: 1.4.0
>
> Attachments: HBASE-16653-branch-1-v1.patch, 
> HBASE-16653-branch-1-v2.patch
>
>
> As HBASE-11386 mentioned, the parse code about replication table-cfs config 
> will be wrong when table name contains namespace and we can only config the 
> default namespace's tables in the peer. It is a bug for all branches which 
> support namespace. HBASE-11393 resolved this by use a pb object but it was 
> only merged to master branch. Other branches still have this problem. I 
> thought we should fix this bug in all branches which support namespace.





[jira] [Commented] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-09-28 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15529035#comment-15529035
 ] 

Heng Chen commented on HBASE-16664:
---

Yes, the timeout logic has conflicts. After HBASE-16607, 
NoncedRegionServerCallable was unified with CancellableRegionServerCallable. I 
had planned to go on to unify append and delete with AP (HBASE-16610), where we 
would move all the retry logic into AP.

But that work is on hold due to a performance issue; maybe we could change 
NoncedRegionServerCallable to extend RegionServerCallable directly. I think we 
could do that in this issue.

> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: 1.patch, HBASE-16664-v1.patch, HBASE-16664-v2.patch, 
> testhcm.patch
>
>
> Have not checked the root cause, but I think timeout of all operations in 
> AsyncProcess is broken





[jira] [Comment Edited] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-09-28 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15528946#comment-15528946
 ] 

Heng Chen edited comment on HBASE-16664 at 9/28/16 8:49 AM:


So we could modify the operation-timeout judgment logic in 
CancellableRegionServerCallable? Right now we use remaining == 0 to check whether 
the timeout has been reached; should we change it to remaining <= 1?


was (Author: chenheng):
So we could modify the operation timeout judgement logic in 
CancellableRegionServerCallable?   
Now we use remaining==0 to check whether the timeout is reached,  change it to 
be 1?

> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: 1.patch, HBASE-16664-v1.patch, HBASE-16664-v2.patch, 
> testhcm.patch
>
>
> Have not checked the root cause, but I think timeout of all operations in 
> AsyncProcess is broken





[jira] [Commented] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-09-28 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15528946#comment-15528946
 ] 

Heng Chen commented on HBASE-16664:
---

So we could modify the operation-timeout judgment logic in 
CancellableRegionServerCallable? Right now we use remaining == 0 to check whether 
the timeout has been reached; should we change it to 1?

> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: 1.patch, HBASE-16664-v1.patch, HBASE-16664-v2.patch, 
> testhcm.patch
>
>
> Have not checked the root cause, but I think timeout of all operations in 
> AsyncProcess is broken





[jira] [Commented] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-09-28 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15528761#comment-15528761
 ] 

Heng Chen commented on HBASE-16664:
---

{quote}
As I said, we can not use RetryingTimeTracker because it will not return 
non-positive value(See RetryingTimeTracker's Line 42). 
{quote}
Is there any problem with returning the minimum (1) instead of a non-positive 
value?
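A sketch of the ambiguity being discussed (hypothetical method, not RetryingTimeTracker's real code): if 0 is overloaded elsewhere to mean "no timeout configured", the computed remaining time must never collapse to 0, hence clamping it to a minimum of 1 ms.

```java
// Sketch: clamp remaining time to 1 ms so an expired timeout stays
// distinguishable from "no timeout configured" (0).
class RemainingTime {
    static long remaining(long startMs, long nowMs, long timeoutMs) {
        if (timeoutMs <= 0) {
            return 0;  // 0 means no operation timeout configured
        }
        long left = timeoutMs - (nowMs - startMs);
        return Math.max(left, 1);  // expired -> 1, never non-positive
    }
}
```

The downside is that a caller receiving 1 ms will still fire one more (almost certainly doomed) RPC instead of failing immediately, which is the trade-off this comment asks about.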

> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: 1.patch, HBASE-16664-v1.patch, HBASE-16664-v2.patch, 
> testhcm.patch
>
>
> Have not checked the root cause, but I think timeout of all operations in 
> AsyncProcess is broken





[jira] [Updated] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-09-28 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-16664:
--
Attachment: 1.patch

1.patch (very rough; it may not even compile) should describe my thought better. 
wdyt?

> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: 1.patch, HBASE-16664-v1.patch, HBASE-16664-v2.patch, 
> testhcm.patch
>
>
> Have not checked the root cause, but I think timeout of all operations in 
> AsyncProcess is broken





[jira] [Commented] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-09-28 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15528636#comment-15528636
 ] 

Heng Chen commented on HBASE-16664:
---

{quote}
one is we use remaining time as the rpc timeout in callWithoutRetries, 
{quote}
For this one, why not just introduce an rpc timeout in 
CancellableRegionServerCallable directly? It seems to be only several lines of 
change.

{quote}
the other is HTable's setTimeout methods don't affect AP/BufferedMutator in it
{quote}
That also seems to be solved if we introduce an rpc timeout in 
CancellableRegionServerCallable?



> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: HBASE-16664-v1.patch, HBASE-16664-v2.patch, testhcm.patch
>
>
> Have not checked the root cause, but I think timeout of all operations in 
> AsyncProcess is broken





[jira] [Commented] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-09-28 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15528595#comment-15528595
 ] 

Heng Chen commented on HBASE-16664:
---

Just skimmed the patch; I don't see a big difference from the original logic 
(except that we add an rpcTimeout factor for each rpc call in 
callWithoutRetries).

The problem we want to solve is to make our read/write rpc timeouts work in AP, 
right? Is there something else I missed?

> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: HBASE-16664-v1.patch, HBASE-16664-v2.patch, testhcm.patch
>
>
> Have not checked the root cause, but I think timeout of all operations in 
> AsyncProcess is broken





[jira] [Commented] (HBASE-16698) Performance issue: handlers stuck waiting for CountDownLatch inside WALKey#getWriteEntry under high writing workload

2016-09-28 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15528581#comment-15528581
 ] 

Heng Chen commented on HBASE-16698:
---

{quote}
so the main problem is sequential appends and the logic that getting MVCC has 
to wait for the relative append to finish.
{quote}
Yeah, but it is exactly this sequential handling that lets us keep mvcc and the 
WAL in the same order without a lock.
So I am not sure under which workload the performance will improve. And I think a 
"per-table configuration" makes sense if we can do it.
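The "grab WriteEntry before publishing" idea from the issue description can be sketched like this (toy names and per-region state; not the actual FSHLog/FSWALEntry code):

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy sketch: make "assign sequence id" + "publish append" one atomic step per
// region, so WAL order matches mvcc order without serializing all regions
// against each other on the single ring-buffer handler.
class RegionAppendState {
    private final AtomicLong seq = new AtomicLong(0);
    private final Object appendLock = new Object();  // per-region lock, not per-RS

    long appendAndGetSeq(Runnable publishToRingBuffer) {
        synchronized (appendLock) {              // contention stays within one region
            long id = seq.incrementAndGet();     // grab the WriteEntry (sequence id)...
            publishToRingBuffer.run();           // ...and publish the append in the same step
            return id;
        }
    }
}
```

This keeps contention inside a single region, which is why the concern above is about workloads where most traffic targets one region.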

> Performance issue: handlers stuck waiting for CountDownLatch inside 
> WALKey#getWriteEntry under high writing workload
> 
>
> Key: HBASE-16698
> URL: https://issues.apache.org/jira/browse/HBASE-16698
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Affects Versions: 1.1.6, 1.2.3
>Reporter: Yu Li
>Assignee: Yu Li
> Attachments: HBASE-16698.patch, HBASE-16698.v2.patch, 
> hadoop0495.et2.jstack
>
>
> As titled, on our production environment we observed 98 out of 128 handlers 
> get stuck waiting for the CountDownLatch {{seqNumAssignedLatch}} inside 
> {{WALKey#getWriteEntry}} under a high writing workload.
> After digging into the problem, we found that the problem is mainly caused by 
> advancing mvcc in the append logic. Below is some detailed analysis:
> Under current branch-1 code logic, all batch puts will call 
> {{WALKey#getWriteEntry}} after appending edit to WAL, and 
> {{seqNumAssignedLatch}} is only released when the relative append call is 
> handled by RingBufferEventHandler (see {{FSWALEntry#stampRegionSequenceId}}). 
> Because currently we're using a single event handler for the ringbuffer, the 
> append calls are handled one by one (actually lot's of our current logic 
> depending on this sequential dealing logic), and this becomes a bottleneck 
> under high writing workload.
> The worst part is that by default we only use one WAL per RS, so appends on 
> all regions are dealt with in sequential, which causes contention among 
> different regions...
> To fix this, we could also take use of the "sequential appends" mechanism, 
> that we could grab the WriteEntry before publishing append onto ringbuffer 
> and use it as sequence id, only that we need to add a lock to make "grab 
> WriteEntry" and "append edit" a transaction. This will still cause contention 
> inside a region but could avoid contention between different regions. This 
> solution is already verified in our online environment and proved to be 
> effective.
> Notice that for master (2.0) branch since we already change the write 
> pipeline to sync before writing memstore (HBASE-15158), this issue only 
> exists for the ASYNC_WAL writes scenario.
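The fix described above can be sketched as a hypothetical minimal model (the class and field names below are illustrative, not the actual FSHLog/WALKey classes): a per-region lock makes "grab WriteEntry" and "publish append" one transaction, so the sequence id is already stamped before the entry reaches the ring buffer and no CountDownLatch hand-back is needed.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the proposed fix: assign the sequence id (the
// "WriteEntry") and publish the append under one region-level lock, so
// WAL order matches mvcc order within a region, while different regions
// no longer contend on the single ring-buffer event handler.
public class RegionAppendSketch {
  private final AtomicLong sequenceId = new AtomicLong();  // per-region mvcc counter
  private final Object appendLock = new Object();          // per-region lock
  // Stand-in for the WAL ring buffer; entries carry their sequence ids.
  private final BlockingQueue<Long> ringBuffer = new LinkedBlockingQueue<>();

  // Returns the sequence id stamped on this append.
  public long append(String edit) throws InterruptedException {
    synchronized (appendLock) {  // make "grab WriteEntry" + "append edit" one transaction
      long seq = sequenceId.incrementAndGet();
      ringBuffer.put(seq);       // publish with the seq id already assigned
      return seq;                // no CountDownLatch needed to hand the id back
    }
  }

  public static void main(String[] args) throws InterruptedException {
    RegionAppendSketch region = new RegionAppendSketch();
    long a = region.append("put-1");
    long b = region.append("put-2");
    System.out.println(a + " " + b);  // prints "1 2": ids follow append order
  }
}
```

Contention remains inside a region (the synchronized block) but disappears across regions, which matches the behavior reported in the description.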



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16698) Performance issue: handlers stuck waiting for CountDownLatch inside WALKey#getWriteEntry under high writing workload

2016-09-26 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524987#comment-15524987
 ] 

Heng Chen commented on HBASE-16698:
---

How much will performance degrade when all ops target a single region?
[~carp84] do you have any performance results? In our production cluster (not a
big cluster), many tables have just a few regions but QPS is high, so I am a
little worried about making this the default.



[jira] [Updated] (HBASE-16665) Check whether KeyValueUtil.createXXX could be replaced by CellUtil without copy

2016-09-25 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-16665:
--
   Resolution: Fixed
 Assignee: Heng Chen
 Hadoop Flags: Reviewed
Fix Version/s: 2.0.0
   Status: Resolved  (was: Patch Available)

> Check whether KeyValueUtil.createXXX could be replaced by CellUtil without 
> copy
> ---
>
> Key: HBASE-16665
> URL: https://issues.apache.org/jira/browse/HBASE-16665
> Project: HBase
>  Issue Type: Bug
>Reporter: Heng Chen
>Assignee: Heng Chen
> Fix For: 2.0.0
>
> Attachments: HBASE-16665.patch, HBASE-16665.v1.patch, 
> HBASE-16665.v2.patch, HBASE-16665.v3.patch
>
>






[jira] [Commented] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-09-25 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15520300#comment-15520300
 ] 

Heng Chen commented on HBASE-16664:
---

Not sure what your patch wants to do.

But if you want rpcTimeout to work, it seems you could just modify the
remaining-time logic in CancellableRegionServerCallable. Not much code, in my
opinion.

> Timeout logic in AsyncProcess is broken
> ---
>
> Key: HBASE-16664
> URL: https://issues.apache.org/jira/browse/HBASE-16664
> Project: HBase
>  Issue Type: Bug
>Reporter: Phil Yang
>Assignee: Phil Yang
> Attachments: HBASE-16664-v1.patch, testhcm.patch
>
>
> Have not checked the root cause, but I think timeout of all operations in 
> AsyncProcess is broken





[jira] [Commented] (HBASE-16664) Timeout logic in AsyncProcess is broken

2016-09-25 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15520289#comment-15520289
 ] 

Heng Chen commented on HBASE-16664:
---

{quote}
The tracker must be started from beginning, not each call.
{quote}
There is no difference between starting it at the beginning and doing it on
each call; the logic is already handled inside tracker.start.

{quote}
And in fact we will create new CancellableRegionServerCallable in each 
retrying, so the operation timeout is broken. 
{quote}
No, the callable is created outside of AsyncProcess for delete and mutate. Only
the batch callable is created per thread.

{quote}
My idea is pass a deadline (currentTime+operationTimeout) when we submit, we 
just check the remaining time and get min of remaining and rpcTimeout for each 
call.
{quote}
It seems you just need to bound the remaining time for each call by the
remaining operation time and rpcTimeout.
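The deadline idea discussed above can be sketched as follows; the method name is hypothetical, not the real AsyncProcess API: fix the deadline once at submit time, then cap every rpc attempt by whichever is smaller, the remaining operation time or rpcTimeout.

```java
// Hypothetical sketch: derive each rpc attempt's timeout from a fixed
// operation deadline set at submit time, so retries cannot extend the
// overall operation beyond operationTimeout.
public class CallTimeoutSketch {
  // Returns the timeout (ms) for the next rpc attempt, or 0 when the
  // overall operation deadline has already passed.
  static long nextCallTimeout(long deadlineMs, long nowMs, long rpcTimeoutMs) {
    long remaining = deadlineMs - nowMs;
    if (remaining <= 0) {
      return 0;  // operation timeout exhausted; caller should fail fast
    }
    return Math.min(remaining, rpcTimeoutMs);
  }

  public static void main(String[] args) {
    long deadline = 10_000;  // submit at t=0 with operationTimeout of 10s
    System.out.println(nextCallTimeout(deadline, 0, 3_000));      // 3000: rpcTimeout caps the call
    System.out.println(nextCallTimeout(deadline, 8_500, 3_000));  // 1500: remaining time caps the call
    System.out.println(nextCallTimeout(deadline, 11_000, 3_000)); // 0: deadline passed
  }
}
```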






[jira] [Updated] (HBASE-16677) Add table size (total store file size) to table page

2016-09-24 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-16677:
--
   Resolution: Fixed
 Assignee: Guang Yang
 Hadoop Flags: Reviewed
Fix Version/s: (was: 1.2.4)
   (was: 1.1.7)
   (was: 1.3.1)
   Status: Resolved  (was: Patch Available)

> Add table size (total store file size) to table page
> 
>
> Key: HBASE-16677
> URL: https://issues.apache.org/jira/browse/HBASE-16677
> Project: HBase
>  Issue Type: New Feature
>  Components: website
>Reporter: Guang Yang
>Assignee: Guang Yang
>Priority: Minor
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16677_v0.patch, HBASE-16677_v1.patch, 
> HBASE-16677_v2.patch, HBASE-16677_v3.patch, mini_cluster_master.png, 
> prod_cluster_partial.png, table_page_v3.png
>
>
> Currently there is not an easy way to get the table size from the web UI, 
> though we have the region size on the page, it is still convenient to have a 
> table for the table size stat.
> Another pain point is that when the table grow large with tens of thousands 
> of regions, it took extremely long time to load the page, however, sometimes 
> we don't want to check all the regions. An optimization could be to accept a 
> query parameter to specify the number of regions to render.





[jira] [Commented] (HBASE-16677) Add table size (total store file size) to table page

2016-09-24 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15520064#comment-15520064
 ] 

Heng Chen commented on HBASE-16677:
---

Works locally. Pushed to master and branch-1.



[jira] [Commented] (HBASE-16665) Check whether KeyValueUtil.createXXX could be replaced by CellUtil without copy

2016-09-24 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15519903#comment-15519903
 ] 

Heng Chen commented on HBASE-16665:
---

Will commit it if there are no other concerns.



[jira] [Commented] (HBASE-16665) Check whether KeyValueUtil.createXXX could be replaced by CellUtil without copy

2016-09-23 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15518113#comment-15518113
 ] 

Heng Chen commented on HBASE-16665:
---

hadoop.hbase.client.TestBlockEvictionFromClient failed with or without the
patch; it will be fixed in HBASE-16702.


The other failed or timed-out test cases pass locally; they are unrelated to
the patch.



[jira] [Updated] (HBASE-16677) Add table size (total store file size) to table page

2016-09-23 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-16677:
--
Fix Version/s: 1.2.4
   1.1.7
   1.3.1
   1.4.0
   2.0.0



[jira] [Commented] (HBASE-16677) Add table size (total store file size) to table page

2016-09-23 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15518100#comment-15518100
 ] 

Heng Chen commented on HBASE-16677:
---

LGTM.  +1
Let me test it locally and commit it.  Thanks [~yguang11]



[jira] [Updated] (HBASE-16702) TestBlockEvictionFromClient is broken

2016-09-23 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-16702:
--
Description: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3682/artifact/patchprocess/patch-unit-hbase-server.txt


> TestBlockEvictionFromClient is broken
> -
>
> Key: HBASE-16702
> URL: https://issues.apache.org/jira/browse/HBASE-16702
> Project: HBase
>  Issue Type: Bug
>Reporter: Heng Chen
>
> https://builds.apache.org/job/PreCommit-HBASE-Build/3682/artifact/patchprocess/patch-unit-hbase-server.txt





[jira] [Created] (HBASE-16702) TestBlockEvictionFromClient is broken

2016-09-23 Thread Heng Chen (JIRA)
Heng Chen created HBASE-16702:
-

 Summary: TestBlockEvictionFromClient is broken
 Key: HBASE-16702
 URL: https://issues.apache.org/jira/browse/HBASE-16702
 Project: HBase
  Issue Type: Bug
Reporter: Heng Chen








[jira] [Updated] (HBASE-16665) Check whether KeyValueUtil.createXXX could be replaced by CellUtil without copy

2016-09-23 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-16665:
--
Attachment: HBASE-16665.v3.patch



[jira] [Commented] (HBASE-16665) Check whether KeyValueUtil.createXXX could be replaced by CellUtil without copy

2016-09-23 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516028#comment-15516028
 ] 

Heng Chen commented on HBASE-16665:
---

Sounds reasonable. An empty byte[] will be better.



[jira] [Commented] (HBASE-16665) Check whether KeyValueUtil.createXXX could be replaced by CellUtil without copy

2016-09-23 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516005#comment-15516005
 ] 

Heng Chen commented on HBASE-16665:
---

v2 fixes the failed test case.



[jira] [Updated] (HBASE-16665) Check whether KeyValueUtil.createXXX could be replaced by CellUtil without copy

2016-09-23 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-16665:
--
Attachment: HBASE-16665.v2.patch



[jira] [Commented] (HBASE-16677) Add table size (total store file size) to table page

2016-09-22 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15515236#comment-15515236
 ] 

Heng Chen commented on HBASE-16677:
---

+1 for patch v3. I do not see the hidden-regions notice in your uploaded pic?
{code}
+  This table has <%= numRegions %> regions in total, in order to 
improve the page load time,
+ only <%= numRegionsRendered %> regions are displayed here, click
+ here to see all regions.
+<% } %>
{code}
I mean this hint in your pic.



[jira] [Commented] (HBASE-16677) Add table size (total store file size) to table page

2016-09-22 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15512545#comment-15512545
 ] 

Heng Chen commented on HBASE-16677:
---

That is a good point. We have the same pain on our production cluster (one
table has 10K+ regions).
Could we add a value such as "all" for regionsParam to list all regions?

Could you upload a pic of how the table page looks with the patch?



[jira] [Commented] (HBASE-16665) Check whether KeyValueUtil.createXXX could be replaced by CellUtil without copy

2016-09-22 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15512417#comment-15512417
 ] 

Heng Chen commented on HBASE-16665:
---

Will open another issue for the cleanup, to keep this issue small and clear. :)
[~anoop.hbase]



[jira] [Comment Edited] (HBASE-16665) Check whether KeyValueUtil.createXXX could be replaced by CellUtil without copy

2016-09-22 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15512330#comment-15512330
 ] 

Heng Chen edited comment on HBASE-16665 at 9/22/16 6:22 AM:


Another thought about KeyValueUtil and CellUtil: we had better tell our
developers in which scenarios CellUtil should be used instead of KeyValueUtil
to avoid a copy. Maybe the "createXXX" methods in KeyValueUtil could be removed
directly (most seem to be used only in test cases)?


was (Author: chenheng):
Another thoughts about KeyValueUtil and CellUtil,   we'd better to tell  our 
developers in which scenario should use CellUtil to avoid copy instead of 
KeyValueUtil,   maybe the methods of createXXX could be removed directly (most 
seems to be used only in test case)?



[jira] [Commented] (HBASE-16665) Check whether KeyValueUtil.createXXX could be replaced by CellUtil without copy

2016-09-22 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15512330#comment-15512330
 ] 

Heng Chen commented on HBASE-16665:
---

Another thought about KeyValueUtil and CellUtil: we had better tell our
developers in which scenarios CellUtil should be used instead of KeyValueUtil
to avoid a copy. Maybe the createXXX methods could be removed directly (most
seem to be used only in test cases)?



[jira] [Commented] (HBASE-16670) Make RpcServer#processRequest logic more robust

2016-09-22 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15512318#comment-15512318
 ] 

Heng Chen commented on HBASE-16670:
---

Sounds reasonable. It will be friendlier to our users. +1

> Make RpcServer#processRequest logic more robust
> ---
>
> Key: HBASE-16670
> URL: https://issues.apache.org/jira/browse/HBASE-16670
> Project: HBase
>  Issue Type: Bug
>Reporter: Yu Li
>Assignee: Yu Li
>Priority: Minor
> Attachments: HBASE-16670.patch
>
>
> Currently in {{RpcServer#processRequest}}, we will check whether the request 
> header has parameters but missed handling the abnormal case, so if there's no 
> param in the header, it will throw NPE like below:
> {noformat}
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): 
> java.io.IOException
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2269)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:189)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:169)
> Caused by: java.lang.NullPointerException
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2211)
> {noformat}
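A minimal sketch of the defensive check being proposed (names are illustrative; the real fix lives in RpcServer#processRequest): validate that the request header actually carries a param before dereferencing it, and surface a descriptive error instead of an NPE.

```java
// Hypothetical sketch: guard against a request header with no param so
// the server returns a clear protocol error rather than throwing a
// NullPointerException deep inside the call path.
public class RequestGuardSketch {
  static String processRequest(String param) {
    if (param == null) {
      // abnormal case handled explicitly instead of an NPE stack trace
      return "ERROR: request header has no param";
    }
    return "OK: " + param;
  }

  public static void main(String[] args) {
    System.out.println(processRequest(null));    // prints the descriptive error
    System.out.println(processRequest("scan"));  // prints "OK: scan"
  }
}
```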




