[jira] [Commented] (HBASE-11768) Register region server in zookeeper by ip address

2017-08-02 Thread Cheney Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16112121#comment-16112121
 ] 

Cheney Sun commented on HBASE-11768:


@stack, I see that you rebased the patch onto master. Is the patch still on 
the roadmap? If so, I would like to fix the issues found by the QA bot.

> Register region server in zookeeper by ip address
> -
>
> Key: HBASE-11768
> URL: https://issues.apache.org/jira/browse/HBASE-11768
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 2.0.0
>Reporter: Cheney Sun
>  Labels: patch
> Fix For: 2.0.0
>
> Attachments: HBASE-11768.master.001.patch, HBASE_11768.patch
>
>
> An HBase cluster isn't always set up along with a DNS server. But regionservers 
> now register their hostnames in zookeeper, which causes some inconvenience 
> when the regionservers aren't registered in a DNS server. In such a situation, 
> clients have to maintain the ip/hostname mapping in their /etc/hosts files in 
> order to resolve the hostname returned from zookeeper to the right address. 
> However, maintaining this mapping causes a lot of pain for clients, 
> especially when new machines are added to the cluster or some machines' 
> addresses change. All clients then need to update their host mapping files. 
> The issue is to address this problem by adding an option that lets each 
> regionserver register itself by ip address instead of by hostname only.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-11768) Register region server in zookeeper by ip address

2014-08-25 Thread Cheney Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheney Sun updated HBASE-11768:
---

Fix Version/s: 2.0.0
   Labels: patch  (was: )
   Status: Patch Available  (was: Open)



[jira] [Commented] (HBASE-11768) Register region server in zookeeper by ip address

2014-08-25 Thread Cheney Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109147#comment-14109147
 ] 

Cheney Sun commented on HBASE-11768:


Jean, yes, I have run it in a test cluster under some workload for several 
days, and so far, so good.



[jira] [Created] (HBASE-11768) Register region server in zookeeper by ip address

2014-08-18 Thread Cheney Sun (JIRA)
Cheney Sun created HBASE-11768:
--

 Summary: Register region server in zookeeper by ip address
 Key: HBASE-11768
 URL: https://issues.apache.org/jira/browse/HBASE-11768
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 2.0.0
Reporter: Cheney Sun


An HBase cluster isn't always set up along with a DNS server. But regionservers 
now register their hostnames in zookeeper, which causes some inconvenience when 
the regionservers aren't registered in a DNS server. In such a situation, 
clients have to maintain the ip/hostname mapping in their /etc/hosts files in 
order to resolve the hostname returned from zookeeper to the right address. 
Maintaining this mapping causes a lot of pain for clients, especially when new 
machines are added to the cluster or some machines' addresses change. All 
clients then need to update their host mapping files. 

The issue is to address this problem by adding an option that lets each 
regionserver register itself by ip address instead of by hostname only.





[jira] [Updated] (HBASE-11768) Register region server in zookeeper by ip address

2014-08-18 Thread Cheney Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheney Sun updated HBASE-11768:
---

Description: 
HBase cluster isn't always setup along with a DNS server. But regionservers now 
register their hostnames in zookeeper, which bring some inconvenience when 
regionserver isn't in one DNS server. In such situation, clients have to 
maintain the ip/hostname mapping in their /etc/hosts files in order to resolve 
the hostname returned from zookeeper to the right address. 

However, this causes a lot of pain for clients to maintain the mapping, 
especially when adding new machines to the cluster, or some machines' address 
changed due to some reason. All clients need to update their host mapping 
files. 

The issue is to address this problem above, and try to add an option to let 
each regionserver record themself by ip address, instead of hostname only.

  was:
HBase cluster isn't always setup along with a DNS server. But regionservers now 
register their hostnames in zookeeper, which bring some inconvenience when 
regionserver isn't in one DNS server. In such situation, clients have to 
maintain the ip/hostname mapping in their /etc/hosts files in order to resolve 
the hostname returned from zookeeper to the right address. 
This causes a lot of pain for clients to maintain the mapping, especially when 
adding new machines to the cluster, or some machines' address changed due to 
some reason. All clients need to update their host mapping files. 

The issue is to address this problem above, and try to add an option to let 
each regionserver record themself by ip address, instead of hostname only.




[jira] [Updated] (HBASE-11768) Register region server in zookeeper by ip address

2014-08-18 Thread Cheney Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheney Sun updated HBASE-11768:
---

Attachment: HBASE_11768.patch

I'd like to provide a patch for review.

This patch is rather straightforward: it adds one option, 
hbase.regionserver.use.ip, to control whether the regionserver registers in 
zookeeper by ip or by hostname. 

By default, the value is false, which leaves the current behavior unchanged. 
If the value is set to true, the regionserver's ip, instead of its hostname, is 
registered under HBASE_ROOT/rs/ip.xx.xxx.
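
To illustrate the idea only, here is a minimal sketch of how a boolean flag 
could select between the two names. It is not taken from the attached patch: 
the class name, the method name, the sample host and the sample address are all 
hypothetical, and it uses only java.net.InetAddress from the JDK rather than 
any HBase or zookeeper code.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class RegistrationNameSketch {
    /**
     * Returns the name a server would publish: its ip when useIp is true,
     * otherwise its hostname. useIp stands in for the proposed
     * hbase.regionserver.use.ip option (assumption: a plain boolean).
     */
    public static String registrationName(boolean useIp, InetAddress addr) {
        return useIp ? addr.getHostAddress() : addr.getHostName();
    }

    public static void main(String[] args) throws UnknownHostException {
        // Hypothetical host and address; getByAddress performs no DNS lookup.
        InetAddress addr =
            InetAddress.getByAddress("rs1.example.com", new byte[] {10, 0, 0, 5});
        System.out.println("use.ip=false -> " + registrationName(false, addr));
        System.out.println("use.ip=true  -> " + registrationName(true, addr));
    }
}
```

With the flag off, a client would still have to resolve rs1.example.com itself 
(for example through /etc/hosts); with the flag on, it would receive an address 
it can connect to directly.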





[jira] [Commented] (HBASE-9086) Add some options to improve count performance

2013-08-01 Thread Cheney Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726081#comment-13726081
 ] 

Cheney Sun commented on HBASE-9086:
---

I have attached a patch that enhances the shell count command.

 Add some options to improve count performance
 -

 Key: HBASE-9086
 URL: https://issues.apache.org/jira/browse/HBASE-9086
 Project: HBase
  Issue Type: Wish
  Components: shell
Affects Versions: 0.94.2
Reporter: Cheney Sun
 Attachments: HBase-9086.patch


 The current count command in the HBase shell is quite slow if the rows are 
 very big (100+kB each). It would be helpful to provide an option to specify 
 the column to count, which would give the user a chance to reduce the data 
 volume to scan. 
 IMHO, counting only the row keys would be the ideal solution. I'm not sure 
 how difficult it would be to implement.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-9086) Add some options to improve count performance

2013-08-01 Thread Cheney Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheney Sun updated HBASE-9086:
--

Attachment: HBase-9086.patch



[jira] [Commented] (HBASE-9086) Add some options to improve count performance

2013-08-01 Thread Cheney Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726082#comment-13726082
 ] 

Cheney Sun commented on HBASE-9086:
---

Jean-Marc, can you review it? Thanks.



[jira] [Updated] (HBASE-9086) Add some options to improve count performance

2013-08-01 Thread Cheney Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheney Sun updated HBASE-9086:
--

Attachment: HBase-9086_v0.2.patch



[jira] [Updated] (HBASE-9086) Add some options to improve count performance

2013-08-01 Thread Cheney Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheney Sun updated HBASE-9086:
--

Attachment: HBase-9086.patch



[jira] [Updated] (HBASE-9086) Add some options to improve count performance

2013-08-01 Thread Cheney Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheney Sun updated HBASE-9086:
--

Attachment: (was: HBase-9086.patch)



[jira] [Commented] (HBASE-9086) Add some options to improve count performance

2013-08-01 Thread Cheney Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13727225#comment-13727225
 ] 

Cheney Sun commented on HBASE-9086:
---

@Lars, no, that's not what I expected. The new patch has been uploaded; it 
picks up Jean-Marc's suggestion by adding FirstKeyOnlyFilter and KeyOnlyFilter 
to a FilterList. 
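
As a rough illustration of why combining the two filters helps, here is a toy 
model in plain Java. It does not use the HBase classes and is not the code 
from the patch: Cell, firstCellPerRow and stripValues are hypothetical 
stand-ins that only mimic the idea of FirstKeyOnlyFilter (one cell per row) 
and KeyOnlyFilter (keys without values), and the sample table and its sizes 
are made up.

```java
import java.util.ArrayList;
import java.util.List;

public class KeyOnlyCountSketch {
    /** Toy stand-in for one stored cell; NOT the HBase KeyValue class. */
    static final class Cell {
        final String row;
        final String column;
        final byte[] value;

        Cell(String row, String column, byte[] value) {
            this.row = row;
            this.column = column;
            this.value = value;
        }

        /** Rough size of the cell: key characters plus value bytes. */
        int size() {
            return row.length() + column.length() + value.length;
        }
    }

    /** Mimics the FirstKeyOnlyFilter idea: keep the first cell of each row.
     *  Assumes the cells arrive sorted by row, as in a scan. */
    public static List<Cell> firstCellPerRow(List<Cell> cells) {
        List<Cell> out = new ArrayList<>();
        String lastRow = null;
        for (Cell c : cells) {
            if (!c.row.equals(lastRow)) {
                out.add(c);
                lastRow = c.row;
            }
        }
        return out;
    }

    /** Mimics the KeyOnlyFilter idea: keep each key, drop the value. */
    public static List<Cell> stripValues(List<Cell> cells) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : cells) {
            out.add(new Cell(c.row, c.column, new byte[0]));
        }
        return out;
    }

    public static long totalBytes(List<Cell> cells) {
        long n = 0;
        for (Cell c : cells) {
            n += c.size();
        }
        return n;
    }

    /** Made-up table: three rows with a large a:info, one row also has c:ref. */
    public static List<Cell> sampleTable() {
        List<Cell> cells = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            String row = "row" + i;
            cells.add(new Cell(row, "a:info", new byte[100_000]));
            if (i == 0) {
                cells.add(new Cell(row, "c:ref", new byte[8]));
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        List<Cell> all = sampleTable();
        List<Cell> shipped = stripValues(firstCellPerRow(all));
        System.out.println("rows counted: " + shipped.size());
        System.out.println("bytes without filters: " + totalBytes(all));
        System.out.println("bytes with both filters: " + totalBytes(shipped));
    }
}
```

In this model every row still contributes exactly one cell, so the count stays 
correct, while the data that would travel to the client shrinks to the keys 
alone.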



[jira] [Created] (HBASE-9086) Add some options to improve count performance

2013-07-30 Thread Cheney Sun (JIRA)
Cheney Sun created HBASE-9086:
-

 Summary: Add some options to improve count performance
 Key: HBASE-9086
 URL: https://issues.apache.org/jira/browse/HBASE-9086
 Project: HBase
  Issue Type: Wish
  Components: shell
Affects Versions: 0.94.2
Reporter: Cheney Sun


The current count command in the HBase shell is quite slow if the rows are very 
big (100+kB each). It would be helpful to provide an option to specify the 
column to count, which would give the user a chance to reduce the data volume 
to scan. 
IMHO, counting only the row keys would be the ideal solution. I'm not sure how 
difficult it would be to implement.



[jira] [Commented] (HBASE-9086) Add some options to improve count performance

2013-07-30 Thread Cheney Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723800#comment-13723800
 ] 

Cheney Sun commented on HBASE-9086:
---

Hi Jean-Marc, thanks for pointing out the possible ways, but they don't look 
like what I want. Both methods you mentioned would send some K/V pairs back to 
the client executing the command, right? In our case, this would hurt 
performance a lot. Let me briefly describe our case: the table schema is [ 
rowkey (20~30 bytes) | a:info (100~500+kB) | c:ref (empty in most rows) ]. 
Specifying only the column a:info wouldn't help much, since this column carries 
most of the payload. Specifying only c:ref wouldn't give the correct result, 
because most cells in this column are empty and would not be counted. 

Apparently, specifying only the rowkey is the natural way to improve count 
performance while still guaranteeing a correct result. Moreover, when using the 
count command, users really care about the number of rows, not the data. 

For now, I'm not sure whether such a patch is easy to implement under the 
current HBase architecture.



[jira] [Commented] (HBASE-9086) Add some options to improve count performance

2013-07-30 Thread Cheney Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723895#comment-13723895
 ] 

Cheney Sun commented on HBASE-9086:
---

Yes, this is exactly what I want. Thanks Jean-Marc. I edited 
$HBASE_HOME/lib/ruby/hbase/table.rb on my client machine and changed line 161 
to scan.setFilter(org.apache.hadoop.hbase.filter.KeyOnlyFilter.new). This gave 
a huge improvement: previously, counting 100w rows took more than 100 seconds; 
now counting 400w rows needs only 75 seconds. 
I would like to work on the ruby scripts and provide the patch tomorrow. 
BTW, the count command in the shell doesn't expose all the arguments of 
RowCounter, such as the --range option. 

-Cheney
