[jira] [Commented] (HBASE-11768) Register region server in zookeeper by ip address
    [ https://issues.apache.org/jira/browse/HBASE-11768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16112121#comment-16112121 ]

Cheney Sun commented on HBASE-11768:
------------------------------------

@stack, I see that you rebased the patch onto master. Is the patch still on the roadmap? If so, I would like to fix the issues found by the QA bot.

> Register region server in zookeeper by ip address
> -------------------------------------------------
>
>                 Key: HBASE-11768
>                 URL: https://issues.apache.org/jira/browse/HBASE-11768
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 2.0.0
>            Reporter: Cheney Sun
>              Labels: patch
>             Fix For: 2.0.0
>
>         Attachments: HBASE-11768.master.001.patch, HBASE_11768.patch
>
>
> An HBase cluster isn't always set up alongside a DNS server. But regionservers
> currently register their hostnames in ZooKeeper, which is inconvenient when
> the regionservers aren't known to a DNS server. In that situation, clients
> have to maintain the IP/hostname mapping in their /etc/hosts files in order to
> resolve the hostnames returned from ZooKeeper to the right addresses.
> However, maintaining this mapping is painful for clients, especially when new
> machines are added to the cluster or a machine's address changes: every
> client then needs to update its hosts file.
> This issue addresses the problem by adding an option that lets each
> regionserver register itself by IP address instead of by hostname only.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Updated] (HBASE-11768) Register region server in zookeeper by ip address
    [ https://issues.apache.org/jira/browse/HBASE-11768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheney Sun updated HBASE-11768:
-------------------------------
    Fix Version/s: 2.0.0
           Labels: patch  (was: )
           Status: Patch Available  (was: Open)
[jira] [Commented] (HBASE-11768) Register region server in zookeeper by ip address
    [ https://issues.apache.org/jira/browse/HBASE-11768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109147#comment-14109147 ]

Cheney Sun commented on HBASE-11768:
------------------------------------

Jean, yes, I have run it in a test cluster under some workload for several days and, so far, so good.
[jira] [Created] (HBASE-11768) Register region server in zookeeper by ip address
Cheney Sun created HBASE-11768:
-------------------------------

             Summary: Register region server in zookeeper by ip address
                 Key: HBASE-11768
                 URL: https://issues.apache.org/jira/browse/HBASE-11768
             Project: HBase
          Issue Type: Improvement
          Components: regionserver
    Affects Versions: 2.0.0
            Reporter: Cheney Sun


An HBase cluster isn't always set up alongside a DNS server. But regionservers currently register their hostnames in ZooKeeper, which is inconvenient when the regionservers aren't known to a DNS server. In that situation, clients have to maintain the IP/hostname mapping in their /etc/hosts files in order to resolve the hostnames returned from ZooKeeper to the right addresses.
Maintaining this mapping is painful for clients, especially when new machines are added to the cluster or a machine's address changes: every client then needs to update its hosts file.
This issue addresses the problem by adding an option that lets each regionserver register itself by IP address instead of by hostname only.
[jira] [Updated] (HBASE-11768) Register region server in zookeeper by ip address
    [ https://issues.apache.org/jira/browse/HBASE-11768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheney Sun updated HBASE-11768:
-------------------------------
    Description:
An HBase cluster isn't always set up alongside a DNS server. But regionservers currently register their hostnames in ZooKeeper, which is inconvenient when the regionservers aren't known to a DNS server. In that situation, clients have to maintain the IP/hostname mapping in their /etc/hosts files in order to resolve the hostnames returned from ZooKeeper to the right addresses.
However, maintaining this mapping is painful for clients, especially when new machines are added to the cluster or a machine's address changes: every client then needs to update its hosts file.
This issue addresses the problem by adding an option that lets each regionserver register itself by IP address instead of by hostname only.

  was:
An HBase cluster isn't always set up alongside a DNS server. But regionservers currently register their hostnames in ZooKeeper, which is inconvenient when the regionservers aren't known to a DNS server. In that situation, clients have to maintain the IP/hostname mapping in their /etc/hosts files in order to resolve the hostnames returned from ZooKeeper to the right addresses.
Maintaining this mapping is painful for clients, especially when new machines are added to the cluster or a machine's address changes: every client then needs to update its hosts file.
This issue addresses the problem by adding an option that lets each regionserver register itself by IP address instead of by hostname only.
[jira] [Updated] (HBASE-11768) Register region server in zookeeper by ip address
    [ https://issues.apache.org/jira/browse/HBASE-11768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheney Sun updated HBASE-11768:
-------------------------------
    Attachment: HBASE_11768.patch

I would like to provide a patch for review. The patch is rather straightforward: it adds one option, hbase.regionserver.use.ip, to control whether the regionserver registers its IP address or its hostname in ZooKeeper. The default value is false, which leaves the current behavior unchanged. If the value is set to true, the regionserver's IP address, instead of its hostname, is registered under HBASE_ROOT/rs/ip.xx.xxx.
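The effect of the option described above can be pictured with a minimal Java sketch. This is not code from the attached patch: the class RegistrationName, the methods choose and sampleAddress, and the sample host name and address are all hypothetical and exist only for illustration. Only the idea of a boolean switch that defaults to false (hostname) comes from the comment.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical illustration only; none of these names come from the patch.
class RegistrationName {

    // Returns the string a server would register under:
    // the IP address when useIp is true, otherwise the hostname.
    // useIp stands in for the value of hbase.regionserver.use.ip.
    static String choose(boolean useIp, InetAddress address) {
        return useIp ? address.getHostAddress() : address.getHostName();
    }

    // Builds a fixed sample address without any DNS lookup
    // (host name and IP are made up for the example).
    static InetAddress sampleAddress() {
        try {
            return InetAddress.getByAddress("rs1.example.com", new byte[] {10, 0, 0, 5});
        } catch (UnknownHostException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        InetAddress address = sampleAddress();
        System.out.println(choose(false, address)); // default: hostname
        System.out.println(choose(true, address));  // option enabled: IP address
    }
}
```

With the switch off, a client that reads the registered name still needs DNS or /etc/hosts to resolve it; with the switch on, the registered string is already an address.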
[jira] [Commented] (HBASE-9086) Add some options to improve count performance
    [ https://issues.apache.org/jira/browse/HBASE-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726081#comment-13726081 ]

Cheney Sun commented on HBASE-9086:
-----------------------------------

I have attached a patch that enhances the shell count command.

> Add some options to improve count performance
> ---------------------------------------------
>
>                 Key: HBASE-9086
>                 URL: https://issues.apache.org/jira/browse/HBASE-9086
>             Project: HBase
>          Issue Type: Wish
>          Components: shell
>    Affects Versions: 0.94.2
>            Reporter: Cheney Sun
>         Attachments: HBase-9086.patch
>
>
> The current count command in the HBase shell is quite slow when rows are very
> large (100+ kB each). It would be helpful to provide an option to specify the
> column to count, which would give users a chance to reduce the volume of data
> scanned.
> IMHO, counting only the row keys would be the ideal solution. I'm not sure how
> difficult it is to implement.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-9086) Add some options to improve count performance
    [ https://issues.apache.org/jira/browse/HBASE-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheney Sun updated HBASE-9086:
------------------------------
    Attachment: HBase-9086.patch
[jira] [Commented] (HBASE-9086) Add some options to improve count performance
    [ https://issues.apache.org/jira/browse/HBASE-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726082#comment-13726082 ]

Cheney Sun commented on HBASE-9086:
-----------------------------------

Jean-Marc, can you review it? Thanks.
[jira] [Updated] (HBASE-9086) Add some options to improve count performance
    [ https://issues.apache.org/jira/browse/HBASE-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheney Sun updated HBASE-9086:
------------------------------
    Attachment: HBase-9086_v0.2.patch
[jira] [Updated] (HBASE-9086) Add some options to improve count performance
    [ https://issues.apache.org/jira/browse/HBASE-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheney Sun updated HBASE-9086:
------------------------------
    Attachment: HBase-9086.patch
[jira] [Updated] (HBASE-9086) Add some options to improve count performance
    [ https://issues.apache.org/jira/browse/HBASE-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheney Sun updated HBASE-9086:
------------------------------
    Attachment:     (was: HBase-9086.patch)
[jira] [Commented] (HBASE-9086) Add some options to improve count performance
    [ https://issues.apache.org/jira/browse/HBASE-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13727225#comment-13727225 ]

Cheney Sun commented on HBASE-9086:
-----------------------------------

@Lars, no, that's not what I expected. A new patch has been uploaded; it picks up Jean-Marc's suggestion by adding FirstKeyOnlyFilter and KeyOnlyFilter to a filter list.
[jira] [Created] (HBASE-9086) Add some options to improve count performance
Cheney Sun created HBASE-9086:
------------------------------

             Summary: Add some options to improve count performance
                 Key: HBASE-9086
                 URL: https://issues.apache.org/jira/browse/HBASE-9086
             Project: HBase
          Issue Type: Wish
          Components: shell
    Affects Versions: 0.94.2
            Reporter: Cheney Sun


The current count command in the HBase shell is quite slow when rows are very large (100+ kB each). It would be helpful to provide an option to specify the column to count, which would give users a chance to reduce the volume of data scanned.
IMHO, counting only the row keys would be the ideal solution. I'm not sure how difficult it is to implement.
[jira] [Commented] (HBASE-9086) Add some options to improve count performance
    [ https://issues.apache.org/jira/browse/HBASE-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723800#comment-13723800 ]

Cheney Sun commented on HBASE-9086:
-----------------------------------

Hi Jean-Marc, thanks for pointing out the possible approaches, but they don't look like what I want. Both methods you mentioned would still send some K/V pairs back to the client executing the command, right? In our case, this would hurt performance a lot.

Let me briefly describe our case: the table schema is [ rowkey (20~30 bytes) | a:info (100~500+ kB) | c:ref (empty in most rows) ].
Specifying only the column a:info wouldn't help much, since this column carries most of the payload.
Specifying only c:ref wouldn't give the correct result, because most cells in this column are empty and would not be counted.

Clearly, specifying only the rowkey is the natural way to improve count performance while still guaranteeing a correct result. Moreover, when using the count command, users really care about the number of rows, not the data.

For now, I'm not sure whether such a patch is easy to implement under the current HBase architecture.
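The trade-off argued in this comment can be reproduced with a small self-contained Java model. It does not use the HBase API and says nothing about how HBase implements scans; it only tallies, for a made-up table shaped like the schema above, how many rows each counting strategy would see and how many payload bytes would travel to the client. The class and method names, the row count of 10, the 100,000-byte a:info cells, the 8-byte c:ref cell, and the 24-character row keys are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the three counting strategies; not HBase code.
class CountStrategies {

    // A row is a key plus the sizes (in bytes) of the cells it actually has.
    static final class Row {
        final String key;
        final Map<String, Integer> cellSizes;
        Row(String key, Map<String, Integer> cellSizes) {
            this.key = key;
            this.cellSizes = cellSizes;
        }
    }

    // Outcome of a simulated count: rows seen and bytes sent to the client.
    static final class Result {
        final long rows;
        final long bytes;
        Result(long rows, long bytes) {
            this.rows = rows;
            this.bytes = bytes;
        }
    }

    // Count by scanning one column: rows without that cell are not counted,
    // and each counted row ships its key plus the cell value.
    static Result countByColumn(List<Row> table, String column) {
        long rows = 0, bytes = 0;
        for (Row row : table) {
            Integer size = row.cellSizes.get(column);
            if (size != null) {
                rows++;
                bytes += row.key.length() + size;
            }
        }
        return new Result(rows, bytes);
    }

    // Count by row key only: every row is counted and only keys are shipped.
    static Result countKeysOnly(List<Row> table) {
        long rows = 0, bytes = 0;
        for (Row row : table) {
            rows++;
            bytes += row.key.length();
        }
        return new Result(rows, bytes);
    }

    // Ten rows: each has a large a:info cell, only the first has c:ref.
    static List<Row> sampleTable() {
        List<Row> table = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            Map<String, Integer> cells = new HashMap<>();
            cells.put("a:info", 100_000);
            if (i == 0) {
                cells.put("c:ref", 8);
            }
            table.add(new Row(String.format("row-%020d", i), cells));
        }
        return table;
    }

    public static void main(String[] args) {
        List<Row> table = sampleTable();
        Result info = countByColumn(table, "a:info");
        Result ref = countByColumn(table, "c:ref");
        Result keys = countKeysOnly(table);
        System.out.println("a:info   rows=" + info.rows + " bytes=" + info.bytes);
        System.out.println("c:ref    rows=" + ref.rows + " bytes=" + ref.bytes);
        System.out.println("key only rows=" + keys.rows + " bytes=" + keys.bytes);
    }
}
```

In this model, counting on a:info finds all 10 rows but moves about a megabyte, counting on c:ref is cheap but finds only 1 row, and the key-only count finds all 10 rows while moving 240 bytes.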
[jira] [Commented] (HBASE-9086) Add some options to improve count performance
    [ https://issues.apache.org/jira/browse/HBASE-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723895#comment-13723895 ]

Cheney Sun commented on HBASE-9086:
-----------------------------------

Yes, this is exactly what I want. Thanks, Jean-Marc.

I edited $HBASE_HOME/lib/ruby/hbase/table.rb on my client machine, changing line 161 to scan.setFilter(org.apache.hadoop.hbase.filter.KeyOnlyFilter.new). This gave a huge improvement: previously, counting 1 million rows took more than 100 seconds; now counting 4 million rows takes only 75 seconds.

I would like to work on the Ruby scripts and provide a patch tomorrow.

BTW, the count command in the shell doesn't expose all the arguments of RowCounter, such as the --range option.

-Cheney
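The timings reported in this comment can be turned into throughput figures with a short calculation, sketched here in Java. It assumes that "100w" and "400w" mean 1,000,000 and 4,000,000 rows, and it treats "more than 100 seconds" as exactly 100 seconds, so the "before" rate is an upper bound and the speedup a lower bound. The class and method names are hypothetical.

```java
// Back-of-the-envelope throughput comparison for the reported timings.
class CountThroughput {

    // Rows processed per second for a given row count and duration.
    static double rowsPerSecond(long rows, double seconds) {
        return rows / seconds;
    }

    public static void main(String[] args) {
        // Assumed reading of the comment: 1,000,000 rows in 100+ s before,
        // 4,000,000 rows in 75 s after switching to KeyOnlyFilter.
        double before = rowsPerSecond(1_000_000L, 100.0);
        double after = rowsPerSecond(4_000_000L, 75.0);

        System.out.printf("before: at most %.0f rows/s%n", before);
        System.out.printf("after:  about %.0f rows/s%n", after);
        System.out.printf("speedup: at least %.2fx%n", after / before);
    }
}
```

Under those assumptions the rate goes from at most 10,000 rows per second to roughly 53,000 rows per second, a speedup of at least about 5.3 times.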