[jira] [Updated] (HBASE-5474) [89-fb] Share the multiput thread pool for all the HTable instance
[ https://issues.apache.org/jira/browse/HBASE-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5474:
------------------------------
Summary: [89-fb] Share the multiput thread pool for all the HTable instance (was: Share the multiput thread pool for all the HTable instance)

Key: HBASE-5474
URL: https://issues.apache.org/jira/browse/HBASE-5474
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

Currently, each HTable instance has its own thread pool for the multiput operation. Each pool is a cached thread pool bounded by the number of region servers, so the maximum number of threads is (# region servers * # HTable instances). If all HTable instances shared one thread pool, the maximum number of threads would stay the same, but the pool would be used more efficiently. Also, single put requests are processed in the calling thread instead of the thread pool.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
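The sharing idea above can be sketched as a single process-wide pool. This is a hypothetical illustration, not HBase's actual implementation; the class and method names are invented, but the pool parameters mirror the description: a cached-style pool whose maximum size is the region server count, so the worst case no longer multiplies by the number of HTable instances.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one shared multiput pool for all HTable instances,
// instead of one cached thread pool per instance.
public class SharedMultiPutPool {
    private static volatile ExecutorService pool;

    public static ExecutorService get(int numRegionServers) {
        if (pool == null) {
            synchronized (SharedMultiPutPool.class) {
                if (pool == null) {
                    // Same shape as a cached thread pool (direct handoff,
                    // 60s idle timeout) but capped at the region server count.
                    pool = new ThreadPoolExecutor(
                        0, numRegionServers,
                        60L, TimeUnit.SECONDS,
                        new SynchronousQueue<Runnable>());
                }
            }
        }
        return pool;
    }

    // Worst-case thread count with one pool per HTable instance...
    public static int maxThreadsPerInstancePools(int regionServers, int htables) {
        return regionServers * htables;
    }

    // ...versus a single shared pool, independent of the instance count.
    public static int maxThreadsShared(int regionServers, int htables) {
        return regionServers;
    }
}
```

Every call to `get` returns the same bounded executor, which is what raises utilization: idle worker threads can be reused by any HTable instance rather than sitting in a per-instance pool.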
[jira] [Updated] (HBASE-5474) Shared the multiput thread pool for all the HTable instance
[ https://issues.apache.org/jira/browse/HBASE-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5474:
------------------------------
Description: Currently, each HTable instance has its own thread pool for the multiput operation. Each pool is a cached thread pool bounded by the number of region servers, so the maximum number of threads is (# region servers * # HTable instances). If all HTable instances shared one thread pool, the maximum number of threads would stay the same, but the pool would be used more efficiently. Also, single put requests are processed in the calling thread instead of the thread pool.

was: The same description without the final sentence about single put requests.
[jira] [Updated] (HBASE-5474) Share the multiput thread pool for all the HTable instance
[ https://issues.apache.org/jira/browse/HBASE-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5474:
------------------------------
Summary: Share the multiput thread pool for all the HTable instance (was: Shared the multiput thread pool for all the HTable instance)
[jira] [Updated] (HBASE-5474) Shared the multiput thread pool for all the HTable instance
[ https://issues.apache.org/jira/browse/HBASE-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5474:
------------------------------
Description: Currently, each HTable instance has its own thread pool for the multiput operation. Each pool is a cached thread pool bounded by the number of region servers, so the maximum number of threads is (# region servers * # HTable instances). If all HTable instances shared one thread pool, the maximum number of threads would stay the same, but the pool would be used more efficiently.

was: Currently, each HTable instance has its own thread pool for the multiput operation, which is an unbounded cached thread pool. So it would increase efficiency if this unbounded cached thread pool could be shared across all the HTable instances.
[jira] [Updated] (HBASE-5438) A tool to check region balancing for a particular table
[ https://issues.apache.org/jira/browse/HBASE-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5438:
------------------------------
Attachment: 0001-hbase-5438.patch

Key: HBASE-5438
URL: https://issues.apache.org/jira/browse/HBASE-5438
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: 0001-hbase-5438.patch, D1827.1.patch, D1827.1.patch, D1827.1.patch

While debugging a table-level region imbalance problem, I wrote a tool to check how the regions of a particular table are balanced across all the region servers.

bin/hbase org.jruby.Main region_balance_checker.rb test_table
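The core computation such a checker performs can be sketched as follows. This is a hypothetical illustration in Java (the actual tool is a JRuby script); the class and method names are invented. Given the region-to-server assignment for one table, it counts regions per server and reports the spread between the most and least loaded servers.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a region balance check for one table.
public class RegionBalanceChecker {
    // Count how many of the table's regions each server hosts.
    public static Map<String, Integer> regionsPerServer(Map<String, String> regionToServer) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String server : regionToServer.values()) {
            counts.merge(server, 1, Integer::sum);
        }
        return counts;
    }

    // The table is imbalanced when the gap between the most and least
    // loaded servers (for this table's regions) is large.
    public static int spread(Map<String, Integer> counts) {
        return Collections.max(counts.values()) - Collections.min(counts.values());
    }
}
```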
[jira] [Updated] (HBASE-5407) Show the per-region level request/sec count in the web ui
[ https://issues.apache.org/jira/browse/HBASE-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5407:
------------------------------
Description: It would be nice to show the per-region request/sec count in the web UI, especially when debugging a hot region problem. (was: It would be nice to show the per-region request count in the web UI, especially when debugging a hot region problem.)
Summary: Show the per-region level request/sec count in the web ui (was: Show the per-region level request count in the web ui)

Key: HBASE-5407
URL: https://issues.apache.org/jira/browse/HBASE-5407
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
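The difference between the old and new metric is a raw counter versus a rate. A per-region request/sec figure can be derived from the counter by sampling it periodically, as in this hypothetical sketch (class and method names invented, not HBase's actual metrics code):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: derive requests/sec for one region from a raw
// request counter sampled once per reporting period.
public class RegionRequestRate {
    private final AtomicLong requests = new AtomicLong();
    private long lastSampledCount;

    public void onRequest() {
        requests.incrementAndGet();
    }

    // Called once per reporting period; the rate is the counter delta
    // divided by the period length.
    public double sample(double periodSeconds) {
        long now = requests.get();
        long delta = now - lastSampledCount;
        lastSampledCount = now;
        return delta / periodSeconds;
    }
}
```

A rate makes a hot region stand out immediately, whereas a lifetime request count mostly reflects how long the region has been open.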
[jira] [Updated] (HBASE-5369) Compaction selection based on the hotness of the HFile's block in the block cache
[ https://issues.apache.org/jira/browse/HBASE-5369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5369:
------------------------------
Description: HBase reserves a large amount of memory for the block cache, and cached blocks are aged out in an LRU fashion. Obviously, we don't want to age out blocks that are still hot. However, when compactions start, these hot blocks may naturally become invalid. Since the block cache already knows which HFiles the hot blocks come from, the compaction selection algorithm could simply skip compacting those HFiles until their blocks become cold. For example, if 80% of an HFile's blocks are cached, the HFile is clearly hot, so just skip it during compaction selection. The hot-block percentage should be configured as a high bar to make sure HBase still makes progress on compactions.

was: The same description without the 80% example and the note about configuring the percentage as a high bar.

Key: HBASE-5369
URL: https://issues.apache.org/jira/browse/HBASE-5369
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
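The proposed selection rule reduces to a single predicate per candidate HFile. The sketch below is hypothetical (names invented, not the actual HBase compaction code): an HFile whose cached-block ratio meets or exceeds a configured, deliberately high threshold is treated as hot and excluded from selection.

```java
// Hypothetical sketch of hotness-based compaction selection: skip an
// HFile while a high fraction of its blocks is still in the block cache.
public class HotFileCompactionFilter {
    private final double hotRatio; // e.g. 0.8 => skip when >= 80% of blocks are cached

    public HotFileCompactionFilter(double hotRatio) {
        this.hotRatio = hotRatio;
    }

    // cachedBlocks / totalBlocks is the HFile's "hotness" as seen by the
    // block cache; at or above the threshold, defer compacting this file.
    public boolean shouldSkip(int cachedBlocks, int totalBlocks) {
        if (totalBlocks == 0) {
            return false; // nothing cached, nothing to protect
        }
        return (double) cachedBlocks / totalBlocks >= hotRatio;
    }
}
```

Setting `hotRatio` high (as the description suggests) keeps the filter from starving compactions: only genuinely hot files are deferred, and they become eligible again once their blocks age out of the cache.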
[jira] [Updated] (HBASE-5199) Delete out of TTL store files before compaction selection
[ https://issues.apache.org/jira/browse/HBASE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5199:
------------------------------
Attachment: (was: HBASE-5199.patch)

Key: HBASE-5199
URL: https://issues.apache.org/jira/browse/HBASE-5199
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: D1311.1.patch, D1311.2.patch, D1311.3.patch, D1311.4.patch, D1311.5.patch, D1311.5.patch

Currently, HBase deletes out-of-TTL store files after compaction. We can change the sequence to delete out-of-TTL store files before selecting store files for compaction. This way, HBase keeps deleting old invalid store files without compacting them, and also avoids unnecessary compactions, since the out-of-TTL store files are deleted before compaction selection.
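The reordering rests on one observation: if the newest cell in a store file is already past the column family's TTL, every cell in that file is expired and the whole file can be dropped without rewriting it. A hypothetical sketch of that pre-selection filter (names invented, not the actual HBase store code):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: identify store files that are entirely out of TTL,
// so they can be deleted *before* compaction selection runs.
public class ExpiredStoreFileFilter {
    // maxTimestamps: the newest cell timestamp (ms) recorded per store file.
    public static List<Long> selectExpired(List<Long> maxTimestamps, long ttlMs, long nowMs) {
        List<Long> expired = new ArrayList<>();
        for (long maxTs : maxTimestamps) {
            // Every cell in the file is at or older than maxTs, so if maxTs
            // itself is out of TTL the whole file is invalid.
            if (nowMs - maxTs > ttlMs) {
                expired.add(maxTs);
            }
        }
        return expired;
    }
}
```

Running this before selection is what avoids the unnecessary compactions: an expired file never enters a candidate set, so no compaction is scheduled just to throw its contents away.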
[jira] [Updated] (HBASE-5199) Delete out of TTL store files before compaction selection
[ https://issues.apache.org/jira/browse/HBASE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5199:
------------------------------
Attachment: hbase-5199.patch

The new patch is rebased on the latest trunk and all the unit tests pass.
[jira] [Updated] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.
[ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5259:
------------------------------
Attachment: HBASE-5259.patch

Key: HBASE-5259
URL: https://issues.apache.org/jira/browse/HBASE-5259
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: D1413.1.patch, D1413.1.patch, D1413.1.patch, D1413.1.patch, D1413.2.patch, D1413.2.patch, D1413.2.patch, D1413.2.patch, D1413.3.patch, D1413.3.patch, D1413.3.patch, D1413.3.patch, HBASE-5259.patch

Assuming HBase and MapReduce run in the same cluster, TableInputFormat overrides the split function, which divides all the regions of a particular table into a series of mapper tasks, so each mapper task can process a region or part of a region. Ideally, a mapper task should run on the same machine on which the region server hosts the corresponding region. That is the motivation for TableInputFormat setting the RegionLocation: so the MapReduce framework can respect node locality. The code simply sets the host name of the region server as the HRegionLocation. However, the region server's host name may have a different format from the task tracker's (mapper task's) host name. The task tracker always gets its host name by reverse DNS lookup, and the DNS service may return a different host name format. For example, the region server's host name is correctly set as a.b.c.d, while the reverse DNS lookup may return a.b.c.d. (with an additional dot at the end). So the solution is to set the RegionLocation by reverse DNS lookup as well. No matter what host name format the DNS system uses, TableInputFormat is responsible for keeping its host name format consistent with the MapReduce framework.
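The normalization described above can be sketched in two pieces: resolve the region server host through the same reverse DNS path the task tracker uses (`InetAddress.getCanonicalHostName()` performs a reverse lookup), and make trailing-dot forms like "a.b.c.d." compare equal to "a.b.c.d". This is a hypothetical illustration, not the patch's actual code; the class and method names are invented.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch of host name normalization for split locations.
public class HostNameNormalizer {
    // A reverse DNS answer may carry a trailing dot (the DNS root label);
    // strip it so both forms compare equal.
    public static String stripTrailingDot(String host) {
        return host.endsWith(".") ? host.substring(0, host.length() - 1) : host;
    }

    // Resolve through the same reverse-lookup path the MapReduce side uses
    // for its own host name, so the two formats agree.
    public static String normalize(String regionServerHost) throws UnknownHostException {
        InetAddress addr = InetAddress.getByName(regionServerHost);
        return stripTrailingDot(addr.getCanonicalHostName());
    }
}
```

With both sides normalized the same way, the scheduler's string comparison between the split's location and the task tracker's host succeeds, and the map task actually lands on the region's host.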
[jira] [Updated] (HBASE-5199) Delete out of TTL store files before compaction selection
[ https://issues.apache.org/jira/browse/HBASE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5199:
------------------------------
Attachment: HBASE-5199.patch
[jira] [Updated] (HBASE-5199) Delete out of TTL store files before compaction selection
[ https://issues.apache.org/jira/browse/HBASE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5199:
------------------------------
Release Note: Set hbase.store.delete.expired.storefile to true to enable expired store file deletion.
Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.
[ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5259:
------------------------------
Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-5199) Delete out of TTL store files before compaction selection
[ https://issues.apache.org/jira/browse/HBASE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5199:
------------------------------
Description: Currently, HBase deletes out-of-TTL store files after compaction. We can change the sequence to delete out-of-TTL store files before selecting store files for compaction. This way, HBase keeps deleting old invalid store files without compacting them, and also avoids unnecessary compactions, since the out-of-TTL store files are deleted before compaction selection.

was: The same description with "compaction" qualified as "major compaction" throughout.
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5033:
------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)

Key: HBASE-5033
URL: https://issues.apache.org/jira/browse/HBASE-5033
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Fix For: 0.94.0
Attachments: 5033-trunk.txt, 5033.txt, D933.1.patch, D933.2.patch, D933.3.patch, D933.4.patch, D933.5.patch, HBASE-5033.patch

Region servers open/close each store, and each store file within every store, sequentially, which can make region open/close inefficient. This diff opens/closes stores in parallel to reduce region open/close time; it also helps reduce cluster restart time.
1) Opening each store in parallel
2) Loading each store file for every store in parallel
3) Closing each store in parallel
4) Closing each store file for every store in parallel
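Step 1 above (opening each store in parallel) can be sketched as a fan-out/fan-in over a bounded pool. This is a hypothetical illustration, not the patch's actual code; the interface and class names are invented, with a string standing in for an opened store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: open every store of a region concurrently instead
// of one after another, then wait for all of them before the region is
// considered open. Steps 2-4 follow the same fan-out/fan-in pattern.
public class ParallelStoreOpener {
    public interface StoreOpen {
        String open() throws Exception;
    }

    public static List<String> openAll(List<StoreOpen> stores, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Fan out: one open task per store.
            List<Future<String>> futures = new ArrayList<>();
            for (StoreOpen s : stores) {
                futures.add(pool.submit(s::open));
            }
            // Fan in: the region open completes only when every store is open.
            List<String> opened = new ArrayList<>();
            for (Future<String> f : futures) {
                try {
                    opened.add(f.get());
                } catch (Exception e) {
                    throw new RuntimeException("store open failed", e);
                }
            }
            return opened;
        } finally {
            pool.shutdown();
        }
    }
}
```

With `threads` set to 1 (the patch's reported default) this degenerates to the original sequential behavior, which is why the change is safe to ship disabled.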
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5033:
------------------------------
Attachment: (was: HBASE-5033-apach-trunk.patch)
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5033:
------------------------------
Attachment: HBASE-5033-apach-trunk.patch

1) Rebased on the recent trunk; the patch is generated with --no-prefix.
2) The default number of threads is set to 1.
3) Performance evaluation: the results will vary with the cluster environment, such as the number of regions and the number of store files per region. A simple restart test shows a single region server's (22 regions) restart time decreased from 78 sec to 55 sec, roughly a 29% saving in region server restart time. The cluster (100 nodes) restart time decreased from 316 sec to 189 sec, around a 40% saving.
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5033:
------------------------------
Attachment: (was: HBASE-5033-apach-trunk.patch)
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5033:
------------------------------
Attachment: HBASE-5033.patch

Resubmit the patch. Thanks Ted for the correction.
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5033:
------------------------------
Attachment: HBASE-5033-apach-trunk.patch

The patch is based on the Apache trunk. It passes all the unit tests except TestCoprocessorEndpoint, which fails with or without this change.
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-5033: -- Status: Patch Available (was: Open) Opening/Closing store in parallel to reduce region open/close time -- Key: HBASE-5033 URL: https://issues.apache.org/jira/browse/HBASE-5033 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D933.1.patch, D933.2.patch, D933.3.patch, D933.4.patch, D933.5.patch, HBASE-5033-apach-trunk.patch Region servers currently open/close each store, and each store file within every store, sequentially, which makes region open/close inefficient. So this diff opens/closes each store in parallel in order to reduce region open/close time. It also helps reduce the cluster restart time. 1) Opening each store in parallel 2) Loading each store file for every store in parallel 3) Closing each store in parallel 4) Closing each store file for every store in parallel.
[jira] [Updated] (HBASE-5032) Add other DELETE or DELETE into the delete bloom filter
[ https://issues.apache.org/jira/browse/HBASE-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-5032: -- Description: To speed up time range scans we need to seek to the maximum timestamp of the requested range, instead of going to the first KV of the (row, column) pair and iterating from there. If we don't know the (row, column), e.g. if it is not specified in the query, we need to go to the end of the current row/column pair first, get a KV from there, and do another seek to (row', column', timerange_max) from there. We can only skip over to the timerange_max timestamp when we know that there are no DeleteColumn records at the top of that row/column with a higher timestamp. We can utilize another Bloom filter keyed on (row, column) to quickly find that out. (From HBASE-4962) So the motivation is to save seek ops for scanning time-range queries if we know there is no delete for this row/column. From the implementation perspective, we already have a delete family bloom filter which contains all the delete family key values. was: Previously, the delete family bloom filter only contained the row keys that have a delete family. It helps us avoid the top-row seek operation. This jira attempts to add the delete column into this delete bloom filter as well (renaming the delete family bloom filter to the delete bloom filter). The motivation is to save seek ops for scan time-range queries if we know there is no delete column for this row/column. We can seek directly to the exact timestamp we are interested in, instead of seeking to the latest timestamp and skipping repeatedly to find out whether there is any delete column before the timestamp of interest. 
Summary: Add other DELETE or DELETE into the delete bloom filter (was: Add DELETE COLUMN into the delete bloom filter) Add other DELETE or DELETE into the delete bloom filter Key: HBASE-5032 URL: https://issues.apache.org/jira/browse/HBASE-5032 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang To speed up time range scans we need to seek to the maximum timestamp of the requested range, instead of going to the first KV of the (row, column) pair and iterating from there. If we don't know the (row, column), e.g. if it is not specified in the query, we need to go to the end of the current row/column pair first, get a KV from there, and do another seek to (row', column', timerange_max) from there. We can only skip over to the timerange_max timestamp when we know that there are no DeleteColumn records at the top of that row/column with a higher timestamp. We can utilize another Bloom filter keyed on (row, column) to quickly find that out. (From HBASE-4962) So the motivation is to save seek ops for scanning time-range queries if we know there is no delete for this row/column. From the implementation perspective, we already have a delete family bloom filter which contains all the delete family key values.
[jira] [Updated] (HBASE-5032) Add other DELETE type information into the delete bloom filter to optimize the time range query
[ https://issues.apache.org/jira/browse/HBASE-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-5032: -- Description: To speed up time range scans we need to seek to the maximum timestamp of the requested range, instead of going to the first KV of the (row, column) pair and iterating from there. If we don't know the (row, column), e.g. if it is not specified in the query, we need to go to the end of the current row/column pair first, get a KV from there, and do another seek to (row', column', timerange_max) from there. We can only skip over to the timerange_max timestamp when we know that there are no DeleteColumn records at the top of that row/column with a higher timestamp. We can utilize another Bloom filter keyed on (row, column) to quickly find that out. (From HBASE-4962) So the motivation is to save seek ops for scanning time-range queries if we know there is no delete for this row/column. From the implementation perspective, we already have a delete family bloom filter which contains all the delete family key values. So we can reuse the same bloom filter for all other kinds of delete information, such as delete columns or deletes. was: To speed up time range scans we need to seek to the maximum timestamp of the requested range, instead of going to the first KV of the (row, column) pair and iterating from there. If we don't know the (row, column), e.g. if it is not specified in the query, we need to go to the end of the current row/column pair first, get a KV from there, and do another seek to (row', column', timerange_max) from there. We can only skip over to the timerange_max timestamp when we know that there are no DeleteColumn records at the top of that row/column with a higher timestamp. We can utilize another Bloom filter keyed on (row, column) to quickly find that out. (From HBASE-4962) So the motivation is to save seek ops for scanning time-range queries if we know there is no delete for this row/column. 
From the implementation perspective, we already have a delete family bloom filter which contains all the delete family key values. Summary: Add other DELETE type information into the delete bloom filter to optimize the time range query (was: Add other DELETE or DELETE into the delete bloom filter) Add other DELETE type information into the delete bloom filter to optimize the time range query --- Key: HBASE-5032 URL: https://issues.apache.org/jira/browse/HBASE-5032 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang To speed up time range scans we need to seek to the maximum timestamp of the requested range, instead of going to the first KV of the (row, column) pair and iterating from there. If we don't know the (row, column), e.g. if it is not specified in the query, we need to go to the end of the current row/column pair first, get a KV from there, and do another seek to (row', column', timerange_max) from there. We can only skip over to the timerange_max timestamp when we know that there are no DeleteColumn records at the top of that row/column with a higher timestamp. We can utilize another Bloom filter keyed on (row, column) to quickly find that out. (From HBASE-4962) So the motivation is to save seek ops for scanning time-range queries if we know there is no delete for this row/column. From the implementation perspective, we already have a delete family bloom filter which contains all the delete family key values. So we can reuse the same bloom filter for all other kinds of delete information, such as delete columns or deletes.
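The seek decision described above can be sketched as follows. This is a minimal illustration, not HBase code: the Bloom filter is modeled as a plain `Set`, and all names (`TimeRangeSeek`, `chooseSeekTimestamp`) are hypothetical. The key property used is that a Bloom filter can return false positives but never false negatives, so a "not present" answer makes the optimized seek safe.

```java
import java.util.Set;

// Sketch: choose where a time-range scan should seek next, given a
// (row, column)-keyed delete Bloom filter (modeled here as a Set).
public class TimeRangeSeek {
    static long chooseSeekTimestamp(Set<String> deleteBloom, String row,
                                    String col, long timerangeMax, long maxTs) {
        // "Not present" is authoritative for a Bloom filter, so we may
        // skip directly to the requested range's maximum timestamp.
        if (!deleteBloom.contains(row + "/" + col)) {
            return timerangeMax; // no DeleteColumn possible above the range
        }
        // Possible delete marker with a higher timestamp: fall back to
        // seeking the top of the row/column to check for it.
        return maxTs;
    }
}
```

A query on a (row, column) pair absent from the filter seeks straight to `timerange_max`; a possible hit forces the conservative seek to the top of the column.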
[jira] [Updated] (HBASE-4698) Let the HFile Pretty Printer print all the key values for a specific row.
[ https://issues.apache.org/jira/browse/HBASE-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4698: -- Status: Patch Available (was: Open) Let the HFile Pretty Printer print all the key values for a specific row. - Key: HBASE-4698 URL: https://issues.apache.org/jira/browse/HBASE-4698 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D111.1.patch, D111.1.patch, D111.1.patch, D111.2.patch, D111.3.patch, D111.4.patch, HBASE-4689-trunk.patch When using the HFile Pretty Printer to debug HBase issues, it would be very nice to allow the Pretty Printer to seek to a specific row and print only the key values for that row.
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-5033: -- Description: 1) Opening each store in parallel 2) Loading each store file for every store in parallel 3) Closing each store in parallel 4) Closing each store file for every store in parallel. was: Opening store files in parallel to reduce region open time Summary: Opening/Closing store in parallel to reduce region open/close time (was: Opening store files in parallel to reduce region open time) Opening/Closing store in parallel to reduce region open/close time -- Key: HBASE-5033 URL: https://issues.apache.org/jira/browse/HBASE-5033 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang 1) Opening each store in parallel 2) Loading each store file for every store in parallel 3) Closing each store in parallel 4) Closing each store file for every store in parallel.
[jira] [Updated] (HBASE-5033) Opening/Closing store in parallel to reduce region open/close time
[ https://issues.apache.org/jira/browse/HBASE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-5033: -- Description: Region servers currently open/close each store, and each store file within every store, sequentially, which makes region open/close inefficient. So this diff opens/closes each store in parallel in order to reduce region open/close time. It also helps reduce the cluster restart time. 1) Opening each store in parallel 2) Loading each store file for every store in parallel 3) Closing each store in parallel 4) Closing each store file for every store in parallel. was: 1) Opening each store in parallel 2) Loading each store file for every store in parallel 3) Closing each store in parallel 4) Closing each store file for every store in parallel. Opening/Closing store in parallel to reduce region open/close time -- Key: HBASE-5033 URL: https://issues.apache.org/jira/browse/HBASE-5033 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D933.1.patch Region servers currently open/close each store, and each store file within every store, sequentially, which makes region open/close inefficient. So this diff opens/closes each store in parallel in order to reduce region open/close time. It also helps reduce the cluster restart time. 1) Opening each store in parallel 2) Loading each store file for every store in parallel 3) Closing each store in parallel 4) Closing each store file for every store in parallel.
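The fan-out pattern in steps 1-4 above can be sketched with a plain `ExecutorService`. This is an illustrative sketch only: `Store` here is a hypothetical one-method interface standing in for the region server's store internals, and the actual patch naturally differs in structure and error handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: open all stores of a region in parallel on a fixed-size pool,
// then block until every store has finished opening.
public class ParallelStoreOpener {
    interface Store { String open(); } // hypothetical stand-in type

    static List<String> openAll(List<Store> stores, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (Store s : stores) {
                futures.add(pool.submit(s::open)); // each store opens on its own thread
            }
            List<String> opened = new ArrayList<>();
            for (Future<String> f : futures) {
                opened.add(f.get()); // wait for all opens; rethrows any failure
            }
            return opened;
        } finally {
            pool.shutdown();
        }
    }
}
```

Step 2 (loading each store file within a store) is the same pattern one level down; closing mirrors opening with a close task per store.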
[jira] [Updated] (HBASE-4698) Let the HFile Pretty Printer print all the key values for a specific row.
[ https://issues.apache.org/jira/browse/HBASE-4698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4698: -- Attachment: HBASE-4689-trunk.patch The diff based on 89-fb has been accepted and committed, so this patch was generated by rebasing on the latest Apache trunk. Let the HFile Pretty Printer print all the key values for a specific row. - Key: HBASE-4698 URL: https://issues.apache.org/jira/browse/HBASE-4698 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D111.1.patch, D111.1.patch, D111.1.patch, D111.2.patch, D111.3.patch, D111.4.patch, HBASE-4689-trunk.patch When using the HFile Pretty Printer to debug HBase issues, it would be very nice to allow the Pretty Printer to seek to a specific row and print only the key values for that row.
[jira] [Updated] (HBASE-4689) [89-fb] Make the table level metrics work with rpc* metrics
[ https://issues.apache.org/jira/browse/HBASE-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4689: -- Attachment: (was: hbase-4689.patch) [89-fb] Make the table level metrics work with rpc* metrics --- Key: HBASE-4689 URL: https://issues.apache.org/jira/browse/HBASE-4689 Project: HBase Issue Type: Sub-task Affects Versions: 0.89.20100924 Reporter: Liyin Tang Assignee: Liyin Tang In r1182034, the per-table/CF rpc* metrics have a bug: only CF-level metrics are shown even when per-table metrics are enabled. This fixes that bug.
[jira] [Updated] (HBASE-4689) [89-fb] Make the table level metrics work with rpc* metrics
[ https://issues.apache.org/jira/browse/HBASE-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4689: -- Attachment: (was: hbase-4689.patch) [89-fb] Make the table level metrics work with rpc* metrics --- Key: HBASE-4689 URL: https://issues.apache.org/jira/browse/HBASE-4689 Project: HBase Issue Type: Sub-task Affects Versions: 0.89.20100924 Reporter: Liyin Tang Assignee: Liyin Tang Attachments: hbase-4689.patch In r1182034, the per-table/CF rpc* metrics have a bug: only CF-level metrics are shown even when per-table metrics are enabled. This fixes that bug.
[jira] [Updated] (HBASE-4689) [89-fb] Make the table level metrics work with rpc* metrics
[ https://issues.apache.org/jira/browse/HBASE-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4689: -- Attachment: hbase-4689.patch [89-fb] Make the table level metrics work with rpc* metrics --- Key: HBASE-4689 URL: https://issues.apache.org/jira/browse/HBASE-4689 Project: HBase Issue Type: Sub-task Affects Versions: 0.89.20100924 Reporter: Liyin Tang Assignee: Liyin Tang Attachments: hbase-4689.patch In r1182034, the per-table/CF rpc* metrics have a bug: only CF-level metrics are shown even when per-table metrics are enabled. This fixes that bug.
[jira] [Updated] (HBASE-4191) hbase load balancer needs locality awareness
[ https://issues.apache.org/jira/browse/HBASE-4191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4191: -- Description: Previously, HBASE-4114 implemented the metrics for HFile HDFS block locality, which provide HFile-level locality information. But in order to work with the load balancer and region assignment, we need region-level locality information. Let's define the region locality information first; it is almost the same as the HFile locality index. HRegion locality index (HRegion A, RegionServer B) = (Total number of HDFS blocks that can be retrieved locally by RegionServer B for HRegion A) / (Total number of HDFS blocks for Region A) So the HRegion locality index tells us how much locality we can get if the HMaster assigns HRegion A to RegionServer B. There are 2 steps involved in assigning regions based on locality. 1) During cluster startup, the master will scan HDFS to calculate the HRegion locality index for each pair of HRegion and RegionServer. Scanning DFS is pretty expensive, so we only need to do this once, during startup. 2) During cluster run time, each region server will update the HRegion locality index periodically as metrics, as HBASE-4114 did. The RegionServer can expose them to the Master through ZK, the meta table, or just RPC messages. Based on the HRegion locality index, the assignment manager in the master would have global knowledge about the region locality distribution. Treating the HRegion locality indices as capacities between the region server set and the region set, the assignment manager could run a MAXIMUM FLOW solver to reach the global optimum. In addition, HBASE-4491 (Locality Checker) is a tool, based on the same metrics, that proactively scans DFS to calculate the global locality information in the cluster. It helps us verify data locality information at run time. 
was: HBASE-4114 implemented getTopBlockLocations(). The load balancer should utilize this method and assign the region to be moved to the region server with the highest block affinity. Issue Type: New Feature (was: Improvement) Summary: hbase load balancer needs locality awareness (was: Utilize getTopBlockLocations in load balancer) hbase load balancer needs locality awareness Key: HBASE-4191 URL: https://issues.apache.org/jira/browse/HBASE-4191 Project: HBase Issue Type: New Feature Reporter: Ted Yu Assignee: Liyin Tang Previously, HBASE-4114 implemented the metrics for HFile HDFS block locality, which provide HFile-level locality information. But in order to work with the load balancer and region assignment, we need region-level locality information. Let's define the region locality information first; it is almost the same as the HFile locality index. HRegion locality index (HRegion A, RegionServer B) = (Total number of HDFS blocks that can be retrieved locally by RegionServer B for HRegion A) / (Total number of HDFS blocks for Region A) So the HRegion locality index tells us how much locality we can get if the HMaster assigns HRegion A to RegionServer B. There are 2 steps involved in assigning regions based on locality. 1) During cluster startup, the master will scan HDFS to calculate the HRegion locality index for each pair of HRegion and RegionServer. Scanning DFS is pretty expensive, so we only need to do this once, during startup. 2) During cluster run time, each region server will update the HRegion locality index periodically as metrics, as HBASE-4114 did. The RegionServer can expose them to the Master through ZK, the meta table, or just RPC messages. Based on the HRegion locality index, the assignment manager in the master would have global knowledge about the region locality distribution. 
Treating the HRegion locality indices as capacities between the region server set and the region set, the assignment manager could run a MAXIMUM FLOW solver to reach the global optimum. In addition, HBASE-4491 (Locality Checker) is a tool, based on the same metrics, that proactively scans DFS to calculate the global locality information in the cluster. It helps us verify data locality information at run time.
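The locality-index formula defined above is a simple ratio, sketched here directly. The types are illustrative (a map from server name to its locally-servable block count for one region), not HBase's actual data structures.

```java
import java.util.Map;

// Sketch of: HRegion locality index (A, B) =
//   (HDFS blocks of region A servable locally by server B) / (total blocks of A)
public class RegionLocality {
    static double localityIndex(Map<String, Integer> localBlocksByServer,
                                String server, int totalBlocks) {
        if (totalBlocks == 0) {
            return 0.0; // empty region: no locality to speak of
        }
        return localBlocksByServer.getOrDefault(server, 0) / (double) totalBlocks;
    }
}
```

For example, a region with 10 blocks of which 8 are local to RegionServer B has a locality index of 0.8 for B, and 0 for a server holding none of its blocks.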
[jira] [Updated] (HBASE-4469) Avoid top row seek by looking up ROWCOL bloomfilter
[ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4469: -- Summary: Avoid top row seek by looking up ROWCOL bloomfilter (was: Avoid top row seek by looking up bloomfilter) Avoid top row seek by looking up ROWCOL bloomfilter --- Key: HBASE-4469 URL: https://issues.apache.org/jira/browse/HBASE-4469 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Fix For: 0.94.0 Attachments: HBASE-4469_1.patch The problem is that when seeking for the row/col in the hfile, we will go to the top of the row in order to check for a row delete marker (delete family). However, if the bloom filter is enabled for the column family, then when a delete family operation is done on a row, the row has already been added to the bloom filter. We can take advantage of this fact to avoid seeking to the top of the row.
[jira] [Updated] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4532: -- Description: The previous jira, HBASE-4469, avoids the top row seek operation if the row-col bloom filter is enabled. This jira tries to avoid the top row seek for all cases by creating a dedicated bloom filter only for delete family. The only subtle use case is when we are interested in the top row with an empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with an empty column. Evaluation from TestSeekOptimization: Previously: For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% So we only get the extra 10% seek savings if the ROWCOL bloom filter is enabled [HBASE-4469]. After this change: For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% So we get the extra 10% seek savings for ALL kinds of bloom filters. was: HBASE-4469 avoids the top row seek operation if the row-col bloom filter is enabled. This jira tries to avoid the top row seek for all cases by creating a dedicated bloom filter only for delete family. The only subtle use case is when we are interested in the top row with an empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with an empty column. Avoid top row seek by dedicated bloom filter for delete family bloom filter --- Key: HBASE-4532 URL: https://issues.apache.org/jira/browse/HBASE-4532 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D27.1.patch, D27.1.patch The previous jira, HBASE-4469, avoids the top row seek operation if the row-col bloom filter is enabled. This jira tries to avoid the top row seek for all cases by creating a dedicated bloom filter only for delete family. The only subtle use case is when we are interested in the top row with an empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with an empty column. Evaluation from TestSeekOptimization: Previously: For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization:
[jira] [Updated] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4532: -- Attachment: hbase-4532-89-fb.patch Avoid top row seek by dedicated bloom filter for delete family bloom filter --- Key: HBASE-4532 URL: https://issues.apache.org/jira/browse/HBASE-4532 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D27.1.patch, D27.1.patch, hbase-4532-89-fb.patch The previous jira, HBASE-4469, avoids the top row seek operation if the row-col bloom filter is enabled. This jira tries to avoid the top row seek for all cases by creating a dedicated bloom filter only for delete family. The only subtle use case is when we are interested in the top row with an empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with an empty column. 
Evaluation from TestSeekOptimization: Previously: For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60% For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% So we get about 10% more seek savings ONLY if the ROWCOL bloom filter is enabled [HBASE-4469]. After this change: For bloom=NONE, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROW, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROWCOL, compr=NONE total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=NONE, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROW, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82% So we get about 10% more seek savings for ALL kinds of bloom filters.
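The empty-column guard described above can be sketched as a small predicate. This is an illustrative model, not HBase code: the delete family Bloom filter is a `Set` of row keys, and the names (`TopRowSeekGuard`, `canSkipTopRowSeek`) are hypothetical.

```java
import java.util.Set;

// Sketch: the top-row-seek optimization is only safe when the query names a
// concrete column AND the delete family Bloom filter reports no delete
// family markers for the row.
public class TopRowSeekGuard {
    static boolean canSkipTopRowSeek(Set<String> deleteFamilyBloom,
                                     String row, String column) {
        if (column == null || column.isEmpty()) {
            // A GET/SCAN on the empty column wants the top-row KV itself
            // (e.g. row1/cf1:/1/put), so the optimization must be disabled.
            return false;
        }
        // A definite "no delete family" from the Bloom filter means the
        // top-of-row check for delete markers is unnecessary.
        return !deleteFamilyBloom.contains(row);
    }
}
```

Rows present in the filter, and any empty-column query, fall back to the normal top-row seek; everything else skips it, which is where the extra savings in the numbers above come from.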
[jira] [Updated] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4532: -- Attachment: HBASE-4532-apache-trunk.patch Avoid top row seek by dedicated bloom filter for delete family bloom filter --- Key: HBASE-4532 URL: https://issues.apache.org/jira/browse/HBASE-4532 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang Attachments: D27.1.patch, D27.1.patch, HBASE-4532-apache-trunk.patch, hbase-4532-89-fb.patch The previous jira, HBASE-4469, avoids the top row seek operation if the row-col bloom filter is enabled. This jira tries to avoid the top row seek for all cases by creating a dedicated bloom filter only for delete family. The only subtle use case is when we are interested in the top row with an empty column. For example, we are interested in row1/cf1:/1/put. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol). In this way, we have already missed the real kv we are interested in. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with an empty column. 
Evaluation from TestSeekOptimization:

Previously:
For bloom=NONE, compr=NONE: total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROW, compr=NONE: total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROWCOL, compr=NONE: total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=NONE, compr=GZ: total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROW, compr=GZ: total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
For bloom=ROWCOL, compr=GZ: total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

So we could get about 10% more seek savings ONLY if the ROWCOL bloom filter is enabled [HBASE-4469].

After this change:
For bloom=NONE, compr=NONE: total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROW, compr=NONE: total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROWCOL, compr=NONE: total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=NONE, compr=GZ: total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROW, compr=GZ: total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
For bloom=ROWCOL, compr=GZ: total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

So we can get about 10% more seek savings for ALL kinds of bloom filters.
[jira] [Updated] (HBASE-4418) Show all the hbase configuration in the web ui
[ https://issues.apache.org/jira/browse/HBASE-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4418: -- Attachment: (was: HBASE-4418_1.patch)

Show all the hbase configuration in the web ui
--
Key: HBASE-4418
URL: https://issues.apache.org/jira/browse/HBASE-4418
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

The motivation is to show ALL the HBase configuration that takes effect at run time in one global place, so we can easily see which configuration entries are in effect and what their values are. The page shows every HBase and DFS configuration entry from the configuration files, and also includes the HBase defaults set in code rather than in the config files.
[jira] [Updated] (HBASE-4418) Show all the hbase configuration in the web ui
[ https://issues.apache.org/jira/browse/HBASE-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4418: -- Attachment: (was: HBASE-4418_2.patch)

Show all the hbase configuration in the web ui
--
Key: HBASE-4418
URL: https://issues.apache.org/jira/browse/HBASE-4418
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

The motivation is to show ALL the HBase configuration that takes effect at run time in one global place, so we can easily see which configuration entries are in effect and what their values are. The page shows every HBase and DFS configuration entry from the configuration files, and also includes the HBase defaults set in code rather than in the config files.
[jira] [Updated] (HBASE-4418) Show all the hbase configuration in the web ui
[ https://issues.apache.org/jira/browse/HBASE-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4418: -- Attachment: hbase-4418-apache-trunk.patch

The patch uses reflection to check whether the Hadoop version supports showing the configuration in the /conf servlet (HADOOP-6408). If Hadoop supports HADOOP-6408, the master and region server web UIs will have a link to show the HBase configuration.

Show all the hbase configuration in the web ui
--
Key: HBASE-4418
URL: https://issues.apache.org/jira/browse/HBASE-4418
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: hbase-4418-apache-trunk.patch

The motivation is to show ALL the HBase configuration that takes effect at run time in one global place, so we can easily see which configuration entries are in effect and what their values are. The page shows every HBase and DFS configuration entry from the configuration files, and also includes the HBase defaults set in code rather than in the config files.
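The reflection probe described in the HBASE-4418 update can be sketched roughly as below. This is only an illustration of the general technique (look a class up by name and report whether it is on the classpath); the `ConfServlet` class name in the comment is an assumption about what the real patch probes, not confirmed by this digest.

```java
// Minimal sketch of a reflection-based capability check, in the spirit of
// the HADOOP-6408 probe mentioned above. Not the actual HBase patch code.
public class ConfServletCheck {

    /** Returns true if the named class can be resolved on the classpath. */
    public static boolean classPresent(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "org.apache.hadoop.conf.ConfServlet" is assumed to be the servlet
        // added by HADOOP-6408; whether it resolves depends on the Hadoop jar.
        System.out.println(classPresent("org.apache.hadoop.conf.ConfServlet"));
    }
}
```

The web UI would only render the configuration link when the probe succeeds, so older Hadoop versions degrade gracefully.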
[jira] [Updated] (HBASE-4585) Avoid seek operation when current kv is deleted
[ https://issues.apache.org/jira/browse/HBASE-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4585: -- Attachment: (was: hbase-4585-apache.patch)

Avoid seek operation when current kv is deleted
---
Key: HBASE-4585
URL: https://issues.apache.org/jira/browse/HBASE-4585
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: hbase-4585-89.patch

Currently, when the current kv turns out to be deleted during matching in the ScanQueryMatcher, the matcher returns skip and continues to seek. However, if the current kv is deleted by a family delete or a column delete, the matcher should seek to the next column; if it is deleted by a version delete, the matcher should just return skip.
[jira] [Updated] (HBASE-4585) Avoid seek operation when current kv is deleted
[ https://issues.apache.org/jira/browse/HBASE-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4585: -- Attachment: (was: hbase-4585-trunk.patch)

Avoid seek operation when current kv is deleted
---
Key: HBASE-4585
URL: https://issues.apache.org/jira/browse/HBASE-4585
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: hbase-4585-89.patch

Currently, when the current kv turns out to be deleted during matching in the ScanQueryMatcher, the matcher returns skip and continues to seek. However, if the current kv is deleted by a family delete or a column delete, the matcher should seek to the next column; if it is deleted by a version delete, the matcher should just return skip.
[jira] [Updated] (HBASE-4585) Avoid seek operation when current kv is deleted
[ https://issues.apache.org/jira/browse/HBASE-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4585: -- Attachment: hbase-4585-apache.patch

The patch for apache-trunk is ready.

Avoid seek operation when current kv is deleted
---
Key: HBASE-4585
URL: https://issues.apache.org/jira/browse/HBASE-4585
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: hbase-4585-89.patch

Currently, when the current kv turns out to be deleted during matching in the ScanQueryMatcher, the matcher returns skip and continues to seek. However, if the current kv is deleted by a family delete or a column delete, the matcher should seek to the next column; if it is deleted by a version delete, the matcher should just return skip.
[jira] [Updated] (HBASE-4585) Avoid seek operation when current kv is deleted
[ https://issues.apache.org/jira/browse/HBASE-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4585: -- Attachment: hbase-4585-apache-trunk.patch

Avoid seek operation when current kv is deleted
---
Key: HBASE-4585
URL: https://issues.apache.org/jira/browse/HBASE-4585
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: hbase-4585-89.patch, hbase-4585-apache-trunk.patch

Currently, when the current kv turns out to be deleted during matching in the ScanQueryMatcher, the matcher returns skip and continues to seek. However, if the current kv is deleted by a family delete or a column delete, the matcher should seek to the next column; if it is deleted by a version delete, the matcher should just return skip.
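The dispatch rule described in the HBASE-4585 notifications above can be sketched as follows. This is a minimal illustration, not the actual ScanQueryMatcher code; the enum and method names are hypothetical. The key observation is that a family or column delete hides every remaining version of the column, so the scanner can jump straight to the next column, while a version delete hides only one kv.

```java
// Sketch of the proposed delete handling: map the kind of delete that covers
// the current kv to the scanner action. Names here are illustrative only.
public class DeleteDispatch {

    public enum DeleteType { FAMILY_DELETED, COLUMN_DELETED, VERSION_DELETED }

    public enum MatchCode { SEEK_NEXT_COL, SKIP }

    public static MatchCode onDeleted(DeleteType type) {
        switch (type) {
            case FAMILY_DELETED:
            case COLUMN_DELETED:
                // No later version of this column can match: seek past it.
                return MatchCode.SEEK_NEXT_COL;
            default:
                // Only this particular version is hidden: a plain skip suffices.
                return MatchCode.SKIP;
        }
    }
}
```

Replacing a skip-and-keep-seeking loop with a single seek-to-next-column avoids touching every remaining version of a deleted column.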
[jira] [Updated] (HBASE-4592) Get CacheStats request count based on the HFile block type
[ https://issues.apache.org/jira/browse/HBASE-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4592: -- Summary: Get CacheStats request count based on the HFile block type (was: Get HFile request count based on the block type)

Get CacheStats request count based on the HFile block type
---
Key: HBASE-4592
URL: https://issues.apache.org/jira/browse/HBASE-4592
Project: HBase
Issue Type: New Feature
Reporter: Liyin Tang
Assignee: Liyin Tang

Currently, CacheStats can only report the total request count across all block types. We can break this metric down by block type, which gives us more fine-grained metrics.
[jira] [Updated] (HBASE-4532) Avoid top row seek by dedicated bloom filter for delete family bloom filter
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4532: -- Description:

HBASE-4469 avoids the top row seek operation if the row-col bloom filter is enabled. This jira tries to avoid the top row seek in all cases by creating a dedicated bloom filter just for delete family markers.

The only subtle use case is when we are interested in the top row with an empty column. For example, we are interested in row1/cf1:/1/put, so we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. The delete family bloom filter says there is NO delete family, so the optimization skips the top row seek and returns a fake kv, the last kv for this row (createLastOnRowCol). At that point we have already missed the real kv we are interested in.

The solution to the above problem is to disable this optimization when we GET/SCAN a row with an empty column.

was:

HBASE-4469 avoids the top row seek operation if the row-col bloom filter is enabled. This jira tries to avoid the top row seek in all cases by creating a dedicated bloom filter just for rows with an empty column. The previous solution was to create the dedicated bloom filter for delete family markers, which does not work if there is a row with an empty column. For example, we are interested in row1/cf1:/1/put, so we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. The delete family bloom filter says there is NO delete family, so the optimization skips the top row seek and returns a fake kv, the last kv for this row (createLastOnRowCol). At that point we have already missed the real kv we are interested in. The root cause is that even if there is no delete family at the top row, we still cannot avoid the top row seek. We can ONLY avoid the top row seek when there is no row with an empty column, no matter what kind of kv it is (delete/deleteCol/deleteFamily/put). So the current solution is to create the dedicated bloom filter for rows with an empty column.
Summary: Avoid top row seek by dedicated bloom filter for delete family bloom filter (was: Avoid top row seek by dedicated bloom filter for row with empty column)

Avoid top row seek by dedicated bloom filter for delete family bloom filter
---
Key: HBASE-4532
URL: https://issues.apache.org/jira/browse/HBASE-4532
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

HBASE-4469 avoids the top row seek operation if the row-col bloom filter is enabled. This jira tries to avoid the top row seek in all cases by creating a dedicated bloom filter just for delete family markers.

The only subtle use case is when we are interested in the top row with an empty column. For example, we are interested in row1/cf1:/1/put, so we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. The delete family bloom filter says there is NO delete family, so the optimization skips the top row seek and returns a fake kv, the last kv for this row (createLastOnRowCol). At that point we have already missed the real kv we are interested in.

The solution to the above problem is to disable this optimization when we GET/SCAN a row with an empty column.
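The guard described in this HBASE-4532 update can be condensed into a single predicate. The sketch below is hypothetical (not HBase's real API): the delete-family bloom shortcut may only fire when the bloom filter reports no delete family AND the query names a concrete column, because a GET/SCAN for the empty column is itself asking for the very top-row kv the shortcut would skip.

```java
// Hypothetical helper capturing the fix: the optimization is disabled for
// queries on a row with an empty column.
public class TopRowSeekGuard {

    /**
     * True if the scanner may safely skip the seek to the top of the row.
     *
     * @param bloomSaysNoDeleteFamily negative delete-family bloom lookup
     * @param queryHasEmptyColumn     the GET/SCAN targets the empty column
     */
    public static boolean canSkipTopRowSeek(boolean bloomSaysNoDeleteFamily,
                                            boolean queryHasEmptyColumn) {
        return bloomSaysNoDeleteFamily && !queryHasEmptyColumn;
    }
}
```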
[jira] [Updated] (HBASE-4592) Break down request count based on the HFile block type
[ https://issues.apache.org/jira/browse/HBASE-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4592: -- Summary: Break down request count based on the HFile block type (was: Get CacheStats request count based on the HFile block type)

Break down request count based on the HFile block type
---
Key: HBASE-4592
URL: https://issues.apache.org/jira/browse/HBASE-4592
Project: HBase
Issue Type: New Feature
Reporter: Liyin Tang
Assignee: Liyin Tang

Currently, CacheStats can only report the total request count across all block types. We can break this metric down by block type, which gives us more fine-grained metrics.
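The per-block-type accounting proposed in HBASE-4592 might look roughly like the sketch below. The class, enum values, and method names are assumptions for illustration; the point is simply one counter per block type, with the old total still derivable as the sum.

```java
import java.util.EnumMap;

// Hypothetical sketch of finer-grained cache accounting: a request counter
// per HFile block type instead of a single total.
public class BlockTypeStats {

    public enum BlockType { DATA, INDEX, BLOOM, META }

    private final EnumMap<BlockType, Long> requests =
            new EnumMap<>(BlockType.class);

    public void recordRequest(BlockType type) {
        requests.merge(type, 1L, Long::sum);
    }

    public long requestCount(BlockType type) {
        return requests.getOrDefault(type, 0L);
    }

    /** The old, coarse metric is just the sum of the per-type counters. */
    public long totalRequestCount() {
        long total = 0L;
        for (long count : requests.values()) {
            total += count;
        }
        return total;
    }
}
```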
[jira] [Updated] (HBASE-4469) Avoid top row seek by looking up bloomfilter
[ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4469: -- Attachment: HBASE-4469_1.patch

Avoid top row seek by looking up bloomfilter
--
Key: HBASE-4469
URL: https://issues.apache.org/jira/browse/HBASE-4469
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: HBASE-4469_1.patch

The problem is that when seeking for a row/col in the hfile, we go to the top of the row in order to check for a row delete marker (delete family). However, if the bloom filter is enabled for the column family, then any row on which a delete family operation was done has already been added to the bloom filter. We can take advantage of this fact to avoid seeking to the top of the row.
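The invariant behind HBASE-4469 can be modeled with a tiny sketch. A HashSet stands in for the bloom filter here (all names are illustrative, none are HBase's real API): a real bloom filter can return false positives but never false negatives, which is exactly why a negative lookup proves the top-row seek for delete-family markers is unnecessary.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified model of the shortcut: every row that received a delete-family
// marker was added to the row bloom filter, so a negative membership test
// means there is no delete-family marker to find at the top of the row.
public class DeleteFamilyLookup {

    private final Set<String> rowsWithDeleteFamily = new HashSet<>();

    public void recordDeleteFamily(String row) {
        rowsWithDeleteFamily.add(row);
    }

    /** True if the scanner must still seek to the top of the row. */
    public boolean needTopRowSeek(String row) {
        return rowsWithDeleteFamily.contains(row);
    }
}
```

With a real bloom filter a positive answer only means "maybe present", so the seek is performed on positives and safely skipped on negatives.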
[jira] [Updated] (HBASE-4568) Make zk dump jsp response more quickly
[ https://issues.apache.org/jira/browse/HBASE-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4568: -- Description:

1) For each zk dump, hbase currently creates a new zk client instance every time. This is quite slow when any machine in the quorum is dead, because it reconnects to each machine in the zk quorum again.

    HMaster master = (HMaster)getServletContext().getAttribute(HMaster.MASTER);
    Configuration conf = master.getConfiguration();
    HBaseAdmin hbadmin = new HBaseAdmin(conf);
    HConnection connection = hbadmin.getConnection();
    ZooKeeperWatcher watcher = connection.getZooKeeperWatcher();

So we can simplify this to:

    HMaster master = (HMaster)getServletContext().getAttribute(HMaster.MASTER);
    ZooKeeperWatcher watcher = master.getZooKeeperWatcher();

2) Also, when hbase calls getServerStats() for each machine in the zk quorum, it hard-codes the default timeout as 1 minute. It would be nice to make this configurable and set it to a low timeout. When hbase tries to connect to each machine in the zk quorum, it creates the socket, then sets the socket timeout, and reads with that timeout. This means hbase first creates a socket and connects to the zk server with a timeout of 0, which can take a long time, because a timeout of zero is interpreted as an infinite timeout: the connection blocks until it is established or an error occurs.

3) The recoverable zookeeper should do real exponential backoff when there is a connection loss exception, which would give hbase a much longer time window to recover from zk machine failures.

was: For each zk dump, hbase currently creates a new zk client instance every time. This is quite slow when any machine in the quorum is dead, because it reconnects to each machine in the zk quorum again.
    HMaster master = (HMaster)getServletContext().getAttribute(HMaster.MASTER);
    Configuration conf = master.getConfiguration();
    HBaseAdmin hbadmin = new HBaseAdmin(conf);
    HConnection connection = hbadmin.getConnection();
    ZooKeeperWatcher watcher = connection.getZooKeeperWatcher();

So we can simplify this to:

    HMaster master = (HMaster)getServletContext().getAttribute(HMaster.MASTER);
    ZooKeeperWatcher watcher = master.getZooKeeperWatcher();

Also, when hbase calls getServerStats() for each machine in the zk quorum, it hard-codes the default timeout as 1 minute. It would be nice to make this configurable and set it to a low timeout. When hbase tries to connect to each machine in the zk quorum, it creates the socket, then sets the socket timeout, and reads with that timeout. This means hbase first creates a socket and connects to the zk server with a timeout of 0, which can take a long time, because a timeout of zero is interpreted as an infinite timeout: the connection blocks until it is established or an error occurs.

Make zk dump jsp response more quickly
--
Key: HBASE-4568
URL: https://issues.apache.org/jira/browse/HBASE-4568
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: HBASE-4568.patch

1) For each zk dump, hbase currently creates a new zk client instance every time. This is quite slow when any machine in the quorum is dead, because it reconnects to each machine in the zk quorum again.
    HMaster master = (HMaster)getServletContext().getAttribute(HMaster.MASTER);
    Configuration conf = master.getConfiguration();
    HBaseAdmin hbadmin = new HBaseAdmin(conf);
    HConnection connection = hbadmin.getConnection();
    ZooKeeperWatcher watcher = connection.getZooKeeperWatcher();

So we can simplify this to:

    HMaster master = (HMaster)getServletContext().getAttribute(HMaster.MASTER);
    ZooKeeperWatcher watcher = master.getZooKeeperWatcher();

2) Also, when hbase calls getServerStats() for each machine in the zk quorum, it hard-codes the default timeout as 1 minute. It would be nice to make this configurable and set it to a low timeout. When hbase tries to connect to each machine in the zk quorum, it creates the socket, then sets the socket timeout, and reads with that timeout. This means hbase first creates a socket and connects to the zk server with a timeout of 0, which can take a long time, because a timeout of zero is interpreted as an infinite timeout: the connection blocks until it is established or an error occurs.

3) The recoverable zookeeper should do real exponential backoff when there is a connection loss exception, which would give hbase a much longer time window to recover from zk machine failures.
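The "real exponential backoff" asked for in point 3 above is typically a delay that doubles per attempt up to a cap. The sketch below is an assumption about what such a policy could look like; the method name and constants are hypothetical, not RecoverableZooKeeper's actual code.

```java
// Hypothetical sketch of a capped exponential backoff for retrying after a
// zookeeper connection-loss exception.
public class ZkRetryBackoff {

    /**
     * Delay before the given retry attempt (0-based): baseMs * 2^attempt,
     * capped at maxMs. The shift amount is clamped to avoid long overflow.
     */
    public static long delayMs(int attempt, long baseMs, long maxMs) {
        long delay = baseMs << Math.min(attempt, 20);
        return Math.min(delay, maxMs);
    }
}
```

A fixed retry delay hammers a recovering quorum member; doubling the delay spreads retries out and gives the cluster a longer window to recover, which is exactly the motivation stated in the issue.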
[jira] [Updated] (HBASE-4585) Avoid seek operation when current kv is deleted
[ https://issues.apache.org/jira/browse/HBASE-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4585: -- Attachment: hbase-4585-89.patch

Avoid seek operation when current kv is deleted
---
Key: HBASE-4585
URL: https://issues.apache.org/jira/browse/HBASE-4585
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: hbase-4585-89.patch

Currently, when the current kv turns out to be deleted during matching in the ScanQueryMatcher, the matcher returns skip and continues to seek. However, if the current kv is deleted by a family delete or a column delete, the matcher should seek to the next column; if it is deleted by a version delete, the matcher should just return skip.
[jira] [Updated] (HBASE-4585) Avoid seek operation when current kv is deleted
[ https://issues.apache.org/jira/browse/HBASE-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4585: -- Attachment: hbase-4585-trunk.patch

Avoid seek operation when current kv is deleted
---
Key: HBASE-4585
URL: https://issues.apache.org/jira/browse/HBASE-4585
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: hbase-4585-89.patch, hbase-4585-trunk.patch

Currently, when the current kv turns out to be deleted during matching in the ScanQueryMatcher, the matcher returns skip and continues to seek. However, if the current kv is deleted by a family delete or a column delete, the matcher should seek to the next column; if it is deleted by a version delete, the matcher should just return skip.
[jira] [Updated] (HBASE-4532) Avoid top row seek by dedicated bloom filter for row with empty column
[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4532: -- Description:

HBASE-4469 avoids the top row seek operation if the row-col bloom filter is enabled. This jira tries to avoid the top row seek in all cases by creating a dedicated bloom filter just for rows with an empty column.

The previous solution was to create the dedicated bloom filter for delete family markers, which does not work if there is a row with an empty column. For example, we are interested in row1/cf1:/1/put, so we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. The delete family bloom filter says there is NO delete family, so the optimization skips the top row seek and returns a fake kv, the last kv for this row (createLastOnRowCol). At that point we have already missed the real kv we are interested in.

The root cause is that even if there is no delete family at the top row, we still cannot avoid the top row seek. We can ONLY avoid the top row seek when there is no row with an empty column, no matter what kind of kv it is (delete/deleteCol/deleteFamily/put). So the current solution is to create the dedicated bloom filter for rows with an empty column.

was: HBASE-4469 avoids the top row seek operation if the row-col bloom filter is enabled. This jira tries to avoid the top row seek in all cases by creating a dedicated bloom filter just for delete family markers.

Summary: Avoid top row seek by dedicated bloom filter for row with empty column (was: Avoid top row seek by dedicated bloom filter for delete family)

Avoid top row seek by dedicated bloom filter for row with empty column
--
Key: HBASE-4532
URL: https://issues.apache.org/jira/browse/HBASE-4532
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

HBASE-4469 avoids the top row seek operation if the row-col bloom filter is enabled. This jira tries to avoid the top row seek in all cases by creating a dedicated bloom filter just for rows with an empty column.
The previous solution was to create the dedicated bloom filter for delete family markers, which does not work if there is a row with an empty column. For example, we are interested in row1/cf1:/1/put, so we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. The delete family bloom filter says there is NO delete family, so the optimization skips the top row seek and returns a fake kv, the last kv for this row (createLastOnRowCol). At that point we have already missed the real kv we are interested in. The root cause is that even if there is no delete family at the top row, we still cannot avoid the top row seek. We can ONLY avoid the top row seek when there is no row with an empty column, no matter what kind of kv it is (delete/deleteCol/deleteFamily/put). So the current solution is to create the dedicated bloom filter for rows with an empty column.
[jira] [Updated] (HBASE-4568) Make zk dump jsp response more quickly
[ https://issues.apache.org/jira/browse/HBASE-4568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4568: -- Description:

For each zk dump, hbase currently creates a new zk client instance every time. This is quite slow when any machine in the quorum is dead, because it reconnects to each machine in the zk quorum again.

    HMaster master = (HMaster)getServletContext().getAttribute(HMaster.MASTER);
    Configuration conf = master.getConfiguration();
    HBaseAdmin hbadmin = new HBaseAdmin(conf);
    HConnection connection = hbadmin.getConnection();
    ZooKeeperWatcher watcher = connection.getZooKeeperWatcher();

So we can simplify this to:

    HMaster master = (HMaster)getServletContext().getAttribute(HMaster.MASTER);
    ZooKeeperWatcher watcher = master.getZooKeeperWatcher();

Also, when hbase calls getServerStats() for each machine in the zk quorum, it hard-codes the default timeout as 1 minute. It would be nice to make this configurable and set it to a low timeout. When hbase tries to connect to each machine in the zk quorum, it creates the socket, then sets the socket timeout, and reads with that timeout. This means hbase first creates a socket and connects to the zk server with a timeout of 0, which can take a long time, because a timeout of zero is interpreted as an infinite timeout: the connection blocks until it is established or an error occurs.

was: For each zk dump, hbase currently creates a new zk client instance every time. This is quite slow when any machine in the quorum is dead, because it reconnects to each machine in the zk quorum again.
    HMaster master = (HMaster)getServletContext().getAttribute(HMaster.MASTER);
    Configuration conf = master.getConfiguration();
    HBaseAdmin hbadmin = new HBaseAdmin(conf);
    HConnection connection = hbadmin.getConnection();
    ZooKeeperWatcher watcher = connection.getZooKeeperWatcher();

So we can simplify this to:

    HMaster master = (HMaster)getServletContext().getAttribute(HMaster.MASTER);
    ZooKeeperWatcher watcher = master.getZooKeeperWatcher();

Also, when hbase calls getServerStats() for each machine in the zk quorum, it hard-codes the default timeout as 1 minute. It would be nice to make this configurable and set it to a low timeout.

Make zk dump jsp response more quickly
--
Key: HBASE-4568
URL: https://issues.apache.org/jira/browse/HBASE-4568
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

For each zk dump, hbase currently creates a new zk client instance every time. This is quite slow when any machine in the quorum is dead, because it reconnects to each machine in the zk quorum again.

    HMaster master = (HMaster)getServletContext().getAttribute(HMaster.MASTER);
    Configuration conf = master.getConfiguration();
    HBaseAdmin hbadmin = new HBaseAdmin(conf);
    HConnection connection = hbadmin.getConnection();
    ZooKeeperWatcher watcher = connection.getZooKeeperWatcher();

So we can simplify this to:

    HMaster master = (HMaster)getServletContext().getAttribute(HMaster.MASTER);
    ZooKeeperWatcher watcher = master.getZooKeeperWatcher();

Also, when hbase calls getServerStats() for each machine in the zk quorum, it hard-codes the default timeout as 1 minute. It would be nice to make this configurable and set it to a low timeout. When hbase tries to connect to each machine in the zk quorum, it creates the socket, then sets the socket timeout, and reads with that timeout. This means hbase first creates a socket and connects to the zk server with a timeout of 0, which can take a long time, because a timeout of zero is interpreted as an infinite timeout.
The connection will then block until it is established or an error occurs.
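The timeout point above can be illustrated with plain `java.net` calls: both the connect timeout and the read timeout (SO_TIMEOUT) default to zero, which java.net treats as "infinite", so a probe of a dead quorum member should set explicit positive values. The helper below is an illustration, not HBase's getServerStats() code; the timeout value is an assumption.

```java
import java.net.Socket;
import java.net.SocketException;

// Sketch: prepare a socket for probing a zk quorum member with a bounded
// read timeout instead of the infinite default.
public class ZkProbeSocket {

    public static Socket newProbeSocket(int soTimeoutMs) throws SocketException {
        Socket socket = new Socket();      // unconnected socket
        socket.setSoTimeout(soTimeoutMs);  // bounded read timeout (0 = infinite)
        // The connect step would then also pass an explicit timeout, e.g.:
        // socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
        return socket;
    }
}
```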
[jira] [Updated] (HBASE-4418) Show all the hbase configuration in the web ui
[ https://issues.apache.org/jira/browse/HBASE-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-4418: -- Attachment: HBASE-4418_1.patch

Show all the hbase configuration in the web ui
--
Key: HBASE-4418
URL: https://issues.apache.org/jira/browse/HBASE-4418
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
Attachments: HBASE-4418_1.patch

The motivation is to show ALL the HBase configuration that takes effect at run time in one global place, so we can easily see which configuration entries are in effect and what their values are. The page shows every HBase and DFS configuration entry from the configuration files, and also includes the HBase defaults set in code rather than in the config files.