[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eshcar Hillel updated HBASE-13071:
    Attachment: HBaseStreamingScanDesign.pdf (design document)

Hbase Streaming Scan Feature
    Key: HBASE-13071
    URL: https://issues.apache.org/jira/browse/HBASE-13071
    Project: HBase
    Issue Type: New Feature
    Reporter: Eshcar Hillel
    Attachments: HBaseStreamingScanDesign.pdf

A scan operation iterates over all rows of a table or over a subrange of the table. The synchronous way in which the data is served on the client side limits the speed at which the application can traverse the data: it increases the overall processing time, and can cause large variance in the time the application waits for the next piece of data.

The scanner's next() method on the client side invokes an RPC to the region server and then stores the results in a cache. The application can specify how many rows are transmitted per RPC; by default this is set to 100 rows. The cache can be viewed as a producer-consumer queue, where the HBase client pushes data into the queue and the application consumes it. Currently this queue is synchronous, i.e., blocking. More specifically, once the application has consumed all the data in the cache (so the cache is empty), the HBase client retrieves additional data from the server and refills the cache. During this time the application is blocked.

Under the assumption that the application's processing time can be balanced against the time it takes to retrieve the data, an asynchronous approach can reduce the time the application spends waiting for data. We attach a design document. We also have a patch, based on a private branch, and some evaluation results for this code.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
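The cache-as-queue design described above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the actual HBase client code: the class name, the Supplier standing in for the per-batch RPC, and the use of String rows are all invented for the example. The point is only the shape of the change: a background producer keeps the bounded queue full, so the application's next() call rarely blocks on the network.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Hypothetical sketch of an asynchronous scanner cache: a background
// thread fetches the next batch (standing in for the RPC to the region
// server) while the application consumes the current one.
public class PrefetchingScanner {
    private static final List<String> END = List.of();  // empty batch = scan done
    private final BlockingQueue<List<String>> cache = new ArrayBlockingQueue<>(2);

    public PrefetchingScanner(Supplier<List<String>> fetchBatch) {
        Thread producer = new Thread(() -> {
            try {
                List<String> batch;
                while (!(batch = fetchBatch.get()).isEmpty()) {
                    cache.put(batch);       // blocks only when the cache is full
                }
                cache.put(END);             // signal end of scan
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true);
        producer.start();
    }

    /** Returns the next batch of rows, or an empty list when the scan is done. */
    public List<String> nextBatch() {
        try {
            return cache.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return END;
        }
    }
}
```

With this shape, the fetch of batch i+1 overlaps the application's processing of batch i, which is exactly the producer-consumer decoupling the description argues for.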
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eshcar Hillel updated HBASE-13071:
    Description: (minor edit; the full text is quoted in the first message above)
[jira] [Created] (HBASE-13071) Hbase Streaming Scan Feature
Eshcar Hillel created HBASE-13071:
    Summary: Hbase Streaming Scan Feature
    Key: HBASE-13071
    URL: https://issues.apache.org/jira/browse/HBASE-13071
    Project: HBase
    Issue Type: New Feature
    Reporter: Eshcar Hillel

(The issue description is quoted in full in the first message above.)
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eshcar Hillel updated HBASE-13071:
    Description: (minor edit; the full text is quoted in the first message above)
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eshcar Hillel updated HBASE-13071:
    Attachment: HBASE-13071_trunk_10.patch

    Attachments: 99.eshcar.png, HBASE-13071_98_1.patch, HBASE-13071_trunk_1.patch, HBASE-13071_trunk_2.patch, HBASE-13071_trunk_3.patch, HBASE-13071_trunk_4.patch, HBASE-13071_trunk_5.patch, HBASE-13071_trunk_6.patch, HBASE-13071_trunk_7.patch, HBASE-13071_trunk_8.patch, HBASE-13071_trunk_9.patch, HBASE-13071_trunk_10.patch, HBaseStreamingScanDesign.pdf, HbaseStreamingScanEvaluation.pdf, HbaseStreamingScanEvaluationwithMultipleClients.pdf, gc.eshcar.png, hits.eshcar.png, network.png
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eshcar Hillel updated HBASE-13071:
    Attachment: (was: HBASE-13071_trunk_10.patch)
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367025#comment-14367025 ]

Eshcar Hillel commented on HBASE-13090:

It could be useful to return a *non* empty result array even when the region is not exhausted. For example, if the scanner is async (HBASE-13071), the application can start iterating over the results instead of waiting for the server to collect the entire batch.

Progress heartbeats for long running scanners
    Key: HBASE-13090
    URL: https://issues.apache.org/jira/browse/HBASE-13090
    Project: HBase
    Issue Type: New Feature
    Reporter: Andrew Purtell
    Assignee: Jonathan Lawlor
    Attachments: HBASE-13090-v1.patch, HBASE-13090-v2.patch, HBASE-13090-v3.patch

It can be necessary to set very long timeouts for clients that issue scans over large regions, because all data in the region might be filtered out depending on the scan criteria. This is a usability concern: it can be hard to identify the worst-case timeout to use until scans occasionally or intermittently fail in production, depending on variable scan criteria. It would be better if the client-server scan protocol could send periodic progress heartbeats back to clients as long as server scanners are alive and making progress. This is related but orthogonal to streaming scan (HBASE-13071).
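The heartbeat contract proposed above can be illustrated with a toy client loop. Everything here is hypothetical (the Response record, the run method, and the poll supplier standing in for one RPC round trip are not the HBASE-13090 API); it only shows the intended behavior: any server message, even one carrying no rows, counts as progress and resets the client's timeout, so only true silence from the server causes a scan to fail.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Illustrative heartbeat-aware scan loop (invented names, not HBase code).
public class HeartbeatScanLoop {
    /** One scan response: a possibly-empty row batch plus a "scan finished" flag. */
    public record Response(List<String> rows, boolean done) {}

    /**
     * Polls the server until the scan finishes. Every response -- heartbeat
     * or data -- resets the silence counter; only maxSilentPolls consecutive
     * missed responses count as a timeout (returning Optional.empty()).
     */
    public static Optional<List<String>> run(Supplier<Optional<Response>> poll,
                                             int maxSilentPolls) {
        List<String> rows = new java.util.ArrayList<>();
        int silent = 0;
        while (true) {
            Optional<Response> r = poll.get();   // stands in for one RPC
            if (r.isEmpty()) {                   // no message at all arrived
                if (++silent >= maxSilentPolls) return Optional.empty();
                continue;
            }
            silent = 0;                          // heartbeat or data: progress
            rows.addAll(r.get().rows());
            if (r.get().done()) return Optional.of(rows);
        }
    }
}
```

Note how a heartbeat (an empty row batch with done == false) keeps the scan alive without delivering data, which is exactly what lets a slow, heavily-filtered region scan survive a short client timeout.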
[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367166#comment-14367166 ]

Eshcar Hillel commented on HBASE-13071:

Hi everyone, what would be the next step to get this patch in (now that all the lights are green ;) )? Thanks, Eshcar
[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366767#comment-14366767 ]

Eshcar Hillel commented on HBASE-13071:

Yes, it's all about setting the delays, but I don't want to change them just to make the results look better. They are there only to make the point.

> From: Edward Bortnikov (JIRA) j...@apache.org
> To: esh...@yahoo-inc.com
> Sent: Monday, March 16, 2015 7:52 AM
> Subject: [jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature
>
> Eshcar, do you have an idea why there are still steps in the async graph? This probably means that our delays are not long enough. Eddie
>
> On Monday, March 16, 2015 1:14 AM, Eshcar Hillel (JIRA) j...@apache.org wrote:
> Eshcar Hillel updated HBASE-13071:
>     Attachment: HBASE-13071_trunk_10.patch
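A back-of-the-envelope model shows why steps can remain in the async graph when the injected delays (the per-batch processing time) are too short. With one batch of prefetch, each step costs roughly max(fetch, process): if fetch exceeds process, the application still stalls on every batch, and the stalls appear as steps. The model and numbers below are illustrative only, not taken from the attached evaluation.

```java
// Toy latency model: total scan time with a synchronous cache versus a
// single-batch asynchronous prefetch. All times in milliseconds.
public class ScanLatencyModel {
    /** Synchronous cache: fetch and process strictly alternate. */
    static long syncTime(int batches, long fetchMs, long processMs) {
        return batches * (fetchMs + processMs);
    }

    /** Single-batch prefetch: fetching batch i+1 overlaps processing batch i,
     *  so each overlapped step costs max(fetch, process); the first fetch and
     *  the last processing phase are exposed. */
    static long asyncTime(int batches, long fetchMs, long processMs) {
        return fetchMs + (batches - 1) * Math.max(fetchMs, processMs) + processMs;
    }
}
```

When processMs >= fetchMs the prefetch hides the network entirely (total approaches batches * processMs); when processMs < fetchMs the residual (fetchMs - processMs) stall per batch is what would show up as steps in the graph.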
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eshcar Hillel updated HBASE-13071:
    Attachment: HBASE-13071_trunk_9.patch
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eshcar Hillel updated HBASE-13071:
    Attachment: HBASE-13071_trunk_10.patch
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: HbaseStreamingScanEvaluationwithMultipleClients.pdf Hbase Streaming Scan Feature Key: HBASE-13071 URL: https://issues.apache.org/jira/browse/HBASE-13071 Project: HBase Issue Type: New Feature Reporter: Eshcar Hillel Attachments: 99.eshcar.png, HBASE-13071_98_1.patch, HBASE-13071_trunk_1.patch, HBASE-13071_trunk_2.patch, HBASE-13071_trunk_3.patch, HBASE-13071_trunk_4.patch, HBASE-13071_trunk_5.patch, HBASE-13071_trunk_6.patch, HBASE-13071_trunk_7.patch, HBASE-13071_trunk_8.patch, HBASE-13071_trunk_9.patch, HBaseStreamingScanDesign.pdf, HbaseStreamingScanEvaluation.pdf, HbaseStreamingScanEvaluationwithMultipleClients.pdf, gc.eshcar.png, hits.eshcar.png, network.png A scan operation iterates over all rows of a table or a subrange of the table. The synchronous nature in which the data is served at the client side hinders the speed the application traverses the data: it increases the overall processing time, and may cause a great variance in the times the application waits for the next piece of data. The scanner next() method at the client side invokes an RPC to the regionserver and then stores the results in a cache. The application can specify how many rows will be transmitted per RPC; by default this is set to 100 rows. The cache can be considered as a producer-consumer queue, where the hbase client pushes the data to the queue and the application consumes it. Currently this queue is synchronous, i.e., blocking. More specifically, when the application consumed all the data from the cache --- so the cache is empty --- the hbase client retrieves additional data from the server and re-fills the cache with new data. During this time the application is blocked. 
[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362606#comment-14362606 ] Eshcar Hillel commented on HBASE-13071: --- New patch is attached. Also attached are the evaluation results for multiple parallel scanners. Bottom line: on the client side, results show similar latency-improvement trends for multiple async scanners as for a single scanner thread.
[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353865#comment-14353865 ] Eshcar Hillel commented on HBASE-13071: --- A new patch is attached following the comments by [~jonathan.lawlor] and [~stack]. Some notes on implementation and design:
* The default value is now set to async. (Btw, this means the async scanner is now used in multiple tests that used to run a sync scan.)
* The responsibility to invoke super.close() is now shifted to the pending prefetch thread, so it is not missed.
* In the case of the sync scanner, the caching parameter indicates both the size of the buffer and the chunk size (#rows fetched). In the case of the async scanner, the parameter only indicates the latter, while the buffer size is doubled. This should now be clear from the documentation, as well as from the new methods getCacheCapacity() and getThresholdSize().
* cache and caching were members of ClientScanner even before this patch; I only added the abstract initCache() method. I agree that having two abstract classes is not the cleanest solution, but neither is having initCache() in a class where not all subclasses have a cache. As I said before, this hierarchy could benefit from some refactoring (the right design might use composition, as in the strategy pattern, instead of inheritance, but all these decisions should not be in the scope of the current jira).
Some notes on performance:
* This feature is a client-side feature and therefore should be tested in terms of client-side latency.
* This feature should reduce the latency, and in the worst-case scenario should not increase it (at least not significantly).
* On the server side I would expect the same behavior as with the sync scanner, since the same RPC calls are invoked; they are only shifted earlier in time so that the data is ready at the client side before the user needs it.
* I cannot explain the behavior of the low humps in your test. Do you see this consistently?
What is the exact setting? Is it a fixed number of scans or a fixed time?
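The double-buffered producer-consumer scheme described in the notes above (cache capacity twice the caching chunk size, refill triggered in the background at the threshold) can be sketched roughly as follows. All names here (PrefetchingScanner, fetchChunk) are illustrative, not the patch's actual API; the real implementation lives in the ClientScanner hierarchy.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

/**
 * Illustrative sketch of an asynchronously prefetching scanner cache.
 * The cache holds up to 2 * caching results; a new chunk (one RPC's worth)
 * is requested in the background once the cache drains to the threshold.
 */
class PrefetchingScanner<R> implements AutoCloseable {
    private final int caching;                     // rows per RPC (chunk size)
    private final BlockingQueue<R> cache;          // capacity = 2 * caching
    private final Supplier<List<R>> fetchChunk;    // stands in for the scan RPC
    private final ExecutorService prefetcher = Executors.newSingleThreadExecutor();
    private volatile boolean exhausted = false;
    private volatile Future<?> pending;

    PrefetchingScanner(int caching, Supplier<List<R>> fetchChunk) {
        this.caching = caching;
        this.cache = new ArrayBlockingQueue<>(2 * caching);
        this.fetchChunk = fetchChunk;
        schedulePrefetch();                        // warm the cache eagerly
    }

    private synchronized void schedulePrefetch() {
        if (exhausted || (pending != null && !pending.isDone())) return;
        pending = prefetcher.submit(() -> {
            List<R> chunk = fetchChunk.get();
            if (chunk.isEmpty()) { exhausted = true; return; }
            for (R r : chunk) {
                try { cache.put(r); }              // blocks when full: backpressure
                catch (InterruptedException e) { Thread.currentThread().interrupt(); return; }
            }
        });
    }

    /** Returns the next result, or null when the scan is done. */
    public R next() {
        while (true) {
            R r = cache.poll();
            if (r != null) {
                if (cache.size() <= caching) schedulePrefetch();  // refill early
                return r;
            }
            if (exhausted && (pending == null || pending.isDone()) && cache.isEmpty()) return null;
            schedulePrefetch();
            try { Thread.sleep(1); }               // back off briefly, then retry
            catch (InterruptedException e) { Thread.currentThread().interrupt(); return null; }
        }
    }

    @Override public void close() { prefetcher.shutdownNow(); }
}
```

When the application's processing keeps pace with the RPC, next() almost always finds the cache non-empty and the fetch latency hides behind the processing, which is the effect the patch is after.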
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: HBASE-13071_trunk_5.patch
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: HBASE-13071_trunk_7.patch
[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355648#comment-14355648 ] Eshcar Hillel commented on HBASE-13071: --- I think the best way to test this patch is to use the extended version of YCSB, which supports measuring multi-step operations like scans (see the link to the code - I added the code in a separate branch). The attached evaluation file describes the settings I used in my test.
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Affects Version/s: (was: 0.98.11)
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: HBASE-13071_trunk_6.patch
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: HBASE-13071_trunk_8.patch
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: HBASE-13071_trunk_2.patch
[jira] [Commented] (HBASE-13090) Progress heartbeats for long running scanners
[ https://issues.apache.org/jira/browse/HBASE-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14343108#comment-14343108 ] Eshcar Hillel commented on HBASE-13090: --- In addition to a timer, or as an alternative to it, one can consider capping prefetched data at the server side by counting the number of rows scanned at each prefetch step. A capping factor limits the maximum number of rows to be scanned before returning the result to the client. This way, when the limit is exceeded the server sends whatever data it has gathered so far. If no data was found it only sends a heartbeat. When it finishes scanning the region, it signals that the region is exhausted. At the client side, the scanner continues to scan against the current region until it is exhausted. Progress heartbeats for long running scanners - Key: HBASE-13090 URL: https://issues.apache.org/jira/browse/HBASE-13090 Project: HBase Issue Type: New Feature Reporter: Andrew Purtell It can be necessary to set very long timeouts for clients that issue scans over large regions when all data in the region might be filtered out depending on scan criteria. This is a usability concern because it can be hard to identify what worst-case timeout to use until scans are occasionally/intermittently failing in production, depending on variable scan criteria. It would be better if the client-server scan protocol could send back periodic progress heartbeats to clients as long as server scanners are alive and making progress. This is related but orthogonal to streaming scan (HBASE-13071).
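The row-capping scheme proposed in the comment above can be sketched as a single server-side scan step over an in-memory "region": scan at most maxRowsPerStep rows, return whatever matched, send a bare heartbeat if nothing matched, and flag exhaustion when the region end is reached. All names here are hypothetical; this is not the actual RegionServer code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Illustrative sketch of a capped scan step with heartbeats (hypothetical names). */
class CappedScanStep {
    static final class StepResult {
        final List<String> rows;       // rows that passed the filter in this step
        final boolean heartbeatOnly;   // true when nothing matched but the scan continues
        final boolean regionExhausted; // true when the region end was reached
        final int nextCursor;          // where the next step should resume
        StepResult(List<String> rows, boolean heartbeatOnly, boolean regionExhausted, int nextCursor) {
            this.rows = rows;
            this.heartbeatOnly = heartbeatOnly;
            this.regionExhausted = regionExhausted;
            this.nextCursor = nextCursor;
        }
    }

    /** One capped step: scan at most maxRowsPerStep rows starting at cursor. */
    static StepResult step(List<String> region, int cursor, int maxRowsPerStep,
                           Predicate<String> filter) {
        List<String> out = new ArrayList<>();
        int scanned = 0;
        int i = cursor;
        for (; i < region.size() && scanned < maxRowsPerStep; i++, scanned++) {
            if (filter.test(region.get(i))) out.add(region.get(i));
        }
        boolean exhausted = i >= region.size();
        // Heartbeat when the cap was hit with no matches; the client keeps calling.
        return new StepResult(out, out.isEmpty() && !exhausted, exhausted, i);
    }
}
```

The client loop would then keep issuing steps against the same region, treating heartbeat-only responses as liveness signals rather than end-of-scan, until regionExhausted is set.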
[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348304#comment-14348304 ] Eshcar Hillel commented on HBASE-13071: --- I will work on a new version following the comments above (will take a few days). [~stack] I will get back with a full answer to your questions; first I want to do some additional perf tests on my side. The cause of the behavior of the tall humps may be rooted in the way you performed the tests. What is the size of the prefetch? 30? If the tests simply call next() in a loop without actually processing the data (which is simulated with delays in my tests), then the user exhausts the cache very quickly even though the prefetch is done in the background, and therefore the behavior is equivalent to a sync scan where the app needs to wait for the current prefetch to complete. It doesn't need to wait for the prefetch thread to finish loading the cache at the client side, but this is minor when compared to the round-trip time at the server side. As I mentioned before, the assumption underlying this new feature is that the processing time at the client side can be balanced by the network and IO at the server side. If the processing is short then the network+IO is still a bottleneck. Makes sense?
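The balance argument can be made concrete with back-of-the-envelope arithmetic (illustrative numbers, not measured values): with per-chunk fetch time f, per-chunk processing time p, and n chunks, a sync scan costs roughly n*(f+p), while an async scan that overlaps the next fetch with the current processing costs roughly f + n*max(f, p). When p is near zero, async degenerates to the sync cost plus the exposed first fetch, which matches the comment's point that calling next() in a tight loop shows no benefit.

```java
/** Toy cost model for sync vs async scanning; purely illustrative. */
class ScanCostModel {
    /** Sync: each chunk pays its fetch and its processing back to back. */
    static long syncCost(int chunks, long fetchMs, long processMs) {
        return chunks * (fetchMs + processMs);
    }

    /** Async: only the first fetch is exposed; afterwards the slower of
     *  fetch and processing dominates each chunk. */
    static long asyncCost(int chunks, long fetchMs, long processMs) {
        return fetchMs + chunks * Math.max(fetchMs, processMs);
    }
}
```

For example, 10 chunks at 20 ms fetch and 20 ms processing give 400 ms sync vs 220 ms async, while the same fetch with zero processing gives 200 ms sync vs 220 ms async: prefetching only pays off when there is processing to hide the fetch behind.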
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: HBASE-13071_trunk_4.patch
[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345929#comment-14345929 ] Eshcar Hillel commented on HBASE-13071: --- Thanks [~stack] for your comments, I applied most of them. ** The cache is defined in the context of ClientScanner, therefore initializing it and the prefetch methods are defined there. IMHO, the entire hierarchy requires major refactoring (e.g., due to code replication), but this should be done in the scope of a different jira :). ** How would you suggest getting hold of the thread executing the prefetch, so as to interrupt it on close? ** Apologies for the formatting irregularities. I use IntelliJ, which fails to import the Eclipse formatting as suggested in the help page you referred me to. ** Waiting (patiently) for the pictures...
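One common answer to the "get hold of the prefetch thread" question above is to run the prefetch through a single-thread executor and keep the returned Future, cancelling it with interruption from close(). This is a generic java.util.concurrent sketch, not the patch's actual code:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Sketch: keep a handle to the in-flight prefetch so close() can interrupt it. */
class InterruptiblePrefetch implements AutoCloseable {
    private final ExecutorService exec = Executors.newSingleThreadExecutor();
    private volatile Future<?> inFlight;
    volatile boolean sawInterrupt = false;

    void startPrefetch(long simulatedRpcMillis) {
        inFlight = exec.submit(() -> {
            try {
                Thread.sleep(simulatedRpcMillis);   // stands in for the scan RPC
            } catch (InterruptedException e) {
                sawInterrupt = true;                // prefetch observed the cancel
            }
        });
    }

    /** Waits for the worker to wind down after close(). */
    boolean awaitStopped(long millis) {
        try { return exec.awaitTermination(millis, TimeUnit.MILLISECONDS); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); return false; }
    }

    @Override public void close() {
        Future<?> f = inFlight;
        if (f != null) f.cancel(true);              // interrupts the worker thread
        exec.shutdownNow();
    }
}
```

Future.cancel(true) delivers the interrupt without the scanner ever naming the thread directly, which sidesteps the ownership question raised in the comment.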
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: (was: HBASE-13071-v1.patch)
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: (was: HBASE-13071-v2.patch)
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: HBASE-13071_98_1.patch
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: HBASE-13071_trunk_1.patch
[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342109#comment-14342109 ] Eshcar Hillel commented on HBASE-13071: --- New patches for 0.98 and trunk are available. Link to review board.
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: HBASE-13071_trunk_3.patch
[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14332166#comment-14332166 ] Eshcar Hillel commented on HBASE-13071: --- Thanks all for your comments. @stack, I wasn't aware of the discussions in the other JIRAs; thanks for adding the links, I am now up to date. @Lars, the concurrent queue in the suggested modification is implemented as a LinkedBlockingQueue (which, in addition to efficient put and take operations, provides an efficient count operation). But we can discuss alternatives, including devising a dedicated data structure if it looks like this can improve performance. The suggested modification focuses on managing the concurrent queue at the client side, but still applies the pull model, in which the client pulls the data from the server. To support true streaming, a push model, in which the server pushes the data to the client, might be better. In both cases a concurrent queue is part of the solution. I am attaching some evaluation results. The next step is to provide a patch for 0.98.
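The LinkedBlockingQueue idea above can be sketched in a few lines. This is a hypothetical standalone demo, not the actual patch: a background prefetcher thread (standing in for the HBase client's RPC loop to the region server) refills a bounded queue while the application drains it, so the consumer blocks only when the producer has fallen behind, not on every cache miss. Batch contents and sizes are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncScanCacheSketch {
    // Sentinel batch marking end-of-scan (compared by reference).
    static final List<String> POISON = new ArrayList<>();

    public static void main(String[] args) throws InterruptedException {
        // Bounded queue: the producer cannot run arbitrarily far ahead.
        BlockingQueue<List<String>> cache = new LinkedBlockingQueue<>(4);
        int totalBatches = 5, rowsPerBatch = 100;

        Thread prefetcher = new Thread(() -> {
            try {
                for (int b = 0; b < totalBatches; b++) {
                    List<String> batch = new ArrayList<>();
                    for (int r = 0; r < rowsPerBatch; r++) {
                        batch.add("row-" + (b * rowsPerBatch + r)); // fake RPC result
                    }
                    cache.put(batch); // blocks only if the consumer lags behind
                }
                cache.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        prefetcher.start();

        int consumed = 0;
        for (List<String> batch = cache.take(); batch != POISON; batch = cache.take()) {
            consumed += batch.size(); // application-side processing happens here
        }
        prefetcher.join();
        System.out.println("consumed=" + consumed);
    }
}
```

Note that LinkedBlockingQueue keeps its count in an atomic counter, so size() is cheap, which matches the "efficient count operation" point above.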
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: HbaseStreamingScanEvaluation.pdf
[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14332170#comment-14332170 ] Eshcar Hillel commented on HBASE-13071: --- This config setting can easily be removed. Our main concern was to preserve backward compatibility for users, and specifically to maintain the current scan behavior unless the caller explicitly asks for the asynchronous scanner. Since the asynchronous scanner uses a concurrent data structure, which entails some overhead, in some cases (like short scans) the caller might prefer a synchronous scan.
[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338596#comment-14338596 ] Eshcar Hillel commented on HBASE-13071: --- I tried running the code from master, but encountered problems (even without the patch). Specifically, when running commands from the HBase shell I wasn't able to exit the shell properly; I therefore created the patch based on 0.98.
[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338581#comment-14338581 ] Eshcar Hillel commented on HBASE-13071: --- I use the following commands to create the patches:

git format-patch 0.98 --minimal --stdout > HBASE-13071-v1.patch
git diff --no-prefix 0.98 > HBASE-13071-v2.patch

Any idea why these can't be applied?
[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338438#comment-14338438 ] Eshcar Hillel commented on HBASE-13071: --- I've just attached the patch. The default value for scanners is sync; this can easily be changed. Addressing the issue raised by [~jonathan.lawlor]: the prefetch logic is the same for sync and async scanners, so the async scanner stops issuing RPCs if the max result size is exceeded. However, since the prefetch is executed in the background, it is possible that the size of the data inside the cache exceeds the max size set by the user (which cannot happen with the sync scanner). There are ways to handle this, but they require knowing the size of the data in the cache at any point and limiting the size of the data retrieved from the server with respect to it. This may reduce the performance gain. I plan to attach a patch of the YCSB extension in case anyone wants to re-run the experiments.
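The size-accounting idea mentioned in the comment can be sketched as follows. This is a hypothetical illustration (method and constant names are invented, not from the patch): before issuing a background prefetch, cap the requested bytes by what is already buffered, so the cached data never exceeds the user's max result size.

```java
public class BoundedPrefetchSketch {
    // Budget for the next prefetch RPC: the headroom left under the
    // user-visible limit given what is already sitting in the cache.
    static long nextRequestSize(long maxResultSize, long bytesInCache) {
        return Math.max(0L, maxResultSize - bytesInCache);
    }

    public static void main(String[] args) {
        long max = 2L * 1024 * 1024;  // e.g. a 2 MB scanner max-result-size
        System.out.println(nextRequestSize(max, 0));          // empty cache: full budget
        System.out.println(nextRequestSize(max, 1_500_000));  // partial cache: remainder
        System.out.println(nextRequestSize(max, 3_000_000));  // over budget: skip the RPC
    }
}
```

The trade-off noted in the comment shows up directly: whenever the budget is 0 the prefetcher must idle, which narrows the window in which fetching overlaps application processing.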
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Affects Version/s: 0.98.11 Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: HBASE-13071-v2.patch
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: HBASE-13071-v1.patch
[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503549#comment-14503549 ] Eshcar Hillel commented on HBASE-13071: --- Rebase done. Thanks to HBASE-13090, the next() and loadCache() methods are separated, so this rebase wasn't too painful (thanks [~jonathan.lawlor]). I also changed some of the new scanner tests to account for the change in the scanner cache interface (it is now a Queue).
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: HBASE-13071_trunk_rebase_1.0.patch
[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531326#comment-14531326 ] Eshcar Hillel commented on HBASE-13071: --- Aligning with the size-in-bytes basis for scan requests --- here is a snippet of the code that sets the cache capacity and determines whether or not to invoke prefetch when next() is called in ClientAsyncPrefetchScanner:
{code}
// double buffer - double cache size
private int calcCacheCapacity() {
  int capacity = Integer.MAX_VALUE;
  if (caching > 0 && caching < (Integer.MAX_VALUE / 2)) {
    capacity = caching * 2 + 1;
  }
  if (capacity == Integer.MAX_VALUE) {
    capacity = (int) (maxScannerResultSize / ESTIMATED_SINGLE_RESULT_SIZE);
  }
  return capacity;
}

private boolean prefetchCondition() {
  return (getCacheCount() < getCountThreshold())
      && (getCacheSizeInBytes() < getSizeThreshold());
}

private int getCountThreshold() {
  return cacheCapacity / 2;
}

private long getSizeThreshold() {
  return maxScannerResultSize / 2;
}
{code}
where cacheSizeInBytes is an AtomicInteger that is updated whenever the cache changes (increased when adding results to the cache, decreased when removing them).
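The double-buffer capacity logic in the comment above can be exercised as a standalone method. This is an illustrative sketch, not the patch itself: the 1 KB estimated result size and the sample arguments are assumed values chosen for the example.

{code}
// Standalone sketch of the cache-capacity calculation; the constant below
// is an assumed figure for illustration, not the value in the patch.
public class CacheCapacitySketch {
    static final long ESTIMATED_SINGLE_RESULT_SIZE = 1024; // assumed: 1 KB/result

    static int calcCacheCapacity(int caching, long maxScannerResultSize) {
        int capacity = Integer.MAX_VALUE;
        // double buffering: room for 2x the per-RPC row count
        if (caching > 0 && caching < Integer.MAX_VALUE / 2) {
            capacity = caching * 2 + 1;
        }
        // otherwise, bound the cache by the size-in-bytes limit
        if (capacity == Integer.MAX_VALUE) {
            capacity = (int) (maxScannerResultSize / ESTIMATED_SINGLE_RESULT_SIZE);
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(calcCacheCapacity(100, 2L * 1024 * 1024)); // prints 201
        System.out.println(calcCacheCapacity(0, 2L * 1024 * 1024));   // prints 2048
    }
}
{code}

With the default caching of 100 rows this yields a cache of 201 entries, i.e., one full RPC batch can be prefetched while another is being consumed; when no row count is set, the byte-size limit determines the capacity instead.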
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: HBASE-13071_trunk_rebase_2.0.patch New patch available.
[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538419#comment-14538419 ] Eshcar Hillel commented on HBASE-13071: --- Two checkstyle errors added in this patch: (1) forgot to remove a redundant import in ClientSimpleScanner; (2) added a line to the loadCache() method in ClientScanner, which caused it to overflow (151 lines).
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: (was: HBASE-13071_trunk_2.patch)
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: (was: HBASE-13071_trunk_4.patch)
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: (was: HBASE-13071_trunk_1.patch)
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: (was: HBASE-13071_trunk_3.patch)
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: (was: HBASE-13071_98_1.patch)
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: HBASE-13071-0_98.patch
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: HBASE-13071-BRANCH-1.patch Hbase Streaming Scan Feature Key: HBASE-13071 URL: https://issues.apache.org/jira/browse/HBASE-13071 Project: HBase Issue Type: New Feature Reporter: Eshcar Hillel Assignee: Eshcar Hillel Fix For: 2.0.0 Attachments: 99.eshcar.png, HBASE-13071-0_98.patch, HBASE-13071-BRANCH-1.patch, HBASE-13071-trunk-bug-fix.patch, HBASE-13071_trunk_rebase_1.0.patch, HBASE-13071_trunk_rebase_2.0.patch, HBaseStreamingScanDesign.pdf, HbaseStreamingScanEvaluation.pdf, HbaseStreamingScanEvaluationwithMultipleClients.pdf, Releasenote-13071.txt, gc.delay.png, gc.eshcar.png, gc.png, hits.delay.png, hits.eshcar.png, hits.png, latency.delay.png, latency.png, network.png A scan operation iterates over all rows of a table or a subrange of the table. The synchronous nature in which the data is served at the client side hinders the speed the application traverses the data: it increases the overall processing time, and may cause a great variance in the times the application waits for the next piece of data. The scanner next() method at the client side invokes an RPC to the regionserver and then stores the results in a cache. The application can specify how many rows will be transmitted per RPC; by default this is set to 100 rows. The cache can be considered as a producer-consumer queue, where the hbase client pushes the data to the queue and the application consumes it. Currently this queue is synchronous, i.e., blocking. More specifically, when the application consumed all the data from the cache --- so the cache is empty --- the hbase client retrieves additional data from the server and re-fills the cache with new data. During this time the application is blocked. 
Under the assumption that the application's processing time can be overlapped with the time it takes to retrieve the data, an asynchronous approach can reduce the time the application waits for data. We attach a design document. We also have a patch, based on a private branch, and some evaluation results for this code.
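The blocking cache and the proposed background prefetch described above can be illustrated with a small, self-contained sketch (Java, like the HBase client). This is not the actual client code: PrefetchingScannerSketch is a hypothetical class, and pre-supplied row batches stand in for the RPCs to the RegionServer.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative model of the cache-as-producer-consumer-queue described above.
class PrefetchingScannerSketch {
    private final BlockingQueue<String> cache = new LinkedBlockingQueue<>();
    private volatile boolean done = false;

    PrefetchingScannerSketch(List<List<String>> batches) {
        // Producer: push rows into the cache in the background, instead of
        // refilling only once the consumer finds the cache empty.
        Thread prefetcher = new Thread(() -> {
            for (List<String> batch : batches) { // each batch = one RPC's worth of rows
                cache.addAll(batch);
            }
            done = true;
        });
        prefetcher.start();
    }

    // Consumer: the application's next(); waits only while the cache is empty
    // and the producer is still fetching. Returns null when the scan is done.
    String next() {
        while (true) {
            String row = cache.poll();
            if (row != null) return row;
            if (done && cache.isEmpty()) return null;
            Thread.onSpinWait();
        }
    }
}
```

With the synchronous scanner, the refill would instead happen inside next() itself, blocking the caller for a full round trip whenever the cache drains.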
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: HBASE-13071-trunk-bug-fix.patch
[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547627#comment-14547627 ] Eshcar Hillel commented on HBASE-13071: --- Hi ~stack, Attached 2 new patches, for branch-1 and 0.98. While preparing these patches I discovered that in the asynchronous scanner the cache byte-size variable is not updated in one of the places where an item is polled from the cache. Therefore I also attach a patch that fixes this bug in trunk; it is a small local fix in ClientAsyncPrefetchScanner.java (this is already fixed in the patches for branch-1 and 0.98). Will you be able to apply the patches? Also, do we need to open a new Jira for the refugee patch, or is it ok to post it here? Thanks, Eshcar
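The accounting bug mentioned in the comment above can be made concrete with a toy cache (field and class names here are illustrative, not the actual ClientAsyncPrefetchScanner members): every code path that polls an item must also subtract its size, otherwise the byte-size counter drifts upward and the client over-estimates how full the cache is.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model of a scanner cache with byte-size accounting.
class CacheSizeAccounting {
    private final Queue<byte[]> cache = new ArrayDeque<>();
    private long cacheSizeInBytes = 0;

    synchronized void add(byte[] result) {
        cache.add(result);
        cacheSizeInBytes += result.length;
    }

    // Correct poll: subtract the removed item's size. The bug described above
    // was a poll path that skipped this subtraction.
    synchronized byte[] poll() {
        byte[] result = cache.poll();
        if (result != null) {
            cacheSizeInBytes -= result.length;
        }
        return result;
    }

    synchronized long sizeInBytes() { return cacheSizeInBytes; }
}
```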
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: (was: HBASE-13071_trunk_5.patch)
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: (was: HBASE-13071_trunk_10.patch)
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: (was: HBASE-13071_trunk_8.patch)
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: (was: HBASE-13071_trunk_7.patch)
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: (was: HBASE-13071_trunk_6.patch)
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Attachment: (was: HBASE-13071_trunk_9.patch)
[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497714#comment-14497714 ] Eshcar Hillel commented on HBASE-13071: --- ClientScanner is an abstract class that contains the code shared by the sync and async scanner classes, such as the prefetch method. #prefetch does not replace #next; it is invoked from #next in ClientSimpleScanner (the sync scanner), thereby preserving the same sync behavior as before. In ClientAsyncPrefetchScanner the prefetch method is invoked in the run method of a background thread when the buffer at the client side is half full. I hope this makes sense.
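The class structure described in this comment can be roughly sketched as follows. The class names mirror the comment, but the bodies are illustrative: the refill is faked with generated rows, and the async variant is simplified to run inline rather than on a background thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Base class holding the shared prefetch logic, as the comment describes.
abstract class ClientScannerSketch {
    protected final Deque<String> cache = new ArrayDeque<>();
    protected int rpcCount = 0;

    // Shared by both scanners: one simulated "RPC" refills the cache.
    protected void prefetch() {
        rpcCount++;
        for (int i = 0; i < 4; i++) cache.add("row-" + rpcCount + "-" + i);
    }

    abstract String next();
}

// Synchronous scanner: prefetch() is called from next() only when the cache
// is empty, so the caller blocks for the whole refill.
class ClientSimpleScannerSketch extends ClientScannerSketch {
    String next() {
        if (cache.isEmpty()) prefetch();
        return cache.poll();
    }
}

// Asynchronous variant: trigger a refill once the cache drains to half its
// capacity, before the consumer hits bottom. (In the real scanner this runs
// in a background thread's run method, as the comment notes.)
class ClientAsyncPrefetchScannerSketch extends ClientScannerSketch {
    private final int capacity = 8;
    String next() {
        if (cache.size() <= capacity / 2) prefetch();
        return cache.poll();
    }
}
```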
[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14494748#comment-14494748 ] Eshcar Hillel commented on HBASE-13071: --- I looked into the PerformanceEvaluation tool; the code is easy to read and maintain. I believe the changes required in the implementation of testRow() in ScanTest are:
* set caching to 100 (or even to DEFAULT_HBASE_CLIENT_SCANNER_CACHING) instead of 30
* add a timeout before calling testScanner.next() [I think you already added this one]
* make sure setFilter(FilterAllFilter) is not invoked
and optionally, add a scanRange10 class to do really big scans. [~stack], do you by any chance have the results of the client latency distribution collected by the tool in your previous experiments? BTW, 30 is not the default value for prefetch size: DEFAULT_HBASE_CLIENT_SCANNER_CACHING is set to 100 in 0.98 and to Integer.MAX_VALUE in master.
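The first suggested tweak boils down to configuring the scan's caching. A minimal sketch with a stand-in Scan class (the real org.apache.hadoop.hbase.client.Scan exposes setCaching(int), but its internals are not reproduced here, and the class below is only a stub for illustration):

```java
// Stand-in for the evaluation-tool change: use the client default caching
// value (100 in 0.98, as cited in the comment above) instead of the tool's
// hard-coded 30.
class ScanTestConfigSketch {
    static final int DEFAULT_HBASE_CLIENT_SCANNER_CACHING = 100;

    // Minimal stub standing in for org.apache.hadoop.hbase.client.Scan.
    static class Scan {
        private int caching = -1;
        Scan setCaching(int n) { this.caching = n; return this; }
        int getCaching() { return caching; }
    }

    static Scan configure() {
        // Note: no FilterAllFilter is installed; rows must actually reach
        // the client for the scan latency measurement to be meaningful.
        return new Scan().setCaching(DEFAULT_HBASE_CLIENT_SCANNER_CACHING);
    }
}
```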
[jira] [Commented] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482738#comment-14482738 ] Eshcar Hillel commented on HBASE-13071: --- Thanks [~stack] for running these rig tests. I believe the right way to see the benefit of this feature is to measure the scan.next() latency at the client side; there you should see the latency going down as you increase the delays. Obviously, an async scanner puts more pressure on the server, since the rate at which it asks for records is higher. Since you are already stress-testing the server with 50 (heavy-scanner) clients, it could be that the extra pressure the async clients put on the server pushes it beyond its peak point. Other than that, what is the prefetch size you are using? I assume it is less than 100. The scenario in which the async scanner has maximum gain is when the client-side processing time (i.e., the delays) equals the server-side I/O time plus network delays. If the prefetch size is too small, the network delays are more pronounced, and therefore the delays should be longer. Finally, [~stack], could you please share the client code you use for your tests, either via this Jira or directly with me, so I can take a closer look and try it out myself.
[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482938#comment-14482938 ] Eshcar Hillel commented on HBASE-13408: --- Thank you [~zhangduo] for raising the important WAL-truncation issue, and [~lhofhansl] and [~stack] for raising the component-format issue. These two issues should definitely be addressed in our solution. 1. When the memstore compactor completes a compaction, it can query the resulting component for the oldest record sequence id and use it to apply WAL truncation. This might not be good enough in all scenarios, in which case the memstore should enter a panic mode and do a real flush. So there are several triggers for entering panic mode: one relates to the memstore size and the other to the WAL size. 2. The CellSetMgr and CellSetScanner abstractions we suggested should allow easy support for any cell-storage format. Specifically, the active set can use a skip list to absorb the updates, and the compactor can generate b-trees or any other cache-friendly format. We can use a Factory pattern for this purpose. There is no technical challenge in making this feature available for all column families; however, we believe in-memory column families have a better chance of benefiting from it, while in the general case this memstore could put a burden on the region server. If you believe this has the potential to improve performance in other scenarios as well, there is no reason not to make it a first-class column type. A CellSetMgr, as explained above, is an abstraction of the cell-set storage, be it a skip list or a b-tree, with or without SLAB, compressed or not, and any other details that should be encapsulated and decoupled from the users of these objects. HBASE-5311 suggested using an RCU-like mechanism to protect the components (layers) of the memstore as they shift around, and also applied a freezing phase.
Our solution uses the existing sync mechanism to push the component into the pipeline. Once the component is in the pipeline it is read-only, and can therefore be accessed without locks. The only part we might need to protect is the swap of the subset of pipeline components with the new single compacted component. This should be as easy as changing a pointer and can use an RCU-like mechanism as well. When no protection is applied, a concurrent reader can miss this swap and then go through the "old" components, which is more expensive but still correct. Shifting a component from the pipeline to the snapshot should be the same as shifting it from the active set to the snapshot (as is done today). HBase In-Memory Memstore Compaction --- Key: HBASE-13408 URL: https://issues.apache.org/jira/browse/HBASE-13408 Project: HBase Issue Type: New Feature Reporter: Eshcar Hillel Attachments: HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf A store unit holds a column family in a region, where the memstore is its in-memory component. The memstore absorbs all updates to the store; from time to time these updates are flushed to a file on disk, where they are compacted. Unlike disk components, the memstore is not compacted until it is written to the filesystem and optionally to the block cache. This may result in underutilization of the memory due to duplicate entries per row, for example, when hot data is continuously updated. Generally, the faster the data accumulates in memory, the more flushes are triggered, and the data sinks to disk more frequently, slowing down retrieval of data, even very recent data. In high-churn workloads, compacting the memstore can help keep the data in memory and thereby speed up data retrieval.
We suggest a new compacted memstore with the following principles: 1. The data is kept in memory for as long as possible 2. Memstore data is either compacted or in the process of being compacted 3. Allow a panic mode, which may interrupt an in-progress compaction and force a flush of part of the memstore. We suggest applying this optimization only to in-memory column families. A design document is attached. This feature was previously discussed in HBASE-5311. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
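The lock-free swap described in the comment above can be sketched in a few lines. This is an illustrative model only, not the actual HBase classes: the names SegmentPipeline, push, and swapCompacted are assumptions, and the real patch works on memstore segments rather than strings.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch (not the actual HBase code): a pipeline of read-only
// memstore components. Readers traverse the current list without locks; the
// compactor replaces a previously observed list with a single compacted
// component via one atomic reference change. A reader that still holds the
// old list sees correct (if more expensive to scan) data.
public class SegmentPipeline<S> {
  private final AtomicReference<List<S>> components =
      new AtomicReference<>(new CopyOnWriteArrayList<>());

  // Readers get an immutable snapshot of the current component list.
  public List<S> snapshot() {
    return components.get();
  }

  // Push a new read-only component into the pipeline (retry on contention).
  public void push(S segment) {
    List<S> cur, next;
    do {
      cur = components.get();
      next = new CopyOnWriteArrayList<>(cur);
      next.add(segment);
    } while (!components.compareAndSet(cur, next));
  }

  // Swap the components observed by the compactor for one compacted
  // component; fails (returns false) if a concurrent push intervened.
  public boolean swapCompacted(List<S> observed, S compacted) {
    List<S> next = new CopyOnWriteArrayList<>();
    next.add(compacted);
    return components.compareAndSet(observed, next);
  }
}
```

The single compare-and-set is the "changing a pointer" mentioned above; no reader-side locking is needed because every component in the pipeline is immutable once pushed.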
[jira] [Created] (HBASE-13408) HBase In-Memory Memstore Compaction
Eshcar Hillel created HBASE-13408: - Summary: HBase In-Memory Memstore Compaction Key: HBASE-13408 URL: https://issues.apache.org/jira/browse/HBASE-13408 Project: HBase Issue Type: New Feature Reporter: Eshcar Hillel
[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13408: -- Attachment: HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf HBase In-Memory Memstore Compaction --- Key: HBASE-13408 URL: https://issues.apache.org/jira/browse/HBASE-13408 Project: HBase Issue Type: New Feature Reporter: Eshcar Hillel Attachments: HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf
[jira] [Updated] (HBASE-13719) Asynchronous scanner -- cache size-in-bytes bug fix
[ https://issues.apache.org/jira/browse/HBASE-13719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13719: -- Attachment: HBASE-13071-trunk-bug-fix.patch Asynchronous scanner -- cache size-in-bytes bug fix --- Key: HBASE-13719 URL: https://issues.apache.org/jira/browse/HBASE-13719 Project: HBase Issue Type: Bug Reporter: Eshcar Hillel Attachments: HBASE-13071-trunk-bug-fix.patch HBase Streaming Scan is a feature recently added to trunk. In this feature, an asynchronous scanner pre-loads data into the cache based on its size (both row count and size in bytes). In one of the locations where the scanner polls an item from the cache, the variable holding the estimated byte size of the cache is not updated. This affects the decision of when to load the next batch of data. A bug-fix patch is attached; it comprises only local changes to the ClientAsyncPrefetchScanner.java file.
[jira] [Created] (HBASE-13719) Asynchronous scanner -- cache size-in-bytes bug fix
Eshcar Hillel created HBASE-13719: - Summary: Asynchronous scanner -- cache size-in-bytes bug fix Key: HBASE-13719 URL: https://issues.apache.org/jira/browse/HBASE-13719 Project: HBase Issue Type: Bug Reporter: Eshcar Hillel
[jira] [Updated] (HBASE-13719) Asynchronous scanner -- cache size-in-bytes bug fix
[ https://issues.apache.org/jira/browse/HBASE-13719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13719: -- Status: Patch Available (was: Open) Asynchronous scanner -- cache size-in-bytes bug fix --- Key: HBASE-13719 URL: https://issues.apache.org/jira/browse/HBASE-13719 Project: HBase Issue Type: Bug Reporter: Eshcar Hillel Attachments: HBASE-13071-trunk-bug-fix.patch
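The bug described in HBASE-13719 is an accounting problem: every poll from the prefetch cache must also decrement the estimated size in bytes, or the "should we fetch the next batch?" decision is based on a stale number. A minimal sketch of the pattern follows; the names PrefetchCache, SizeEstimator, and estimatedSizeInBytes are illustrative, not the actual ClientAsyncPrefetchScanner fields.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch of the accounting behind the fix: the cache tracks an
// estimated byte size, and BOTH add() and poll() must update it. The
// reported bug was a poll path that skipped the decrement.
public class PrefetchCache<R> {
  public interface SizeEstimator<R> { long sizeOf(R result); }

  private final Queue<R> cache = new ConcurrentLinkedQueue<>();
  private final AtomicLong estimatedSizeInBytes = new AtomicLong();
  private final SizeEstimator<R> estimator;

  public PrefetchCache(SizeEstimator<R> estimator) { this.estimator = estimator; }

  public void add(R result) {
    cache.add(result);
    estimatedSizeInBytes.addAndGet(estimator.sizeOf(result));
  }

  public R poll() {
    R result = cache.poll();
    if (result != null) {
      // This decrement is the step that was missing on one code path.
      estimatedSizeInBytes.addAndGet(-estimator.sizeOf(result));
    }
    return result;
  }

  public long estimatedSizeInBytes() { return estimatedSizeInBytes.get(); }
}
```

With the decrement in place, the byte-size estimate shrinks as the application consumes rows, so the next batch is loaded at the intended point rather than too late.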
[jira] [Updated] (HBASE-13071) Hbase Streaming Scan Feature
[ https://issues.apache.org/jira/browse/HBASE-13071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13071: -- Release Note: MOTIVATION A pipelined scan API is introduced for speeding up applications that combine massive data traversal with compute-intensive processing. Traditional HBase scans save network trips through prefetching the data to the client side cache. However, they prefetch synchronously: the fetch request to the regionserver is invoked only when the entire cache is consumed. This leads to a stop-and-wait access pattern, in which the client stalls until the next chunk of data is fetched. Applications that do significant processing can benefit from background data prefetching, which eliminates this bottleneck. The pipelined scan implementation overlaps the cache population at the client side with application processing. Namely, it issues a new scan RPC when the iteration retrieves 50% of the cache. If the application processing (that is, the time between invocations of next()) is substantial, the new chunk of data will be available before the previous one is exhausted, and the client will not experience any delay. Ideally, the prefetch and the processing times should be balanced. API AND CONFIGURATION Asynchronous scanning can be configured either globally for all tables and scans, or on a per-scan basis via a new Scan class API. Configuration in hbase-site.xml (hbase.client.scanner.async.prefetch, default false): <property> <name>hbase.client.scanner.async.prefetch</name> <value>true</value> </property> API - Scan#setAsyncPrefetch(boolean): Scan scan = new Scan(); scan.setCaching(1000); scan.setMaxResultSize(BIG_SIZE); scan.setAsyncPrefetch(true); ... ResultScanner scanner = table.getScanner(scan); IMPLEMENTATION NOTES Pipelined scan is implemented by a new ClientAsyncPrefetchScanner class, which is fully API-compatible with the synchronous ClientSimpleScanner. 
ClientAsyncPrefetchScanner is not instantiated for small (Scan#setSmall) and reversed (Scan#setReversed) scanners. The application is responsible for setting the prefetch size so that the prefetch time and the processing time are balanced. Note that due to double buffering, the client side cache can use twice as much memory as the synchronous scanner. Generally, this feature will put more load on the server (higher fetch rate -- which is the whole point). Also, YMMV. Hbase Streaming Scan Feature Key: HBASE-13071
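The 50% refill trigger from the release note can be sketched as a simple single-threaded model of the decision. This is an assumption-laden illustration, not the actual ClientAsyncPrefetchScanner logic: the class name PipelinedScanModel and the in-flight flag are invented here, and the real scanner does the refill on a background thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Simplified sketch of the pipelined-scan refill rule: issue the next scan
// RPC once the consumer has drained the client-side cache to 50% of its
// capacity, so that fetching overlaps with application processing.
public class PipelinedScanModel {
  private final int cacheCapacity;               // rows per RPC (Scan#setCaching)
  private final Deque<String> cache = new ArrayDeque<>();
  private boolean prefetchInFlight = false;

  public PipelinedScanModel(int cacheCapacity) { this.cacheCapacity = cacheCapacity; }

  // Delivery of a fetched chunk: refill the cache, clear the in-flight flag.
  public void fill(List<String> rows) {
    cache.addAll(rows);
    prefetchInFlight = false;
  }

  // True when a new prefetch RPC should be issued: the cache has dropped to
  // half capacity or less and no fetch is already outstanding.
  public boolean shouldPrefetch() {
    return !prefetchInFlight && cache.size() <= cacheCapacity / 2;
  }

  public boolean prefetchInFlight() { return prefetchInFlight; }

  public String next() {
    String row = cache.poll();
    if (shouldPrefetch()) {
      prefetchInFlight = true;  // in the real scanner this wakes a fetcher thread
    }
    return row;
  }
}
```

If the application's per-row processing time roughly matches the fetch latency, the refilled chunk arrives before the cache empties and next() never blocks, which is exactly the balance the release note recommends.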
[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706496#comment-14706496 ] Eshcar Hillel commented on HBASE-13408: --- Patch is updated on the review board. 0.98-inmem means setting up the cluster with the branch-0.98 code and running an in-memory column family. HBase In-Memory Memstore Compaction --- Key: HBASE-13408 URL: https://issues.apache.org/jira/browse/HBASE-13408 Project: HBase Issue Type: New Feature Reporter: Eshcar Hillel Attachments: HBASE-13408-trunk-v01.patch, HBASE-13408-trunk-v02.patch, HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, InMemoryMemstoreCompactionEvaluationResults.pdf, InMemoryMemstoreCompactionScansEvaluationResults.pdf
[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13408: -- Attachment: HBASE-13408-trunk-v02.patch InMemoryMemstoreCompactionScansEvaluationResults.pdf We attach a new patch which covers WAL truncation. We also attach evaluation results for scans. The trend is very similar to the improvement we see for read operations. Following the approach suggested in HBASE-10713, we now divide flushing stores into two groups: one does the traditional flush to disk, and the other does an in-memory flush into an inactive (read-only) memstore segment, which is subject to compaction. By default, an in-memory column family has a compacted memstore which flushes in memory, while all other column families have a default memstore which flushes to disk. However, in some use cases, e.g., upon region split/merge/close, even in-memory columns flush their content to disk. Therefore, the flush policy selects *two* sets of stores: one to flush to disk, and one to flush in memory. The first set invokes snapshot(), and the second set invokes flushInMemory() during the prepare phase. The main changes to support WAL truncation are threefold: (1) upon in-memory compaction, the WAL is updated with a sequence number which is a lower approximation of the lowest-unflushed-sequence-id; (2) when the number of log files exceeds a certain threshold, the store is forced to flush to disk even if it is an in-memory column; (3) upon flush to disk, the lowest-unflushed-sequence-id is cleared (as it used to be). Stores with in-memory segments update it with a lower approximation of the lowest sequence id still in memory; other stores update it with the first insert after the flush (as before). While (1) helps prolong the time an item can stay in memory, (2) and (3) ensure the WAL size stays bounded and cannot explode. 
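The two-set selection described above can be sketched as follows. The names TwoSetFlushPolicy, StoreInfo, and FlushSelection are invented for illustration and are not the actual HBase FlushPolicy API; the "panic" conditions are simplified to two booleans.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the two-set flush selection: in-memory column
// families normally flush in memory (into a read-only segment slated for
// compaction), but are forced to disk when WAL pressure or a region
// lifecycle event (split/merge/close) demands a real flush.
public class TwoSetFlushPolicy {
  public static class StoreInfo {
    public final String name;
    public final boolean inMemoryColumnFamily;
    public StoreInfo(String name, boolean inMem) {
      this.name = name;
      this.inMemoryColumnFamily = inMem;
    }
  }

  public static class FlushSelection {
    public final List<StoreInfo> flushToDisk = new ArrayList<>();
    public final List<StoreInfo> flushInMemory = new ArrayList<>();
  }

  public FlushSelection select(List<StoreInfo> stores,
                               boolean walPressure,
                               boolean regionLifecycleEvent) {
    FlushSelection sel = new FlushSelection();
    boolean forceDisk = walPressure || regionLifecycleEvent;  // "panic" conditions
    for (StoreInfo s : stores) {
      if (s.inMemoryColumnFamily && !forceDisk) {
        sel.flushInMemory.add(s);  // would invoke flushInMemory() in prepare
      } else {
        sel.flushToDisk.add(s);    // would invoke snapshot() in prepare
      }
    }
    return sel;
  }
}
```

Note that a store lands in exactly one of the two sets, matching the invariant stated in the discussion below: memory pressure on a store is relieved either by a disk flush or by an in-memory flush plus compaction, never both.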
HBase In-Memory Memstore Compaction --- Key: HBASE-13408 URL: https://issues.apache.org/jira/browse/HBASE-13408 Project: HBase Issue Type: New Feature Reporter: Eshcar Hillel Attachments: HBASE-13408-trunk-v01.patch, HBASE-13408-trunk-v02.patch, HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, InMemoryMemstoreCompactionEvaluationResults.pdf, InMemoryMemstoreCompactionScansEvaluationResults.pdf
[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650518#comment-14650518 ] Eshcar Hillel commented on HBASE-13408: --- Then how about we make use of the FlushPolicy abstraction, which is written so nicely and is easy to extend ;-). We can add to it a method selectStoresToCompact(), so that a flush process manages two sets to reduce memory usage: (1) stores to flush and (2) stores to be compacted. A store is in at most one of the two sets, never in both. Whether memory usage is reduced by a flush or by a compaction depends on the store type and state. In addition, we’ll add a method doInmemoryCompaction() to the MemStore interface. In a compacted memstore, the implementation of this method pushes the active set into the compaction pipeline and invokes a compaction. With this solution the semantics of prepare-to-flush remains the same. HBase In-Memory Memstore Compaction --- Key: HBASE-13408 URL: https://issues.apache.org/jira/browse/HBASE-13408 Project: HBase Issue Type: New Feature Reporter: Eshcar Hillel Attachments: HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, InMemoryMemstoreCompactionEvaluationResults.pdf
[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14647297#comment-14647297 ] Eshcar Hillel commented on HBASE-13408: --- Thank you [~Apache9] and [~anoop.hbase] for your comments. There is a question of when to push the active set into the pipeline, and which threshold to use. This should be some configurable parameter. But let’s put this aside for a minute. The problem I meant to handle with the WAL truncation mechanism is orthogonal to this decision. Consider a region with one compacting store. Assume we add the following key-value-ts tuples to the memstore: (A,1,1) (A,4,4) (A,7,7) (B,2,2) (B,5,5) (B,8,8) (C,3,3) (C,6,6) (C,9,9). All these items will have edits in the WAL. After compaction, what is left in memory are (A,7,7) (B,8,8) (C,9,9); however, these edits are not removed from the WAL since no flush occurs. This can go on and on without ever flushing data to disk and without removing WAL edits. The solution we suggested earlier is to have a small map that would help determine that, after the compaction in the example above, we can remove all WAL entries that correspond to a ts equal to or lower than 6. And it happens outside the scope of a flush, as compaction is a background process. If we don’t change the WAL truncation in this way, the WAL can grow without limit. Supporting a more compact format in the compaction pipeline was discussed when we just started this JIRA. The design we suggested enables plugging in any data structure: it can be the CellBlocks by [~anoop.hbase], it can be a b-tree, or any alternative that is suggested in HBASE-3993. It only needs to support the API defined by the CellSkipListSet wrapper class (in our patch we changed its name to CellSet to indicate the implementation is not restricted to a skip-list). Having said that, we would like to keep the initial solution simple. 
The plug-in infrastructure is in; experimenting with different data structures can be deferred to a separate task. Coming back to the timing of the in-memory flush: since this action mandates the same synchronization as a flush to disk (to block the updaters while allocating a new active set), it seems appropriate to apply it upon a disk flush. Moreover, if we don’t change the flush semantics, a compacting memstore can be forced to flush to disk when it reaches 16M (I can show an example), which would countervail the benefits of this feature. HBase In-Memory Memstore Compaction --- Key: HBASE-13408 URL: https://issues.apache.org/jira/browse/HBASE-13408 Project: HBase Issue Type: New Feature Reporter: Eshcar Hillel Attachments: HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, InMemoryMemstoreCompactionEvaluationResults.pdf
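The (A,1,1) ... (C,9,9) example above can be made concrete with a few lines of code: after an in-memory compaction keeps only the newest version of each key, everything in the WAL below the minimum surviving timestamp is removable. This sketch assumes, as in the example, that the timestamp doubles as the WAL sequence number; keys are encoded as longs for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the truncation-point computation from the example: compaction
// retains the latest (key, value, ts) per key; every WAL edit with a ts
// strictly below the minimum retained ts corresponds to no in-memory cell
// and can therefore be truncated.
public class TruncationPointExample {
  public static long truncationPoint(long[][] edits) {
    // edits[i] = {key, value, ts}; keep the maximum ts seen per key
    Map<Long, Long> latestTsPerKey = new HashMap<>();
    for (long[] e : edits) {
      latestTsPerKey.merge(e[0], e[2], Math::max);
    }
    long minRetained = Long.MAX_VALUE;
    for (long ts : latestTsPerKey.values()) {
      minRetained = Math.min(minRetained, ts);
    }
    // Everything at or below (minRetained - 1) has left the memstore.
    return minRetained - 1;
  }
}
```

For the edits in the example, the retained cells carry ts 7, 8, and 9, so the computed truncation point is 6, matching the "ts equal to or lower than 6" conclusion in the comment.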
[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13408: -- Status: Patch Available (was: Open) HBase In-Memory Memstore Compaction --- Key: HBASE-13408 URL: https://issues.apache.org/jira/browse/HBASE-13408 Project: HBase Issue Type: New Feature Reporter: Eshcar Hillel Attachments: HBASE-13408-trunk-v01.patch, HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, InMemoryMemstoreCompactionEvaluationResults.pdf
[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14681774#comment-14681774 ] Eshcar Hillel commented on HBASE-13408: --- We've submitted the patch that is based on trunk. This includes all the changes that were presented in 0.98, plus the comments from the code review and the changes necessary to adapt the code to the master branch. Also added a link to the review board. Next we plan to work on WAL truncation upon memory compaction, based on the discussion in this JIRA. HBase In-Memory Memstore Compaction --- Key: HBASE-13408 URL: https://issues.apache.org/jira/browse/HBASE-13408 Project: HBase Issue Type: New Feature Reporter: Eshcar Hillel Attachments: HBASE-13408-trunk-v01.patch, HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, InMemoryMemstoreCompactionEvaluationResults.pdf
[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13408: -- Attachment: HBASE-13408-trunk-v01.patch HBase In-Memory Memstore Compaction --- Key: HBASE-13408 URL: https://issues.apache.org/jira/browse/HBASE-13408 Project: HBase Issue Type: New Feature Reporter: Eshcar Hillel Attachments: HBASE-13408-trunk-v01.patch, HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, InMemoryMemstoreCompactionEvaluationResults.pdf
[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14638791#comment-14638791 ] Eshcar Hillel commented on HBASE-13408: --- I did some learning of the flush-by-column-family feature (HBASE-10201). I think it will help us in supporting WAL truncation in compacting memstore. [~Apache9] I would appreciate if you can confirm that this should work. In the current implementation, when a region flushes a store, the previous sequence id that was associated with this store in the WAL oldestUnflushedStoreSequenceIds set is removed. The first put operation to occur after the flush installs a new sequence id for the store. The WAL uses this bookeeping when it needs to decide which WAL files can be archived (WAL truncation). For compacting memstore we would like to keep the sequence id in the oldestUnflushedStoreSequenceIds set of the WAL even after a flush is invoked. Instead, the memstore compaction thread will be responsible for setting an approximation of the correct sequence id for the store in the set. To this end, the compacting memstore maintains a mapping of timestamp to region sequence number (the same sequence numbers that are attached to WAL edits). Whenever a flush is invoked on a compacting memstore it adds the current time and current sequence number pair to this mapping. As an additional artifact of the memstore compaction the minimal timestamp that is still present in the memstore is computed. This timestamp is then used to identify the maximal sequence id in the timestamp-seqId mapping for which no entries are left in the memstore. Finally, it uses this approximated sequence number to update the oldestUnflushedStoreSequenceIds set. 
This way the WAL is truncated with some delay with respect to the real sequence number, but the memory overhead is fairly small (only a small map of ts-seqId is added to the memstore) when compared to a solution that adds a sequence number to each cell in the memstore and then uses it to find the *exact* oldest unflushed sequence id. What say you?
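The timestamp-to-sequence-id bookkeeping described above can be sketched with a sorted map from flush time to the region sequence id current at that time. This is a minimal, self-contained illustration only; the class and method names are hypothetical, not taken from the patch:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch of the seqId approximation described in the comment.
class SeqIdApproximator {
    // Maps the time of each flush to the region sequence id at that moment.
    private final ConcurrentSkipListMap<Long, Long> tsToSeqId =
        new ConcurrentSkipListMap<>();

    // Called whenever a flush is invoked on the compacting memstore.
    void recordFlush(long nowTs, long currentSeqId) {
        tsToSeqId.put(nowTs, currentSeqId);
    }

    // Given the minimal timestamp still present in the memstore (computed as
    // a by-product of compaction), return the largest recorded sequence id
    // for which no entries are left in memory, i.e. the seq id paired with
    // the latest flush time strictly before minTsInMemstore. That value can
    // then feed the WAL's oldestUnflushedStoreSequenceIds set.
    Long truncatableSeqId(long minTsInMemstore) {
        Map.Entry<Long, Long> e = tsToSeqId.lowerEntry(minTsInMemstore);
        if (e == null) return null;                  // nothing truncatable yet
        tsToSeqId.headMap(e.getKey(), true).clear(); // drop consumed entries
        return e.getValue();
    }
}
```

As the comment notes, the WAL then lags the real sequence number slightly, but the only extra state is this small map rather than a per-cell sequence id.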
[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626301#comment-14626301 ] Eshcar Hillel commented on HBASE-13408: --- Hi - we are back with an implementation of the basic feature (see the link to the review board for the HBASE-13408-098 code), and some experimental results. We were able to show a 30-65% performance gain for read accesses in high-churn workloads (comprising 50% reads and 50% writes), and mainly to maintain a predictable latency SLA (see the performance evaluation document for full results). We have also adapted the design document to reflect the code, specifically renaming some classes and describing the changes we made in the region flushing policy (see design document ver02).
[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13408: -- Attachment: InMemoryMemstoreCompactionEvaluationResults.pdf HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf
[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626309#comment-14626309 ] Eshcar Hillel commented on HBASE-13408: --- A comment and a request: we have yet to address the WAL truncation issue. The problem is twofold: (1) if the region comprises only an in-memory column (store) then a flush may not occur for a long time, resulting in a big log which in turn may significantly increase MTTR. This is bad. (2) if the in-memory column (store) is part of a region with default stores then flushes do occur, and the WAL truncates even entries it should not. Specifically, it truncates entries of the in-memory store that are still present in the memstore, that is, not eliminated by compaction and not flushed to disk. This is a real threat to HBase durability guarantees. The same solution can help avoid both problems. Currently the WAL uses a region counter to mark the entries as well as to decide which entries are truncatable. However, the memstore is unaware of these sequence numbers and therefore cannot indicate which WAL entries should not be truncated. We would like to come up with a mechanism that allows the memstore and WAL to share the minimal required information in order to ensure data durability. We'd appreciate suggestions/insights.
[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632806#comment-14632806 ] Eshcar Hillel commented on HBASE-13408: --- The snapshot, the active set, and the pipeline components are all memstore segments; this abstraction allows treating all these parts equally. Memstore compaction should also work with flush-by-column-family. However, even when flushing by column the WAL sequence id is defined per region (right?), so WAL truncation is not trivial. forceflushsize is not a new config; instead we take the average of the flush size and the blocking flush size: flush-size < forceflushsize < blockingflushsize. When considering a flush-by-column-family mode, if the active segment is greater than the flush size then a flush is invoked and the active segment is pushed to the pipeline. If the active + pipeline segments are greater than the forceflushsize then the flush is forced and the snapshot is flushed to disk. All entries (active, pipeline, snapshot) are stored in a skip list. The performance gain comes from accessing only memory and not the disk. The skip lists are not too large, as multiple versions of the same key are removed within the compacted pipeline, but they are not too small either, e.g., the active segment is pushed to the pipeline only when it reaches 128MB. When there is no duplication, i.e., a large set of active keys and no multiple versions per active key, compaction is of no help: data is flushed to disk anyway, but the compaction pipeline consumes memory and CPU. We don't see a slowdown in our experiments, but a setting where the memory/CPU resources are limited and contended for might show a slowdown.
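The threshold ordering described in this comment (flush size below forceflushsize below blocking flush size) can be sketched as a small decision function. The class, enum, and field names here are illustrative only, not HBase API:

```java
// Illustrative flush decision for a compacting memstore; all names are
// hypothetical. Thresholds are assumed to satisfy
// flushSize < forceFlushSize < blockingFlushSize.
class FlushPolicy {
    enum Action { NONE, IN_MEMORY_FLUSH, FORCE_FLUSH_TO_DISK }

    final long flushSize;         // active segment moves to pipeline above this
    final long forceFlushSize;    // snapshot goes to disk above this
    final long blockingFlushSize; // upper limit that must never be crossed

    FlushPolicy(long flushSize, long forceFlushSize, long blockingFlushSize) {
        this.flushSize = flushSize;
        this.forceFlushSize = forceFlushSize;
        this.blockingFlushSize = blockingFlushSize;
    }

    Action decide(long activeBytes, long pipelineBytes) {
        if (activeBytes + pipelineBytes > forceFlushSize) {
            return Action.FORCE_FLUSH_TO_DISK; // force: flush snapshot to disk
        }
        if (activeBytes > flushSize) {
            return Action.IN_MEMORY_FLUSH;     // push active into the pipeline
        }
        return Action.NONE;
    }
}
```

With this shape, an in-memory flush is the common case and disk I/O happens only when the combined active and pipeline footprint crosses the middle threshold.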
[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634999#comment-14634999 ] Eshcar Hillel commented on HBASE-13408: --- Is your first concern ``how can an admin safely decide on the number of regions per region server if the memory footprint of a region may be bigger than the flush size?'' First, this can happen also with the default memstore implementation, and for this reason the blocking flush size is defined; we make sure not to cross this upper limit even with the compacted memstore implementation. Second, while less trivial, it is still possible to come up with a reasonable computation if you have an upper limit on the number of regions with a compacted memstore at any point in time. Regarding your second question, the compaction pipeline is composed of memstore segments (1 or more). Each memstore segment has a cell set; currently this is the same data structure as in the active segment, namely a skip list. If found useful, it is possible to change the format in which the cells are stored in the pipeline after compaction. [~anoop.hbase] I hope this answers your questions.
[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14976200#comment-14976200 ] Eshcar Hillel commented on HBASE-13408: --- We were able to find the cause of the TestHRegion failure. There were some changes to the code implementing memstore scans that were applied to the default memstore scanner. In master this code resides in the DefaultMemStore.java file, while in our patch we extracted this code into a different file, MutableCellSetSegmentScanner.java. This case demonstrates that in such a major refactoring Jira, tracking all relevant changes when rebasing is very hard to do; we are at risk of such changes occurring each time we rebase. For a long time now there has been no serious discussion regarding the contribution of this Jira and the fundamental ideas at the base of the code. What are the main reasons holding this Jira back from being pushed into master?
[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14980597#comment-14980597 ] Eshcar Hillel commented on HBASE-13408: --- Our initial design did not include a new flush API. Following the comment by [~Apache9] from July 30 and the discussion preceding it, we introduced new APIs, one for disk flush and one for in-memory flush. In hindsight, we believe this comment was apt, and making in-memory flush a first-class citizen is the right decision. Whether or not a new API is required, performing an in-memory flush must involve decisions at a higher level, specifically using the region updatesLock while moving data around inside the memstore, as is the case with disk flush - and for good reason.
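The two flush entry points discussed in this thread, a disk flush and a first-class in-memory flush, might look like the following toy sketch. All names are hypothetical; this is not the patch's actual API, only an illustration of the separation:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy sketch of a memstore with two distinct flush operations: an in-memory
// flush that only moves the active segment into the compaction pipeline
// (no I/O), and a disk flush that drains the pipeline. Hypothetical names.
class ToyCompactingMemStore {
    private long activeBytes = 0;
    private final Deque<Long> pipeline = new ArrayDeque<>(); // segment sizes

    void add(long bytes) { activeBytes += bytes; }

    // In-memory flush: in the real design this is essentially a pointer swap
    // performed under the region's update lock, then compaction runs async.
    void flushInMemory() {
        pipeline.addLast(activeBytes);
        activeBytes = 0;
    }

    // Disk flush: drains the (compacted) pipeline; returns bytes written.
    long flushToDisk() {
        long total = 0;
        while (!pipeline.isEmpty()) total += pipeline.removeFirst();
        return total;
    }

    int pipelineDepth() { return pipeline.size(); }
}
```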
[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14981101#comment-14981101 ] Eshcar Hillel commented on HBASE-13408: --- The updatesLock affects the performance of the entire region; holding it in exclusive mode should be reduced to the minimum possible time. Therefore, holding it exclusively is a decision to make at the region level. In our design we were very careful not to introduce any additional locks, and also not to introduce new code that acquires existing locks, let alone in exclusive mode. We believe this design choice is imperative for preserving the overall performance of the system.
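As a rough illustration of why exclusive holds on the region updatesLock are costly, here is the standard read-write-lock pattern the comment alludes to, using only the JDK (class and method names are hypothetical):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of a region-level updates lock: writers (puts) take
// the shared read lock and proceed concurrently, while a flush's segment
// swap takes the write lock exclusively, stalling every put on the region.
class RegionLockSketch {
    private final ReentrantReadWriteLock updatesLock =
        new ReentrantReadWriteLock();

    void put(Runnable write) {
        updatesLock.readLock().lock();   // many puts may run in parallel
        try { write.run(); } finally { updatesLock.readLock().unlock(); }
    }

    void flushSwap(Runnable swap) {
        updatesLock.writeLock().lock();  // exclusive: blocks all puts
        try { swap.run(); } finally { updatesLock.writeLock().unlock(); }
    }
}
```

Because flushSwap stalls the whole region, the design goal stated above is to keep the exclusive section down to a pointer swap and to add no new lock acquisitions.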
[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13408: -- Attachment: HBASE-13408-trunk-v08.patch
[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977783#comment-14977783 ] Eshcar Hillel commented on HBASE-13408: --- The attached patch fixes the test failures and adds support for setting the compacted memstore through HColumnDescriptor methods: String getMemStoreClassName() and HColumnDescriptor setMemStoreClass(String className).
[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14978969#comment-14978969 ] Eshcar Hillel commented on HBASE-13408: --- TestWalAndCompactedMemstoreFlush has tests that initialize a region with mixed types of memstores.
> We suggest a new compacted memstore with the following principles: > 1.The data is kept in memory for as long as possible > 2.Memstore data is either compacted or in process of being compacted > 3.Allow a panic mode, which may interrupt an in-progress compaction and > force a flush of part of the memstore. > We suggest applying this optimization only to in-memory column families. > A design document is attached. > This feature was previously discussed in HBASE-5311. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
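The compacted-memstore principles quoted above can be sketched as a toy model (all class and method names below are ours for illustration, not HBase's actual implementation): an active segment absorbs updates, an in-memory flush turns it into an immutable pipeline segment, and in-memory compaction merges the pipeline while eliminating duplicate entries per row; a flush to disk takes the already-compacted pipeline tail.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Toy model of a compacting memstore (illustrative names, not HBase code).
class ToyCompactingMemStore {
    private Map<String, String> active = new TreeMap<>();                    // absorbs updates
    private final Deque<Map<String, String>> pipeline = new ArrayDeque<>();  // immutable segments, newest first

    void put(String row, String value) { active.put(row, value); }

    // In-memory flush: make the active segment immutable and push it onto the pipeline.
    void inMemoryFlush() {
        pipeline.addFirst(active);
        active = new TreeMap<>();
    }

    // In-memory compaction: merge all pipeline segments into one,
    // keeping only the newest value per row (duplicates are eliminated).
    void compact() {
        Map<String, String> merged = new TreeMap<>();
        // Iterate oldest-to-newest so newer values overwrite older ones.
        for (Iterator<Map<String, String>> it = pipeline.descendingIterator(); it.hasNext(); ) {
            merged.putAll(it.next());
        }
        pipeline.clear();
        pipeline.addFirst(merged);
    }

    // Flush to disk takes the pipeline tail: the oldest, already-compacted segment.
    Map<String, String> flushTail() { return pipeline.pollLast(); }

    int pipelineDepth() { return pipeline.size(); }
}
```

Note how principle 1 falls out of the model: data leaves memory only via flushTail(), and compaction shrinks the pipeline instead of forcing a disk flush.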
[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13408:
---
Attachment: InMemoryMemstoreCompactionMasterEvaluationResults.pdf
HBASE-13408-trunk-v07.patch

Attaching a new patch after rebase and code-review changes. One of the changes in the code aligns the initialization of the memstore with the memstore class name configuration setting. To create a compacted memstore, one needs to configure HBase with
hbase.regionserver.memstore.class=org.apache.hadoop.hbase.regionserver.CompactedMemStore
In addition, we reproduced the benchmark results for the master code (new and original), measured under different settings and workloads. The report is attached.
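The class-name-driven memstore selection described in the update above can be sketched as follows. This is a generic reflection sketch under our own hypothetical names (MemStore, MemStoreFactory, DefaultMemStore, CompactedMemStore), not the actual HBase initialization code; only the configuration key is taken from the comment.

```java
import java.util.Properties;

// Illustrative sketch: pick the memstore implementation named by configuration.
interface MemStore {
    String name();
}

class DefaultMemStore implements MemStore {
    public String name() { return "default"; }
}

class CompactedMemStore implements MemStore {
    public String name() { return "compacted"; }
}

class MemStoreFactory {
    // Configuration key mentioned in the JIRA comment.
    static final String MEMSTORE_CLASS_KEY = "hbase.regionserver.memstore.class";

    // Instantiate whichever MemStore class the configuration names,
    // falling back to DefaultMemStore when the key is unset.
    static MemStore create(Properties conf) {
        String cls = conf.getProperty(MEMSTORE_CLASS_KEY, DefaultMemStore.class.getName());
        try {
            return (MemStore) Class.forName(cls).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate memstore class " + cls, e);
        }
    }
}
```

The point of the indirection is that a region server switches memstore implementations without any code change: only the value of hbase.regionserver.memstore.class differs between deployments.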
[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974383#comment-14974383 ] Eshcar Hillel commented on HBASE-13408:
---
Following a comment in the Jira, the compacted memstore configuration is now disconnected from the in-memory column family configuration setting; instead, it can be set at the region server level via the memstore class name attribute. We are open to suggestions on how best to set the memstore per region, and specifically on adding an additional column family attribute. In parallel, we should discuss the best way to push this branch into trunk after we've handled all the major concerns raised so far.
[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13408:
---
Attachment: HBASE-13408-trunk-v10.patch
[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006791#comment-15006791 ] Eshcar Hillel commented on HBASE-13408:
---
Hi all :) Nothing else on our table. Any feedback? The patch is available on RB. Since the release audit warning is unrelated to our patch, the last QA run is {code}+1 overall{code}
Thanks :)
[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13408:
---
Attachment: HBASE-13408-trunk-v09.patch

We attach a new patch which includes the changes required by the recent discussion. Specifically, we removed (undid) some of the changes to the HRegion and FlushPolicy classes, and moved the code for triggering an in-memory flush into the compacting memstore implementation. We excluded two changes: (1) we did not remove the StoreSegmentScanner tier from the KeyValueScanner hierarchy, as this would result in empty implementations (of the two methods we define here) in the other 5 concrete classes implementing the KeyValueScanner interface, which seems unnecessary; (2) we did not remove the snapshot; that needs to be discussed in a separate Jira, as there are pros and cons, and it shouldn't be decided without thorough discussion.
[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1464#comment-1464 ] Eshcar Hillel commented on HBASE-13408:
---
The patch is now also available on Review Board. I wasn't able to track down the audit warning. Can someone point out what the problem is? Thanks.
[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990234#comment-14990234 ] Eshcar Hillel commented on HBASE-13408:
---
Great comments and questions [~stack]. We will work on improving the document and code along the lines you suggested and the code review. Meanwhile, here are some answers and clarifications:

bq. The part that will be flushed is the 'compacted' part?
Yes. Specifically, it is the tail of the compaction pipeline, which is comprised of a list of segments.

bq. On name of the config., I think it should be IN_MEMORY_COMPACTION rather than COMPACTED
We'll change the name; however, we feel it is better to have it off by default, at least until users/applications are fully aware of the implications of this feature.

bq. Can the in-memory flush use same code as the flush-to-disk flush? Ditto on compaction?
Flush: no; compaction: yes. An in-memory flush modifies in-memory data structures, while a disk flush writes to disk. Once the compacted memstore fully supports the HFile format, it can share the same compaction code.

bq. what is the above (flushtotalsize) for?
bq. can you be more clear on where the threshold for flush to disk is?
Currently a flush is triggered when the memstore size reaches 128MB, but a region can tolerate an even larger memstore before blocking the update operation. So there is a lower bound for triggering a flush and an upper bound for triggering a flush while blocking update operations. With flush-total-size we attempt to further refine these boundaries, making the lower bound soft instead of hard. In the new solution, a region can tolerate a memstore larger than 128MB (but smaller than flush-total-size) before calling a flush to disk, knowing that the size is not necessarily monotonically increasing between flushes. We distinguish between the data in active segments (which are still bounded by 128MB) and the overflow segments being compacted. The size of all data in the memstore is bounded by flush-total-size, where flush-size < flush-total-size < flush-blocking-size.

bq. What is a snapshot in this scheme? we have to do a merge sort on flush to make the hfile?
The snapshot is a single immutable segment that is *not* subject to compaction. There is no need to do a merge sort on flush to disk.

bq. Do we hold the region lock while we compact the in-memory segments on a column family? Every time a compaction runs, it compacts all segments in the pipeline?
No: the lock is held only while making the changes to the in-memory data structures, namely removing the tail segment from the compaction pipeline and crossing it over to the snapshot. Yes: currently a compaction compacts all segments in the pipeline.

bq. I'm not sure I follow the approximation of oldest sequence id.
This was explained in posts between July 23 and July 30. We can explain it again if required.

bq. Do you have a rig where you can try out your implementation apart from running it inside a regionserver?
What do you mean by rig? If you mean a benchmark environment, then no. If you mean tests, these are included in the patch.

bq. we talking about adding one more thread - a compacting thread - per Store?
In the new design, the threads are run by the region server executor.

bq. On MemstoreScanner, we are keeping the fact that the implementation is crossing Segments an internal implementation detail?
Yes.

bq. I suppose you'll deliver a skiplist version first and then move on to work on in-memory storefile, a more compact in-memory representation?
This is a task that should definitely be completed; HBASE-10713 is a good starting point.

bq. Seems like the whole notion of snapshot should not be exposed to the client. It is an implementation detail of the original memstore, the defaultmemstore, something that we should try not expose.
Agreed; however, this seems out of scope for the current Jira, which focuses on in-memory compaction.
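The three boundaries in the flush-total-size answer above (flush-size < flush-total-size < flush-blocking-size) can be sketched as a decision function. The constants, names, and the specific values other than the 128MB flush size mentioned in the comment are ours for illustration, not HBase defaults:

```java
// Illustrative flush-decision sketch for the three boundaries discussed above.
class FlushPolicySketch {
    static final long FLUSH_SIZE = 128L << 20;        // soft bound on active-segment data (128MB)
    static final long FLUSH_TOTAL_SIZE = 192L << 20;  // bound on all memstore data -> flush to disk (illustrative)
    static final long BLOCKING_SIZE = 512L << 20;     // hard bound -> block updates (illustrative)

    enum Action { NONE, IN_MEMORY_FLUSH, DISK_FLUSH, BLOCK_UPDATES }

    // activeSize: data in the active segment; totalSize: active plus pipeline segments.
    static Action decide(long activeSize, long totalSize) {
        if (totalSize >= BLOCKING_SIZE) return Action.BLOCK_UPDATES;
        if (totalSize >= FLUSH_TOTAL_SIZE) return Action.DISK_FLUSH;
        if (activeSize >= FLUSH_SIZE) return Action.IN_MEMORY_FLUSH;  // push active into the pipeline
        return Action.NONE;
    }
}
```

The middle band is the point of the design: while total size sits between flush-size and flush-total-size, the active segment can overflow into the pipeline and be compacted in memory instead of being written to disk.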
[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eshcar Hillel updated HBASE-13408:
---
Attachment: HBaseIn-MemoryMemstoreCompactionDesignDocument-ver03.pdf

Hi All,
We compiled a new design document (attached) capturing all the changes (we noticed there have been many since the original design suggestion). In this new design document the behavior of the compacted memstore is confined mainly to the scope of the memstore; however, some minimal changes are made at the region level, in order to give the compacted memstore some slack to manage the in-memory flushes and in-memory compaction. Next we plan to prepare the patch; the main changes with respect to the current patch would be to remove most of the code changes at the region level, and to allow the compacted memstore access to the region lock to apply in-memory flushes.