[jira] [Updated] (DRILL-6071) Limit batch size for flatten operator
[ https://issues.apache.org/jira/browse/DRILL-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bridget Bevens updated DRILL-6071:
    Labels: ready-to-commit (was: doc-impacting ready-to-commit)

> Limit batch size for flatten operator
>
>                 Key: DRILL-6071
>                 URL: https://issues.apache.org/jira/browse/DRILL-6071
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Flow
>    Affects Versions: 1.12.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Major
>              Labels: ready-to-commit
>             Fix For: 1.13.0
>
>
> The flatten operator currently uses an adaptive algorithm to control the size
> of its outgoing batches.
> While processing an input batch, it adjusts the number of records in the
> outgoing batch based on the memory used so far. Once memory usage exceeds the
> configured per-batch limit, the algorithm checks more often, readjusting the
> limit halfway through and at the end of every batch. This periodic checking of
> memory usage is unnecessary overhead and hurts performance. It is also
> reactive: we only learn that the limit was exceeded after the fact.
> Instead, determine from the incoming batch how many rows the outgoing batch
> should hold.
> To do that, compute the average row size of the outgoing batch and, from it,
> how many rows fit in a given amount of memory. The value vectors provide the
> information needed to compute this.
> The row count of the output batch should be derived from memory (with a
> minimum of 1 and a maximum of 64K rows) rather than hard-coded (to 4K). The
> memory for the output batch should be a configurable system option.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
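The sizing rule proposed in the issue can be sketched in Java, Drill's implementation language. This is a minimal illustration under stated assumptions, not the code committed for DRILL-6071: the class and method names are hypothetical, the byte counts that Drill would read from the value vectors are passed in as plain numbers, and the estimate of a flattened row's width (the non-flattened columns of the parent row plus one array element) is an assumption of this sketch.

```java
/**
 * Hypothetical sketch of the DRILL-6071 sizing idea: compute the outgoing row
 * count once from a memory budget and an average row width, instead of checking
 * memory repeatedly or hard-coding 4K rows. Names are illustrative, not Drill APIs.
 */
public final class FlattenBatchSizer {

    /** Bounds on rows per outgoing batch, as stated in the issue (1 to 64K). */
    static final int MIN_ROWS = 1;
    static final int MAX_ROWS = 64 * 1024;

    private FlattenBatchSizer() {}

    /**
     * Estimates the average width in bytes of one flattened output row.
     * Assumption: each output row repeats the non-flattened columns of its
     * parent input row and carries exactly one element of the flattened array.
     *
     * @param otherColumnBytes bytes of all non-flattened columns in the incoming batch
     * @param arrayDataBytes   bytes of all elements in the column being flattened
     * @param inputRows        rows in the incoming batch
     * @param arrayElements    total array elements, i.e. output rows produced
     */
    static double avgOutputRowBytes(long otherColumnBytes, long arrayDataBytes,
                                    int inputRows, long arrayElements) {
        if (inputRows <= 0 || arrayElements <= 0) {
            return 0.0; // empty input: no basis for an estimate
        }
        double otherPerRow = (double) otherColumnBytes / inputRows;
        double elementWidth = (double) arrayDataBytes / arrayElements;
        return otherPerRow + elementWidth;
    }

    /** Rows that fit in the memory budget, clamped to [MIN_ROWS, MAX_ROWS]. */
    static int outputRowCount(long memoryBudgetBytes, double avgRowBytes) {
        if (avgRowBytes <= 0) {
            return MAX_ROWS; // no width measured: fall back to the upper bound
        }
        long rows = (long) Math.floor(memoryBudgetBytes / avgRowBytes);
        return (int) Math.max(MIN_ROWS, Math.min(MAX_ROWS, rows));
    }

    public static void main(String[] args) {
        // Incoming batch: 1,000 rows with 40,000 bytes of other columns (40 B/row)
        // and 10,000 array elements holding 80,000 bytes (8 B/element).
        double avg = avgOutputRowBytes(40_000, 80_000, 1_000, 10_000);
        int rows = outputRowCount(1L << 20, avg); // 1 MiB budget
        System.out.println(avg + " bytes/row -> " + rows + " rows"); // prints "48.0 bytes/row -> 21845 rows"
    }
}
```

With a 48-byte average row, a 1 MiB budget yields 21,845 rows, while a 16 MiB budget would allow about 349,525 rows and is clamped to the 64K maximum. Because the count is computed once per incoming batch, no mid-batch memory checks are needed.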
[jira] [Updated] (DRILL-6071) Limit batch size for flatten operator
[ https://issues.apache.org/jira/browse/DRILL-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pritesh Maker updated DRILL-6071:
    Labels: doc-impacting ready-to-commit (was: ready-to-commit)
[jira] [Updated] (DRILL-6071) Limit batch size for flatten operator
[ https://issues.apache.org/jira/browse/DRILL-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pritesh Maker updated DRILL-6071:
    Reviewer: Paul Rogers
[jira] [Updated] (DRILL-6071) Limit batch size for flatten operator
[ https://issues.apache.org/jira/browse/DRILL-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pritesh Maker updated DRILL-6071:
    Issue Type: Improvement (was: Bug)
[jira] [Updated] (DRILL-6071) Limit batch size for flatten operator
[ https://issues.apache.org/jira/browse/DRILL-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Padma Penumarthy updated DRILL-6071:
    Labels: ready-to-commit (was: )
[jira] [Updated] (DRILL-6071) Limit batch size for flatten operator
[ https://issues.apache.org/jira/browse/DRILL-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Padma Penumarthy updated DRILL-6071:
    Description:
flatten currently uses an adaptive algorithm to control the outgoing batch size.
While processing the input batch, it adjusts the number of records in outgoing batch based on memory usage so far. Once memory usage exceeds the configured limit for a batch, the algorithm becomes more proactive and adjusts the limit half way through and end of every batch. All this periodic checking of memory usage is unnecessary overhead and impacts performance. Also, we will know only after the fact.
Instead, figure out how many rows should be there in the outgoing batch from incoming batch.
The way to do that would be to figure out average row size of the outgoing batch and based on that figure out how many rows can be there for a given amount of memory. value vectors provide us the necessary information to be able to figure this out.
Row count in output batch should be decided based on memory (with min 1 and max 64k rows) and not hard coded (to 4K) in code. Memory for output batch should be configurable system option.

  was:
flatten currently uses an adaptive algorithm to control the outgoing batch size.
While processing the input batch, it adjusts the number of records in outgoing batch based on memory usage so far. Once memory usage exceeds the configured limit for a batch, the algorithm becomes more proactive and adjusts the limit half way through and end of every batch. All this periodic checking of memory usage is unnecessary overhead and impacts performance. Also, we will know only after the fact.
Instead, figure out how many rows should be there in the outgoing batch from incoming batch.
The way to do that would be to figure out average row size of the outgoing batch and based on that figure out how many rows can be there for a given amount of memory. value vectors provide us the necessary information to be able to figure this out.
Row count in output batch should be decided based on memory (with min 1 and max 64k rows) and not hard coded (to 4K) in code. Output batch size should be configurable system option.
[jira] [Updated] (DRILL-6071) Limit batch size for flatten operator
[ https://issues.apache.org/jira/browse/DRILL-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Padma Penumarthy updated DRILL-6071:
    Description:
flatten currently uses an adaptive algorithm to control the outgoing batch size.
While processing the input batch, it adjusts the number of records in outgoing batch based on memory usage so far. Once memory usage exceeds the configured limit for a batch, the algorithm becomes more proactive and adjusts the limit half way through and end of every batch. All this periodic checking of memory usage is unnecessary overhead and impacts performance. Also, we will know only after the fact.
Instead, figure out how many rows should be there in the outgoing batch from incoming batch.
The way to do that would be to figure out average row size of the outgoing batch and based on that figure out how many rows can be there for a given amount of memory. value vectors provide us the necessary information to be able to figure this out.
Row count in output batch should be decided based on memory (with min 1 and max 64k rows) and not hard coded (to 4K) in code. Output batch size should be configurable system option.

  was:
flatten currently uses an adaptive algorithm to control the outgoing batch size.
While processing the input batch, it adjusts the number of records in outgoing batch based on memory usage so far. Once memory usage exceeds the configured limit for a batch, the algorithm becomes more proactive and adjusts the limit half way through and end of every batch. All this periodic checking of memory usage is unnecessary overhead and impacts performance. Also, we will know only after the fact.
Instead, figure out how many rows should be there in the outgoing batch from incoming batch.
The way to do that would be to figure out average row size of the outgoing batch and based on that figure out how many rows can be there for a given amount of memory. value vectors provide us the necessary information to be able to figure this out.
[jira] [Updated] (DRILL-6071) Limit batch size for flatten operator
[ https://issues.apache.org/jira/browse/DRILL-6071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Padma Penumarthy updated DRILL-6071:
    Description:
flatten currently uses an adaptive algorithm to control the outgoing batch size.
While processing the input batch, it adjusts the number of records in outgoing batch based on memory usage so far. Once memory usage exceeds the configured limit for a batch, the algorithm becomes more proactive and adjusts the limit half way through and end of every batch. All this periodic checking of memory usage is unnecessary overhead and impacts performance. Also, we will know only after the fact.
Instead, figure out how many rows should be there in the outgoing batch from incoming batch.
The way to do that would be to figure out average row size of the outgoing batch and based on that figure out how many rows can be there for a given amount of memory. value vectors provide us the necessary information to be able to figure this out.

  was:
flatten currently uses an adaptive algorithm to control the outgoing batch size.
While processing the input batch, it adjusts the number of records in outgoing batch based on memory usage so far. Once memory usage exceeds the configured limit, the algorithm becomes more proactive and adjusts the limit half way through and end of every batch. All this periodic checking of memory usage is unnecessary overhead and impacts performance. Also, we will know only after the fact.
Instead, figure out how many rows should be there in the outgoing batch from incoming batch.
The way to do that would be to figure out average row size of the outgoing batch and based on that figure out how many rows can be there for a given amount of memory. value vectors provide us the necessary information to be able to figure this out.