[jira] [Commented] (QUICKSTEP-46) Fault tolerance in bulk loading data
[ https://issues.apache.org/jira/browse/QUICKSTEP-46?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15615464#comment-15615464 ]

ASF GitHub Bot commented on QUICKSTEP-46:
-----------------------------------------

Github user pateljm commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/103

    @tarunbansal I can close this. Can you rebase? Thanks!

> Fault tolerance in bulk loading data
> ------------------------------------
>
>                 Key: QUICKSTEP-46
>                 URL: https://issues.apache.org/jira/browse/QUICKSTEP-46
>             Project: Apache Quickstep
>          Issue Type: Improvement
>          Components: Storage
>            Reporter: Harshad Deshmukh
>
> Background: The bulk load ("COPY FROM" command) of data into Quickstep tables
> can't handle errors gracefully. Some examples are: A faulty row with fewer
> number of columns than the original table, attribute mismatch or misalignment
> etc.
> Proposed solutions:
> 1. Ignore the discarded row and move on to the next row, instead of
> terminating the whole process. (Easiest to implement and most practical)
> 2. Let user choose an action as to what to do with the erroneous tuple -
> discard the tuple or supply a value for the missing column.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
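
The first proposed solution in QUICKSTEP-46 (skip a faulty row and keep loading) can be illustrated with a minimal sketch. This is not the code from pull request 103; `LoadResult`, `splitLine` and `loadTolerant` are hypothetical names, and the sketch assumes a simple single-character delimiter with no quoting or escaping. The only fault it detects is a row whose column count differs from the table's.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type: rows that parsed cleanly, plus the 1-based line
// numbers of the rows that were skipped.
struct LoadResult {
  std::vector<std::vector<std::string>> rows;
  std::vector<std::size_t> skipped_lines;
};

// Splits one line on `delim`. Assumes no quoting or escaping.
std::vector<std::string> splitLine(const std::string &line, char delim) {
  std::vector<std::string> fields;
  std::string field;
  for (char c : line) {
    if (c == delim) {
      fields.push_back(field);
      field.clear();
    } else {
      field.push_back(c);
    }
  }
  fields.push_back(field);  // Last field (possibly empty).
  return fields;
}

// Loads every row with exactly `num_columns` fields. A row with a different
// column count is recorded and skipped instead of aborting the whole load.
LoadResult loadTolerant(std::istream &in, std::size_t num_columns, char delim) {
  LoadResult result;
  std::string line;
  std::size_t line_no = 0;
  while (std::getline(in, line)) {
    ++line_no;
    std::vector<std::string> fields = splitLine(line, delim);
    if (fields.size() != num_columns) {
      result.skipped_lines.push_back(line_no);
      continue;  // Move on to the next row.
    }
    result.rows.push_back(std::move(fields));
  }
  return result;
}
```

Recording the skipped line numbers, rather than silently dropping rows, leaves room for the second proposal: a caller could report them to the user or retry them with a supplied default value.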
[jira] [Commented] (QUICKSTEP-59) Improve performance of BitVector in Quickstep by eliminating branches
[ https://issues.apache.org/jira/browse/QUICKSTEP-59?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15615586#comment-15615586 ]

ASF GitHub Bot commented on QUICKSTEP-59:
-----------------------------------------

Github user saketj closed the pull request at:

    https://github.com/apache/incubator-quickstep/pull/114

> Improve performance of BitVector in Quickstep by eliminating branches
> ---------------------------------------------------------------------
>
>                 Key: QUICKSTEP-59
>                 URL: https://issues.apache.org/jira/browse/QUICKSTEP-59
>             Project: Apache Quickstep
>          Issue Type: Improvement
>          Components: Query Execution, Utility
>            Reporter: Saket Saurabh
>            Assignee: Saket Saurabh
>         Attachments: QUICKSTEP-59.01.patch
>
> The {{setBitRegularVersion()}} of {{BitVector.hpp}} is a critical function
> that is called in various tight loop iterations over storage blocks
> throughout the Quickstep code. This function has a simple purpose of setting
> a bit value in a BitVector to true/false given a boolean argument. However,
> it has an expensive if-else branch that can add a significant penalty at
> runtime due to branch mis-predictions.
> This short PR completely removes branching from the
> {{setBitRegularVersion()}} by replacing the same functionality with a set of
> bitwise arithmetic operations. Given that a branch mis-prediction costs about
> 10 cycles, the branchless code is expected to save those precious 10 cycles
> at the slight expense of 4 additional bitwise operations (an additional 2-4
> cycles only, given hyper-threading).
> *Tests:*
> The existing unit tests should already cover the changes introduced by this
> PR. Correctness also verified by comparing TPC-H query output results.
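
The idea described in QUICKSTEP-59 (replace the if-else in a set-bit routine with bitwise arithmetic) can be sketched as follows. This is not the attached patch and not the real `setBitRegularVersion()`; the function names, the 64-bit word size and the least-significant-bit-first layout are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Branching version: the path taken depends on `value`, so the CPU has to
// predict it on every call.
inline void setBitBranching(std::uint64_t *words, std::size_t bit, bool value) {
  const std::uint64_t mask = std::uint64_t{1} << (bit & 63);
  if (value) {
    words[bit >> 6] |= mask;
  } else {
    words[bit >> 6] &= ~mask;
  }
}

// Branchless version: clear the target bit unconditionally, then OR in
// `value` (converted to 0 or 1) shifted to the same position.
inline void setBitBranchless(std::uint64_t *words, std::size_t bit, bool value) {
  const std::size_t shift = bit & 63;
  std::uint64_t &word = words[bit >> 6];
  word = (word & ~(std::uint64_t{1} << shift)) |
         (static_cast<std::uint64_t>(value) << shift);
}
```

Both functions leave the word in the same state for every input; the second simply does a fixed sequence of shift, NOT, AND and OR operations regardless of `value`. Whether that is faster in practice depends on how predictable the branch was, so it is worth measuring on the target workload.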
[jira] [Commented] (QUICKSTEP-59) Improve performance of BitVector in Quickstep by eliminating branches
[ https://issues.apache.org/jira/browse/QUICKSTEP-59?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15615587#comment-15615587 ]

ASF GitHub Bot commented on QUICKSTEP-59:
-----------------------------------------

Github user saketj commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/114

    Merged in master
[jira] [Commented] (QUICKSTEP-46) Fault tolerance in bulk loading data
[ https://issues.apache.org/jira/browse/QUICKSTEP-46?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15615651#comment-15615651 ]

ASF GitHub Bot commented on QUICKSTEP-46:
-----------------------------------------

Github user tarunbansal commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/103

    @pateljm Rebase done.
[jira] [Commented] (QUICKSTEP-60) Delay memory allocation for hash tables
[ https://issues.apache.org/jira/browse/QUICKSTEP-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15615711#comment-15615711 ]

Harshad Deshmukh commented on QUICKSTEP-60:
-------------------------------------------

As I work towards fixing this issue, I realize that my earlier interpretation was incorrect. Memory allocation for hash tables doesn't happen in ExecutionGenerator. In ExecutionGenerator (i.e. the final step in the plan generation), a placeholder for each hash table is created, with no real memory allocation. The actual memory is allocated during QueryContext construction, which is initiated in Foreman (the query execution coordinator).

The problem is that the memory for all the hash tables in a query plan is allocated at once, which causes the memory shortage described earlier. Therefore, we need to delay the memory allocation for each hash table until an operator actually needs it.

> Delay memory allocation for hash tables
> ---------------------------------------
>
>                 Key: QUICKSTEP-60
>                 URL: https://issues.apache.org/jira/browse/QUICKSTEP-60
>             Project: Apache Quickstep
>          Issue Type: Improvement
>          Components: Query Execution, Query Optimizer, Relational Operators
>            Reporter: Harshad Deshmukh
>            Assignee: Harshad Deshmukh
>              Labels: performance
>
> Currently, Quickstep allocates memory for the hash tables during the query
> execution plan generation (specifically in ExecutionGenerator). The memory
> allocation results in a memory reservation in the buffer pool. In some cases
> (e.g. TPC-H Q21), the estimated memory requirements for the hash tables in
> the query plan are huge. This means that during the query execution there is
> an artificial shortage of memory. This degrades the performance, as it leads
> to evictions.
> To avoid this issue, the query scheduler should allocate memory for the hash
> tables as and when it is needed, during the query execution.
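
The delayed allocation proposed in QUICKSTEP-60 can be sketched roughly as below. This is only an illustration of the general technique (allocate on first use), not Quickstep's QueryContext or hash table code; `LazyHashTable`, `QueryContextSketch` and the bucket array standing in for a real hash table are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical placeholder for one hash table: it remembers the estimated
// size but allocates nothing until buckets() is first called.
class LazyHashTable {
 public:
  explicit LazyHashTable(std::size_t estimated_buckets)
      : estimated_buckets_(estimated_buckets) {}

  bool allocated() const { return buckets_ != nullptr; }

  // Allocates the bucket array on first use and returns it.
  std::vector<std::int64_t> &buckets() {
    if (!buckets_) {
      buckets_.reset(new std::vector<std::int64_t>(estimated_buckets_, 0));
    }
    return *buckets_;
  }

 private:
  std::size_t estimated_buckets_;
  std::unique_ptr<std::vector<std::int64_t>> buckets_;
};

// Hypothetical stand-in for a query context: constructing it registers the
// placeholders only, so no hash table memory is reserved up front.
class QueryContextSketch {
 public:
  std::size_t addHashTable(std::size_t estimated_buckets) {
    tables_.emplace_back(estimated_buckets);
    return tables_.size() - 1;
  }

  LazyHashTable &hashTable(std::size_t id) {
    assert(id < tables_.size());
    return tables_[id];
  }

  // Number of hash tables whose memory has actually been allocated.
  std::size_t numAllocated() const {
    std::size_t count = 0;
    for (const LazyHashTable &table : tables_) {
      if (table.allocated()) ++count;
    }
    return count;
  }

 private:
  std::vector<LazyHashTable> tables_;
};
```

In this sketch a plan with many hash tables holds memory only for the tables that an operator has already touched. A real implementation would also need to handle concurrent first access from multiple worker threads, which is omitted here.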
[jira] [Commented] (QUICKSTEP-46) Fault tolerance in bulk loading data
[ https://issues.apache.org/jira/browse/QUICKSTEP-46?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616240#comment-15616240 ]

ASF GitHub Bot commented on QUICKSTEP-46:
-----------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-quickstep/pull/103
[jira] [Commented] (QUICKSTEP-46) Fault tolerance in bulk loading data
[ https://issues.apache.org/jira/browse/QUICKSTEP-46?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616242#comment-15616242 ]

ASF GitHub Bot commented on QUICKSTEP-46:
-----------------------------------------

Github user pateljm commented on the issue:

    https://github.com/apache/incubator-quickstep/pull/103

    @tarunbansal Merged. Thanks for this contribution!