[jira] [Commented] (QUICKSTEP-46) Fault tolerance in bulk loading data

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/QUICKSTEP-46?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15615464#comment-15615464
 ] 

ASF GitHub Bot commented on QUICKSTEP-46:
-

Github user pateljm commented on the issue:

https://github.com/apache/incubator-quickstep/pull/103
  
@tarunbansal I can close this. Can you rebase? Thanks! 


> Fault tolerance in bulk loading data
> 
>
> Key: QUICKSTEP-46
> URL: https://issues.apache.org/jira/browse/QUICKSTEP-46
> Project: Apache Quickstep
> Issue Type: Improvement
> Components: Storage
> Reporter: Harshad Deshmukh
>
> Background: The bulk load ("COPY FROM" command) of data into Quickstep tables 
> cannot handle errors gracefully. Examples include a faulty row with fewer 
> columns than the target table, an attribute mismatch, or a misalignment. 
> Proposed solutions: 
> 1. Discard the faulty row and move on to the next row, instead of 
> terminating the whole process. (Easiest to implement and most practical)
> 2. Let the user choose what to do with the erroneous tuple: 
> discard the tuple or supply a value for the missing column.
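
The two proposals above can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: `LoadResult`, `BadRowPolicy`, `SplitFields`, and `LoadTolerant` are hypothetical names, the input is assumed to be one row per line with a single-character delimiter, and no quoting or escaping is handled.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical policy for rows that have too few columns.
enum class BadRowPolicy { kDiscard, kPadWithDefault };

// Hypothetical result type; not a Quickstep class.
struct LoadResult {
  std::vector<std::vector<std::string>> rows;  // Accepted rows.
  std::vector<std::size_t> rejected_lines;     // 1-based numbers of discarded lines.
};

// Splits one line on `delim`. An empty trailing field is kept.
std::vector<std::string> SplitFields(const std::string &line, char delim) {
  std::vector<std::string> fields;
  std::string field;
  for (char c : line) {
    if (c == delim) {
      fields.push_back(field);
      field.clear();
    } else {
      field.push_back(c);
    }
  }
  fields.push_back(field);
  return fields;
}

// Reads every line and keeps going after a faulty row instead of aborting.
LoadResult LoadTolerant(std::istream &in, std::size_t num_columns, char delim,
                        BadRowPolicy policy = BadRowPolicy::kDiscard,
                        const std::string &default_value = "") {
  LoadResult result;
  std::string line;
  std::size_t line_no = 0;
  while (std::getline(in, line)) {
    ++line_no;
    std::vector<std::string> fields = SplitFields(line, delim);
    // Proposal 2: optionally fill in the missing trailing columns.
    if (fields.size() < num_columns && policy == BadRowPolicy::kPadWithDefault) {
      fields.resize(num_columns, default_value);
    }
    // Proposal 1: record the faulty row and move on to the next one.
    if (fields.size() != num_columns) {
      result.rejected_lines.push_back(line_no);
      continue;
    }
    result.rows.push_back(std::move(fields));
  }
  return result;
}
```

Recording the rejected line numbers, rather than dropping rows silently, lets the caller report what was skipped. Rows with too many columns are always rejected in this sketch, since there is no obvious way to repair them.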



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (QUICKSTEP-59) Improve performance of BitVector in Quickstep by eliminating branches

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/QUICKSTEP-59?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15615586#comment-15615586
 ] 

ASF GitHub Bot commented on QUICKSTEP-59:
-

Github user saketj closed the pull request at:

https://github.com/apache/incubator-quickstep/pull/114


> Improve performance of BitVector in Quickstep by eliminating branches
> -
>
> Key: QUICKSTEP-59
> URL: https://issues.apache.org/jira/browse/QUICKSTEP-59
> Project: Apache Quickstep
> Issue Type: Improvement
> Components: Query Execution, Utility
> Reporter: Saket Saurabh
> Assignee: Saket Saurabh
> Attachments: QUICKSTEP-59.01.patch
>
>
> {{setBitRegularVersion()}} in {{BitVector.hpp}} is a critical function 
> that is called in tight loops over storage blocks throughout the Quickstep 
> code. Its purpose is simple: set a bit in a BitVector to true or false 
> according to a boolean argument. However, it contains an if-else branch 
> that can add a significant runtime penalty due to branch mispredictions. 
> This short PR removes the branch from {{setBitRegularVersion()}} by 
> implementing the same functionality with a set of bitwise operations. 
> Given that a branch misprediction costs about 10 cycles, the branchless 
> code is expected to save those cycles on each misprediction at the slight 
> expense of 4 additional bitwise operations (an additional 2-4 cycles 
> only, given hyper-threading).
> *Tests:*
> The existing unit tests should already cover the changes introduced by this 
> PR. Correctness was also verified by comparing TPC-H query output results.
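
As an illustration of the idea described above, the following sketch contrasts a branching bit setter with a branchless one. It is not the code from QUICKSTEP-59.01.patch: the function names, the 64-bit word layout, and the least-significant-bit-first ordering are all assumptions made for the example, and the real `BitVector` may store bits differently.

```cpp
#include <cstddef>
#include <cstdint>

// Branching version: which statement runs depends on `value`, so the CPU
// has to predict the outcome of the comparison.
inline void SetBitBranching(std::uint64_t *words, std::size_t pos, bool value) {
  const std::uint64_t mask = std::uint64_t{1} << (pos & 63);
  if (value) {
    words[pos >> 6] |= mask;
  } else {
    words[pos >> 6] &= ~mask;
  }
}

// Branchless version: always clear the target bit, then OR in `value`
// shifted to the same position. No data-dependent jump is written here.
inline void SetBitBranchless(std::uint64_t *words, std::size_t pos, bool value) {
  const std::size_t shift = pos & 63;       // Bit offset inside the word.
  std::uint64_t &word = words[pos >> 6];    // Word that holds bit `pos`.
  word = (word & ~(std::uint64_t{1} << shift)) |
         (static_cast<std::uint64_t>(value) << shift);
}
```

Both functions produce the same result for every input; only the instruction sequence differs. Whether the branchless form is faster in practice depends on the compiler and on how predictable `value` is in the calling loop, so it is worth measuring rather than assuming.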





[jira] [Commented] (QUICKSTEP-59) Improve performance of BitVector in Quickstep by eliminating branches

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/QUICKSTEP-59?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15615587#comment-15615587
 ] 

ASF GitHub Bot commented on QUICKSTEP-59:
-

Github user saketj commented on the issue:

https://github.com/apache/incubator-quickstep/pull/114
  
Merged in master




[jira] [Commented] (QUICKSTEP-46) Fault tolerance in bulk loading data

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/QUICKSTEP-46?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15615651#comment-15615651
 ] 

ASF GitHub Bot commented on QUICKSTEP-46:
-

Github user tarunbansal commented on the issue:

https://github.com/apache/incubator-quickstep/pull/103
  
@pateljm Rebase done.




[jira] [Commented] (QUICKSTEP-60) Delay memory allocation for hash tables

2016-10-28 Thread Harshad Deshmukh (JIRA)

[ 
https://issues.apache.org/jira/browse/QUICKSTEP-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15615711#comment-15615711
 ] 

Harshad Deshmukh commented on QUICKSTEP-60:
---

As I work towards fixing this issue, I realize that my earlier 
interpretation was incorrect. Memory allocation for hash tables doesn't happen 
in ExecutionGenerator. In ExecutionGenerator (i.e. the final step in plan 
generation), only a placeholder for each hash table is created, with no real 
memory allocation. The actual memory is allocated during QueryContext 
construction, which is initiated in Foreman (the query execution coordinator). 

The problem is that memory for all the hash tables in a query plan is 
allocated at once, which causes the artificial memory shortage described in 
the issue. Therefore, we need to delay the hash table memory allocation for 
each operator until it is actually needed. 

> Delay memory allocation for hash tables
> ---
>
> Key: QUICKSTEP-60
> URL: https://issues.apache.org/jira/browse/QUICKSTEP-60
> Project: Apache Quickstep
> Issue Type: Improvement
> Components: Query Execution, Query Optimizer, Relational Operators
> Reporter: Harshad Deshmukh
> Assignee: Harshad Deshmukh
> Labels: performance
>
> Currently, Quickstep allocates memory for the hash tables during query 
> execution plan generation (specifically in ExecutionGenerator). The 
> allocation reserves memory in the buffer pool. In some cases 
> (e.g. TPC-H Q21), the estimated memory requirements for the hash tables in 
> the query plan are huge. This means that during query execution there is an 
> artificial shortage of memory, which degrades performance because it leads 
> to evictions. 
> To avoid this issue, the query scheduler should allocate memory for each 
> hash table only when it is needed during query execution. 
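
One way to picture the proposed change is a placeholder that records the estimated size up front and allocates the table only on first use. The sketch below is an assumption-laden illustration, not Quickstep code: `HashTable` and `LazyHashTable` are hypothetical classes, a plain `std::vector` stands in for buffer-pool memory, and the class is not thread-safe (a real version shared between worker threads would need synchronization).

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a join hash table. Constructing it is the
// expensive step because it reserves all of its buckets immediately.
class HashTable {
 public:
  explicit HashTable(std::size_t num_buckets) : buckets_(num_buckets, 0) {}
  std::size_t num_buckets() const { return buckets_.size(); }

 private:
  std::vector<std::uint64_t> buckets_;
};

// Placeholder created at plan time. It stores only the size estimate and
// builds the real table the first time an operator asks for it.
class LazyHashTable {
 public:
  explicit LazyHashTable(std::size_t estimated_buckets)
      : estimated_buckets_(estimated_buckets) {}

  bool allocated() const { return table_ != nullptr; }

  // Allocates on first call; later calls return the same table.
  HashTable &get() {
    if (!table_) {
      table_ = std::make_unique<HashTable>(estimated_buckets_);
    }
    return *table_;
  }

  // Frees the memory once no operator needs the table any more.
  void release() { table_.reset(); }

 private:
  std::size_t estimated_buckets_;
  std::unique_ptr<HashTable> table_;
};
```

With this shape, building the placeholders for every hash table in a plan costs almost nothing, and memory is reserved only for the tables whose operators are actually running.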





[jira] [Commented] (QUICKSTEP-46) Fault tolerance in bulk loading data

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/QUICKSTEP-46?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616240#comment-15616240
 ] 

ASF GitHub Bot commented on QUICKSTEP-46:
-

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-quickstep/pull/103




[jira] [Commented] (QUICKSTEP-46) Fault tolerance in bulk loading data

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/QUICKSTEP-46?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616242#comment-15616242
 ] 

ASF GitHub Bot commented on QUICKSTEP-46:
-

Github user pateljm commented on the issue:

https://github.com/apache/incubator-quickstep/pull/103
  
@tarunbansal Merged. Thanks for this contribution!

