[jira] [Updated] (DRILL-5125) Provide option to use generic code for sv remover

2017-04-06 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5125:

Labels:   (was: ready-to-commit)

> Provide option to use generic code for sv remover
> -
>
> Key: DRILL-5125
> URL: https://issues.apache.org/jira/browse/DRILL-5125
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Priority: Minor
>
> Consider a non-typical Drill query: one with 6000 rows but 243 fields. 
> Consider this query:
> {code}
> select * from (select *, row_number() over(order by somedate) as rn from 
> dfs.`/some/path/data.json`) where rn=10
> {code}
> This produces a query with the following structure:
> {code}
> 00-00Screen
> 00-01  ProjectAllowDup(*=[$0], rn=[$1])
> 00-02Project(T0¦¦*=[$0], w0$o0=[$2])
> 00-03  SelectionVectorRemover
> 00-04Filter(condition=[=($2, 10)])
> 00-05  Window(window#0=[window(partition {} order by [1] rows 
> between UNBOUNDED PRECEDING and CURRENT ROW aggs [ROW_NUMBER()])])
> 00-06SelectionVectorRemover
> 00-07  Sort(sort0=[$1], dir0=[ASC])
> 00-08Project(T0¦¦*=[$0], validitydate=[$1])
> 00-09  Scan(groupscan=...)
> {code}
> Instrumenting, the code to measure compile time, two “long poles” stood out:
> {code}
> Compile Time for org.apache.drill.exec.test.generated.CopierGen3: 500
> Compile Time for org.apache.drill.exec.test.generated.CopierGen8: 1659
> {code}
> Much of the initial run time of 5578 ms is taken up in compiling two classes 
> (2159 ms).
> The classes themselves are very simple: create member variables for 486 
> vectors (2 x column count), and call a method on each to do the copy. The 
> only type-specific work is the member variable and call to the (non-virtual) 
> CopyFrom or CopyFromSafe methods. The generated class can easily be replaced 
> by a “generic” class and virtual functions in the vector classes to choose 
> the correct copy method.
> Clearly, avoiding code gen means avoiding the compile times with a first-run 
> savings. Here are the last 8 runs (out of 10), with code cached turned off 
> (forcing a compile on each query run), with and without the generic versions:
> * Original (no code cache): 1832 ms / run
> * Generic (no code cache): 1317 ms / run
> This demonstrates the expected outcome: avoiding compilation of generated 
> code saves ~500 ms per run (or 28%). (Note: the numbers above were obtained 
> on a version of the code that already had various optimizations described in 
> other JIRA entries.)
> The reason, for generating code is that one would expect that 243 in-line 
> statements (an unwound loop) to be faster than a loop with 243 iterations. In 
> addition, the generic version uses an array in place of ~500 variables, and a 
> virtual function call rather than in-line, type-specific calls. One would 
> expect the unrolled loop to be faster.
> Repeat the exercise, this time with the code cache turned on so that no 
> compile cost is payed for either code path (because the test excludes the 
> first two runs in which the generated code is compiled.)
> * Original: 1302 ms / run
> * Generic version: 1040 ms / run
> Contrary to expectations, the loop is faster than the in-line statements. In 
> this instance, the array/loop/virtual function version is ~260 ms faster 
> (20%).
> The test shows that the code can be simplified, a costly costly code-gen and 
> compile step can be skipped, and this query will go faster. Plus, since the 
> change removes generated classes from the code cache, there is more room for 
> the remaining classes, which may improve the hit rate.
> This ticket offers the performance improvement as an option, described in 
> comments.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5125) Provide option to use generic code for sv remover

2017-03-21 Thread Padma Penumarthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Padma Penumarthy updated DRILL-5125:

Labels: ready-to-commit  (was: )

> Provide option to use generic code for sv remover
> -
>
> Key: DRILL-5125
> URL: https://issues.apache.org/jira/browse/DRILL-5125
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
>
> Consider a non-typical Drill query: one with 6000 rows but 243 fields. 
> Consider this query:
> {code}
> select * from (select *, row_number() over(order by somedate) as rn from 
> dfs.`/some/path/data.json`) where rn=10
> {code}
> This produces a query with the following structure:
> {code}
> 00-00Screen
> 00-01  ProjectAllowDup(*=[$0], rn=[$1])
> 00-02Project(T0¦¦*=[$0], w0$o0=[$2])
> 00-03  SelectionVectorRemover
> 00-04Filter(condition=[=($2, 10)])
> 00-05  Window(window#0=[window(partition {} order by [1] rows 
> between UNBOUNDED PRECEDING and CURRENT ROW aggs [ROW_NUMBER()])])
> 00-06SelectionVectorRemover
> 00-07  Sort(sort0=[$1], dir0=[ASC])
> 00-08Project(T0¦¦*=[$0], validitydate=[$1])
> 00-09  Scan(groupscan=...)
> {code}
> Instrumenting, the code to measure compile time, two “long poles” stood out:
> {code}
> Compile Time for org.apache.drill.exec.test.generated.CopierGen3: 500
> Compile Time for org.apache.drill.exec.test.generated.CopierGen8: 1659
> {code}
> Much of the initial run time of 5578 ms is taken up in compiling two classes 
> (2159 ms).
> The classes themselves are very simple: create member variables for 486 
> vectors (2 x column count), and call a method on each to do the copy. The 
> only type-specific work is the member variable and call to the (non-virtual) 
> CopyFrom or CopyFromSafe methods. The generated class can easily be replaced 
> by a “generic” class and virtual functions in the vector classes to choose 
> the correct copy method.
> Clearly, avoiding code gen means avoiding the compile times with a first-run 
> savings. Here are the last 8 runs (out of 10), with code cached turned off 
> (forcing a compile on each query run), with and without the generic versions:
> * Original (no code cache): 1832 ms / run
> * Generic (no code cache): 1317 ms / run
> This demonstrates the expected outcome: avoiding compilation of generated 
> code saves ~500 ms per run (or 28%). (Note: the numbers above were obtained 
> on a version of the code that already had various optimizations described in 
> other JIRA entries.)
> The reason, for generating code is that one would expect that 243 in-line 
> statements (an unwound loop) to be faster than a loop with 243 iterations. In 
> addition, the generic version uses an array in place of ~500 variables, and a 
> virtual function call rather than in-line, type-specific calls. One would 
> expect the unrolled loop to be faster.
> Repeat the exercise, this time with the code cache turned on so that no 
> compile cost is payed for either code path (because the test excludes the 
> first two runs in which the generated code is compiled.)
> * Original: 1302 ms / run
> * Generic version: 1040 ms / run
> Contrary to expectations, the loop is faster than the in-line statements. In 
> this instance, the array/loop/virtual function version is ~260 ms faster 
> (20%).
> The test shows that the code can be simplified, a costly costly code-gen and 
> compile step can be skipped, and this query will go faster. Plus, since the 
> change removes generated classes from the code cache, there is more room for 
> the remaining classes, which may improve the hit rate.
> This ticket offers the performance improvement as an option, described in 
> comments.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (DRILL-5125) Provide option to use generic code for sv remover

2017-01-03 Thread Zelaine Fong (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zelaine Fong updated DRILL-5125:

Reviewer: Padma Penumarthy

Assigned Reviewer to [~ppenumarthy]

> Provide option to use generic code for sv remover
> -
>
> Key: DRILL-5125
> URL: https://issues.apache.org/jira/browse/DRILL-5125
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Consider a non-typical Drill query: one with 6000 rows but 243 fields. 
> Consider this query:
> {code}
> select * from (select *, row_number() over(order by somedate) as rn from 
> dfs.`/some/path/data.json`) where rn=10
> {code}
> This produces a query with the following structure:
> {code}
> 00-00Screen
> 00-01  ProjectAllowDup(*=[$0], rn=[$1])
> 00-02Project(T0¦¦*=[$0], w0$o0=[$2])
> 00-03  SelectionVectorRemover
> 00-04Filter(condition=[=($2, 10)])
> 00-05  Window(window#0=[window(partition {} order by [1] rows 
> between UNBOUNDED PRECEDING and CURRENT ROW aggs [ROW_NUMBER()])])
> 00-06SelectionVectorRemover
> 00-07  Sort(sort0=[$1], dir0=[ASC])
> 00-08Project(T0¦¦*=[$0], validitydate=[$1])
> 00-09  Scan(groupscan=...)
> {code}
> Instrumenting, the code to measure compile time, two “long poles” stood out:
> {code}
> Compile Time for org.apache.drill.exec.test.generated.CopierGen3: 500
> Compile Time for org.apache.drill.exec.test.generated.CopierGen8: 1659
> {code}
> Much of the initial run time of 5578 ms is taken up in compiling two classes 
> (2159 ms).
> The classes themselves are very simple: create member variables for 486 
> vectors (2 x column count), and call a method on each to do the copy. The 
> only type-specific work is the member variable and call to the (non-virtual) 
> CopyFrom or CopyFromSafe methods. The generated class can easily be replaced 
> by a “generic” class and virtual functions in the vector classes to choose 
> the correct copy method.
> Clearly, avoiding code gen means avoiding the compile times with a first-run 
> savings. Here are the last 8 runs (out of 10), with code cached turned off 
> (forcing a compile on each query run), with and without the generic versions:
> * Original (no code cache): 1832 ms / run
> * Generic (no code cache): 1317 ms / run
> This demonstrates the expected outcome: avoiding compilation of generated 
> code saves ~500 ms per run (or 28%). (Note: the numbers above were obtained 
> on a version of the code that already had various optimizations described in 
> other JIRA entries.)
> The reason, for generating code is that one would expect that 243 in-line 
> statements (an unwound loop) to be faster than a loop with 243 iterations. In 
> addition, the generic version uses an array in place of ~500 variables, and a 
> virtual function call rather than in-line, type-specific calls. One would 
> expect the unrolled loop to be faster.
> Repeat the exercise, this time with the code cache turned on so that no 
> compile cost is payed for either code path (because the test excludes the 
> first two runs in which the generated code is compiled.)
> * Original: 1302 ms / run
> * Generic version: 1040 ms / run
> Contrary to expectations, the loop is faster than the in-line statements. In 
> this instance, the array/loop/virtual function version is ~260 ms faster 
> (20%).
> The test shows that the code can be simplified, a costly costly code-gen and 
> compile step can be skipped, and this query will go faster. Plus, since the 
> change removes generated classes from the code cache, there is more room for 
> the remaining classes, which may improve the hit rate.
> This ticket offers the performance improvement as an option, described in 
> comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-5125) Provide option to use generic code for sv remover

2016-12-12 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-5125:
---
Description: 
Consider a non-typical Drill query: one with 6000 rows but 243 fields. Consider 
this query:

{code}
select * from (select *, row_number() over(order by somedate) as rn from 
dfs.`/some/path/data.json`) where rn=10
{code}

This produces a query with the following structure:

{code}
00-00Screen
00-01  ProjectAllowDup(*=[$0], rn=[$1])
00-02Project(T0¦¦*=[$0], w0$o0=[$2])
00-03  SelectionVectorRemover
00-04Filter(condition=[=($2, 10)])
00-05  Window(window#0=[window(partition {} order by [1] rows 
between UNBOUNDED PRECEDING and CURRENT ROW aggs [ROW_NUMBER()])])
00-06SelectionVectorRemover
00-07  Sort(sort0=[$1], dir0=[ASC])
00-08Project(T0¦¦*=[$0], validitydate=[$1])
00-09  Scan(groupscan=...)
{code}

Instrumenting, the code to measure compile time, two “long poles” stood out:

{code}
Compile Time for org.apache.drill.exec.test.generated.CopierGen3: 500
Compile Time for org.apache.drill.exec.test.generated.CopierGen8: 1659
{code}

Much of the initial run time of 5578 ms is taken up in compiling two classes 
(2159 ms).

The classes themselves are very simple: create member variables for 486 vectors 
(2 x column count), and call a method on each to do the copy. The only 
type-specific work is the member variable and call to the (non-virtual) 
CopyFrom or CopyFromSafe methods. The generated class can easily be replaced by 
a “generic” class and virtual functions in the vector classes to choose the 
correct copy method.

Clearly, avoiding code gen means avoiding the compile times with a first-run 
savings. Here are the last 8 runs (out of 10), with code cached turned off 
(forcing a compile on each query run), with and without the generic versions:

* Original (no code cache): 1832 ms / run
* Generic (no code cache): 1317 ms / run

This demonstrates the expected outcome: avoiding compilation of generated code 
saves ~500 ms per run (or 28%). (Note: the numbers above were obtained on a 
version of the code that already had various optimizations described in other 
JIRA entries.)

The reason, for generating code is that one would expect that 243 in-line 
statements (an unwound loop) to be faster than a loop with 243 iterations. In 
addition, the generic version uses an array in place of ~500 variables, and a 
virtual function call rather than in-line, type-specific calls. One would 
expect the unrolled loop to be faster.

Repeat the exercise, this time with the code cache turned on so that no compile 
cost is payed for either code path (because the test excludes the first two 
runs in which the generated code is compiled.)

* Original: 1302 ms / run
* Generic version: 1040 ms / run

Contrary to expectations, the loop is faster than the in-line statements. In 
this instance, the array/loop/virtual function version is ~260 ms faster (20%).

The test shows that the code can be simplified, a costly costly code-gen and 
compile step can be skipped, and this query will go faster. Plus, since the 
change removes generated classes from the code cache, there is more room for 
the remaining classes, which may improve the hit rate.

This ticket offers the performance improvement as an option, described in 
comments.

  was:
Consider a non-typical Drill query: one with 6000 rows but 243 fields. Consider 
this query:

{code}
select * from (select *, row_number() over(order by somedate) as rn from 
dfs.`/some/path/data.json`) where rn=10
{code}

This produces a query with the following structure:

{code}
TBD
{code}

Instrumenting, the code to measure compile time, two “long poles” stood out:

{code}
Compile Time for org.apache.drill.exec.test.generated.CopierGen3: 500
Compile Time for org.apache.drill.exec.test.generated.CopierGen8: 1659
{code}

Much of the initial run time of 5578 ms is taken up in compiling two classes 
(2159 ms).

The classes themselves are very simple: create member variables for 486 vectors 
(2 x column count), and call a method on each to do the copy. The only 
type-specific work is the member variable and call to the (non-virtual) 
CopyFrom or CopyFromSafe methods. The generated class can easily be replaced by 
a “generic” class and virtual functions in the vector classes to choose the 
correct copy method.

Clearly, avoiding code gen means avoiding the compile times with a first-run 
savings. Here are the last 8 runs (out of 10), with code cached turned off 
(forcing a compile on each query run), with and without the generic versions:

* Original (no code cache): 1832 ms / run
* Generic (no code cache): 1317 ms / run

This demonstrates the expected outcome: avoiding compilation of generated code 
saves ~500 ms per run (or 28%). (Note: the numbers above were obtained on a