[GitHub] drill issue #1060: DRILL-5846: Improve parquet performance for Flat Data Typ...

2018-04-17 Thread sachouche
Github user sachouche commented on the issue: https://github.com/apache/drill/pull/1060 @parthchandra, @vrozov I have made the following modifications: - Renamed the newly added files, replacing the "VL" prefix with "VarLen" as suggested by @parthchandra - After talking

2018-03-30 Thread sachouche
Github user sachouche commented on the issue: https://github.com/apache/drill/pull/1060 Parth, - I have attached to DRILL-5846 two profiles, using the latest Apache code and this PR (bounds checks are off): o Used one thread in each run o I observe ~3x

2018-03-30 Thread parthchandra
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/1060 I feel that merging this PR without finalizing DRILL-6301 is putting the cart before the horse. (BTW, it would help the discussion if the benchmarks were published!) My observation based on

2018-03-29 Thread sachouche
Github user sachouche commented on the issue: https://github.com/apache/drill/pull/1060 @parthchandra and @vrozov can you please let me know whether you are ok with the changes. Thanks! ---

2018-03-29 Thread sachouche
Github user sachouche commented on the issue: https://github.com/apache/drill/pull/1060 I have updated this pull request with the following changes: - Excluded the implicit column optimizations from this pull request (will be included as part of another Drill Jira) - Tuned a

2018-01-12 Thread sachouche
Github user sachouche commented on the issue: https://github.com/apache/drill/pull/1060 @paul-rogers with regard to the design aspects that you brought up: ** Corrections about the proposed Design ** Your analysis somehow assumes the Vector is the one driving the loading

2018-01-09 Thread sachouche
Github user sachouche commented on the issue: https://github.com/apache/drill/pull/1060 Before I reply to the provided comments I want first to thank both Parth and Paul for taking time to review this Pull Request. @parthchandra Regarding the High Level Comments - FS

2017-12-23 Thread paul-rogers
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1060 It is always a good idea to suggest an alternative in addition to identifying challenges. I wonder if the code can resolve the questions raised by taking a somewhat different approach: 1.

2017-12-23 Thread paul-rogers
Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1060 This PR is a tough one. We generally like to avoid design discussions in PRs; but I'm going to violate that rule. This PR is based on the premise that each vector has all the information