Yuming Wang created SPARK-24706:
-----------------------------------

             Summary: Support ByteType and ShortType pushdown to parquet
                 Key: SPARK-24706
                 URL: https://issues.apache.org/jira/browse/SPARK-24706
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Yuming Wang
Benchmark result:

{noformat}
###############################[ Pushdown benchmark for tinyint ]################################
Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

Select 1 tinyint row (value = CAST(63 AS tinyint)): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            4307 / 4575          3.7         273.8       1.0X
Parquet Vectorized (Pushdown)                  227 /  241         69.4          14.4      19.0X
Native ORC Vectorized                         3646 / 3727          4.3         231.8       1.2X
Native ORC Vectorized (Pushdown)               736 /  744         21.4          46.8       5.9X

Select 10% tinyint rows (value < 12):    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            5209 / 5843          3.0         331.2       1.0X
Parquet Vectorized (Pushdown)                 1296 / 1759         12.1          82.4       4.0X
Native ORC Vectorized                         4455 / 4594          3.5         283.2       1.2X
Native ORC Vectorized (Pushdown)              1736 / 1813          9.1         110.4       3.0X

Select 50% tinyint rows (value < 63):    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            8362 / 8394          1.9         531.7       1.0X
Parquet Vectorized (Pushdown)                 6303 / 6530          2.5         400.7       1.3X
Native ORC Vectorized                         7962 / 8113          2.0         506.2       1.1X
Native ORC Vectorized (Pushdown)              6680 / 7556          2.4         424.7       1.3X

Select 90% tinyint rows (value < 114):   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                          11572 / 11715          1.4         735.7       1.0X
Parquet Vectorized (Pushdown)               11198 / 11326          1.4         712.0       1.0X
Native ORC Vectorized                       11041 / 11209          1.4         702.0       1.0X
Native ORC Vectorized (Pushdown)            11104 / 11472          1.4         706.0       1.0X
{noformat}
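In each case, the Relative column reads as the baseline's best time (plain "Parquet Vectorized") divided by the row's best time. This is an assumption inferred from the numbers, not taken from Spark's benchmark code, but it reproduces every Relative value in the tinyint results. A small Python sketch that recomputes the column from the Best times; the names `BEST_MS`, `READERS`, and `relative` are illustrative only:

```python
# Best times (ms) copied from the tinyint pushdown benchmark results.
# Per case, the order is: Parquet, Parquet (Pushdown), ORC, ORC (Pushdown).
BEST_MS = {
    "1 row": [4307, 227, 3646, 736],
    "10% rows": [5209, 1296, 4455, 1736],
    "50% rows": [8362, 6303, 7962, 6680],
    "90% rows": [11572, 11198, 11041, 11104],
}

READERS = [
    "Parquet Vectorized",
    "Parquet Vectorized (Pushdown)",
    "Native ORC Vectorized",
    "Native ORC Vectorized (Pushdown)",
]


def relative(best_ms):
    """Speedup of each reader over the first (baseline) reader, e.g. '19.0X'.

    Assumes Relative = baseline best time / reader best time.
    """
    baseline = best_ms[0]
    return [f"{baseline / t:.1f}X" for t in best_ms]


if __name__ == "__main__":
    for case, times in BEST_MS.items():
        print(case, dict(zip(READERS, relative(times))))
```

Running it shows the Parquet pushdown speedup falling from 19.0X when a single row matches to 1.0X when 90% of rows match, so the benefit of pushdown depends heavily on how selective the predicate is.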