[GitHub] spark issue #22232: [SPARK-25237][SQL]remove updateBytesReadWithFileSize bec...

2018-08-26 Thread dujunling
Github user dujunling commented on the issue: https://github.com/apache/spark/pull/22232 @maropu I have added a unit test to check the inputMetrics --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For

[GitHub] spark issue #22232: [SPARK-25237][SQL]remove updateBytesReadWithFileSize bec...

2018-08-25 Thread dujunling
Github user dujunling commented on the issue: https://github.com/apache/spark/pull/22232 The metrics suites are in the core tests, while FileScanRDD belongs in the sql tests, so it is difficult to add tests that check the input metrics in the sql module

[GitHub] spark pull request #22232: [SPARK-25237][SQL]remove updateBytesReadWithFileS...

2018-08-25 Thread dujunling
Github user dujunling commented on a diff in the pull request: https://github.com/apache/spark/pull/22232#discussion_r212793909 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala --- @@ -208,7 +199,6 @@ class FileScanRDD

[GitHub] spark pull request #22232: [SPARK-25237][SQL]remove updateBytesReadWithFileS...

2018-08-25 Thread dujunling
Github user dujunling commented on a diff in the pull request: https://github.com/apache/spark/pull/22232#discussion_r212793866 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala --- @@ -208,7 +199,6 @@ class FileScanRDD

[GitHub] spark issue #22232: [SPARK-25237][SQL]remove updateBytesReadWithFileSize bec...

2018-08-24 Thread dujunling
Github user dujunling commented on the issue: https://github.com/apache/spark/pull/22232 @wzhfy

[GitHub] spark pull request #22232: [SPARK-25237][SQL]remove updateBytesReadWithFileS...

2018-08-24 Thread dujunling
GitHub user dujunling opened a pull request: https://github.com/apache/spark/pull/22232 [SPARK-25237][SQL]remove updateBytesReadWithFileSize because we use Hadoop FileSystem statistics to update the inputMetrics ## What changes were proposed in this pull request

[GitHub] spark pull request: [SPARK-11982] [SQL] improve performance of car...

2016-05-04 Thread dujunling
Github user dujunling commented on the pull request: https://github.com/apache/spark/pull/9969#issuecomment-217082562 After this patch, the query time of TPC-DS Q65 goes down to 4 seconds from 28 minutes (420X faster). @davies, how much data did you use? --- If your project is