[GitHub] spark pull request #21762: [SPARK-24800][SQL] Refactor Avro Serializer and D...

2018-07-13 Thread gengliangwang
GitHub user gengliangwang opened a pull request: https://github.com/apache/spark/pull/21762 [SPARK-24800][SQL] Refactor Avro Serializer and Deserializer ## What changes were proposed in this pull request? Currently the Avro Deserializer converts input Avro format data to `Row

[GitHub] spark pull request #21761: [SPARK-24771][BUILD]Upgrade Apache AVRO to 1.8.2

2018-07-13 Thread gengliangwang
GitHub user gengliangwang opened a pull request: https://github.com/apache/spark/pull/21761 [SPARK-24771][BUILD]Upgrade Apache AVRO to 1.8.2 ## What changes were proposed in this pull request? Upgrade Apache Avro from 1.7.7 to 1.8.2. The major new features: 1. More

[GitHub] spark pull request #21760: [SPARK-24776][SQL]Avro unit test: use SQLTestUtil...

2018-07-13 Thread gengliangwang
GitHub user gengliangwang opened a pull request: https://github.com/apache/spark/pull/21760 [SPARK-24776][SQL]Avro unit test: use SQLTestUtils and replace deprecated methods ## What changes were proposed in this pull request? Improve Avro unit test: 1. use QueryTest

[GitHub] spark issue #21439: [SPARK-24391][SQL] Support arrays of any types by from_j...

2018-07-12 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21439 I guess it is still controversial to have this new behavior with the new option. --- - To unsubscribe, e-mail: reviews

[GitHub] spark issue #21742: [SPARK-24768][SQL] Have a built-in AVRO data source impl...

2018-07-12 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21742 ping @marmbrus @tdas Should we create API `.avro` in Spark's `DataFrameReader` and `DataFrameWriter`? If so, should we expose `.kafka` in DataFrameReader/DataFrameWriter as

[GitHub] spark issue #21667: [SPARK-24691][SQL]Dispatch the type support check in Fil...

2018-07-11 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21667 retest this please.

[GitHub] spark pull request #21741: [SPARK-24718][SQL] Timestamp support pushdown to ...

2018-07-11 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21741#discussion_r201697107 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -387,6 +389,82 @@ class

[GitHub] spark pull request #21741: [SPARK-24718][SQL] Timestamp support pushdown to ...

2018-07-11 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21741#discussion_r201696558 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -387,6 +389,82 @@ class

[GitHub] spark pull request #21730: [SPARK-24761][SQL] Adding of isModifiable() to Ru...

2018-07-11 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21730#discussion_r201678881 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/RuntimeConfigSuite.scala --- @@ -54,4 +54,15 @@ class RuntimeConfigSuite extends SparkFunSuite

[GitHub] spark pull request #21730: [SPARK-24761][SQL] Adding of isModifiable() to Ru...

2018-07-11 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21730#discussion_r201677755 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RuntimeConfig.scala --- @@ -132,6 +132,14 @@ class RuntimeConfig private[sql](sqlConf: SQLConf

[GitHub] spark pull request #21742: [SPARK-24768][SQL] Have a built-in AVRO data sour...

2018-07-11 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21742#discussion_r201639133 --- Diff: external/avro/src/test/resources/benchmarkSchema.avsc --- @@ -0,0 +1,35 @@ +{ --- End diff -- Nice catch

[GitHub] spark pull request #21742: [SPARK-24768][SQL] Have a built-in AVRO data sour...

2018-07-11 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21742#discussion_r201602440 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/package.scala --- @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #21742: [SPARK-24768][SQL] Have a built-in AVRO data sour...

2018-07-11 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21742#discussion_r201596017 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/package.scala --- @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #21742: [SPARK-24768][SQL] Have a built-in AVRO data sour...

2018-07-11 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21742#discussion_r201595614 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/package.scala --- @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #21742: [SPARK-24768][SQL] Have a built-in AVRO data sour...

2018-07-10 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21742#discussion_r201573634 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/package.scala --- @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #21742: [SPARK-24768][SQL] Have a built-in AVRO data sour...

2018-07-10 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21742#discussion_r201571691 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/package.scala --- @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #21742: [SPARK-24768][SQL] Have a built-in AVRO data sour...

2018-07-10 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21742#discussion_r201564691 --- Diff: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroReadBenchmark.scala --- @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #21742: [SPARK-24768][SQL] Have a built-in AVRO data sour...

2018-07-10 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21742#discussion_r201564660 --- Diff: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroReadBenchmark.scala --- @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #20933: [SPARK-23817][SQL]Migrate ORC file format read path to d...

2018-07-10 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/20933 Status update: we are working on a new proposal for changing the Data source API, to resolve the problems exposed in this PR. Before the new proposal is adopted or denied, this PR remains

[GitHub] spark pull request #21742: [SPARK-24768][SQL] Have a built-in AVRO data sour...

2018-07-10 Thread gengliangwang
GitHub user gengliangwang opened a pull request: https://github.com/apache/spark/pull/21742 [SPARK-24768][SQL] Have a built-in AVRO data source implementation ## What changes were proposed in this pull request? Apache Avro (https://avro.apache.org) is a popular data

[GitHub] spark issue #21667: [SPARK-24691][SQL]Dispatch the type support check in Fil...

2018-07-07 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21667 retest this please.

[GitHub] spark issue #21667: [SPARK-24691][SQL]Dispatch the type support check in Fil...

2018-07-06 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21667 retest this please

[GitHub] spark issue #21667: [SPARK-24691][SQL]Dispatch the type support check in Fil...

2018-07-06 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21667 retest this please.

[GitHub] spark pull request #21667: [SPARK-24691][SQL]Dispatch the type support check...

2018-07-06 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21667#discussion_r200593158 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala --- @@ -152,6 +152,11 @@ trait FileFormat

[GitHub] spark issue #21667: [SPARK-24691][SQL]Dispatch the type support check in Fil...

2018-07-06 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21667 @cloud-fan I can understand your concern, but I can't find better entry points. The entry in `FileFormatWriter` is the only entry for every write action; otherwise we have to add the che

[GitHub] spark pull request #21667: [SPARK-24691][SQL]Dispatch the type support check...

2018-07-05 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21667#discussion_r200561229 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala --- @@ -148,6 +144,23 @@ class JsonFileFormat

[GitHub] spark pull request #21667: [SPARK-24691][SQL]Dispatch the type support check...

2018-07-05 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21667#discussion_r200256945 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -306,6 +306,7 @@ case class FileSourceScanExec

[GitHub] spark pull request #21667: [SPARK-24691][SQL]Dispatch the type support check...

2018-07-05 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21667#discussion_r200255604 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala --- @@ -152,6 +152,11 @@ trait FileFormat

[GitHub] spark pull request #21667: [SPARK-24691][SQL]Dispatch the type support check...

2018-07-03 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21667#discussion_r19766 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala --- @@ -152,6 +152,12 @@ trait FileFormat

[GitHub] spark pull request #21667: [SPARK-24691][SQL]Dispatch the type support check...

2018-07-03 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21667#discussion_r199865933 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala --- @@ -148,6 +144,28 @@ class JsonFileFormat

[GitHub] spark issue #21667: [SPARK-24691][SQL]Dispatch the type support check in Fil...

2018-07-03 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21667 @HyukjinKwon @maropu I have updated the code. It now uses a whitelist. @cloud-fan Thanks for the review and +1

[GitHub] spark pull request #21667: [SPARK-24691][SQL]Add new API `supportDataType` i...

2018-07-03 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21667#discussion_r199705148 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala --- @@ -42,63 +38,27 @@ object DataSourceUtils

[GitHub] spark issue #21667: [SPARK-24691][SQL]Add new API `supportDataType` in FileF...

2018-07-02 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21667 Sure, I am actually OK with an approach other than an API.

[GitHub] spark pull request #21667: [SPARK-24691][SQL]Add new API `supportDataType` i...

2018-07-02 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21667#discussion_r199529548 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala --- @@ -42,63 +38,27 @@ object DataSourceUtils

[GitHub] spark issue #21667: [SPARK-24691][SQL]Add new API `supportDataType` in FileF...

2018-07-02 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21667 I agree that making it an API is a bit much. But currently there are problems (bugs), as I listed in the PR description. Maybe we can create another separate Trait

[GitHub] spark pull request #21667: [SPARK-24691][SQL]Add new API `supportDataType` i...

2018-07-02 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21667#discussion_r199524768 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala --- @@ -42,63 +38,27 @@ object DataSourceUtils

[GitHub] spark pull request #21667: [SPARK-24691][SQL]Add new API `supportDataType` i...

2018-07-02 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21667#discussion_r199519380 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala --- @@ -152,6 +152,16 @@ trait FileFormat

[GitHub] spark pull request #21667: [SPARK-24691][SQL]Add new API `supportDataType` i...

2018-07-02 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21667#discussion_r199494578 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala --- @@ -152,6 +152,16 @@ trait FileFormat

[GitHub] spark pull request #21682: [SPARK-24706][SQL] ByteType and ShortType support...

2018-07-02 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21682#discussion_r199426645 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -42,6 +42,14 @@ private[parquet

[GitHub] spark pull request #21682: [SPARK-24706][SQL] ByteType and ShortType support...

2018-07-02 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21682#discussion_r199426772 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -42,6 +42,14 @@ private[parquet

[GitHub] spark issue #21667: [SPARK-24691][SQL]Add new API `supportDataType` in FileF...

2018-06-30 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21667 @hvanhovell

[GitHub] spark pull request #21667: [SPARK-24691][SQL]Add new API `supportDataType` i...

2018-06-29 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21667#discussion_r199081671 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala --- @@ -156,28 +156,6 @@ class HiveOrcSourceSuite extends

[GitHub] spark pull request #21667: [SPARK-24691][SQL]Add new API `supportDataType` i...

2018-06-29 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21667#discussion_r199079744 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala --- @@ -152,6 +152,16 @@ trait FileFormat

[GitHub] spark issue #21667: [SPARK-24691][SQL]Add new API `supportDataType` in FileF...

2018-06-29 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21667 @maropu @gatorsmile

[GitHub] spark pull request #21667: [SPARK-24691][SQL]Add new API `supportDataType` i...

2018-06-29 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21667#discussion_r199070174 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala --- @@ -156,28 +156,6 @@ class HiveOrcSourceSuite extends

[GitHub] spark pull request #21667: [SPARK-24691][SQL]Add new API `supportDataType` i...

2018-06-28 Thread gengliangwang
GitHub user gengliangwang opened a pull request: https://github.com/apache/spark/pull/21667 [SPARK-24691][SQL]Add new API `supportDataType` in FileFormat ## What changes were proposed in this pull request? In https://github.com/apache/spark/pull/21389, data source schema

[GitHub] spark pull request #21655: [SPARK-24675][SQL]Rename table: validate existenc...

2018-06-28 Thread gengliangwang
GitHub user gengliangwang opened a pull request: https://github.com/apache/spark/pull/21655 [SPARK-24675][SQL]Rename table: validate existence of new location ## What changes were proposed in this pull request? If a table is renamed to an existing new location, data won't sh

[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

2018-06-26 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21389#discussion_r198351604 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala --- @@ -202,4 +204,230 @@ class FileBasedDataSourceSuite

[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

2018-06-26 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21389#discussion_r198342819 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala --- @@ -202,4 +204,230 @@ class FileBasedDataSourceSuite

[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

2018-06-26 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21389#discussion_r198341944 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala --- @@ -202,4 +204,230 @@ class FileBasedDataSourceSuite

[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

2018-06-26 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21389#discussion_r198341922 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala --- @@ -202,4 +204,230 @@ class FileBasedDataSourceSuite

[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

2018-06-26 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21389#discussion_r198341823 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala --- @@ -202,4 +204,230 @@ class FileBasedDataSourceSuite

[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

2018-06-26 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21389#discussion_r198341248 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala --- @@ -202,4 +204,230 @@ class FileBasedDataSourceSuite

[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

2018-06-26 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21389#discussion_r198340297 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala --- @@ -0,0 +1,101 @@ +/* + * Licensed to

[GitHub] spark issue #21600: {Spark-24553}{WEB-UI} http 302 fixes for href redirect

2018-06-20 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21600 Jenkins, this is ok to test.

[GitHub] spark issue #21600: {Spark-24553}{WEB-UI} http 302 fixes for href redirect

2018-06-20 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21600 LGTM. I think we can do even better by creating a function that reformats the URLs in the HTML.

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-19 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21590#discussion_r196658307 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala --- @@ -65,13 +65,38 @@ class JDBCOptions

[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-19 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21590#discussion_r196657947 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala --- @@ -65,13 +65,38 @@ class JDBCOptions

[GitHub] spark issue #21510: [SPARK-24490][WebUI] Use WebUI.addStaticHandler in web U...

2018-06-12 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21510 retest this please.

[GitHub] spark pull request #21532: [SPARK-24524][SQL]Improve aggregateMetrics: reduc...

2018-06-12 Thread gengliangwang
Github user gengliangwang closed the pull request at: https://github.com/apache/spark/pull/21532

[GitHub] spark issue #21532: [SPARK-24524][SQL]Improve aggregateMetrics: reduce memor...

2018-06-12 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21532 Found a possible issue; closing this PR.

[GitHub] spark issue #21532: [SPARK-24524][SQL]Improve aggregateMetrics: reduce memor...

2018-06-11 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21532 @vanzin @felixcheung @gatorsmile

[GitHub] spark pull request #21532: [SPARK-24524][SQL]Improve aggregateMetrics: reduc...

2018-06-11 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21532#discussion_r194558798 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala --- @@ -159,19 +159,29 @@ class SQLAppStatusListener

[GitHub] spark issue #21532: [SPARK-24524][SQL]Improve aggregateMetrics: reduce memor...

2018-06-11 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21532 This PR is inspired by #21438.

[GitHub] spark pull request #21532: [SPARK-24524][SQL]Improve aggregateMetrics: reduc...

2018-06-11 Thread gengliangwang
GitHub user gengliangwang opened a pull request: https://github.com/apache/spark/pull/21532 [SPARK-24524][SQL]Improve aggregateMetrics: reduce memory usage and number of loops ## What changes were proposed in this pull request? The function `aggregateMetrics` process

[GitHub] spark pull request #21438: [SPARK-24398] [SQL] Improve SQLAppStatusListener....

2018-06-11 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21438#discussion_r19457 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala --- @@ -159,7 +159,7 @@ class SQLAppStatusListener

[GitHub] spark issue #21512: [minor][WEB UI] Spark web ui auto refresh every x second...

2018-06-11 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21512 Browsers are able to do the auto refreshing work. I think we should leave it for browsers, so that we can keep the UI code simple

[GitHub] spark issue #20260: [SPARK-23039][SQL] Finish TODO work in alter table set l...

2018-06-11 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/20260 The `TODO` in the comment should be removed, as I explained. We should close this PR.

[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-06-01 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21439#discussion_r192502051 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala --- @@ -423,7 +423,9 @@ class

[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-06-01 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21439#discussion_r192501912 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -523,6 +523,11 @@ case class

[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-06-01 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21439#discussion_r192488365 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala --- @@ -101,6 +102,13 @@ class JacksonParser

[GitHub] spark pull request #21439: [SPARK-24391][SQL] Support arrays of any types by...

2018-06-01 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21439#discussion_r192465130 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -548,7 +553,9 @@ case class

[GitHub] spark issue #21439: [SPARK-24391][SQL] Support arrays of any types by from_j...

2018-05-31 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21439 retest this please.

[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask and Use `whi...

2018-05-31 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21381 retest this please

[GitHub] spark pull request #21442: [SPARK-24402] [SQL] Optimize `In` expression when...

2018-05-29 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21442#discussion_r191585661 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -219,10 +219,15 @@ object

[GitHub] spark pull request #21442: [SPARK-24402] [SQL] Optimize `In` expression when...

2018-05-29 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21442#discussion_r191585050 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -219,10 +219,15 @@ object

[GitHub] spark pull request #21409: [SPARK-24365][SQL] Add Data Source write benchmar...

2018-05-28 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21409#discussion_r191302688 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceWriteBenchmark.scala --- @@ -0,0 +1,145

[GitHub] spark pull request #21409: [SPARK-24365][SQL] Add Data Source write benchmar...

2018-05-28 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21409#discussion_r191302141 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceWriteBenchmark.scala --- @@ -0,0 +1,145

[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...

2018-05-23 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21411 retest this please

[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...

2018-05-23 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21411 @rdblue @cloud-fan

[GitHub] spark pull request #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL ...

2018-05-23 Thread gengliangwang
GitHub user gengliangwang opened a pull request: https://github.com/apache/spark/pull/21411 [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead of deprecated flag ENABLE_JOB_SUMMARY ## What changes were proposed in this pull request? In the current Parquet version, the

[GitHub] spark pull request #21409: [SPARK-24365][SQL] Add Parquet write benchmark

2018-05-23 Thread gengliangwang
GitHub user gengliangwang opened a pull request: https://github.com/apache/spark/pull/21409 [SPARK-24365][SQL] Add Parquet write benchmark ## What changes were proposed in this pull request? Add a Parquet write benchmark, so that it is easier to measure the writer

[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

2018-05-22 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21389 retest this please.

[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask and Use `whi...

2018-05-22 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21381 retest this please.

[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask and Use `whi...

2018-05-22 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21381 retest this please.

[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

2018-05-22 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21389#discussion_r189872259 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonUtils.scala --- @@ -48,4 +49,33 @@ object JsonUtils

[GitHub] spark pull request #21380: [SPARK-24329][SQL] Remove comments filtering befo...

2018-05-22 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21380#discussion_r189829268 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala --- @@ -300,14 +302,11 @@ private[csv] object

[GitHub] spark pull request #21380: [SPARK-24329][SQL] Remove comments filtering befo...

2018-05-22 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21380#discussion_r189827025 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala --- @@ -196,7 +198,7 @@ class UnivocityParser

[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask in FileForma...

2018-05-22 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21381 retest this please.

[GitHub] spark issue #21381: [SPARK-24330][SQL]Refactor ExecuteWriteTask in FileForma...

2018-05-21 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21381 retest this please.

[GitHub] spark pull request #21381: refactor ExecuteWriteTask

2018-05-21 Thread gengliangwang
GitHub user gengliangwang opened a pull request: https://github.com/apache/spark/pull/21381 refactor ExecuteWriteTask ## What changes were proposed in this pull request? As I am working on File data source V2 write path [in my repo ](https://github.com/gengliangwang/spark/blob

[GitHub] spark issue #21329: [SPARK-24277][SQL] Code clean up in SQL module: HadoopMa...

2018-05-18 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21329 @JoshRosen Thanks for the explanation. I can understand your concerns. My main point is the job/task ID here https://github.com/apache/spark/pull/21329/files#diff

[GitHub] spark issue #21329: [SPARK-24277][SQL] Code clean up in SQL module: HadoopMa...

2018-05-18 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21329 @rxin When I was implementing the writer with Data Source V2, I found the code in `HadoopMapReduceCommitProtocol` quite misleading. The code here is just setting configuration

[GitHub] spark pull request #21299: [SPARK-24250][SQL] support accessing SQLConf insi...

2018-05-18 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21299#discussion_r189204217 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala --- @@ -90,13 +92,42 @@ object SQLExecution { * thread

[GitHub] spark issue #21329: [SPARK-24277][SQL] Code clean up in SQL module: HadoopMa...

2018-05-17 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21329 retest this please.

[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-05-16 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/20894#discussion_r188841673 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala --- @@ -206,24 +280,33 @@ object

[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-05-16 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/20894#discussion_r188841632 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala --- @@ -202,28 +263,33 @@ object

[GitHub] spark issue #21329: [SPARK-24277][SQL] Code clean up in SQL module: HadoopMa...

2018-05-16 Thread gengliangwang
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21329 retest this please.

[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-05-16 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/20894#discussion_r188545707 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala --- @@ -118,16 +122,62 @@ object CSVDataSource

[GitHub] spark pull request #20894: [SPARK-23786][SQL] Checking column names of csv h...

2018-05-16 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/20894#discussion_r188523460 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala --- @@ -202,28 +263,33 @@ object
