[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-19 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r211129911 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FailureSafeParser.scala --- @@ -56,9 +58,15 @@ class

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-18 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21909

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-18 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r211075385 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala --- @@ -223,7 +224,8 @@ object

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-18 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r211075384 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1492,6 +1492,15 @@ object SQLConf { "This usually

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-17 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r211045699 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala --- @@ -223,7 +224,8 @@ object

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-17 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r211045061 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1492,6 +1492,15 @@ object SQLConf { "This usually

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-16 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r210767018 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -2223,21 +2223,31 @@ class JsonSuite extends

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-16 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r210765672 --- Diff: docs/sql-programming-guide.md --- @@ -1894,6 +1894,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see - In

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-16 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r210704902 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -2223,21 +2223,31 @@ class JsonSuite extends

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-16 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r210693829 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -2223,21 +2223,31 @@ class JsonSuite extends

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-16 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r210666117 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1492,6 +1492,15 @@ object SQLConf { "This usually

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-15 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r210350101 --- Diff: docs/sql-programming-guide.md --- @@ -1892,6 +1892,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see - In

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-14 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r209927964 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FailureSafeParser.scala --- @@ -56,9 +57,14 @@ class FailureSafeParser[IN](

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-14 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r209916711 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -2225,19 +2225,21 @@ class JsonSuite extends

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-06 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r208022783 --- Diff: docs/sql-programming-guide.md --- @@ -1892,6 +1892,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see - In

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r207955299 --- Diff: docs/sql-programming-guide.md --- @@ -1892,6 +1892,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see - In

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r207850329 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -2225,19 +2225,21 @@ class JsonSuite extends

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-05 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r207738315 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1476,6 +1476,14 @@ object SQLConf { "are performed

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r207701331 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1476,6 +1476,14 @@ object SQLConf { "are

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-01 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r207032024 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FailureSafeParser.scala --- @@ -56,9 +57,14 @@ class

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-01 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r207005019 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FailureSafeParser.scala --- @@ -56,9 +57,14 @@ class FailureSafeParser[IN](

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-01 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r206985104 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FailureSafeParser.scala --- @@ -56,9 +57,14 @@ class

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-01 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r206983433 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FailureSafeParser.scala --- @@ -56,9 +57,14 @@ class FailureSafeParser[IN](

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-01 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r206976717 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FailureSafeParser.scala --- @@ -56,9 +57,14 @@ class FailureSafeParser[IN](

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-07-30 Thread felixcheung
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r206400571 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVBenchmarks.scala --- @@ -119,8 +119,47 @@ object CSVBenchmarks {

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-07-30 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r206059735 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -450,7 +450,8 @@ class DataFrameReader private[sql](sparkSession:

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-07-30 Thread dmateusp
Github user dmateusp commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r206047653 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FailureSafeParser.scala --- @@ -56,9 +57,14 @@ class FailureSafeParser[IN](

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-07-30 Thread dmateusp
Github user dmateusp commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r206045407 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -450,7 +450,8 @@ class DataFrameReader private[sql](sparkSession:

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-07-29 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r205978291 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -2233,7 +2233,7 @@ class JsonSuite extends

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-07-29 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r205978149 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FailureSafeParser.scala --- @@ -56,9 +57,14 @@ class FailureSafeParser[IN](

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-07-29 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r205978091 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala --- @@ -203,19 +203,11 @@ class UnivocityParser(

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-07-29 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r205977956 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -450,7 +450,8 @@ class DataFrameReader private[sql](sparkSession:

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-07-29 Thread dmateusp
Github user dmateusp commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r205974316 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -2233,7 +2233,7 @@ class JsonSuite extends

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-07-29 Thread dmateusp
Github user dmateusp commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r205974224 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala --- @@ -203,19 +203,11 @@ class UnivocityParser(

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-07-29 Thread dmateusp
Github user dmateusp commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r205974275 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FailureSafeParser.scala --- @@ -56,9 +57,14 @@ class FailureSafeParser[IN](

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-07-29 Thread dmateusp
Github user dmateusp commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r205969639 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -450,7 +450,8 @@ class DataFrameReader private[sql](sparkSession:

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-07-28 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/21909 [SPARK-24959][SQL] Speed up count() for JSON and CSV ## What changes were proposed in this pull request? In this PR, I propose to skip invoking the CSV/JSON parser on each line in the
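The idea in the PR description is to skip the CSV/JSON parser when nothing needs to be read from a line, as with `count()`, where only the number of rows matters. Below is a minimal Python sketch of that fast path, written under stated assumptions. It is not Spark's `FailureSafeParser` or any Spark API. `parse_lines`, `count_rows`, and the `drop_malformed` flag are hypothetical names, and JSON lines are parsed with Python's `json.loads` purely for illustration.

```python
import json
from typing import Callable, Iterable, Iterator, List


def parse_lines(
    lines: Iterable[str],
    required_fields: List[str],
    parse: Callable[[str], dict] = json.loads,
    drop_malformed: bool = False,
) -> Iterator[tuple]:
    """Hypothetical sketch: yield one tuple per line, projecting required_fields."""
    if not required_fields:
        # Fast path: no field is needed (e.g. for a count), so the parser is
        # never invoked. Malformed lines are therefore NOT detected here.
        for _ in lines:
            yield ()
        return
    for line in lines:
        try:
            record = parse(line)
            if not isinstance(record, dict):
                raise ValueError("not a JSON object")
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            if drop_malformed:
                continue  # skip the bad line entirely
            # Permissive-style fallback: emit a row of nulls.
            yield tuple(None for _ in required_fields)
            continue
        yield tuple(record.get(name) for name in required_fields)


def count_rows(lines, required_fields, parse=json.loads, drop_malformed=False) -> int:
    """Count the rows produced by parse_lines (hypothetical helper)."""
    return sum(1 for _ in parse_lines(lines, required_fields, parse, drop_malformed))


if __name__ == "__main__":
    data = ['{"a": 1}', '{"a": 2}', '{bad']
    print(count_rows(data, []))                           # 3: parser never invoked
    print(count_rows(data, ["a"], drop_malformed=True))   # 2: malformed line dropped
```

In this sketch the fast path returns 3 for the sample data without calling the parser, while requesting field `a` with `drop_malformed=True` parses every line and returns 2. Skipping the parser saves work, but it can change the count when the input contains malformed lines.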