Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r211129911
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FailureSafeParser.scala
---
@@ -56,9 +58,15 @@ class
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/21909
---
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r211075385
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala
---
@@ -223,7 +224,8 @@ object
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r211075384
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1492,6 +1492,15 @@ object SQLConf {
"This usually
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r211045699
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala
---
@@ -223,7 +224,8 @@ object
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r211045061
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1492,6 +1492,15 @@ object SQLConf {
"This usually
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r210767018
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
---
@@ -2223,21 +2223,31 @@ class JsonSuite extends
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r210765672
--- Diff: docs/sql-programming-guide.md ---
@@ -1894,6 +1894,7 @@ working with timestamps in `pandas_udf`s to get the
best performance, see
- In
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r210704902
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
---
@@ -2223,21 +2223,31 @@ class JsonSuite extends
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r210693829
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
---
@@ -2223,21 +2223,31 @@ class JsonSuite extends
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r210666117
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1492,6 +1492,15 @@ object SQLConf {
"This usually
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r210350101
--- Diff: docs/sql-programming-guide.md ---
@@ -1892,6 +1892,7 @@ working with timestamps in `pandas_udf`s to get the
best performance, see
- In
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r209927964
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FailureSafeParser.scala
---
@@ -56,9 +57,14 @@ class FailureSafeParser[IN](
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r209916711
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
---
@@ -2225,19 +2225,21 @@ class JsonSuite extends
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r208022783
--- Diff: docs/sql-programming-guide.md ---
@@ -1892,6 +1892,7 @@ working with timestamps in `pandas_udf`s to get the
best performance, see
- In
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r207955299
--- Diff: docs/sql-programming-guide.md ---
@@ -1892,6 +1892,7 @@ working with timestamps in `pandas_udf`s to get the
best performance, see
- In
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r207850329
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
---
@@ -2225,19 +2225,21 @@ class JsonSuite extends
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r207738315
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1476,6 +1476,14 @@ object SQLConf {
"are performed
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r207701331
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1476,6 +1476,14 @@ object SQLConf {
"are
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r207032024
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FailureSafeParser.scala
---
@@ -56,9 +57,14 @@ class
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r207005019
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FailureSafeParser.scala
---
@@ -56,9 +57,14 @@ class FailureSafeParser[IN](
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r206985104
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FailureSafeParser.scala
---
@@ -56,9 +57,14 @@ class
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r206983433
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FailureSafeParser.scala
---
@@ -56,9 +57,14 @@ class FailureSafeParser[IN](
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r206976717
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FailureSafeParser.scala
---
@@ -56,9 +57,14 @@ class FailureSafeParser[IN](
Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r206400571
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVBenchmarks.scala
---
@@ -119,8 +119,47 @@ object CSVBenchmarks {
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r206059735
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -450,7 +450,8 @@ class DataFrameReader private[sql](sparkSession:
Github user dmateusp commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r206047653
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FailureSafeParser.scala
---
@@ -56,9 +57,14 @@ class FailureSafeParser[IN](
Github user dmateusp commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r206045407
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -450,7 +450,8 @@ class DataFrameReader private[sql](sparkSession:
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r205978291
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
---
@@ -2233,7 +2233,7 @@ class JsonSuite extends
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r205978149
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FailureSafeParser.scala
---
@@ -56,9 +57,14 @@ class FailureSafeParser[IN](
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r205978091
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
---
@@ -203,19 +203,11 @@ class UnivocityParser(
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r205977956
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -450,7 +450,8 @@ class DataFrameReader private[sql](sparkSession:
Github user dmateusp commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r205974316
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
---
@@ -2233,7 +2233,7 @@ class JsonSuite extends
Github user dmateusp commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r205974224
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
---
@@ -203,19 +203,11 @@ class UnivocityParser(
Github user dmateusp commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r205974275
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FailureSafeParser.scala
---
@@ -56,9 +57,14 @@ class FailureSafeParser[IN](
Github user dmateusp commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r205969639
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -450,7 +450,8 @@ class DataFrameReader private[sql](sparkSession:
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21909
[SPARK-24959][SQL] Speed up count() for JSON and CSV
## What changes were proposed in this pull request?
In this PR, I propose to skip invoking the CSV/JSON parser for each line
in the