[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

2018-11-06 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22951 @HyukjinKwon @dongjoon-hyun Please review the changes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

2018-11-05 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22951 I will update docs soon.

[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...

2018-11-05 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/22951 [SPARK-25945][SQL] Support locale while parsing date/timestamp from CSV/JSON ## What changes were proposed in this pull request? In the PR, I propose to add a new option `locale

[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

2018-11-04 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22429 jenkins, retest this, please

[GitHub] spark pull request #22938: [SPARK-25935][SQL] Prevent null rows from JSON pa...

2018-11-04 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22938#discussion_r230586281 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -240,16 +240,6 @@ class SQLQuerySuite extends QueryTest

[GitHub] spark pull request #22938: [SPARK-25935][SQL] Prevent null rows from JSON pa...

2018-11-04 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22938#discussion_r230585549 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -552,13 +552,19 @@ case class JsonToStructs

[GitHub] spark pull request #22939: [SPARK-25446][R] Add schema_of_json() and schema_...

2018-11-04 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22939#discussion_r230584990 --- Diff: R/pkg/R/functions.R --- @@ -202,14 +202,18 @@ NULL #' \itemize{ #' \item \code{from_json}: a structType object to use

[GitHub] spark pull request #22938: [SPARK-25935][SQL] Prevent null rows from JSON pa...

2018-11-04 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/22938 [SPARK-25935][SQL] Prevent null rows from JSON parser ## What changes were proposed in this pull request? An input without valid JSON tokens on the root level will be treated as a bad

[GitHub] spark issue #22920: [SPARK-25931][SQL] Benchmarking creation of Jackson pars...

2018-11-03 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22920 @dongjoon-hyun Thank you for re-running the benchmarks on EC2, and @HyukjinKwon for review.

[GitHub] spark pull request #22626: [SPARK-25638][SQL] Adding new function - to_csv()

2018-11-03 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22626#discussion_r230559020 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityGenerator.scala --- @@ -15,18 +15,17 @@ * limitations under

[GitHub] spark pull request #22626: [SPARK-25638][SQL] Adding new function - to_csv()

2018-11-03 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22626#discussion_r230559006 --- Diff: sql/core/src/test/resources/sql-tests/inputs/csv-functions.sql --- @@ -15,3 +15,10 @@ CREATE TEMPORARY VIEW csvTable(csvField, a) AS SELECT

[GitHub] spark pull request #22626: [SPARK-25638][SQL] Adding new function - to_csv()

2018-11-03 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22626#discussion_r230555774 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala --- @@ -174,3 +176,66 @@ case class SchemaOfCsv

[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

2018-11-02 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22429 @gatorsmile @HyukjinKwon @viirya @rednaxelafx Are you OK with the proposed changes, or is there something that blocks the PR for now

[GitHub] spark issue #22929: [SPARK-25927][SQL] Fix number of partitions returned by ...

2018-11-02 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22929 ping @hvanhovell

[GitHub] spark pull request #22929: [SPARK-25927][SQL] Fix number of partitions retur...

2018-11-02 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/22929 [SPARK-25927][SQL] Fix number of partitions returned by outputPartitioning ## What changes were proposed in this pull request? In the PR, I propose to make the `outputPartitioning

[GitHub] spark issue #22925: [SPARK-25913][SQL] Extend UnaryExecNode by unary SparkPl...

2018-11-01 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22925 Need to check this: https://github.com/apache/spark/blob/77e52448e7f94aadfa852cc67084415de6ecfa7c/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous

[GitHub] spark issue #22925: [SPARK-25913][SQL] Extend UnaryExecNode by unary SparkPl...

2018-11-01 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22925 ping @hvanhovell

[GitHub] spark pull request #22925: [SPARK-25913][SQL] Extend UnaryExecNode by unary ...

2018-11-01 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/22925 [SPARK-25913][SQL] Extend UnaryExecNode by unary SparkPlan nodes ## What changes were proposed in this pull request? In the PR, I propose to extend `UnaryExecNode` instead of `SparkPlan

[GitHub] spark issue #22920: [SPARK-24959][SQL][FOLLOWUP] Creating Jackson parser in ...

2018-11-01 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22920 jenkins, retest this, please

[GitHub] spark pull request #22920: [SPARK-24959][SQL][FOLLOWUP] Creating Jackson par...

2018-11-01 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22920#discussion_r230095357 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmark.scala --- @@ -86,6 +86,7 @@ object JSONBenchmark extends

[GitHub] spark pull request #22844: [SPARK-25847][SQL][TEST] Refactor JSONBenchmarks ...

2018-11-01 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22844#discussion_r230030467 --- Diff: sql/core/benchmarks/JSONBenchmark-results.txt --- @@ -0,0 +1,37

[GitHub] spark pull request #22626: [SPARK-25638][SQL] Adding new function - to_csv()

2018-11-01 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22626#discussion_r230028296 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala --- @@ -174,3 +176,66 @@ case class SchemaOfCsv

[GitHub] spark pull request #22626: [SPARK-25638][SQL] Adding new function - to_csv()

2018-11-01 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22626#discussion_r230027771 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala --- @@ -45,7 +45,6 @@ class CsvFunctionsSuite extends QueryTest

[GitHub] spark pull request #22920: [SPARK-24959][SQL][FOLLOWUP] Creating Jackson par...

2018-11-01 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/22920 [SPARK-24959][SQL][FOLLOWUP] Creating Jackson parser in the encoding JSON benchmarks ## What changes were proposed in this pull request? PR #21909 introduced an optimisation for `count

[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

2018-11-01 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22429 May I ask you @hvanhovell @zsxwing to review the PR one more time.

[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

2018-11-01 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22429 > @MaxGekk I sent email to spark dev list about structured plan logging, but did not get any response. @boy-uber I guess it is better to speak about the feature with @bogdanrdc @hvanhov

[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...

2018-11-01 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22666 @HyukjinKwon Never mind. Thank you for your work on the PR.

[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-10-31 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22379 @HyukjinKwon Thank you for your work on the PR. @cloud-fan @felixcheung @dongjoon-hyun @gatorsmile Thanks for your reviews

[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json

2018-10-31 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22237 @HyukjinKwon Thank you for the follow-up work on the PR. @cloud-fan @viirya @maropu Thanks for your reviews

[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-10-13 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22379 jenkins, retest this, please

[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-10-13 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22379#discussion_r224950964 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -3854,6 +3854,38 @@ object functions { @scala.annotation.varargs

[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...

2018-10-12 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22654 @gatorsmile @srowen Thank you for your work.

[GitHub] spark issue #22657: [SPARK-25670][TEST] Reduce number of tested timezones in...

2018-10-12 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22657 @srowen @HyukjinKwon @cloud-fan Thank you for your review of the PR.

[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-10-12 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22379#discussion_r224846183 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExprUtils.scala --- @@ -0,0 +1,45 @@ +/* + * Licensed

[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-10-12 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22379#discussion_r224844756 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVUtils.scala --- @@ -0,0 +1,57 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-10-12 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22379#discussion_r224843696 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -3854,6 +3854,38 @@ object functions { @scala.annotation.varargs

[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...

2018-10-12 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22654#discussion_r224799147 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -1826,4 +1826,13 @@ class CSVSuite extends

[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

2018-10-12 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22429 @hvanhovell @zsxwing Could you please look at this during the next few days? Otherwise I will only be able to come back to the PR in 3 weeks

[GitHub] spark issue #22019: [WIP][SPARK-25040][SQL] Empty string for double and floa...

2018-10-12 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22019 I probably will not be able to look at this for the next few weeks.

[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...

2018-10-12 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22666 > Let's add from_csv first. Sure, I just wanted to make it ready since the changes are not overlapped so m

[GitHub] spark issue #20877: [SPARK-23765][SQL] Supports custom line separator for js...

2018-10-12 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/20877 > ... are you busy? Do you have some time to go for CSV's lineSep? @HyukjinKwon I will be on vacation for 3 weeks, and most likely I will be in a place where there is no inter

[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-10-11 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22379 jenkins, retest this, please

[GitHub] spark issue #22657: [SPARK-25670][TEST] Reduce number of tested timezones in...

2018-10-11 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22657 @kiszk It has already been parallelized by @srowen: https://github.com/apache/spark/blob/eaafcd8a22db187e87f09966826dcf677c4c38ea/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst

[GitHub] spark issue #22657: [SPARK-25670][TEST] Reduce number of tested timezones in...

2018-10-11 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22657 > Then yes, it's a better place. Sorry for the back and forth! @cloud-fan Never mind. I will put it back. Thank you for review

[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...

2018-10-11 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22654 @gatorsmile Could you look at it one more time, please?

[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

2018-10-11 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22429#discussion_r224418619 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala --- @@ -167,6 +172,58 @@ package object util

[GitHub] spark issue #22657: [SPARK-25670][TEST] Reduce number of tested timezones in...

2018-10-11 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22657 > let's define it in the test scope. What's the test scope here? Both `DateTimeTestUtils` and `SparkFunSuite` are used in test suites o

[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

2018-10-11 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22429#discussion_r224415554 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/CatalystDataToAvro.scala --- @@ -52,7 +52,7 @@ case class CatalystDataToAvro(child

[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

2018-10-11 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22429#discussion_r224415510 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -633,4 +633,14 @@ package object config { .stringConf

[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

2018-10-11 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22429#discussion_r224391712 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala --- @@ -167,6 +172,58 @@ package object util

[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

2018-10-11 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22429#discussion_r224389798 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -633,4 +633,14 @@ package object config { .stringConf

[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...

2018-10-11 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22666 @gatorsmile @cloud-fan May I ask you to look at the PR, please.

[GitHub] spark issue #22626: [SPARK-25638][SQL] Adding new function - to_csv()

2018-10-11 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22626 @cloud-fan @gatorsmile Could you look at the PR, please?

[GitHub] spark issue #22657: [SPARK-25670][TEST] Reduce number of tested timezones in...

2018-10-11 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22657 > How about defining this subset in object DateTimeTestUtils like ALL_TIMEZONES? This is what I already did in the PR: https://github.com/apache/spark/pull/22657#discussion_r224086

[GitHub] spark issue #22657: [SPARK-25670][TEST] Reduce number of tested timezones in...

2018-10-11 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22657 > Where else can we apply this subset? I am going to make similar changes in tests for `from_csv` where we don't need to test all timezones too: https://github.com/apache/spark/pull/22

[GitHub] spark pull request #22657: [SPARK-25670][TEST] Reduce number of tested timez...

2018-10-10 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22657#discussion_r224091433 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeTestUtils.scala --- @@ -26,6 +26,16 @@ object DateTimeTestUtils

[GitHub] spark issue #22657: [SPARK-25670][TEST] Reduce number of tested timezones in...

2018-10-10 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22657 > Then we don't need any randomness here, just pick one timezone(like PST?) and test it. I took 8 out of 627. Going to do the same in the PR: https://github.com/apache/spark/pull/22

[GitHub] spark issue #22626: [SPARK-25638][SQL] Adding new function - to_csv()

2018-10-10 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22626 @HyukjinKwon Could you look at this PR one more time, please?

[GitHub] spark issue #22666: [SPARK-25672][SQL] schema_of_csv() - schema inference fr...

2018-10-10 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22666 @HyukjinKwon May I ask you to look at this PR.

[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...

2018-10-10 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22237#discussion_r224044435 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/FailureSafeParser.scala --- @@ -15,50 +15,57 @@ * limitations under

[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...

2018-10-10 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22237#discussion_r224042803 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -554,18 +554,30 @@ case class JsonToStructs

[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...

2018-10-10 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22237#discussion_r224042251 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -554,18 +554,30 @@ case class JsonToStructs

[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...

2018-10-09 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22237#discussion_r223852220 --- Diff: docs/sql-programming-guide.md --- @@ -1890,6 +1890,10 @@ working with timestamps in `pandas_udf`s to get the best performance, see

[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...

2018-10-09 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22237#discussion_r223849217 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -554,18 +554,30 @@ case class JsonToStructs

[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...

2018-10-09 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22237#discussion_r223836030 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -554,18 +554,30 @@ case class JsonToStructs

[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...

2018-10-09 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22237#discussion_r223833101 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala --- @@ -90,6 +91,10 @@ class JacksonParser

[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...

2018-10-09 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22654#discussion_r223754099 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -1826,4 +1826,13 @@ class CSVSuite extends

[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...

2018-10-09 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22654#discussion_r223749951 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala --- @@ -97,23 +97,22 @@ object CSVUtils

[GitHub] spark pull request #22676: [SPARK-25684][SQL] Organize header related codes ...

2018-10-09 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22676#discussion_r223741059 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala --- @@ -330,7 +333,10 @@ private[csv] object

[GitHub] spark pull request #22676: [SPARK-25684][SQL] Organize header related codes ...

2018-10-09 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22676#discussion_r223730392 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala --- @@ -139,14 +138,15 @@ class CSVFileFormat extends

[GitHub] spark pull request #22676: [SPARK-25684][SQL] Organize header related codes ...

2018-10-09 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22676#discussion_r223737261 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala --- @@ -273,44 +273,47 @@ private[csv] object

[GitHub] spark pull request #22676: [SPARK-25684][SQL] Organize header related codes ...

2018-10-09 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22676#discussion_r223727251 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -505,20 +505,14 @@ class DataFrameReader private[sql](sparkSession

[GitHub] spark pull request #22676: [SPARK-25684][SQL] Organize header related codes ...

2018-10-09 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22676#discussion_r223729530 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVHeaderChecker.scala --- @@ -0,0 +1,131 @@ +/* + * Licensed

[GitHub] spark pull request #22676: [SPARK-25684][SQL] Organize header related codes ...

2018-10-09 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22676#discussion_r223741951 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala --- @@ -251,7 +125,7 @@ object TextInputCSVDataSource

[GitHub] spark pull request #22676: [SPARK-25684][SQL] Organize header related codes ...

2018-10-09 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22676#discussion_r223698787 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -505,20 +505,14 @@ class DataFrameReader private[sql](sparkSession

[GitHub] spark issue #22590: [SPARK-25574][SQL]Add an option `keepQuotes` for parsing...

2018-10-09 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22590 @10110346 Could you describe the use case where you need this, please? As @HyukjinKwon said in one of the PRs, the `uniVocity` parser supports many config options, and we cannot expose everything from

[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...

2018-10-09 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22237#discussion_r223675223 --- Diff: docs/sql-programming-guide.md --- @@ -1890,6 +1890,10 @@ working with timestamps in `pandas_udf`s to get the best performance, see

[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...

2018-10-09 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22654#discussion_r223635442 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala --- @@ -97,23 +97,21 @@ object CSVUtils

[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...

2018-10-09 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22654#discussion_r223623311 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala --- @@ -97,23 +97,21 @@ object CSVUtils

[GitHub] spark issue #22656: [SPARK-25669][SQL] Check CSV header only when it exists

2018-10-08 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22656 @HyukjinKwon Could you look at the PR, please?

[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

2018-10-08 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22429 @zsxwing Please have a look at the PR one more time.

[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...

2018-10-08 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22654 @gatorsmile Please take a look at this.

[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...

2018-10-08 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22237#discussion_r223459186 --- Diff: docs/sql-programming-guide.md --- @@ -1890,6 +1890,10 @@ working with timestamps in `pandas_udf`s to get the best performance, see

[GitHub] spark issue #22657: [SPARK-25670][TEST] Reduce number of tested timezones in...

2018-10-08 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22657 > I didn't see that one, and I object to it. I believe we don't need to test all timezones in the case of JSON datasource. Actually we just check how `FastDateFormat` of `commons-la

[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...

2018-10-08 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22237#discussion_r223263164 --- Diff: docs/sql-programming-guide.md --- @@ -1890,6 +1890,10 @@ working with timestamps in `pandas_udf`s to get the best performance, see

[GitHub] spark pull request #22666: [SPARK-25672][SQL] schema_of_csv() - schema infer...

2018-10-07 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/22666 [SPARK-25672][SQL] schema_of_csv() - schema inference from an example ## What changes were proposed in this pull request? In the PR, I propose to add a new function, *schema_of_csv()*, which

[GitHub] spark issue #22657: [SPARK-25670][TEST] Reduce number of tested timezones in...

2018-10-06 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22657 @srowen What is the difference between this test suite and this PR https://github.com/apache/spark/pull/22631 . Also I took a sub-set of timezones in the PR: https://github.com/apache/spark/pull

[GitHub] spark issue #22656: [SPARK-25669][SQL] Check CSV header only when it exists

2018-10-06 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22656 jenkins, retest this, please

[GitHub] spark pull request #22657: [SPARK-25670][SQL] Reduce number of tested timezo...

2018-10-06 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/22657 [SPARK-25670][SQL] Reduce number of tested timezones in JsonExpressionsSuite ## What changes were proposed in this pull request? After the changes, total execution time

[GitHub] spark pull request #22656: [SPARK-25669][SQL] Check CSV header only when it ...

2018-10-06 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/22656 [SPARK-25669][SQL] Check CSV header only when it exists ## What changes were proposed in this pull request? Currently, the first row of a dataset of CSV strings is compared to field names

[GitHub] spark issue #22654: [SPARK-25660][SQL] Fix for the backward slash as CSV fie...

2018-10-06 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22654 jenkins, retest this, please

[GitHub] spark pull request #22654: [SPARK-25660][SQL] Fix for the backward slash as ...

2018-10-06 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/22654 [SPARK-25660][SQL] Fix for the backward slash as CSV fields delimiter ## What changes were proposed in this pull request? The PR addresses the exception raised on accessing chars out

[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json

2018-10-06 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22237 jenkins, retest this, please

[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-10-05 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22379#discussion_r223152506 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CsvExpressionsSuite.scala --- @@ -0,0 +1,160 @@ +/* + * Licensed

[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...

2018-10-05 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22237#discussion_r223144685 --- Diff: docs/sql-programming-guide.md --- @@ -1879,6 +1879,10 @@ working with timestamps in `pandas_udf`s to get the best performance, see

[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-10-05 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22379#discussion_r222954838 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala --- @@ -0,0 +1,107 @@ +/* + * Licensed

[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...

2018-10-05 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22237#discussion_r222942666 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ParseMode.scala --- @@ -51,6 +56,8 @@ object ParseMode extends Logging

[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...

2018-10-04 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22237#discussion_r222814860 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ParseMode.scala --- @@ -51,6 +56,8 @@ object ParseMode extends Logging

[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...

2018-10-04 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22237#discussion_r222812531 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ParseMode.scala --- @@ -51,6 +56,8 @@ object ParseMode extends Logging

[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...

2018-10-04 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22237#discussion_r222811744 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -550,59 +550,93 @@ case class JsonToStructs
