[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

2018-09-17 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22429#discussion_r218197544 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala --- @@ -469,7 +470,17 @@ abstract class TreeNode[BaseType

[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

2018-09-17 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22429#discussion_r218187690 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala --- @@ -250,5 +254,36 @@ class QueryExecution(val sparkSession

[GitHub] spark issue #22442: [SPARK-25447][SQL] Support JSON options by schema_of_jso...

2018-09-17 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22442 jenkins, retest this, please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #22442: [SPARK-25447][SQL] Support JSON options by schema...

2018-09-17 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/22442 [SPARK-25447][SQL] Support JSON options by schema_of_json() ## What changes were proposed in this pull request? In the PR, I propose to extend the `schema_of_json()` function

[GitHub] spark issue #22366: [SPARK-25384][SQL] Removing of spark.sql.fromJsonForceNu...

2018-09-17 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22366 @dongjoon-hyun The master branch became `2.5.0-SNAPSHOT` recently. Can we move forward with this PR?

[GitHub] spark issue #22365: [SPARK-25381][SQL] Stratified sampling by Column argumen...

2018-09-17 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22365 > Seems fine but I or someone else should take a closer look before getting this in. @HyukjinKwon Whom can I ask to look at this? @gatorsmile Please, give me an adv

[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

2018-09-17 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22316 @HyukjinKwon @maropu @jaceklaskowski Please, take a look at this PR one more time.

[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json

2018-09-17 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22237 @viirya @maropu @HyukjinKwon May I ask you to look at this one more time, please.

[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

2018-09-16 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22429#discussion_r217928631 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala --- @@ -469,7 +470,17 @@ abstract class TreeNode[BaseType

[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

2018-09-15 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22429 @rednaxelafx Please, take a look at the PR.

[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dump query execution info to a...

2018-09-15 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/22429 [SPARK-25440][SQL] Dump query execution info to a file ## What changes were proposed in this pull request? In the PR, I propose a new method for debugging queries by dumping info about

[GitHub] spark pull request #22413: [SPARK-25425][SQL] Extra options overwrite sessio...

2018-09-14 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22413#discussion_r217736452 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -241,10 +241,12 @@ final class DataFrameWriter[T] private[sql](ds

[GitHub] spark pull request #22413: [SPARK-25425][SQL] Extra options overwrite sessio...

2018-09-14 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22413#discussion_r217703802 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -202,7 +202,7 @@ class DataFrameReader private[sql](sparkSession

[GitHub] spark pull request #22413: Session options shouldn't override extra options

2018-09-13 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22413#discussion_r217513373 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala --- @@ -202,7 +202,7 @@ class DataFrameReader private[sql](sparkSession

[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

2018-09-13 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22316#discussion_r217476618 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -416,7 +426,7 @@ class RelationalGroupedDataset protected[sql

[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json

2018-09-13 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22237 @HyukjinKwon Please, take a look at it again.

[GitHub] spark issue #22413: Session options shouldn't override extra options

2018-09-13 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22413 @cloud-fan Please, take a look.

[GitHub] spark pull request #22413: Session options shouldn't override extra options

2018-09-13 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/22413 Session options shouldn't override extra options ## What changes were proposed in this pull request? In the PR, I propose to change the order in which options are applied. Extra options specified
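
The precedence named in the PR title can be illustrated with a minimal plain-Python sketch. It is not Spark code: the function `resolve_options` and the option maps are hypothetical, and the PR itself changes Scala code in `DataFrameReader` and `DataFrameWriter`. The sketch only shows the idea that, when the same key appears in both maps, the extra (per-call) option should win over the session-level one.

```python
def resolve_options(session_options, extra_options):
    """Merge two option maps so that extra (per-call) options win on conflict.

    Hypothetical helper for illustration only; not part of any Spark API.
    """
    merged = dict(session_options)  # start from the session-level defaults
    merged.update(extra_options)    # keys set per call override duplicates
    return merged


# Hypothetical option values, chosen only for the demonstration.
session_options = {"header": "false", "sep": ","}
extra_options = {"header": "true"}
resolved = resolve_options(session_options, extra_options)
```

Applying the session options last instead would silently discard what the caller asked for on that specific read or write, which is the behaviour the PR title argues against.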

[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-09-13 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22379 > Out of curiosity, is this one related with an actual usecase Maxim? or is this proposed for API consistency? This is an actual use case: users received CSV content dumped from anot

[GitHub] spark pull request #22374: [SPARK-25387][SQL] Fix for NPE caused by bad CSV ...

2018-09-13 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22374#discussion_r217279960 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala --- @@ -240,23 +240,25 @@ object

[GitHub] spark issue #21999: [WIP][SQL] Flattening nested structures

2018-09-12 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21999 @gatorsmile Is there any chance this will be merged, or should I close it?

[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-09-12 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22379 > The concern here is, it sounds we are stepping back from the ideal approach. @HyukjinKwon @dongjoon-hyun Should I move everything related to CSV to `external` in a separate PR? It se

[GitHub] spark issue #22365: [SPARK-25381][SQL] Stratified sampling by Column argumen...

2018-09-12 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22365 @HyukjinKwon May I ask you to look at this PR one more time.

[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

2018-09-12 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22316 @gatorsmile Do you have any objections to this approach?

[GitHub] spark pull request #22374: [SPARK-25387][SQL] Fix for NPE caused by bad CSV ...

2018-09-12 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22374#discussion_r217082220 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala --- @@ -216,7 +216,12 @@ class UnivocityParser

[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-09-12 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22379 @felixcheung Thank you for your comment. I have to move the description of `schema` from `from_json` too; otherwise I get the warning and a build failure: `Duplicated \argument entries in documentation

[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-09-11 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22379 jenkins, retest this, please

[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-09-11 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22379#discussion_r216734651 --- Diff: R/pkg/R/functions.R --- @@ -3720,3 +3720,22 @@ setMethod("current_timestamp", jc <- callJStatic("org.apache.s

[GitHub] spark pull request #22374: [SPARK-25387][SQL] Fix for NPE caused by bad CSV ...

2018-09-11 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22374#discussion_r216569630 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -1700,4 +1700,13 @@ class CSVSuite extends

[GitHub] spark issue #22367: [SPARK-17916][SPARK-25241][SQL][FOLLOWUP] Fix empty stri...

2018-09-11 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22367 @HyukjinKwon The same for the `master` branch: https://github.com/apache/spark/pull/22389

[GitHub] spark pull request #22389: [SPARK-17916][SPARK-25241][SQL][FOLLOW-UP] Fix em...

2018-09-11 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/22389 [SPARK-17916][SPARK-25241][SQL][FOLLOW-UP] Fix empty string being parsed as null when nullValue is set. ## What changes were proposed in this pull request? In the PR, I propose new CSV

[GitHub] spark issue #22367: [SPARK-17916][SPARK-25241][SQL][FOLLOWUP] Fix empty stri...

2018-09-11 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22367 > Oh, wait. why does this target branch-2.4? Just to make sure, the changes don't have conflicts in `branch-2.4`. @HyukjinKwon Is this PR not mergeable to `master`? > You ca

[GitHub] spark issue #22374: [SPARK-25387][SQL] Fix for NPE caused by bad CSV input

2018-09-10 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22374 > This line below possibly returns null? @maropu It can return `null` but inside of `CSVDataSource.checkHeaderColumnNames` there is a `null` check

[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-09-10 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22379 jenkins, retest this, please

[GitHub] spark pull request #22365: [SPARK-25381][SQL] Stratified sampling by Column ...

2018-09-10 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22365#discussion_r216482340 --- Diff: python/pyspark/sql/dataframe.py --- @@ -880,18 +880,23 @@ def sampleBy(self, col, fractions, seed=None): | 0|5

[GitHub] spark pull request #22367: [SPARK-17916][SPARK-25241][SQL][FOLLOWUP] Fix emp...

2018-09-10 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22367#discussion_r216463293 --- Diff: docs/sql-programming-guide.md --- @@ -1897,6 +1897,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see

[GitHub] spark issue #22367: [SPARK-17916][SPARK-25241][SQL][FOLLOWUP] Fix empty stri...

2018-09-10 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22367 > mind adding Closes #22234 at the end of PR description so that we can automatically close that one? Just in case: this PR is for `branch-2.4`, but the original #22234 is for `mas

[GitHub] spark issue #22366: [SPARK-25384][SQL] Removing of spark.sql.fromJsonForceNu...

2018-09-10 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22366 > If this targets for 3.0, can we postpone this until the master branch get 3.0.0-SNAPSHOT? @dongjoon-hyun Yes, s

[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-09-10 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22379#discussion_r216446691 --- Diff: sql/catalyst/pom.xml --- @@ -103,6 +103,12 @@ commons-codec commons-codec + + com.univocity

[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...

2018-09-10 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22237#discussion_r21650 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala --- @@ -469,4 +470,26 @@ class JsonFunctionsSuite extends QueryTest

[GitHub] spark issue #22366: [SPARK-25384][SQL] Removing of spark.sql.fromJsonForceNu...

2018-09-10 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22366 > Is it better to add a description to docs/sql-programming-guide.md? > Yea, let's note so that we can track what we change. @kiszk @HyukjinKwon May I ask you to clarify this.

[GitHub] spark pull request #22367: [SPARK-17916][SPARK-25241][SQL][FOLLOWUP] Fix emp...

2018-09-10 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22367#discussion_r216324646 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -79,7 +79,8 @@ private[csv] object

[GitHub] spark issue #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-09-10 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22379 jenkins, retest this, please

[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json

2018-09-10 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22237 > Will take a look soon. @HyukjinKwon Thank you. Waiting for your feedback.

[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-09-10 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/22379 [SPARK-25393][SQL] Adding new function from_csv() ## What changes were proposed in this pull request? The PR adds a new function `from_csv()`, similar to `from_json()`, to parse columns

[GitHub] spark pull request #22374: [SPARK-25387][SQL] Fix for NPE caused by bad CSV ...

2018-09-09 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/22374 [SPARK-25387][SQL] Fix for NPE caused by bad CSV input ## What changes were proposed in this pull request? The PR fixes an NPE in `UnivocityParser` caused by malformed CSV input. In some

[GitHub] spark issue #22367: [SPARK-17916][SPARK-25241][SQL][FOLLOWUP] Fix empty stri...

2018-09-09 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22367 jenkins, retest this, please

[GitHub] spark issue #22234: [SPARK-25241][SQL] Configurable empty values when readin...

2018-09-08 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22234 @gatorsmile @HyukjinKwon Please, take a look at #22367

[GitHub] spark pull request #22367: [SPARK-17916][SPARK-25241][SQL][FOLLOWUP] Fix emp...

2018-09-08 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/22367 [SPARK-17916][SPARK-25241][SQL][FOLLOWUP] Fix empty string being parsed as null when nullValue is set. ## What changes were proposed in this pull request? In the PR, I propose new CSV

[GitHub] spark pull request #22366: [SPARK-25384][SQL] Removing of spark.sql.fromJson...

2018-09-08 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/22366 [SPARK-25384][SQL] Removing of spark.sql.fromJsonForceNullableSchema ## What changes were proposed in this pull request? In the PR, I propose to remove

[GitHub] spark pull request #22365: [SPARK-25381][SQL] Stratified sampling by Column ...

2018-09-08 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/22365 [SPARK-25381][SQL] Stratified sampling by Column argument ## What changes were proposed in this pull request? In the PR, I propose to add an overloaded method for `sampleBy` which accepts
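
Stratified sampling, the operation behind `sampleBy`, keeps each row with a probability chosen by the stratum the row belongs to. The following is a minimal plain-Python sketch of that idea, not Spark's implementation: the function `sample_by`, its `key_fn` parameter (a loose stand-in for deriving the stratum from an arbitrary expression rather than a column name) and the sample rows are all hypothetical. Strata missing from `fractions` are given a fraction of zero, which matches the documented behaviour of Spark's `sampleBy`.

```python
import random


def sample_by(rows, key_fn, fractions, seed=None):
    """Keep each row with the sampling fraction assigned to its stratum.

    Hypothetical helper for illustration only; not a Spark API.
    """
    rng = random.Random(seed)  # a seeded generator makes the sample repeatable
    sampled = []
    for row in rows:
        # Strata not listed in `fractions` get a fraction of zero.
        fraction = fractions.get(key_fn(row), 0.0)
        # rng.random() lies in [0.0, 1.0): 1.0 keeps every row, 0.0 keeps none.
        if rng.random() < fraction:
            sampled.append(row)
    return sampled


# Three strata (keys 0, 1, 2) with 100 rows each; the data is made up.
rows = [{"key": k, "value": v} for k in (0, 1, 2) for v in range(100)]
sampled = sample_by(rows, lambda r: r["key"], {0: 1.0, 1: 0.0}, seed=42)
```

Because the stratum is computed by a function of the row, the same sketch also covers strata derived from several fields, for example `lambda r: (r["key"], r["value"] % 2)`.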

[GitHub] spark issue #22234: [SPARK-25241][SQL] Configurable empty values when readin...

2018-09-07 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22234 > cc @MaxGekk for a followup @HyukjinKwon Do you mean to update migration guide in master and probably in Spark 2.4? I don't think this should be considered as a bug because curr

[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json

2018-09-06 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22237 @HyukjinKwon I re-targeted the changes for Spark 3.0. Please, take a look at it one more time.

[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

2018-09-06 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22316 @HyukjinKwon May I ask you to look at the PR. Is there anything which blocks the PR for now?

[GitHub] spark pull request #22226: [SPARK-25252][SQL] Support arrays of any types by...

2018-09-04 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22226#discussion_r214878373 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -663,12 +662,7 @@ case class StructsToJson

[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

2018-09-04 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22316#discussion_r214842133 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFramePivotSuite.scala --- @@ -308,4 +308,27 @@ class DataFramePivotSuite extends QueryTest

[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

2018-09-03 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22316#discussion_r214754379 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -416,7 +426,7 @@ class RelationalGroupedDataset protected[sql

[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

2018-09-03 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22316#discussion_r214722485 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFramePivotSuite.scala --- @@ -308,4 +308,27 @@ class DataFramePivotSuite extends QueryTest

[GitHub] spark pull request #22030: [SPARK-25048][SQL] Pivoting by multiple columns i...

2018-09-02 Thread MaxGekk
Github user MaxGekk closed the pull request at: https://github.com/apache/spark/pull/22030

[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

2018-09-02 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/22316 [SPARK-25048][SQL] Pivoting by multiple columns in Scala/Java ## What changes were proposed in this pull request? In the PR, I propose to extend the implementation of the existing method

[GitHub] spark issue #22030: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

2018-09-02 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22030 Please, review this PR https://github.com/apache/spark/pull/22316

[GitHub] spark pull request #22226: [SPARK-25252][SQL] Support arrays of any types by...

2018-09-01 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22226#discussion_r214509654 --- Diff: R/pkg/R/functions.R --- @@ -1697,8 +1697,8 @@ setMethod("to_date", }) #' @details -#' \code{to_json}

[GitHub] spark issue #22283: [SPARK-25283][CORE] Fix for a deadlock in UnionRDD

2018-08-31 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22283 @gatorsmile I am closing this since the changes in UnionRDD have already been merged to master: https://github.com/apache/spark/commit/32da87dfa451fff677ed9316f740be2abdbff6a4

[GitHub] spark pull request #22283: [SPARK-25283][CORE] Fix for a deadlock in UnionRD...

2018-08-31 Thread MaxGekk
Github user MaxGekk closed the pull request at: https://github.com/apache/spark/pull/22283

[GitHub] spark issue #22292: [SPARK-25286][CORE] Removing the dangerous parmap

2018-08-31 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22292 jenkins, retest this, please

[GitHub] spark pull request #22226: [SPARK-25252][SQL] Support arrays of any types by...

2018-08-30 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22226#discussion_r214188879 --- Diff: R/pkg/R/functions.R --- @@ -1697,8 +1697,8 @@ setMethod("to_date", }) #' @details -#' \code{to_json}

[GitHub] spark issue #22283: [SPARK-25283][CORE] Fix for a deadlock in UnionRDD

2018-08-30 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22283 jenkins, retest this, please

[GitHub] spark issue #22292: [SPARK-25286][CORE] Removing the dangerous parmap

2018-08-30 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22292 @zsxwing @gatorsmile Please, have a look at the PR.

[GitHub] spark pull request #22292: [SPARK-25286][CORE] Removing the dangerous parmap

2018-08-30 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/22292 [SPARK-25286][CORE] Removing the dangerous parmap ## What changes were proposed in this pull request? I propose to remove the `parmap` method which accepts an execution context

[GitHub] spark pull request #22283: [SPARK-25283][CORE] Fix for a deadlock in UnionRD...

2018-08-30 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/22283 [SPARK-25283][CORE] Fix for a deadlock in UnionRDD ## What changes were proposed in this pull request? The commit https://github.com/apache/spark/commit

[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...

2018-08-30 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22237#discussion_r213975610 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -554,18 +554,28 @@ case class JsonToStructs

[GitHub] spark pull request #22272: [SPARK-25273][DOC] How to install testthat 1.0.2

2018-08-29 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/22272 [SPARK-25273][DOC] How to install testthat 1.0.2 ## What changes were proposed in this pull request? R tests require `testthat` v1.0.2. In the PR, I described that in the section http

[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json

2018-08-28 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22237 > to match the current behaviour to PERMISSIVE mode, explain that in the migration guide. @HyukjinKwon Should I target Spark 3.0 or

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

2018-08-28 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22233#discussion_r213419455 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -52,23 +52,24 @@ class InMemoryCatalogedDDLSuite extends

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

2018-08-28 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22233#discussion_r213327251 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -671,7 +674,7 @@ case class AlterTableRecoverPartitionsCommand

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

2018-08-28 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22233#discussion_r213252905 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -671,7 +674,7 @@ case class AlterTableRecoverPartitionsCommand

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

2018-08-27 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22233#discussion_r213075357 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -671,7 +674,7 @@ case class AlterTableRecoverPartitionsCommand

[GitHub] spark issue #22226: [SPARK-25252][SQL] Support arrays of any types by to_jso...

2018-08-27 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22226 > Probably, you'd be better to file separate jira for each function. > +1 for separate JIRA. I created the JIRA ticket: https://issues.apache.org/jira/browse/SPARK

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

2018-08-27 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22233#discussion_r213050406 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -671,7 +674,7 @@ case class AlterTableRecoverPartitionsCommand

[GitHub] spark pull request #22226: [SPARK-24391][SQL] Support arrays of any types by...

2018-08-27 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22226#discussion_r212996418 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonUtils.scala --- @@ -32,29 +32,29 @@ object JacksonUtils

[GitHub] spark pull request #22226: [SPARK-24391][SQL] Support arrays of any types by...

2018-08-27 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22226#discussion_r212995984 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -65,6 +66,8 @@ private[sql] class JacksonGenerator

[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...

2018-08-27 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22237#discussion_r212925256 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala --- @@ -469,4 +470,23 @@ class JsonFunctionsSuite extends QueryTest

[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...

2018-08-27 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22237#discussion_r212924389 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala --- @@ -469,4 +470,23 @@ class JsonFunctionsSuite extends QueryTest

[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...

2018-08-27 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22237#discussion_r212922787 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -554,18 +554,22 @@ case class JsonToStructs

[GitHub] spark issue #22226: [SPARK-24391][SQL] Support arrays of any types by to_jso...

2018-08-27 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22226 @maropu The JIRA ticket was about both `to_json` and `from_json` originally.

[GitHub] spark issue #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_json

2018-08-26 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22237 jenkins, retest this, please

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

2018-08-26 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22233#discussion_r212834324 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -671,7 +674,7 @@ case class AlterTableRecoverPartitionsCommand

[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...

2018-08-26 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/22237 [SPARK-25243][SQL] Use FailureSafeParser in from_json ## What changes were proposed in this pull request? In the PR, I propose to switch `from_json` on `FailureSafeParser`, and to make

[GitHub] spark issue #18447: [SPARK-21232][SQL][SparkR][PYSPARK] New built-in SQL fun...

2018-08-26 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/18447 @HyukjinKwon I am also neutral, since I don't have potential use cases for the function.

[GitHub] spark issue #22234: [SPARK-25241][SQL] Configurable empty values when readin...

2018-08-26 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22234 Should the new option be taken into account there: https://github.com/apache/spark/blob/b461acb2d90b734393c27fe7b359e2f2d297b8d4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

2018-08-26 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22233#discussion_r212817243 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -671,7 +674,7 @@ case class AlterTableRecoverPartitionsCommand

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

2018-08-25 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22233#discussion_r212804252 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -1131,7 +1135,7 @@ abstract class DDLSuite extends

[GitHub] spark pull request #22233: [SPARK-25240][SQL] Fix for a deadlock in RECOVER ...

2018-08-25 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/22233 [SPARK-25240][SQL] Fix for a deadlock in RECOVER PARTITIONS ## What changes were proposed in this pull request? In the PR, I propose to not perform recursive parallel listing of files

[GitHub] spark issue #22226: [SPARK-24391][SQL] Support arrays of any types by to_jso...

2018-08-24 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22226 @HyukjinKwon Please have a look at the PR.

[GitHub] spark pull request #22226: [SPARK-24391][SQL] Support arrays of any types by...

2018-08-24 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/22226 [SPARK-24391][SQL] Support arrays of any types by to_json ## What changes were proposed in this pull request? In the PR, I propose to extend `to_json` and support any types as element

[GitHub] spark issue #22177: [SPARK-25199][Web UI] stages in wrong order within job p...

2018-08-22 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22177 You probably put the wrong JIRA in the title: `SPARK-25199` -> `SPARK-25119`

[GitHub] spark pull request #22123: [SPARK-25134][SQL] Csv column pruning with checki...

2018-08-18 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22123#discussion_r211081732 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -1603,6 +1603,25 @@ class CSVSuite extends

[GitHub] spark issue #22123: [SPARK-25134][SQL] Csv column pruning with checking of h...

2018-08-18 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22123 May I ask you to also check the `multiLine` mode, since we use different methods of the uniVocity parser? When `multiLine` is disabled, the `parseLine` method is used, but in the `multiLine` mode

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-18 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r211075385 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala --- @@ -223,7 +224,8 @@ object

[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-18 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r211075384 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1492,6 +1492,15 @@ object SQLConf { "This us
