[GitHub] spark issue #21803: [SPARK-24849][SQL] Converting a value of StructType to a...

2018-07-19 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21803 > is the purpose of this API is to have a int instead of struct Basically, yes. All those methods `simpleString()`, `catalogString()`, `sql()` return `struct< ... : ...>`
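The difference between the `struct<...>` form and a DDL string can be sketched in plain Python. This is only an illustration under stated assumptions, not Spark code: `Field`, `quote`, `struct_string` and `ddl_string` are hypothetical helpers, and the backtick-quoting rule (doubling embedded backticks) is an assumption of the sketch.

```python
from typing import List, Tuple

# Hypothetical stand-in for a struct field: (column name, type name).
Field = Tuple[str, str]


def quote(name: str) -> str:
    """Backtick-quote a column name, doubling any embedded backticks (assumed rule)."""
    return "`" + name.replace("`", "``") + "`"


def struct_string(fields: List[Field]) -> str:
    """Render fields in the compact struct<name:type,...> style."""
    return "struct<" + ",".join(f"{n}:{t}" for n, t in fields) + ">"


def ddl_string(fields: List[Field]) -> str:
    """Render fields as a comma-separated DDL column list with quoted names."""
    return ",".join(f"{quote(n)} {t.upper()}" for n, t in fields)


fields = [("id", "int"), ("full name", "string")]
print(struct_string(fields))
print(ddl_string(fields))
```

The struct form wraps the whole schema in one type expression, while the DDL form is a bare column list, which is why quoting of column names matters for names containing spaces.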

[GitHub] spark pull request #21803: [SPARK-24849][SQL] Converting a value of StructTy...

2018-07-18 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21803#discussion_r203618974 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala --- @@ -436,6 +436,14 @@ object StructType extends AbstractDataType

[GitHub] spark issue #21803: [SPARK-24849][SQL] Converting a value of StructType to a...

2018-07-18 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21803 > (As I described in the jira) What's this func is used for? @maropu I answered in JIRA, please, look at it.

[GitHub] spark issue #21803: [SPARK-24849][SQL] Converting a value of StructType to a...

2018-07-18 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21803 @hvanhovell Could you look at the PR please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...

2018-07-18 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21589 > it's not terribly useful to know, e.g., that there are 5 million cores in the cluster if your Job is running in a scheduler pool that is restricted to using far fewer CPUs via th

[GitHub] spark issue #21798: [SPARK-24836][SQL] New option for Avro datasource - igno...

2018-07-18 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21798 Please, look at this PR: https://github.com/apache/spark/pull/21810 . It introduces `AvroOptions`.

[GitHub] spark pull request #21810: [SPARK-24854][SQL] Gathering all Avro options int...

2018-07-18 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/21810 [SPARK-24854][SQL] Gathering all Avro options into the AvroOptions class ## What changes were proposed in this pull request? In the PR, I propose to put all `Avro` options in a new class

[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...

2018-07-18 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21720 @gatorsmile @maryannxue Can we move forward with this PR: https://github.com/apache/spark/pull/21699 ?

[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...

2018-07-18 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21589 > ... unless explicitly overridden by user. This is the problem this PR addresses, actually. > If you need fine grained information about executors, use spark listener

[GitHub] spark issue #21803: [SPARK-24849][SQL] Converting a value of StructType to a...

2018-07-18 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21803 @maropu I added quoting of column names

[GitHub] spark pull request #21803: [SPARK-24849][SQL] Converting a value of StructTy...

2018-07-18 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/21803 [SPARK-24849][SQL] Converting a value of StructType to a DDL string ## What changes were proposed in this pull request? In the PR, I propose to extend the `StructType` object with a new method

[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...

2018-07-18 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21589 > User's are not expected to override it unless they want fine grained control over the value This is actually one of the use cases when a user needs to take control or tune a qu

[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...

2018-07-18 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21589 > I am not seeing the utility of these two methods. @mridulm I describe the utility of the methods in the ticket: https://issues.apache.org/jira/browse/SPARK-24

[GitHub] spark pull request #21192: [SPARK-24118][SQL] Flexible format for the lineSe...

2018-07-18 Thread MaxGekk
Github user MaxGekk closed the pull request at: https://github.com/apache/spark/pull/21192

[GitHub] spark pull request #21798: [SPARK-24836][SQL] New option for Avro datasource...

2018-07-17 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21798#discussion_r203168190 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala --- @@ -276,10 +274,15 @@ private[avro] object AvroFileFormat

[GitHub] spark pull request #21798: [SPARK-24836][SQL] New option for Avro datasource...

2018-07-17 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21798#discussion_r203160706 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala --- @@ -276,10 +274,15 @@ private[avro] object AvroFileFormat

[GitHub] spark issue #21192: [SPARK-24118][SQL] Flexible format for the lineSep optio...

2018-07-17 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21192 Looks like there is no consensus for the PR. @rxin @cloud-fan @HyukjinKwon Should I close it?

[GitHub] spark issue #20793: [WIP][SPARK-23643] Shrinking the buffer in hashSeed up t...

2018-07-17 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/20793 I am closing the PR because it changes external behavior. Maybe I will create a new one for Spark 3.0

[GitHub] spark pull request #20793: [WIP][SPARK-23643] Shrinking the buffer in hashSe...

2018-07-17 Thread MaxGekk
Github user MaxGekk closed the pull request at: https://github.com/apache/spark/pull/20793

[GitHub] spark pull request #21769: [SPARK-24805][SQL] Do not ignore avro files witho...

2018-07-17 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21769#discussion_r203148694 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala --- @@ -64,7 +64,7 @@ private[avro] class AvroFileFormat extends

[GitHub] spark pull request #21798: [SPARK-24836][SQL] New option for Avro datasource...

2018-07-17 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/21798 [SPARK-24836][SQL] New option for Avro datasource - ignoreExtension ## What changes were proposed in this pull request? I propose to add a new option for the AVRO datasource which should control

[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...

2018-07-17 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21589 jenkins, retest this, please

[GitHub] spark issue #21769: [SPARK-24805][SQL] Do not ignore avro files without exte...

2018-07-16 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21769 > can we submit a separate PR to add a new option for AVRO? Sure, I will do.

[GitHub] spark issue #21769: [SPARK-24805][SQL] Do not ignore avro files without exte...

2018-07-16 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21769 @gengliangwang @gatorsmile Please, have a look at the PR.

[GitHub] spark issue #21439: [SPARK-24391][SQL] Support arrays of any types by from_j...

2018-07-16 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21439 @gatorsmile @gengliangwang @maropu The change doesn't break existing behavior. I set the new option to the value which preserves backward compatibility. The PR just extends the existing implementatio

[GitHub] spark pull request #20949: [SPARK-19018][SQL] Add support for custom encodin...

2018-07-16 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20949#discussion_r202781709 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -512,6 +513,44 @@ class CSVSuite extends QueryTest

[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...

2018-07-16 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21589 jenkins, retest this, please

[GitHub] spark pull request #21773: [SPARK-24810][SQL] Fix paths to test files in Avr...

2018-07-15 Thread MaxGekk
Github user MaxGekk closed the pull request at: https://github.com/apache/spark/pull/21773

[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...

2018-07-15 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21589 jenkins, retest this, please

[GitHub] spark issue #21773: [SPARK-24810][SQL] Fix paths to test files in AvroSuite

2018-07-15 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21773 jenkins, retest this, please

[GitHub] spark pull request #21773: [SPARK-24810][SQL] Fix paths to test files in Avr...

2018-07-15 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/21773 [SPARK-24810][SQL] Fix paths to test files in AvroSuite ## What changes were proposed in this pull request? In the PR, I propose to move `testFile()` to the common trait `SQLTestUtilsBase

[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...

2018-07-15 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21589 > AFAIK, we always have num of executor ... Not in all cases, Databricks clients can create auto-scaling clusters: https://docs.databricks.com/user-guide/clusters/sizing.html#cluster-s

[GitHub] spark pull request #21771: [SPARK-24807][CORE] Adding files/jars twice: outp...

2018-07-15 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21771#discussion_r202536141 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1555,6 +1559,9 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark pull request #21769: [SPARK-24805][SQL] Do not ignore avro files witho...

2018-07-14 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21769#discussion_r202526423 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala --- @@ -64,7 +64,7 @@ private[avro] class AvroFileFormat extends

[GitHub] spark pull request #21769: [SPARK-24805][SQL] Do not ignore avro files witho...

2018-07-14 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21769#discussion_r202525034 --- Diff: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala --- @@ -623,7 +624,7 @@ class AvroSuite extends SparkFunSuite

[GitHub] spark pull request #21769: [SPARK-24805][SQL] Do not ignore avro files witho...

2018-07-14 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21769#discussion_r202524884 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala --- @@ -64,7 +64,7 @@ private[avro] class AvroFileFormat extends

[GitHub] spark pull request #21769: [SPARK-24805][SQL] Do not ignore avro files witho...

2018-07-14 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21769#discussion_r202524696 --- Diff: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala --- @@ -809,4 +810,16 @@ class AvroSuite extends SparkFunSuite

[GitHub] spark pull request #21769: [SPARK-24805][SQL] Do not ignore avro files witho...

2018-07-14 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21769#discussion_r202524552 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala --- @@ -64,7 +64,7 @@ private[avro] class AvroFileFormat extends

[GitHub] spark pull request #21771: [SPARK-24807][SQL] Adding files/jars twice: outpu...

2018-07-14 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/21771 [SPARK-24807][SQL] Adding files/jars twice: output a warning and add a note ## What changes were proposed in this pull request? In the PR, I propose to output a warning if the `addFile

[GitHub] spark issue #21769: [SPARK-24805][SQL] Do not ignore avro files without exte...

2018-07-14 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21769 @gengliangwang @gatorsmile Please, have a look at the PR.

[GitHub] spark pull request #21769: [SPARK-24805][SQL] Do not ignore avro files witho...

2018-07-14 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/21769 [SPARK-24805][SQL] Do not ignore avro files without extensions ## What changes were proposed in this pull request? In the PR, I propose to change the default behaviour of the AVRO datasource which

[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...

2018-07-13 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21589 > in this cluster do we really mean cores allocated to the "application" or "job"? @felixcheung What about `number of CPUs/Executors potentially available to a

[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-07-13 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r202462678 --- Diff: R/pkg/R/context.R --- @@ -435,3 +435,31 @@ setCheckpointDir <- function(directory) { sc <- getSparkContext() invisible(callJ

[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-07-13 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r202460774 --- Diff: python/pyspark/context.py --- @@ -406,6 +406,22 @@ def defaultMinPartitions(self): """ retu

[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-07-13 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r202459283 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -2336,6 +2336,18 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...

2018-07-13 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r202454060 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -2336,6 +2336,18 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark issue #21439: [SPARK-24391][SQL] Support arrays of any types by from_j...

2018-07-12 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21439 I set the option to the value which keeps the current behavior, so it should be fully compatible with the current implementation

[GitHub] spark issue #21439: [SPARK-24391][SQL] Support arrays of any types by from_j...

2018-07-11 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21439 @gatorsmile Could you tell me, please, what prevents the PR from being merged?

[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...

2018-07-11 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21589 @felixcheung @HyukjinKwon Could you tell me, please, what prevents the PR from being merged?

[GitHub] spark pull request #21730: [SPARK-24761][SQL] Adding of isModifiable() to Ru...

2018-07-11 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21730#discussion_r201817788 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RuntimeConfig.scala --- @@ -132,6 +132,14 @@ class RuntimeConfig private[sql](sqlConf: SQLConf

[GitHub] spark pull request #21730: [SPARK-24761][SQL] Adding of isModifiable() to Ru...

2018-07-11 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21730#discussion_r201817629 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/RuntimeConfigSuite.scala --- @@ -54,4 +54,15 @@ class RuntimeConfigSuite extends SparkFunSuite

[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...

2018-07-11 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21657 @HyukjinKwon @gatorsmile Would you mind merging the PR?

[GitHub] spark pull request #21657: [SPARK-24676][SQL] Project required data from CSV...

2018-07-11 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21657#discussion_r201582494 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala --- @@ -38,24 +38,28 @@ class UnivocityParser

[GitHub] spark pull request #21742: [SPARK-24768][SQL] Have a built-in AVRO data sour...

2018-07-10 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21742#discussion_r201493596 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/package.scala --- @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #21742: [SPARK-24768][SQL] Have a built-in AVRO data sour...

2018-07-10 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21742#discussion_r201491815 --- Diff: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroReadBenchmark.scala --- @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #21742: [SPARK-24768][SQL] Have a built-in AVRO data sour...

2018-07-10 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21742#discussion_r201492873 --- Diff: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroReadBenchmark.scala --- @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #21742: [SPARK-24768][SQL] Have a built-in AVRO data sour...

2018-07-10 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21742#discussion_r201490970 --- Diff: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroReadBenchmark.scala --- @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #21736: [SPARK-24605][SQL][FOLLOWUP] Simplify conf retrieval in ...

2018-07-10 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21736 > Probably it is not a big deal to get rid of lazy. Sure. You just do unnecessary synchronization inside the lazy implementation when you read the `lazy val` for each `null` input,

[GitHub] spark issue #21736: [SPARK-24605][SQL][FOLLOWUP] Simplify conf retrieval in ...

2018-07-10 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21736 @mgaido91 I am not saying you broke the current implementation. I am just testing yours. For example, it is not clear to me why you need `lazy val` instead of just `val` in `@transient

[GitHub] spark issue #21736: [SPARK-24605][SQL][FOLLOWUP] Simplify conf retrieval in ...

2018-07-10 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21736 @cloud-fan Just for testing, I changed implementation slightly: - removed `@transient lazy val legacySizeOfNull = SQLConf.get.legacySizeOfNull` - and call the `SQLConf.get.legacySizeOfNull

[GitHub] spark issue #21736: [SPARK-24605][SQL][FOLLOWUP] Simplify conf retrieval in ...

2018-07-09 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21736 I am testing the changes and have found this so far: ``` $ ./bin/spark-shell --master 'local-cluster[1, 1, 1024]' ``` By default the "legacy" behavior is enable

[GitHub] spark pull request #21657: [SPARK-24676][SQL] Project required data from CSV...

2018-07-09 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21657#discussion_r201042748 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala --- @@ -38,24 +38,28 @@ class UnivocityParser

[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...

2018-07-09 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21657 @HyukjinKwon yes

[GitHub] spark issue #12904: [SPARK-15125][SQL] Changing CSV data source mapping of e...

2018-07-08 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/12904 The issue has already been solved by https://github.com/apache/spark/commit/7a2d4895c75d4c232c377876b61c05a083eab3c8 . The PR can be closed

[GitHub] spark pull request #21730: [SPARK-24761][SQL] Adding of isModifiable() to Ru...

2018-07-08 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21730#discussion_r200851659 --- Diff: python/pyspark/sql/conf.py --- @@ -63,6 +63,12 @@ def _checkType(self, obj, identifier): raise TypeError("expected %s '

[GitHub] spark pull request #21730: [SPARK-24761][SQL] Adding of isModifiable() to Ru...

2018-07-08 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21730#discussion_r200851372 --- Diff: python/pyspark/sql/conf.py --- @@ -63,6 +63,12 @@ def _checkType(self, obj, identifier): raise TypeError("expected %s '

[GitHub] spark pull request #21730: [SPARK-24761][SQL] Adding of isModifiable() to Ru...

2018-07-08 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/21730 [SPARK-24761][SQL] Adding of isModifiable() to RuntimeConfig ## What changes were proposed in this pull request? In the PR, I propose to extend `RuntimeConfig` with a new method `isModifiable

[GitHub] spark issue #21699: [SPARK-24722][SQL] pivot() with Column type argument

2018-07-08 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21699 @maryannxue Please, have a look at the PR.

[GitHub] spark issue #21439: [SPARK-24391][SQL] Support arrays of any types by from_j...

2018-07-08 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21439 @gatorsmile May I ask you to look at the PR.

[GitHub] spark pull request #21720: [SPARK-24163][SPARK-24164][SQL] Support column li...

2018-07-08 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21720#discussion_r200838038 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -515,13 +515,33 @@ class Analyzer

[GitHub] spark pull request #21720: [SPARK-24163][SPARK-24164][SQL] Support column li...

2018-07-08 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21720#discussion_r200837317 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -700,7 +700,7 @@ case class

[GitHub] spark pull request #21720: [SPARK-24163][SPARK-24164][SQL] Support column li...

2018-07-08 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21720#discussion_r200837390 --- Diff: sql/core/src/test/resources/sql-tests/results/pivot.sql.out --- @@ -144,51 +155,162 @@ PIVOT ( sum(earnings * s) FOR course IN

[GitHub] spark pull request #21720: [SPARK-24163][SPARK-24164][SQL] Support column li...

2018-07-08 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21720#discussion_r200837002 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -414,7 +414,16 @@ groupingSet

[GitHub] spark pull request #21720: [SPARK-24163][SPARK-24164][SQL] Support column li...

2018-07-08 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21720#discussion_r200837706 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -630,11 +630,29 @@ class AstBuilder(conf: SQLConf

[GitHub] spark pull request #21720: [SPARK-24163][SPARK-24164][SQL] Support column li...

2018-07-08 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21720#discussion_r200837087 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -506,7 +506,7 @@ class Analyzer( def apply

[GitHub] spark issue #21727: [SPARK-24757][SQL] Improving the error message for broad...

2018-07-07 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21727 @hvanhovell Please, have a look at the PR.

[GitHub] spark pull request #21727: [SPARK-24757][SQL] Improving the error message fo...

2018-07-07 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/21727 [SPARK-24757][SQL] Improving the error message for broadcast timeouts ## What changes were proposed in this pull request? In the PR, I propose to provide a tip to the user on how to resolve the

[GitHub] spark pull request #21657: [SPARK-24676][SQL] Project required data from CSV...

2018-07-06 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21657#discussion_r200608015 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala --- @@ -38,24 +38,28 @@ class UnivocityParser

[GitHub] spark pull request #21657: [SPARK-24676][SQL] Project required data from CSV...

2018-07-05 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21657#discussion_r200291945 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala --- @@ -38,24 +38,28 @@ class UnivocityParser

[GitHub] spark pull request #21657: [SPARK-24676][SQL] Project required data from CSV...

2018-07-04 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21657#discussion_r200204856 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala --- @@ -197,15 +203,21 @@ class UnivocityParser

[GitHub] spark pull request #21657: [SPARK-24676][SQL] Project required data from CSV...

2018-07-04 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21657#discussion_r200197165 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala --- @@ -82,7 +83,12 @@ class UnivocityParser

[GitHub] spark pull request #21657: [SPARK-24676][SQL] Project required data from CSV...

2018-07-04 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21657#discussion_r200197250 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala --- @@ -82,7 +83,12 @@ class UnivocityParser

[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version

2018-07-04 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21596 > Users can (perhaps should) be shading Jackson but I bet most won't. Would it be better to shade Jackson on Spark side? In that case, we will have more space for manoeuvres in th

[GitHub] spark issue #21699: [SPARK-24722][SQL] pivot() with Column type argument

2018-07-04 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21699 > Considering you can just make a call to withColumn first I'm not really convinced in the utility of this PR. The purpose of the PR is to make the pivot API consistent with `groupBy` and cle

[GitHub] spark issue #21699: [SPARK-24722][SQL] pivot() with Column type argument

2018-07-03 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21699 > Were you planning to add a new overload for each existing String version, e.g. pivot(Column) and pivot(Column, java.util.List[Any])? The methods have been added already. @rednaxel

[GitHub] spark pull request #21686: [SPARK-24709][SQL] schema_of_json() - schema infe...

2018-07-03 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21686#discussion_r199899927 --- Diff: python/pyspark/sql/functions.py --- @@ -2189,11 +2189,16 @@ def from_json(col, schema, options={}): >>> df = spark.createDataF

[GitHub] spark pull request #21686: [SPARK-24709][SQL] schema_of_json() - schema infe...

2018-07-03 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21686#discussion_r199899708 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -3381,6 +3381,48 @@ object functions { from_json(e, dataType, options

[GitHub] spark pull request #21686: [SPARK-24709][SQL] schema_of_json() - schema infe...

2018-07-03 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21686#discussion_r199894916 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -3381,6 +3381,48 @@ object functions { from_json(e, dataType, options

[GitHub] spark pull request #21686: [SPARK-24709][SQL] schema_of_json() - schema infe...

2018-07-03 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21686#discussion_r199883857 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -744,11 +747,42 @@ case class StructsToJson

[GitHub] spark pull request #21686: [SPARK-24709][SQL] schema_of_json() - schema infe...

2018-07-03 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21686#discussion_r199881316 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -744,11 +747,42 @@ case class StructsToJson

[GitHub] spark pull request #21686: [SPARK-24709][SQL] schema_of_json() - schema infe...

2018-07-03 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21686#discussion_r199878616 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -3381,6 +3381,48 @@ object functions { from_json(e, dataType, options

[GitHub] spark pull request #21686: [SPARK-24709][SQL] schema_of_json() - schema infe...

2018-07-03 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21686#discussion_r199877744 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -3381,6 +3381,48 @@ object functions { from_json(e, dataType, options

[GitHub] spark issue #21699: [SPARK-24722][SQL] pivot() with Column type argument

2018-07-03 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21699 jenkins, retest this, please

[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version

2018-07-03 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21596 LGTM

[GitHub] spark pull request #21699: [SPARK-24722][SQL] pivot() with Column type argum...

2018-07-02 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/21699 [SPARK-24722][SQL] pivot() with Column type argument ## What changes were proposed in this pull request? In the PR, I propose a column-based API for the `pivot()` function. It allows using
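
As a rough illustration of why a column-based `pivot()` can be convenient, here is a toy model in plain Python. It is not Spark code and not the PR's implementation: `pivot_sum`, its parameters, and the sample rows are all hypothetical. The pivot key may be either a field name or a function of the row (standing in for a Column expression), so a derived key can be pivoted on without first adding a column.

```python
from collections import defaultdict


def pivot_sum(rows, group_by, pivot_on, value):
    """Group rows, then spread the sums of `value` across the pivot keys.

    `group_by` and `pivot_on` are each either a field name (str) or a
    function of the row. Hypothetical helper, not part of any Spark API.
    """
    def resolve(selector):
        # A callable plays the role of a column expression;
        # a string is treated as a plain field lookup.
        return selector if callable(selector) else (lambda row: row[selector])

    group_key, pivot_key = resolve(group_by), resolve(pivot_on)
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[group_key(row)][pivot_key(row)] += row[value]
    return {group: dict(columns) for group, columns in table.items()}


# Made-up sample data for the illustration.
rows = [
    {"year": 2012, "course": "dotNET", "earnings": 10000},
    {"year": 2012, "course": "Java", "earnings": 20000},
    {"year": 2013, "course": "java", "earnings": 30000},
]

# Name-based pivot: "Java" and "java" end up in different columns.
by_name = pivot_sum(rows, "year", "course", "earnings")

# Expression-based pivot: normalise the key on the fly.
by_expr = pivot_sum(rows, "year", lambda r: r["course"].lower(), "earnings")
```

With a name-only pivot, the normalised key would have to be materialised as an extra field on every row first, which is the extra step the thread refers to when it mentions calling `withColumn` beforehand.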

[GitHub] spark issue #21686: [SPARK-24709][SQL] schema_of_json() - schema inference f...

2018-07-02 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21686 > Does this actually work in SQL? Yes, it does. Please, have a look at the SQL test: https://github.com/apache/spark/pull/21686/files#diff-3b8a538abd658a260aa32c4aa593bed7

[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version

2018-07-02 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21596 @gatorsmile The obvious regression is in the schema inference benchmarks, but in other cases there is a significant performance boost even on slower hardware. @Fokko Could you please run the JSON

[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...

2018-07-02 Thread MaxGekk
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21657 > Do you mean we remove the option for column pruning in csv? I mean reverting the index mapping - `tokenIndexArr`. In this case, your changes in `buildReader` are not nee

[GitHub] spark pull request #21686: [SPARK-24709][SQL] schema_of_json() - schema infe...

2018-07-01 Thread MaxGekk
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/21686 [SPARK-24709][SQL] schema_of_json() - schema inference from an example ## What changes were proposed in this pull request? In the PR, I propose to add a new function - *schema_of_json
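
To make the idea of "schema inference from an example" concrete, the following is a minimal pure-Python sketch. It is an assumption-laden illustration, not Spark's implementation: the function names `infer_type` and `schema_of_json_sketch` are hypothetical, the type names are chosen to resemble the `struct<...:...>` strings mentioned elsewhere in this thread, and array element types are taken from the first element only.

```python
import json


def infer_type(value):
    """Return a DDL-like type string for a parsed JSON value (hypothetical helper)."""
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    if isinstance(value, list):
        # Simplifying assumption: look at the first element only;
        # an empty array falls back to string elements.
        element = infer_type(value[0]) if value else "string"
        return "array<" + element + ">"
    if isinstance(value, dict):
        fields = ",".join(name + ":" + infer_type(item) for name, item in value.items())
        return "struct<" + fields + ">"
    raise TypeError("unsupported JSON value: " + repr(value))


def schema_of_json_sketch(example):
    """Infer a schema string from one example JSON document."""
    return infer_type(json.loads(example))


schema = schema_of_json_sketch('{"a": 1, "b": [0.5], "c": {"d": "x"}}')
print(schema)  # struct<a:bigint,b:array<double>,c:struct<d:string>>
```

A real implementation would also need to merge types across array elements and handle conflicting or null-only fields; the sketch deliberately leaves those cases out.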

[GitHub] spark pull request #21671: [SPARK-24682] [SQL] from_json / to_json now handl...

2018-06-30 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21671#discussion_r199331278 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala --- @@ -317,16 +292,52 @@ class JacksonParser( row
