Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21803
> is the purpose of this API is to have a int instead of struct
Basically, yes. All those methods `simpleString()`, `catalogString()`,
`sql()` return `struct< ... : ...>`
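The contrast between those struct-style strings and a flat DDL string can be sketched like this (a toy Python model with made-up field names and helper functions, not the actual `StructType` code):

```python
# Toy model of a schema as (name, type) pairs; not the real StructType API.
fields = [("id", "INT"), ("name", "STRING")]

def simple_string(fields):
    # struct<...> form, as returned by methods like simpleString()
    return "struct<" + ",".join(f"{n}:{t.lower()}" for n, t in fields) + ">"

def to_ddl(fields):
    # flat "name TYPE" form that can be pasted into CREATE TABLE (...)
    return ",".join(f"{n} {t}" for n, t in fields)

print(simple_string(fields))  # struct<id:int,name:string>
print(to_ddl(fields))         # id INT,name STRING
```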
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21803#discussion_r203618974
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala ---
@@ -436,6 +436,14 @@ object StructType extends AbstractDataType
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21803
> (As I described in the jira) What's this func is used for?
@maropu I answered in JIRA; please take a look.
---
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21803
@hvanhovell Could you take a look at the PR, please?
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21589
> it's not terribly useful to know, e.g., that there are 5 million cores in
the cluster if your Job is running in a scheduler pool that is restricted to
using far fewer CPUs via th
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21798
Please look at this PR: https://github.com/apache/spark/pull/21810. It introduces `AvroOptions`.
---
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21810
[SPARK-24854][SQL] Gathering all Avro options into the AvroOptions class
## What changes were proposed in this pull request?
In the PR, I propose to put all `Avro` options into a new class
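The idea of gathering raw datasource parameters behind one options class can be sketched like this (option names and defaults here are illustrative assumptions, not the exact contents of the real `AvroOptions`):

```python
class AvroOptions:
    """Toy sketch: parse raw datasource parameters once, in one place."""
    def __init__(self, parameters):
        # Option names and defaults are assumptions for illustration.
        self.ignore_extension = parameters.get("ignoreExtension", "false").lower() == "true"
        self.record_name = parameters.get("recordName", "topLevelRecord")

opts = AvroOptions({"ignoreExtension": "true"})
print(opts.ignore_extension, opts.record_name)  # True topLevelRecord
```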
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21720
@gatorsmile @maryannxue Can we move forward with this PR:
https://github.com/apache/spark/pull/21699 ?
---
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21589
> ... unless explicitly overridden by user.
This is the problem this PR addresses, actually.
> If you need fine grained information about executors, use spark listener
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21803
@maropu I added quoting of column names
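The quoting mentioned here can be sketched as backtick-escaping of identifiers (a simplified stand-in for Spark SQL identifier quoting; the escaping rule is an assumption for illustration):

```python
def quote_identifier(name):
    # Wrap the name in backticks; escape embedded backticks by doubling them.
    return "`" + name.replace("`", "``") + "`"

print(quote_identifier("a b"))  # `a b`
print(quote_identifier("a`b"))  # `a``b`
```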
---
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21803
[SPARK-24849][SQL] Converting a value of StructType to a DDL string
## What changes were proposed in this pull request?
In the PR, I propose to extend the `StructType` object with a new method
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21589
> User's are not expected to override it unless they want fine grained
control over the value
This is actually one of the use cases where a user needs to take control or
tune a qu
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21589
> I am not seeing the utility of these two methods.
@mridulm I described the utility of the methods in the ticket:
https://issues.apache.org/jira/browse/SPARK-24
Github user MaxGekk closed the pull request at:
https://github.com/apache/spark/pull/21192
---
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21798#discussion_r203168190
--- Diff:
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala ---
@@ -276,10 +274,15 @@ private[avro] object AvroFileFormat
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21798#discussion_r203160706
--- Diff:
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala ---
@@ -276,10 +274,15 @@ private[avro] object AvroFileFormat
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21192
Looks like there is no consensus on the PR. @rxin @cloud-fan @HyukjinKwon
Should I close it?
---
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/20793
I am closing the PR because it changes external behavior. Maybe I will
create a new one for Spark 3.0.
---
Github user MaxGekk closed the pull request at:
https://github.com/apache/spark/pull/20793
---
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21769#discussion_r203148694
--- Diff:
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala ---
@@ -64,7 +64,7 @@ private[avro] class AvroFileFormat extends
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21798
[SPARK-24836][SQL] New option for Avro datasource - ignoreExtension
## What changes were proposed in this pull request?
I propose to add a new option for the Avro datasource which should control
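What such an option controls can be sketched roughly as follows (a toy file filter; the behavior shown is my reading of the option's intent, not the actual datasource code):

```python
def select_files(paths, ignore_extension=False):
    # With ignoreExtension=true all files are read;
    # otherwise only files with the .avro extension are considered.
    if ignore_extension:
        return list(paths)
    return [p for p in paths if p.endswith(".avro")]

print(select_files(["a.avro", "b.log"]))                         # ['a.avro']
print(select_files(["a.avro", "b.log"], ignore_extension=True))  # ['a.avro', 'b.log']
```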
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21589
jenkins, retest this, please
---
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21769
> can we submit a separate PR to add a new option for AVRO?
Sure, I will do that.
---
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21769
@gengliangwang @gatorsmile Please, have a look at the PR.
---
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21439
@gatorsmile @gengliangwang @maropu The change doesn't break existing
behavior. I set the new option to a value that preserves backward compatibility.
The PR just extends the existing implementation.
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20949#discussion_r202781709
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
---
@@ -512,6 +513,44 @@ class CSVSuite extends QueryTest
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21589
jenkins, retest this, please
---
Github user MaxGekk closed the pull request at:
https://github.com/apache/spark/pull/21773
---
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21589
jenkins, retest this, please
---
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21773
jenkins, retest this, please
---
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21773
[SPARK-24810][SQL] Fix paths to test files in AvroSuite
## What changes were proposed in this pull request?
In the PR, I propose to move `testFile()` to the common trait
`SQLTestUtilsBase
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21589
> AFAIK, we always have num of executor ...
Not in all cases; Databricks clients can create auto-scaling clusters:
https://docs.databricks.com/user-guide/clusters/sizing.html#cluster-s
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21771#discussion_r202536141
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1555,6 +1559,9 @@ class SparkContext(config: SparkConf) extends Logging
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21769#discussion_r202526423
--- Diff:
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala ---
@@ -64,7 +64,7 @@ private[avro] class AvroFileFormat extends
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21769#discussion_r202525034
--- Diff:
external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala ---
@@ -623,7 +624,7 @@ class AvroSuite extends SparkFunSuite
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21769#discussion_r202524884
--- Diff:
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala ---
@@ -64,7 +64,7 @@ private[avro] class AvroFileFormat extends
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21769#discussion_r202524696
--- Diff:
external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala ---
@@ -809,4 +810,16 @@ class AvroSuite extends SparkFunSuite
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21769#discussion_r202524552
--- Diff:
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala ---
@@ -64,7 +64,7 @@ private[avro] class AvroFileFormat extends
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21771
[SPARK-24807][SQL] Adding files/jars twice: output a warning and add a note
## What changes were proposed in this pull request?
In the PR, I propose to output a warning if the `addFile
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21769
@gengliangwang @gatorsmile Please, have a look at the PR.
---
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21769
[SPARK-24805][SQL] Do not ignore avro files without extensions
## What changes were proposed in this pull request?
In the PR, I propose to change the default behaviour of the Avro datasource, which
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21589
> in this cluster do we really mean cores allocated to the "application" or
"job"?
@felixcheung What about `number of CPUs/Executors potentially available to
a
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21589#discussion_r202462678
--- Diff: R/pkg/R/context.R ---
@@ -435,3 +435,31 @@ setCheckpointDir <- function(directory) {
sc <- getSparkContext()
invisible(callJ
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21589#discussion_r202460774
--- Diff: python/pyspark/context.py ---
@@ -406,6 +406,22 @@ def defaultMinPartitions(self):
"""
retu
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21589#discussion_r202459283
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -2336,6 +2336,18 @@ class SparkContext(config: SparkConf) extends
Logging
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21589#discussion_r202454060
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -2336,6 +2336,18 @@ class SparkContext(config: SparkConf) extends
Logging
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21439
I set the option to the value which keeps the current behavior, so it should be
fully compatible with the current implementation.
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21439
@gatorsmile Could you tell me, please, what prevents the PR from being
merged?
---
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21589
@felixcheung @HyukjinKwon Could you tell me, please, what prevents the
PR from being merged?
---
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21730#discussion_r201817788
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RuntimeConfig.scala
---
@@ -132,6 +132,14 @@ class RuntimeConfig private[sql](sqlConf: SQLConf
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21730#discussion_r201817629
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/RuntimeConfigSuite.scala ---
@@ -54,4 +54,15 @@ class RuntimeConfigSuite extends SparkFunSuite
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21657
@HyukjinKwon @gatorsmile Would you mind merging the PR?
---
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21657#discussion_r201582494
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
---
@@ -38,24 +38,28 @@ class UnivocityParser
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21742#discussion_r201493596
--- Diff:
external/avro/src/main/scala/org/apache/spark/sql/avro/package.scala ---
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21742#discussion_r201491815
--- Diff:
external/avro/src/test/scala/org/apache/spark/sql/avro/AvroReadBenchmark.scala
---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21742#discussion_r201492873
--- Diff:
external/avro/src/test/scala/org/apache/spark/sql/avro/AvroReadBenchmark.scala
---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21742#discussion_r201490970
--- Diff:
external/avro/src/test/scala/org/apache/spark/sql/avro/AvroReadBenchmark.scala
---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21736
> Probably it is not a big deal to get rid of lazy.
Sure. You just do unnecessary synchronization inside the lazy implementation
when reading the `lazy val` for each `null` input,
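The synchronization cost being discussed can be sketched with a deliberately naive lazy holder that locks on every read; this illustrates the kind of overhead in question and is not a model of Scala's actual `lazy val` encoding:

```python
import threading

class NaiveLazy:
    """Naive lazy value: takes a lock on every read.
    A plain precomputed value needs no lock at all."""
    def __init__(self, compute):
        self._compute = compute
        self._lock = threading.Lock()
        self._ready = False
        self._value = None

    def get(self):
        with self._lock:  # paid on every access, even after initialization
            if not self._ready:
                self._value = self._compute()
                self._ready = True
            return self._value

lazy = NaiveLazy(lambda: 42)
print(lazy.get())  # 42
```

The point of the comment is that if the value is needed on every row (e.g. for each `null` input), this per-read cost adds up, while a plain `val` computed once has none.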
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21736
@mgaido91 I am not blaming you for breaking the current implementation; I am
just testing it. For example, it is not clear to me why you need `lazy
val` instead of just `val` in `@transient
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21736
@cloud-fan Just for testing, I changed the implementation slightly:
- removed `@transient lazy val legacySizeOfNull = SQLConf.get.legacySizeOfNull`
- and called `SQLConf.get.legacySizeOfNull
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21736
I am testing the changes and have found this so far:
```
$ ./bin/spark-shell --master 'local-cluster[1, 1, 1024]'
```
By default the "legacy" behavior is enabled
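The "legacy" behavior in question can be summarized as follows (my own sketch of the `size(null)` semantics the flag toggles; treat the details as an assumption):

```python
def size(collection, legacy_size_of_null=True):
    # Legacy behavior: size(NULL) returns -1.
    # New behavior: size(NULL) returns NULL (None here).
    if collection is None:
        return -1 if legacy_size_of_null else None
    return len(collection)

print(size(None))                             # -1
print(size(None, legacy_size_of_null=False))  # None
print(size([1, 2, 3]))                        # 3
```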
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21657#discussion_r201042748
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
---
@@ -38,24 +38,28 @@ class UnivocityParser
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21657
@HyukjinKwon yes
---
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/12904
The issue has already been solved by
https://github.com/apache/spark/commit/7a2d4895c75d4c232c377876b61c05a083eab3c8.
The PR can be closed.
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21730#discussion_r200851659
--- Diff: python/pyspark/sql/conf.py ---
@@ -63,6 +63,12 @@ def _checkType(self, obj, identifier):
raise TypeError("expected %s '
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21730#discussion_r200851372
--- Diff: python/pyspark/sql/conf.py ---
@@ -63,6 +63,12 @@ def _checkType(self, obj, identifier):
raise TypeError("expected %s '
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21730
[SPARK-24761][SQL] Adding of isModifiable() to RuntimeConfig
## What changes were proposed in this pull request?
In the PR, I propose to extend `RuntimeConfig` with a new method
`isModifiable
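The distinction such a method exposes can be sketched like this (a toy config model; the set of static keys and the naming convention are assumptions for illustration, not Spark's real rules):

```python
class ToyRuntimeConfig:
    """Sketch: runtime SQL configs are modifiable on a live session,
    static ones are not."""
    STATIC_KEYS = {"spark.sql.warehouse.dir"}  # example static key (assumption)

    def is_modifiable(self, key):
        # Only runtime SQL configs can be changed after session start.
        return key.startswith("spark.sql.") and key not in self.STATIC_KEYS

conf = ToyRuntimeConfig()
print(conf.is_modifiable("spark.sql.shuffle.partitions"))  # True
print(conf.is_modifiable("spark.sql.warehouse.dir"))       # False
```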
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21699
@maryannxue Please, have a look at the PR.
---
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21439
@gatorsmile May I ask you to look at the PR?
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21720#discussion_r200838038
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -515,13 +515,33 @@ class Analyzer
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21720#discussion_r200837317
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
---
@@ -700,7 +700,7 @@ case class
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21720#discussion_r200837390
--- Diff: sql/core/src/test/resources/sql-tests/results/pivot.sql.out ---
@@ -144,51 +155,162 @@ PIVOT (
sum(earnings * s)
FOR course IN
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21720#discussion_r200837002
--- Diff:
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -414,7 +414,16 @@ groupingSet
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21720#discussion_r200837706
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
---
@@ -630,11 +630,29 @@ class AstBuilder(conf: SQLConf
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21720#discussion_r200837087
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -506,7 +506,7 @@ class Analyzer(
def apply
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21727
@hvanhovell Please, have a look at the PR.
---
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21727
[SPARK-24757][SQL] Improving the error message for broadcast timeouts
## What changes were proposed in this pull request?
In the PR, I propose to provide a tip to the user on how to resolve the
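An error message of roughly this shape, naming the configs a user can tune, illustrates the idea (the text below is paraphrased for illustration, not the exact string from the PR):

```python
def broadcast_timeout_message(timeout_sec):
    # Point users at the knobs they can turn instead of a bare timeout error.
    return (f"Could not execute broadcast in {timeout_sec} secs. "
            "You can increase the timeout via spark.sql.broadcastTimeout "
            "or disable broadcast join by setting "
            "spark.sql.autoBroadcastJoinThreshold to -1.")

print(broadcast_timeout_message(300))
```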
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21657#discussion_r200608015
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
---
@@ -38,24 +38,28 @@ class UnivocityParser
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21657#discussion_r200291945
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
---
@@ -38,24 +38,28 @@ class UnivocityParser
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21657#discussion_r200204856
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
---
@@ -197,15 +203,21 @@ class UnivocityParser
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21657#discussion_r200197165
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
---
@@ -82,7 +83,12 @@ class UnivocityParser
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21657#discussion_r200197250
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
---
@@ -82,7 +83,12 @@ class UnivocityParser
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21596
> Users can (perhaps should) be shading Jackson but I bet most won't.
Would it be better to shade Jackson on the Spark side? In that case, we would
have more room for manoeuvre in th
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21699
> Considering you can just make a call to withColumn first I'm not really
convinced in the utility of this PR.
The purpose of the PR is to make the pivot API consistent with `groupBy` and cle
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21699
> Were you planning to add a new overload for each existing String
version, e.g. pivot(Column) and pivot(Column, java.util.List[Any])?
The methods have already been added. @rednaxel
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21686#discussion_r199899927
--- Diff: python/pyspark/sql/functions.py ---
@@ -2189,11 +2189,16 @@ def from_json(col, schema, options={}):
>>> df = spark.createDataF
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21686#discussion_r199899708
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3381,6 +3381,48 @@ object functions {
from_json(e, dataType, options
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21686#discussion_r199894916
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3381,6 +3381,48 @@ object functions {
from_json(e, dataType, options
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21686#discussion_r199883857
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
---
@@ -744,11 +747,42 @@ case class StructsToJson
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21686#discussion_r199881316
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
---
@@ -744,11 +747,42 @@ case class StructsToJson
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21686#discussion_r199878616
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3381,6 +3381,48 @@ object functions {
from_json(e, dataType, options
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21686#discussion_r199877744
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3381,6 +3381,48 @@ object functions {
from_json(e, dataType, options
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21699
jenkins, retest this, please
---
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21596
LGTM
---
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21699
[SPARK-24722][SQL] pivot() with Column type argument
## What changes were proposed in this pull request?
In the PR, I propose a column-based API for the `pivot()` function. It allows
using
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21686
> Does this actually work in SQL?
Yes, it does. Please, have a look at the SQL test:
https://github.com/apache/spark/pull/21686/files#diff-3b8a538abd658a260aa32c4aa593bed7
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21596
@gatorsmile The obvious regression is in the schema inference benchmarks, but in
other cases there is a significant performance boost even on slower hardware.
@Fokko Could you please run the JSON
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/21657
> Do you mean we remove the option for column pruning in csv?
I mean reverting the index mapping, `tokenIndexArr`. In this case,
your changes in `buildReader` are not nee
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21686
[SPARK-24709][SQL] schema_of_json() - schema inference from an example
## What changes were proposed in this pull request?
In the PR, I propose to add a new function - *schema_of_json
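Schema inference from a single JSON example can be sketched like this (a toy inference that maps JSON values to SQL-ish type names; the real Catalyst inference is far more elaborate, so treat the mapping as an assumption):

```python
import json

def schema_of_json(example):
    """Toy inference: derive an SQL-ish type string from one JSON example."""
    def infer(v):
        if isinstance(v, bool):   # check bool before int: bool is a subclass
            return "BOOLEAN"
        if isinstance(v, int):
            return "BIGINT"
        if isinstance(v, float):
            return "DOUBLE"
        if isinstance(v, list):
            return f"ARRAY<{infer(v[0])}>" if v else "ARRAY<STRING>"
        if isinstance(v, dict):
            inner = ",".join(f"{k}:{infer(x)}" for k, x in v.items())
            return f"STRUCT<{inner}>"
        return "STRING"
    return infer(json.loads(example))

print(schema_of_json('{"a": 1, "b": "x"}'))  # STRUCT<a:BIGINT,b:STRING>
```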
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21671#discussion_r199331278
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala
---
@@ -317,16 +292,52 @@ class JacksonParser(
row