Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22078#discussion_r228881996
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala
---
@@ -70,7 +76,6 @@ case class
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22078#discussion_r228881824
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala
---
@@ -261,4 +272,69 @@ case
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22275
retest this please
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22870
retest this please
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22871
Thanks, @dongjoon-hyun and @gatorsmile
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22530
retest this please
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22326
late LGTM
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22847#discussion_r228789484
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -812,6 +812,17 @@ object SQLConf {
.intConf
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22666#discussion_r228787126
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExprUtils.scala
---
@@ -19,14 +19,39 @@ package
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22871
cc @BryanCutler and @gatorsmile.
GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/22871
[SPARK-25179][PYTHON][DOCS] Document BinaryType support in Arrow conversion
## What changes were proposed in this pull request?
This PR aims to document binary type in "Apache
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21588
@dongjoon-hyun and @wangyum, please fix my comment if I am wrong at any
point - I believe you guys looked at this part more than I did
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21588
> Does this upgrade Hive for execution or also for metastore? Spark
supports virtually all Hive metastore versions out there, and a lot of
deployments do run different versions of Spark agai
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22868#discussion_r228776349
--- Diff: docs/sql-migration-guide-hive-compatibility.md ---
@@ -51,6 +51,9 @@ Spark SQL supports the vast majority of Hive features,
such as
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22865#discussion_r228776300
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -462,7 +462,7 @@ object SQLConf {
val
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22858
Oops, mind fixing PR title too?
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22858
@cloud-fan, thanks for doing this backport!
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22858
Merged to branch-2.4.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22865#discussion_r228731568
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -462,7 +462,7 @@ object SQLConf {
val
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22865#discussion_r228731385
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -462,7 +462,7 @@ object SQLConf {
val
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22858
Yup, I think strictly we should change. Looks like there are two occurrences of
`isinstance(..., str)`, at `udf` and `pandas_udf`.
Another problem in PySpark is inconsistent type comparison like
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22858#discussion_r228731178
--- Diff: python/pyspark/sql/functions.py ---
@@ -2326,7 +2326,7 @@ def schema_of_json(json):
>>> df.select(schema_of_json('{&
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21157
I meant to use
https://github.com/apache/spark/blob/a97001d21757ae214c86371141bd78a376200f66/python/pyspark/serializers.py#L583
Instead of
https://github.com/apache
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22858#discussion_r228713086
--- Diff: python/pyspark/sql/functions.py ---
@@ -2326,7 +2326,7 @@ def schema_of_json(json):
>>> df.select(schema_of_json('{&
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22858
Wenchen, this is because
```python
if sys.version >= '3':
    basestring = str
```
Is missing. Python 3 does not hav
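For readers unfamiliar with the shim quoted above, here is a minimal sketch of how such an alias lets one `isinstance` check work on both Python 2 and Python 3. The helper name `is_string_like` is hypothetical, and the sketch checks `sys.version_info` rather than comparing the `sys.version` string as the quoted snippet does; it is an illustration, not the code from the PR.

```python
import sys

# Python 3 removed the `basestring` name, so alias it to `str` there.
# On Python 2 the builtin `basestring` is left untouched.
if sys.version_info[0] >= 3:
    basestring = str


def is_string_like(value):
    """Return True if `value` is a text string on either Python version."""
    # Hypothetical helper: one check that covers `str` (and `unicode` on Python 2).
    return isinstance(value, basestring)
```

Without the alias, any module that references `basestring` raises `NameError` on Python 3 as soon as that line runs.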
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21157
Adding @gatorsmile and @cloud-fan as well since this might be a potentially
breaking change for the 3.0 release (it affects RDD operations only with namedtuple
in certain cases tho
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21157
And you can also run a profiler to show the performance effect. See
https://github.com/apache/spark/pull/19246#discussion_r139874732 to run the
profile
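The linked comment is not reproduced here, so the exact profiling procedure it describes is unknown. As a generic sketch, assuming only Python's standard `cProfile` and `pstats` modules, the cost of a pickling round trip could be measured roughly as below. `roundtrip` and `profile_roundtrip` are hypothetical helpers, and plain `pickle` stands in for whichever serializer is being compared.

```python
import cProfile
import io
import pickle
import pstats


def roundtrip(n):
    """Serialize and deserialize n small tuples (stand-in workload)."""
    rows = [(i, i + 1) for i in range(n)]
    return pickle.loads(pickle.dumps(rows))


def profile_roundtrip(n=10000):
    """Profile `roundtrip` and return its result plus a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = roundtrip(n)
    profiler.disable()

    # Render the five most expensive entries by cumulative time into a string.
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
    return result, buf.getvalue()
```

Running the same function before and after a serializer change, and comparing the two reports, gives a rough view of the performance effect.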
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21157
You can just replace it with CloudPickler, remove the changes in the tests, and push
that commit here to show no case is broken
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22666
retest this please
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22666
retest this please
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20503
ok to test
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22775
Oh you mean the conflict fixing is not that hard. Thanks for doing this
@cloud-fan. I planned to do this today
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21157
Yea, so to avoid breaking it, we could change the default pickler to
CloudPickler or document this workaround. @superbobry, can you check if the
case can be preserved if we use CloudPickler
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22775
Yea, I meant it's a bit complicated, but I'm okay with doing it that way, @cloud-fan.
Thanks for doing that. I planned to do it today
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21588
> Hive 2.3 works with Hadoop 2.x (Hive 3.x works with Hadoop 3.x).
This is essentially what we need for Hadoop 3 support
[release-2.3.2|https://github.com/apache/hive/blob/rel/rele
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22775
Sure!
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22850
Yea, I was aware of it. I think there are some more old comments in this
file if I remember this correctly. Can you double check and fix them while we
are here
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22850
ok to test
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22775#discussion_r228520891
--- Diff: python/pyspark/sql/functions.py ---
@@ -2365,30 +2365,32 @@ def to_json(col, options={}):
@ignore_unicode_prefix
@since(2.4
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22775#discussion_r228504453
--- Diff: python/pyspark/sql/functions.py ---
@@ -2365,30 +2365,32 @@ def to_json(col, options={}):
@ignore_unicode_prefix
@since(2.4
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22771
retest this please
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22775
Yup, yup .. I should sync the tests
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22814
Merged to master
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/21588
Yup, it supports Hadoop 3, and other fixes that @wangyum mentioned.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22814
LGTM too
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22814#discussion_r228381742
--- Diff: docs/sql-data-sources-avro.md ---
@@ -177,6 +180,19 @@ Data source options of Avro can be set using the
`.option` method on `DataFrameR
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22814#discussion_r228380951
--- Diff:
external/avro/src/main/scala/org/apache/spark/sql/avro/package.scala ---
@@ -31,10 +32,32 @@ package object avro {
* @since 2.4.0
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22814#discussion_r228380639
--- Diff:
external/avro/src/test/scala/org/apache/spark/sql/avro/AvroFunctionsSuite.scala
---
@@ -61,6 +59,24 @@ class AvroFunctionsSuite extends
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22827
LGTM too
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22841
Looks good to me.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22841#discussion_r228376996
--- Diff: python/pyspark/sql/window.py ---
@@ -239,34 +212,27 @@ def rangeBetween(self, start, end):
and "5" means the five off
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22815#discussion_r228376272
--- Diff: R/pkg/R/SQLContext.R ---
@@ -434,6 +388,7 @@ read.orc <- function(path, ...) {
#' Loads a Parquet file, returning the res
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22841#discussion_r228376015
--- Diff: python/pyspark/sql/window.py ---
@@ -239,34 +212,27 @@ def rangeBetween(self, start, end):
and "5" means the five off
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/16812
This can be easily worked around, no?
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22841
Yup, I also agree with this revert.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22775
Maybe I am being too careful about it, but I am kind of nervous about this
column case. I don't intend to disallow it entirely but only for Spark 2.4. We
might have to find a way to use c
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22775
Actually, that use case can be more easily accomplished by simply inferring the
schema with the JSON datasource. Yea, I indeed suggested that as a workaround for this
issue before. Let's say, `spark.read
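The comment above is cut off after `spark.read`, so its exact continuation is unknown. As a rough sketch of what inferring a schema through the JSON datasource can look like in PySpark (an assumption, not necessarily the workaround the comment goes on to describe), sample JSON strings can be parsed with `spark.read.json` and the resulting `schema` read back. The function name `infer_json_schema` and its parameters are hypothetical; calling it requires a PySpark installation and an existing `SparkSession`.

```python
def infer_json_schema(spark, json_strings):
    """Infer a schema by letting the JSON datasource parse sample strings.

    `spark` is assumed to be an existing SparkSession and `json_strings`
    a list of JSON documents as Python strings (illustrative names).
    """
    # Distribute the sample documents as an RDD of strings, one per record.
    rdd = spark.sparkContext.parallelize(json_strings)
    # DataFrameReader.json accepts an RDD of strings and infers the schema.
    return spark.read.json(rdd).schema
```

The returned `StructType` could then be passed wherever an explicit schema is expected, instead of deriving it from a column at query time.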
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22621
That's my point. Why do we have to document for fixing unexpected results
fixed
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22747
Yup, that's a similar argument to the one I had in
https://github.com/apache/spark/pull/22773#issuecomment-432923361 I think we
should clarify what to document
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22814#discussion_r228113346
--- Diff: docs/sql-migration-guide-upgrade.md ---
@@ -10,6 +10,9 @@ displayTitle: Spark SQL Upgrading Guide
## Upgrading From Spark SQL 2.4 to 3.0
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22814#discussion_r228115771
--- Diff:
external/avro/src/test/scala/org/apache/spark/sql/avro/AvroFunctionsSuite.scala
---
@@ -61,6 +59,24 @@ class AvroFunctionsSuite extends
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22775
retest this please
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22814#discussion_r228065259
--- Diff:
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala
---
@@ -21,16 +21,31 @@ import org.apache.avro.Schema
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22621
Let's say, this can be a behaviour change too since metrics are now changed.
Should we update migration guide for s
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22690
cc @cloud-fan and @gatorsmile
Should we update migration guide as well?
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22503
@justinuang, this might affect existing users' applications. Although this
matches the behaviour of non-multiline mode, can we explicitly mention it in the
migration guide?
cc @cloud-fan and
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22747
This also looks like an external change for existing application users. Shall we
update the migration guide?
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22773
Yup, I will encourage updating the migration guide in that way.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22728
(From https://github.com/apache/spark/pull/22773#issuecomment-432917994)
@gatorsmile and @cloud-fan, let's say this will break `DESCRIBE FUNCTION
EXTENDED`. Should we update migration gui
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22815
BTW, should we update migration guide too?
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22773
Sure, so for clarification, we will document everything that affects
external users' applications, right?
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22775
@cloud-fan, looks like we are going to start another RC. Would you mind if I ask
you to take a quick look before the new RC
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22775
retest this please
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22773
My impression so far was that we note things at migration notes when they
are improvements (not bugs), and non-trivial and related to backward
compatibility.
Shall we clarify what to
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22773
BTW, it's closer to a bug rather than an improvement tho. `from_json` should
have default name `from_json` rather than `jsontostructs` - end users would
have no idea why it's called `jso
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22773
That's the exact issue I raised before and we ended up with not keeping the
compatibility in column names. @cloud-fan and @hvanh
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22795
Thanks @viirya!!
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22816#discussion_r228012237
--- Diff:
core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -114,7 +114,7 @@ private[spark] abstract class BasePythonRunner[IN
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22816#discussion_r228012035
--- Diff:
core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -114,7 +114,7 @@ private[spark] abstract class BasePythonRunner[IN
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22237
Thanks all!!
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22795
Thanks, @BryanCutler.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22144
Adding it as a known issue sounds reasonable to me as well.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22807#discussion_r227784077
--- Diff: python/pyspark/sql/tests.py ---
@@ -4961,6 +4961,31 @@ def foofoo(x, y):
).collect
)
+def
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22730
adding @cloud-fan since accumulator version 2 was added by you.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22795#discussion_r227634476
--- Diff: python/pyspark/sql/functions.py ---
@@ -3023,6 +3023,42 @@ def pandas_udf(f=None, returnType=None,
functionType=None
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22237
https://github.com/apache/spark/pull/22237/files#r223707899 makes sense to
me. Addressed. LGTM from my side as well
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22144
> According to the policy, we don't have to block the current release
because of i
@cloud-fan, BTW, would you mind if I ask you to share what you read? I want to
be aware of the p
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22514
@cloud-fan, is this a performance regression that affects users that use
Hive serde tables as well?
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22144
Wait wait .. do we only care about regressions as blockers for the last
release (2.3)? I'm asking this because I really don't know. If so
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22237
Ah, yea I have direct access to this branch. Let me just rebase/address
the comment tomorrow.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22144
For instance,
https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/sketches-user/GmH4-OlHP9g/MW-J7Hg4BwAJ
this discussion thread was started almost one year
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22144
It does try to fix a backward compatibility issue. It was found late,
but it is still true that we found a breaking issue to fix. Broken backward
compatibility that potentially affects a set of
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22237
Oh wait you left a sign-off. Let me rebase it by tomorrow - wouldn't be
a big job.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22237
If @gengliangwang finds some time to work on this, yea please go ahead.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22047#discussion_r227367500
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/AnyAgg.scala
---
@@ -0,0 +1,64 @@
+/*
+ * Licensed
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22803
Yea, similar opinion. If it does not fix an actual problem, I wouldn't
encourage fixing it either ..
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22144
If we were going for 3.0, then I would definitely leave +1 and I agree that
we should rather focus on Spark itself as a higher priority - we should do that
when we go 3.0 and rather drop such
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22782#discussion_r227336895
--- Diff: bin/docker-image-tool.sh ---
@@ -79,7 +79,7 @@ function build {
fi
# Verify that Spark has actually been built/is a
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22782#discussion_r227336318
--- Diff: bin/docker-image-tool.sh ---
@@ -79,7 +79,7 @@ function build {
fi
# Verify that Spark has actually been built/is a
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22666
retest this please