chaoqin-li1123 opened a new pull request, #38013:
URL: https://github.com/apache/spark/pull/38013
### What changes were proposed in this pull request?
An example of applyInPandasWithState usage. This example splits lines into
words, groups by word as the key, and uses state per key.
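As a hedged aside, the flow described above can be sketched without Spark. The following hypothetical, simplified Python illustration shows only the per-key-state idea; the real example uses `applyInPandasWithState` with a pandas function inside a streaming query.

```python
# Hypothetical, Spark-free sketch of the stateful word count described above:
# split lines into words, group by word, and keep a running count per key.
from collections import defaultdict

state = defaultdict(int)  # stands in for the per-key state the operator keeps

def update_counts(lines):
    """Split each line into words and bump the stored count for each word."""
    for line in lines:
        for word in line.split():
            state[word] += 1
    return dict(state)

print(update_counts(["apache spark", "apache flink"]))
# {'apache': 2, 'spark': 1, 'flink': 1}
```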
mridulm commented on PR #37779:
URL: https://github.com/apache/spark/pull/37779#issuecomment-1259029762
Can you take a look at the comment above, @yabola, and work on the fix? Since you
have already spent a lot of time on this.
--
This is an automated message from the Apache Git Service.
To
itholic closed pull request #38012: [DO-NOT-MERGE][TEST] Pandas 1.5 Test
URL: https://github.com/apache/spark/pull/38012
zhengruifeng closed pull request #38009: [SPARK-40573][PS] Make `ddof` in
`GroupBy.std`, `GroupBy.var` and `GroupBy.sem` accept arbitrary integers
URL: https://github.com/apache/spark/pull/38009
dongjoon-hyun commented on code in PR #38001:
URL: https://github.com/apache/spark/pull/38001#discussion_r980867778
##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -3574,6 +3574,15 @@ object SQLConf {
.booleanConf
itholic opened a new pull request, #38016:
URL: https://github.com/apache/spark/pull/38016
### What changes were proposed in this pull request?
This PR proposes to fix the test `IndexesTest.test_to_frame` to support
pandas 1.5.0
### Why are the changes needed?
zhengruifeng opened a new pull request, #38017:
URL: https://github.com/apache/spark/pull/38017
### What changes were proposed in this pull request?
Make `GroupBy.first` skip nulls
### Why are the changes needed?
To fix the behavior difference
```
In [1]:
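For reference, here is a minimal Python sketch of the intended skip-nulls semantics (illustrative only, not the actual pyspark.pandas implementation; `first_skipna` is a helper written for this illustration): `first` should return the first value in each group that is neither None nor NaN.

```python
import math

def first_skipna(values):
    """Return the first value that is neither None nor NaN, mimicking
    the skip-nulls behavior of pandas' GroupBy.first (illustrative only)."""
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            continue
        return v
    return None

print(first_skipna([None, float("nan"), 3, 4]))  # 3
```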
HeartSaVioR commented on code in PR #38013:
URL: https://github.com/apache/spark/pull/38013#discussion_r980858543
##
examples/src/main/python/sql/streaming/structured_network_wordcount_session_window.py:
##
@@ -0,0 +1,114 @@
+#
+# Licensed to the Apache Software Foundation
HeartSaVioR commented on code in PR #38013:
URL: https://github.com/apache/spark/pull/38013#discussion_r980870512
##
examples/src/main/python/sql/streaming/structured_network_wordcount_session_window.py:
##
@@ -0,0 +1,114 @@
+#
+# Licensed to the Apache Software Foundation
cloud-fan commented on code in PR #37407:
URL: https://github.com/apache/spark/pull/37407#discussion_r980939958
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -869,26 +869,50 @@ class Analyzer(override val catalogManager:
itholic commented on PR #38018:
URL: https://github.com/apache/spark/pull/38018#issuecomment-1259194440
Yeah, I think maybe we should also address the other I/O functions if there
are behavior differences.
We already document the differences for almost all I/O functions, but it seems
HeartSaVioR commented on PR #37936:
URL: https://github.com/apache/spark/pull/37936#issuecomment-1259257462
@WweiL The GA build unfortunately caught the unused import. Could you please run
`mvn clean install -DskipTests` and `dev/scalastyle` and make sure both pass
before pushing a new
AmplabJenkins commented on PR #38013:
URL: https://github.com/apache/spark/pull/38013#issuecomment-1259256936
Can one of the admins verify this patch?
zhengruifeng opened a new pull request, #38014:
URL: https://github.com/apache/spark/pull/38014
### What changes were proposed in this pull request?
Add badges for PySpark downloads
### Why are the changes needed?
projects like
LuciferYang commented on PR #37876:
URL: https://github.com/apache/spark/pull/37876#issuecomment-1259053609
@caican00 I'm not sure whether it would be better to change to use `toJavaMap`
or `toJavaMap.asScala` here. Can you help test it?
zhengruifeng commented on PR #38009:
URL: https://github.com/apache/spark/pull/38009#issuecomment-1259053301
Merged into master, thanks @HyukjinKwon for reviews
roczei commented on PR #37679:
URL: https://github.com/apache/spark/pull/37679#issuecomment-1259053324
@cloud-fan,
Thank you very much for your help!
zhengruifeng commented on code in PR #38015:
URL: https://github.com/apache/spark/pull/38015#discussion_r980909404
##
python/pyspark/pandas/indexes/base.py:
##
@@ -1907,6 +1908,9 @@ def append(self, other: "Index") -> "Index":
)
index_fields =
huleilei commented on code in PR #38007:
URL: https://github.com/apache/spark/pull/38007#discussion_r980926707
##
sql/hive/src/test/resources/ql/src/test/queries/clientpositive/index_bitmap2.q:
##
@@ -4,7 +4,10 @@ CREATE INDEX src1_index ON TABLE src(key) as 'BITMAP' WITH
cloud-fan commented on code in PR #37407:
URL: https://github.com/apache/spark/pull/37407#discussion_r980967098
##
sql/core/src/test/scala/org/apache/spark/sql/DatasetUnpivotSuite.scala:
##
@@ -137,48 +138,49 @@ class DatasetUnpivotSuite extends QueryTest
itholic commented on code in PR #38016:
URL: https://github.com/apache/spark/pull/38016#discussion_r980971632
##
python/pyspark/pandas/tests/indexes/test_base.py:
##
@@ -203,9 +203,35 @@ def test_to_frame(self):
# non-string names
itholic commented on code in PR #38016:
URL: https://github.com/apache/spark/pull/38016#discussion_r980972119
##
python/pyspark/pandas/tests/indexes/test_base.py:
##
@@ -203,9 +203,35 @@ def test_to_frame(self):
# non-string names
HyukjinKwon commented on code in PR #37995:
URL: https://github.com/apache/spark/pull/37995#discussion_r980983915
##
python/pyspark/pandas/series.py:
##
@@ -6442,6 +6445,8 @@ def argmin(self, axis: Axis = None, skipna: bool = True)
-> int:
raise ValueError("axis
EnricoMi commented on code in PR #37407:
URL: https://github.com/apache/spark/pull/37407#discussion_r980984200
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -869,26 +869,50 @@ class Analyzer(override val catalogManager:
dongjoon-hyun commented on PR #38001:
URL: https://github.com/apache/spark/pull/38001#issuecomment-1259217641
Thank you again, @cloud-fan , @viirya , @thiyaga, @huaxingao ,
@zhengruifeng .
Since the last commit is about doc, I'll merge this.
Merged to master/3.3/3.2.
cc
dongjoon-hyun closed pull request #38001: [SPARK-40562][SQL] Add
`spark.sql.legacy.groupingIdWithAppendedUserGroupBy`
URL: https://github.com/apache/spark/pull/38001
chaoqin-li1123 commented on PR #38013:
URL: https://github.com/apache/spark/pull/38013#issuecomment-1259024291
@HeartSaVioR The applyInPandasWithState session window example.
yabola commented on PR #37779:
URL: https://github.com/apache/spark/pull/37779#issuecomment-1259032890
@mridulm Really thanks for your analysis! Please give me some time to
understand.
itholic opened a new pull request, #38015:
URL: https://github.com/apache/spark/pull/38015
### What changes were proposed in this pull request?
The PR proposes to fix `CategoricalIndex.append` to match the behavior with
pandas.
### Why are the changes needed?
LuciferYang commented on PR #37876:
URL: https://github.com/apache/spark/pull/37876#issuecomment-1259066433
Thanks ~ @caican00 waiting for your feedback :)
EnricoMi commented on code in PR #37407:
URL: https://github.com/apache/spark/pull/37407#discussion_r980870200
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -869,26 +873,55 @@ class Analyzer(override val catalogManager:
Yikf commented on code in PR #38007:
URL: https://github.com/apache/spark/pull/38007#discussion_r980876832
##
sql/hive/src/test/resources/ql/src/test/queries/clientpositive/index_bitmap2.q:
##
@@ -4,7 +4,10 @@ CREATE INDEX src1_index ON TABLE src(key) as 'BITMAP' WITH
DEFERRED
EnricoMi commented on code in PR #37407:
URL: https://github.com/apache/spark/pull/37407#discussion_r980878546
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -869,26 +869,50 @@ class Analyzer(override val catalogManager:
zhengruifeng commented on code in PR #38016:
URL: https://github.com/apache/spark/pull/38016#discussion_r980911416
##
python/pyspark/pandas/tests/indexes/test_base.py:
##
@@ -203,9 +203,35 @@ def test_to_frame(self):
# non-string names
huleilei commented on code in PR #38007:
URL: https://github.com/apache/spark/pull/38007#discussion_r980919285
##
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4:
##
@@ -216,6 +216,7 @@ statement
LEFT_PAREN
HeartSaVioR closed pull request #38008: [SPARK-40571][SS][TESTS] Construct a
new test case for applyInPandasWithState to verify fault-tolerance semantic
with random python worker failures
URL: https://github.com/apache/spark/pull/38008
cloud-fan commented on code in PR #37407:
URL: https://github.com/apache/spark/pull/37407#discussion_r980944809
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -869,26 +869,50 @@ class Analyzer(override val catalogManager:
huleilei commented on PR #38007:
URL: https://github.com/apache/spark/pull/38007#issuecomment-1259188003
> @huleilei mind completing the PR description?
OK, I have completed the PR description. Thanks.
caican00 commented on PR #37876:
URL: https://github.com/apache/spark/pull/37876#issuecomment-1259039554
> the collection size is greater than 500
`the collection size is greater than 500`, is it the number of elements in a
collection?
LuciferYang commented on PR #37876:
URL: https://github.com/apache/spark/pull/37876#issuecomment-1259058629
@caican00 Or if you can provide a micro-bench that can be run with GA, I am
happy to continue to solve your issue together
EvgenyZamyatin commented on PR #37967:
URL: https://github.com/apache/spark/pull/37967#issuecomment-1259169597
> is it possible to improve existing w2v instead of implementing a new one?
Yes. How do you think it should be done? Under a mode setting?
> what about implementing it in
zhengruifeng commented on PR #37770:
URL: https://github.com/apache/spark/pull/37770#issuecomment-1259182911
also, what about adding some tests in
`python/pyspark/sql/tests/test_functions.py`?
itholic commented on code in PR #38016:
URL: https://github.com/apache/spark/pull/38016#discussion_r980971632
##
python/pyspark/pandas/tests/indexes/test_base.py:
##
@@ -203,9 +203,35 @@ def test_to_frame(self):
# non-string names
itholic commented on code in PR #38016:
URL: https://github.com/apache/spark/pull/38016#discussion_r980976721
##
python/pyspark/pandas/tests/indexes/test_base.py:
##
@@ -203,9 +203,35 @@ def test_to_frame(self):
# non-string names
HyukjinKwon commented on code in PR #37995:
URL: https://github.com/apache/spark/pull/37995#discussion_r980983915
##
python/pyspark/pandas/series.py:
##
@@ -6442,6 +6445,8 @@ def argmin(self, axis: Axis = None, skipna: bool = True)
-> int:
raise ValueError("axis
HyukjinKwon commented on code in PR #38018:
URL: https://github.com/apache/spark/pull/38018#discussion_r980991301
##
python/pyspark/pandas/frame.py:
##
@@ -5317,6 +5317,12 @@ def to_orc(
... '%s/to_orc/foo.orc' % path,
... mode = 'overwrite',
caican00 commented on PR #37876:
URL: https://github.com/apache/spark/pull/37876#issuecomment-1259063989
> > @caican00 I'm not sure whether it would be better to change to use
`toJavaMap` or `toJavaMap.asScala` here? Can you help test it?
>
> Hmm... Could you try this one?
Okay.
cloud-fan commented on code in PR #37825:
URL: https://github.com/apache/spark/pull/37825#discussion_r980852866
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala:
##
@@ -254,7 +254,9 @@ object RewriteDistinctAggregates
HyukjinKwon commented on code in PR #38016:
URL: https://github.com/apache/spark/pull/38016#discussion_r980897890
##
python/pyspark/pandas/tests/indexes/test_base.py:
##
@@ -203,9 +203,35 @@ def test_to_frame(self):
# non-string names
HyukjinKwon commented on code in PR #38016:
URL: https://github.com/apache/spark/pull/38016#discussion_r980898223
##
python/pyspark/pandas/tests/indexes/test_base.py:
##
@@ -203,9 +203,35 @@ def test_to_frame(self):
# non-string names
itholic opened a new pull request, #38018:
URL: https://github.com/apache/spark/pull/38018
### What changes were proposed in this pull request?
This PR proposes to update the docstring of `DataFrame.to_orc`, since
`pandas.DataFrame.to_orc` is supported from pandas 1.5.0,
huleilei commented on code in PR #38007:
URL: https://github.com/apache/spark/pull/38007#discussion_r980928571
##
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowIndexExec.scala:
##
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation
zhengruifeng commented on code in PR #37770:
URL: https://github.com/apache/spark/pull/37770#discussion_r980946968
##
sql/core/src/test/scala/org/apache/spark/sql/GeneratorFunctionSuite.scala:
##
@@ -219,20 +219,21 @@ class GeneratorFunctionSuite extends QueryTest with
itholic commented on code in PR #38015:
URL: https://github.com/apache/spark/pull/38015#discussion_r980954195
##
python/pyspark/pandas/indexes/base.py:
##
@@ -1907,6 +1908,9 @@ def append(self, other: "Index") -> "Index":
)
index_fields =
zhengruifeng commented on PR #37759:
URL: https://github.com/apache/spark/pull/37759#issuecomment-1259211420
@wankunde
in your UT, the variable `duplicateKeyNumber` is negative
```
scala> val duplicateKeyNumber = Integer.MAX_VALUE + 2
val duplicateKeyNumber: Int =
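The quoted value overflows because JVM `Int` arithmetic wraps at 32 bits. A quick Python check (simulating two's-complement wraparound; `to_int32` is a helper written for this illustration) confirms the result is negative:

```python
MAX_INT = 2**31 - 1  # Integer.MAX_VALUE on the JVM

def to_int32(x):
    """Wrap an arbitrary-precision Python int to a signed 32-bit value,
    as JVM Int (and Scala Int) arithmetic does."""
    x &= 0xFFFFFFFF
    return x - 2**32 if x >= 2**31 else x

print(to_int32(MAX_INT + 2))  # -2147483647
```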
HyukjinKwon commented on code in PR #38018:
URL: https://github.com/apache/spark/pull/38018#discussion_r980990148
##
python/pyspark/pandas/frame.py:
##
@@ -5266,12 +5266,12 @@ def to_orc(
**options: "OptionalPrimitiveType",
) -> None:
"""
-Write
HyukjinKwon commented on code in PR #38018:
URL: https://github.com/apache/spark/pull/38018#discussion_r980990713
##
python/pyspark/pandas/frame.py:
##
@@ -5317,6 +5317,12 @@ def to_orc(
... '%s/to_orc/foo.orc' % path,
... mode = 'overwrite',
HeartSaVioR commented on PR #38013:
URL: https://github.com/apache/spark/pull/38013#issuecomment-1259259980
One tip: unlike Scala/Java code, we can leverage `dev/reformat-python` to
reformat Python code automatically.
LuciferYang commented on PR #37876:
URL: https://github.com/apache/spark/pull/37876#issuecomment-1259032970
@caican00 Yes, it was also clear before that when the collection size is
greater than 500, there is no significant performance improvement.
In fact, according to the test
dongjoon-hyun commented on PR #38001:
URL: https://github.com/apache/spark/pull/38001#issuecomment-1259077129
Thank you, @cloud-fan , @viirya , @huaxingao . Yes, as Wenchen shared, this
is really Spark-specific syntax now. Let me add that to PR description.
```
hive> SELECT version();
HeartSaVioR commented on code in PR #38013:
URL: https://github.com/apache/spark/pull/38013#discussion_r980856329
##
examples/src/main/python/sql/streaming/structured_network_wordcount_session_window.py:
##
@@ -0,0 +1,114 @@
+#
+# Licensed to the Apache Software Foundation
zhengruifeng commented on code in PR #38006:
URL: https://github.com/apache/spark/pull/38006#discussion_r980923342
##
core/src/main/scala/org/apache/spark/internal/config/Connect.scala:
##
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or
HeartSaVioR commented on PR #38008:
URL: https://github.com/apache/spark/pull/38008#issuecomment-1259178940
Thanks! Merging to master.
HeartSaVioR commented on PR #38008:
URL: https://github.com/apache/spark/pull/38008#issuecomment-1259178740
https://github.com/HeartSaVioR/spark/runs/8566461025
Looks like the GA build for checking the result couldn't pull the result from
the forked repo. Maybe due to concurrent runs?
itholic commented on code in PR #38015:
URL: https://github.com/apache/spark/pull/38015#discussion_r980953075
##
python/pyspark/pandas/indexes/base.py:
##
@@ -1907,6 +1908,9 @@ def append(self, other: "Index") -> "Index":
)
index_fields =
cloud-fan commented on code in PR #37407:
URL: https://github.com/apache/spark/pull/37407#discussion_r980969164
##
sql/core/src/test/scala/org/apache/spark/sql/DatasetUnpivotSuite.scala:
##
@@ -535,6 +548,98 @@ class DatasetUnpivotSuite extends QueryTest
"val"),
cloud-fan commented on code in PR #37407:
URL: https://github.com/apache/spark/pull/37407#discussion_r980968668
##
sql/core/src/test/scala/org/apache/spark/sql/DatasetUnpivotSuite.scala:
##
@@ -535,6 +548,98 @@ class DatasetUnpivotSuite extends QueryTest
"val"),
caican00 commented on PR #37876:
URL: https://github.com/apache/spark/pull/37876#issuecomment-1259021953
I tested it using a real job, and the bottleneck still seems to be in
`MapBuilder.$plus$eq`.
And I have used a manual `for` loop for testing, but still no significant
improvement.
LuciferYang commented on PR #37876:
URL: https://github.com/apache/spark/pull/37876#issuecomment-1259043105
> > the collection size is greater than 500
>
> `the collection size is greater than 500`, is it the number of elements in
a collection?
Yes
LuciferYang commented on PR #37876:
URL: https://github.com/apache/spark/pull/37876#issuecomment-1259063043
> @caican00 I'm not sure whether it would be better to change to use `toJavaMap`
or `toJavaMap.asScala` here? Can you help test it?
Hmm... Could you try this one?
HeartSaVioR commented on PR #38013:
URL: https://github.com/apache/spark/pull/38013#issuecomment-1259090417
Thanks for the contribution @chaoqin-li1123 ! Looks like the Python linter is
complaining - could you please look into this?
zhengruifeng commented on PR #38010:
URL: https://github.com/apache/spark/pull/38010#issuecomment-1259152469
Merged into master
zhengruifeng closed pull request #38010: [MINOR] Clarify that xxhash64 seed is
42
URL: https://github.com/apache/spark/pull/38010
huleilei commented on code in PR #38007:
URL: https://github.com/apache/spark/pull/38007#discussion_r980939477
##
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowIndexExec.scala:
##
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation
cloud-fan commented on code in PR #37407:
URL: https://github.com/apache/spark/pull/37407#discussion_r980960360
##
sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala:
##
@@ -2118,6 +2127,16 @@ class Dataset[T] private[sql](
valueColumnName: String): DataFrame =
LuciferYang commented on PR #37999:
URL: https://github.com/apache/spark/pull/37999#issuecomment-1259031253
@srowen From the above test results, there is no significant performance
difference between using global and local singletons.
From a code perspective, thread safety should not
cloud-fan commented on code in PR #37825:
URL: https://github.com/apache/spark/pull/37825#discussion_r980860055
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala:
##
@@ -402,7 +405,28 @@ object RewriteDistinctAggregates
HyukjinKwon commented on PR #38014:
URL: https://github.com/apache/spark/pull/38014#issuecomment-1259131023
Actually, I don't feel strongly about this one. We have a bunch of stats, e.g.,
Maven stats too. cc @srowen FYI
viirya commented on PR #38001:
URL: https://github.com/apache/spark/pull/38001#issuecomment-1259140729
Thank you @dongjoon-hyun.
itholic commented on code in PR #38018:
URL: https://github.com/apache/spark/pull/38018#discussion_r980915746
##
python/pyspark/pandas/frame.py:
##
@@ -5266,12 +5266,12 @@ def to_orc(
**options: "OptionalPrimitiveType",
) -> None:
"""
-Write the
zhengruifeng commented on PR #38007:
URL: https://github.com/apache/spark/pull/38007#issuecomment-1259154257
@huleilei mind completing the PR description?
cloud-fan commented on code in PR #37407:
URL: https://github.com/apache/spark/pull/37407#discussion_r980941315
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -869,26 +869,50 @@ class Analyzer(override val catalogManager:
zhengruifeng commented on code in PR #37761:
URL: https://github.com/apache/spark/pull/37761#discussion_r980958358
##
python/pyspark/sql/dataframe.py:
##
@@ -4430,6 +4430,50 @@ def withColumnRenamed(self, existing: str, new: str) ->
"DataFrame":
"""
return
cloud-fan commented on code in PR #37407:
URL: https://github.com/apache/spark/pull/37407#discussion_r980958517
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##
@@ -1098,6 +1106,87 @@ class AstBuilder extends
cloud-fan commented on code in PR #37407:
URL: https://github.com/apache/spark/pull/37407#discussion_r980965661
##
sql/core/src/test/scala/org/apache/spark/sql/DatasetUnpivotSuite.scala:
##
@@ -137,48 +138,49 @@ class DatasetUnpivotSuite extends QueryTest
zhengruifeng commented on PR #37995:
URL: https://github.com/apache/spark/pull/37995#issuecomment-1259197076
cc @HyukjinKwon
EnricoMi commented on code in PR #37407:
URL: https://github.com/apache/spark/pull/37407#discussion_r981173804
##
sql/core/src/test/scala/org/apache/spark/sql/DatasetUnpivotSuite.scala:
##
@@ -535,6 +548,98 @@ class DatasetUnpivotSuite extends QueryTest
"val"),
HyukjinKwon commented on code in PR #37710:
URL: https://github.com/apache/spark/pull/37710#discussion_r981253712
##
assembly/pom.xml:
##
@@ -74,6 +74,11 @@
spark-repl_${scala.binary.version}
${project.version}
+
+ org.apache.spark
+
HyukjinKwon commented on PR #37710:
URL: https://github.com/apache/spark/pull/37710#issuecomment-1259519275
There's an outstanding comment:
https://github.com/apache/spark/pull/37710#discussion_r978291019. I am working
on this.
bjornjorgensen commented on PR #38018:
URL: https://github.com/apache/spark/pull/38018#issuecomment-1259301993
This is not the same.
`pandas API on Spark`
or
`pandas-on-Spark`
Which one do we use?
pan3793 commented on code in PR #37710:
URL: https://github.com/apache/spark/pull/37710#discussion_r981107986
##
assembly/pom.xml:
##
@@ -74,6 +74,11 @@
spark-repl_${scala.binary.version}
${project.version}
+
+ org.apache.spark
+
EnricoMi commented on code in PR #37407:
URL: https://github.com/apache/spark/pull/37407#discussion_r981171196
##
sql/core/src/test/scala/org/apache/spark/sql/DatasetUnpivotSuite.scala:
##
@@ -535,6 +548,98 @@ class DatasetUnpivotSuite extends QueryTest
"val"),
EnricoMi commented on code in PR #37407:
URL: https://github.com/apache/spark/pull/37407#discussion_r981213982
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -869,26 +873,55 @@ class Analyzer(override val catalogManager:
EnricoMi opened a new pull request, #38020:
URL: https://github.com/apache/spark/pull/38020
### What changes were proposed in this pull request?
As discussed in
https://github.com/apache/spark/pull/37407#discussion_r977818035, method
`pyspark.sql.DataFrame.unpivot` should support only
cloud-fan commented on code in PR #37407:
URL: https://github.com/apache/spark/pull/37407#discussion_r981213360
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##
@@ -1098,6 +1106,87 @@ class AstBuilder extends
LuciferYang commented on PR #37999:
URL: https://github.com/apache/spark/pull/37999#issuecomment-1259478461
https://github.com/FasterXML/jackson-core/issues/349#issuecomment-280794659
LuciferYang commented on PR #37999:
URL: https://github.com/apache/spark/pull/37999#issuecomment-1259522776
> I wonder if we can reuse ObjectMapper inside classes where it matters for
perf and not try to share one instance so widely.
According to this principle, it is enough to keep
LuciferYang commented on code in PR #37710:
URL: https://github.com/apache/spark/pull/37710#discussion_r981102013
##
assembly/pom.xml:
##
@@ -74,6 +74,11 @@
spark-repl_${scala.binary.version}
${project.version}
+
+ org.apache.spark
+
LuciferYang commented on code in PR #37710:
URL: https://github.com/apache/spark/pull/37710#discussion_r981102013
##
assembly/pom.xml:
##
@@ -74,6 +74,11 @@
spark-repl_${scala.binary.version}
${project.version}
+
+ org.apache.spark
+
LuciferYang commented on code in PR #37710:
URL: https://github.com/apache/spark/pull/37710#discussion_r981102013
##
assembly/pom.xml:
##
@@ -74,6 +74,11 @@
spark-repl_${scala.binary.version}
${project.version}
+
+ org.apache.spark
+