bozhang2820 opened a new pull request, #37931:
URL: https://github.com/apache/spark/pull/37931
### What changes were proposed in this pull request?
Exceptions thrown in `FileFormatWriter.write` are wrapped with
`SparkException("Job aborted.")`, which provides little extra
otterc commented on code in PR #37533:
URL: https://github.com/apache/spark/pull/37533#discussion_r974396707
##
core/src/main/scala/org/apache/spark/internal/config/package.scala:
##
@@ -2309,7 +2309,18 @@ package object config {
" shuffle is enabled.")
zhengruifeng opened a new pull request, #37929:
URL: https://github.com/apache/spark/pull/37929
### What changes were proposed in this pull request?
1. extract the computation of `DataFrame.corr` into `correlation.py`, so it
can be reused in `DataFrame.corrwith`/`Groupby.corr`/etc;
ryan-johnson-databricks commented on code in PR #37879:
URL: https://github.com/apache/spark/pull/37879#discussion_r974198718
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala:
##
@@ -28,8 +28,14 @@ class ResolveCatalogs(val
cloud-fan commented on PR #37679:
URL: https://github.com/apache/spark/pull/37679#issuecomment-1250982553
SGTM!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To
tgravescs commented on code in PR #37710:
URL: https://github.com/apache/spark/pull/37710#discussion_r974308904
##
connect/src/main/scala/org/apache/spark/sql/sparkconnect/command/SparkConnectCommandPlanner.scala:
##
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software
HyukjinKwon closed pull request #37920: [SPARK-40413][SQL] Fix `Column.isin`
return null
URL: https://github.com/apache/spark/pull/37920
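The discussion on this PR turns on how SQL treats `NULL` inside `IN`; a later comment in this thread cites ANSI behaviour and reports that Hive and MySQL return null. Below is a minimal sketch in plain Python, not Spark's implementation. It uses a hypothetical helper `sql_in` and `None` in place of SQL `NULL`, and shows standard three-valued logic. The result is `TRUE` if any element equals the value. Otherwise it is unknown (`NULL`) if the value or any list element is `NULL`. Otherwise it is `FALSE`. It assumes a non-empty list, as standard SQL requires.

```python
from typing import Any, Optional, Sequence


def sql_in(value: Any, candidates: Sequence[Any]) -> Optional[bool]:
    """Evaluate `value IN (c1, c2, ...)` using SQL three-valued logic.

    `None` plays the role of SQL NULL; the result is True, False, or None
    (unknown). Assumes a non-empty candidate list. Illustrative sketch only,
    not Spark's implementation.
    """
    if value is None:
        # Any comparison with a NULL operand is unknown.
        return None
    saw_null = False
    for candidate in candidates:
        if candidate is None:
            # `value = NULL` is unknown: remember it, keep looking for a match.
            saw_null = True
        elif candidate == value:
            # One TRUE disjunct makes the whole OR chain TRUE.
            return True
    # No match: unknown if any comparison was unknown, otherwise FALSE.
    return None if saw_null else False


print(sql_in(1, [1, 2]))     # True
print(sql_in(3, [1, 2]))     # False
print(sql_in(3, [1, None]))  # None (unknown)
print(sql_in(None, [1, 2]))  # None (unknown)
print(sql_in(1, [None, 1]))  # True
```

This follows from `x IN (a, b)` being equivalent to `x = a OR x = b`: an unknown disjunct only affects the result when no disjunct is true.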
itholic commented on code in PR #37873:
URL: https://github.com/apache/spark/pull/37873#discussion_r974111272
##
sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:
##
@@ -113,10 +113,10 @@ import org.apache.spark.util.Utils
* - Scala UDF test case with a
WeichenXu123 commented on code in PR #37918:
URL: https://github.com/apache/spark/pull/37918#discussion_r974197580
##
mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala:
##
@@ -496,18 +499,23 @@ class ALSModel private[ml] (
.iterator.map { j =>
srowen commented on PR #37743:
URL: https://github.com/apache/spark/pull/37743#issuecomment-1250963427
Closed in favor of https://github.com/apache/spark/pull/37743
srowen closed pull request #37743: [SPARK-40294][SQL] Fix repeat calls to
`PartitionReader.hasNext` timing out
URL: https://github.com/apache/spark/pull/37743
cloud-fan commented on PR #37900:
URL: https://github.com/apache/spark/pull/37900#issuecomment-1250979410
Since https://github.com/apache/spark/pull/37743 is inactive, I'll merge
this PR but assign the JIRA ticket to that PR author to share credits.
cloud-fan commented on PR #37900:
URL: https://github.com/apache/spark/pull/37900#issuecomment-1250979859
thanks for review, merging to master!
cloud-fan closed pull request #37900: [SPARK-40456][SQL]
PartitionIterator.hasNext should be cheap to call repeatedly
URL: https://github.com/apache/spark/pull/37900
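The titles of #37743 and #37900 describe a common iterator contract. A caller may invoke `hasNext` any number of times before `next`, so `hasNext` should not re-poll the underlying source on every call. The sketch below shows that pattern in Python and is not Spark's `PartitionIterator` code. It assumes a hypothetical `fetch` callable that returns the next record, or `None` when the source is exhausted, which means records themselves cannot be `None`. It caches one lookahead record and remembers exhaustion, so repeated `has_next()` calls are cheap.

```python
from typing import Callable, Generic, Optional, TypeVar, cast

T = TypeVar("T")


class CachingIterator(Generic[T]):
    """Wrap a possibly expensive `fetch` so that `has_next()` is idempotent.

    `fetch` is a hypothetical callable returning the next record, or None once
    the source is exhausted (so records themselves must not be None).
    """

    def __init__(self, fetch: Callable[[], Optional[T]]) -> None:
        self._fetch = fetch
        self._buffered: Optional[T] = None
        self._has_buffered = False
        self._exhausted = False

    def has_next(self) -> bool:
        # Touch the underlying source only if no record is buffered and the
        # source is not known to be exhausted; otherwise answer from state.
        if not self._has_buffered and not self._exhausted:
            record = self._fetch()
            if record is None:
                self._exhausted = True
            else:
                self._buffered = record
                self._has_buffered = True
        return self._has_buffered

    def next(self) -> T:
        if not self.has_next():
            raise StopIteration("no more records")
        record = cast(T, self._buffered)
        # Clear the buffer so the following has_next() fetches a new record.
        self._buffered = None
        self._has_buffered = False
        return record


# Usage: count how often the underlying source is actually polled.
source = iter([10, 20])
fetch_calls = [0]


def fetch() -> Optional[int]:
    fetch_calls[0] += 1
    return next(source, None)


it = CachingIterator(fetch)
for _ in range(5):
    it.has_next()  # five calls, but only the first one polls the source
print(fetch_calls[0])  # 1
print(it.next(), it.next())  # 10 20
print(it.has_next(), it.has_next())  # False False
print(fetch_calls[0])  # 3
```

Remembering exhaustion in `_exhausted` matters too: without it, every `has_next()` after the end would poll the source again. That is the kind of repeated-call cost the PR titles describe.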
MaxGekk commented on PR #37916:
URL: https://github.com/apache/spark/pull/37916#issuecomment-1251016904
> Can we update core/src/main/resources/error/README.md to mention this
special error class naming prefix?
Let me do that in the next PR.
HyukjinKwon commented on PR #37710:
URL: https://github.com/apache/spark/pull/37710#issuecomment-1251149875
For clarification, I am fine with reverting the whole component if the plan
isn't followed in the future.
HeartSaVioR commented on code in PR #37893:
URL: https://github.com/apache/spark/pull/37893#discussion_r974072051
##
python/pyspark/sql/pandas/_typing/__init__.pyi:
##
@@ -256,6 +258,10 @@ PandasGroupedMapFunction = Union[
Callable[[Any, DataFrameLike], DataFrameLike],
]
roczei commented on PR #37679:
URL: https://github.com/apache/spark/pull/37679#issuecomment-1250958346
> Is this a common behavior in other databases?
@cloud-fan Good question. The reason we cannot delete the user-specified
default database is that we have the following if
cloud-fan commented on PR #37916:
URL: https://github.com/apache/spark/pull/37916#issuecomment-1250975885
Can we update `core/src/main/resources/error/README.md` to mention this
special error class naming prefix?
WeichenXu123 commented on PR #37855:
URL: https://github.com/apache/spark/pull/37855#issuecomment-1250998854
@wbo4958
Issue: The xgboost code uses rdd barrier mode, but barrier mode does not
work with `coalesce` operator.
ming95 commented on PR #37920:
URL: https://github.com/apache/spark/pull/37920#issuecomment-1250998408
> `null` comparison should return `null` which I believe is the standard
behaviour from ANSI.
I tested it in Hive and MySQL, and both return null. The PR
will
HyukjinKwon commented on PR #37873:
URL: https://github.com/apache/spark/pull/37873#issuecomment-1250864061
Merged to master.
HyukjinKwon closed pull request #37873: [SPARK-40419][SQL][TESTS] Integrate
Grouped Aggregate Pandas UDFs into *.sql test cases
URL: https://github.com/apache/spark/pull/37873
cloud-fan commented on code in PR #37879:
URL: https://github.com/apache/spark/pull/37879#discussion_r974211151
##
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala:
##
@@ -685,15 +685,15 @@ class DDLParserSuite extends AnalysisTest {
cloud-fan commented on code in PR #37879:
URL: https://github.com/apache/spark/pull/37879#discussion_r974211738
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala:
##
@@ -28,8 +28,14 @@ class ResolveCatalogs(val catalogManager:
clementguillot commented on PR #33154:
URL: https://github.com/apache/spark/pull/33154#issuecomment-1251015439
Hello @sunpe, thank you for your very fast answer.
Please let me give you some more context, I am using Spark v3.3.0 in K8s
using [Spark on K8S operator](
MaxGekk commented on code in PR #37916:
URL: https://github.com/apache/spark/pull/37916#discussion_r974242158
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##
@@ -2360,7 +2360,10 @@ class AstBuilder extends
HyukjinKwon commented on PR #37710:
URL: https://github.com/apache/spark/pull/37710#issuecomment-1251115196
Thanks for your feedback. Yes, it's pretty much decoupled, and I believe
it doesn't affect any other components. Sure, I will leave it open for a few
more days.
WeichenXu123 commented on code in PR #37918:
URL: https://github.com/apache/spark/pull/37918#discussion_r974195241
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala:
##
@@ -194,3 +194,44 @@ case class CollectSet(
override
cloud-fan commented on code in PR #37916:
URL: https://github.com/apache/spark/pull/37916#discussion_r974207860
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala:
##
@@ -2360,7 +2360,10 @@ class AstBuilder extends
tgravescs commented on PR #37710:
URL: https://github.com/apache/spark/pull/37710#issuecomment-1251105175
I would be OK with merging a minimal working version as long as it doesn't
impact many other components or destabilize the builds and other developers'
activities. If it doesn't fit
thomasg19930417 commented on PR #34464:
URL: https://github.com/apache/spark/pull/34464#issuecomment-1250835784
@ekoifman Hi, when it is an LOJ and the LHS has many empty partitions, why
does an inner join not demote broadcast?
else if (manyEmptyInOther && canBroadcastPlan) {
xclyfe opened a new pull request, #37930:
URL: https://github.com/apache/spark/pull/37930
### What changes were proposed in this pull request?
Currently, the `defaultJoin` method in `BroadcastNestedLoopJoinExec` first
collects `notMatchedBroadcastRows`, then collects `matchedStreamRows`.
HyukjinKwon commented on PR #37710:
URL: https://github.com/apache/spark/pull/37710#issuecomment-1251143797
There is a testing plan ([Spark Connect API Testing
Plan](https://docs.google.com/document/d/1n6EgS5vcmbwJUs5KGX4PzjKZVcSKd0qf0gLNZ6NFvOE/edit?usp=sharing))
that I and @amaliujia
HyukjinKwon commented on code in PR #37873:
URL: https://github.com/apache/spark/pull/37873#discussion_r974107717
##
sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala:
##
@@ -113,10 +113,10 @@ import org.apache.spark.util.Utils
* - Scala UDF test case with
dongjoon-hyun commented on code in PR #37924:
URL: https://github.com/apache/spark/pull/37924#discussion_r974486551
##
docs/configuration.md:
##
@@ -2605,6 +2605,15 @@ Apart from these, the following properties are also
available, and may be useful
2.2.0
+
+
mridulm commented on PR #37922:
URL: https://github.com/apache/spark/pull/37922#issuecomment-1251349810
> The push-based shuffle service will auto clean up the old shuffle merge
data
Consider the case I mentioned above - stage retry for an `INDETERMINATE`
stage.
We cleanup
kazuyukitanimura commented on PR #37934:
URL: https://github.com/apache/spark/pull/37934#issuecomment-1251489126
cc @sunchao @viirya @flyrain
xinrong-meng commented on PR #37908:
URL: https://github.com/apache/spark/pull/37908#issuecomment-1251489189
Thank you!
leewyang commented on code in PR #37734:
URL: https://github.com/apache/spark/pull/37734#discussion_r974626628
##
python/pyspark/ml/functions.py:
##
@@ -106,6 +111,167 @@ def array_to_vector(col: Column) -> Column:
return
mridulm commented on code in PR #37924:
URL: https://github.com/apache/spark/pull/37924#discussion_r974600104
##
docs/configuration.md:
##
@@ -2605,6 +2605,15 @@ Apart from these, the following properties are also
available, and may be useful
2.2.0
+
+
AmplabJenkins commented on PR #37924:
URL: https://github.com/apache/spark/pull/37924#issuecomment-1251551174
Can one of the admins verify this patch?
AmplabJenkins commented on PR #37923:
URL: https://github.com/apache/spark/pull/37923#issuecomment-1251551212
Can one of the admins verify this patch?
AmplabJenkins commented on PR #37922:
URL: https://github.com/apache/spark/pull/37922#issuecomment-1251551259
Can one of the admins verify this patch?
dongjoon-hyun commented on code in PR #37710:
URL: https://github.com/apache/spark/pull/37710#discussion_r974424083
##
connect/src/main/scala/org/apache/spark/sql/sparkconnect/planner/SparkConnectPlanner.scala:
##
@@ -0,0 +1,275 @@
+/*
+ * Licensed to the Apache Software
amaliujia commented on PR #37710:
URL: https://github.com/apache/spark/pull/37710#issuecomment-1251291583
@tgravescs I will follow up on the testing plan doc to address your
comments. Please feel free to bring up anything in the doc or here.
Yaohua628 commented on PR #37932:
URL: https://github.com/apache/spark/pull/37932#issuecomment-1251310801
cc @HeartSaVioR
Yaohua628 opened a new pull request, #37932:
URL: https://github.com/apache/spark/pull/37932
### What changes were proposed in this pull request?
Cherry-picked from #37905
Streaming metrics all report 0 (`processedRowsPerSecond`, etc.) when
selecting `_metadata`
alex-balikov commented on code in PR #37893:
URL: https://github.com/apache/spark/pull/37893#discussion_r974517188
##
python/pyspark/worker.py:
##
@@ -361,6 +429,32 @@ def read_udfs(pickleSer, infile, eval_type):
if eval_type ==
dtenedor commented on code in PR #37840:
URL: https://github.com/apache/spark/pull/37840#discussion_r974575955
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala:
##
@@ -730,6 +729,13 @@ trait CheckAnalysis extends PredicateHelper with
grundprinzip commented on code in PR #37710:
URL: https://github.com/apache/spark/pull/37710#discussion_r974602945
##
connect/src/main/scala/org/apache/spark/sql/sparkconnect/planner/SparkConnectPlanner.scala:
##
@@ -0,0 +1,275 @@
+/*
+ * Licensed to the Apache Software
xkrogen commented on PR #37634:
URL: https://github.com/apache/spark/pull/37634#issuecomment-1251319065
Thanks for the suggestion @cloud-fan! Good point about there being many places
where Spark trusts nullability. Here I am trying to target places where _user
code_ could introduce a null. This
xkrogen commented on code in PR #37634:
URL: https://github.com/apache/spark/pull/37634#discussion_r974499166
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala:
##
@@ -252,28 +267,44 @@ object
pralabhkumar commented on PR #37417:
URL: https://github.com/apache/spark/pull/37417#issuecomment-1251334364
@dongjoon-hyun, I have incorporated all the review comments; please take
a look.
ayudovin commented on code in PR #37923:
URL: https://github.com/apache/spark/pull/37923#discussion_r974514017
##
python/pyspark/pandas/groupby.py:
##
@@ -993,6 +993,98 @@ def nth(self, n: int) -> FrameLike:
return self._prepare_return(DataFrame(internal))
+def
huanliwang-db commented on code in PR #37917:
URL: https://github.com/apache/spark/pull/37917#discussion_r974439525
##
sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala:
##
@@ -1599,9 +1599,18 @@ private[sql] object QueryExecutionErrors extends
huanliwang-db commented on code in PR #37917:
URL: https://github.com/apache/spark/pull/37917#discussion_r974439760
##
sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala:
##
@@ -1599,9 +1599,18 @@ private[sql] object QueryExecutionErrors extends
dongjoon-hyun commented on code in PR #37924:
URL: https://github.com/apache/spark/pull/37924#discussion_r974479715
##
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala:
##
@@ -230,6 +230,9 @@ private[spark] class DAGScheduler(
Yaohua628 commented on PR #37905:
URL: https://github.com/apache/spark/pull/37905#issuecomment-1251311583
> There's conflict in branch-3.3. @Yaohua628 Could you please craft a PR for
branch-3.3? Thanks in advance!
Done! https://github.com/apache/spark/pull/37932 - Thank you
AmplabJenkins commented on PR #37930:
URL: https://github.com/apache/spark/pull/37930#issuecomment-1251324416
Can one of the admins verify this patch?
shrprasa commented on PR #37880:
URL: https://github.com/apache/spark/pull/37880#issuecomment-1251427562
@gaborgsomogyi @dongjoon-hyun @HyukjinKwon Can you please review this PR?
AmplabJenkins commented on PR #37928:
URL: https://github.com/apache/spark/pull/37928#issuecomment-1251440593
Can one of the admins verify this patch?
grundprinzip commented on code in PR #37710:
URL: https://github.com/apache/spark/pull/37710#discussion_r974599749
##
connect/src/main/scala/org/apache/spark/sql/sparkconnect/command/SparkConnectCommandPlanner.scala:
##
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software
xinrong-meng commented on PR #37888:
URL: https://github.com/apache/spark/pull/37888#issuecomment-1251487641
Thank you @HyukjinKwon @zhengruifeng @Yikun for taking care of the merging!
leewyang commented on code in PR #37734:
URL: https://github.com/apache/spark/pull/37734#discussion_r974649030
##
python/pyspark/ml/functions.py:
##
@@ -106,6 +111,167 @@ def array_to_vector(col: Column) -> Column:
return
alex-balikov commented on code in PR #37893:
URL: https://github.com/apache/spark/pull/37893#discussion_r974680745
##
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ApplyInPandasWithStatePythonRunner.scala:
##
@@ -0,0 +1,201 @@
+/*
+ * Licensed to the Apache
wbo4958 commented on PR #37855:
URL: https://github.com/apache/spark/pull/37855#issuecomment-1251592287
>
> @wbo4958
>
> Issue: The xgboost code uses rdd barrier mode, but barrier mode does not
work with `coalesce` operator.
@mridulm just suggested using
chaoqin-li1123 opened a new pull request, #37935:
URL: https://github.com/apache/spark/pull/37935
### What changes were proposed in this pull request?
Perform a cleanup before unloading a StateStore.
### Why are the changes needed?
Currently, the maintenance of
MaxGekk commented on PR #37921:
URL: https://github.com/apache/spark/pull/37921#issuecomment-1251528429
@srielau @anchovYu Could you take a look at the PR, please.
viirya closed pull request #37926: [SPARK-40484][BUILD] Upgrade log4j2 to 2.19.0
URL: https://github.com/apache/spark/pull/37926
viirya commented on PR #37926:
URL: https://github.com/apache/spark/pull/37926#issuecomment-1251278975
Thanks. Merging to master.
roczei commented on PR #37679:
URL: https://github.com/apache/spark/pull/37679#issuecomment-1251364841
Thanks @cloud-fan, I have implemented this and all tests passed. As far as I
can see, we have resolved all of your feedback.
alex-balikov commented on code in PR #37893:
URL: https://github.com/apache/spark/pull/37893#discussion_r974672806
##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -2705,6 +2705,44 @@ object SQLConf {
.booleanConf
dongjoon-hyun closed pull request #37424: [SPARK-39991][SQL][AQE] Use available
column statistics from completed query stages
URL: https://github.com/apache/spark/pull/37424
dongjoon-hyun commented on code in PR #37924:
URL: https://github.com/apache/spark/pull/37924#discussion_r974483075
##
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala:
##
@@ -1860,8 +1863,17 @@ private[spark] class DAGScheduler(
s"(attempt
dongjoon-hyun commented on code in PR #37924:
URL: https://github.com/apache/spark/pull/37924#discussion_r974490653
##
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala:
##
@@ -1860,8 +1863,17 @@ private[spark] class DAGScheduler(
s"(attempt
dongjoon-hyun commented on code in PR #37924:
URL: https://github.com/apache/spark/pull/37924#discussion_r974491025
##
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala:
##
@@ -2159,6 +2171,16 @@ private[spark] class DAGScheduler(
}
}
+ private def
xiaonanyang-db opened a new pull request, #37933:
URL: https://github.com/apache/spark/pull/37933
### What changes were proposed in this pull request?
Adjust part of the changes in https://github.com/apache/spark/pull/36871.
In the PR above, we introduced support for date
grundprinzip commented on code in PR #37710:
URL: https://github.com/apache/spark/pull/37710#discussion_r974611991
##
connect/src/main/scala/org/apache/spark/sql/sparkconnect/planner/SparkConnectPlanner.scala:
##
@@ -0,0 +1,275 @@
+/*
+ * Licensed to the Apache Software
kazuyukitanimura opened a new pull request, #37934:
URL: https://github.com/apache/spark/pull/37934
### What changes were proposed in this pull request?
This PR proposes to support `NullType` in `ColumnarBatchRow`.
### Why are the changes needed?
`ColumnarBatchRow.get()`
xinrong-meng commented on PR #37912:
URL: https://github.com/apache/spark/pull/37912#issuecomment-1251488032
Thank you!
leewyang commented on code in PR #37734:
URL: https://github.com/apache/spark/pull/37734#discussion_r974646917
##
python/pyspark/ml/functions.py:
##
@@ -106,6 +111,167 @@ def array_to_vector(col: Column) -> Column:
return
kazuyukitanimura commented on code in PR #37934:
URL: https://github.com/apache/spark/pull/37934#discussion_r974772538
##
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala:
##
@@ -1461,10 +1462,7 @@ class ParquetIOSuite extends
kazuyukitanimura commented on code in PR #37934:
URL: https://github.com/apache/spark/pull/37934#discussion_r974775400
##
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala:
##
@@ -1461,10 +1462,7 @@ class ParquetIOSuite extends
zhengruifeng commented on code in PR #37923:
URL: https://github.com/apache/spark/pull/37923#discussion_r974780246
##
python/pyspark/pandas/groupby.py:
##
@@ -993,6 +994,101 @@ def nth(self, n: int) -> FrameLike:
return self._prepare_return(DataFrame(internal))
+
LuciferYang commented on code in PR #37938:
URL: https://github.com/apache/spark/pull/37938#discussion_r974854247
##
common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:
##
@@ -237,6 +241,10 @@ protected void serviceInit(Configuration
HeartSaVioR commented on code in PR #37893:
URL: https://github.com/apache/spark/pull/37893#discussion_r974854298
##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -2705,6 +2705,44 @@ object SQLConf {
.booleanConf
HeartSaVioR commented on code in PR #37893:
URL: https://github.com/apache/spark/pull/37893#discussion_r974858058
##
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ApplyInPandasWithStatePythonRunner.scala:
##
@@ -0,0 +1,201 @@
+/*
+ * Licensed to the Apache
HyukjinKwon opened a new pull request, #37939:
URL: https://github.com/apache/spark/pull/37939
### What changes were proposed in this pull request?
This PR proposes to document datetime.timedelta support in PySpark in SQL
DataType reference page. This support was added in SPARK-37275
xiaonanyang-db commented on PR #37933:
URL: https://github.com/apache/spark/pull/37933#issuecomment-1251869330
> Can you update the description to list all of the semantics of the change?
You can remove the point where we need to merge them to TimestampType if this
is not what the PR
WweiL opened a new pull request, #37936:
URL: https://github.com/apache/spark/pull/37936
## What changes were proposed in this pull request?
Add complex tests to `StreamingSessionWindowSuite`. Concretely, I created
two helper functions,
- one is called
HeartSaVioR closed pull request #37917: [SPARK-40466][SS] Improve the error
message when DSv2 is disabled while DSv1 is not avaliable
URL: https://github.com/apache/spark/pull/37917
HeartSaVioR commented on code in PR #37893:
URL: https://github.com/apache/spark/pull/37893#discussion_r974784396
##
python/pyspark/sql/pandas/serializers.py:
##
@@ -371,3 +375,292 @@ def load_stream(self, stream):
raise ValueError(
HeartSaVioR commented on code in PR #37893:
URL: https://github.com/apache/spark/pull/37893#discussion_r974803979
##
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ApplyInPandasWithStatePythonRunner.scala:
##
@@ -0,0 +1,201 @@
+/*
+ * Licensed to the Apache
Yikun commented on code in PR #37923:
URL: https://github.com/apache/spark/pull/37923#discussion_r974808692
##
python/pyspark/pandas/groupby.py:
##
@@ -993,6 +994,101 @@ def nth(self, n: int) -> FrameLike:
return self._prepare_return(DataFrame(internal))
+def
Yikun commented on PR #36087:
URL: https://github.com/apache/spark/pull/36087#issuecomment-1251756266
@dongjoon-hyun Could we backport this to branch-3.3? This would help a lot
with running branch-3.3 K8s in GitHub Actions.
LuciferYang commented on code in PR #37938:
URL: https://github.com/apache/spark/pull/37938#discussion_r974846876
##
common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:
##
@@ -237,6 +241,10 @@ protected void serviceInit(Configuration