aokolnychyi opened a new pull request, #36304:
URL: https://github.com/apache/spark/pull/36304
### What changes were proposed in this pull request?
This PR adds runtime group filtering for group-based row-level operations.
### Why are the changes needed?
cloud-fan commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r980610466
##
connect/src/main/scala/org/apache/spark/sql/connect/package.scala:
##
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+
github-actions[bot] commented on PR #35608:
URL: https://github.com/apache/spark/pull/35608#issuecomment-1258818178
We're closing this PR because it hasn't been updated in a while. This isn't
a judgement on the merit of the PR in any way. It's just a way of keeping the
PR queue manageable.
cloud-fan closed pull request #37679: [SPARK-35242][SQL] Support changing
session catalog's default database
URL: https://github.com/apache/spark/pull/37679
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above
github-actions[bot] commented on PR #35734:
URL: https://github.com/apache/spark/pull/35734#issuecomment-1258818146
github-actions[bot] commented on PR #35638:
URL: https://github.com/apache/spark/pull/35638#issuecomment-1258818165
github-actions[bot] commented on PR #35748:
URL: https://github.com/apache/spark/pull/35748#issuecomment-1258818118
github-actions[bot] commented on PR #35744:
URL: https://github.com/apache/spark/pull/35744#issuecomment-1258818132
github-actions[bot] commented on PR #35594:
URL: https://github.com/apache/spark/pull/35594#issuecomment-1258818196
zhengruifeng commented on PR #37998:
URL: https://github.com/apache/spark/pull/37998#issuecomment-1258863348
Merged into master, thanks @HyukjinKwon for review
HyukjinKwon commented on code in PR #38008:
URL: https://github.com/apache/spark/pull/38008#discussion_r98058
##
python/pyspark/sql/tests/test_pandas_grouped_map_with_state.py:
##
@@ -90,6 +107,99 @@ def check_results(batch_df, _):
self.assertTrue(q.isActive)
HyukjinKwon commented on code in PR #38006:
URL: https://github.com/apache/spark/pull/38006#discussion_r980669582
##
core/src/main/scala/org/apache/spark/internal/config/Connect.scala:
##
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or
aokolnychyi opened a new pull request, #38004:
URL: https://github.com/apache/spark/pull/38004
### What changes were proposed in this pull request?
This PR adds DS v2 APIs for handling row-level operations for data sources
that support deltas of rows.
### Why are
amaliujia opened a new pull request, #38006:
URL: https://github.com/apache/spark/pull/38006
### What changes were proposed in this pull request?
Add `Connect` config and two connect gRPC config keys.
1. `spark.connect.grpc.debug.enabled` Boolean
2.
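The config-entry pattern behind keys like `spark.connect.grpc.debug.enabled` can be sketched as follows. This is a hypothetical Python analog, not the PR's code: the real entries are Scala `ConfigEntry` definitions in `Connect.scala`, and the default value shown here is an assumption.

```python
from dataclasses import dataclass

# Minimal stand-in for Spark's typed config-entry pattern (hypothetical).
@dataclass(frozen=True)
class BooleanConfigEntry:
    key: str
    default: bool

# Key name taken from the comment above; the default is an assumption.
GRPC_DEBUG_ENABLED = BooleanConfigEntry("spark.connect.grpc.debug.enabled", False)

def get_bool(conf: dict, entry: BooleanConfigEntry) -> bool:
    """Look the key up in a string-valued conf map, falling back to the default."""
    raw = conf.get(entry.key)
    return entry.default if raw is None else raw.strip().lower() == "true"

print(get_bool({}, GRPC_DEBUG_ENABLED))                                            # False
print(get_bool({"spark.connect.grpc.debug.enabled": "true"}, GRPC_DEBUG_ENABLED))  # True
```

The point of the pattern is that the key, its type, and its default live in one declaration instead of being re-parsed at every call site.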
cloud-fan commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r980615123
##
connect/src/main/scala/org/apache/spark/sql/connect/package.scala:
##
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+
Kimahriman commented on code in PR #38003:
URL: https://github.com/apache/spark/pull/38003#discussion_r980631860
##
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/PruneFileSourcePartitionsSuite.scala:
##
@@ -140,6 +140,24 @@ class
attilapiros commented on code in PR #37990:
URL: https://github.com/apache/spark/pull/37990#discussion_r980633310
##
resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManagerSuite.scala:
##
@@ -109,6 +116,12 @@ class
attilapiros commented on PR #37990:
URL: https://github.com/apache/spark/pull/37990#issuecomment-1258851049
> Although 6.1.1 is intrusive, this patch looks solid. Is there any other
reason why this is still WIP, @attilapiros ?
@dongjoon-hyun I just executed some manual tests
viirya commented on code in PR #38001:
URL: https://github.com/apache/spark/pull/38001#discussion_r980647185
##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -3574,6 +3574,15 @@ object SQLConf {
.booleanConf
HyukjinKwon closed pull request #37993: [SPARK-40557][CONNECT] Update generated
proto files for Spark Connect
URL: https://github.com/apache/spark/pull/37993
HyukjinKwon commented on PR #37993:
URL: https://github.com/apache/spark/pull/37993#issuecomment-1258864096
Merged to master.
HyukjinKwon commented on code in PR #38006:
URL: https://github.com/apache/spark/pull/38006#discussion_r980668935
##
core/src/main/scala/org/apache/spark/internal/config/Connect.scala:
##
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or
aokolnychyi commented on PR #36304:
URL: https://github.com/apache/spark/pull/36304#issuecomment-1258618345
I want to resume working on this PR but I need feedback on one point.
In the original implementation, @cloud-fan and I discussed supporting a
separate scan builder for runtime
mridulm commented on PR #37779:
URL: https://github.com/apache/spark/pull/37779#issuecomment-1258716473
Added a few debug statements, and it became clear what the issue is.
Essentially, since we are leveraging a `ThreadPoolExecutor`, it does not
result in killing the thread with the
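The behavior described above — a `ThreadPoolExecutor` not killing a thread that is already running a task — has a direct analog in Python's standard library, sketched here as an illustration (not the Spark code in question): cancelling a future whose task has started is a no-op, and the worker thread runs to completion.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

started = threading.Event()

def long_task():
    started.set()      # signal that the worker has begun executing
    time.sleep(0.2)    # simulate in-flight work
    return "finished"

with ThreadPoolExecutor(max_workers=1) as pool:
    fut = pool.submit(long_task)
    started.wait()             # wait until the task is definitely running
    cancelled = fut.cancel()   # too late: a running task cannot be cancelled
    result = fut.result()      # the worker still completes normally

print(cancelled, result)  # False finished
```

Cancellation only prevents a *queued* task from starting; interrupting a live worker requires cooperation from the task itself.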
cloud-fan commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r980614606
##
connect/src/main/scala/org/apache/spark/sql/connect/package.scala:
##
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+
cloud-fan commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r980616391
##
connect/src/main/scala/org/apache/spark/sql/connect/package.scala:
##
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+
zhengruifeng closed pull request #37998: [SPARK-40561][PS] Implement
`min_count` in `GroupBy.min`
URL: https://github.com/apache/spark/pull/37998
HyukjinKwon commented on code in PR #38006:
URL: https://github.com/apache/spark/pull/38006#discussion_r980668686
##
core/src/main/scala/org/apache/spark/internal/config/Connect.scala:
##
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or
amaliujia commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r980480469
##
connect/src/main/scala/org/apache/spark/sql/connect/package.scala:
##
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+
aokolnychyi commented on code in PR #38004:
URL: https://github.com/apache/spark/pull/38004#discussion_r980509911
##
sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/DeltaBatchWrite.java:
##
@@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation
amaliujia commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r980613106
##
connect/src/main/scala/org/apache/spark/sql/connect/package.scala:
##
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+
amaliujia commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r980613293
##
connect/src/main/scala/org/apache/spark/sql/connect/package.scala:
##
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+
attilapiros commented on code in PR #37990:
URL: https://github.com/apache/spark/pull/37990#discussion_r980632958
##
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala:
##
@@ -85,7 +85,7 @@
HeartSaVioR opened a new pull request, #38008:
URL: https://github.com/apache/spark/pull/38008
### What changes were proposed in this pull request?
This PR proposes a new test case for applyInPandasWithState to verify that the
fault-tolerance semantics are not broken despite random python
HeartSaVioR commented on PR #37993:
URL: https://github.com/apache/spark/pull/37993#issuecomment-1258866543
post +1, thanks for updating this!
zhengruifeng commented on PR #37967:
URL: https://github.com/apache/spark/pull/37967#issuecomment-1258873185
so this is a totally new implementation of `SkipGram` W2V in `.mllib`
is it possible to improve existing w2v instead of implementing a new one?
what about implementing it in
HyukjinKwon commented on code in PR #38008:
URL: https://github.com/apache/spark/pull/38008#discussion_r980667435
##
python/pyspark/sql/tests/test_pandas_grouped_map_with_state.py:
##
@@ -90,6 +107,99 @@ def check_results(batch_df, _):
self.assertTrue(q.isActive)
sigmod commented on PR #37996:
URL: https://github.com/apache/spark/pull/37996#issuecomment-1258522623
cc @andylam-db @maryannxue
dongjoon-hyun commented on PR #38001:
URL: https://github.com/apache/spark/pull/38001#issuecomment-1258524084
Thank you for your feedback, @thiyaga .
AmplabJenkins commented on PR #37994:
URL: https://github.com/apache/spark/pull/37994#issuecomment-1258554960
Can one of the admins verify this patch?
AmplabJenkins commented on PR #37993:
URL: https://github.com/apache/spark/pull/37993#issuecomment-1258555111
aokolnychyi commented on PR #38004:
URL: https://github.com/apache/spark/pull/38004#issuecomment-1258647018
@cloud-fan @rdblue @huaxingao @dongjoon-hyun @sunchao @viirya, could you
take a look? This is the API from the design doc we discussed earlier.
I have also created PR #38005
bersprockets commented on PR #37825:
URL: https://github.com/apache/spark/pull/37825#issuecomment-1258849538
@beliefer
> Please reference `SimplifyBinaryComparison`.
Thanks, I will take a look. This refers to the fall-through case,
where we discover there is really only
huleilei opened a new pull request, #38007:
URL: https://github.com/apache/spark/pull/38007
### What changes were proposed in this pull request?
I created an index for a table. I want to know what indexes are in the table.
But SHOW INDEX syntax is not supported. So I think the
attilapiros commented on PR #37990:
URL: https://github.com/apache/spark/pull/37990#issuecomment-1258859382
I would like to go through one more time to find all the places where we can
specify the namespace.
aokolnychyi commented on code in PR #38004:
URL: https://github.com/apache/spark/pull/38004#discussion_r980511958
##
sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/DeltaWriteBuilder.java:
##
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation
grundprinzip commented on code in PR #38006:
URL: https://github.com/apache/spark/pull/38006#discussion_r980606668
##
core/src/main/scala/org/apache/spark/internal/config/Connect.scala:
##
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or
amaliujia commented on code in PR #38006:
URL: https://github.com/apache/spark/pull/38006#discussion_r980607213
##
core/src/main/scala/org/apache/spark/internal/config/Connect.scala:
##
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
HyukjinKwon commented on code in PR #38008:
URL: https://github.com/apache/spark/pull/38008#discussion_r980666406
##
python/test_support/sql/streaming/apply_in_pandas_with_state/random_failure/input/test-0.txt:
##
@@ -0,0 +1,100 @@
+non
Review Comment:
Can we avoid
amaliujia commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r980404430
##
connect/src/main/scala/org/apache/spark/sql/connect/package.scala:
##
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+
AmplabJenkins commented on PR #37996:
URL: https://github.com/apache/spark/pull/37996#issuecomment-1258485849
aokolnychyi commented on code in PR #38004:
URL: https://github.com/apache/spark/pull/38004#discussion_r980508846
##
sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/LogicalWriteInfo.java:
##
@@ -45,4 +45,14 @@ public interface LogicalWriteInfo {
* the schema
cloud-fan commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r980615483
##
connect/src/main/scala/org/apache/spark/sql/connect/package.scala:
##
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+
cloud-fan commented on code in PR #37994:
URL: https://github.com/apache/spark/pull/37994#discussion_r980615672
##
connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala:
##
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation
HyukjinKwon commented on code in PR #37989:
URL: https://github.com/apache/spark/pull/37989#discussion_r980652463
##
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala:
##
@@ -4495,7 +4495,8 @@ class DAGSchedulerSuite extends SparkFunSuite with
amaliujia commented on PR #38006:
URL: https://github.com/apache/spark/pull/38006#issuecomment-1258739872
@HyukjinKwon @cloud-fan @grundprinzip
Kimahriman commented on PR #38003:
URL: https://github.com/apache/spark/pull/38003#issuecomment-1258752350
> Thank you for making a PR with the test coverage, @Kimahriman .
Previously, it fails, right?
Yeah these tests actually fail with an exception without the change
cloud-fan commented on PR #38001:
URL: https://github.com/apache/spark/pull/38001#issuecomment-1258815644
> it's not in the SQL standard
Yea, but since we copied it from Hive, I think the result should match Hive
as well. Sorry I didn't realize there is a result change when doing the
aokolnychyi commented on code in PR #38004:
URL: https://github.com/apache/spark/pull/38004#discussion_r980626015
##
sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/LogicalWriteInfo.java:
##
@@ -45,4 +45,18 @@ public interface LogicalWriteInfo {
* the schema
bersprockets commented on code in PR #37825:
URL: https://github.com/apache/spark/pull/37825#discussion_r980641159
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala:
##
@@ -218,9 +218,16 @@ object RewriteDistinctAggregates
HeartSaVioR commented on code in PR #38008:
URL: https://github.com/apache/spark/pull/38008#discussion_r980664896
##
python/pyspark/sql/tests/test_pandas_grouped_map_with_state.py:
##
@@ -90,6 +107,99 @@ def check_results(batch_df, _):
self.assertTrue(q.isActive)
HeartSaVioR commented on PR #38008:
URL: https://github.com/apache/spark/pull/38008#issuecomment-1258874132
cc. @HyukjinKwon @alex-balikov
dongjoon-hyun commented on code in PR #38003:
URL: https://github.com/apache/spark/pull/38003#discussion_r980419475
##
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/PruneFileSourcePartitionsSuite.scala:
##
@@ -140,6 +140,24 @@ class
amaliujia commented on PR #37750:
URL: https://github.com/apache/spark/pull/37750#issuecomment-1258518169
Because Spark supports `SELECT distinct(col1, col2)` (and the return is a
struct of col1 and col2), which makes this error message proposal complicated.
Because now we cannot say
amaliujia closed pull request #37750: [SPARK-40296] Error class for DISTINCT
function not found
URL: https://github.com/apache/spark/pull/37750
thiyaga commented on PR #38001:
URL: https://github.com/apache/spark/pull/38001#issuecomment-1258517989
We use grouping sets on our queries and rely on `grouping__id` to use as an
identifier to query the data for respective group. If we use `grouping__id`
directly, it will be prone to
aokolnychyi commented on code in PR #38004:
URL: https://github.com/apache/spark/pull/38004#discussion_r980510778
##
sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/DeltaWrite.java:
##
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
aokolnychyi commented on code in PR #38004:
URL: https://github.com/apache/spark/pull/38004#discussion_r980511079
##
sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/DeltaWrite.java:
##
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
aokolnychyi opened a new pull request, #38005:
URL: https://github.com/apache/spark/pull/38005
### What changes were proposed in this pull request?
This WIP PR shows how the API added in PR #38004 can be implemented.
### Why are the changes needed?
Thes
github-actions[bot] closed pull request #36030: Draft: [SPARK-38715]
Configurable client ID for Kafka Spark SQL producer
URL: https://github.com/apache/spark/pull/36030
github-actions[bot] closed pull request #36829: [SPARK-39438][SQL] Add a
threshold to not in line CTE
URL: https://github.com/apache/spark/pull/36829
github-actions[bot] closed pull request #36005: [SPARK-38506][SQL] Push partial
aggregation through join
URL: https://github.com/apache/spark/pull/36005
github-actions[bot] closed pull request #36046: [SPARK-38771][SQL] Adaptive
Bloom filter Join
URL: https://github.com/apache/spark/pull/36046
github-actions[bot] closed pull request #35799: [SPARK-38498][STREAM] Support
customized StreamingListener by configuration
URL: https://github.com/apache/spark/pull/35799
github-actions[bot] closed pull request #35858: [SPARK-38448] [YARN] [CORE]
Sending Available Resources in Yarn Cluster Information to Spark Driver
URL: https://github.com/apache/spark/pull/35858
github-actions[bot] closed pull request #35806: [SPARK-38505][SQL] Make partial
aggregation adaptive
URL: https://github.com/apache/spark/pull/35806
github-actions[bot] closed pull request #35763: [SPARK-38433][BUILD] change the
shell code style with shellcheck
URL: https://github.com/apache/spark/pull/35763
github-actions[bot] commented on PR #35751:
URL: https://github.com/apache/spark/pull/35751#issuecomment-1258818104
github-actions[bot] closed pull request #35845: [SPARK-38520][SQL] ANSI
interval overflow when reading CSV
URL: https://github.com/apache/spark/pull/35845
github-actions[bot] commented on PR #35569:
URL: https://github.com/apache/spark/pull/35569#issuecomment-1258818209
github-actions[bot] closed pull request #35808: [WIP][SPARK-38512] Rebased
traversal order from "pre-order" to "post-order" for `ResolveFunctions` Rule
URL: https://github.com/apache/spark/pull/35808
cloud-fan commented on PR #37679:
URL: https://github.com/apache/spark/pull/37679#issuecomment-1258818080
thanks, merging to master!
github-actions[bot] commented on PR #36889:
URL: https://github.com/apache/spark/pull/36889#issuecomment-1258818008
sadikovi commented on code in PR #35764:
URL: https://github.com/apache/spark/pull/35764#discussion_r980780872
##
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala:
##
@@ -111,6 +111,9 @@ class JDBCOptions(
// the number of partitions
mskapilks commented on PR #37996:
URL: https://github.com/apache/spark/pull/37996#issuecomment-1259015446
> 2. We can use `shuffle records written` instead of
`spark.sql.optimizer.runtime.bloomFilter.expectedNumItems` to build bloom
filter.
Good point. It would be better than current
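The suggestion above — sizing the bloom filter from an observed count such as shuffle records written instead of a static `expectedNumItems` conf — comes down to the standard bloom-filter sizing formula. A minimal sketch (illustrative only, not Spark's implementation):

```python
import math

def bloom_bits(n: int, fpp: float = 0.03) -> int:
    """Bits needed for n items at false-positive probability fpp:
    m = -n * ln(fpp) / (ln 2)^2  (standard bloom-filter sizing formula)."""
    return max(1, math.ceil(-n * math.log(fpp) / (math.log(2) ** 2)))

# More observed items, or a tighter fpp target, both grow the filter.
print(bloom_bits(1_000_000))
print(bloom_bits(1_000_000, fpp=0.01))
```

Feeding a runtime-observed `n` into this formula adapts the filter size to the actual data volume rather than a guessed configuration value.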
sadikovi commented on code in PR #35764:
URL: https://github.com/apache/spark/pull/35764#discussion_r980779629
##
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala:
##
@@ -111,6 +111,9 @@ class JDBCOptions(
// the number of partitions
mridulm commented on PR #37779:
URL: https://github.com/apache/spark/pull/37779#issuecomment-1259007734
Thanks for the query @Ngone51 - I missed out one aspect of my analysis,
which ends up completely changing the solution - my bad :-(
The answer to your query has the reason for the
sadikovi commented on code in PR #35764:
URL: https://github.com/apache/spark/pull/35764#discussion_r980766168
##
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala:
##
@@ -168,6 +177,71 @@ private[sql] object JDBCRelation extends Logging
LuciferYang commented on PR #37999:
URL: https://github.com/apache/spark/pull/37999#issuecomment-1258914643
In the serial r/w scenario, the benefits are obvious:
- Reading scenario: using a singleton is 1800+% faster than creating an
`ObjectMapper` every time
- Write scenario: using a
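The singleton pattern behind that benchmark can be sketched as follows — a hypothetical Python stand-in for a Jackson `ObjectMapper` (the class and method names here mimic Jackson but wrap the stdlib `json` module): construct the codec once and reuse it for every read/write instead of paying the construction cost per call.

```python
import json

class JsonMapper:
    """Hypothetical stand-in for a shared Jackson ObjectMapper."""
    _instance = None

    @classmethod
    def get(cls) -> "JsonMapper":
        if cls._instance is None:   # constructed exactly once
            cls._instance = cls()
        return cls._instance

    def write_value_as_string(self, obj) -> str:
        return json.dumps(obj)

    def read_value(self, s: str):
        return json.loads(s)

mapper = JsonMapper.get()
assert mapper is JsonMapper.get()   # every caller shares one instance
print(mapper.read_value(mapper.write_value_as_string({"a": 1})))  # {'a': 1}
```

Sharing one instance is safe here because the codec holds no per-call state; that is the same property that makes a shared `ObjectMapper` viable.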
zhengruifeng opened a new pull request, #38009:
URL: https://github.com/apache/spark/pull/38009
### What changes were proposed in this pull request?
Make `ddof` in `GroupBy.std`, `GroupBy.var` and `GroupBy.sem` accept
arbitrary integers
### Why are the changes needed?
for API
srowen commented on code in PR #38010:
URL: https://github.com/apache/spark/pull/38010#discussion_r980690447
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala:
##
@@ -643,7 +643,8 @@ object Murmur3HashFunction extends InterpretedHashFunction {
srowen opened a new pull request, #38010:
URL: https://github.com/apache/spark/pull/38010
### What changes were proposed in this pull request?
State that the hash seed used for xxhash64 is 42 in docs.
### Why are the changes needed?
It's somewhat non-standard not to seed to
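Why documenting the fixed seed (42) matters can be illustrated with a toy seeded hash — this uses CRC32 as a stand-in, not Spark's actual xxhash64: a fixed, documented seed makes output reproducible across runs and processes, while a different seed yields different hashes for the same input.

```python
import zlib

def seeded_hash(data: bytes, seed: int = 42) -> int:
    """Toy seeded hash: CRC32 with the seed as the starting value.
    Stand-in for xxhash64(seed=42); illustrative only."""
    return zlib.crc32(data, seed)

# Same input + same seed -> same hash; a different seed changes the result.
assert seeded_hash(b"spark") == seeded_hash(b"spark")
assert seeded_hash(b"spark", 42) != seeded_hash(b"spark", 0)
```

Anyone reimplementing the hash outside Spark (e.g. for partition routing) needs the documented seed to reproduce the same values.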
itholic opened a new pull request, #38012:
URL: https://github.com/apache/spark/pull/38012
### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How
HeartSaVioR commented on code in PR #38008:
URL: https://github.com/apache/spark/pull/38008#discussion_r980722405
##
python/pyspark/sql/tests/test_pandas_grouped_map_with_state.py:
##
@@ -46,8 +55,27 @@
cast(str, pandas_requirement_message or pyarrow_requirement_message),
wangyum opened a new pull request, #38011:
URL: https://github.com/apache/spark/pull/38011
### What changes were proposed in this pull request?
This PR adds `PURGE` in `DROP TABLE` documentation.
Related documentation and code:
1. Hive `DROP TABLE` documentation:
HeartSaVioR commented on code in PR #38008:
URL: https://github.com/apache/spark/pull/38008#discussion_r980721102
##
python/test_support/sql/streaming/apply_in_pandas_with_state/random_failure/input/test-0.txt:
##
@@ -0,0 +1,100 @@
+non
Review Comment:
I just changed