Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21556#discussion_r202093665
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
---
@@ -225,12 +316,44 @@ private[parquet] class
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21556#discussion_r202093508
--- Diff: sql/core/benchmarks/FilterPushdownBenchmark-results.txt ---
@@ -292,120 +292,120 @@ Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
Select 1
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21556#discussion_r202090024
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
---
@@ -248,29 +371,29 @@ private[parquet] class
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21118
@cloud-fan, I'd like to get this PR in by 2.4.0. Now that the change to
push predicates and projections happens when converting to the physical plan,
this can go in. I've rebased this on master
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21556#discussion_r201763831
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
---
@@ -248,29 +371,29 @@ private[parquet] class
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21556#discussion_r201763489
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
---
@@ -37,41 +39,64 @@ import
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21556#discussion_r201761849
--- Diff: sql/core/benchmarks/FilterPushdownBenchmark-results.txt ---
@@ -292,120 +292,120 @@ Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
Select 1
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21305
I don't think we need (or want) `SaveMode` passed to writers after
standardizing. Uses of `WriteSupport` will always append data to an existing
table, which makes it simpler for writers
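The comment above argues that once logical plans are standardized, a v2 `WriteSupport` implementation only ever appends to an existing table, so `SaveMode` never needs to reach the writer. A minimal sketch of that append-only contract — the interface and table below are hypothetical illustrations of the idea, not Spark's actual API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical append-only write contract: the writer needs no SaveMode
// because the only operation it supports is appending rows.
interface AppendOnlyWriteSupport<T> {
    void append(List<T> rows);
}

// Toy in-memory "table" implementing the contract. Overwrite or
// error-if-exists semantics would live in the query planner's logical
// plan, not in the writer itself.
class InMemoryTable<T> implements AppendOnlyWriteSupport<T> {
    private final List<T> data = new ArrayList<>();

    @Override
    public void append(List<T> rows) {
        data.addAll(rows); // always append, never truncate or replace
    }

    public List<T> contents() {
        return new ArrayList<>(data);
    }
}
```

With this split, a mode like overwrite becomes a separate plan node (delete then append) rather than a flag the writer must interpret.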
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21556#discussion_r201756667
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
---
@@ -37,41 +39,64 @@ import
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21556#discussion_r201755545
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
---
@@ -225,12 +316,44 @@ private[parquet] class
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21556#discussion_r201755353
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
---
@@ -225,12 +316,44 @@ private[parquet] class
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21556#discussion_r201754882
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
---
@@ -225,12 +316,44 @@ private[parquet] class
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21556#discussion_r201754998
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
---
@@ -202,6 +283,16 @@ private[parquet] class
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21305
@cloud-fan, yes. There is an open PR, #21308, that adds `DeleteSupport`.
I'm not pushing for that just yet because I think `DeleteSupport` should be
applied to `Table` after #21306 makes
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/20933#discussion_r201427780
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -241,39 +240,47 @@ final class DataFrameWriter[T] private[sql](ds
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21305#discussion_r201399506
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -240,21 +238,27 @@ final class DataFrameWriter[T] private[sql](ds
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21305
@cloud-fan, I think we can ignore that last test failure because tests are
passing on the last commit that made real changes. The latest commit only
changed a comment
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21305
Retest this please.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21305
@cloud-fan, I've updated this and the tests are passing, so I think it is
ready for another look.
I just pushed a comments-only commit to fix the Javadoc for AppendData that
@viirya pointed
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21305#discussion_r200825129
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -240,21 +238,27 @@ final class DataFrameWriter[T] private[sql](ds
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21305#discussion_r200824906
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/WriteSupport.java ---
@@ -38,15 +38,16 @@
* If this method fails (by throwing
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21305#discussion_r200824639
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -2120,6 +2122,99 @@ class Analyzer
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21305#discussion_r200824599
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -2120,6 +2122,99 @@ class Analyzer
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21305#discussion_r200824602
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -2120,6 +2122,99 @@ class Analyzer
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21305#discussion_r200824532
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -2120,6 +2122,99 @@ class Analyzer
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21306#discussion_r200824504
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableCatalog.java
---
@@ -0,0 +1,123 @@
+/*
+ * Licensed to the Apache
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21305
@cloud-fan, I've updated this with the requested changes. Thanks for
looking at it!
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21305#discussion_r200711421
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala
---
@@ -40,17 +44,24 @@ case class
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21306#discussion_r200710273
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableCatalog.java
---
@@ -0,0 +1,123 @@
+/*
+ * Licensed to the Apache
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21306#discussion_r200709816
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableCatalog.java
---
@@ -0,0 +1,123 @@
+/*
+ * Licensed to the Apache
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21305#discussion_r200428354
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -172,6 +173,7 @@ class Analyzer
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21305#discussion_r200423733
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala
---
@@ -40,17 +44,24 @@ case class
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21305#discussion_r200423206
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -2120,6 +2122,99 @@ class Analyzer
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21305#discussion_r200421789
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala
---
@@ -203,33 +203,33 @@ class DataSourceV2Suite extends
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21556#discussion_r200419939
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
---
@@ -82,6 +120,30 @@ private[parquet] class
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21306#discussion_r200418152
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableCatalog.java
---
@@ -0,0 +1,123 @@
+/*
+ * Licensed to the Apache
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21305
@cloud-fan, can you also review this PR for DataSourceV2?
This adds the first of the logical plans proposed in [SPIP: Standardize
Logical
Plans](https://docs.google.com/document/d
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21556#discussion_r200181894
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
---
@@ -82,6 +120,30 @@ private[parquet] class
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21556#discussion_r200181749
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
---
@@ -82,6 +120,30 @@ private[parquet] class
Github user rdblue closed the pull request at:
https://github.com/apache/spark/pull/21306
GitHub user rdblue reopened a pull request:
https://github.com/apache/spark/pull/21306
[SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog support.
## What changes were proposed in this pull request?
This adds a mix-in to `DataSourceV2` that allows implementations
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21306
@cloud-fan, I've updated this to address your comments. Thanks for the
reviews!
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21306#discussion_r200178463
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableChange.java
---
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21306#discussion_r200177371
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableChange.java
---
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21306#discussion_r200174526
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableChange.java
---
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21306#discussion_r200173138
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/Table.java ---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21306#discussion_r200171696
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableChange.java
---
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21306#discussion_r200171424
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/Table.java ---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21306#discussion_r200170560
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableChange.java
---
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21306#discussion_r200170480
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/Table.java ---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21306#discussion_r200170491
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/CatalogSupport.java ---
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21696
Thanks, @wangyum! I think this refactor was a good idea.
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21696#discussion_r199980993
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
---
@@ -19,166 +19,186 @@ package
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21696#discussion_r199980897
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
---
@@ -19,166 +19,186 @@ package
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21696#discussion_r199980632
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
---
@@ -19,166 +19,186 @@ package
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21696#discussion_r199979463
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
---
@@ -379,14 +366,29 @@ class
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21682
+1
I agree with some of the minor refactoring suggestions, but overall this
looks correct to me.
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21682#discussion_r199977784
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
---
@@ -69,6 +77,14 @@ private[parquet] class
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21682#discussion_r199977420
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
---
@@ -42,6 +42,14 @@ private[parquet] class
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21306
@cloud-fan, thanks for the thorough feedback!
> What catalog operations we want to forward to the data source catalog?
Currently it's create/drop/alter table, I think it's good eno
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21556#discussion_r198907669
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
---
@@ -359,6 +369,70 @@ class
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21556#discussion_r198906232
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
---
@@ -359,6 +369,70 @@ class
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21556#discussion_r198904779
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
---
@@ -62,6 +98,30 @@ private[parquet] class
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21556#discussion_r198904504
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
---
@@ -62,6 +98,30 @@ private[parquet] class
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21556#discussion_r198904089
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -378,6 +378,22 @@ object SQLConf {
.booleanConf
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21623
Overall, I think this is close. The tests need to cover the row group stats
case and we should update how configuration is passed to the filters. Thanks
for working on this, @wangyum
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21623#discussion_r198553569
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
---
@@ -22,16 +22,23 @@ import java.sql.Date
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21623#discussion_r198551889
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
---
@@ -660,6 +661,56 @@ class
Github user rdblue closed the pull request at:
https://github.com/apache/spark/pull/21262
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21306
@cloud-fan, what needs to change to get this in? I'd like to start making
more PRs based on these changes.
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21623#discussion_r198244664
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
---
@@ -270,6 +277,29 @@ private[parquet] class
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21623#discussion_r198230713
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
---
@@ -270,6 +277,29 @@ private[parquet] class
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21615
+1
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21606#discussion_r197552309
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala
---
@@ -125,11 +124,11 @@ object
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21606
+1
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21606#discussion_r197547079
--- Diff:
core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala ---
@@ -76,13 +76,17 @@ object SparkHadoopWriter extends Logging
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21606#discussion_r197543585
--- Diff:
core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala ---
@@ -104,12 +104,12 @@ object SparkHadoopWriter extends Logging
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21577
Thanks for fixing this, @vanzin!
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21606#discussion_r197542830
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala
---
@@ -125,11 +124,11 @@ object
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21606#discussion_r197542704
--- Diff:
core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala ---
@@ -104,12 +104,12 @@ object SparkHadoopWriter extends Logging
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21606#discussion_r197542014
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriterFactory.java
---
@@ -42,15 +42,12 @@
*Usually
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21606#discussion_r197541490
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala
---
@@ -125,11 +124,11 @@ object
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21606#discussion_r197540970
--- Diff:
core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala ---
@@ -76,13 +76,17 @@ object SparkHadoopWriter extends Logging
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21558
@vanzin, thanks for working on this. I was out most of this week at a
conference and I'm still on just half time, which is why I was delayed. Sorry
to leave you all waiting. I'll make comments
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21558
Yes, I just checked and speculative attempts do get a different TID. Just
turn on speculation, run a large stage, and sort tasks in a stage by TID. There
aren't duplicates
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21558
Retest this please.
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21574
+1 (non-binding) assuming that tests pass.
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21574#discussion_r196223209
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala
---
@@ -106,7 +106,7 @@ case class
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21574#discussion_r196222500
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala
---
@@ -23,17 +23,24 @@ import
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21558
@vanzin, the ID that this uses is the TID, which I thought was always
unique. It appears to be a one-up counter. Also, I noted on your PR that both
are needed because even if we only commit one
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21577#discussion_r196217742
--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -399,7 +399,8 @@ private[spark] object JsonProtocol {
("Full
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21577
+1. This fixes the commit coordinator problem where two separate tasks can
be authorized. That case could lead to duplicate data (if, for example, both
tasks generated unique file names using
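The invariant the comment describes is that the commit coordinator must never authorize two different task attempts for the same partition. A simplified model of that rule — this is an illustration of the invariant, not Spark's actual `OutputCommitCoordinator`:

```java
import java.util.HashMap;
import java.util.Map;

// At most one task attempt per partition may hold commit authorization.
// If two attempts could both be authorized and each wrote files with
// unique names, the output would contain duplicate data.
class CommitCoordinator {
    // partition -> task attempt ID currently authorized to commit
    private final Map<Integer, Long> authorized = new HashMap<>();

    public synchronized boolean canCommit(int partition, long taskAttemptId) {
        Long holder = authorized.putIfAbsent(partition, taskAttemptId);
        // First attempt to ask wins; the holder may ask again and still
        // succeed; every other attempt for this partition is denied.
        return holder == null || holder == taskAttemptId;
    }
}
```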
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21577#discussion_r196215961
--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -399,7 +399,8 @@ private[spark] object JsonProtocol {
("Full
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21577#discussion_r196214944
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala ---
@@ -131,16 +139,17 @@ private[spark] class
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21577#discussion_r196214788
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala ---
@@ -131,16 +139,17 @@ private[spark] class
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21558
I think the right thing to do for this commit is to use the task ID instead
of the stage-local attempt number. I've updated the PR with the change so I
think this is ready to commit. @vanzin
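The distinction drawn above can be sketched as a toy model: the TID comes from a global one-up counter, so every attempt — including a speculative retry of the same partition — gets a unique ID, while the stage-local attempt number restarts at 0 for each task and can repeat. Class and field names here are illustrative, not Spark's scheduler code:

```java
import java.util.concurrent.atomic.AtomicLong;

// Globally unique task IDs: a one-up counter shared by all stages.
class TaskIds {
    private static final AtomicLong NEXT = new AtomicLong(0L);

    static long newTaskId() {
        return NEXT.getAndIncrement();
    }
}

class TaskAttempt {
    final int stage;
    final int partition;
    final int attemptNumber; // stage-local: restarts at 0 per task, can repeat
    final long tid = TaskIds.newTaskId(); // globally unique across attempts

    TaskAttempt(int stage, int partition, int attemptNumber) {
        this.stage = stage;
        this.partition = partition;
        this.attemptNumber = attemptNumber;
    }
}
```

Keying commit authorization on the TID therefore distinguishes a speculative retry from the original attempt, which the stage-local number alone cannot guarantee.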
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21558
@tgravescs, that's exactly what we're seeing. I think it might just be
misleading to have a stage-local attempt ID although it is more friendly for
users and matches what MR does
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21574#discussion_r196192774
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala
---
@@ -23,17 +23,24 @@ import
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21503
Thank you for reviewing this, @cloud-fan!
Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/21574#discussion_r196173414
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala
---
@@ -23,17 +23,24 @@ import