Github user vackosar commented on the issue:
https://github.com/apache/spark/pull/21919
Yes, I was hoping to improve that eg using filename as offset or other non
consumer-owned approach, but that would be rather long term. Do you think it is
solvable
Github user vackosar commented on the issue:
https://github.com/apache/spark/pull/21919
@jose-torres why it wouldnt make sense? According to the documentation all
SS sources have offsets, but not all sinks can also be SS sources e.g. ForEach
doesnt have offsets in general. So usually
Github user vackosar commented on the issue:
https://github.com/apache/spark/pull/21919
@jose-torres I haven't thought about this. Let me investigate bit more.
Shall we return to this PR? Do you agree with extending WriterCommitMessage
and using in DataWritingSparkTask#run
GitHub user vackosar opened a pull request:
https://github.com/apache/spark/pull/21919
[SPARK-24933][SS] Report numOutputRows in SinkProgress via WrittenComâ¦
## What changes were proposed in this pull request?
SinkProgress should report similar properties like
Github user vackosar commented on the issue:
https://github.com/apache/spark/pull/21919
@tdas @zsxwing @jose-torres @jerryshao @arunmahadevan @HyukjinKwon, please
help with the review and merge
Github user vackosar commented on the issue:
https://github.com/apache/spark/pull/21919
@jose-torres I removed use of commit to report the row count. Would you
have a look?
---
-
To unsubscribe, e-mail: reviews
Github user vackosar commented on a diff in the pull request:
https://github.com/apache/spark/pull/21919#discussion_r207716505
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/streaming/StreamWriterCommitProgress.java
---
@@ -0,0 +1,31
Github user vackosar commented on the issue:
https://github.com/apache/spark/pull/21919
@jose-torres thx for good point. The reason for placing this into
WriterCommitMessage is to set a standard information that should passed at the
commit time.
But I agree that row counting
Github user vackosar commented on the issue:
https://github.com/apache/spark/pull/21919
@jose-torres @zsxwing I will exclude SinkProgress constructor from binary
compatibility check as this object is constructed internally by Spark. That
will remove current MiMa test failure
Github user vackosar commented on a diff in the pull request:
https://github.com/apache/spark/pull/21919#discussion_r208902157
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala ---
@@ -198,11 +198,14 @@ class SourceProgress protected[sql
Github user vackosar commented on a diff in the pull request:
https://github.com/apache/spark/pull/21919#discussion_r208902692
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala ---
@@ -213,6 +216,12 @@ class SinkProgress protected[sql
Github user vackosar commented on a diff in the pull request:
https://github.com/apache/spark/pull/21919#discussion_r208901343
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala
---
@@ -58,6 +61,7 @@ case class
Github user vackosar commented on a diff in the pull request:
https://github.com/apache/spark/pull/21919#discussion_r208107058
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala
---
@@ -46,6 +46,9 @@ case class
Github user vackosar commented on the issue:
https://github.com/apache/spark/pull/21919
@jose-torres @cloud-fan do you have any other structure and functionality
suggestions for the PR now? Or can I focus on finalizing the work and getting
it merged
Github user vackosar commented on the issue:
https://github.com/apache/spark/pull/21919
@cloud-fan happy to merge?
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail
Github user vackosar commented on the issue:
https://github.com/apache/spark/pull/21919
@jose-torres are you happy to merge?
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands
Github user vackosar commented on a diff in the pull request:
https://github.com/apache/spark/pull/21919#discussion_r210708107
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala ---
@@ -254,3 +259,10 @@ class SinkProgress protected[sql
Github user vackosar commented on a diff in the pull request:
https://github.com/apache/spark/pull/21919#discussion_r209621895
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala ---
@@ -198,11 +198,14 @@ class SourceProgress protected[sql
Github user vackosar commented on a diff in the pull request:
https://github.com/apache/spark/pull/21919#discussion_r207991171
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala
---
@@ -179,3 +192,24 @@ class
Github user vackosar commented on a diff in the pull request:
https://github.com/apache/spark/pull/22143#discussion_r211379697
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaStreamWriter.scala
---
@@ -19,18 +19,23 @@ package
Github user vackosar commented on a diff in the pull request:
https://github.com/apache/spark/pull/22143#discussion_r211379432
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriteTask.scala
---
@@ -61,12 +63,30 @@ private[kafka010] class
Github user vackosar commented on the issue:
https://github.com/apache/spark/pull/22143
@arunmahadevan min and max are used there can be other writers to same
topic occurring in different job. The messages sent would then become
interleaved and one would have to return large number
Github user vackosar commented on a diff in the pull request:
https://github.com/apache/spark/pull/22143#discussion_r211381706
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaStreamWriter.scala
---
@@ -116,3 +133,66 @@ class
Github user vackosar commented on the issue:
https://github.com/apache/spark/pull/22143
@arunmahadevan @jose-torres @cloud-fan you may interested in this one.
---
-
To unsubscribe, e-mail: reviews-unsubscr
GitHub user vackosar opened a pull request:
https://github.com/apache/spark/pull/22143
[SPARK-24647][SS] Report KafkaStreamWriter's written min and max offsâ¦
â¦ets via CustomMetrics.
## What changes were proposed in this pull request?
Report KafkaStreamWriter's
Github user vackosar commented on the issue:
https://github.com/apache/spark/pull/22143
@cloud-fan are you ok merging the PR?
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
Github user vackosar commented on the issue:
https://github.com/apache/spark/pull/21919
@cloud-fan apart from conflicts are you happy to merge?
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
Github user vackosar commented on the issue:
https://github.com/apache/spark/pull/21919
@tdas, @gatorsmile and @cloud-fan, just resolved conflicts. Are you happy
to merge or any suggestions?
---
-
To unsubscribe, e
Github user vackosar commented on the issue:
https://github.com/apache/spark/pull/21919
@tdas, @gatorsmile and @cloud-fan, just resolved conflicts. Are you happy
to merge or any suggestions?
Please respond such that I can either merge or close this PR
29 matches
Mail list logo