[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress v...

2018-08-02 Thread vackosar
Github user vackosar commented on the issue: https://github.com/apache/spark/pull/21919 Yes, I was hoping to improve that eg using filename as offset or other non consumer-owned approach, but that would be rather long term. Do you think it is solvable

[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress v...

2018-08-02 Thread vackosar
Github user vackosar commented on the issue: https://github.com/apache/spark/pull/21919 @jose-torres why it wouldnt make sense? According to the documentation all SS sources have offsets, but not all sinks can also be SS sources e.g. ForEach doesnt have offsets in general. So usually

[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress v...

2018-08-02 Thread vackosar
Github user vackosar commented on the issue: https://github.com/apache/spark/pull/21919 @jose-torres I haven't thought about this. Let me investigate bit more. Shall we return to this PR? Do you agree with extending WriterCommitMessage and using in DataWritingSparkTask#run

[GitHub] spark pull request #21919: [SPARK-24933][SS] Report numOutputRows in SinkPro...

2018-07-30 Thread vackosar
GitHub user vackosar opened a pull request: https://github.com/apache/spark/pull/21919 [SPARK-24933][SS] Report numOutputRows in SinkProgress via WrittenCom… ## What changes were proposed in this pull request? SinkProgress should report similar properties like

[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress v...

2018-08-01 Thread vackosar
Github user vackosar commented on the issue: https://github.com/apache/spark/pull/21919 @tdas @zsxwing @jose-torres @jerryshao @arunmahadevan @HyukjinKwon, please help with the review and merge

[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress

2018-08-03 Thread vackosar
Github user vackosar commented on the issue: https://github.com/apache/spark/pull/21919 @jose-torres I removed use of commit to report the row count. Would you have a look? --- - To unsubscribe, e-mail: reviews

[GitHub] spark pull request #21919: [SPARK-24933][SS] Report numOutputRows in SinkPro...

2018-08-04 Thread vackosar
Github user vackosar commented on a diff in the pull request: https://github.com/apache/spark/pull/21919#discussion_r207716505 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/streaming/StreamWriterCommitProgress.java --- @@ -0,0 +1,31

[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress v...

2018-08-02 Thread vackosar
Github user vackosar commented on the issue: https://github.com/apache/spark/pull/21919 @jose-torres thx for good point. The reason for placing this into WriterCommitMessage is to set a standard information that should passed at the commit time. But I agree that row counting

[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress

2018-08-06 Thread vackosar
Github user vackosar commented on the issue: https://github.com/apache/spark/pull/21919 @jose-torres @zsxwing I will exclude SinkProgress constructor from binary compatibility check as this object is constructed internally by Spark. That will remove current MiMa test failure

[GitHub] spark pull request #21919: [SPARK-24933][SS] Report numOutputRows in SinkPro...

2018-08-09 Thread vackosar
Github user vackosar commented on a diff in the pull request: https://github.com/apache/spark/pull/21919#discussion_r208902157 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala --- @@ -198,11 +198,14 @@ class SourceProgress protected[sql

[GitHub] spark pull request #21919: [SPARK-24933][SS] Report numOutputRows in SinkPro...

2018-08-09 Thread vackosar
Github user vackosar commented on a diff in the pull request: https://github.com/apache/spark/pull/21919#discussion_r208902692 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala --- @@ -213,6 +216,12 @@ class SinkProgress protected[sql

[GitHub] spark pull request #21919: [SPARK-24933][SS] Report numOutputRows in SinkPro...

2018-08-09 Thread vackosar
Github user vackosar commented on a diff in the pull request: https://github.com/apache/spark/pull/21919#discussion_r208901343 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala --- @@ -58,6 +61,7 @@ case class

[GitHub] spark pull request #21919: [SPARK-24933][SS] Report numOutputRows in SinkPro...

2018-08-07 Thread vackosar
Github user vackosar commented on a diff in the pull request: https://github.com/apache/spark/pull/21919#discussion_r208107058 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala --- @@ -46,6 +46,9 @@ case class

[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress

2018-08-08 Thread vackosar
Github user vackosar commented on the issue: https://github.com/apache/spark/pull/21919 @jose-torres @cloud-fan do you have any other structure and functionality suggestions for the PR now? Or can I focus on finalizing the work and getting it merged

[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress

2018-08-16 Thread vackosar
Github user vackosar commented on the issue: https://github.com/apache/spark/pull/21919 @cloud-fan happy to merge? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress

2018-08-14 Thread vackosar
Github user vackosar commented on the issue: https://github.com/apache/spark/pull/21919 @jose-torres are you happy to merge? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands

[GitHub] spark pull request #21919: [SPARK-24933][SS] Report numOutputRows in SinkPro...

2018-08-16 Thread vackosar
Github user vackosar commented on a diff in the pull request: https://github.com/apache/spark/pull/21919#discussion_r210708107 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala --- @@ -254,3 +259,10 @@ class SinkProgress protected[sql

[GitHub] spark pull request #21919: [SPARK-24933][SS] Report numOutputRows in SinkPro...

2018-08-13 Thread vackosar
Github user vackosar commented on a diff in the pull request: https://github.com/apache/spark/pull/21919#discussion_r209621895 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala --- @@ -198,11 +198,14 @@ class SourceProgress protected[sql

[GitHub] spark pull request #21919: [SPARK-24933][SS] Report numOutputRows in SinkPro...

2018-08-06 Thread vackosar
Github user vackosar commented on a diff in the pull request: https://github.com/apache/spark/pull/21919#discussion_r207991171 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala --- @@ -179,3 +192,24 @@ class

[GitHub] spark pull request #22143: [SPARK-24647][SS] Report KafkaStreamWriter's writ...

2018-08-20 Thread vackosar
Github user vackosar commented on a diff in the pull request: https://github.com/apache/spark/pull/22143#discussion_r211379697 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaStreamWriter.scala --- @@ -19,18 +19,23 @@ package

[GitHub] spark pull request #22143: [SPARK-24647][SS] Report KafkaStreamWriter's writ...

2018-08-20 Thread vackosar
Github user vackosar commented on a diff in the pull request: https://github.com/apache/spark/pull/22143#discussion_r211379432 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriteTask.scala --- @@ -61,12 +63,30 @@ private[kafka010] class

[GitHub] spark issue #22143: [SPARK-24647][SS] Report KafkaStreamWriter's written min...

2018-08-20 Thread vackosar
Github user vackosar commented on the issue: https://github.com/apache/spark/pull/22143 @arunmahadevan min and max are used there can be other writers to same topic occurring in different job. The messages sent would then become interleaved and one would have to return large number

[GitHub] spark pull request #22143: [SPARK-24647][SS] Report KafkaStreamWriter's writ...

2018-08-20 Thread vackosar
Github user vackosar commented on a diff in the pull request: https://github.com/apache/spark/pull/22143#discussion_r211381706 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaStreamWriter.scala --- @@ -116,3 +133,66 @@ class

[GitHub] spark issue #22143: [SPARK-24647][SS] Report KafkaStreamWriter's written min...

2018-08-19 Thread vackosar
Github user vackosar commented on the issue: https://github.com/apache/spark/pull/22143 @arunmahadevan @jose-torres @cloud-fan you may interested in this one. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #22143: [SPARK-24647][SS] Report KafkaStreamWriter's writ...

2018-08-19 Thread vackosar
GitHub user vackosar opened a pull request: https://github.com/apache/spark/pull/22143 [SPARK-24647][SS] Report KafkaStreamWriter's written min and max offs… …ets via CustomMetrics. ## What changes were proposed in this pull request? Report KafkaStreamWriter's

[GitHub] spark issue #22143: [SPARK-24647][SS] Report KafkaStreamWriter's written min...

2018-10-16 Thread vackosar
Github user vackosar commented on the issue: https://github.com/apache/spark/pull/22143 @cloud-fan are you ok merging the PR? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress

2018-10-16 Thread vackosar
Github user vackosar commented on the issue: https://github.com/apache/spark/pull/21919 @cloud-fan apart from conflicts are you happy to merge? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress

2018-11-05 Thread vackosar
Github user vackosar commented on the issue: https://github.com/apache/spark/pull/21919 @tdas, @gatorsmile and @cloud-fan, just resolved conflicts. Are you happy to merge or any suggestions? --- - To unsubscribe, e

[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress

2018-12-04 Thread vackosar
Github user vackosar commented on the issue: https://github.com/apache/spark/pull/21919 @tdas, @gatorsmile and @cloud-fan, just resolved conflicts. Are you happy to merge or any suggestions? Please respond such that I can either merge or close this PR