[GitHub] spark pull request #20836: SPARK-23685 : Fix for the Spark Structured Stream...

2018-05-17 Thread sirishaSindri
Github user sirishaSindri closed the pull request at:

https://github.com/apache/spark/pull/20836


---




[GitHub] spark pull request #20836: SPARK-23685 : Fix for the Spark Structured Stream...

2018-03-26 Thread sirishaSindri
Github user sirishaSindri commented on a diff in the pull request:

https://github.com/apache/spark/pull/20836#discussion_r177276642
  
--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala ---
@@ -279,9 +279,8 @@ private[kafka010] case class InternalKafkaConsumer(
       if (record.offset > offset) {
         // This may happen when some records aged out but their offsets already got verified
         if (failOnDataLoss) {
-          reportDataLoss(true, s"Cannot fetch records in [$offset, ${record.offset})")
--- End diff --

@gaborgsomogyi Thank you for looking at it. For batch queries, the read will always fail if it cannot read any data from the provided offsets due to lost data (see 
https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html). 
With this change, it won't fail; instead, it will return all the available 
messages within the requested offset range.
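
For context, a minimal sketch of the kind of batch query this refers to; the broker address, topic name, and offset JSON below are placeholders, not values from the PR. Before this change such a read fails when any offset in the requested range has been lost (for example on a compacted topic); with the proposed change it would return whatever records remain in the range.

```scala
// Batch read from Kafka over an explicit per-partition offset range
// (placeholder broker, topic, and offsets).
val df = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")
  .option("subscribe", "compacted-topic")
  .option("startingOffsets", """{"compacted-topic":{"0":10}}""")
  .option("endingOffsets", """{"compacted-topic":{"0":50}}""")
  .load()
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "offset")
```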


---




[GitHub] spark pull request #20836: SPARK-23685 : Fix for the Spark Structured Stream...

2018-03-23 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/20836#discussion_r176656905
  
--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala ---
@@ -279,9 +279,8 @@ private[kafka010] case class InternalKafkaConsumer(
       if (record.offset > offset) {
         // This may happen when some records aged out but their offsets already got verified
         if (failOnDataLoss) {
-          reportDataLoss(true, s"Cannot fetch records in [$offset, ${record.offset})")
--- End diff --

It just breaks the whole concept. When `failOnDataLoss` is enabled, this 
exception means the query should fail.
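
To illustrate the point: the documented way to tolerate lost offsets is to turn the option off on the source, rather than soften the exception inside the consumer. A minimal sketch with placeholder broker and topic names:

```scala
// Opt out of failing on data loss explicitly; the option defaults to "true".
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")
  .option("subscribe", "compacted-topic")
  .option("failOnDataLoss", "false")
  .load()
```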


---




[GitHub] spark pull request #20836: SPARK-23685 : Fix for the Spark Structured Stream...

2018-03-15 Thread sirishaSindri
GitHub user sirishaSindri opened a pull request:

https://github.com/apache/spark/pull/20836

SPARK-23685 : Fix for the Spark Structured Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive Offsets

## What changes were proposed in this pull request?
In `fetchData`, instead of throwing an exception when `failOnDataLoss` is set, return the record if its offset falls within the user-requested offset range. A rough sketch of the intended behavior follows below.

## How was this patch tested?
Manually tested, added a unit test, and ran in a real deployment.
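
The following is a rough, self-contained sketch of the decision described above, not the actual patch; `handleGap` and the stub `reportDataLoss` are hypothetical stand-ins for the logic inside `InternalKafkaConsumer.fetchData`, and the exact range bookkeeping in the real code may differ.

```scala
import org.apache.kafka.clients.consumer.ConsumerRecord

object FetchSketch {
  // Stand-in for the real reportDataLoss helper: throw when asked to fail, otherwise warn.
  private def reportDataLoss(failOnDataLoss: Boolean, message: String): Unit =
    if (failOnDataLoss) throw new IllegalStateException(message)
    else println(s"WARN: $message")

  // Called when `record.offset > offset`, i.e. some records in between aged out.
  // Return the record if it still falls inside the requested range [offset, untilOffset);
  // otherwise report the gap and return null.
  def handleGap(
      record: ConsumerRecord[Array[Byte], Array[Byte]],
      offset: Long,
      untilOffset: Long,
      failOnDataLoss: Boolean): ConsumerRecord[Array[Byte], Array[Byte]] = {
    if (record.offset >= untilOffset) {
      // Nothing usable is left in the requested range.
      reportDataLoss(failOnDataLoss, s"Cannot fetch records in [$offset, $untilOffset)")
      null
    } else {
      // Records in [offset, record.offset) are gone (e.g. compacted away);
      // keep the data that is still there instead of failing the query.
      reportDataLoss(false, s"Skip missing records in [$offset, ${record.offset})")
      record
    }
  }
}
```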


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sirishaSindri/spark SPARK-23685

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20836.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20836


commit 5ccfed840f9cf9cd1c28a309b934e1285332d04d
Author: z001k5c 
Date:   2018-03-15T15:53:14Z

SPARK-23685 : Fix for the Spark Structured Streaming Kafka 0.10 Consumer 
Can't Handle Non-consecutive Offsets

commit 7e08cd9062683c062b7b0408ffe40ff726249909
Author: z001k5c 
Date:   2018-03-15T17:06:06Z

SPARK-23685 : Fix for the Spark Structured Streaming Kafka 0.10 Consumer 
Can't Handle Non-consecutive Offsets




---
