Re: [SS] KafkaSource doesn't use KafkaSourceInitialOffsetWriter for initial offsets?
Hi Devs,

Thanks, all, for a very prompt response! That was insanely quick. Thank you very much! :)

Regards,
Jacek Laskowski

https://about.me/JacekLaskowski
The Internals of Spark SQL https://bit.ly/spark-sql-internals
The Internals of Spark Structured Streaming https://bit.ly/spark-structured-streaming
The Internals of Apache Kafka https://bit.ly/apache-kafka-internals
Follow me at https://twitter.com/jaceklaskowski

On Mon, Aug 26, 2019 at 4:05 PM Jungtaek Lim wrote:
> Thanks! The patch is here: https://github.com/apache/spark/pull/25583
Re: [SS] KafkaSource doesn't use KafkaSourceInitialOffsetWriter for initial offsets?
We were worried about regressions when adding Kafka source v2 because it involved a lot of changes. Hence we copy-pasted the code to keep Kafka source v1 untouched and provided a config to fall back to v1.

On Mon, Aug 26, 2019 at 7:05 AM Jungtaek Lim wrote:
> Thanks! The patch is here: https://github.com/apache/spark/pull/25583

--
Best Regards,
Ryan
Re: [SS] KafkaSource doesn't use KafkaSourceInitialOffsetWriter for initial offsets?
OK, starting with this tomorrow...

On Mon, 26 Aug 2019, 16:05 Jungtaek Lim wrote:
> Thanks! The patch is here: https://github.com/apache/spark/pull/25583
Re: [SS] KafkaSource doesn't use KafkaSourceInitialOffsetWriter for initial offsets?
Thanks! The patch is here: https://github.com/apache/spark/pull/25583

On Mon, Aug 26, 2019 at 11:02 PM Gabor Somogyi wrote:
> Just checked this and it's a copy-paste :) It works properly when
> KafkaSourceInitialOffsetWriter is used. Pull me in if a review is needed.
>
> BR,
> G

--
Name : Jungtaek Lim
Blog : http://medium.com/@heartsavior
Twitter : http://twitter.com/heartsavior
LinkedIn : http://www.linkedin.com/in/heartsavior
Re: [SS] KafkaSource doesn't use KafkaSourceInitialOffsetWriter for initial offsets?
Just checked this and it's a copy-paste :) It works properly when KafkaSourceInitialOffsetWriter is used. Pull me in if a review is needed.

BR,
G

On Mon, Aug 26, 2019 at 3:57 PM Jungtaek Lim wrote:
> Nice finding! I don't see any reason not to use
> KafkaSourceInitialOffsetWriter from KafkaSource, as they're identical. I
> guess it was copied and pasted at some point and never cleaned up.
> As you haven't submitted a patch, I'll submit one shortly and credit
> you. I'd close mine and wait for your patch if you plan to do
> it. Please let me know.
Re: [SS] KafkaSource doesn't use KafkaSourceInitialOffsetWriter for initial offsets?
Nice finding! I don't see any reason not to use KafkaSourceInitialOffsetWriter from KafkaSource, as they're identical. I guess it was copied and pasted at some point and never cleaned up. As you haven't submitted a patch, I'll submit one shortly and credit you. I'd close mine and wait for your patch if you plan to do it. Please let me know.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Mon, Aug 26, 2019 at 8:03 PM Jacek Laskowski wrote:
> Hi,
>
> Just found out that KafkaSource [1] does not
> use KafkaSourceInitialOffsetWriter (of KafkaMicroBatchStream) [2] for
> initial offsets.
>
> Any reason for that? Should I report an issue? Just checking out as I'm
> with 2.4.3 exclusively and have no idea what's coming for 3.0.
>
> [1]
> https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala#L102
>
> [2]
> https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala#L281
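
For readers following along, here is a minimal Python sketch of the reuse being proposed. It is not Spark's Scala code: the class names, the file layout, and the fetch_offsets callback are all hypothetical and chosen only for illustration. It shows a v1-style source and a v2-style micro-batch stream delegating to one shared initial-offset writer instead of each carrying its own identical copy.

```python
import json
import os
import tempfile


class InitialOffsetWriter:
    """Hypothetical shared helper: stores the initial offsets once, then reads them back."""

    VERSION = 1  # assumed version tag, for illustration only

    def __init__(self, metadata_path):
        self.metadata_path = metadata_path

    def get_or_create(self, fetch_offsets):
        # Reuse the stored offsets if an earlier run already wrote them.
        if os.path.exists(self.metadata_path):
            with open(self.metadata_path, "r", encoding="utf-8") as f:
                header, body = f.read().split("\n", 1)
            if header != f"v{self.VERSION}":
                raise ValueError(f"unsupported log version: {header}")
            return json.loads(body)
        # First run: fetch the offsets and persist them so restarts start from the same point.
        offsets = fetch_offsets()
        with open(self.metadata_path, "w", encoding="utf-8") as f:
            f.write(f"v{self.VERSION}\n{json.dumps(offsets, sort_keys=True)}")
        return offsets


class SourceV1:
    """Stand-in for the v1 source: delegates to the shared writer."""

    def __init__(self, metadata_path, fetch_offsets):
        self._writer = InitialOffsetWriter(metadata_path)
        self._fetch = fetch_offsets

    def initial_offsets(self):
        return self._writer.get_or_create(self._fetch)


class MicroBatchStreamV2:
    """Stand-in for the v2 stream: uses the very same writer class."""

    def __init__(self, metadata_path, fetch_offsets):
        self._writer = InitialOffsetWriter(metadata_path)
        self._fetch = fetch_offsets

    def initial_offsets(self):
        return self._writer.get_or_create(self._fetch)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "initial_offsets")
    first = SourceV1(path, lambda: {"topic-0": 5}).initial_offsets()
    # The second reader finds the stored file and never calls its own fetch function.
    second = MicroBatchStreamV2(path, lambda: {"topic-0": 99}).initial_offsets()
    print(first == second)
```

With a single writer class there is only one place where the stored format is defined, so the two code paths cannot drift apart.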
[SS] KafkaSource doesn't use KafkaSourceInitialOffsetWriter for initial offsets?
Hi,

I just found out that KafkaSource [1] does not use KafkaSourceInitialOffsetWriter (of KafkaMicroBatchStream) [2] for initial offsets.

Is there any reason for that? Should I report an issue? I'm just checking, as I work with 2.4.3 exclusively and have no idea what's coming in 3.0.

[1] https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala#L102
[2] https://github.com/apache/spark/blob/master/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala#L281

Regards,
Jacek Laskowski