Re: Spark Streams vs Kafka Streams

2021-04-28 Thread Liam Clarke-Hutchinson
Spark Structured Streaming has some significant limitations compared to Kafka Streams. This one has always proved hard to overcome: "Multiple streaming aggregations (i.e. a chain of aggregations on a streaming DF) are not yet supported on streaming Datasets." On Thu, 29 Apr. 2021, 8:13 am

Re: Spark Streams vs Kafka Streams

2021-04-28 Thread Parthasarathy, Mohan
Matthias, I will create a KIP or ticket for tracking this issue. -thanks Mohan On 4/28/21, 1:01 PM, "Matthias J. Sax" wrote: Feel free to do a KIP and contribute to Kafka! https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals Or create a

Re: Spark Streams vs Kafka Streams

2021-04-28 Thread Mich Talebzadeh
Hi, "I'd assume this is because Kafka Streams is positioned for building streaming applications, rather than doing analytics, whereas Spark is more often used for analytics purposes." Well, that is not necessarily the full picture. Spark can do both analytics and streaming, especially with Spark

Re: Spark Streams vs Kafka Streams

2021-04-28 Thread Andrew Otto
> I am not sure I understand. We have built several analytics applications. We typically use custom aggregations as they are not available directly in the library. Oh for sure! I was answering this question: > Is there any reason why it is not provided as part of the library? And assuming

Re: Spark Streams vs Kafka Streams

2021-04-28 Thread Matthias J. Sax
Feel free to do a KIP and contribute to Kafka! https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals Or create a ticket for tracking. -Matthias On 4/28/21 12:49 PM, Parthasarathy, Mohan wrote: > Andrew, > > I am not sure I understand. We have built several analytics

Re: Spark Streams vs Kafka Streams

2021-04-28 Thread Parthasarathy, Mohan
Andrew, I am not sure I understand. We have built several analytics applications. We typically use custom aggregations as they are not available directly in the library. -mohan On 4/28/21, 12:12 PM, "Andrew Otto" wrote: I'd assume this is because Kafka Streams is positioned for

Re: Spark Streams vs Kafka Streams

2021-04-28 Thread Parthasarathy, Mohan
Matthias, Once a Spark DataFrame is created by reading the data from Kafka (https://sparkbyexamples.com/spark/spark-streaming-with-kafka/), you can use Spark SQL, and all the aggregations shown on this page are valid. I feel that having this built into the Kafka Streams library would make

Re: Spark Streams vs Kafka Streams

2021-04-28 Thread Andrew Otto
I'd assume this is because Kafka Streams is positioned for building streaming applications, rather than doing analytics, whereas Spark is more often used for analytics purposes.

Re: Spark Streams vs Kafka Streams

2021-04-28 Thread Matthias J. Sax
I am not familiar with all the details of Spark; however, the link you shared is for Spark SQL. I thought Spark SQL is for batch processing only? Personally, I would be open to adding more built-in aggregations next to count(). It did not come up in the community so far, so there was no
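
For readers following the thread: the "custom aggregations" under discussion follow an initializer-plus-aggregator fold per key. The Python sketch below is only a conceptual illustration of that pattern, not Kafka Streams or Spark code. The `aggregate` helper and the sample records are hypothetical; in Kafka Streams the corresponding call is `KGroupedStream.aggregate(Initializer, Aggregator)`.

```python
def aggregate(records, initializer, aggregator):
    """Fold (key, value) records into one aggregate per key.

    Hypothetical helper: it mimics the initializer/aggregator shape of a
    keyed streaming aggregation, but runs over an in-memory list.
    """
    state = {}
    for key, value in records:
        if key not in state:
            state[key] = initializer()
        state[key] = aggregator(key, value, state[key])
    return state


# Sample input, invented for illustration only.
records = [("a", 2.0), ("b", 10.0), ("a", 4.0)]

# A per-key average is not a single built-in step: keep (count, sum) as the
# aggregate and derive the average from it afterwards.
sums = aggregate(
    records,
    initializer=lambda: (0, 0.0),
    aggregator=lambda key, value, agg: (agg[0] + 1, agg[1] + value),
)
averages = {key: total / count for key, (count, total) in sums.items()}
print(averages)
```

This is the kind of boilerplate that a built-in avg() or sum() would hide, which appears to be the gap the thread is about.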

Mirror Maker 2: Incoming messages on source and target kafka cluster mismatch after mirroring

2021-04-28 Thread fighter
We migrated from a source Kafka cluster to a target Kafka cluster using MirrorMaker 2.5.1 in distributed mode on a Kafka Connect cluster. We see a noticeable difference in the incoming message rate per second between source and target. Our analysis also shows that on Kafka Connect the producer has

Re: What's so special about 2,8,9,15,56,72 error codes?

2021-04-28 Thread Nikita Kretov
Thank you all for the answers! Israel Ekpo, your clarification is really helpful for me. After studying the protocol documentation closely, I can indeed agree with you about the server-side nature of the errors with codes (8, 9, 15, 56, 72). But do you think the error with code 2 ```CORRUPT_MESSAGE``` is a server side

Spark Streams vs Kafka Streams

2021-04-28 Thread Parthasarathy, Mohan
Hi, Whenever the question of which streaming framework to use for near-realtime analytics comes up, there is normally a discussion about Spark vs Kafka streaming. One of the points in favor of Spark streaming is the simple aggregations that are built-in. See here:

Changing Replication Factor

2021-04-28 Thread Marcus Horsley-Rai
Hi All, I'm in a sub-optimal situation whereby I have some Kafka Streams apps deployed to production, but the default replication factor set on the brokers was 1 when they were first deployed. As such, any state store changelog topics and re-partition topics have RF 1 as well. I'm
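
One common way to raise the replication factor of existing topics is a partition reassignment plan passed to `kafka-reassign-partitions.sh` via `--reassignment-json-file`. The message above is truncated, so whether this matches the poster's setup is an assumption. The Python sketch below only builds the JSON plan; the topic name, partition count, and broker IDs are hypothetical, and the round-robin placement is a simplification.

```python
import json


def build_reassignment(topic, num_partitions, broker_ids, replication_factor):
    """Build a reassignment plan that spreads replicas round-robin.

    Hypothetical helper: a real plan would normally keep each partition's
    current replica first, so the preferred leader does not move.
    """
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor exceeds number of brokers")
    partitions = []
    for partition in range(num_partitions):
        replicas = [
            broker_ids[(partition + offset) % len(broker_ids)]
            for offset in range(replication_factor)
        ]
        partitions.append(
            {"topic": topic, "partition": partition, "replicas": replicas}
        )
    return {"version": 1, "partitions": partitions}


# Assumed values: a 3-partition changelog topic on brokers 1, 2 and 3.
plan = build_reassignment("my-app-store-changelog", 3, [1, 2, 3], 3)
print(json.dumps(plan, indent=2))
```

For Kafka Streams internal topics, the application's `replication.factor` setting would likely also need raising so that newly created topics do not fall back to RF 1.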

Re: What's so special about 2,8,9,15,56,72 error codes?

2021-04-28 Thread Israel Ekpo
https://kafka.apache.org/protocol.html#protocol_error_codes According to the documentation, those numeric codes are special because they are used within the Kafka protocol to indicate problems that are observed at the server. These special numeric codes can be translated by the client into
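
As a quick reference for the six codes in this thread's subject, the Python sketch below maps each to its name. The names are given as I understand them from the protocol error table linked above and should be checked against that page. The `describe` helper and the table name are hypothetical, not part of any Kafka client.

```python
# Error codes from the thread subject, with names as listed in the Kafka
# protocol error table (verify against the linked documentation).
THREAD_ERROR_CODES = {
    2: "CORRUPT_MESSAGE",
    8: "BROKER_NOT_AVAILABLE",
    9: "REPLICA_NOT_AVAILABLE",
    15: "COORDINATOR_NOT_AVAILABLE",
    56: "KAFKA_STORAGE_ERROR",
    72: "LISTENER_NOT_FOUND",
}


def describe(code):
    """Return the name for a code, or a marker if it is not in this table."""
    return THREAD_ERROR_CODES.get(code, "NOT_IN_THIS_TABLE")


for code in sorted(THREAD_ERROR_CODES):
    print(code, describe(code))
```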

Re: Kafka Streams - Out of Order Handling

2021-04-28 Thread Marcus Horsley-Rai
Thanks very much for taking the time to answer, Matthias! Very much appreciated All the best, Marcus On Wed, Apr 7, 2021 at 10:22 PM Matthias J. Sax wrote: > Sorry for late reply... > > > > I only see issues of out of order data in my re-partitioned topic as a > result of a rebalance

Re: Request to be added the contributor list

2021-04-28 Thread Tom Bentley
Hi Wenhao, I added you as a contributor in Jira, so you should now be able to assign issues to yourself etc. We normally discuss development of Kafka on the d...@kafka.apache.org mailing list. Thanks for your interest. Tom On Wed, Apr 28, 2021 at 3:39 PM Wenhao Ji wrote: > Hi everyone, > > I

Re: Standard way to get http POST request into a Kafka topic?

2021-04-28 Thread Ran Lupovich
Btw, I just got a working PoC running in dev using WSO2, Confluent REST Proxy, Confluent Schema Registry, and Kafka: produce a message to Kafka via an HTTP POST REST request. On Wed, 28 Apr 2021, 06:42, Ran Lupovich <ranlupov...@gmail.com> wrote: > Hi, have a look for Rest Proxy component

Request to be added the contributor list

2021-04-28 Thread Wenhao Ji
Hi everyone, I am requesting to be added to the contributors list since I would like to fix the bug KAFKA-7572. Can somebody help with this? Thanks in advance! Wenhao

Re: What's so special about 2,8,9,15,56,72 error codes?

2021-04-28 Thread Men Lim
That article links to the Apache error codes, which tell you their meaning. https://kafka.apache.org/protocol.html#protocol_error_codes On Wed, Apr 28, 2021 at 6:44 AM Nikita Kretov wrote: > I'm doing a little research into which metrics and formulas are used to > calculate SLAs for Kafka clusters. I

What's so special about 2,8,9,15,56,72 error codes?

2021-04-28 Thread Nikita Kretov
I'm doing a little research into which metrics and formulas are used to calculate SLAs for Kafka clusters. I found that some of the major cloud providers offer managed Kafka solutions, for example AWS MSK (Amazon Managed Streaming for Apache Kafka). Interestingly, the AWS MSK SLA document defines

Re: Standard way to get http POST request into a Kafka topic?

2021-04-28 Thread Andrew Otto
Hi Reed, Something will have to produce the POST request body into Kafka. We do this at the Wikimedia Foundation with a service called EventGate. I've got a 3 part blog series in which the 3rd entry