Kafka Stream tuning.

2018-02-12 Thread TSANG, Brilly
Hi kafka users, I created a filtering stream with the Processor API; input topic that have input rate at ~5 records per millisecond. The filtering function on average takes 0.05milliseconds to complete which in ideal case would translate to (1/0.05) 20 records per millisecond. However,

How to manually delete all files under a topic's partitioins

2018-02-12 Thread le....@qtparking.com
Good afternoon Apache Kafka users group。 Excuse me, how to manually delete all partitions a topic message under the file, I know that the first step must be to stop the Kafka service but zookeeper service, there is no need to stop, then the second step is to directly execute the rm command

unable to find custom JMX metrics

2018-02-12 Thread Salah Alkawari
hi, i have a processor that generates custom jmx metrics: public class ProcessorJMX implements Processor { private StreamsMetrics streamsMetrics; private Sensor sensorStartTs; @Override public void init(ProcessorContext processorContext) {

Re: question on serialization ..

2018-02-12 Thread Debasish Ghosh
Regarding “has an according overload” I agree. But some operators like reduce and leftJoin use the serdes implicitly and from the config. So if the developer is not careful enough to have the default serdes correct then it results in runtime error. Also one more confusion on my part is that in

Kafka Transactions with a stretch cluster

2018-02-12 Thread Joe Hammerman
Good evening Apache Kafka users group, I am architecting an Apache Kafka 5 node stretch cluster for 2 datacenters. If we set min.isr to 4 and acks to 4, it would seem that we have a full record of consumer and producer events on at least one node should datacenter 1 get hit by a meteor.

Re: [VOTE] 1.0.1 RC1

2018-02-12 Thread Ewen Cheslack-Postava
Thanks for the heads up, I forgot to drop the old ones, I've done that and rc1 artifacts should be showing up now. -Ewen On Mon, Feb 12, 2018 at 12:57 PM, Ted Yu wrote: > +1 > > Ran test suite which passed. > > BTW it seems the staging repo hasn't been updated yet: > >

Re: Typo in "Processing with local state"

2018-02-12 Thread Matthias J. Sax
If you think it's an mistake, please report at O'Reilly web page and the authors will review there. Thx. http://www.oreilly.com/catalog/errata.csp?isbn=0636920044123 -Matthias On 2/12/18 1:49 PM, Ted Yu wrote: > Hi, > In "Kafka the definitive guide", page 257: > > to calculate the minimum

Re: org.apache.kafka.clients.consumer.OffsetOutOfRangeException

2018-02-12 Thread Cody Koeninger
https://issues.apache.org/jira/browse/SPARK-19680 and https://issues.apache.org/jira/browse/KAFKA-3370 has a good explanation. Verify that it works correctly with auto offset set to latest, to rule out other issues. Then try providing explicit starting offsets reasonably near the beginning of

Confluent Replicator

2018-02-12 Thread Tauzell, Dave
Does anybody have any experience with Confluent Replicator? Has it worked well for you? -Dave This e-mail and any files transmitted with it are confidential, may contain sensitive information, and are intended solely for the use of the individual or entity to whom they are addressed. If

Typo in "Processing with local state"

2018-02-12 Thread Ted Yu
Hi, In "Kafka the definitive guide", page 257: to calculate the minimum and average price ... It seems average should be maximum. Cheers

Re: org.apache.kafka.clients.consumer.OffsetOutOfRangeException

2018-02-12 Thread Matthias J. Sax
I don't know if any workaround. Maybe ask at Spark mailing list? -Matthias On 2/12/18 1:20 PM, Ted Yu wrote: > Have you looked at SPARK-19888 ? > > Please give the full stack trace of the exception you saw. > > Cheers > > On Mon, Feb 12, 2018 at 12:38 PM, Mina Aslani

Re: why kafka index file use memory mapped files ,however log file doesn't

2018-02-12 Thread jan
A human-readable log file is likely to have much less activity in it (it was a year ago I was using kafka and we could eat up gigs for the data files but the log files were a few meg). So there's perhaps little to gain. Also if the power isn't pulled and the OS doesn't crash, log messages will

Re: org.apache.kafka.clients.consumer.OffsetOutOfRangeException

2018-02-12 Thread Ted Yu
Have you looked at SPARK-19888 ? Please give the full stack trace of the exception you saw. Cheers On Mon, Feb 12, 2018 at 12:38 PM, Mina Aslani wrote: > Hi Matthias, > Are you referring to https://issues.apache.org/jira/browse/SPARK-19976? > Doesn't look like that the

Causes of partition leader re election

2018-02-12 Thread Sunil Parmar
Environment : Cloudera Kafka 0.9 Centos 6.6 We notices that every Friday our Kafka partition's leader gets re-elected. This causes temp issue in producers and consumers. Does / can anybody has experience similar behavior and share what can cause partition leader re-election ? This will help us

Re: [VOTE] 1.0.1 RC1

2018-02-12 Thread Ted Yu
+1 Ran test suite which passed. BTW it seems the staging repo hasn't been updated yet: https://repository.apache.org/content/groups/staging/org/apache/kafka/kafka-clients/ On Mon, Feb 12, 2018 at 10:16 AM, Ewen Cheslack-Postava wrote: > And of course I'm +1 since I've

Re: org.apache.kafka.clients.consumer.OffsetOutOfRangeException

2018-02-12 Thread Mina Aslani
Hi Matthias, Are you referring to https://issues.apache.org/jira/browse/SPARK-19976? Look like that the jira was not fixed. (e.g. Resolution: "Not a Problem"). So, is there any suggested workaround? Regards, Mina On Mon, Feb 12, 2018 at 3:03 PM, Matthias J. Sax wrote: >

Re: org.apache.kafka.clients.consumer.OffsetOutOfRangeException

2018-02-12 Thread Mina Aslani
Hi Matthias, Are you referring to https://issues.apache.org/jira/browse/SPARK-19976? Doesn't look like that the jira was not fixed. (e.g. Resolution: "Not a Problem"). So, is there any suggested workaround? Regards, Mina On Mon, Feb 12, 2018 at 3:03 PM, Matthias J. Sax

Re: org.apache.kafka.clients.consumer.OffsetOutOfRangeException

2018-02-12 Thread Matthias J. Sax
AFAIK, Spark does not pass this config to the consumer on purpose... It's not a Kafka issues -- IIRC, there is Spark JIRA ticket for this. -Matthias On 2/12/18 11:04 AM, Mina Aslani wrote: > Hi, > > I am getting below error > Caused by:

org.apache.kafka.clients.consumer.OffsetOutOfRangeException

2018-02-12 Thread Mina Aslani
Hi, I am getting below error Caused by: org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets out of range with no configured reset policy for partitions: {topic1-0=304337} as soon as I submit a spark app to my cluster. I am using below dependency name:

Re: question on serialization ..

2018-02-12 Thread Matthias J. Sax
Each operator that needs to use a Serde, has a an according overload method that allows you to overwrite the Serde. If you don't overwrite it, the operator uses the Serde from the config. > If one gets the default >> serializer wrong then she gets run time errors in serialization / >>

Re: [VOTE] 1.0.1 RC1

2018-02-12 Thread Ewen Cheslack-Postava
And of course I'm +1 since I've already done normal release validation before posting this. -Ewen On Mon, Feb 12, 2018 at 10:15 AM, Ewen Cheslack-Postava wrote: > Hello Kafka users, developers and client-developers, > > This is the second candidate for release of Apache

[VOTE] 1.0.1 RC1

2018-02-12 Thread Ewen Cheslack-Postava
Hello Kafka users, developers and client-developers, This is the second candidate for release of Apache Kafka 1.0.1. This is a bugfix release for the 1.0 branch that was first released with 1.0.0 about 3 months ago. We've fixed 49 significant issues since that release. Most of these are

Re: why kafka index file use memory mapped files ,however log file doesn't

2018-02-12 Thread Vincent Dautremont
Just a guess : wouldn't it be because the log files on disk can be made of compressed data when produced but needs to be uncompressed on consumption (of a single message) ? 2018-02-12 15:50 GMT+01:00 YuFeng Shen : > Hi jan , > > I think the reason is the same as why index file

Re: why kafka index file use memory mapped files ,however log file doesn't

2018-02-12 Thread YuFeng Shen
Hi jan , I think the reason is the same as why index file using memory mapped file. As the memory mapped file can avoid the data copy between user and kernel buffer space, so it can improve the performance for the index file IO operation ,right? If it is ,why Log file cannot achieve the same

Re: Cross-cluster mirror making

2018-02-12 Thread Andrew Otto
Hi Husna, The trouble with cross cluster Kafka use is unpredictable network latency. If a consumer encounters high latency, it will just lag, but (hopefully) eventually it will catch up. If a producer encounters high latency, it will have to buffer messages locally until those messages are ACKed

Re: question on serialization ..

2018-02-12 Thread Debasish Ghosh
Thanks a lot for the clear answer. One of the concerns that I have is that it's not always obvious when the default serializers are used. e.g. it looks like KGroupedStream#reduce also uses the default serializer under the hood. If one gets the default serializer wrong then she gets run time