Re: Avro Serialization & Schema Registry ..

2017-07-19 Thread Michael Noll
In short, Avro serializers/deserializers provided by Confluent always integrate with (and thus require) Confluent Schema Registry. That's why you must set the `schema.registry.url` configuration for them. If you want to use Avro but without a schema registry, you'd need to work with the Avro API
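For reference, the serializers mentioned above are wired up purely through client configuration; a minimal sketch (the registry URL is a placeholder for your own deployment):

```properties
# Producer configuration sketch -- adjust the URL to your Schema Registry instance
key.serializer=io.confluent.kafka.serializers.KafkaAvroSerializer
value.serializer=io.confluent.kafka.serializers.KafkaAvroSerializer
schema.registry.url=http://localhost:8081
```

Without `schema.registry.url`, these serializers fail at startup, which is the behavior described above.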

Re: KTable-KTable Join Semantics on NULL Key

2017-09-14 Thread Michael Noll
Perhaps a clarification to what Damian said: It is shown in the (HTML) table at the link you shared [1] what happens when you get null values for a key. We also have slightly better join documentation at [2], the content/text of which we are currently migrating over to the official Apache Kafka d

Re: left join between PageViews(KStream) & UserProfile (KTable)

2017-10-23 Thread Michael Noll
> *What key should the join be on?* The message key, in both cases, should contain the user ID in String format. > *There seems to be no common key (eg. user) between the 2 classes - PageView and UserProfile* The user ID is the common key, but the user ID is stored in the respective message *keys

Re: Kafka 11 | Stream Application crashed the brokers

2017-12-01 Thread Michael Noll
Thanks for reporting back, Sameer! On Fri, Dec 1, 2017 at 2:46 AM, Guozhang Wang wrote: > Thanks for confirming Sameer. > > > Guozhang > > On Thu, Nov 30, 2017 at 3:52 AM, Sameer Kumar > wrote: > > > Just wanted to let everyone know that this issue got fixed in Kafka > 1.0.0. > > I recently mi

Re: Kafka-streams: mix Processor API with windowed grouping

2018-04-10 Thread Michael Noll
Also, if you want (or can tolerate) probabilistic counting, with the option to also do TopN in that manner, we also have an example that uses Count Min Sketch: https://github.com/confluentinc/kafka-streams-examples/blob/4.0.0-post/src/test/scala/io/confluent/examples/streams/ProbabilisticCountingSc

Re: Question about applicability of Kafka Streams

2016-05-30 Thread Michael Noll
Kim, yes, Kafka Streams is very suitable for this kind of application. The code example that Tobias linked to should be a good starting point for you (thanks, Tobias!). Best, Michael On Fri, May 27, 2016 at 4:35 AM, BYEONG-GI KIM wrote: > Thank you very much for the information! > I'll look

Re: Avro deserialization

2016-05-30 Thread Michael Noll
eems that KafkaAvroDecoder does not implement Deserializer, and > thus can’t be used in the normal way deserializers are registered. > > Has anyone gotten this stuff to work? > > Thanks, > > Rick > > -- Best regards, Michael Noll *Michael G. Noll | Product Manager |

Re: Avro deserialization

2016-05-31 Thread Michael Noll
pecific-avro-consumer/src/main/java/io/confluent/examples/consumer/AvroClicksSessionizer.java > > IMHO it’s confusing having the same class names in different packages when > most people probably rely on an IDE to manage their imports. > > Thanks! > > Rick > > > On May

Re: Best monitoring tool for Kafka in production

2016-06-02 Thread Michael Noll
Hafsa, since you specifically asked about non-free Kafka monitoring options as well: As of version 3.0.0, the Confluent Platform provides a commercial monitoring tool for Kafka called Confluent Control Center. (Disclaimer: I work for Confluent.) Quoting from the product page at http://www.confl

Re: Does the Kafka Streams DSL support non-Kafka sources/sinks?

2016-06-03 Thread Michael Noll
r now. So, as Gwen > > mentioned, Connect would be the way to go to bring the data to a Kafka > > Topic first. > > Got it — thank you! > > -- Best regards, Michael Noll *Michael G. Noll | Product Manager | Confluent | +1 650.453.5860 | Download Apache Kafka and Confluent Platform: www.confluent.io/download <http://www.confluent.io/download>*

Re: Python kafka client benchmarks

2016-06-16 Thread Michael Noll
Thanks a lot for the investigation and sharing the results, John! -Michael On Thu, Jun 16, 2016 at 7:59 AM, Dana Powers wrote: > Very nice! > > On Wed, Jun 15, 2016 at 6:40 PM, John Dennison > wrote: > > My team has published a post comparing python kafka clients. Might be of > > interest to

Re: Groupby Operator

2016-06-16 Thread Michael Noll
Davood, you are reading the input topic into a KTable, which means that subsequent records for the same key (such as the key `1`, which appears twice in the input messages/records) will be considered as updates to any previous records for that key. So I think what you actually want to do is read
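The table-vs-stream distinction boils down to upsert semantics, which can be illustrated with plain Java (hypothetical records; this mimics, rather than uses, the Streams API):

```java
import java.util.*;

public class TableVsStream {
    // KTable-like view: later records for the same key overwrite earlier ones (upserts)
    static Map<Integer, String> asTable(List<Map.Entry<Integer, String>> records) {
        Map<Integer, String> table = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> r : records) table.put(r.getKey(), r.getValue());
        return table;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> records = Arrays.asList(
            new AbstractMap.SimpleEntry<>(1, "alice"),
            new AbstractMap.SimpleEntry<>(2, "bob"),
            new AbstractMap.SimpleEntry<>(1, "alice-updated"));  // same key `1` appears twice
        // The stream view keeps all 3 records; the table view keeps only the latest per key
        System.out.println(asTable(records));  // {1=alice-updated, 2=bob}
    }
}
```

A KStream-style reading would retain all three records, which is why counting over the input behaves differently depending on whether the topic is read as a stream or as a table.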

Re: WordCount example errors

2016-06-16 Thread Michael Noll
Jeyhun, just to confirm for you: Kafka Streams only works with Kafka 0.10 brokers [1]. Best, Michael [1] http://docs.confluent.io/3.0.0/streams/faq.html#can-i-use-kafka-streams-with-kafka-clusters-running-0-9-0-8-or-0-7 On Tue, Jun 14, 2016 at 3:03 PM, Jeyhun Karimov wrote: > Thanks for rep

Re: Kafka without Storm/Spark

2016-07-01 Thread Michael Noll
to do that in Java? > >>To sum up, it should be looking like this:Kafka reads from topic > "kafkaSource" and writes into the topic "kafkaResult". > >> > >>Thanks in advance and Best Regards, Numan > -- Best regards, Michael Noll *Michael G. Noll | Product Manager | Confluent | +1 650.453.5860 | Download Apache Kafka and Confluent Platform: www.confluent.io/download <http://www.confluent.io/download>*

Re: Kafka without Storm/Spark

2016-07-01 Thread Michael Noll
> Because i know that by using Storm, you can guarantee the messages (depending on the type of the Topology) > such as exactly once, at least once. If i simply use kafka consumer and another producer to forward the > messages, could the data transfer completely be guaranteed as well? Addendum: If

Re: documenting Kafka Streams PageView examples

2016-07-06 Thread Michael Noll
em with Confluent Platform 3 producer/consumer shell > scripts but would need guidance as to how to invoke them and how to specify > some input file (or stdin format). > > That would help me better understand how to get serialization and streams > to work together. > > Thanks, &g

Re: documenting Kafka Streams PageView examples

2016-07-06 Thread Michael Noll
Correction: Just realized that I misread your message. You are indeed referring to the code examples in Apache Kafka. ;-) On Wed, Jul 6, 2016 at 11:35 AM, Michael Noll wrote: > Phil, > > I suggest to ask this question in the Confluent Platform mailing list > because you're

Re: Question on partitions while consuming multiple topics

2016-07-06 Thread Michael Noll
Snehal beat me to it, as my suggestion would have also been to take a look at Kafka Streams. :-) Kafka Streams should be the easiest way to achieve what you're describing. Snehal's links are good starting points. Further pointers are: https://github.com/confluentinc/examples/blob/master/kafka-s

Re: Question on partitions while consuming multiple topics

2016-07-06 Thread Michael Noll
/examples/blob/kafka-0.10.0.0-cp-3.0.0/kafka-streams/src/test/java/io/confluent/examples/streams/JoinLambdaIntegrationTest.java -Michael On Wed, Jul 6, 2016 at 11:49 AM, Michael Noll wrote: > Snehal beat me to it, as my suggestion would have also been to take a look > at Kafka S

Re: Kafka Streams : Old Producer

2016-07-08 Thread Michael Noll
t; I guess not since the old producer would not add timestamp. ( I am > > > getting > > > > invalid timestamp exception) > > > > > > > > As I cannot change our producer application setup, I have to use > > 0.9.0.1 > > > > produ

Re: Contribution : KafkaStreams CEP library

2016-07-11 Thread Michael Noll
Thanks for sharing, Florian! -Michael On Fri, Jul 8, 2016 at 6:48 PM, Florian Hussonnois wrote: > Hi All, > > Since a few weeks I'm working for fun on a CEP library on top of > KafkaStreams. > There is some progress and I think my project start to look something, or > at least I hope ;) > > ht

Re: Kafka Consumer for Real-Time Application?

2016-07-12 Thread Michael Noll
To explain what you're seeing: After you have run a consumer application once, it will have stored its latest consumer offsets in Kafka. Upon restart -- e.g. after a brief shutdown time as you mentioned -- the consumer application will continue processing from the point of these stored consumer o

Re: NetFlow metrics to Kafka

2016-07-13 Thread Michael Noll
Mathieu, yes, this is possible. In a past project of mine we have been doing this, though I wasn't directly involved with coding the Cisco-Kafka part. As far as I know there aren't ready-to-use Netflow connectors available (for Kafka Connect), so you most probably have to write your own connecto

Re: KTable DSL join

2016-07-14 Thread Michael Noll
Srikant, > Its not a case for join() as the keys don't match. Its more a lookup table. Yes, the semantics of streaming joins in Kafka Streams are a bit different from joins in traditional RDBMS. See http://docs.confluent.io/3.0.0/streams/developer-guide.html#joining-streams. Also, a heads up: It

Re: Kafka Streams reducebykey and tumbling window - need final windowed KTable values

2016-07-14 Thread Michael Noll
Srikanth, > This would be useful in place where we use a key-value store just to > duplicate a KTable for get() operations. > Any rough idea when this is targeted for release? We are aiming to add the queryable state feature into the next release of Kafka. > Its still not clear how to use this

Re: KStream-to-KStream Join Example

2016-07-14 Thread Michael Noll
Vivek, another option for you is to replace the `map()` calls with `mapValues()`. Can you give that a try? Background: You are calling `map()` on your two input streams, but in neither of the two cases are you actually changing the key (it is always the userId before the map, and it stays the us

Re: Building API to make Kafka reactive

2016-07-22 Thread Michael Noll
Shekar, you mentioned: > The API should give different status at each part of the pipeline. > At the ingestion, the API responds with "submitted" > During the progression, the API returns "in progress" > After successful completion, the API returns "Success" May I ask what your motivation is to

Re: Kafka Streams Latency

2016-07-22 Thread Michael Noll
The SimpleBenchmark included in Apache Kafka [1] is, as Eno mentioned, quite rudimentary. It does some simple benchmark runs of Kafka's standard/native producer and consumer clients, and then some Kafka Streams specific ones. So you can compare somewhat between the native clients and Kafka Stream

Re: How to get the Global IP address of the Producer?

2016-07-28 Thread Michael Noll
One option is to implement the producer applications in such a way that each producer includes its own IP address in the payload of the messages it produces. -Michael On Tue, Jul 26, 2016 at 10:15 AM, Gourab Chowdhury wrote: > Hi, > > I am working on a project that includes Apache Kafka. My co
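A minimal sketch of that approach in plain Java (the JSON field names are made up for illustration; a real producer would likely use `InetAddress.getLocalHost()` rather than the loopback address used here):

```java
import java.net.InetAddress;

public class PayloadWithSenderIp {
    // Build a message payload that carries the producer's own IP address
    static String buildPayload(String data, InetAddress self) {
        return "{\"senderIp\":\"" + self.getHostAddress() + "\",\"data\":\"" + data + "\"}";
    }

    public static void main(String[] args) {
        // Loopback avoids a DNS lookup in this illustration
        String payload = buildPayload("hello", InetAddress.getLoopbackAddress());
        System.out.println(payload);
    }
}
```

The resulting string would then be sent as the message value through a regular Kafka producer.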

Re: Questions about Kafka Streams Partitioning & Deployment

2016-07-29 Thread Michael Noll
Michael, > Guozhang, in (2) above did you mean "some keys *may be* hashed to different > partitions and the existing local state stores will not be valid?" > That fits with our understanding. Yes, that's what Guozhang meant. Corrected version: When you increase the number of input partition

Re: Reactive Kafka performance

2016-08-02 Thread Michael Noll
David, you wrote: > Each task would effectively block on DB-io for every history-set retrieval; > obviously we would use a TTL cache (KTable could be useful here, but it > wouldn’t be able to hold “all” of the history for every user) Can you elaborate a bit why you think a KTable wouldn't be abl

Re: Kafka Streams on Windows?

2016-08-05 Thread Michael Noll
Thanks a lot for investigating and also for sharing back your findings, Mathieu! -Michael On Fri, Aug 5, 2016 at 3:10 PM, Mathieu Fenniak < mathieu.fenn...@replicon.com> wrote: > I took that approach. It was painful, but, ultimately did get me a working > Windows development environment. > > To an

Re: Automated Testing w/ Kafka Streams

2016-08-15 Thread Michael Noll
Mathieu, follow-up question: Are you also doing or considering integration testing by spawning a local Kafka cluster and then reading/writing to that cluster (often called embedded or in-memory cluster)? This approach would be in the middle between ProcessorTopologyTestDriver (that does not spaw

Re: Automated Testing w/ Kafka Streams

2016-08-16 Thread Michael Noll
world" approach between > test > > cases (or test suites), which probably makes test speed much worse, or, > > you > > find another way to isolate test's state. > > > > I'd have to face all these problems at the higher level that I'm calling > >

Re: Automated Testing w/ Kafka Streams

2016-08-16 Thread Michael Noll
ovided a snippet how to pull in the artifact that includes k.u.TestUtils, here's the same snippet for Maven/pom.xml, with dependency scope set to `test`:

    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka_2.11</artifactId>
      <version>0.10.0.0</version>
      <classifier>test</classifier>
      <scope>test</scope>
    </dependency>

On Tue, Aug 16, 2016 at 7:14 PM, Michael Noll wrot

Re: Joining Streams with Kafka Streams

2016-08-26 Thread Michael Noll
First a follow-up question, just in case (sorry if that was obvious to you already): Have you considered using Kafka Streams DSL, which has much more convenient join functionality built-in out of the box? The reason I am asking is that you didn't specifically mention that you did try using the DS

Re: How distributed countByKey works in KStream ?

2016-08-29 Thread Michael Noll
In Kafka Streams, data is partitioned according to the keys of the key-value records, and operations such as countByKey operate on these stream partitions. When reading data from Kafka, these stream partitions map to the partitions of the Kafka input topic(s), but these may change once you add pro
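The routing rule itself is simple; a plain-Java sketch (note: Kafka's producer actually uses murmur2 on the serialized key bytes, not Java's hashCode -- the hash below is only a stand-in for illustration):

```java
public class KeyPartitioner {
    // Stand-in for Kafka's hash-based routing: the same key always lands in the same partition
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int p1 = partitionFor("alice", 4);
        int p2 = partitionFor("alice", 4);
        System.out.println(p1 == p2);  // true: deterministic per key
        // Changing numPartitions changes the key-to-partition mapping -- which is why
        // adding partitions to an input topic can invalidate existing local state stores.
    }
}
```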

Re: kafka-streams project compiles using maven but failed using sbt

2016-08-29 Thread Michael Noll
Most probably because, in your build.sbt, you didn't enable the -Xexperimental compiler flag for Scala. This is required when using Scala 2.11 (as you do) to enable SAM for Java 8 lambda support. Because this compiler flag is not set, your build fails because it cannot translate `_.toUpperCase()` int
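For reference, the build.sbt change being discussed is a one-liner (a sketch; shown for Scala 2.11 to match the setup above):

```scala
// build.sbt -- allow SAM conversion so Scala lambdas satisfy Kafka's single-method Java interfaces
scalacOptions += "-Xexperimental"
```

From Scala 2.12 onwards, SAM conversion is enabled by default and this flag is no longer needed.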

Re: Kafka Streaming Join for range of gps coordinates

2016-08-29 Thread Michael Noll
Quick reply only, since I am on my mobile. Not an exact answer to your problem but still somewhat related: http://www.infolace.com/blog/2016/07/14/simple-spatial-windowing-with-kafka-streams/ (perhaps you have seen this already). -Michael On Sun, Aug 28, 2016 at 4:55 AM, Farhon Zaharia wrote:

Re: How distributed countByKey works in KStream ?

2016-08-31 Thread Michael Noll
nternally: > > >>> > > >>> shuffled across the network : > > >>>> > > >>> > > >>> > > >>>> _internal_topic_shuffle_0 : [ (a, "a"), (a, "a") ] > > >>>> _internal_topic

Re: How distributed countByKey works in KStream ?

2016-09-01 Thread Michael Noll
> > > > > > Guozhang > > > > > > On Wed, Aug 31, 2016 at 4:41 AM, Tommy Q wrote: > > > > > >> I cleaned up all the zookeeper & kafka states and run the > WordCountDemo > > >> again, the results in wc-out is still wrong: > > >> > &

Re: Kafka Streams: joins without windowing (KStream) and without being KTables

2016-09-06 Thread Michael Noll
Also, another upcoming feature is (slightly simplified description follows) "global" state/KTable. Today, a KTable is always partitioned/sharded. One advantage of global state/tables will be for use cases where you need to perform non-key joins, similar to what Guillermo described previously in t

Re: enhancing KStream DSL

2016-09-09 Thread Michael Noll
Ara, you have shared this code snippet:

    allRecords.branch(
        (imsi, callRecord) -> "VOICE".equalsIgnoreCase(callRecord.getCallCommType()),
        (imsi, callRecord) -> "DATA".equalsIgnoreCase(callRecord.getCallCommType()),
        (imsi, callRecord) -> true
    );

T

Re: enhancing KStream DSL

2016-09-09 Thread Michael Noll
ecords = branches[1]; // No third branched stream like before. Then, to count "everything" (VOICE + DATA + everything else), simply reuse the original `allRecords` stream. On Fri, Sep 9, 2016 at 2:23 PM, Michael Noll wrote: > Ara, > > you have shared this code snippet

Re: what's the relationship between Zookeeper and Kafka ?

2016-09-13 Thread Michael Noll
Eric, the latest versions of Kafka use ZooKeeper only on the side of the Kafka brokers, i.e. the servers in a Kafka cluster. Background: In older versions of Kafka, the Kafka consumer API required client applications (that would read data from Kafka) to also talk to ZK. Why would they need to do

Re: micro-batching in kafka streams

2016-09-26 Thread Michael Noll
Ara, may I ask why you need to use micro-batching in the first place? Reason why I am asking: Typically, when people talk about micro-batching, they are referring to the way some originally batch-based stream processing tools "bolt on" real-time processing by making their batch sizes really small. H

Re: Kafka Streams dynamic partitioning

2016-10-05 Thread Michael Noll
> So, in this case I should know the max number of possible keys so that > I can create that number of partitions. Assuming I understand your original question correctly, then you would not need to do/know this. Rather, pick the number of partitions in a way that matches your needs to process the

Re: Kafka Streams dynamic partitioning

2016-10-06 Thread Michael Noll
titions (say, 50) so you have some wiggle > > >> room to add more machines (11...50) later if need be. > > >> > > >> If you create e.g 30 partitions, but only have e.g 5 instances of > > >> your program, all on the same consumer group, all using kaf

Re: Printing to stdin from KStreams?

2016-10-07 Thread Michael Noll
Ali, adding to what Matthias said: Kafka 0.10 changed the message format to add so-called "embedded timestamps" into each Kafka message. The Java producer included in Kafka 0.10 includes such embedded timestamps into any generated message as expected. However, other clients (like the go kafka p
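If your producers cannot be changed to supply valid embedded timestamps, a common workaround is to make the Streams app fall back to wall-clock time via configuration; a sketch (key and class name per the 0.10.x Streams API):

```properties
# Kafka Streams configuration sketch: use processing time instead of embedded message timestamps
timestamp.extractor=org.apache.kafka.streams.processor.WallclockTimestampExtractor
```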

Re: kafka stream to new topic based on message key

2016-10-07 Thread Michael Noll
Gary, adding to what Guozhang said: Yes, you can programmatically create a new Kafka topic from within your application. But how you'd do that varies a bit between current Kafka versions and the upcoming 0.10.1 release. As of today (Kafka versions before the upcoming 0.10.1 release), you would

Re: kafka stream to new topic based on message key

2016-10-07 Thread Michael Noll
Great, happy to hear that, Gary! On Fri, Oct 7, 2016 at 3:30 PM, Gary Ogden wrote: > Thanks for all the help gents. I really appreciate it. It's exactly what I > needed. > > On 7 October 2016 at 06:56, Michael Noll wrote: > > > Gary, > > > > addin

Re: Printing to stdin from KStreams?

2016-10-07 Thread Michael Noll
7, 2016 at 1:36 PM, Ali Akhtar wrote: > Is it possible to have kafka-streams-reset be automatically called during > development? Something like streams.cleanUp() but which also does reset? > > On Fri, Oct 7, 2016 at 2:45 PM, Michael Noll wrote: > > > Ali, > > > >

Re: Printing to stdin from KStreams?

2016-10-07 Thread Michael Noll
me application id in all applications in > development (transparently reading the application id from environment > variables, so in my kubernetes config I can specify different app ids for > production) > > On Fri, Oct 7, 2016 at 8:05 PM, Michael Noll wrote: > > > > I

Re: punctuate() never called

2016-10-07 Thread Michael Noll
David, punctuate() is still data-driven at this point, even when you're using the WallClock timestamp extractor. To use an example: Imagine you have configured punctuate() to be run every 5 seconds. If there's no data being received for a minute, then punctuate won't be called -- even though you
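The data-driven behavior can be simulated in a few lines of plain Java (hypothetical timestamps; this mimics, rather than reuses, the Streams scheduling logic):

```java
public class DataDrivenPunctuate {
    // Count how often a punctuate scheduled every `intervalMs` would fire,
    // given that "stream time" only advances when records actually arrive.
    static int punctuationCount(long[] recordTimestamps, long intervalMs) {
        int fired = 0;
        long nextPunctuate = intervalMs;
        for (long ts : recordTimestamps) {
            while (ts >= nextPunctuate) {  // stream time crossed one or more boundaries
                fired++;
                nextPunctuate += intervalMs;
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        // Records at 0s, 1s, 2s with a 5s punctuate interval: no record ever pushes
        // stream time past the 5s boundary, so punctuate never fires -- no matter
        // how much wall-clock time passes in between.
        System.out.println(punctuationCount(new long[]{0, 1000, 2000}, 5000));  // 0
        System.out.println(punctuationCount(new long[]{0, 6000, 12000}, 5000)); // 2
    }
}
```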

Re: In Kafka Streaming, Serdes should use Optionals

2016-10-07 Thread Michael Noll
Ali, the Apache Kafka project still targets Java 7, which means we can't use Java 8 features just yet. FYI: There's on ongoing conversation about when Kafka would move from Java 7 to Java 8. On Fri, Oct 7, 2016 at 6:14 PM, Ali Akhtar wrote: > Since we're using Java 8 in most cases anyway, Serde

Re: Kafka null keys - OK or a problem?

2016-10-09 Thread Michael Noll
FYI: For Kafka's new Java producer (which ships with Kafka), the behavior is as follows: If no partition is explicitly specified (to send the message to) AND the key is null, then the DefaultPartitioner [1] will assign messages to topic partitions in a round-robin fashion. See the javadoc and also the
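The null-key path can be sketched in plain Java (a simplified stand-in for the DefaultPartitioner, ignoring its per-topic counters and available-partition handling):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinSketch {
    private final AtomicInteger counter = new AtomicInteger(0);

    // With a null key, spread messages over partitions round-robin
    int partitionFor(byte[] keyBytes, int numPartitions) {
        if (keyBytes == null) {
            return (counter.getAndIncrement() & 0x7fffffff) % numPartitions;
        }
        // Keyed messages would be hashed instead (Kafka uses murmur2 here)
        return (java.util.Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        RoundRobinSketch p = new RoundRobinSketch();
        // Null keys cycle through partitions 0, 1, 2, 0, ...
        for (int i = 0; i < 4; i++) System.out.print(p.partitionFor(null, 3) + " ");  // 0 1 2 0
    }
}
```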

Re: punctuate() never called

2016-10-09 Thread Michael Noll
received) for over 30 mins…with a 60-second timer. So, do we > need to just rebuild our cluster with bigger machines? > > -David > > On 10/7/16, 11:18 AM, "Michael Noll" wrote: > > David, > > punctuate() is still data-driven at this point, even when yo

Re: Kafka null keys - OK or a problem?

2016-10-10 Thread Michael Noll
/sarama#Partitioner [2] https://github.com/Shopify/sarama/blob/master/partitioner.go On Mon, Oct 10, 2016 at 8:51 AM, Ali Akhtar wrote: > Hey Michael, > > We're using this one: https://github.com/Shopify/sarama > > Any ideas how that one works? > > On Mon, Oct 10, 2016

Re: Convert a KStream to KTable

2016-10-10 Thread Michael Noll
Elias, yes, that is correct. I also want to explain why: One can always convert a KTable to a KStream (note: this is the opposite direction of what you want to do) because one only needs to iterate through the table to generate the stream. To convert a KStream into a KTable (what you want to do

Re: sFlow/NetFlow/Pcap Plugin for Kafka Producer

2016-10-10 Thread Michael Noll
Aris, even today you can already use Kafka to deliver Netflow/Pcap/etc. messages, and people are already using it for that (I did that in previous projects of mine, too). Simply encode your Pcap/... messages appropriately (I'd recommend to take a look at Avro, which allows you to structure your d

Re: sFlow/NetFlow/Pcap Plugin for Kafka Producer

2016-10-10 Thread Michael Noll
kafka ecosystem page. > > https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem > > > Best Regards, > Aris. > > > On Mon, Oct 10, 2016 at 6:55 PM, Michael Noll > wrote: > > > Aris, > > > > even today you can already use Kafka to deliver Netflow/Pcap/et

Re: KafkaStream Merging two topics is not working for custom datatypes

2016-10-11 Thread Michael Noll
Ratha, if you based your problematic code on the PipeDemo example, then you should have these two lines in your code (which most probably you haven't changed): props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass()); props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Se

Re: KafkaStream Merging two topics is not working for custom datatypes

2016-10-11 Thread Michael Noll
When I wrote: "If you haven't changed to default key and value serdes, then `to()` will fail because [...]" it should have read: "If you haven't changed the default key and value serdes, then `to()` will fail because [...]" On Tue, Oct 11, 2016 at

Re: punctuate() never called

2016-10-11 Thread Michael Noll
re empty > topics. Punctuate will never be called. > > -David ” > > On 10/10/16, 1:55 AM, "Michael Noll" wrote: > > > We have run the application (and have confirmed data is being > received) > for over 30 mins…with a 60-second timer. > >

Re: Support for Kafka

2016-10-11 Thread Michael Noll
Regarding the JVM, we recommend running the latest version of JDK 1.8 with the G1 garbage collector: http://docs.confluent.io/current/kafka/deployment.html#jvm And yes, Kafka does run on Ubuntu 16.04, too. (Confluent provides .deb packages [1] for Apache Kafka if you are looking for these to inst
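For reference, these recommendations are typically applied through the environment variables that Kafka's start scripts read; a sketch (the heap size is an example value, not a sizing recommendation):

```shell
# Environment variables picked up by kafka-run-class.sh / kafka-server-start.sh
export KAFKA_HEAP_OPTS="-Xms4g -Xmx4g"
export KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseG1GC -XX:MaxGCPauseMillis=20"
```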

Re: Support for Kafka

2016-10-11 Thread Michael Noll
Actually, I wanted to include the following link for the JVM docs (the information matches what's written in the earlier link I shared): http://kafka.apache.org/documentation#java On Tue, Oct 11, 2016 at 11:21 AM, Michael Noll wrote: > Regarding the JVM, we recommend running the latest

Re: KafkaStream Merging two topics is not working for custom datatypes

2016-10-12 Thread Michael Noll
t; > >> } > >> > >> > >> *Serilizer/Deserializer* > >> > >> > >> > >> public class KafkaPayloadSerializer implements Serializer, > >> Deserializer { > >> > >> private static final Logger log = org.apache.logging.log4j.LogManager > >

Re: Understanding out of order message processing w/ Streaming

2016-10-13 Thread Michael Noll
> But if they arrive out of order, I have to detect / process that myself in > the processor logic. Yes -- if your processing logic depends on the specific ordering of messages (which is the case for you), then you must manually implement this ordering-specific logic at the moment. Other use case

Re: How to detect an old consumer

2016-10-17 Thread Michael Noll
Old consumers use ZK to store their offsets. Could you leverage the timestamps of the corresponding znodes [1] for this? [1] https://zookeeper.apache.org/doc/r3.4.5/zookeeperProgrammers.html#sc_zkDataModel_znodes On Mon, Oct 17, 2016 at 4:45 PM, Fernando Bugni wrote: > Hello, > > I want to de

Re: kafka streams metadata request fails on 0.10.0.1 broker/topic from 0.10.1.0 client

2016-10-19 Thread Michael Noll
Apps built with Kafka Streams 0.10.1 only work against Kafka clusters running 0.10.1+. This explains your error message above. Unfortunately, Kafka's current upgrade story means you need to upgrade your cluster in this situation. Moving forward, we're planning to improve the upgrade/compatibilit

Re: kafka streams metadata request fails on 0.10.0.1 broker/topic from 0.10.1.0 client

2016-10-20 Thread Michael Noll
t; > Just my 2 cents > > Decoupling the kafka streams from the core kafka changes will help so that > the broker can be upgraded without notice and streaming apps can evolve to > newer streaming features on their own pace > > Regards > Sai > > > On Wednesday, October 19, 2

Re: How to block tests of Kafka Streams until messages processed?

2016-10-20 Thread Michael Noll
Ali, my main feedback is similar to what Eno and Dave have already said. In your situation, options like these are what you'd currently need to do since you are writing directly from your Kafka Stream app to Cassandra, rather than writing from your app to Kafka and then using Kafka Connect to ing

Re: How to block tests of Kafka Streams until messages processed?

2016-10-20 Thread Michael Noll
dvantage to using the kafka connect method? Seems like > it'd just add an extra step of overhead? > > On Thu, Oct 20, 2016 at 12:35 PM, Michael Noll > wrote: > > > Ali, > > > > my main feedback is similar to what Eno and Dave have already said. In > >

Re: Dismissing late messages in Kafka Streams

2016-10-20 Thread Michael Noll
Nicolas, > I set the maintain duration of the window to 30 days. > If it consumes a message older than 30 days, then a new aggregate is created for this old window. I assume you mean: If a message should have been included in the original ("old") window but that message happens to arrive late (a
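The behavior described can be reduced to a simple rule: a late update is applied as long as its window is still within the maintain/retention period. A plain-Java sketch (simplified; in Kafka Streams, retention is measured against the observed stream time, not wall-clock time):

```java
public class WindowRetention {
    // Accept a late record if its window has not yet been dropped from the store
    static boolean isAccepted(long recordTs, long observedStreamTime, long maintainMs) {
        return recordTs > observedStreamTime - maintainMs;
    }

    public static void main(String[] args) {
        long DAY = 24L * 60 * 60 * 1000;
        long now = 100 * DAY;
        System.out.println(isAccepted(now - 10 * DAY, now, 30 * DAY)); // true: 10 days late, window still kept
        System.out.println(isAccepted(now - 40 * DAY, now, 30 * DAY)); // false: older than 30 days, window dropped
    }
}
```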

Re: Kafka Streaming

2016-10-20 Thread Michael Noll
I suspect you are running Kafka 0.10.0.x on Windows? If so, this is a known issue that is fixed in Kafka 0.10.1 that was just released today. Also: which examples are you referring to? And, to confirm: which git branch / Kafka version / OS in case my guess above was wrong. On Thursday, October

Re: [VOTE] KIP-657: Add Customized Kafka Streams Logo

2020-08-19 Thread Michael Noll
Hi all! Great to see we are in the process of creating a cool logo for Kafka Streams. First, I apologize for sharing feedback so late -- I just learned about it today. :-) Here's my *personal, subjective* opinion on the currently two logo candidates for Kafka Streams. TL;DR: Sorry, but I really

Re: [VOTE] KIP-657: Add Customized Kafka Streams Logo

2020-08-19 Thread Michael Noll
: users@kafka.apache.org > > Cc: d...@kafka.apache.org > > Subject: Re: [VOTE] KIP-657: Add Customized Kafka Streams Logo > > > > I echo what Michael says here. > > > > Another consideration is that logos are often shrunk (when used on > slides) > > an

Re: Is it possible to resubcribe KafkaStreams in runtime to different set of topics?

2016-11-09 Thread Michael Noll
This is not possible at the moment. However, depending on your use case, you might be able to leverage regex topic subscriptions (think: "b*" to read from all topics starting with letter `b`). On Wed, Nov 9, 2016 at 10:56 AM, Timur Yusupov wrote: > Hello, > > In our system it is possible to add
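The matching itself is ordinary regex over topic names; a plain-Java sketch of which topics a pattern like `b.*` would pick up (the topic names here are made up):

```java
import java.util.*;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class TopicPatternSketch {
    // Return the subset of existing topics a regex subscription would match
    static List<String> matchingTopics(Collection<String> topics, String regex) {
        Pattern p = Pattern.compile(regex);
        return topics.stream().filter(t -> p.matcher(t).matches()).sorted().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> topics = Arrays.asList("billing", "backups", "audit");
        System.out.println(matchingTopics(topics, "b.*"));  // [backups, billing]
    }
}
```

Note that with a real regex subscription, matching topics created later are picked up automatically on the next metadata refresh.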

Re: Is it possible to resubcribe KafkaStreams in runtime to different set of topics?

2016-11-09 Thread Michael Noll
I am not aware of any short-term plans to support that, but perhaps others in the community / mailing list are. On Wed, Nov 9, 2016 at 11:15 AM, Timur Yusupov wrote: > Are there any nearest plans to support that? > > On Wed, Nov 9, 2016 at 1:11 PM, Michael Noll wrote: > >

Re: Process KTable on Predicate

2016-11-15 Thread Michael Noll
Nick, if I understand you correctly you can already do this today: Think: KTable.toStream().filter().foreach() (or just KTable.filter().foreach(), depending on what you are aiming to do) Would that work for you? On Sun, Nov 13, 2016 at 12:12 AM, Nick DeCoursin wrote: > Feature proposal: > >

Re: Kafka Streams internal topic naming

2016-11-16 Thread Michael Noll
Srikanth, no, there isn't any API to control the naming of internal topics. Is the reason you're asking for such functionality only/mostly about multi-tenancy issues (as you mentioned in your first message)? -Michael On Wed, Nov 16, 2016 at 8:20 PM, Srikanth wrote: > Hello, > > Does kafka

Re: Kafka Streams internal topic naming

2016-11-18 Thread Michael Noll
> > > > Srikanth > > > > On Wed, Nov 16, 2016 at 2:32 PM, Michael Noll > wrote: > > > >> Srikanth, > >> > >> no, there's isn't any API to control the naming of internal topics. > >> > >> Is the reason you'

Re: Kafka windowed table not aggregating correctly

2016-11-21 Thread Michael Noll
On Mon, Nov 21, 2016 at 1:06 PM, Sachin Mittal wrote: > I am using kafka_2.10-0.10.0.1. > Say I am having a window of 60 minutes advanced by 15 minutes. > If the stream app using timestamp extractor puts the message in one or more > bucket(s), it will get aggregated in those buckets. > I assume t
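For a 60-minute window advanced by 15 minutes, every record falls into size/advance = 4 overlapping windows; a plain-Java sketch of the bucket arithmetic (mirroring, not reusing, the Streams hopping-window logic):

```java
import java.util.ArrayList;
import java.util.List;

public class HoppingWindows {
    // Start times of all hopping windows [start, start+sizeMs) containing `ts`
    static List<Long> windowStartsFor(long ts, long sizeMs, long advanceMs) {
        List<Long> starts = new ArrayList<>();
        long first = Math.max(0, ts - sizeMs + advanceMs);   // earliest window that can still contain ts
        first = first - (first % advanceMs);                 // align to the advance grid
        for (long start = first; start <= ts; start += advanceMs) starts.add(start);
        return starts;
    }

    public static void main(String[] args) {
        long MIN = 60_000L;
        // A record at minute 70 lands in the windows starting at minutes 15, 30, 45, 60
        System.out.println(windowStartsFor(70 * MIN, 60 * MIN, 15 * MIN));
    }
}
```

Each of those buckets then gets the record added to its aggregate, which matches the "one or more bucket(s)" behavior quoted above.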

Re: Kafka Streaming message loss

2016-11-21 Thread Michael Noll
Please don't take this comment the wrong way, but have you double-checked whether your counting code is working correctly? (I'm not implying this could be the only reason for what you're observing.) -Michael On Fri, Nov 18, 2016 at 4:52 PM, Eno Thereska wrote: > Hi Ryan, > > Perhaps you could

Re: Kafka Streaming message loss

2016-11-21 Thread Michael Noll
Also: Since your testing is purely local, feel free to share the code you have been using so that we can try to reproduce what you're observing. -Michael On Mon, Nov 21, 2016 at 4:04 PM, Michael Noll wrote: > Please don't take this comment the wrong way, but have you d

Re: KafkaStreams KTable#through not creating changelog topic

2016-11-23 Thread Michael Noll
Mikael, regarding your second question:

> 2) Regarding the use case, the topology looks like this:
>
>     .stream(...)
>     .aggregate(..., "store-1")
>     .mapValues(...)
>     .through(..., "store-2")

The last operator above would, without "..." ellipsis, be something like `KTable#through("through-topic", "store

Re: KafkaStreams KTable#through not creating changelog topic

2016-11-23 Thread Michael Noll
e > created by `through()` and `to()` [...] Addendum: And that's because the topic that is created by `KTable#through()` and `KTable#to()` is, by definition, a changelog of that KTable and the associated state store. On Wed, Nov 23, 2016 at 4:15 PM, Michael Noll wrote: > Mikael,

Re: KafkaStreams KTable#through not creating changelog topic

2016-11-25 Thread Michael Noll
> > > works > > > > perfectly fine with with a naming convention for the topics and by > > > creating > > > > them in Kafka upfront. > > > > > > > > My point is that it would help me (and maybe others), if the API of >

Re: Data (re)processing with Kafka (new wiki page)

2016-11-25 Thread Michael Noll
Thanks a lot, Matthias! I have already begun to provide feedback. -Michael On Wed, Nov 23, 2016 at 11:41 PM, Matthias J. Sax wrote: > Hi, > > we added a new wiki page that is supposed to collect data (re)processing > scenario with Kafka: > > https://cwiki.apache.org/confluence/display/KAFKA/

Re: Interactive Queries

2016-11-28 Thread Michael Noll
There are also some examples/demo applications at https://github.com/confluentinc/examples that demonstrate the use of interactive queries: - https://github.com/confluentinc/examples/blob/3.1.x/kafka-streams/src/main/java/io/confluent/examples/streams/interactivequeries/kafkamusic/KafkaMusicExampl
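The routing idea behind the linked interactive-queries demos can be sketched in plain Java. This is a toy model, not the Kafka Streams API (class and method names here are hypothetical): each application instance holds part of the keyspace in a local store, and metadata tells a caller which instance hosts a given key, so a query is either answered locally or redirected.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Toy model of the interactive-queries pattern: local store + host metadata.
// NOT the Kafka Streams API -- an illustration of the query-routing idea only.
public class InteractiveQuerySketch {
    final Map<String, Long> localStore = new HashMap<>(); // this instance's shard
    final Map<String, String> keyToHost;                  // metadata: key -> hosting instance
    final String self;                                    // this instance's identity

    InteractiveQuerySketch(String self, Map<String, String> keyToHost) {
        this.self = self;
        this.keyToHost = keyToHost;
    }

    // Answers locally if this instance hosts the key; otherwise the caller
    // would re-issue the query against the host named in the metadata.
    Optional<Long> query(String key) {
        if (self.equals(keyToHost.get(key))) {
            return Optional.ofNullable(localStore.get(key));
        }
        System.out.println("redirect to " + keyToHost.get(key));
        return Optional.empty();
    }
}
```

In the real demos the metadata lookup is done via the Streams metadata API and the redirect is an HTTP call to the other instance.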

Re: I need some help with the production server architecture

2016-12-01 Thread Michael Noll
+1 to what Dave said. On Thu, Dec 1, 2016 at 4:29 PM, Tauzell, Dave wrote: > For low volume zookeeper doesn't seem to use many resources. I would put > it on nodejs server as that will have less IO and heavy IO could impact > zookeeper. Or, you could put some ZK nodes on nodejs and some on

Re: Kafka Logo as HighRes or Vectorgraphics

2016-12-02 Thread Michael Noll
Jan, Here's vector files for the logo. One of our teammates went ahead and helped resized it to fit nicely into a 2x4m board with 15cm of margin all around. Note: I was told to kindly remind you (and other readers of this) to follow the Apache branding guidelines for the logo, and please not mani

Re: kafka streams consumer partition assignment is uneven

2017-01-09 Thread Michael Noll
What does the processing topology of your Kafka Streams application look like, and what's the exact topic and partition configuration? You say you have 12 partitions in your cluster, presumably across 7 topics -- that means that most topics have just a single partition. Depending on your topology
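The partition-to-task relationship mentioned here can be illustrated with a tiny helper (plain Java, hypothetical names, no Kafka dependency): Kafka Streams sizes each subtopology's parallelism by the maximum partition count among its input topics, so seven mostly single-partition topics yield very few tasks and can look like an uneven assignment across instances.

```java
import java.util.Collections;
import java.util.Map;

// Illustrates how Kafka Streams derives task counts from input partitions.
// Plain Java sketch, not taken from the Kafka codebase.
public class TaskCountSketch {
    // partitionCounts: input topic -> number of partitions, for one subtopology.
    // The subtopology gets one task per partition of its widest input topic.
    static int tasksForSubtopology(Map<String, Integer> partitionCounts) {
        return Collections.max(partitionCounts.values());
    }

    public static void main(String[] args) {
        // A subtopology reading topics with 1 and 2 partitions -> 2 tasks total,
        // so at most 2 of your instances/threads will get work for it.
        System.out.println(tasksForSubtopology(Map.of("clicks", 1, "views", 2)));
    }
}
```

With single-partition topics the maximum is 1, which is why most of the consumers in such a setup sit idle.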

Re: [ANNOUNCE] New committer: Grant Henke

2017-01-16 Thread Michael Noll
My congratulations, Grant -- more work's awaiting you then. ;-) Best wishes, Michael On Fri, Jan 13, 2017 at 2:50 PM, Jeff Holoman wrote: > Well done Grant! Congrats! > > On Thu, Jan 12, 2017 at 1:13 PM, Joel Koshy wrote: > > > Hey Grant - congrats! > > > > On Thu, Jan 12, 2017 at 10:00 AM,

Re: Kafka Streams: from a KStream, aggregating records with the same key and updated metrics ?

2017-01-16 Thread Michael Noll
Nicolas, quick feedback on timestamps: > In our system, clients send data to an HTTP API. This API produces the > records in Kafka. I can't rely on the clock of the clients sending the > original data, (so the records' timestamps are set by the servers ingesting > the records in Kafka), but I can
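One common policy for the clock problem described here can be sketched as follows (plain Java; an assumption for illustration, not necessarily the exact scheme discussed in the thread): trust the client's event time only when it is within a skew tolerance of the server-side ingestion time, and otherwise fall back to the server clock that you control.

```java
// Sketch of a timestamp-resolution policy for records ingested via an HTTP API.
// Hypothetical helper, not part of Kafka; in Kafka Streams the equivalent logic
// would live in a custom TimestampExtractor or on the producing server.
public class TimestampPolicy {
    // clientMs: event time from the (untrusted) client clock.
    // serverMs: ingestion time stamped by the server when producing to Kafka.
    static long resolveTimestamp(long clientMs, long serverMs, long maxSkewMs) {
        // Accept the client's event time only if it is plausibly close to
        // the server's clock; otherwise use the server's ingestion time.
        return Math.abs(serverMs - clientMs) <= maxSkewMs ? clientMs : serverMs;
    }

    public static void main(String[] args) {
        System.out.println(resolveTimestamp(1_000, 1_200, 500)); // prints 1000 (within skew)
        System.out.println(resolveTimestamp(1_000, 9_000, 500)); // prints 9000 (fallback)
    }
}
```

Doing this once on the producing server, as suggested later in the thread, avoids recomputing it in every consumer.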

Re: Kafka Streams: from a KStream, aggregating records with the same key and updated metrics ?

2017-01-17 Thread Michael Noll
1:00 Nicolas Fouché : > Hi Michael, got it. I understand that it would be less error-prone to generate the final "altered" timestamp on the Producer side, instead of trying to compute it each time the record is consumed.

Re: Kafka Streams: how can I get the name of the Processor when calling `KStream.process`

2017-01-18 Thread Michael Noll
Nicolas, if I understand your question correctly you'd like to add further operations after having called `KStream#process()`, which -- as you report -- doesn't work because `process()` returns void. If that's indeed the case, +1 to Damian's suggestion to use `KStream.transform()` instead of `KStrea
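The `process()` vs `transform()` distinction can be mimicked in plain Java (a toy model with hypothetical names, not the Kafka Streams API): a void, terminal operation ends the pipeline, while an operation that returns a value can be followed by further steps.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.stream.Collectors;

// Toy model of why `transform()` is chainable while `process()` is terminal.
// Plain Java illustration only; not the Kafka Streams API.
public class TransformVsProcess {
    // Models KStream#process(): returns void, so nothing can be chained after it.
    static <T> void process(List<T> stream, Consumer<T> action) {
        stream.forEach(action);
    }

    // Models KStream#transform(): returns a new "stream" you can keep operating on.
    static <T, R> List<R> transform(List<T> stream, Function<T, R> fn) {
        return stream.stream().map(fn).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // transform() hands back a result, so the pipeline can continue...
        List<Integer> lengths = transform(List.of("kafka", "streams"), String::length);
        System.out.println(lengths);            // prints [5, 7]
        // ...whereas process() is a dead end used only for side effects.
        process(lengths, System.out::println);
    }
}
```

In Kafka Streams, `transform()` additionally gives you access to the `ProcessorContext`, so it covers the same use cases as `process()` while remaining part of the DSL chain.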

Re: Kafka Streams: how can I get the name of the Processor when calling `KStream.process`

2017-01-18 Thread Michael Noll
essor was generated by a high-level topology. And names of processors created by `KStreamBuilder` are not accessible (unless by inspecting the topology nodes, I guess). [1] https://gist.github.com/nfo/c4936a24601352db23b18653a8ccc352 Thanks. Nicolas

Re: Streams: Global state & topic multiplication questions

2017-01-20 Thread Michael Noll
As Eno said I'd use the interactive queries API for Q2. Demo apps: - https://github.com/confluentinc/examples/blob/3.1.x/kafka-streams/src/main/java/io/confluent/examples/streams/interactivequeries/kafkamusic/KafkaMusicExample.java - https://github.com/confluentinc/examples/blob/3.1.x/kafka-stream

Re: Fwd: [DISCUSS] KIP-114: KTable materialization and improved semantics

2017-01-27 Thread Michael Noll
t;>>>> afterwards >>>>> >>>>>> but we have already decided to materialize it, we can replace the >>>>>>>> >>>>>>> internal >>>>>>> >>>>>>>> name with the user's provided
