Re: KSQL with Apache Kafka

2017-09-19 Thread Koert Kuipers
we are using the other components of confluent platform without installing the confluent platform, and it's no problem at all. i don't see why it would be any different with this one. On Tue, Sep 19, 2017 at 1:38 PM, Buntu Dev wrote: > Based on the prerequisites mentioned on Github, Confluent plat

Re: struggling with runtime Schema in connect

2017-07-26 Thread Koert Kuipers
if your sink and source tasks are in different jvms. [1] https://github.com/apache/kafka/blob/trunk/connect/json/src/main/java/org/apache/kafka/connect/json/JsonConverter.java#L299-L321 On Mon, Jul 10, 2017 at 9:06 AM, Koer

Re: struggling with runtime Schema in connect

2017-07-10 Thread Koert Kuipers
c8c/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L182-L183 [3] https://github.com/confluentinc/schema-registry/blob/master/avro-converter/src/main/java/io/confluent/connect/avro/AvroConverter.java [4] https://githu

struggling with runtime Schema in connect

2017-07-08 Thread Koert Kuipers
i see kafka connect invented its own runtime data type system in org.apache.kafka.connect.data, however i struggle to understand how this is used. the payload in kafka is bytes, and kafka does not carry any "schema" metadata, so how does connect know what the schema is of a ConnectRecord? if i write j
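One way schema metadata does travel with the bytes is the JSON envelope that Connect's JsonConverter produces when `schemas.enable=true`: each record carries its own schema alongside the payload. A minimal Python sketch of that envelope (field names and the schema name here are illustrative, not from the original thread):

```python
import json

# Sketch of the envelope JsonConverter emits with schemas.enable=true:
# the schema rides along with every record, so the consuming side can
# rebuild a ConnectRecord from otherwise opaque bytes.
record_bytes = json.dumps({
    "schema": {
        "type": "struct",
        "fields": [
            {"field": "id", "type": "int64", "optional": False},
            {"field": "name", "type": "string", "optional": True},
        ],
        "optional": False,
        "name": "example.User",  # hypothetical schema name
    },
    "payload": {"id": 42, "name": "koert"},
}).encode("utf-8")

# A converter on the consuming side peels the envelope apart again.
envelope = json.loads(record_bytes.decode("utf-8"))
schema, payload = envelope["schema"], envelope["payload"]
print(payload["id"])  # 42
```

The trade-off is overhead: the schema is repeated in every message, which is why schema-registry-based converters (Avro) keep only a small schema id in the payload instead.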

Re: connect in 0.11.0.0 warnings due to class not found exceptions

2017-07-06 Thread Koert Kuipers
, is Connect starting up successfully after all these warnings are logged? Konstantine On Thu, Jul 6, 2017 at 3:33 PM, Koert Kuipers wrote: > i just did a test upgrade to kafka 0.11.0.0 and i am seeing lots of ClassNotFoundException in the logs for con

connect in 0.11.0.0 warnings due to class not found exceptions

2017-07-06 Thread Koert Kuipers
i just did a test upgrade to kafka 0.11.0.0 and i am seeing lots of ClassNotFoundException in the logs for connect-distributed upon startup, see below. is this expected? kind of curious why it's looking for say gson while the gson jar is not in the libs folder. best, koert [2017-07-06 22:20:41,844] INFO R

Re: kafka connect architecture

2017-01-31 Thread Koert Kuipers
see inline. best, koert On Tue, Jan 31, 2017 at 1:56 AM, Ewen Cheslack-Postava wrote: > On Mon, Jan 30, 2017 at 8:24 AM, Koert Kuipers wrote: > > i have been playing with kafka connect in standalone and distributed mode. > > i like standalone because:

kafka connect architecture

2017-01-30 Thread Koert Kuipers
i have been playing with kafka connect in standalone and distributed mode. i like standalone because: * i get to configure it using a file. this is easy for automated deployment (chef, puppet, etc.). configuration using a rest api i find inconvenient. * errors show up in log files instead of having
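The file-based configuration the post prefers might look something like this standalone worker config (a sketch; paths and converter choices are illustrative):

```properties
# connect-standalone worker config (values are illustrative)
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# standalone mode tracks source offsets in a local file,
# not in a kafka topic like distributed mode does
offset.storage.file.filename=/tmp/connect.offsets
```

Both this worker file and the connector config file are passed on the command line (`connect-standalone.sh worker.properties connector.properties`), which is what makes it friendly to chef/puppet-style deployment: no REST calls needed.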

Re: possible bug or inconsistency in kafka-clients

2017-01-28 Thread Koert Kuipers
that sounds like it. thanks! On Sat, Jan 28, 2017 at 12:49 PM, Vahid S Hashemian <vahidhashem...@us.ibm.com> wrote: > Could this be the same issue as the one reported here? > https://issues.apache.org/jira/browse/KAFKA-4547 > --Vahid Fr

possible bug or inconsistency in kafka-clients

2017-01-27 Thread Koert Kuipers
hello all, i just wanted to point out a potential issue in kafka-clients 0.10.1.1 i was using spark-sql-kafka-0-10, which is spark structured streaming integration for kafka. it depends on kafka-clients 0.10.0.1 but since my kafka servers are 0.10.1.1 i decided to upgrade kafka-clients to 0.10.1.

Re: sharing storage topics between different distributed connect clusters

2016-11-27 Thread Koert Kuipers
you used the same topics in the same cluster, the values between different Connect clusters targeted at the same Kafka cluster would become conflated. -Ewen On Sat, Nov 26, 2016 at 7:59 PM, Koert Kuipers wrote: > if i were to run multiple distributed c

sharing storage topics between different distributed connect clusters

2016-11-26 Thread Koert Kuipers
if i were to run multiple distributed connect clusters (so with different group.id) does each connect cluster need its own offset.storage.topic, config.storage.topic and status.storage.topic? or can they safely be shared between the clusters? thanks! koert
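Per the reply upthread, the safe setup is separate storage topics per Connect cluster. A sketch of two distributed worker configs (topic and group names are illustrative):

```properties
# worker config for connect cluster A
group.id=connect-cluster-a
offset.storage.topic=connect-offsets-a
config.storage.topic=connect-configs-a
status.storage.topic=connect-status-a

# a second connect cluster against the same kafka cluster
# needs its own distinct set, e.g.:
#   group.id=connect-cluster-b
#   offset.storage.topic=connect-offsets-b
#   config.storage.topic=connect-configs-b
#   status.storage.topic=connect-status-b
```

Sharing the topics would conflate connector configs and offsets between the clusters, since each worker reads the full contents of these compacted topics.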

Re: no luck with kafka-connect on secure cluster

2016-11-26 Thread Koert Kuipers
erceptors, which use the same config name but different interfaces for ProducerInterceptor vs ConsumerInterceptor). For configs we know might be shared, we'd like to find a way to make this configuration simpler. -Ewen On Fri, Nov 25, 2016 at 10:51 AM, Koert Kuipers wrote

Re: no luck with kafka-connect on secure cluster

2016-11-25 Thread Koert Kuipers
well it seems if you run connect in distributed mode... it's again security.protocol=SASL_PLAINTEXT and not producer.security.protocol=SASL_PLAINTEXT. don't ask me why. On Thu, Nov 24, 2016 at 10:40 PM, Koert Kuipers wrote: > for anyone that runs into this. turns out i also had to

Re: no luck with kafka-connect on secure cluster

2016-11-24 Thread Koert Kuipers
for anyone that runs into this. turns out i also had to set: producer.security.protocol=SASL_PLAINTEXT producer.sasl.kerberos.service.name=kafka On Thu, Nov 24, 2016 at 8:54 PM, Koert Kuipers wrote: > i have a secure kafka 0.10.1 cluster using SASL_PLAINTEXT > > the kafka servers
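Pulling the thread's findings together, a worker config for a SASL cluster ends up setting the security properties at several levels, because the worker, the connector-side producers, and the connector-side consumers each build their own clients. A sketch (exact behavior varied between standalone and distributed mode in this era, as the follow-up notes):

```properties
# the worker's own kafka connections
security.protocol=SASL_PLAINTEXT
sasl.kerberos.service.name=kafka

# clients created on behalf of source connectors
producer.security.protocol=SASL_PLAINTEXT
producer.sasl.kerberos.service.name=kafka

# clients created on behalf of sink connectors
consumer.security.protocol=SASL_PLAINTEXT
consumer.sasl.kerberos.service.name=kafka
```

The `producer.`/`consumer.` prefixes are stripped before the values are handed to the embedded clients; repeating the settings at all three levels is the defensive option when it is unclear which client is failing to authenticate.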

no luck with kafka-connect on secure cluster

2016-11-24 Thread Koert Kuipers
i have a secure kafka 0.10.1 cluster using SASL_PLAINTEXT the kafka servers seem fine, and i can start console-consumer and console-producer and i see the message i type in the producer pop up in the consumer. no problems so far. for example to start console-producer: $ kinit $ export KAFKA_OPTS=

Re: Common Form of Data Written to Kafka for Data Ingestion

2015-03-24 Thread Koert Kuipers
avro seems to be the standard at linkedin. i know json and protobuf are used at a few places. On Tue, Mar 24, 2015 at 11:49 PM, Rendy Bambang Junior <rendy.b.jun...@gmail.com> wrote: > Hi, > I'm a new Kafka user. I'm planning to send web usage data from application > to S3 for EMR and MongoDB

Re: Alternative to camus

2015-03-19 Thread Koert Kuipers
code base for dumping data into hdfs from kafka using spark? On Fri, Mar 20, 2015 at 12:20 AM, Koert Kuipers wrote: > we load from kafka into hdfs using spark in batch mode, once a day. it's > very simple (74 lines of code) and works fine.

Re: Alternative to camus

2015-03-19 Thread Koert Kuipers
we load from kafka into hdfs using spark in batch mode, once a day. it's very simple (74 lines of code) and works fine. On Fri, Mar 13, 2015 at 4:11 PM, Gwen Shapira wrote: > Camus uses MapReduce though. > If Alberto uses Spark exclusively, I can see why installing MapReduce > cluster (with or w

Re: ping kafka server

2015-02-09 Thread Koert Kuipers
wrote: > I have used nagios in this manner with kafka before and it worked fine. On Mon, Feb 9, 2015 at 2:48 PM, Koert Kuipers wrote: > i would like to be able to ping kafka servers from nagios to confirm they > are alive. since kafka servers don't run a http

ping kafka server

2015-02-09 Thread Koert Kuipers
i would like to be able to ping kafka servers from nagios to confirm they are alive. since kafka servers don't run a http server (web ui) i am not sure how to do this. is it safe to establish a "test" tcp connection (so connect and immediately disconnect using telnet or netcat or something like th
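The connect-and-immediately-disconnect probe described above is exactly what nagios-style `check_tcp` plugins do, and it is generally safe against a kafka broker: the broker just sees a connection open and close with no requests. A small sketch in Python, demonstrated against a throwaway local listener rather than a real broker:

```python
import socket

def port_alive(host: str, port: int, timeout: float = 2.0) -> bool:
    """Open and immediately close a TCP connection -- the cheap
    liveness probe a nagios check_tcp plugin performs."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# demonstrate against a throwaway local listener (a real check would
# point at the broker's listener port, e.g. 9092)
server = socket.socket()
server.bind(("127.0.0.1", 0))   # let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

print(port_alive("127.0.0.1", port))  # True
server.close()
```

Note this only proves the listener socket is open, not that the broker is healthy; a stronger check would issue an actual protocol request (e.g. fetch metadata with a client library).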

Re: How to handle broker disk failure

2015-01-21 Thread Koert Kuipers
same situation with us. we run jbod and actually don't replace the failed data disks at all. we simply keep boxes running until the number of non-failed drives falls below some threshold. so our procedure with kafka would be: 1) ideally the kafka server simply survives a failed disk and keeps going, and fixes itself wi

Re: Poll: Producer/Consumer impl/language you use?

2015-01-20 Thread Koert Kuipers
no scala? although scala can indeed use the java api, it's ugly. we prefer to use the scala api (which i believe will go away, unfortunately). On Tue, Jan 20, 2015 at 2:52 PM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote: > Hi, > I was wondering which implementations/languages people us

Re: How to handle broker disk failure

2015-01-20 Thread Koert Kuipers
i think it would be nice if the recommended setup for kafka is jbod and not raid because: * it makes it easy to "test" kafka on an existing hadoop/spark cluster * co-location, for example we colocate kafka and spark streaming (our spark streaming app is kafka partition location aware) ideally kafk
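The jbod setup argued for above maps to the broker's `log.dirs` setting, one directory per physical disk (paths here are illustrative):

```properties
# broker config sketch: one log dir per disk, no raid
log.dirs=/data1/kafka-logs,/data2/kafka-logs,/data3/kafka-logs
```

At the time of this thread a single failed log dir would take the whole broker down, which is the pain point in the discussion; later kafka releases (the JBOD improvements in 1.0+) let the broker keep serving from the surviving directories, closer to the behavior the post asks for.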

Re: spark kafka batch integration

2014-12-15 Thread Koert Kuipers
gwen, i thought about it a little more and i feel pretty confident i can make it so that it's deterministic in case of node failure. will push that change out after holidays. On Mon, Dec 15, 2014 at 12:03 AM, Koert Kuipers wrote: > > hey gwen, > > no immediate plans to contribut

getOffsetsBefore and latest

2014-12-15 Thread Koert Kuipers
i read in several places that getOffsetsBefore does not necessarily return the last offset before the timestamp, because it is basically file based (so it works at the granularity of the files kafka produces). what about getOffsetsBefore using kafka.api.OffsetRequest.LatestTime? am i safe to assume

Re: spark kafka batch integration

2014-12-14 Thread Koert Kuipers
adoop <http://www.twitter.com/allthingshadoop> ********/ On Sun, Dec 14, 2014 at 8:22 PM, Koert Kuipers wrote: > hello all, we at tresata wrote a library to provide for batch integration between > spark and kafka. it supports: * distribute

Re: spark kafka batch integration

2014-12-14 Thread Koert Kuipers
ark App is running? Will the RDD recovery process get the exact same data from Kafka as the original? even if we wrote additional data to Kafka in the meantime? Gwen On Sun, Dec 14, 2014 at 5:22 PM, Koert Kuipers wrote: > hello all, we at tre

spark kafka batch integration

2014-12-14 Thread Koert Kuipers
hello all, we at tresata wrote a library to provide for batch integration between spark and kafka. it supports: * distributed write of rdd to kafka * distributed read of rdd from kafka our main use cases are (in lambda architecture speak): * periodic appends to the immutable master dataset on hdfs

Re: No longer supporting Java 6, if? when?

2014-11-06 Thread Koert Kuipers
when is java 6 dropped by the hadoop distros? i am still aware of many clusters that are java 6 only at the moment. On Thu, Nov 6, 2014 at 12:44 PM, Gwen Shapira wrote: > +1 for dropping Java 6 On Thu, Nov 6, 2014 at 9:31 AM, Steven Schlansker <sschlans...@opentable.com> wrote:

Re: [ANNOUNCEMENT] Apache Kafka 0.8.2-beta Released

2014-11-01 Thread Koert Kuipers
joe, looking at those 0.8.2 beta javadocs I also see a Consumer api and KafkaConsumer class. they look different from what I currently use in 0.8.1.1. Is this new? And this is not the 0.9 consumer? thanks, koert On Oct 30, 2014 8:01 AM, "Joe Stein" wrote: > Hey, yeah! > For the new producer > http