Re: Performance issue with KafkaStreams

2016-09-19 Thread Guozhang Wang
Hello Caleb,

`./gradlew jar` should be sufficient to run the SimpleBenchmark. Could you
look into kafka-run-class.sh and see whether "streams/build/libs/kafka-streams*.jar"
is added to the classpath? In trunk it is added.


Guozhang


On Sat, Sep 17, 2016 at 11:30 AM, Eno Thereska 
wrote:

> Hi Caleb,
>
> I usually do './gradlew installAll' first and that places all the jars in
> my local maven repo in ~/.m2/repository.
>
> Eno
>
> > On 17 Sep 2016, at 00:30, Caleb Welton  wrote:
> >
> > Is there a specific way that I need to build kafka for that to work?
> >
> > bash$  export INCLUDE_TEST_JARS=true; ./bin/kafka-run-class.sh
> > org.apache.kafka.streams.perf.SimpleBenchmark
> > Error: Could not find or load main class
> > org.apache.kafka.streams.perf.SimpleBenchmark
> >
> > bash$ find . -name SimpleBenchmark.java
> > ./streams/src/test/java/org/apache/kafka/streams/perf/SimpleBenchmark.java
> >
> > bash$  find . -name SimpleBenchmark.class
> >
> > bash$ jar -tf streams/build/libs/kafka-streams-0.10.1.0-SNAPSHOT.jar | grep SimpleBenchmark
> >
> >
> >
> > On Sat, Sep 10, 2016 at 6:18 AM, Eno Thereska 
> > wrote:
> >
> >> Hi Caleb,
> >>
> >> We have a benchmark that we run nightly to keep track of performance.
> >> The numbers we have do indicate that consuming through streams is indeed
> >> slower than just a pure consumer; however, the performance difference is
> >> not as large as you are observing. Would it be possible for you to run
> >> this benchmark in your environment and report the numbers? The benchmark
> >> is included with Kafka Streams and you run it like this:
> >>
> >> export INCLUDE_TEST_JARS=true; ./bin/kafka-run-class.sh
> >> org.apache.kafka.streams.perf.SimpleBenchmark
> >>
> >> You'll need a Kafka broker to be running. The benchmark will report
> >> various numbers; among them are the performance of the plain consumer and
> >> the performance of streams reading.
> >>
> >> Thanks
> >> Eno
> >>
> >>> On 9 Sep 2016, at 18:19, Caleb Welton 
> >> wrote:
> >>>
> >>> Same in both cases:
> >>> client.id=Test-Prototype
> >>> application.id=test-prototype
> >>>
> >>> group.id=test-consumer-group
> >>> bootstrap.servers=broker1:9092,broker2:9092
> >>> zookeeper.connect=zk1:2181
> >>> replication.factor=2
> >>>
> >>> auto.offset.reset=earliest
> >>>
> >>>
> >>> On Friday, September 9, 2016 8:48 AM, Eno Thereska <
> >> eno.there...@gmail.com> wrote:
> >>>
> >>>
> >>>
> >>> Hi Caleb,
> >>>
> >>> Could you share your Kafka Streams configuration (i.e., StreamsConfig
> >>> properties you might have set before the test)?
> >>>
> >>> Thanks
> >>> Eno
> >>>
> >>>
> >>> On Thu, Sep 8, 2016 at 12:46 AM, Caleb Welton 
> >> wrote:
> >>>
>  I have a question with respect to the KafkaStreams API.
> 
>  I noticed during my prototyping work that my KafkaStreams application was
>  not able to keep up with the input on the stream, so I dug into it a bit
>  and found that it was spending an inordinate amount of time in
>  org.apache.kafka.common.network.Selector.select(). Not exactly a smoking
>  gun by itself, so I dropped the implementation down to a single processor
>  reading off a source.
> 
>  public class TestProcessor extends AbstractProcessor<String, String> {
>      static long start = -1;
>      static long count = 0;
> 
>      @Override
>      public void process(String key, String value) {
>          if (start < 0) {
>              start = System.currentTimeMillis();
>          }
>          count += 1;
>          if (count > 1000000) {
>              long end = System.currentTimeMillis();
>              double time = (end - start) / 1000.0;
>              System.out.printf("Processed %d records in %f seconds (%f records/s)\n",
>                      count, time, count / time);
>              start = -1;
>              count = 0;
>          }
>      }
>  }
> 
>  ...
> 
> 
>  TopologyBuilder topologyBuilder = new TopologyBuilder();
>  topologyBuilder
>      .addSource("SOURCE", stringDeserializer, stringDeserializer, "input")
>    .addProcessor("PROCESS", TestProcessor::new, "SOURCE");
> 
> 
> 
>  Which I then ran through the KafkaStreams API, and then repeated with
>  the KafkaConsumer API.
> 
>  Using the KafkaConsumer API:
>  Processed 1000001 records in 1.790000 seconds (558659.776536 records/s)
>  Processed 1000001 records in 1.229000 seconds (813670.463792 records/s)
>  Processed 1000001 records in 1.106000 seconds (904160.036166 records/s)
>  Processed 1000001 records in 1.190000 seconds (840336.974790 records/s)
> 
>  Using the KafkaStreams API:
>  Processed 1000001 records in 6.407000 seconds (156079.444358 records/s)
>  Processed 1000001 records in 5.256000 seconds 
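A bare KafkaConsumer loop doing the equivalent measurement might look roughly
like the sketch below. This is an illustration only, not the actual test code;
the topic, group id and bootstrap servers are taken from the configuration
quoted above.

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerThroughputTest {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("group.id", "test-consumer-group");
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("input"));

        long start = -1;
        long count = 0;
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                // Same measurement logic as the TestProcessor above.
                if (start < 0) {
                    start = System.currentTimeMillis();
                }
                count += 1;
                if (count > 1000000) {
                    double time = (System.currentTimeMillis() - start) / 1000.0;
                    System.out.printf("Processed %d records in %f seconds (%f records/s)%n",
                            count, time, count / time);
                    start = -1;
                    count = 0;
                }
            }
        }
    }
}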

Re: Exception while deserializing in kafka streams

2016-09-19 Thread Guozhang Wang
Hello Walter,

The WARN log entry should not be the cause of this issue.

I double checked the 0.10.0.0 release and this issue should not really
happen, so your observation is a bit weird to me. Could you add a log
entry in the `configure` function which constructs the registry client, to
make sure it is indeed triggered when the streams app starts up?
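For example, something along these lines inside the deserializer would confirm
it. This is only a rough sketch: the class name and the delegation to
KafkaAvroDeserializer are assumptions, since the actual
GenericAvroSerdeWithSchemaRegistry code is only available via the Dropbox link
quoted further down.

import java.util.Map;

import io.confluent.kafka.serializers.KafkaAvroDeserializer;
import org.apache.kafka.common.serialization.Deserializer;

// Sketch: instrument configure() to verify it runs when the app starts,
// since that is where the schema registry client gets built.
public class LoggingGenericAvroDeserializer implements Deserializer<Object> {

    private final KafkaAvroDeserializer inner = new KafkaAvroDeserializer();

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        System.out.println("configure() called, isKey=" + isKey
                + ", schema.registry.url=" + configs.get("schema.registry.url"));
        inner.configure(configs, isKey);
    }

    @Override
    public Object deserialize(String topic, byte[] data) {
        return inner.deserialize(topic, data);
    }

    @Override
    public void close() {
        inner.close();
    }
}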


Guozhang



On Fri, Sep 16, 2016 at 2:27 PM, Walter rakoff 
wrote:

> Guozhang,
>
> Any clues on this one?
>
> Walter
>
> On Wed, Sep 14, 2016 at 9:46 PM, Walter rakoff 
> wrote:
>
> > Guozhang,
> >
> > I am using 0.10.0.0. Could the below log be the cause?
> >
> > 16/09/14 17:24:35 WARN ConsumerConfig: The configuration
> > schema.registry.url = http://192.168.50.6:8081
> > was supplied but isn't a known config.
> > 16/09/14 17:24:35 INFO AppInfoParser: Kafka version : 0.10.0.0
> > 16/09/14 17:24:35 INFO AppInfoParser: Kafka commitId : b8642491e78c5a13
> >
> > Walter
> >
> >
> > On Wed, Sep 14, 2016 at 8:11 PM, Guozhang Wang 
> wrote:
> >
> >> Hello Walter,
> >>
> >> Which version of Kafka were you using?
> >>
> >> I ask this because there was a bug causing the serde passed through
> >> config to NOT be configured when constructed:
> >> https://issues.apache.org/jira/browse/KAFKA-3639
> >>
> >>
> >> This is fixed in the 0.10.0.0 release, which means you will only hit it
> >> if you are using the tech-preview release version.
> >>
> >>
> >> Guozhang
> >>
> >>
> >> On Wed, Sep 14, 2016 at 11:10 AM, Walter rakoff <
> walter.rak...@gmail.com>
> >> wrote:
> >>
> >> > Hello,
> >> >
> >> > I get the below exception when deserializing Avro records
> >> > using KafkaAvroDeserializer.
> >> >
> >> > 16/09/14 17:24:39 INFO StreamThread: Stream thread shutdown complete
> >> > [StreamThread-1]
> >> > Exception in thread "StreamThread-1"
> >> > org.apache.kafka.common.errors.SerializationException:
> >> > Error deserializing Avro message for id 4
> >> > Caused by: java.lang.NullPointerException
> >> > at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:120)
> >> > at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:92)
> >> >
> >> > I can confirm that the schema registry URL is accessible and that
> >> > url/schemas/ids/4 does return a valid schema.
> >> > Maybe some initialization didn't happen correctly?
> >> >
> >> > props.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "192.168.50.6:8081")
> >> > props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.Long.getClass.getName)
> >> > props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, classOf[GenericAvroSerdeWithSchemaRegistry])
> >> >
> >> > GenericAvroSerdeWithSchemaRegistry code -->
> >> > https://www.dropbox.com/s/y471k9nj94tlxro/avro_serde.txt?dl=0
> >> >
> >> > Walter
> >> >
> >>
> >>
> >>
> >> --
> >> -- Guozhang
> >>
> >
> >
>



-- 
-- Guozhang


Re: Error kafka-stream method punctuate in context.forward()

2016-09-19 Thread Guozhang Wang
Hello Hamza,

Which Kafka version are you using with this application? Also could you
share your code skeleton of the StatsByDay processor implementation?
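For reference, a typical skeleton of such a processor looks roughly like the
sketch below. It is an illustration only (the store name, schedule interval
and aggregation logic are assumptions), not the actual BaseProcessor/StatsByDay
code, which is what would be needed to pin down the NullPointerException.

import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

// Illustrative skeleton: count records per key and forward the counts
// downstream from punctuate().
public class StatsProcessor extends AbstractProcessor<String, String> {

    private KeyValueStore<String, Long> store;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        super.init(context);
        context.schedule(60 * 1000L);  // call punctuate() roughly once a minute
        store = (KeyValueStore<String, Long>) context.getStateStore("stats-store");
    }

    @Override
    public void process(String key, String value) {
        Long current = store.get(key);
        store.put(key, current == null ? 1L : current + 1);
    }

    @Override
    public void punctuate(long timestamp) {
        // Forward the current aggregates to the node(s) wired downstream of
        // this processor in the topology.
        KeyValueIterator<String, Long> it = store.all();
        while (it.hasNext()) {
            KeyValue<String, Long> entry = it.next();
            context().forward(entry.key, entry.value.toString());
        }
        it.close();
        context().commit();
    }
}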


Guozhang


On Fri, Sep 16, 2016 at 6:58 AM, Hamza HACHANI 
wrote:

> Good morning,
>
> I have a problem with a Kafka Streams application.
>
> In fact I've already created these Kafka Streams applications:
>
> StatsByMinute : input topic : uplinks, output topic : statsM.
>
> StatsByHour : input topic : statsM, output topic : statsH.
>
> StatsByDay : input topic : statsH, output topic : statsD.
>
>
>
> All three applications have nearly the same structure (StatsByMinute and
> StatsByHour/StatsByDay differ only in the application ID, the KV store, and
> the process() method).
>
> StatsByDay and StatsByHour have exactly the same structure (the only
> exception is the ID parameters).
>
>
> The problem is that StatsByMinute and StatsByHour work perfectly.
>
> But this is not the case for StatsByDay, where I verified that I do receive
> data and process it (so process() works), but the line calling
> context.forward() in punctuate() causes a problem.
>
> I get the following error :
>
>
> [2016-09-16 15:44:24,467] ERROR Streams application error during processing
> in thread [StreamThread-1]: (org.apache.kafka.streams.processor.internals.StreamThread)
> java.lang.NullPointerException
> at org.apache.kafka.streams.processor.internals.StreamTask.forward(StreamTask.java:336)
> at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:187)
> at com.actility.tpk.stat.BaseProcessor.punctuate(BaseProcessor.java:54)
> at org.apache.kafka.streams.processor.internals.StreamTask.punctuate(StreamTask.java:227)
> at org.apache.kafka.streams.processor.internals.PunctuationQueue.mayPunctuate(PunctuationQueue.java:45)
> at org.apache.kafka.streams.processor.internals.StreamTask.maybePunctuate(StreamTask.java:212)
> at org.apache.kafka.streams.processor.internals.StreamThread.maybePunctuate(StreamThread.java:407)
> at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:325)
> at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:218)
> Exception in thread "StreamThread-1" java.lang.NullPointerException
> at org.apache.kafka.streams.processor.internals.StreamTask.forward(StreamTask.java:336)
> at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:187)
> at com.actility.tpk.stat.BaseProcessor.punctuate(BaseProcessor.java:54)
> at org.apache.kafka.streams.processor.internals.StreamTask.punctuate(StreamTask.java:227)
> at org.apache.kafka.streams.processor.internals.PunctuationQueue.mayPunctuate(PunctuationQueue.java:45)
> at org.apache.kafka.streams.processor.internals.StreamTask.maybePunctuate(StreamTask.java:212)
> at org.apache.kafka.streams.processor.internals.StreamThread.maybePunctuate(StreamThread.java:407)
> at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:325)
> at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:218)
>
>
>


-- 
-- Guozhang


Kafka cluster behavior: broker no longer belonging to ISR, offsets reset to middle of partition

2016-09-19 Thread Chris Richardson
Hi,

One of my Kafka 0.9.0.1 clusters (3 brokers, default.replication.factor=2)
had been working fine until yesterday. The message volume was pretty low and
there were no obvious problems, except for the following.

The first symptom was *kafka-consumer-groups.sh* failing with an empty.head
exception.

When I used *kafka-topics --describe* I saw that one of the brokers was no
longer part of the appropriate ISRs.
Restarting that broker did not appear to solve the problem.
In fact, I got the impression that the broker was temporarily in the ISR
and then left again.

I think I restarted each broker and eventually things returned to normal.
The problem then reoccurred a couple of hours later.

During this time, I also had a problem with one of the Kafka 0.8.2.1
clients:

ERROR kafka.consumer.ConsumerFetcherThread -
[ConsumerFetcherThread-mytopic-consumer-81a939d49903-1474231957102-90b6fc16-0-6],
Current offset 5488 for partition [mytopic,5] out of range; reset offset to
2340

This topic partition had >5488 messages, so the offset was definitely not
out of range. The result was that the consumer reprocessed old messages.
The lag as reported by kafka-consumer-groups.sh went from <10 to >2500.

Thoughts? Recommendations for debugging this problem when it occurs again?

Chris


Re: Running Zookeeper on same instance as kafka brokers in multi node cluster

2016-09-19 Thread Craig Swift
Hello,

Initially we had our setup with both ZK and Kafka on the same nodes as
well. Over time though this proved problematic as we both increased the
usage of Kafka and had to scale out the cluster. ZK specifically can be
very touchy if it gets IO bound - so when the Kafka cluster came under
high load we would start to see slow responses and, in the worst case,
cascading failures. Also, from my understanding ZK doesn't really
scale well past five nodes - so once your cluster grows large enough your
nodes become non-homogeneous as you're only running ZK on certain nodes. So
from our experience it was much cleaner to spin up a 3 node ZK cluster for
each Kafka cluster we ran, dedicated to that cluster. That allows us to
grow out the Kafka cluster when needed without ever worrying about ZK. In
most cases we're more concerned about the ZK integrity and keeping that
solid and find it fairly easy to reprovision/add Kafka nodes when
necessary. Hope that helps.

Craig


On Mon, Sep 19, 2016 at 3:01 PM, Digumarthi, Prabhakar Venkata Surya <
prabhakarvenkatasurya.digumar...@capitalone.com> wrote:

> Hi Team,
>
>
> What are the downsides of installing ZooKeeper and Kafka on the same
> machine, in a multi-broker environment?
>
> We are trying to run ZooKeeper and Kafka in AWS and it's becoming difficult
> for us to maintain ZK and Kafka; we have hit some issues, and re-provisioning
> ZK and Kafka instances separately is also getting complicated.
> Thanks,
> Prabhakar


Running Zookeeper on same instance as kafka brokers in multi node cluster

2016-09-19 Thread Digumarthi, Prabhakar Venkata Surya
Hi Team,


What are the downsides of installing ZooKeeper and Kafka on the same machine,
in a multi-broker environment?

We are trying to run ZooKeeper and Kafka in AWS and it's becoming difficult for
us to maintain ZK and Kafka; we have hit some issues, and re-provisioning ZK
and Kafka instances separately is also getting complicated.
Thanks,
Prabhakar




Re: Schema for jsonConverter

2016-09-19 Thread Srikrishna Alla
Thanks Shikhar. I made this change and it's working now.

Thanks,
Sri

> On Sep 19, 2016, at 2:25 PM, Shikhar Bhushan  wrote:
> 
> Hi Srikrishna,
> 
> The issue is that you are using "name" to specify the field name for the
> struct's fields. The correct key to use is "field".
> 
> Best,
> 
> Shikhar
> 
>> On Thu, Sep 15, 2016 at 4:23 PM Gwen Shapira  wrote:
>> 
>> ah, never mind - I just noticed you do use a schema... Maybe you are
>> running into this? https://issues.apache.org/jira/browse/KAFKA-3055
>> 
>>> On Thu, Sep 15, 2016 at 4:20 PM, Gwen Shapira  wrote:
>>> Most people use JSON without schema, so you should probably change
>>> your configuration to:
>>> 
>>> key.converter.schemas.enable=false
>>> value.converter.schemas.enable=false
>>> 
>>> On Thu, Sep 15, 2016 at 4:04 PM, Srikrishna Alla
>>>  wrote:
 I am trying to use the JDBC connector to send records from Kafka 0.9 to a DB.
 I am using JsonConverter to convert the records. My connector is failing when
 it's checking the schema I am using. Please let me know what is the issue
 with my JSON schema.
 
 Configuration used:
 key.converter=org.apache.kafka.connect.storage.StringConverter
 value.converter=org.apache.kafka.connect.json.JsonConverter
 # Converter-specific settings can be passed in by prefixing the
>> Converter's
 setting with the converter we want to apply
 # it to
 key.converter.schemas.enable=true
 value.converter.schemas.enable=true
 
 Record that has been sent to the topic -
>> {"schema":{"type":"struct","fields":[{"name":"error_code","type":"string","optional":"false"},{"name":"error_time","type":"string","optional":"false"},{"name":"error_msg","type":"string","optional":"false"},{"name":"source","type":"string","optional":"false"},{"name":"criticality","type":"string","optional":"false"}]},"payload":{"error_code":"RAW104","error_time":"09/15/2016@18
>> :00:32","error_msg":"Not
 accepting","source":"APPLICATION","criticality":"WARN"}}
 
 
 Error I am seeing:
 [2016-09-15 18:01:07,513] ERROR Thread WorkerSinkTask-jdbc-sink-test-0
 exiting with uncaught exception:
 (org.apache.kafka.connect.util.ShutdownableThread:84)
 *org.apache.kafka.connect.errors.DataException: Struct schema's field
>> name
 not specified properly*
   at
>> org.apache.kafka.connect.json.JsonConverter.asConnectSchema(JsonConverter.java:493)
   at
>> org.apache.kafka.connect.json.JsonConverter.jsonToConnect(JsonConverter.java:344)
   at
>> org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:334)
   at
>> org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:266)
   at
>> org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:175)
   at
>> org.apache.kafka.connect.runtime.WorkerSinkTaskThread.iteration(WorkerSinkTaskThread.java:90)
   at
>> org.apache.kafka.connect.runtime.WorkerSinkTaskThread.execute(WorkerSinkTaskThread.java:58)
   at
>> org.apache.kafka.connect.util.ShutdownableThread.run(ShutdownableThread.java:82)
 Exception in thread "WorkerSinkTask-jdbc-sink-test-0"
 *org.apache.kafka.connect.errors.DataException:
 Struct schema's field name not specified properly*
   at
>> org.apache.kafka.connect.json.JsonConverter.asConnectSchema(JsonConverter.java:493)
   at
>> org.apache.kafka.connect.json.JsonConverter.jsonToConnect(JsonConverter.java:344)
   at
>> org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:334)
   at
>> org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:266)
   at
>> org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:175)
   at
>> org.apache.kafka.connect.runtime.WorkerSinkTaskThread.iteration(WorkerSinkTaskThread.java:90)
   at
>> org.apache.kafka.connect.runtime.WorkerSinkTaskThread.execute(WorkerSinkTaskThread.java:58)
   at
>> org.apache.kafka.connect.util.ShutdownableThread.run(ShutdownableThread.java:82)
 
 Thanks,
 Sri
>>> 
>>> 
>>> 
>>> --
>>> Gwen Shapira
>>> Product Manager | Confluent
>>> 650.450.2760 | @gwenshap
>>> Follow us: Twitter | blog
>> 
>> 
>> 
>> --
>> Gwen Shapira
>> Product Manager | Confluent
>> 650.450.2760 | @gwenshap
>> Follow us: Twitter | blog
>> 


Re: Schema for jsonConverter

2016-09-19 Thread Shikhar Bhushan
Hi Srikrishna,

The issue is that you are using "name" to specify the field name for the
struct's fields. The correct key to use is "field".
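With that change, the record you posted would look roughly like this (a sketch;
only the schema keys are renamed, and "optional" is written as a JSON boolean,
which I believe is what the JsonConverter honours, rather than the quoted
string "false"):

{"schema":{"type":"struct","fields":[
  {"field":"error_code","type":"string","optional":false},
  {"field":"error_time","type":"string","optional":false},
  {"field":"error_msg","type":"string","optional":false},
  {"field":"source","type":"string","optional":false},
  {"field":"criticality","type":"string","optional":false}]},
 "payload":{"error_code":"RAW104","error_time":"09/15/2016@18:00:32",
  "error_msg":"Not accepting","source":"APPLICATION","criticality":"WARN"}}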

Best,

Shikhar

On Thu, Sep 15, 2016 at 4:23 PM Gwen Shapira  wrote:

> ah, never mind - I just noticed you do use a schema... Maybe you are
> running into this? https://issues.apache.org/jira/browse/KAFKA-3055
>
> On Thu, Sep 15, 2016 at 4:20 PM, Gwen Shapira  wrote:
> > Most people use JSON without schema, so you should probably change
> > your configuration to:
> >
> > key.converter.schemas.enable=false
> > value.converter.schemas.enable=false
> >
> > On Thu, Sep 15, 2016 at 4:04 PM, Srikrishna Alla
> >  wrote:
> >> I am trying to use the JDBC connector to send records from Kafka 0.9 to a
> >> DB. I am using JsonConverter to convert the records. My connector is
> >> failing when it's checking the schema I am using. Please let me know what
> >> is the issue with my JSON schema.
> >>
> >> Configuration used:
> >> key.converter=org.apache.kafka.connect.storage.StringConverter
> >> value.converter=org.apache.kafka.connect.json.JsonConverter
> >> # Converter-specific settings can be passed in by prefixing the
> Converter's
> >> setting with the converter we want to apply
> >> # it to
> >> key.converter.schemas.enable=true
> >> value.converter.schemas.enable=true
> >>
> >> Record that has been sent to the topic -
> >>
> {"schema":{"type":"struct","fields":[{"name":"error_code","type":"string","optional":"false"},{"name":"error_time","type":"string","optional":"false"},{"name":"error_msg","type":"string","optional":"false"},{"name":"source","type":"string","optional":"false"},{"name":"criticality","type":"string","optional":"false"}]},"payload":{"error_code":"RAW104","error_time":"09/15/2016@18
> :00:32","error_msg":"Not
> >> accepting","source":"APPLICATION","criticality":"WARN"}}
> >>
> >>
> >> Error I am seeing:
> >> [2016-09-15 18:01:07,513] ERROR Thread WorkerSinkTask-jdbc-sink-test-0
> >> exiting with uncaught exception:
> >> (org.apache.kafka.connect.util.ShutdownableThread:84)
> >> *org.apache.kafka.connect.errors.DataException: Struct schema's field
> name
> >> not specified properly*
> >>at
> >>
> org.apache.kafka.connect.json.JsonConverter.asConnectSchema(JsonConverter.java:493)
> >>at
> >>
> org.apache.kafka.connect.json.JsonConverter.jsonToConnect(JsonConverter.java:344)
> >>at
> >>
> org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:334)
> >>at
> >>
> org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:266)
> >>at
> >>
> org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:175)
> >>at
> >>
> org.apache.kafka.connect.runtime.WorkerSinkTaskThread.iteration(WorkerSinkTaskThread.java:90)
> >>at
> >>
> org.apache.kafka.connect.runtime.WorkerSinkTaskThread.execute(WorkerSinkTaskThread.java:58)
> >>at
> >>
> org.apache.kafka.connect.util.ShutdownableThread.run(ShutdownableThread.java:82)
> >> Exception in thread "WorkerSinkTask-jdbc-sink-test-0"
> >> *org.apache.kafka.connect.errors.DataException:
> >> Struct schema's field name not specified properly*
> >>at
> >>
> org.apache.kafka.connect.json.JsonConverter.asConnectSchema(JsonConverter.java:493)
> >>at
> >>
> org.apache.kafka.connect.json.JsonConverter.jsonToConnect(JsonConverter.java:344)
> >>at
> >>
> org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:334)
> >>at
> >>
> org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:266)
> >>at
> >>
> org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:175)
> >>at
> >>
> org.apache.kafka.connect.runtime.WorkerSinkTaskThread.iteration(WorkerSinkTaskThread.java:90)
> >>at
> >>
> org.apache.kafka.connect.runtime.WorkerSinkTaskThread.execute(WorkerSinkTaskThread.java:58)
> >>at
> >>
> org.apache.kafka.connect.util.ShutdownableThread.run(ShutdownableThread.java:82)
> >>
> >> Thanks,
> >> Sri
> >
> >
> >
> > --
> > Gwen Shapira
> > Product Manager | Confluent
> > 650.450.2760 | @gwenshap
> > Follow us: Twitter | blog
>
>
>
> --
> Gwen Shapira
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
>


Consul / Zookeeper [was Re: any update on this?]

2016-09-19 Thread Dana Powers
[+ dev list]

I have not worked on KAFKA-1793 directly, but I believe most of the
work so far has been in removing all zookeeper dependencies from
clients. The two main areas for that are (1) consumer rebalancing, and
(2) administrative commands.

1) Consumer rebalancing APIs were added to the broker in 0.9. The "new
consumer" uses these apis and does not connect directly to zookeeper
to manage group leadership and rebalancing. So my understanding is
that this work is complete.

2) Admin commands are being converted to direct API calls instead of
direct zookeeper access with KIP-4. A small part of this project was
released in 0.10.0.0, and there are open PRs for additional chunks that may
make it into 0.10.1.0. If you are interested in helping or getting
involved, you can follow the KIP-4 discussions on the dev mailing
list.

When the client issues are completed I think the next step will be to
refactor the broker's zookeeper access (zkUtils) into an abstract
interface that could potentially be provided by consul or etcd. With
an interface in place, it should be possible to write an alternate
implementation of that interface for consul.
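Purely as an illustration of the idea (nothing like this exists in the code
base today, and every name below is hypothetical), such an abstraction might
look like:

// Hypothetical sketch: a coordination-store abstraction that ZooKeeper,
// Consul or etcd backends could each implement.
public interface CoordinationStore {
    void createPath(String path, byte[] data);   // create a node with initial data
    byte[] readData(String path);                // read the data stored at a node
    void writeData(String path, byte[] data);    // update the data stored at a node
    void deletePath(String path);                // remove a node
    void watch(String path, Runnable onChange);  // notify when a node changes
    void close();
}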

Hope this helps,

-Dana

On Mon, Sep 19, 2016 at 6:31 AM, Martin Gainty  wrote:
> Jens/Kant,
> I am not aware of any shortfall with ZooKeeper, so perhaps you can suggest
> advantages of Consul vs ZooKeeper?
> Maven (I am building, testing and running Kafka internally with Maven)
> implements wagon-providers for URLConnection vs HttpURLConnection wagons:
> https://maven.apache.org/guides/mini/guide-wagon-providers.html
> I am thinking a network provider should work for integrating an external
> network provider. How would you architect this integration?
>
> Would a configurable network provider, such as maven-wagon-provider, work
> for Kafka?
>
> Martin
>
>> From: kanth...@gmail.com
>> To: users@kafka.apache.org
>> Subject: Re: any update on this?
>> Date: Mon, 19 Sep 2016 09:41:10 +
>>
>> Yes, of course the goal shouldn't be moving towards Consul. It should just be
>> flexible enough for users to pick any distributed coordination system.
>>
>>
>>
>>
>>
>>
>> On Mon, Sep 19, 2016 2:23 AM, Jens Rantil jens.ran...@tink.se
>> wrote:
>> I think I read somewhere that the long-term goal is to make Kafka
>>
>> independent of Zookeeper altogether. Maybe not worth spending time on
>>
>> migrating to Consul in that case.
>>
>>
>>
>>
>> Cheers,
>>
>> Jens
>>
>>
>>
>>
>> On Sat, Sep 17, 2016 at 10:38 PM Jennifer Fountain 
>>
>> wrote:
>>
>>
>>
>>
>> > +2 watching.
>>
>> >
>>
>> > On Sat, Sep 17, 2016 at 2:45 AM, kant kodali  wrote:
>>
>> >
>>
>> > > https://issues.apache.org/jira/browse/KAFKA-1793
>>
>> > > It would be great to use Consul instead of Zookeeper for Kafka and I
>>
>> > think
>>
>> > > it
>>
>> > > would benefit Kafka a lot from the exponentially growing consul
>>
>> > community.
>>
>> >
>>
>> >
>>
>> >
>>
>> >
>>
>> > --
>>
>> >
>>
>> >
>>
>> > Jennifer Fountain
>>
>> > DevOPS
>>
>> >
>>
>> --
>>
>>
>>
>>
>> Jens Rantil
>>
>> Backend Developer @ Tink
>>
>>
>>
>>
>> Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
>>
>> For urgent matters you can reach me at +46-708-84 18 32.
>


Re: Partition creation issues (0.9.0.1)

2016-09-19 Thread Apurva Sharma
Bumping for visibility - appreciate any responses on this.

Thanks,
Apurva

On Thu, Sep 15, 2016 at 3:57 PM, Apurva Sharma 
wrote:

>
> Kafka Topic Creation issues (via Kafka-Manager with
> auto.create.topics.enable = false)
> Version: 0.9.0.1
>
> We created a topic "web" via Kafka-Manager (our brokers are configured with
> auto-create disabled) and then clicked on Generate Partitions. According to
> the tool, the topic was created cleanly with partitions assigned correctly
> to brokers.
> However, when we look into the individual broker logs:
>
> [2016-09-15 18:46:10,268] ERROR [ReplicaFetcherThread-5-1006], Error for
> partition [web,2] to broker 1006:org.apache.kafka.common.errors.
> UnknownTopicOrPartitionException: This server does not host this
> topic-partition. (kafka.server.ReplicaFetcherThread)
> [2016-09-15 18:46:10,391] ERROR [ReplicaFetcherThread-4-1005], Error for
> partition [web,1] to broker 1005:org.apache.kafka.common.errors.
> UnknownTopicOrPartitionException: This server does not host this
> topic-partition. (kafka.server.ReplicaFetcherThread)
> [2016-09-15 18:46:10,391] WARN [Replica Manager on Broker 1001]: While
> recording the replica LEO, the partition [web,5] hasn't been created.
> (kafka.server.ReplicaManager)
> [2016-09-15 18:46:10,407] ERROR [ReplicaFetcherThread-4-1004], Error for
> partition [web,16] to broker 1004:org.apache.kafka.common.errors.
> UnknownTopicOrPartitionException: This server does not host this
> topic-partition. (kafka.server.ReplicaFetcherThread)
> [2016-09-15 18:46:10,472] ERROR [ReplicaFetcherThread-11-1003], Error for
> partition [web,23] to broker 1003:org.apache.kafka.common.errors.
> UnknownTopicOrPartitionException: This server does not host this
> topic-partition. (kafka.server.ReplicaFetcherThread)
> [2016-09-15 18:46:10,520] WARN [Replica Manager on Broker 1001]: While
> recording the replica LEO, the partition [web,29] hasn't been created.
> (kafka.server.ReplicaManager)
> [2016-09-15 18:46:10,667] ERROR [ReplicaFetcherThread-11-1004], Error for
> partition [web,8] to broker 1004:org.apache.kafka.common.errors.
> UnknownTopicOrPartitionException: This server does not host this
> topic-partition. (kafka.server.ReplicaFetcherThread)
> [2016-09-15 18:46:10,895] WARN [Replica Manager on Broker 1001]: While
> recording the replica LEO, the partition [web,21] hasn't been created.
> (kafka.server.ReplicaManager)
> [2016-09-15 18:46:10,931] WARN [Replica Manager on Broker 1001]: While
> recording the replica LEO, the partition [web,13] hasn't been created.
> (kafka.server.ReplicaManager)
>
> Indeed, when we inspect whether the partitions were actually created, we see
> a disparity with what's reported by the tool. (Many partitions are not
> present on the brokers yet - leading me to believe that the partitioning
> process is stuck.)
>
>
> | ZK Id | Broker Id      | According to Kafka Manager       | Actual Partitions Created                |
> |-------|----------------|----------------------------------|------------------------------------------|
> |  1001 | broker-1001... | 1,2,5,8,9,13,16,21,23,29,30,31   | web-1,web-16,web-2,web-23,web-8,web-9    |
> |  1002 | broker-1004... | 2,3,6,9,10,14,16,17,22,24,30,31  | web-10,web-16,web-3,web-9                |
> |  1003 | broker-1002... | 3,4,7,10,11,15,17,18,23,24,25,31 | web-10,web-11,web-3,web-4                |
> |  1004 | broker-1003... | 0,4,5,8,11,12,16,18,19,24,25,26  | web-12,web-18,web-19,web-25,web-26       |
> |  1005 | broker-1005... | 1,5,6,9,12,13,17,19,20,25,26,27  | web-12,web-13,web-19,web-26,web-27,web-5 |
> |  1006 | broker-1006... | 2,6,7,10,13,14,18,20,21,26,27,28 | web-27                                   |
> |  1007 | broker-1007... | 0,3,7,11,14,15,19,21,22,27,28,29 | web-21,web-22,web-7                      |
> |  1008 | broker-1008... | 0,1,4,8,12,15,20,22,23,28,29,30  | web-1,web-15,web-22,web-29,web-8         |
>
> Finally, when we try to repartition the topic we get:
> "Yikes! Partition reassignment currently in progress for. Aborting
> operation" (This seems expected because the initial partitioning has not
> completed for 3 days - so clearly it's stuck somewhere).
>
> We have created topics with the Kafka-Manager tool before without issues,
> but this has started happening consistently for the past 2-3 topics that we
> tried creating through the tool.
> Any help on this is greatly appreciated.
>
> --
> Regards,
> Apurva
>



-- 
Regards,
Apurva


Benchmarking kafka performance

2016-09-19 Thread Vadim Keylis
Good morning. Which benchmarking tools should we use to compare the performance
of the 0.8 and 0.10 versions? Which metrics should we monitor?

Thanks in advance,
Vadim


RE: Kafka usecase

2016-09-19 Thread Ghosh, Achintya (Contractor)
Please find my response here.

1. Kafka can be used as a message store.
2. What is the message arrival rate per second? 20 per sec
3. What is the SLA for the messages to be processed? 500 ms per message
4. If your messages arrive faster than they are consumed, you will get a 
backlog of messages. In that case, you may need to grow your cluster so that 
more messages are processed in parallel.
 Do you mean creating more partitions here, or is there anything else we need to do?
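As a rough back-of-the-envelope check (assuming the numbers above: ~20 messages
per second arriving and 1-2 seconds of business logic per message): 20 msg/s x
1.5 s/msg = ~30 messages being processed at any moment, so on the order of 30
or more partitions, with a matching number of consumer instances/threads, would
be needed just to keep up with the arrival rate. Note also that a 500 ms
per-message SLA cannot be met by adding partitions alone if the business logic
itself takes 1-2 seconds; parallelism helps throughput, not per-message latency.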

-Original Message-
From: Lohith Samaga M [mailto:lohith.sam...@mphasis.com] 
Sent: Monday, September 19, 2016 12:24 AM
To: users@kafka.apache.org
Cc: d...@kafka.apache.org
Subject: RE: Kafka usecase

Hi Achintya,
1. Kafka can be used as a message store.
2. What is the message arrival rate per second?
3. What is the SLA for the messages to be processed?
4. If your messages arrive faster than they are consumed, you will get 
a backlog of messages. In that case, you may need to grow your cluster so that 
more messages are processed in parallel.

Best regards / Mit freundlichen Grüßen / Sincères salutations M. Lohith Samaga



-Original Message-
From: Ghosh, Achintya (Contractor) [mailto:achintya_gh...@comcast.com]
Sent: Monday, September 19, 2016 08.39
To: users@kafka.apache.org
Cc: d...@kafka.apache.org
Subject: Kafka usecase

Hi there,

We have a use case where we run a lot of business logic to process each message,
and sometimes it takes 1-2 seconds. Will Kafka be a fit for our use case?

Thanks
Achintya




RE: any update on this?

2016-09-19 Thread Martin Gainty
Jens/Kant,
I am not aware of any shortfall with ZooKeeper, so perhaps you can suggest
advantages of Consul vs ZooKeeper?
Maven (I am building, testing and running Kafka internally with Maven)
implements wagon-providers for URLConnection vs HttpURLConnection wagons:
https://maven.apache.org/guides/mini/guide-wagon-providers.html
I am thinking a network provider should work for integrating an external
network provider. How would you architect this integration?

Would a configurable network provider, such as maven-wagon-provider, work for
Kafka?

Martin

> From: kanth...@gmail.com
> To: users@kafka.apache.org
> Subject: Re: any update on this?
> Date: Mon, 19 Sep 2016 09:41:10 +
> 
> Yes, of course the goal shouldn't be moving towards Consul. It should just be
> flexible enough for users to pick any distributed coordination system.
>  
> 
> 
> 
> 
> 
> On Mon, Sep 19, 2016 2:23 AM, Jens Rantil jens.ran...@tink.se
> wrote:
> I think I read somewhere that the long-term goal is to make Kafka
> 
> independent of Zookeeper altogether. Maybe not worth spending time on
> 
> migrating to Consul in that case.
> 
> 
> 
> 
> Cheers,
> 
> Jens
> 
> 
> 
> 
> On Sat, Sep 17, 2016 at 10:38 PM Jennifer Fountain 
> 
> wrote:
> 
> 
> 
> 
> > +2 watching.
> 
> >
> 
> > On Sat, Sep 17, 2016 at 2:45 AM, kant kodali  wrote:
> 
> >
> 
> > > https://issues.apache.org/jira/browse/KAFKA-1793
> 
> > > It would be great to use Consul instead of Zookeeper for Kafka and I
> 
> > think
> 
> > > it
> 
> > > would benefit Kafka a lot from the exponentially growing consul
> 
> > community.
> 
> >
> 
> >
> 
> >
> 
> >
> 
> > --
> 
> >
> 
> >
> 
> > Jennifer Fountain
> 
> > DevOPS
> 
> >
> 
> -- 
> 
> 
> 
> 
> Jens Rantil
> 
> Backend Developer @ Tink
> 
> 
> 
> 
> Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
> 
> For urgent matters you can reach me at +46-708-84 18 32.
  

Re: Time of derived records in Kafka Streams

2016-09-19 Thread Eno Thereska
Hi Elias,

- out of order records: the timestamp is that of the out of order record, i.e., 
time goes backwards sometimes
- joins: the same, the timestamp could be that of either record.

We'll update the docs, thanks for your question.
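For completeness: the per-record timestamps that feed into the behaviour above
come from the configured TimestampExtractor, so they can be customised. A
minimal sketch (the class name and fallback policy are assumptions, not a
recommendation):

import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.processor.TimestampExtractor;

// Sketch of a custom extractor: use the record's embedded timestamp when it is
// present, otherwise fall back to wall-clock time.
public class WallClockFallbackExtractor implements TimestampExtractor {

    @Override
    public long extract(ConsumerRecord<Object, Object> record) {
        long ts = record.timestamp();
        return ts >= 0 ? ts : System.currentTimeMillis();
    }

    // How it would be wired into a streams application's configuration.
    public static Properties exampleConfig() {
        Properties props = new Properties();
        props.put(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG,
                WallClockFallbackExtractor.class.getName());
        return props;
    }
}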

Eno

> On 17 Sep 2016, at 00:43, Elias Levy  wrote:
> 
> On Sat, Sep 10, 2016 at 9:17 AM, Eno Thereska 
> wrote:
> 
>> 
>> For aggregations, the timestamp will be that of the latest record being
>> aggregated.
>> 
> 
> How does that account for out of order records?
> 
> What about kstream-kstream joins?  The output from the join could be
> triggered by a record received from either stream depending on the order
> they are received and processed.  If the timestamp of the output is just
> the timestamp of the latest received record, then it seems that the
> timestamp could be that of either record.  Although I suppose that the best
> effort stream synchronization effort that Kafka Streams attempts means that
> usually the timestamp will be that of the later record.



Re: any update on this?

2016-09-19 Thread kant kodali
Yes, of course the goal shouldn't be moving towards Consul. It should just be
flexible enough for users to pick any distributed coordination system.
 





On Mon, Sep 19, 2016 2:23 AM, Jens Rantil jens.ran...@tink.se
wrote:
I think I read somewhere that the long-term goal is to make Kafka

independent of Zookeeper altogether. Maybe not worth spending time on

migrating to Consul in that case.




Cheers,

Jens




On Sat, Sep 17, 2016 at 10:38 PM Jennifer Fountain 

wrote:




> +2 watching.

>

> On Sat, Sep 17, 2016 at 2:45 AM, kant kodali  wrote:

>

> > https://issues.apache.org/jira/browse/KAFKA-1793

> > It would be great to use Consul instead of Zookeeper for Kafka and I

> think

> > it

> > would benefit Kafka a lot from the exponentially growing consul

> community.

>

>

>

>

> --

>

>

> Jennifer Fountain

> DevOPS

>

-- 




Jens Rantil

Backend Developer @ Tink




Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden

For urgent matters you can reach me at +46-708-84 18 32.

Re: any update on this?

2016-09-19 Thread Jens Rantil
I think I read somewhere that the long-term goal is to make Kafka
independent of Zookeeper altogether. Maybe not worth spending time on
migrating to Consul in that case.

Cheers,
Jens

On Sat, Sep 17, 2016 at 10:38 PM Jennifer Fountain 
wrote:

> +2 watching.
>
> On Sat, Sep 17, 2016 at 2:45 AM, kant kodali  wrote:
>
> > https://issues.apache.org/jira/browse/KAFKA-1793
> > It would be great to use Consul instead of Zookeeper for Kafka and I
> think
> > it
> > would benefit Kafka a lot from the exponentially growing consul
> community.
>
>
>
>
> --
>
>
> Jennifer Fountain
> DevOPS
>
-- 

Jens Rantil
Backend Developer @ Tink

Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
For urgent matters you can reach me at +46-708-84 18 32.


Kafka with Flume

2016-09-19 Thread Baris Akgun (Garanti Teknoloji)
Hi,

We are using Flume with a Kafka source. Flume reads JSON messages from a Kafka
topic and parses them with a given morphline file. If a message that is not
well formed with respect to the morphline JSON structure is added to the Kafka
topic, Flume (the consumer) keeps trying to consume that bad message. When we
looked at the Kafka topic offset, we noticed that Flume cannot advance the
offset when a malformed message arrives. Is there any parameter in the Flume
configuration file to skip the bad message and update the Kafka offset?

Thanks.

Barış Akgün
