I need some help with the production server architecture

2016-11-28 Thread Sachin Mittal
Hi,
Some time back I was informed on this group that in production we should
never run Kafka and ZooKeeper on the same physical machine. Based on that I have a
question on how to divide the server nodes we have to run ZooKeeper and the
Kafka brokers.

I have the following setup:
Data center 1
Lan 1 (3 VMs)
192.168.xx.yy1
192.168.xx.yy2
192.168.xx.yy3
Right now we are running a cluster of 3 Node.js web servers here.
These collect data from the web and write to the Kafka queue. Each VM has 70 GB of
space.

Lan 2 (3 VMs)
192.168.zz.aa1
192.168.zz.aa2
192.168.zz.aa3
These serve as the cluster for our database server. Each VM has 400 GB of
space.

Data center 2
Lan 1 (3 VMs)
192.168.yy.bb1
192.168.yy.bb2
192.168.yy.bb3
Three new machines where we plan to run a cluster of a new database that will
serve as the sink of our Kafka Streams applications. Each VM has 400 GB of space.
These have connectivity only to LAN 2 of data center 1, with a 100 MB/s data
transfer rate.

Each VM has a 4-core processor and 16 GB of RAM. They all run Linux.

Now I would like my topics to be replicated with a factor of 3. Since we
don't foresee a large volume of data, I don't want them to be partitioned.

Also, we would like one server to be used as a streaming application server,
where we can run one or more Kafka Streams applications to process the
topics and write to the new database.

So please let me know what a suitable division would be for running the brokers
and ZooKeeper.


Thanks
Sachin


Question about state store

2016-11-28 Thread Simon Teles

Hello all,

I'm not sure how to do the following use-case, if anyone can help :)

- I have an admin UI where the admin can choose which items are allowed in
the application


- When the admin chooses an item, it pushes an object to a Kafka topic
test.admin.item, with the following shape: {"id":"1234", "name":"toto"}


- I have a Kafka Streams application which is connected to a topic test.item and
receives all the item updates from our DB.


- When the stream receives an item, it needs to check whether it is allowed by
the admin. If yes, it saves it in another DB and pushes a
notification to another topic if the save to the DB succeeds. If not, it
does nothing.


My idea is that when I start the stream, I "push" all the contents of the
topic test.admin.item into a state store, and when I receive a new item on
test.item, I check its id against the state store.


Is this the proper way?

My problem is:

-> If I use the TopologyBuilder, I don't know how I can load the topic into
a state store at startup in order to use it later in a Processor.


-> With the KStreamBuilder I can use:

KStreamBuilder builder = new KStreamBuilder();

// We create a KTable from topic test.admin.item and load it into the store
// "storeAdminItem"
builder.table(Serdes.String(), new SerdeItem(), "test.admin.item",
        "storeAdminItem");


-> But with the KStreamBuilder, I don't know how I can access the state store
when I map/filter/etc.?


If you can help me figure it out, it would be much appreciated.

Thanks,

Regards,



Two possible issues with 2.11 0.10.1.0

2016-11-28 Thread Jon Yeargers
Ran into these two internal exceptions today.  Likely they are known or my
fault but I'm reporting them anyway:

Exception in thread "StreamThread-1" java.lang.NullPointerException

at
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:304)

at
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:218)



and this one:


Exception in thread "StreamThread-1"
org.apache.kafka.streams.errors.StreamsException: stream-thread
[StreamThread-1] Failed to rebalance

at
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:410)

at
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:242)

Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error
opening store table_stream-20161129 at location
/tmp/kafka-streams/BreakoutProcessor/0_0/table_stream/table_stream-20161129

at
org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:196)

at
org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:158)

at
org.apache.kafka.streams.state.internals.RocksDBWindowStore$Segment.openDB(RocksDBWindowStore.java:72)

at
org.apache.kafka.streams.state.internals.RocksDBWindowStore.getOrCreateSegment(RocksDBWindowStore.java:402)

at
org.apache.kafka.streams.state.internals.RocksDBWindowStore.putInternal(RocksDBWindowStore.java:333)

at
org.apache.kafka.streams.state.internals.RocksDBWindowStore.access$100(RocksDBWindowStore.java:51)

at
org.apache.kafka.streams.state.internals.RocksDBWindowStore$2.restore(RocksDBWindowStore.java:212)

at
org.apache.kafka.streams.processor.internals.ProcessorStateManager.restoreActiveState(ProcessorStateManager.java:235)

at
org.apache.kafka.streams.processor.internals.ProcessorStateManager.register(ProcessorStateManager.java:198)

at
org.apache.kafka.streams.processor.internals.ProcessorContextImpl.register(ProcessorContextImpl.java:123)

at
org.apache.kafka.streams.state.internals.RocksDBWindowStore.init(RocksDBWindowStore.java:206)

at
org.apache.kafka.streams.state.internals.MeteredWindowStore.init(MeteredWindowStore.java:66)

at
org.apache.kafka.streams.state.internals.CachingWindowStore.init(CachingWindowStore.java:64)

at
org.apache.kafka.streams.processor.internals.AbstractTask.initializeStateStores(AbstractTask.java:81)

at
org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:120)

at
org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:633)

at
org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:660)

at
org.apache.kafka.streams.processor.internals.StreamThread.access$100(StreamThread.java:69)

at
org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:124)

at
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:228)

at
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:313)

at
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:277)

at
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:259)

at
org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1013)

at
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:979)

at
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:407)

... 1 more

Caused by: org.rocksdb.RocksDBException: IO error: lock
/tmp/kafka-streams/BreakoutProcessor/0_0/table_stream/table_stream-20161129/LOCK:
No locks available

at org.rocksdb.RocksDB.open(Native Method)

at org.rocksdb.RocksDB.open(RocksDB.java:184)

at
org.apache.kafka.streams.state.internals.RocksDBStore.openDB(RocksDBStore.java:189)

... 26 more


Re: while publishing message need to add multiple keys in a single message

2016-11-28 Thread Tauzell, Dave
Kafka doesn't have the concept of message headers like some other messaging 
systems.

You will have to create a payload that contains these  headers and whatever 
bytes you are sending.
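
For example (a sketch only; the field names are taken from the original question
below, the JSON-envelope format and topic name are assumptions rather than any Kafka
convention, and Jackson is used here just as a convenient JSON library), the headers
and payload can be wrapped into one value:

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EnvelopeProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // "Headers" and the actual payload combined into a single envelope object.
        Map<String, Object> envelope = new LinkedHashMap<>();
        envelope.put("ID", "123456");
        envelope.put("TYPE", "xyz");
        envelope.put("EVENTE", "A");
        envelope.put("OPERATION", "Insert");
        envelope.put("createTimeStampt", "2016-11-24T19:41:23.354Z");
        envelope.put("updatedTimeStamp", "2016-11-30T19:41:23.354Z");
        envelope.put("payload", "the actual message body goes here");

        String value = new ObjectMapper().writeValueAsString(envelope);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "123456", value));  // topic is a placeholder
        }
    }
}

On the consumer side the value is parsed back and the header fields are read before
handling the payload.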

Dave

> On Nov 28, 2016, at 16:47, Prasad Dls  wrote:
>
> Hi,
>
> While publishing each message (single message) to Kafka, I need to add
> below headers/key
>
>
> ID:  123456
> TYPE:  xyz
> EVENTE: A
> OPERATION: Insert
> createTimeStampt: 2016-11-24T19:41:23.354Z
> updatedTimeStamp: 2016-11-30T19:41:23.354Z
>
> Please help me on this, how can i add all these into single message.
>
>
> Thanks
> Prasad


HOW TO GET KAFKA CURRENT TOPIC GROUP OFFSET

2016-11-28 Thread ??????
Hi! Awaiting your answer.
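
For reference, the committed offset of a group for a given topic partition can be
read with the Java consumer's committed() call (a sketch; the broker address, group,
topic and partition below are placeholders), and the kafka-consumer-groups.sh tool's
--describe --group option prints the same per-partition information:

import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ShowCommittedOffset {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        props.put("group.id", "my-group");                   // the group to inspect
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("my-topic", 0);   // placeholder topic/partition
            OffsetAndMetadata committed = consumer.committed(tp);    // last committed offset for the group
            System.out.println(tp + " committed offset: "
                    + (committed == null ? "none" : committed.offset()));
        }
    }
}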

Re: Kafka Streaming

2016-11-28 Thread Mohit Anchlia
I just cloned 3.1x and tried to run a test. I am still seeing rocksdb error:

Exception in thread "StreamThread-1" java.lang.UnsatisfiedLinkError:
C:\Users\manchlia\AppData\Local\Temp\librocksdbjni108789031344273.dll:
Can't find dependent libraries


On Mon, Oct 24, 2016 at 11:26 AM, Matthias J. Sax 
wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA512
>
> It's a client issue... But CP 3.1 should be out in about 2 weeks...
> Of course, you can use Kafka 0.10.1.0 for now. It was released last
> week and does contain the fix.
>
> - -Matthias
>
> On 10/24/16 9:19 AM, Mohit Anchlia wrote:
> > Would this be an issue if I connect to a remote Kafka instance
> > running on the Linux box? Or is this a client issue? What's RocksDB
> > used for, to keep state?
> >
> > On Mon, Oct 24, 2016 at 12:08 AM, Matthias J. Sax
> >  wrote:
> >
> > Kafka 0.10.1.0 which was release last week does contain the fix
> > already. The fix will be in CP 3.1 coming up soon!
> >
> > (sorry that I did mix up versions in a previous email)
> >
> > -Matthias
> >
> > On 10/23/16 12:10 PM, Mohit Anchlia wrote:
>  So if I get it right I will not have this fix until 4
>  months? Should I just create my own example with the next
>  version of Kafka?
> 
>  On Sat, Oct 22, 2016 at 9:04 PM, Matthias J. Sax
>   wrote:
> 
>  Current version is 3.0.1; CP 3.1 should be released in the next
>  weeks
> 
>  So CP 3.2 should be there in about 4 months (Kafka follows a
>  time-based release cycle of 4 months and CP usually aligns with
>  Kafka releases)
> 
>  -Matthias
> 
> 
>  On 10/20/16 5:10 PM, Mohit Anchlia wrote:
> >>> Any idea of when 3.2 is coming?
> >>>
> >>> On Thu, Oct 20, 2016 at 4:53 PM, Matthias J. Sax
> >>>  wrote:
> >>>
> >>> No problem. Asking questions is the purpose of mailing
> >>> lists. :)
> >>>
> >>> The issue will be fixed in next version of examples
> >>> branch.
> >>>
> >>> Examples branch is build with CP dependency and not
> >>> with Kafka dependency. CP-3.2 is not available yet;
> >>> only Kafka 0.10.1.0. Nevertheless, they should work
> >>> with Kafka dependency, too. I never tried it, but you
> >>> should give it a shot...
> >>>
> >>> But you should use example master branch because of
> >>> API changes from 0.10.0.x to 0.10.1 (and thus, changing
> >>> CP-3.1 to 0.10.1.0 will not be compatible and not
> >>> compile, while changing CP-3.2-SNAPSHOT to 0.10.1.0
> >>> should work -- hopefully ;) )
> >>>
> >>>
> >>> -Matthias
> >>>
> >>> On 10/20/16 4:02 PM, Mohit Anchlia wrote:
> >> So this issue I am seeing is fixed in the next
> >> version of example branch? Can I change my pom to
> >> point it the higher version of Kafka if that is
> >> the issue? Or do I need to wait until new branch
> >> is made available? Sorry lot of questions :)
> >>
> >> On Thu, Oct 20, 2016 at 3:56 PM, Matthias J. Sax
> >>  wrote:
> >>
> >> The branch is 0.10.0.1 and not 0.10.1.0 (sorry
> >> for so many zeros and ones -- super easy to mix
> >> up)
> >>
> >> However, examples master branch uses
> >> CP-3.1-SNAPSHOT (ie, Kafka 0.10.1.0) -- there
> >> will be a 0.10.1 examples branch, after CP-3.1
> >> was released
> >>
> >>
> >> -Matthias
> >>
> >> On 10/20/16 3:48 PM, Mohit Anchlia wrote:
> > I just now cloned this repo. It seems to be
> > using 10.1
> >
> > https://github.com/confluentinc/examples
> > and running examples in
> > https://github.com/confluentinc/examples/tree/kafka-0.10.0.1-cp-3.0.1/kafka-streams
> >
> > On Thu, Oct 20, 2016 at 3:10 PM, Michael
> > Noll  wrote:
> >
> >> I suspect you are running Kafka 0.10.0.x
> >> on Windows? If so, this is a known issue
> >> that is fixed in Kafka 0.10.1 that was
> >> just released today.
> >>
> >> Also: which examples are you referring
> >> to? And, to confirm: which git branch /
> >> Kafka version / OS in case my guess above
> >> was wrong.
> >>
> >>
> >> On Thursday, October 20, 2016, Mohit
> >> Anchlia  wrote:
> >>
> >>> I am trying to run the examples from
> >>> git. While running the wordcount
> >>> example I see this error:
> >>>
> 

Re: Kafka Connect consumer not using the group.id provided in connect-distributed.properties

2016-11-28 Thread Srikrishna Alla
Hi,

I am using Kafka Connect Sink Connector to consume messages from a Kafka
topic in a secure Kafka cluster. I have provided the group.id in
connect-distributed.properties. I am using security.protocol as
SASL_PLAINTEXT.

Here is definition of group id in connect-distributed.properties -
bootstrap.servers=:6667
group.id=alert
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.storage.topic=connect-offsets
offset.flush.interval.ms=1
config.storage.topic=connect-configs
security.protocol=SASL_PLAINTEXT
sasl.mechanism=GSSAPI
sasl.kerberos.service.name=kafka
producer.sasl.kerberos.service.name=kafka
producer.security.protocol=SASL_PLAINTEXT
producer.sasl.mechanism=GSSAPI
consumer.sasl.kerberos.service.name=kafka
consumer.security.protocol=SASL_PLAINTEXT
consumer.sasl.mechanism=GSSAPI

In the Kafka Connect log, it's picking up the group.id in the properties
file -
2016-11-28 15:49:40 INFO  DistributedConfig : 165 - DistributedConfig
values:
request.timeout.ms = 4
retry.backoff.ms = 100
..
group.id = alert
metric.reporters = []
ssl.truststore.type = JKS
cluster = connect

But, when it's instantiating a sink connector task, it's using a new
consumer group -
2016-11-28 15:49:43 INFO  ConsumerConfig : 165 - ConsumerConfig values:
request.timeout.ms = 4
check.crcs = true
retry.backoff.ms = 100
..
group.id = connect-sink-connector
enable.auto.commit = false
metric.reporters = []
ssl.truststore.type = JKS
send.buffer.bytes = 131072

Why is this happening? I have not faced this issue on a non-secure cluster.
Is there any configuration property I am missing?

Thanks in advance for your help.
-Sri



On Mon, Nov 28, 2016 at 6:52 PM, Srikrishna Alla 
wrote:

> Hi,
>
> I am using Kafka Connect Sink Connector to consume messages from a Kafka
> topic in a secure Kafka cluster. I have provided the group.id in
> connect-distributed.properties. I am using security.protocol as
> SASL_PLAINTEXT.
>
> Here is definition of group id in connect-distributed.properties -
> [sa9726@clpd355 conf]$ less connect-distributed.properties |grep -i group
> group.id=alert
>
> In the Kafka Connect log, its picking up the group.id in the properties
> file -
> 2016-11-28 15:49:40 INFO  DistributedConfig : 165 - DistributedConfig
> values:
> request.timeout.ms = 4
> retry.backoff.ms = 100
>
>
>
>


Kafka Connect consumer not using the group.id provided in connect-distributed.properties

2016-11-28 Thread Srikrishna Alla
Hi,

I am using Kafka Connect Sink Connector to consume messages from a Kafka
topic in a secure Kafka cluster. I have provided the group.id in
connect-distributed.properties. I am using security.protocol as
SASL_PLAINTEXT.

Here is definition of group id in connect-distributed.properties -
[sa9726@clpd355 conf]$ less connect-distributed.properties |grep -i group
group.id=alert

In the Kafka Connect log, it's picking up the group.id in the properties
file -
2016-11-28 15:49:40 INFO  DistributedConfig : 165 - DistributedConfig
values:
request.timeout.ms = 4
retry.backoff.ms = 100


Re: Initializing StateStores takes *really* long for large datasets

2016-11-28 Thread Frank Lyaruu
I'll write an update on where I am now.

I've got about 40 'primary' topics, some small, some up to about 10M
messages,
and about 30 internal topics, divided over 6 stream instances, all running
in a single
app, talking to a 3 node Kafka cluster.

I use a single thread per stream instance, as my prime concern is now to
get it
to run stable, rather than optimizing performance.

My biggest issue was that after a few hours my application started to slow
down
to ultimately freeze up or crash. It turned out that RocksDb consumed all
my
memory, which I overlooked as it was off-heap.

I was fooling around with RocksDb settings a bit but I had missed the most
important
one:

BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
tableConfig.setBlockCacheSize(BLOCK_CACHE_SIZE);
tableConfig.setBlockSize(BLOCK_SIZE);
options.setTableFormatConfig(tableConfig);

The block cache size defaults to a whopping 100Mb per store, and that gets
expensive
fast. I reduced it to a few megabytes. My data size is so big that I doubt
it is very effective
anyway. Now it seems more stable.
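
In case it helps others, a minimal sketch of how such a setter is wired in (the
cache and block sizes below are arbitrary placeholders, not recommendations):

import java.util.Map;

import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Options;

public class SmallCacheConfigSetter implements RocksDBConfigSetter {

    @Override
    public void setConfig(final String storeName, final Options options,
                          final Map<String, Object> configs) {
        BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
        tableConfig.setBlockCacheSize(2 * 1024 * 1024L);   // 2 MB instead of the 100 MB default
        tableConfig.setBlockSize(4 * 1024L);                // placeholder block size
        options.setTableFormatConfig(tableConfig);
    }
}

// registered via the Streams configuration:
// props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, SmallCacheConfigSetter.class);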

I'd say that a smaller default makes sense, especially because the failure
case is
so opaque (running all tests just fine but with a serious dataset it dies
slowly)

Another thing I see is that while starting all my instances, some are quick
and some take
time (makes sense as the data size varies greatly), but as more instances
start up, they
start to use more and more CPU I/O and network, that the initialization of
the bigger ones
takes even longer, increasing the chance that one of them takes longer than
the
MAX_POLL_INTERVAL_MS_CONFIG, and then all hell breaks loose. Maybe we can
separate the 'initialize' and 'start' step somehow.

In this case we could log better: If initialization is taking longer than
the timeout, it ends up
being reassigned (in my case to the same instance) and then it errors out
on being unable
to lock the state dir. That message isn't too informative as the timeout is
the actual problem.

regards, Frank


On Mon, Nov 28, 2016 at 8:01 PM, Guozhang Wang  wrote:

> Hello Frank,
>
> How many instances do you have in your apps and how many threads did you
> use per thread? Note that besides the topology complexity (i.e. number of
> state stores, number of internal topics etc) the (re-)initialization
> process is depending on the underlying consumer's membership protocol, and
> hence its rebalance latency could be longer with larger groups.
>
> We have been optimizing our rebalance latency due to state store migration
> and restoration in the latest release, but for now the re-initialization
> latency is still largely depends on 1) topology complexity regarding to
> state stores and 2) number of input topic partitions and instance / threads
> in the application.
>
>
> Guozhang
>
>
> On Sat, Nov 26, 2016 at 12:57 AM, Damian Guy  wrote:
>
> > Hi Frank,
> >
> > If you are running on a single node then the RocksDB state should be
> > re-used by your app. However, it relies on the app being cleanly shutdown
> > and the existence of ".checkpoint" files in the state directory for the
> > store, .i.e, /tmp/kafka-streams/application-id/0_0/.checkpoint. If the
> > file
> > doesn't exist then the entire state will be restored from the changelog -
> > which could take some time. I suspect this is what is happening?
> >
> > As for the RocksDB memory settings, yes the off heap memory usage does
> > sneak under the radar. There is a memory management story for Kafka
> Streams
> > that is yet to be started. This would involve limiting the off-heap
> memory
> > that RocksDB uses.
> >
> > Thanks,
> > Damian
> >
> > On Fri, 25 Nov 2016 at 21:14 Frank Lyaruu  wrote:
> >
> > > I'm running all on a single node, so there is no 'data mobility'
> > involved.
> > > So if Streams does not use any existing data, I might as well wipe the
> > > whole RocksDb before starting, right?
> > >
> > > As for the RocksDb tuning, I am using a RocksDBConfigSetter, to reduce
> > the
> > > memory usage a bit:
> > >
> > > options.setWriteBufferSize(300);
> > > options.setMaxBytesForLevelBase(3000);
> > > options.setMaxBytesForLevelMultiplier(3);
> > >
> > > I needed to do this as my 16Gb machine would die otherwise but I
> honestly
> > > was just reducing values more or less randomly until it wouldn't fall
> > over.
> > > I have to say this is a big drawback of Rocks, I monitor Java memory
> > usage
> > > but this just sneaks under the radar as it is off heap, and it isn't
> very
> > > clear what the implications are of different settings, as I can't says
> > > something like the Xmx heap setting, meaning: Take whatever you need up
> > to
> > > this maximum. Also, if I get this right, in the long run, as the data
> set
> > > changes and grows, I can never be sure it won't take too much memory.
> > >
> > > I get the impression I'll be better off with an external 

Kafka Clients Survey

2016-11-28 Thread Gwen Shapira
Hey Kafka Community,

I'm trying to take a pulse on the current state of the Kafka clients ecosystem.
Which languages are most popular in our community? What does the
community value in clients?

You can help me out by filling in the survey:
https://goo.gl/forms/cZg1CJyf1PuqivTg2

I will lock the survey and publish results by Jan 15.

Gwen


while publishing message need to add multiple keys in a single message

2016-11-28 Thread Prasad Dls
Hi,

While publishing each message (a single message) to Kafka, I need to add the
below headers/keys:


ID:  123456
TYPE:  xyz
EVENTE: A
OPERATION: Insert
createTimeStampt: 2016-11-24T19:41:23.354Z
updatedTimeStamp: 2016-11-30T19:41:23.354Z

Please help me with this: how can I add all of these into a single message?


Thanks
Prasad


Re: kafka 0.10.1 ProcessorTopologyTestDriver issue with .map and .groupByKey.count

2016-11-28 Thread Matthias J. Sax
Hamid,

would you mind creating a Jira? Thanks.


-Matthias

On 11/28/16 9:36 AM, Guozhang Wang wrote:
> Damian, Hamid:
> 
> I looked at the source code and suspect that it is because of the
> auto-repartitioning which causes the topology to not directly forward the
> record to the child processors but send to an intermediate topic. In our
> tests we only do "groupByKey" without map, and hence auto-repartitioning
> will not be introduced.
> 
> What bothers me is that, for KStreamTestDriver it should have the similar
> issue as well (it does not handle intermediate topic either), but Hamid
> reports it actually works fine.
> 
> Anyways, I think there is an issue with at least ProcessorTopologyTestDriver,
> and very likely with KStreamTestDriver as well, and we should file a JIRA
> and continue investigating and fixing it.
> 
> 
> Guozhang
> 
> 
> On Thu, Nov 24, 2016 at 7:34 AM, Hamidreza Afzali <
> hamidreza.afz...@hivestreaming.com> wrote:
> 
>> Hi Damian,
>>
>> It processes correctly when using KStreamTestDriver.
>>
>> Best,
>> Hamid
>>
>>
> 
> 





Re: Kafka 0.10.1.0 consumer group rebalance broken?

2016-11-28 Thread Radek Gruchalski
There has been plenty of changes in the GroupCoordinator and co between
these two releases:
https://github.com/apache/kafka/commit/40b1dd3f495a59abef8a0cba5450526994c92c04#diff-96e4cf31cd54def6b2fb3f7a118c1db3

It might be related to this:
https://github.com/apache/kafka/commit/4b003d8bcfffded55a00b8ecc9eed8eb373fb6c7#diff-d2461f78377b588c4f7e2fe8223d5679R633

If your group is empty, the group is marked dead; when the group is dead,
no matter what you do, it'll reply with:
https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/GroupCoordinator.scala#L353

Food for thought.

–
Best regards,
Radek Gruchalski
ra...@gruchalski.com


On November 28, 2016 at 9:04:16 PM, Bart Vercammen (b...@cloutrix.com)
wrote:

Hi,

It seems that consumer group rebalance is broken in Kafka 0.10.1.0 ?
When running a small test-project :
- consumers running in own JVM (with different 'client.id')
- producer running in own JVM
- kafka broker : the embedded kafka : KafkaServerStartable

It looks like the consumers lose their heartbeat after a rebalance gets
triggered.
In the logs on the consumer I can actually see that the heartbeat failed
due to "invalid member_id"

When running the exact same code on a 0.10.0.1 setup, all works perfectly.
Anyone else seen this problem?

Greets,
Bart


Kafka 0.10.1.0 consumer group rebalance broken?

2016-11-28 Thread Bart Vercammen
Hi,

It seems that consumer group rebalance is broken in Kafka 0.10.1.0 ?
When running a small test-project :
- consumers running in own JVM (with different 'client.id')
- producer running in own JVM
- kafka broker : the embedded kafka : KafkaServerStartable

It looks like the consumers lose their heartbeat after a rebalance gets
triggered.
In the logs on the consumer I can actually see that the heartbeat failed
due to "invalid member_id"

When running the exact same code on a 0.10.0.1 setup, all works perfectly.
Anyone else seen this problem?

Greets,
Bart


Re: Initializing StateStores takes *really* long for large datasets

2016-11-28 Thread Guozhang Wang
Hello Frank,

How many instances do you have in your apps and how many threads did you
use per instance? Note that besides the topology complexity (i.e. number of
state stores, number of internal topics, etc.) the (re-)initialization
process depends on the underlying consumer's membership protocol, and
hence its rebalance latency could be longer with larger groups.

We have been optimizing our rebalance latency due to state store migration
and restoration in the latest release, but for now the re-initialization
latency still largely depends on 1) topology complexity with regard to
state stores and 2) the number of input topic partitions and instances/threads
in the application.


Guozhang


On Sat, Nov 26, 2016 at 12:57 AM, Damian Guy  wrote:

> Hi Frank,
>
> If you are running on a single node then the RocksDB state should be
> re-used by your app. However, it relies on the app being cleanly shutdown
> and the existence of ".checkpoint" files in the state directory for the
> store, .i.e, /tmp/kafka-streams/application-id/0_0/.checkpoint. If the
> file
> doesn't exist then the entire state will be restored from the changelog -
> which could take some time. I suspect this is what is happening?
>
> As for the RocksDB memory settings, yes the off heap memory usage does
> sneak under the radar. There is a memory management story for Kafka Streams
> that is yet to be started. This would involve limiting the off-heap memory
> that RocksDB uses.
>
> Thanks,
> Damian
>
> On Fri, 25 Nov 2016 at 21:14 Frank Lyaruu  wrote:
>
> > I'm running all on a single node, so there is no 'data mobility'
> involved.
> > So if Streams does not use any existing data, I might as well wipe the
> > whole RocksDb before starting, right?
> >
> > As for the RocksDb tuning, I am using a RocksDBConfigSetter, to reduce
> the
> > memory usage a bit:
> >
> > options.setWriteBufferSize(300);
> > options.setMaxBytesForLevelBase(3000);
> > options.setMaxBytesForLevelMultiplier(3);
> >
> > I needed to do this as my 16Gb machine would die otherwise but I honestly
> > was just reducing values more or less randomly until it wouldn't fall
> over.
> > I have to say this is a big drawback of Rocks, I monitor Java memory
> usage
> > but this just sneaks under the radar as it is off heap, and it isn't very
> > clear what the implications are of different settings, as I can't says
> > something like the Xmx heap setting, meaning: Take whatever you need up
> to
> > this maximum. Also, if I get this right, in the long run, as the data set
> > changes and grows, I can never be sure it won't take too much memory.
> >
> > I get the impression I'll be better off with an external store,
> something I
> > can monitor, tune and restart separately.
> >
> > But I'm getting ahead of myself. I'll wipe the data before I start, see
> if
> > that gets me any stability
> >
> >
> >
> >
> > On Fri, Nov 25, 2016 at 4:54 PM, Damian Guy 
> wrote:
> >
> > > Hi Frank,
> > >
> > > If you have run the app before with the same applicationId, completely
> > shut
> > > it down, and then restarted it again, it will need to restore all of
> the
> > > state which will take some time depending on the amount of data you
> have.
> > > In this case the placement of the partitions doesn't take into account
> > any
> > > existing state stores, so it might need to load quite a lot of data if
> > > nodes assigned certain partitions don't have that state-store (this is
> > > something we should look at improving).
> > >
> > > As for RocksDB tuning - you can provide an implementation of
> > > RocksDBConfigSetter via config: StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS
> > > it has a single method:
> > >
> > > public void setConfig(final String storeName, final Options options,
> > > final Map<String, Object> configs)
> > >
> > > in this method you can set various options on the provided Options
> > object.
> > > The options that might help in this case are:
> > > options.setWriteBufferSize(..)  - default in streams is 32MB
> > > options.setMaxWriteBufferNumer(..) - default in streams is 3
> > >
> > > However, i'm no expert on RocksDB and i suggest you have look at
> > > https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide for more
> > > info.
> > >
> > > Thanks,
> > > Damian
> > >
> > > On Fri, 25 Nov 2016 at 13:02 Frank Lyaruu  wrote:
> > >
> > > > @Damian:
> > > >
> > > > Yes, it ran before, and it has that 200gb blob worth of Rocksdb stuff
> > > >
> > > > @Svente: It's on a pretty high end san in a managed private cloud,
> I'm
> > > > unsure what the ultimate storage is, but I doubt there is a
> performance
> > > > problem there.
> > > >
> > > > On Fri, 25 Nov 2016 at 13:37, Svante Karlsson <
> svante.karls...@csi.se>
> > > > wrote:
> > > >
> > > > > What kind of disk are you using for the rocksdb store? ie spinning
> or
> > > > ssd?
> > > > >
> > > > > 2016-11-25 12:51 GMT+01:00 Damian Guy 

Re: KafkaStream: punctuate() never called even when data is received by process()

2016-11-28 Thread Guozhang Wang
Yes, we are considering differentiating the "process" and "punctuate" functions
as "data-driven" and "time-driven" computations respectively. That is, the
triggering of punctuate should NOT depend on the arrival of messages or the
messages' associated timestamps.

For now, I think periodically inserting "time markers" into the input
topics, as suggested by David, to get the current "data-driven" punctuate
function triggered is a good idea.

Guozhang
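
A bare-bones sketch of such a marker producer is below; the topic name, marker
payload, key and interval are all placeholders, and the downstream processors have
to recognise and skip these records:

import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.PartitionInfo;

public class TimeMarkerProducer {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        String topic = "test-input-topic";                   // placeholder input topic
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            while (true) {
                // send one marker to every partition so "stream time" advances everywhere
                List<PartitionInfo> partitions = producer.partitionsFor(topic);
                for (PartitionInfo p : partitions) {
                    String marker = "{\"marker\":true,\"ts\":" + System.currentTimeMillis() + "}";
                    producer.send(new ProducerRecord<>(topic, p.partition(), "__marker__", marker));
                }
                producer.flush();
                Thread.sleep(10_000L);   // marker interval: every 10 seconds
            }
        }
    }
}

The marker records' timestamps are what advance the per-partition stream time, so
punctuate() can fire even on partitions that otherwise receive no data.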

On Sat, Nov 26, 2016 at 1:39 PM, David Garcia  wrote:

> I know that the Kafka team is working on a new way to reason about time.
> My team's solution was to not use punctuate...but this only works if you
> have guarantees that all of the tasks will receive messages..if not all the
> partitions.  Another solution is to periodically send canaries to all
> partitions your app is listening to.  In either case it's a bandaid.  I
> know the team is aware of this bug and they are working on it.  Hopefully
> it will be addressed in 0.10.1.1
>
> Sent from my iPhone
>
> > On Nov 24, 2016, at 1:55 AM, shahab  wrote:
> >
> > Thanks for the comments.
> > @David: yes, I have a source which is reading data from two topics and
> one
> > of them were empty while the second one was loaded with plenty of data.
> > So what do you suggest to solve this ?
> > Here is snippet of my code:
> >
> > StreamsConfig config = new StreamsConfig(configProperties);
> > TopologyBuilder builder = new TopologyBuilder();
> > AppSettingsFetcher appSettingsFetcher = initAppSettingsFetcher();
> >
> > StateStoreSupplier company_bucket= Stores.create("CBS")
> >.withKeys(Serdes.String())
> >.withValues(Serdes.String())
> >.persistent()
> >.build();
> >
> > StateStoreSupplier profiles= Stores.create("PR")
> >.withKeys(Serdes.String())
> >.withValues(Serdes.String())
> >.persistent()
> >.build();
> >
> >
> > builder
> >.addSource("deltaSource", topicName, LoaderListener.
> LoadedDeltaTopic)
> >
> >.addProcessor("deltaProcess1", () -> new DeltaProcessor(
> >
> >), "deltaSource")
> >.addProcessor("deltaProcess2", () -> new LoadProcessor(
> >
> >), "deltaProcess1")
> >.addStateStore(profiles, "deltaProcess2", "deltaProcess1")
> >.addStateStore(company_bucket, "deltaProcess2", "deltaProcess1");
> >
> > KafkaStreams streams = new KafkaStreams(builder, config);
> >
> > streams.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler()
> {
> >@Override
> >public void uncaughtException(Thread t, Throwable e) {
> >e.printStackTrace();
> >}
> > });
> >
> > streams.start();
> >
> >
> >> On Wed, Nov 23, 2016 at 8:30 PM, David Garcia 
> wrote:
> >>
> >> If you are consuming from more than one topic/partition, punctuate is
> >> triggered when the “smallest” time-value changes.  So, if there is a
> >> partition that doesn’t have any more messages on it, it will always have
> >> the smallest time-value and that time value won’t change…hence punctuate
> >> never gets called.
> >>
> >> -David
> >>
> >> On 11/23/16, 1:01 PM, "Matthias J. Sax"  wrote:
> >>
> >>Your understanding is correct:
> >>
> >>Punctuate is not triggered base on wall-clock time, but based in
> >>internally tracked "stream time" that is derived from
> >> TimestampExtractor.
> >>Even if you use WallclockTimestampExtractor, "stream time" is only
> >>advance if there are input records.
> >>
> >>Not sure why punctuate() is not triggered as you say that you do have
> >>arriving data.
> >>
> >>Can you share your code?
> >>
> >>
> >>
> >>-Matthias
> >>
> >>
> >>>On 11/23/16 4:48 AM, shahab wrote:
> >>> Hello,
> >>>
> >>> I am using low level processor and I set the context.schedule(1),
> >>> assuming that punctuate() method is invoked every 10 sec .
> >>> I have set
> >>> configProperties.put(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG,
> >>> WallclockTimestampExtractor.class.getCanonicalName()) )
> >>>
> >>> Although data is keep coming to the topology (as I have logged the
> >> incoming
> >>> tuples to process() ),  punctuate() is never executed.
> >>>
> >>> What I am missing?
> >>>
> >>> best,
> >>> Shahab
> >>
> >>
> >>
> >>
>



-- 
-- Guozhang


Problem consuming using 0.10.1.0

2016-11-28 Thread Jon Yeargers
(caveat - was having this issue with 0.10.0 but thought an update might
help)

App has a simple layout: consume -> aggregate -> output to 2nd topic

Starting the app I see pages of this sort of message:

DEBUG  org.apache.zookeeper.ClientCnxn - Got ping response for sessionid:
0xb58abf2daaf0002 after 0ms


These repeat every 10 seconds until the app runs out of memory and gets
killed by the OS.


Watching the system from another window I can see the
`/tmp/kafka-streams/` space slowly consume many tens of GB (> 60 GB
when it finally crashed). IO wait jumps around quite a bit but often
exceeds 20%.


Any thoughts on what I might have misconfigured in my cluster or my app? I
have some other (standard) producers and consumers but this one
KStream-based app is really struggling.


RE: Kafka consumers are not equally distributed

2016-11-28 Thread Ghosh, Achintya (Contractor)
Thank you, Guozhang.

I see a lot of this exception:

org.apache.kafka.clients.consumer.internals.ConsumerCoordinator :: Offset commit
failed. : TimeoutException: The request timed out.

We are committing the offsets manually in async mode, session.timeout.ms is 5 minutes
and the poll time is 10 seconds, and we still see a lot of this exception. Could you
please let us know what the reason for this exception could be? Here is our consumer
configuration.

enable.auto.commit=false
session.timeout.ms=29
request.timeout.ms=30
auto.offset.reset=earliest
kafka.consumer.concurreny=40
max.partitions_fetch_bytes=20485760
kafka.consumer.poll.timeout=1
kafka.consumer.syncCommits=false
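
As a side note on the async commits, a commonly used pattern (a sketch only, with
placeholder connection settings, not tied to the configuration above) is to commit
asynchronously with a callback that at least logs failed commits, and to fall back
to a synchronous commit when shutting down:

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AsyncCommitLoop {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        props.put("group.id", "my-group");                   // placeholder
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("my-topic"));   // placeholder topic
        try {
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(10_000L);
                for (ConsumerRecord<String, String> record : records) {
                    // process the record here
                }
                // async commit: log failures instead of silently dropping them
                consumer.commitAsync((offsets, e) -> {
                    if (e != null) {
                        System.err.println("Commit of " + offsets + " failed: " + e);
                    }
                });
            }
        } finally {
            try {
                consumer.commitSync();   // one last synchronous commit before closing
            } finally {
                consumer.close();
            }
        }
    }
}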

Thanks
Achintya

-Original Message-
From: Guozhang Wang [mailto:wangg...@gmail.com] 
Sent: Friday, November 25, 2016 7:14 PM
To: users@kafka.apache.org
Subject: Re: Kafka consumers are not equally distributed

You can take a look at this FAQ wiki:
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified
?

And even if you are using the new Java producer, if you specify the key and key 
distribution is not even, then it will not be evenly distributed.

Guozhang

On Fri, Nov 25, 2016 at 9:12 AM, Ghosh, Achintya (Contractor) < 
achintya_gh...@comcast.com> wrote:

> So what is the option to messages make it equally distributed from 
> that point? I mean is any other option to make the consumers to speed up?
>
> Thanks
> Acintya
>
> -Original Message-
> From: Guozhang Wang [mailto:wangg...@gmail.com]
> Sent: Friday, November 25, 2016 12:09 PM
> To: users@kafka.apache.org
> Subject: Re: Kafka consumers are not equally distributed
>
> Note that consumer's fetching parallelism is per-partition, i.e., one 
> partition is fetched by only a single consumer instance, so even if 
> some partitions have heavy load other idle consumers will not come to 
> share the messages.
>
> If you observed that some partitions have no messages while others 
> have a lot, then it means the producing load on the partitions are not 
> evenly distributed, as I mentioned in the previous comment it is not a 
> consumer issue but a producer issue.
>
>
> Guozhang
>
> On Fri, Nov 25, 2016 at 7:11 AM, Ghosh, Achintya (Contractor) < 
> achintya_gh...@comcast.com> wrote:
>
> > Thank you Guozhang.
> >
> > Let me clarify : "some of the partitions are sitting idle and some 
> > of are overloaded", I mean we stopped the load after 9 hours as see 
> > the messages were processing very slow. That time we observed that 
> > some partitions had lot of messages and some were sitting idle. So 
> > my question why messages were not shared if we see some are 
> > overloaded and some are having 0 messages. Even we started the kafka 
> > servers and application servers too but nothing happened, still it 
> > was processing very slow and messages were not distributed. So we 
> > are concerned what should do this kind of situation and make the consumers 
> > more speedy.
> >
> > Thanks
> > Achintya
> >
> > -Original Message-
> > From: Guozhang Wang [mailto:wangg...@gmail.com]
> > Sent: Thursday, November 24, 2016 11:21 PM
> > To: users@kafka.apache.org
> > Subject: Re: Kafka consumers are not equally distributed
> >
> > The default partition assignment strategy is the RangePartitioner.
> > Note it is per-topic, so if you use the default partitioner then in 
> > your case 160 partitions of each of the topic will be assigned to 
> > the first 160 consumer instances, each getting two partitions, one 
> > partition from each. So the consumer should be balanced  on the
> consumer-instance basis.
> >
> > I'm not sure what you meant by "some of the partitions are sitting 
> > idle and some of are overloaded", do you mean that some partitions 
> > does not have new data coming in and others keep getting high 
> > traffic producing to it that the consumer cannot keep up? In this 
> > case it is no the consumer's issue, but the producer not producing 
> > in a balanced
> manner.
> >
> >
> >
> >
> > Guozhang
> >
> >
> >
> >
> > On Thu, Nov 24, 2016 at 7:45 PM, Ghosh, Achintya (Contractor) < 
> > achintya_gh...@comcast.com> wrote:
> >
> > > Java consumer. 0.9.1
> > >
> > > Thanks
> > > Achintya
> > >
> > > -Original Message-
> > > From: Guozhang Wang [mailto:wangg...@gmail.com]
> > > Sent: Thursday, November 24, 2016 8:28 PM
> > > To: users@kafka.apache.org
> > > Subject: Re: Kafka consumers are not equally distributed
> > >
> > > Which version of Kafka are you using with your consumer? Is it 
> > > Scala or Java consumers?
> > >
> > >
> > > Guozhang
> > >
> > >
> > > On Wed, Nov 23, 2016 at 6:38 AM, Ghosh, Achintya (Contractor) < 
> > > achintya_gh...@comcast.com> wrote:
> > >
> > > > No, that is not the reason. Initially all the partitions were 
> > > > assigned the messages and those were processed very fast and sit 
> > > > idle even other partitions  are having a lot of messages 

Re: Kafka windowed table not aggregating correctly

2016-11-28 Thread Guozhang Wang
Sachin,

This is indeed a bit weird, and we'd like to try to reproduce your issue
locally. Do you have sample input data for us to try out?

Guozhang

On Fri, Nov 25, 2016 at 10:12 PM, Sachin Mittal  wrote:

> Hi,
> I fixed that sorted set issue but I am facing a weird problem which I am
> not able to replicate.
>
> Here is the sample problem that I could isolate:
> My class is like this:
> public static class Message implements Comparable<Message> {
> public long ts;
> public String message;
> public String key;
> public Message() {};
> public Message(long ts, String message, String key) {
> this.ts = ts;
> this.key = key;
> this.message = message;
> }
> public int compareTo(Message paramT) {
> long ts1 = paramT.ts;
> return ts > ts1 ? 1 : -1;
> }
> }
>
> pipeline is like this:
> builder.stream(Serdes.String(), messageSerde, "test-window-stream")
>     .map(new KeyValueMapper<String, Message, KeyValue<String, Message>>() {
>         public KeyValue<String, Message> apply(String key, Message value) {
>             return new KeyValue<>(value.key, value);
>         }
>     })
>     .through(Serdes.String(), messageSerde, "test-window-key-stream")
>     .aggregateByKey(new Initializer<SortedSet<Message>>() {
>         public SortedSet<Message> apply() {
>             return new TreeSet<>();
>         }
>     }, new Aggregator<String, Message, SortedSet<Message>>() {
>         public SortedSet<Message> apply(String aggKey, Message value,
>                 SortedSet<Message> aggregate) {
>             aggregate.add(value);
>             return aggregate;
>         }
>     }, TimeWindows.of("stream-table", 10 * 1000L).advanceBy(5 * 1000L),
>     Serdes.String(), messagesSerde)
>     .foreach(new ForeachAction<Windowed<String>, SortedSet<Message>>() {
>         public void apply(Windowed<String> key, SortedSet<Message> messages) {
>             ...
>         }
>     });
>
> So basically I rekey the original message into another topic and then
> aggregate it based on that key.
> What I have observed is that when I used windowed aggregation the
> aggregator does not use previous aggregated value.
>
> public SortedSet apply(String aggKey, Message value,
> SortedSet aggregate) {
> aggregate.add(value);
> return aggregate;
> }
>
> So in the above function the aggregate is an empty set of every value
> entering into pipeline. When I remove the windowed aggregation, the
> aggregate set retains previously aggregated values in the set.
>
> I am just not able to wrap my head around it. When I ran this type of test
> locally on windows it is working fine. However a similar pipeline setup
> when run against production on linux is behaving strangely and always
> getting an empty aggregate set.
> Any idea what could be the reason, where should I look at the problem. Does
> length of key string matters here? I will later try to run the same simple
> setup on linux and see what happens. But this is a very strange behavior.
>
> Thanks
> Sachin
>
>
>
> On Wed, Nov 23, 2016 at 12:04 AM, Guozhang Wang 
> wrote:
>
> > Hello Sachin,
> >
> > In the implementation of SortedSet, if the object's implemented the
> > Comparable interface, that compareTo function is applied in "
> > aggregate.add(value);", and hence if it returns 0, this element will not
> be
> > added since it is a Set.
> >
> >
> > Guozhang
> >
> >
> > On Mon, Nov 21, 2016 at 10:06 PM, Sachin Mittal 
> > wrote:
> >
> > > Hi,
> > > What I find is that when I use sorted set as aggregation it fails to
> > > aggregate the values which have compareTo returning 0.
> > >
> > > My class is like this:
> > > public class Message implements Comparable {
> > > public long ts;
> > > public String message;
> > > public Message() {};
> > > public Message(long ts, String message) {
> > > this.ts = ts;
> > > this.message = message;
> > > }
> > > public int compareTo(Message paramT) {
> > > long ts1 = paramT.ts;
> > > return ts == ts1 ? 0 : ts > ts1 ? 1 : -1;
> > > }
> > > }
> > >
> > > pipeline is like this:
> > > builder.stream(Serdes.String(), messageSerde, "test-window-stream")
> > > .aggregateByKey(new Initializer() {
> > > public SortedSet apply() {
> > > return new TreeSet();
> > > }
> > > }, new Aggregator() {
> > > public SortedSet apply(String aggKey, Message value,
> > > SortedSet aggregate) {
> > > aggregate.add(value);
> > > return aggregate;
> > > }
> > > }, TimeWindows.of("stream-table", 10 * 1000L).advanceBy(5 * 1000L),
> > > Serdes.String(), messagesSerde)
> > > .foreach(new ForeachAction() {
> > > public void apply(Windowed key, SortedSet
> messages)
> > {
> > > ...
> > > }
> > > });
> > >
> > > So any message published between 10 and 20 seconds gets aggregated in
> 10
> > -
> > > 20 bucket and I print the size of the 

Re: Interactive Queries

2016-11-28 Thread Alan Kash
Thanks All.
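
For what it's worth, with 0.10.1 the store of a running application is queried
through the KafkaStreams instance, and key-value stores support both point and range
lookups; there is no SQL layer or secondary indexing, just key-based access. A
minimal sketch, assuming the topology materialized a store named "word-counts":

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class StoreQueries {

    // 'streams' is a running (started) KafkaStreams instance whose topology created "word-counts"
    static void query(KafkaStreams streams) {
        ReadOnlyKeyValueStore<String, Long> store =
                streams.store("word-counts", QueryableStoreTypes.<String, Long>keyValueStore());

        // point lookup
        Long count = store.get("kafka");
        System.out.println("kafka -> " + count);

        // range scan over a key interval (ordering follows the store's serialized key bytes)
        try (KeyValueIterator<String, Long> range = store.range("a", "f")) {
            while (range.hasNext()) {
                KeyValue<String, Long> next = range.next();
                System.out.println(next.key + " -> " + next.value);
            }
        }
    }
}

Windowed stores have an analogous QueryableStoreTypes.windowStore(), whose fetch()
takes a key plus a time range.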

On Mon, Nov 28, 2016 at 3:09 AM, Michael Noll  wrote:

> There are also some examples/demo applications at
> https://github.com/confluentinc/examples that demonstrate the use of
> interactive queries:
>
> -
> https://github.com/confluentinc/examples/blob/3.
> 1.x/kafka-streams/src/main/java/io/confluent/examples/
> streams/interactivequeries/kafkamusic/KafkaMusicExample.java
>
> -
> https://github.com/confluentinc/examples/blob/3.
> 1.x/kafka-streams/src/main/java/io/confluent/examples/
> streams/interactivequeries/WordCountInteractiveQueriesExample.java
>
> Note: The `3.1.x` branch is for Kafka 0.10.1.
>
> -Michael
>
>
>
>
> On Sun, Nov 27, 2016 at 3:35 AM, David Garcia 
> wrote:
>
> > I would start here: http://docs.confluent.io/3.1.0/streams/index.html
> >
> >
> > On 11/26/16, 8:27 PM, "Alan Kash"  wrote:
> >
> > Hi,
> >
> > New to Kafka land.
> >
> > I am looking into Interactive queries feature, which transforms
> Topics
> > into
> > Tables with history, neat !
> >
> > 1. What kind of queries we can run on the store ?  Point or Range ?
> > 2. Is Indexing supported ? primary or seconday ?
> > 3. Query language - SQL ? Custom Java Native Query ?
> >
> > I see rocksdb is the persistent layer.
> >
> > Did the team look at JCache API (JSR 107) -
> > https://jcp.org/en/jsr/detail?id=107 ?
> >
> > Thanks,
> > Alan
> >
> >
> >
>


Re: Abnormal working in the method punctuate and error linked to session.timeout.ms

2016-11-28 Thread Eno Thereska
I can’t see the code, I think it’s not added to the email?


Thanks
Eno

 

From: Hamza HACHANI 
Reply-To: 
Date: Monday, 28 November 2016 at 13:25
To: "users@kafka.apache.org" 
Subject: RE: Abnormal working in the method punctuate and error linked to 
seesion.timeout.ms

 

Hi Eno,

Here is the code for the application ExclusiveStatsConnectionDevice, which is 
composed of 4 nodes.

For example, when I put print("") you would see the problem of the 
infinite loop.

I preferred to send the whole code so as to make it easier for you, even though you 
don't need all of it.

De : Eno Thereska 
Envoyé : lundi 28 novembre 2016 01:12:14
À : users@kafka.apache.org
Objet : Re: Abnormal working in the method punctuate and error linked to 
seesion.timeout.ms 

 

Hi Hamza,

Would you be willing to share some of your code so we can have a look?

Thanks
Eno
> On 28 Nov 2016, at 12:58, Hamza HACHANI  wrote:
> 
> Hi Eno.
> 
> The problem is that there is no infinite while loop that i write.
> 
> So I can't understand why the application is doing so.
> 
> 
> Hamza
> 
> 
> De : Eno Thereska 
> Envoyé : dimanche 27 novembre 2016 23:21:24
> À : users@kafka.apache.org
> Objet : Re: Abnormal working in the method punctuate and error linked to 
> seesion.timeout.ms
> 
> Hi Hamza,
> 
> If you have an infinite while loop, that would mean the app would spend all 
> the time in that loop and poll() would never be called.
> 
> Eno
> 
>> On 28 Nov 2016, at 10:49, Hamza HACHANI  wrote:
>> 
>> Hi,
>> 
>> I've some troubles with the method puctuate.In fact when i would like to 
>> print a string in the method punctuate.
>> 
>> this string would be printed in an indefinitly way as if I printed (while 
>> (true){print(string)}.
>> 
>> I can't understand what happened.Does any body has an explenation ?.
>> 
>> 
>> Besides In the other hand,for another application it print the following 
>> error :
>> 
>> WARN Failed to commit StreamTask #0_0 in thread [StreamThread-1]:  
>> (org.apache.kafka.streams.processor.internals.StreamThread)
>> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
>> completed since the group has already rebalanced and assigned the partitions 
>> to another member. This means that the time between subsequent calls to 
>> poll() was longer than the configured session.timeout.ms, which typically 
>> implies that the poll loop is spending too much time message processing. You 
>> can address this either by increasing the session timeout or by reducing the 
>> maximum size of batches returned in poll() with max.poll.records.
>> 
>> When i tried to modify the configuration of the consumer nothing happened.
>> 
>> Any Ideas for this too ?
>> 
>> Thanks in Advance.
>> 
>> 
>> Hamza
>> 
> 



Re: Messages intermittently get lost

2016-11-28 Thread Zac Harvey
Thanks Martin, I will look at those links.


But you seem to be 100% confident that the problem is with ZooKeeper...can I 
ask why? What is it about my problem description that makes you think this is 
an issue with ZooKeeper?


From: Martin Gainty 
Sent: Friday, November 25, 2016 1:46:28 PM
To: users@kafka.apache.org
Subject: Re: Messages intermittently get lost




From: Zac Harvey 
Sent: Friday, November 25, 2016 6:17 AM
To: users@kafka.apache.org
Subject: Re: Messages intermittently get lost

Hi Martin,


My server.properties looks like this:


listeners=PLAINTEXT://0.0.0.0:9092

advertised.host.name=

broker.id=2

port=9092

num.partitions=4

zookeeper.connect=zkA:2181,zkB:2181,zkC:2181

MG>can you check status for each ZK Node in the quorum?

sh>$ZOOKEEPER_HOME/bin/zkServer.sh status

http://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.pd.doc/doc/containerstreamszookeeper.html




MG>*greetings from Plimoth Mass*

MG>M


num.network.threads=3

num.io.threads=8

socket.send.buffer.bytes=102400

socket.receive.buffer.bytes=102400

log.dirs=/tmp/kafka-logs

num.recovery.threads.per.data.dir=1

log.retention.hours=168

log.segment.bytes=1073741824

log.retention.check.interval.ms=30

zookeeper.connection.timeout.ms=6000

delete.topic.enable=true

auto.leader.rebalance.enable=true


Above, 'zkA', 'zkB' and 'zkC' are defined in /etc/hosts and are valid ZK 
servers, and  is the public DNS of the EC2 (AWS) node that 
this Kafka is running on.


Anything look incorrect to you?


And yes, yesterday was a holiday, but there is only work! I'll celebrate one 
big, long holiday when I'm dead!


Thanks for any-and-all input here!


Best,

Zac



From: Martin Gainty 
Sent: Thursday, November 24, 2016 9:03:33 AM
To: users@kafka.apache.org
Subject: Re: Messages intermittently get lost

Hi Zach


there is a rumour that today thursday is a holiday?

in server.properties how are you configuring your server?


specifically what are these attributes?


num.network.threads=


num.io.threads=


socket.send.buffer.bytes=


socket.receive.buffer.bytes=


socket.request.max.bytes=


num.partitions=


num.recovery.threads.per.data.dir=


?

Martin
__




From: Zac Harvey 
Sent: Thursday, November 24, 2016 7:05 AM
To: users@kafka.apache.org
Subject: Re: Messages intermittently get lost

Anybody?!? This is very disconcerting!


From: Zac Harvey 
Sent: Wednesday, November 23, 2016 5:07:45 AM
To: users@kafka.apache.org
Subject: Messages intermittently get lost

I am playing around with Kafka and have a simple setup:


* 1-node Kafka (Ubuntu) server

* 3-node ZK cluster (each on their own Ubuntu server)


I have a consumer written in Scala and am using the kafka-console-producer 
(v0.10) that ships with the distribution.


I'd say about 20% of the messages I send via the producer never get consumed by 
the Scala process (which is running continuously). No errors on either side 
(producer or consumer): the producer sends, and, nothing...


Any ideas as to what might be going on here, or how I could start 
troubleshooting?


Thanks!


RE: Abnormal working in the method punctuate and error linked to session.timeout.ms

2016-11-28 Thread Hamza HACHANI
The print is on line 40 of the class Base...


De : Hamza HACHANI 
Envoyé : lundi 28 novembre 2016 01:25:08
À : users@kafka.apache.org
Objet : RE: Abnormal working in the method punctuate and error linked to 
seesion.timeout.ms


Hi Eno,

Here is the code for the application ExclusiveStatsConnectionDevice, which is 
composed of 4 nodes.

For example, when I put print("") you would see the problem of the 
infinite loop.

I preferred to send the whole code so as to make it easier for you, even though you 
don't need all of it.


De : Eno Thereska 
Envoyé : lundi 28 novembre 2016 01:12:14
À : users@kafka.apache.org
Objet : Re: Abnormal working in the method punctuate and error linked to 
seesion.timeout.ms

Hi Hamza,

Would you be willing to share some of your code so we can have a look?

Thanks
Eno
> On 28 Nov 2016, at 12:58, Hamza HACHANI  wrote:
>
> Hi Eno.
>
> The problem is that there is no infinite while loop that i write.
>
> So I can't understand why the application is doing so.
>
>
> Hamza
>
> 
> De : Eno Thereska 
> Envoyé : dimanche 27 novembre 2016 23:21:24
> À : users@kafka.apache.org
> Objet : Re: Abnormal working in the method punctuate and error linked to 
> seesion.timeout.ms
>
> Hi Hamza,
>
> If you have an infinite while loop, that would mean the app would spend all 
> the time in that loop and poll() would never be called.
>
> Eno
>
>> On 28 Nov 2016, at 10:49, Hamza HACHANI  wrote:
>>
>> Hi,
>>
>> I've some troubles with the method puctuate.In fact when i would like to 
>> print a string in the method punctuate.
>>
>> this string would be printed in an indefinitly way as if I printed (while 
>> (true){print(string)}.
>>
>> I can't understand what happened.Does any body has an explenation ?.
>>
>>
>> Besides In the other hand,for another application it print the following 
>> error :
>>
>> WARN Failed to commit StreamTask #0_0 in thread [StreamThread-1]:  
>> (org.apache.kafka.streams.processor.internals.StreamThread)
>> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
>> completed since the group has already rebalanced and assigned the partitions 
>> to another member. This means that the time between subsequent calls to 
>> poll() was longer than the configured session.timeout.ms, which typically 
>> implies that the poll loop is spending too much time message processing. You 
>> can address this either by increasing the session timeout or by reducing the 
>> maximum size of batches returned in poll() with max.poll.records.
>>
>> When i tried to modify the configuration of the consumer nothing happened.
>>
>> Any Ideas for this too ?
>>
>> Thanks in Advance.
>>
>>
>> Hamza
>>
>



RE: Abnormal working in the method punctuate and error linked to session.timeout.ms

2016-11-28 Thread Hamza HACHANI
Hi Eno,

Here is the code for the application ExclusiveStatsConnectionDevice, which is 
composed of 4 nodes.

For example, when I put print("") you would see the problem of the 
infinite loop.

I preferred to send the whole code so as to make it easier for you, even though you 
don't need all of it.


De : Eno Thereska 
Envoyé : lundi 28 novembre 2016 01:12:14
À : users@kafka.apache.org
Objet : Re: Abnormal working in the method punctuate and error linked to 
seesion.timeout.ms

Hi Hamza,

Would you be willing to share some of your code so we can have a look?

Thanks
Eno
> On 28 Nov 2016, at 12:58, Hamza HACHANI  wrote:
>
> Hi Eno.
>
> The problem is that there is no infinite while loop that i write.
>
> So I can't understand why the application is doing so.
>
>
> Hamza
>
> 
> De : Eno Thereska 
> Envoyé : dimanche 27 novembre 2016 23:21:24
> À : users@kafka.apache.org
> Objet : Re: Abnormal working in the method punctuate and error linked to 
> seesion.timeout.ms
>
> Hi Hamza,
>
> If you have an infinite while loop, that would mean the app would spend all 
> the time in that loop and poll() would never be called.
>
> Eno
>
>> On 28 Nov 2016, at 10:49, Hamza HACHANI  wrote:
>>
>> Hi,
>>
>> I've some troubles with the method puctuate.In fact when i would like to 
>> print a string in the method punctuate.
>>
>> this string would be printed in an indefinitly way as if I printed (while 
>> (true){print(string)}.
>>
>> I can't understand what happened.Does any body has an explenation ?.
>>
>>
>> Besides In the other hand,for another application it print the following 
>> error :
>>
>> WARN Failed to commit StreamTask #0_0 in thread [StreamThread-1]:  
>> (org.apache.kafka.streams.processor.internals.StreamThread)
>> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
>> completed since the group has already rebalanced and assigned the partitions 
>> to another member. This means that the time between subsequent calls to 
>> poll() was longer than the configured session.timeout.ms, which typically 
>> implies that the poll loop is spending too much time message processing. You 
>> can address this either by increasing the session timeout or by reducing the 
>> maximum size of batches returned in poll() with max.poll.records.
>>
>> When I tried to modify the configuration of the consumer, nothing happened.
>>
>> Any ideas on this too?
>>
>> Thanks in Advance.
>>
>>
>> Hamza
>>
>
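The attached code itself is not preserved in the archive, so as an illustration only (this is not the ExclusiveStatsConnectionDevice code; the class name and the 10-second interval are made up), a minimal 0.10.1 low-level processor that prints from punctuate() looks roughly like this:

import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;

public class PrintingProcessor implements Processor<String, String> {

    private ProcessorContext context;

    @Override
    public void init(ProcessorContext context) {
        this.context = context;
        // Request punctuate() every 10 seconds of stream time
        // (0.10.1 has no wall-clock punctuation option).
        context.schedule(10_000L);
    }

    @Override
    public void process(String key, String value) {
        context.forward(key, value);
    }

    @Override
    public void punctuate(long timestamp) {
        // Punctuation is driven by stream time, i.e. by record timestamps.
        // If stream time jumps far ahead of the last punctuation (old or odd
        // timestamps, a large backlog), punctuate() may be invoked once per
        // missed interval in quick succession, which can look like an endless
        // loop of prints even though no loop was written.
        System.out.println("punctuate at " + timestamp);
    }

    @Override
    public void close() {}
}

Whether that catch-up behaviour is what Hamza is seeing would depend on the record timestamps flowing through the topology.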



Re: Abnormal working in the method punctuate and error linked to session.timeout.ms

2016-11-28 Thread Eno Thereska
Hi Hamza,

Would you be willing to share some of your code so we can have a look?

Thanks
Eno
> On 28 Nov 2016, at 12:58, Hamza HACHANI  wrote:
> 
> Hi Eno.
> 
> The problem is that there is no infinite while loop that I wrote.
>
> So I can't understand why the application is behaving this way.
> 
> 
> Hamza
> 
> 
> From: Eno Thereska
> Sent: Sunday, 27 November 2016 23:21:24
> To: users@kafka.apache.org
> Subject: Re: Abnormal working in the method punctuate and error linked to
> session.timeout.ms
> 
> Hi Hamza,
> 
> If you have an infinite while loop, that would mean the app would spend all 
> the time in that loop and poll() would never be called.
> 
> Eno
> 
>> On 28 Nov 2016, at 10:49, Hamza HACHANI  wrote:
>> 
>> Hi,
>> 
>> I'm having some trouble with the punctuate method. In fact, when I try to
>> print a string in punctuate, the string is printed indefinitely, as if I had
>> written while (true) { print(string); }.
>>
>> I can't understand what is happening. Does anybody have an explanation?
>>
>>
>> Separately, another application prints the following error:
>> 
>> WARN Failed to commit StreamTask #0_0 in thread [StreamThread-1]:  
>> (org.apache.kafka.streams.processor.internals.StreamThread)
>> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
>> completed since the group has already rebalanced and assigned the partitions 
>> to another member. This means that the time between subsequent calls to 
>> poll() was longer than the configured session.timeout.ms, which typically 
>> implies that the poll loop is spending too much time message processing. You 
>> can address this either by increasing the session timeout or by reducing the 
>> maximum size of batches returned in poll() with max.poll.records.
>> 
>> When I tried to modify the configuration of the consumer, nothing happened.
>>
>> Any ideas on this too?
>> 
>> Thanks in Advance.
>> 
>> 
>> Hamza
>> 
> 



RE: Abnormal working in the method punctuate and error linked to session.timeout.ms

2016-11-28 Thread Hamza HACHANI
Hi Eno.

The problem is that there is no infinite while loop that I wrote.

So I can't understand why the application is behaving this way.


Hamza


From: Eno Thereska
Sent: Sunday, 27 November 2016 23:21:24
To: users@kafka.apache.org
Subject: Re: Abnormal working in the method punctuate and error linked to
session.timeout.ms

Hi Hamza,

If you have an infinite while loop, that would mean the app would spend all the 
time in that loop and poll() would never be called.

Eno

> On 28 Nov 2016, at 10:49, Hamza HACHANI  wrote:
>
> Hi,
>
> I'm having some trouble with the punctuate method. In fact, when I try to
> print a string in punctuate, the string is printed indefinitely, as if I had
> written while (true) { print(string); }.
>
> I can't understand what is happening. Does anybody have an explanation?
>
>
> Separately, another application prints the following error:
>
> WARN Failed to commit StreamTask #0_0 in thread [StreamThread-1]:  
> (org.apache.kafka.streams.processor.internals.StreamThread)
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
> completed since the group has already rebalanced and assigned the partitions 
> to another member. This means that the time between subsequent calls to 
> poll() was longer than the configured session.timeout.ms, which typically 
> implies that the poll loop is spending too much time message processing. You 
> can address this either by increasing the session timeout or by reducing the 
> maximum size of batches returned in poll() with max.poll.records.
>
> When I tried to modify the configuration of the consumer, nothing happened.
>
> Any ideas on this too?
>
> Thanks in Advance.
>
>
> Hamza
>



Re: Abnormal working in the method punctuate and error linked to session.timeout.ms

2016-11-28 Thread Eno Thereska
Hi Hamza,

If you have an infinite while loop, that would mean the app would spend all the 
time in that loop and poll() would never be called.  

Eno

> On 28 Nov 2016, at 10:49, Hamza HACHANI  wrote:
> 
> Hi,
> 
> I'm having some trouble with the punctuate method. In fact, when I try to
> print a string in punctuate, the string is printed indefinitely, as if I had
> written while (true) { print(string); }.
>
> I can't understand what is happening. Does anybody have an explanation?
>
>
> Separately, another application prints the following error:
> 
> WARN Failed to commit StreamTask #0_0 in thread [StreamThread-1]:  
> (org.apache.kafka.streams.processor.internals.StreamThread)
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
> completed since the group has already rebalanced and assigned the partitions 
> to another member. This means that the time between subsequent calls to 
> poll() was longer than the configured session.timeout.ms, which typically 
> implies that the poll loop is spending too much time message processing. You 
> can address this either by increasing the session timeout or by reducing the 
> maximum size of batches returned in poll() with max.poll.records.
> 
> When I tried to modify the configuration of the consumer, nothing happened.
>
> Any ideas on this too?
> 
> Thanks in Advance.
> 
> 
> Hamza
> 
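To make the explanation concrete, this sketch (not Hamza's code) shows the failure mode being described. The stream thread alternates between consumer.poll() and running user code, so user code that never returns also means poll() is never reached again; the group eventually rebalances the partitions away from this member, which is exactly the CommitFailedException quoted in the original message.

import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;

public class BlockingProcessor implements Processor<String, String> {

    @Override
    public void init(ProcessorContext context) {}

    @Override
    public void process(String key, String value) {
        // Anti-pattern: this never returns, so the stream thread can never
        // get back to poll(), and the prints below repeat forever.
        while (true) {
            System.out.println(value);
        }
    }

    @Override
    public void punctuate(long timestamp) {}

    @Override
    public void close() {}
}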



Abnormal working in the method punctuate and error linked to session.timeout.ms

2016-11-28 Thread Hamza HACHANI
Hi,

I'm having some trouble with the punctuate method. In fact, when I try to print
a string in punctuate, the string is printed indefinitely, as if I had written
while (true) { print(string); }.

I can't understand what is happening. Does anybody have an explanation?


Separately, another application prints the following error:

 WARN Failed to commit StreamTask #0_0 in thread [StreamThread-1]:  
(org.apache.kafka.streams.processor.internals.StreamThread)
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
completed since the group has already rebalanced and assigned the partitions to 
another member. This means that the time between subsequent calls to poll() was 
longer than the configured session.timeout.ms, which typically implies that the 
poll loop is spending too much time message processing. You can address this 
either by increasing the session timeout or by reducing the maximum size of 
batches returned in poll() with max.poll.records.

When I tried to modify the configuration of the consumer, nothing happened.

Any ideas on this too?

Thanks in Advance.


Hamza
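If the consumer settings were changed on a separate consumer instance, that would explain why nothing happened: with Kafka Streams the overrides have to go into the streams application's own Properties, which Streams passes through to the consumer it creates internally. A hedged sketch, with placeholder application id, bootstrap servers and values:

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsTimeoutTuning {

    public static Properties buildProps() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "punctuate-demo");    // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        // Allow more time between poll() calls before the group gives up on us...
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 30000);
        // ...and hand back fewer records per poll() so each cycle finishes sooner.
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 100);
        return props;
    }
}

The StreamsConfig built from these properties is then handed to the KafkaStreams instance as usual.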



Re: 0.8 consumers compatibility with Kafka 0.10

2016-11-28 Thread Ismael Juma
Hi Vladi,

Generally, newer brokers are compatible with older clients. As Vladimir
said, the upgrade page (http://kafka.apache.org/documentation.html#upgrade)
includes detailed information about changes that could have a compatibility
impact.

Ismael

On Sun, Nov 27, 2016 at 8:09 PM, Vladi Feigin  wrote:

> Hi All,
>
> We are using the low-level API consumer in Kafka 0.8.2.
> The question: Is Kafka 0.10 backward compatible with this old consumer?
> Or do we have to rewrite consumers when moving to 0.10?
>
> Thank you,
> Vladi
>
>
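For reference only (as noted above, a rewrite is not required just to upgrade the brokers), a port would target the new Java consumer; a rough sketch with placeholder topic, group id and bootstrap servers:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class NewConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "ported-simple-consumer");  // placeholder
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
                // Group management and offset commits replace the manual leader
                // lookup and offset tracking of the 0.8 SimpleConsumer.
                consumer.commitSync();
            }
        }
    }
}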


Re: Interactive Queries

2016-11-28 Thread Michael Noll
There are also some examples/demo applications at
https://github.com/confluentinc/examples that demonstrate the use of
interactive queries:

-
https://github.com/confluentinc/examples/blob/3.1.x/kafka-streams/src/main/java/io/confluent/examples/streams/interactivequeries/kafkamusic/KafkaMusicExample.java

-
https://github.com/confluentinc/examples/blob/3.1.x/kafka-streams/src/main/java/io/confluent/examples/streams/interactivequeries/WordCountInteractiveQueriesExample.java

Note: The `3.1.x` branch is for Kafka 0.10.1.

-Michael




On Sun, Nov 27, 2016 at 3:35 AM, David Garcia  wrote:

> I would start here: http://docs.confluent.io/3.1.0/streams/index.html
>
>
> On 11/26/16, 8:27 PM, "Alan Kash"  wrote:
>
> Hi,
>
> New to Kafka land.
>
> I am looking into the Interactive Queries feature, which transforms topics
> into tables with history. Neat!
>
> 1. What kind of queries can we run on the store? Point or range?
> 2. Is indexing supported? Primary or secondary?
> 3. Query language - SQL? Custom Java native query?
>
> I see RocksDB is the persistence layer.
>
> Did the team look at JCache API (JSR 107) -
> https://jcp.org/en/jsr/detail?id=107 ?
>
> Thanks,
> Alan
>
>
>
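On the first two questions, and hedged for 0.10.1: the local state stores behind interactive queries support point lookups and key-range scans (ordered by the store's serialized key bytes), plus full scans; there is no SQL layer and no secondary indexes out of the box. A minimal sketch, assuming a key-value store named "word-counts" has been registered elsewhere in the application:

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class StoreQueries {

    public static void query(KafkaStreams streams) {
        ReadOnlyKeyValueStore<String, Long> store =
                streams.store("word-counts", QueryableStoreTypes.<String, Long>keyValueStore());

        // Point query
        System.out.println("kafka -> " + store.get("kafka"));

        // Range query over a key interval
        try (KeyValueIterator<String, Long> range = store.range("a", "f")) {
            while (range.hasNext()) {
                KeyValue<String, Long> next = range.next();
                System.out.println(next.key + " -> " + next.value);
            }
        }

        // Full scan
        try (KeyValueIterator<String, Long> all = store.all()) {
            while (all.hasNext()) {
                System.out.println(all.next());
            }
        }
    }
}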