Re: producer-consumer issues during deployments

2015-11-26 Thread Prabhjot Bharaj
Hi,

Thanks for your reply
We have 4 phases of deploys, and in each phase we can take down a few machines
These releases happen every 2 weeks, because on all machines there are a
bunch of other microservices running along with the core system - Kafka in
this case

My only concern is that during runtime, i.e. between 2 releases, the
replica distribution per topic can become unbalanced because of some
restarts or occasional machine failures/reboots

Because of that, the steps that you've mentioned would become an
operational nightmare for us.

What I'm looking for is a more automated solution, e.g. even if all the
replicas for a partition are down, the producers (running from around 50
machines) should switch to the other available partitions until this
partition becomes available
Also, on the consumer side, consumers should not fail but keep consuming
from the available partitions until this partition comes back up

Is it possible with the new producer and new consumer or high level
consumer?

Thanks,
Prabhjot
On Nov 27, 2015 12:00 AM, "Ben Stopford" <b...@confluent.io> wrote:

> Hi Prabhjot
>
> I may have slightly misunderstood your question so apologies if that’s the
> case. The general approach to releases is to use a rolling upgrade where
> you take one machine offline at a time, restart it, wait for it to come
> online (you can monitor this via JMX) then move onto the next. If you’re
> taking multiple machines offline at the same time you need to be careful
> about where the replicas for those machines reside. You can examine these
> individually for each topic via kafka-topics.sh.
>
> Regarding your questions, the following points may be of use:
>
> - Only one replica (the leader) will be available for writing at any one
> time in Kafka. If you offline machines then Kafka will switch over to use
> replicas on other machines if they are available.
> - The behaviour of produce requests will depend on the acknowledgment
> setting the producer provides, the setting for minimum in sync replicas and
> how many replicas remain standing after the failure. There are a few things
> going on here but they’re explained quite well here <
> http://kafka.apache.org/090/documentation.html#design_ha>.
> - Consumers consume from the leader also, so if the leader for a partition
> is online then you will be able to consume from it. If the leader is on a
> machine that goes offline then consumption will pause whilst leadership
> switches over to a replica.
>
> All the best
> B
>
> > On 26 Nov 2015, at 17:58, Prabhjot Bharaj <prabhbha...@gmail.com> wrote:
> >
> > Hi,
> >
> > Request your expertise on these doubts of mine
> >
> > Thanks,
> > Prabhjot
> >
> > On Thu, Nov 26, 2015 at 4:43 PM, Prabhjot Bharaj <prabhbha...@gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> We arrange our kafka machines in groups and deploy in phases.
> >>
> >> For kafka, we’ll have to map groups to phases. During each phase of the
> >> release, all the machines in that group can go down.
> >>
> >> When this happens, there are a couple of cases:-
> >>
> >>   1. All replicas are residing in a group of machines which will all go
> >>   down in this phase
> >>  - Effect on Producer –
> >> - What happens to the produce requests (whether the producer can
> >> dynamically keep writing to the remaining partitions now)
> >> - What happens to the already queued requests which were being
> >> sent to the earlier replicas – they will fail (we’ll have to use the
> >> producer callback feature to take care of retrying in case the
> >> above step works fine)
> >>  - Effect on Consumer -
> >> - Can the consumers consume from a smaller number of partitions?
> >> - Does the consumer 'consume' api give any callback/failure
> >> when all replicas of a partition go down?
> >>
> >> If you have come across any of the above cases, please share how you
> >> solved the problem, or whether everything works just fine with Kafka
> >> during deployments and the cases described above are all invalid or
> >> handled by kafka and its clients internally.
> >>
> >> Thanks,
> >> Prabhjot
> >>
> >
> >
> >
> > --
> > -
> > "There are only 10 types of people in the world: Those who understand
> > binary, and those who don't"
>
>
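A minimal sketch of the producer-side settings Ben describes above (acks and retries on the new Java producer). The broker addresses and topic are placeholders, and note that min.insync.replicas is a broker/topic-level setting rather than a client one:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DurableProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker list.
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        // -1 means the leader waits for the full ISR before acknowledging
        // (newer clients also accept "all").
        props.put("acks", "-1");
        // Retry sends that fail while partition leadership moves between
        // replicas, e.g. during a rolling restart.
        props.put("retries", "5");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(props);
        producer.send(new ProducerRecord<String, String>("my-topic", "key", "value"));
        producer.close();
    }
}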


producer-consumer issues during deployments

2015-11-26 Thread Prabhjot Bharaj
Hi,

We arrange our kafka machines in groups and deploy in phases.

For kafka, we’ll have to map groups to phases. During each phase of the
release, all the machines in that group can go down.

When this happens, there are a couple of cases:-

   1. All replicas are residing in a group of machines which will all go
   down in this phase
  - Effect on Producer –
 - What happens to the produce requests (whether the producer can
 dynamically keep writing to the remaining partitions now)
 - What happens to the already queued requests which were being
 sent to the earlier replicas – they will fail (we’ll have to use the
 producer callback feature to take care of retrying in case the above
 step works fine)
  - Effect on Consumer -
 - Can the consumers consume from a smaller number of partitions?
 - Does the consumer 'consume' api give any callback/failure when
 all replicas of a partition go down?

If you have come across any of the above cases, please share how you
solved the problem, or whether everything works just fine with Kafka
during deployments and the cases described above are all invalid or
handled by kafka and its clients internally.

Thanks,
Prabhjot


Re: producer-consumer issues during deployments

2015-11-26 Thread Prabhjot Bharaj
Hi,

Request your expertise on these doubts of mine

Thanks,
Prabhjot

On Thu, Nov 26, 2015 at 4:43 PM, Prabhjot Bharaj <prabhbha...@gmail.com>
wrote:

> Hi,
>
> We arrange our kafka machines in groups and deploy in phases.
>
> For kafka, we’ll have to map groups to phases. During each phase of the
> release, all the machines in that group can go down.
>
> When this happens, there are a couple of cases:-
>
>1. All replicas are residing in a group of machines which will all go
>down in this phase
>   - Effect on Producer –
>  - What happens to the produce requests (whether the producer can
>  dynamically keep writing to the remaining partitions now)
>  - What happens to the already queued requests which were being
>  sent to the earlier replicas – they will fail (we’ll have to use the
>  producer callback feature to take care of retrying in case the above
>  step works fine)
>   - Effect on Consumer -
>  - Can the consumers consume from a smaller number of partitions?
>  - Does the consumer 'consume' api give any callback/failure
>  when all replicas of a partition go down?
>
> If you have come across any of the above cases, please share how you
> solved the problem, or whether everything works just fine with Kafka
> during deployments and the cases described above are all invalid or
> handled by kafka and its clients internally.
>
> Thanks,
> Prabhjot
>



-- 
-
"There are only 10 types of people in the world: Those who understand
binary, and those who don't"


Re: kafka 0.8 producer issue

2015-11-25 Thread Prabhjot Bharaj
Hi,

From the information that you've provided, I think your callback is the
culprit here, as seen in the stack trace:-

at com.hpe.ssmamp.kafka.KafkaPDFAProducer$1.onCompletion(
KafkaPDFAProducer.java:62)

Please provide more information, like a code snippet etc., so that we can
tell more.

Thanks,
Prabhjot
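One common cause of a NullPointerException inside onCompletion is dereferencing the RecordMetadata argument when the send has failed: on failure the metadata can be null and only the exception is populated. A hedged sketch of a defensive callback follows (the class name and messages are illustrative, not taken from the reporter's code):

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.RecordMetadata;

public class SafeCallback implements Callback {
    @Override
    public void onCompletion(RecordMetadata metadata, Exception exception) {
        // Check the exception first: when it is non-null, metadata may be null.
        if (exception != null) {
            System.err.println("Send failed: " + exception);
            return;
        }
        System.out.println("Stored in " + metadata.topic() + "-"
                + metadata.partition() + " @ offset " + metadata.offset());
    }
}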

On Wed, Nov 25, 2015 at 9:24 PM, Kudumula, Surender <
surender.kudum...@hpe.com> wrote:

> Hi all
> I am trying to get the producer working. It was working before, but now I am
> getting the following issue. I have created a new topic as well, just in
> case the issue was with the topic, but still no luck. I have increased the
> message size in the broker, as I am trying to send at least a 3 MB message
> here in byte array format. Any suggestions, please?
>
> 2015-11-25 15:46:11 INFO  Login:185 - TGT refresh sleeping until: Tue Dec
> 01 07:03:07 GMT 2015
> 2015-11-25 15:46:11 INFO  KafkaProducer:558 - Closing the Kafka producer
> with timeoutMillis = 9223372036854775807 ms.
> 2015-11-25 15:46:12 ERROR RecordBatch:96 - Error executing user-provided
> callback on message for topic-partition ResponseTopic-0:
> java.lang.NullPointerException
> at
> com.hpe.ssmamp.kafka.KafkaPDFAProducer$1.onCompletion(KafkaPDFAProducer.java:62)
> at
> org.apache.kafka.clients.producer.internals.RecordBatch.done(RecordBatch.java:93)
> at
> org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:285)
> at
> org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:253)
> at
> org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:55)
> at
> org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:328)
> at
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:237)
> at
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:209)
> at
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:140)
> at java.lang.Thread.run(Thread.java:745)
>
>
>
>


-- 
-
"There are only 10 types of people in the world: Those who understand
binary, and those who don't"


Re: kafka producer and client use same groupid

2015-11-24 Thread Prabhjot Bharaj
Hi Surender,

Please elaborate on your design.
Consumers don't talk to producers directly; Kafka is a brokered system, and
Kafka sits between producers and consumers.
Also, consumers consume from partitions of a topic and producers write to
partitions of a topic.
These partitions and their logical abstraction - the topic - reside on Kafka
and zookeeper respectively.

Thanks,
Prabhjot
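To illustrate the point: group.id is a consumer-side setting only, and the producer never carries one. A minimal sketch of the 0.8 high-level consumer, with placeholder ZooKeeper host, group and topic:

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class GroupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // In 0.8 the high-level consumer finds the brokers via ZooKeeper.
        props.put("zookeeper.connect", "zk1:2181");
        // group.id exists only on the consumer side; producers have no group.
        props.put("group.id", "my-group");
        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(Collections.singletonMap("my-topic", 1));
        ConsumerIterator<byte[], byte[]> it = streams.get("my-topic").get(0).iterator();
        while (it.hasNext())
            System.out.println(new String(it.next().message()));
    }
}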
On Nov 24, 2015 11:38 PM, "Kudumula, Surender" 
wrote:

> Hi all
> Is there any way we can ensure in 0.8 that a remote Kafka producer and a
> remote consumer work on the same groupId? My java consumer cannot consume
> messages from the remote producer. Thanks
>
>
>
>


Re: All brokers are running but some partitions' leader is -1

2015-11-23 Thread Prabhjot Bharaj
Hi,

With the information provided, these are the steps I can think of (based on
my experience with kafka):-

1. Do a describe on the topic. See if the partitions and replicas are
evenly distributed amongst all brokers. If not, you might want to try the
'Reassign Partitions Tool' -
https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-6.ReassignPartitionsTool
2. Is/are some partition(s) getting more data than others, leading to an
imbalance of disk space amongst the nodes in the cluster, to the extent that
the kafka server process goes down on one or more machines in the cluster?
3. From what I understand, your kafka and spark machines are the same?
How much memory does replica 0 use when your spark cluster is running at
full throttle?

Workaround -
Try running the Preferred Replica Leader Election Tool -
https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-1.PreferredReplicaLeaderElectionTool
- to make some replica (the one that you noticed earlier when the cluster was
all good) the leader for this partition.

Regards,
Prabhjot

On Tue, Nov 24, 2015 at 7:11 AM, Gwen Shapira  wrote:

> We fixed many many bugs since August. Since we are about to release 0.9.0
> (with SSL!), maybe wait a day and go with a released and tested version.
>
> On Mon, Nov 23, 2015 at 3:01 PM, Qi Xu  wrote:
>
> > Forgot to mention is that the Kafka version we're using is from Aug's
> > Trunk branch---which has the SSL support.
> >
> > Thanks again,
> > Qi
> >
> >
> > On Mon, Nov 23, 2015 at 2:29 PM, Qi Xu  wrote:
> >
> >> Loop another guy from our team.
> >>
> >> On Mon, Nov 23, 2015 at 2:26 PM, Qi Xu  wrote:
> >>
> >>> Hi folks,
> >>> We have a 10 node cluster and have several topics. Each topic has about
> >>> 256 partitions with a replication factor of 3. Now we have run into an
> >>> issue where, in some topics, a few partitions' (< 10) leader is -1 and
> >>> each of them has only one in-sync replica.
> >>>
> >>> From the Kafka manager, here's the snapshot:
> >>> [image: Inline image 2]
> >>>
> >>> [image: Inline image 1]
> >>>
> >>> here's the state log:
> >>> [2015-11-23 21:57:58,598] ERROR Controller 1 epoch 435499 initiated
> >>> state change for partition [userlogs,84] from OnlinePartition to
> >>> OnlinePartition failed (state.change.logger)
> >>> kafka.common.StateChangeFailedException: encountered error while
> >>> electing leader for partition [userlogs,84] due to: Preferred replica
> 0 for
> >>> partition [userlogs,84] is either not alive or not in the isr. Current
> >>> leader and ISR: [{"leader":-1,"leader_epoch":203,"isr":[1]}].
> >>> Caused by: kafka.common.StateChangeFailedException: Preferred replica 0
> >>> for partition [userlogs,84] is either not alive or not in the isr.
> Current
> >>> leader and ISR: [{"leader":-1,"leader_epoch":203,"isr":[1]}]
> >>>
> >>> My question is:
> >>> 1) how could this happen and how can I fix it or work around it?
> >>> 2) Is 256 partitions too big? We have about 200+ cores for spark
> >>> streaming job.
> >>>
> >>> Thanks,
> >>> Qi
> >>>
> >>>
> >>
> >
>



-- 
-
"There are only 10 types of people in the world: Those who understand
binary, and those who don't"


Re: Kafka Scalability & Partition

2015-11-13 Thread Prabhjot Bharaj
Hi,

Having such a model will not scale. I think it's mentioned in earlier
posts as well as in some wikis available out there.

Kafka works very well if you have more partitions, so you can massively
parallelize writes to Kafka.
Also, your application need not send partition ids/numbers unless you've
some logic built over them; the Kafka producer client will take care of that

Regards,
Prabhjot
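A hedged sketch of that last point: with the new Java producer, records sent without a key or partition id are spread across partitions by the client's default partitioner (in the 0.8.2 client, it rotates over the currently available partitions when the key is null). Brokers and topic are placeholders:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KeylessProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(props);
        for (int i = 0; i < 100; i++)
            // Value-only record: no partition id and no key are supplied,
            // so the default partitioner picks the partition.
            producer.send(new ProducerRecord<String, String>("my-topic", "message-" + i));
        producer.close();
    }
}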
On Nov 13, 2015 3:34 PM, "Thamaraikannan Subramanian" <
thamarai.ba...@gmail.com> wrote:

> All,
>
> I am looking for clarifications. I want to have a highly scalable Kafka
> environment, but I will have my topics without partitions, as my application
> may not send the partition id.
>
> My question: will Kafka scale (Kafka clusters) when there are multiple
> topics but they don't have any partitioning (in other words, a single
> partition)?
>
> Please clarify. Thanks in Advance.
>


New and updated producers and consumers

2015-11-05 Thread Prabhjot Bharaj
Hi,

I'm using the latest update: 0.8.2.2.
I would like to use the latest producer and consumer apis.
Over the past few weeks, I have tried to do some performance benchmarking
using the producer and consumer scripts provided in the bin directory. It
was a fun activity and I have learnt a lot about kafka.

But I have also experienced that sometimes the implementation of the
performance scripts was not up-to-date, or some items were different from
the documentation.

Now I would like to develop my application with kafka. I'm comfortable
using scala/java.

Please let me know which producer and consumer (both high level and simple)
class/object I should be using.

Thanks a lot,
Prabhjot


Re: New and updated producers and consumers

2015-11-05 Thread Prabhjot Bharaj
Adding users as well

On Thu, Nov 5, 2015 at 3:37 PM, Prabhjot Bharaj <prabhbha...@gmail.com>
wrote:

> Hi,
>
> I'm using the latest update: 0.8.2.2
> I would like to use the latest producer and consumer apis
> over the past few weeks, I have tried to do some performance benchmarking
> using the producer and consumer scripts provided in the bin directory. It
> was a fun activity and I have learnt a lot about kafka.
>
> But, I have also experienced that sometimes the implementation of the
> performance scripts was not up-to-date or some items were different than
> the documentation
>
> Now, I would like to develop my application with kafka. I'm comfortable
> using scala/java
>
> Please let me know which producer and consumer (both high level and
> simple) class/object should I be using
>
> Thanks a lot,
> Prabhjot
>



-- 
-
"There are only 10 types of people in the world: Those who understand
binary, and those who don't"


Re: New and updated producers and consumers

2015-11-05 Thread Prabhjot Bharaj
Hello Folks,

Requesting your expertise on this.
I see that under core/src/main/scala/kafka/producer/, there are multiple
implementations - Producer.scala and SyncProducer.scala.

Also, going through ProducerPerformance.scala, there are 2 implementations
- the NewShinyProducer (which points to KafkaProducer.java) and the OldProducer.

Similar might be the case with Consumers, but I have not looked at that yet.

Please let me know which producer and consumer are supposed to be used and
which ones will be phased out in future releases, so I can focus on only 1
type of producer and consumer (high level as well as simple).

Thanks,
Prabhjot

On Thu, Nov 5, 2015 at 3:55 PM, Prabhjot Bharaj <prabhbha...@gmail.com>
wrote:

> Adding users as well
>
> On Thu, Nov 5, 2015 at 3:37 PM, Prabhjot Bharaj <prabhbha...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I'm using the latest update: 0.8.2.2
>> I would like to use the latest producer and consumer apis
>> over the past few weeks, I have tried to do some performance benchmarking
>> using the producer and consumer scripts provided in the bin directory. It
>> was a fun activity and I have learnt a lot about kafka.
>>
>> But, I have also experienced that sometimes the implementation of the
>> performance scripts was not up-to-date or some items were different than
>> the documentation
>>
>> Now, I would like to develop my application with kafka. I'm comfortable
>> using scala/java
>>
>> Please let me know which producer and consumer (both high level and
>> simple) class/object should I be using
>>
>> Thanks a lot,
>> Prabhjot
>>
>
>
>
> --
> -
> "There are only 10 types of people in the world: Those who understand
> binary, and those who don't"
>



-- 
-
"There are only 10 types of people in the world: Those who understand
binary, and those who don't"


Re: New and updated producers and consumers

2015-11-05 Thread Prabhjot Bharaj
Hi Jeff,

Thanks for your response.
On the scala side, is there a Producer implementation that I could use? Is the
java-based KafkaProducer (org.apache.kafka.clients.producer.KafkaProducer)
the same as the Producer in Producer.scala?

Thanks,
Prabhjot
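For context, they are different implementations: Producer.scala is the old Scala producer, reachable from Java through the kafka.javaapi wrapper, while org.apache.kafka.clients.producer.KafkaProducer is the separate new client recommended later in this thread. A hedged sketch of the old producer's Java API, with placeholder broker and topic:

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class OldApiProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // The old producer takes a broker list, not bootstrap.servers.
        props.put("metadata.broker.list", "broker1:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
        producer.send(new KeyedMessage<String, String>("my-topic", "hello"));
        producer.close();
    }
}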

On Thu, Nov 5, 2015 at 11:28 PM, Jeff Holoman <jholo...@cloudera.com> wrote:

> The best thing that I know is the latest javadoc that's committed to trunk:
>
>
> https://github.com/apache/kafka/blob/ef5d168cc8f10ad4f0efe9df4cbe849a4b35496e/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java
>
> Thanks
>
> Jeff
>
>
>
> On Thu, Nov 5, 2015 at 12:51 PM, Cliff Rhyne <crh...@signal.co> wrote:
>
> > Hi Jeff,
> >
> > Is there a writeup of how to use the new consumer API (either in general
> or
> > for Java)?  I've seen various proposals but I don't see a recent one on
> the
> > actual implementation.  My team wants to start the development work to
> > migrate to 0.9.
> >
> > Thanks,
> > Cliff
> >
> > On Thu, Nov 5, 2015 at 11:18 AM, Jeff Holoman <jholo...@cloudera.com>
> > wrote:
> >
> > > Prabhjot,
> > >
> > > The answer changes slightly for the Producer and Consumer and depends
> on
> > > your timeline and comfort with using new APIs
> > >
> > > Today and in the future, for the Producer, you should be using the
> "new"
> > > producer, which isn't all that new anymore:
> > > org.apache.kafka.clients.producer.KafkaProducer;
> > >
> > >
> > > Today with 0.9 yet to be released you'd likely want to use the
> High-Level
> > > Consumer. This is covered in the official docs here:
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
> > > and
> > > in this blog post
> > >
> > >
> >
> http://ingest.tips/2014/10/12/kafka-high-level-consumer-frequently-missing-pieces/
> > > along
> > > with most of the other examples that you'll find.
> > >
> > > After .9 is released, I'd encourage you to take a look at the new
> > Consumer
> > > API. This has a lot of advantages in terms of offset management and
> will
> > be
> > > the only consumer client that fully supports security features like SSL
> > > that are slated to be released into the platform.
> > >
> > > Your choice of development language is entirely up to you. Note that
> the
> > > only version of clients that will be maintained in the project going
> > > forward are being implemented in Java, so Scala or Java shouldn't
> matter
> > > too much for you.
> > >
> > > Hope this helps
> > >
> > > Jeff
> > >
> > >
> > > On Thu, Nov 5, 2015 at 12:14 PM, Prabhjot Bharaj <
> prabhbha...@gmail.com>
> > > wrote:
> > >
> > > > Hello Folks,
> > > >
> > > > Requesting your expertise on this.
> > > > I see that under core/src/main/scala/kafka/producer/, there are many
> > > > implementations - Producer.scala and SyncProducer.scala
> > > >
> > > > Also, going via the producerPerformance.scala, there are 2
> > > implementations
> > > > - NewShinyProducer (which points to KafkaProducer.java) and the
> > > OldProducer
> > > >
> > > > Similar might be the case with Consumers, but I have not seen that
> yet.
> > > >
> > > > Please let me know which producer and consumer is supposed to be used
> > and
> > > > which ones will be phased out in future releases, so I can focus on
> > only
> > > 1
> > > > type of producer and consumer (high level as well as simple)
> > > >
> > > > Thanks,
> > > > Prabhjot
> > > >
> > > >
> > > > On Thu, Nov 5, 2015 at 3:55 PM, Prabhjot Bharaj <
> prabhbha...@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Adding users as well
> > > > >
> > > > > On Thu, Nov 5, 2015 at 3:37 PM, Prabhjot Bharaj <
> > prabhbha...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > >> Hi,
> > > > >>
> > > > >> I'm using the latest update: 0.8.2.2
> > > > >> I would like to use the latest producer an
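For reference, a hedged sketch of the new consumer API that Jeff's javadoc link above refers to (the 0.9-era org.apache.kafka.clients.consumer.KafkaConsumer); brokers, group and topic are placeholders:

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class NewApiConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // The new consumer talks to the brokers directly, not to ZooKeeper.
        props.put("bootstrap.servers", "broker1:9092");
        props.put("group.id", "my-group");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
        consumer.subscribe(Arrays.asList("my-topic"));
        while (true) { // poll loop; a real application would handle shutdown
            ConsumerRecords<String, String> records = consumer.poll(1000); // timeout in ms
            for (ConsumerRecord<String, String> record : records)
                System.out.println(record.offset() + ": " + record.value());
        }
    }
}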

Re: Debug kafka code in intellij

2015-11-05 Thread Prabhjot Bharaj
Hello Folks,

Requesting your expertise on this

Thanks,
Prabhjot

On Thu, Nov 5, 2015 at 4:46 PM, Prabhjot Bharaj <prabhbha...@gmail.com>
wrote:

> Hi,
>
> I'm using kafka 0.8.2.1 version with IntelliJ.
> Sometimes, I change the code and build it using this command:
>
> ./gradlew -PscalaVersion=2.11.7 releaseTarGz
>
> In some cases, I feel the need for debugging the code within IntelliJ.
>
> e.g. I currently want to debug the ConsumerOffsetChecker to see how it
> communicates with zookeeper.
>
> How can I do that with IntelliJ ??
>
> Currently, if I try and debug the file ConsumerOffsetChecker.scala in
> IntelliJ, I get this message:-
>
> Error:scalac: Output path
> /Users/pbharaj/Desktop/Dev/OpenSource4/Kafka-0.8.2.1/build is shared
> between: Module 'Kafka-0.8.2.1' production, Module 'Kafka-0.8.2.1' tests
> Please configure separate output paths to proceed with the compilation.
> TIP: you can use Project Artifacts to combine compiled classes if needed.
>
>
> Regards,
> Prabhjot
>
>


-- 
-
"There are only 10 types of people in the world: Those who understand
binary, and those who don't"


Re: Producer becomes slow over time

2015-09-29 Thread Prabhjot Bharaj
Hi Erik,

I've not tested it on a producer that is not a part of the kafka cluster.


*Producer configuration on all machines:-*

root@x.x.x.x:/a/kafka/config# cat producer.properties  | egrep -v '^#|^$'

metadata.broker.list=localhost:9092

producer.type=sync

compression.codec=none

serializer.class=kafka.serializer.DefaultEncoder


*Server configuration on all machines:-*

root@y.y.y.y:/a/kafka/config# cat server.properties  | egrep -v '^#|^$'

broker.id=0

port=9092

num.network.threads=6

num.io.threads=8

socket.send.buffer.bytes=10485760

socket.receive.buffer.bytes=10485760

socket.request.max.bytes=104857600

log.dirs=/tmp/kafka-logs

num.partitions=1

num.recovery.threads.per.data.dir=1

log.retention.hours=168

log.segment.bytes=1073741824

log.retention.check.interval.ms=30

log.cleaner.enable=false

zookeeper.connect=localhost:2181

zookeeper.connection.timeout.ms=6000

num.replica.fetchers=4

*Command used from both the machines (slow and fast producer):*

kafka-producer-perf-test.sh --broker-list
x.x.x.x:9092,y.y.y.y:9092,z.z.z.z:9092,a.a.a.a:9092,b.b.b.b:9092 --messages
1048576 --message-size 500 --topics part_1_repl_3_4 --show-detailed-stats
--threads 32 --request-num-acks 1 --batch-size 1000 --request-timeout-ms
1 --compression-codec 2 --reporting-interval 1000

Regards,

Prabhjot

On Thu, Sep 24, 2015 at 6:36 PM, Helleren, Erik <erik.helle...@cmegroup.com>
wrote:

> What happens when the new producer that is getting 70 MB/s is started on a
> machine that is not part of the kafka cluster?
>
> Can you include your topic description/configuration, producer
> configuration, and broker configuration?
>
> On 9/24/15, 1:44 AM, "Prabhjot Bharaj" <prabhbha...@gmail.com> wrote:
>
> >Hi,
> >
> >I would like to dig deep into this issue. I've changed log4j.properties
> >for
> >logging in ALL mode in all places. I am getting lost in the logs.
> >
> >Any pointers would be welcome
> >
> >Please let me know if you would need any information regarding this
> >
> >Thanks,
> >Prabhjot
> >
> >On Wed, Sep 23, 2015 at 6:46 PM, Prabhjot Bharaj <prabhbha...@gmail.com>
> >wrote:
> >
> >> Hello Folks,
> >>
> >> I've noticed that 2 producer machines, that I had configured, have
> >>become
> >> very slow over time
> >> They are giving 17-19 MB/s
> >>
> >> But, a producer that I set up today is giving 70MB/s as the write
> >> throughput
> >>
> >> If I see the contents of bin, libs, config directories, nothing is
> >> different in the files on any of the producer machines.
> >>
> >> Producer is running on the kafka machines itself
> >>
> >> Request your expertise
> >>
> >> Regards,
> >> Prabhjot
> >>
> >>
> >>
> >
> >
> >--
> >-
> >"There are only 10 types of people in the world: Those who understand
> >binary, and those who don't"
>
>


-- 
-
"There are only 10 types of people in the world: Those who understand
binary, and those who don't"


Re: Producer becomes slow over time

2015-09-24 Thread Prabhjot Bharaj
Hi,

I would like to dig deep into this issue. I've changed log4j.properties for
logging in ALL mode in all places. I am getting lost in the logs.

Any pointers would be welcome

Please let me know if you would need any information regarding this

Thanks,
Prabhjot

On Wed, Sep 23, 2015 at 6:46 PM, Prabhjot Bharaj <prabhbha...@gmail.com>
wrote:

> Hello Folks,
>
> I've noticed that 2 producer machines, that I had configured, have become
> very slow over time
> They are giving 17-19 MB/s
>
> But, a producer that I set up today is giving 70MB/s as the write throughput
>
> If I see the contents of bin, libs, config directories, nothing is
> different in the files on any of the producer machines.
>
> Producer is running on the kafka machines itself
>
> Request your expertise
>
> Regards,
> Prabhjot
>
>
>


-- 
-
"There are only 10 types of people in the world: Those who understand
binary, and those who don't"


Fwd: Useful metric to check slow ISR catchup

2015-09-22 Thread Prabhjot Bharaj
Hi Dev Folks,

Request your expertise on this doubt of mine

Thanks,
Prabhjot
-- Forwarded message --
From: Prabhjot Bharaj <prabhbha...@gmail.com>
Date: Mon, Sep 21, 2015 at 2:59 PM
Subject: Re: Useful metric to check slow ISR catchup
To: us...@kafka.apache.org


Hi,

Attaching a screenshot of bytes/sec from Ganglia

As you can see, the graph in RED color belongs to the third replica, for
which the bytes/sec is around 10 times lower than its 2 peers (in Green and
Blue)
Earlier, I was thinking that it could be related to that 1 system only, but
when I created a new topic with 1 partition and 3 replicas, I see a similar
graph on the other set of machines.

I'm not sure what parameter could be causing this. Any pointers are
appreciated

Thanks,
Prabhjot
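A hedged sketch of polling one such broker metric over JMX with plain javax.management; the host, port (the broker must be started with a JMX port exposed, e.g. via JMX_PORT), and the exact MBean name are assumptions, since metric names differ between Kafka versions - check jconsole for the names your broker actually registers:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class BrokerMetricPoller {
    public static void main(String[] args) throws Exception {
        // Assumed host and JMX port of one broker.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://broker1:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        MBeanServerConnection mbsc = connector.getMBeanServerConnection();
        // Assumed MBean name (0.8.2-style); older brokers use quoted names.
        ObjectName name =
                new ObjectName("kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec");
        // Meter MBeans expose rate attributes such as OneMinuteRate.
        Object rate = mbsc.getAttribute(name, "OneMinuteRate");
        System.out.println("BytesInPerSec one-minute rate: " + rate);
        connector.close();
    }
}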

On Mon, Sep 21, 2015 at 1:20 PM, Prabhjot Bharaj <prabhbha...@gmail.com>
wrote:

> Hello Folks,
>
> Request your expertise on this
>
> Thanks,
> Prabhjot
>
> On Fri, Sep 18, 2015 at 6:18 PM, Prabhjot Bharaj <prabhbha...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I've noticed that 1 follower replica node out of my kafka cluster catches
>> up to the data from the leader pretty slowly.
>> My topic has just 1 partition with 3 replicas. One other follower replica
>> gets the full data from the leader pretty much instantly.
>>
>> It takes around 22 minutes to catch up on 500MB of data.
>>
>> I have setup ganglia monitoring on my cluster and I'm interested in
>> knowing what metric exposed through JMX would be useful in checking the
>> reason for such slowness
>>
>> Thanks,
>> Prabhjot
>>
>
>
>
> --
> -
> "There are only 10 types of people in the world: Those who understand
> binary, and those who don't"
>



-- 
-
"There are only 10 types of people in the world: Those who understand
binary, and those who don't"





Re: [jira] [Commented] (KAFKA-2569) Kafka should write its metrics to a Kafka topic

2015-09-22 Thread Prabhjot Bharaj
Hi,

I have setup a Ganglia host to get all the stats from my 5 node kafka
cluster. I run jmxtrans on Kafka nodes which export to Ganglia.
I have exposed all of the available stats to Ganglia and have grouped them
as under:-

- Kafka Cluster Stats metrics (18)
- Kafka Consumer Stats metrics (208)
- Kafka Controller Stats metrics (23)
- Kafka Log Stats metrics (72)
- Kafka Network Stats metrics (1091)
- Kafka Server Stats metrics (144)
- cpu metrics (7)
- disk metrics (3)
- jvmGC metrics (4)
- jvmheapmemory metrics (8)
- load metrics (3)
- memory metrics (5)
- network metrics (4)
- process metrics (2)


This kind of setup helps me debug any issues very clearly.
Recently, I saw an issue on my cluster with very slow ISR catchup on the
last replica. I was able to see that the bytes/sec input to the last
replica was around 10% of that of the other machines - including the leader
and the other followers.

I believe having the current JMX reporting is much better, as people can
handle these stats in whichever way they want.
They could plot any stat vs any stat, but if we limit that to a kafka
topic, it will become more complex.

On the other hand, Cassandra exposes its stats in Cassandra as well; if one
Cassandra node is down, the others will still be able to report. But the
reports will be limited to what Cassandra provides.
Having a generic mechanism like JMX reporting allows greater flexibility.

I would suggest encouraging people to contribute tooling that makes it easy
to set up JMX monitoring pipelines, using tools like Ganglia, Graphite or
Cacti - something like the Kafka clients, which are not maintained by kafka
committers.

I would love to hear from others on this

Regards,
Prabhjot

On Wed, Sep 23, 2015 at 9:22 AM, James Cheng (JIRA)  wrote:

>
> [
> https://issues.apache.org/jira/browse/KAFKA-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903904#comment-14903904
> ]
>
> James Cheng commented on KAFKA-2569:
> 
>
> Yes, although I didn't realize that was a rare thing. I'm pretty new to
> the operations-side of Kafka.
>
> How do people typically monitor things with JMX metrics (like Kafka and
> Zookeeper)?
>
> > Kafka should write its metrics to a Kafka topic
> > ---
> >
> > Key: KAFKA-2569
> > URL: https://issues.apache.org/jira/browse/KAFKA-2569
> > Project: Kafka
> >  Issue Type: New Feature
> >Reporter: James Cheng
> >
> > Kafka is often used to hold and transport monitoring data.
> > In order to monitor Kafka itself, Kafka currently exposes many metrics
> via JMX, which require using a tool to pull the JMX metrics, and then write
> them to the monitoring system.
> > It would be convenient if Kafka could simply send its metrics to a Kafka
> topic. This would make most sense if the Kafka topic was in a different
> Kafka cluster, but could still be useful even if it was sent to a topic in
> the same Kafka cluster.
> > Of course, if sent to the same cluster, it would not be accessible if
> the cluster itself was down.
> > This would allow monitoring of Kafka itself without requiring people to
> set up their own JMX-to-monitoring-system pipelines.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>



-- 
-
"There are only 10 types of people in the world: Those who understand
binary, and those who don't"


Re: [ANNOUNCE] New Committer Sriharsha Chintalapani

2015-09-21 Thread Prabhjot Bharaj
Congratulations. It's inspiring for newbies like me

Regards,
Prabhjot
On Sep 22, 2015 10:30 AM, "Ashish Singh"  wrote:

> Congrats Harsha!
>
> On Monday, September 21, 2015, Manikumar Reddy 
> wrote:
>
> > congrats harsha!
> >
> > On Tue, Sep 22, 2015 at 9:48 AM, Dong Lin  > > wrote:
> >
> > > Congratulations Sriharsha!
> > >
> > > Dong
> > >
> > > On Tue, Sep 22, 2015 at 4:17 AM, Guozhang Wang  > > wrote:
> > >
> > > > Congrats Sriharsha!
> > > >
> > > > Guozhang
> > > >
> > > > On Mon, Sep 21, 2015 at 9:10 PM, Jun Rao  > > wrote:
> > > >
> > > > > I am pleased to announce that the Apache Kafka PMC has voted to
> > > > > invite Sriharsha Chintalapani as a committer and Sriharsha has
> > > accepted.
> > > > >
> > > > > Sriharsha has contributed numerous patches to Kafka. The most
> > > significant
> > > > > one is the SSL support.
> > > > >
> > > > > Please join me on welcoming and congratulating Sriharsha.
> > > > >
> > > > > I look forward to your continued contributions and much more to
> come!
> > > > >
> > > > > Jun
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
>
>
> --
> Ashish h
>


Re: Contributor List

2015-09-07 Thread Prabhjot Bharaj
Hi,

Request you to add me as well for code contributions

Regards,
Prabhjot
On Sep 8, 2015 2:10 AM, "Gwen Shapira"  wrote:

> Done :)
>
> Happy hacking.
>
> On Mon, Sep 7, 2015 at 11:44 AM, Bill Bejeck  wrote:
>
> > Hi Just a reminder to add me to the contributors list! I've started
> looking
> > at KAFKA-2058,  My Jira username is bbejeck.
> >
> > Thanks!
> >
> > Bill
> >
> > On Fri, Sep 4, 2015 at 7:46 PM, Gwen Shapira  wrote:
> >
> > > Thank you!
> > >
> > > Please create a Jira user if you didn't do so already, and let me know
> > your
> > > user name. I'll add you to the list.
> > >
> > > On Fri, Sep 4, 2015 at 4:33 PM, Bill Bejeck  wrote:
> > >
> > > > Hi can i get added to the contributor list? I'd like to take crack at
> > > > KAFKA-2058 
> > > >
> > > > Thanks!
> > > >
> > > > Bill Bejeck
> > > >
> > >
> >
>


Re: Contributor List

2015-09-07 Thread Prabhjot Bharaj
My username: pbharaj

Thanks,
Prabhjot
On Sep 8, 2015 10:14 AM, "Prabhjot Bharaj" <prabhbha...@gmail.com> wrote:

> Hi,
>
> Request you to add me as well for code contributions
>
> Regards,
> Prabhjot
> On Sep 8, 2015 2:10 AM, "Gwen Shapira" <g...@confluent.io> wrote:
>
>> Done :)
>>
>> Happy hacking.
>>
>> On Mon, Sep 7, 2015 at 11:44 AM, Bill Bejeck <bbej...@gmail.com> wrote:
>>
>> > Hi Just a reminder to add me to the contributors list! I've started
>> looking
>> > at KAFKA-2058,  My Jira username is bbejeck.
>> >
>> > Thanks!
>> >
>> > Bill
>> >
>> > On Fri, Sep 4, 2015 at 7:46 PM, Gwen Shapira <g...@confluent.io> wrote:
>> >
>> > > Thank you!
>> > >
>> > > Please create a Jira user if you didn't do so already, and let me know
>> > your
>> > > user name. I'll add you to the list.
>> > >
>> > > On Fri, Sep 4, 2015 at 4:33 PM, Bill Bejeck <bbej...@gmail.com>
>> wrote:
>> > >
>> > > > Hi can i get added to the contributor list? I'd like to take crack
>> at
>> > > > KAFKA-2058 <https://issues.apache.org/jira/browse/KAFKA-2058>
>> > > >
>> > > > Thanks!
>> > > >
>> > > > Bill Bejeck
>> > > >
>> > >
>> >
>>
>


Re: Hello!

2015-09-03 Thread Prabhjot Bharaj
Hi Sudhanshu,

You can go through the details mentioned here:
http://kafka.apache.org/contributing.html

Also, in case you need to setup your IDE, you can refer to these links:-

http://www.lewuathe.com/blog/2014/10/16/build-apache-kafka-with-intellij-idea/
https://cwiki.apache.org/confluence/display/KAFKA/Eclipse-Scala-Gradle-Git+Developement+Environment+Setup

These links were very useful for me

Also, I am in Bangalore. If you want, we can start on the understanding
part together.

Regards,
Prabhjot

On Thu, Sep 3, 2015 at 4:50 PM, Sudhanshu 
wrote:

> Hello Developers,
>
> I am Sudhanshu Gupta, a software engineer working in Bangalore, India. We are
> using kafka in production and it's fantastic.
>
> I am willing to contribute to the project. I have forked and cloned the
> kafka repo locally, and I want to get started on it. It would be great if
> anyone from the community could help me understand the codebase and let me
> fix some issues.
>
> My Jira id: sudhanshu-gupta
>
> --
> Regards,
> Sudhanshu
>



-- 
-
"There are only 10 types of people in the world: Those who understand
binary, and those who don't"


Re: kafka producer-perf-test.sh compression-codec not working

2015-08-25 Thread Prabhjot Bharaj
Hi Erik,
Thanks for these inputs
I'll implement it

Regards,
Prabhjot
On Aug 25, 2015 11:53 PM, Helleren, Erik erik.helle...@cmegroup.com
wrote:

 Prabhjot,
 You can’t do it with producer perf test, but it's relatively simple to
 implement. The message body includes a timestamp of when your producer
 produces, and the consumer looks at the difference between the timestamp
 in the body and the current timestamp.

 Or, if you were looking for ack latency, you can use the producer’s async
 callback to measure latency.
 -Erik
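A hedged sketch of the timestamp-in-the-body approach Erik describes, assuming the producer and consumer clocks are synchronized (e.g. both on the same machine); the payload framing is illustrative:

public class LatencyProbe {
    // Producer side: prefix the payload with the send timestamp.
    static String wrap(String body) {
        return System.currentTimeMillis() + "|" + body;
    }

    // Consumer side: parse the timestamp back out and compute the elapsed time.
    static long latencyMs(String payload) {
        long sentAt = Long.parseLong(payload.split("\\|", 2)[0]);
        return System.currentTimeMillis() - sentAt;
    }

    public static void main(String[] args) throws InterruptedException {
        String payload = wrap("actual-message-body");
        Thread.sleep(5); // stand-in for the produce -> broker -> consume hop
        System.out.println("round-trip latency ~" + latencyMs(payload) + " ms");
    }
}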

 From: Prabhjot Bharaj <prabhbha...@gmail.com>
 Date: Tuesday, August 25, 2015 at 9:22 AM
 To: Erik Helleren <erik.helle...@cmegroup.com>
 Cc: <us...@kafka.apache.org>, <dev@kafka.apache.org>
 Subject: Re: kafka producer-perf-test.sh compression-codec not working

 Hi Erik,

 Thanks for your inputs.

 How can we measure round trip latency using kafka-producer-perf-test.sh ?

 or any other tool ?

 Regards,
 Prabhjot

 On Tue, Aug 25, 2015 at 7:41 PM, Helleren, Erik
 <erik.helle...@cmegroup.com> wrote:
 Prabhjot,
 When no compression is being used, it should have only a tiny impact on
 performance.  But when it is enabled, it will make it as though the message
 payload is small and nearly constant, regardless of how large the
 configured message size is.

 I think the answer is that there is room for improvement in the perf
 test, especially where compression is concerned.  If you do implement an
 improvement, a patch might be helpful to the community.  But something to
 consider is that throughput alone isn’t the only important performance
 measure.   Round trip latency is also important.
 Thanks,
 -Erik


 From: Prabhjot Bharaj <prabhbha...@gmail.com>
 Date: Tuesday, August 25, 2015 at 8:41 AM
 To: Erik Helleren <erik.helle...@cmegroup.com>
 Cc: <us...@kafka.apache.org>, <dev@kafka.apache.org>
 Subject: Re: kafka producer-perf-test.sh compression-codec not working

 Hi Erik,

 I have put my efforts on the produce side till now. Thanks for making me
 aware that the consumer will decompress automatically.

 I'll also consider your point on creating real-life messages.

 But I still have one confusion -

 Why would the current ProducerPerformance.scala compress an array of bytes
 that is all zeros?
 That will anyway give better throughput, correct?

 Regards,
 Prabhjot

 On Tue, Aug 25, 2015 at 7:05 PM, Helleren, Erik
 <erik.helle...@cmegroup.com> wrote:
 Hi Prabhjot,
 There are two important things to know about kafka compression:  First
 uncompression happens automatically in the consumer
 (https://cwiki.apache.org/confluence/display/KAFKA/Compression) so you
 should see ascii returned on the consumer side. The best way to see if
 compression has happened that I know of is to actually look at a packet
 capture.

  Second, the producer does not compress individual messages, but actually
  batches several sequential messages to the same topic and partition
  together and compresses that compound message.
  (https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-Compression)
  Thus, a fixed string will still see far better compression ratios than a
  'typical' real life message.

  Making a real-life-like message isn't easy, and depends heavily on your
  domain. But a general approach would be to generate messages by randomly
  selecting words from a dictionary.  And having a dictionary of around a
  thousand large words means there is a reasonable chance of the same words
  appearing multiple times in the same message.  Also, words can be nonsense
  like "asdfasdfasdfasdf", or large words in the language of your choice.  The
  goal is for each message to be unique, but still have similar chunks that
  a compression algorithm can detect and compress.

 -Erik
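A hedged sketch of the dictionary approach Erik outlines: every message is unique, but drawn from a small vocabulary so a compression codec still finds repeated chunks. The word list is an illustrative stand-in:

import java.util.Random;

public class CompressibleMessageGenerator {
    // Small vocabulary: messages differ, but share chunks that compress well.
    private static final String[] DICT = {
            "entertainment", "subcategory", "bitrate", "asdfasdfasdfasdf",
            "factor", "points", "title"
    };
    private static final Random RND = new Random();

    static String generate(int targetBytes) {
        StringBuilder sb = new StringBuilder(targetBytes);
        while (sb.length() < targetBytes)
            sb.append(DICT[RND.nextInt(DICT.length)]).append(' ');
        return sb.substring(0, targetBytes); // trim to the exact message size
    }

    public static void main(String[] args) {
        System.out.println(generate(200));
    }
}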


  On 8/25/15, 6:47 AM, "Prabhjot Bharaj" <prabhbha...@gmail.com> wrote:

 Hi,
 
  I have been trying to use kafka-producer-perf-test.sh to arrive at certain
  benchmarks.
 When I try to run

kafka producer-perf-test.sh compression-codec not working

2015-08-25 Thread Prabhjot Bharaj
Hi,

I have been trying to use kafka-producer-perf-test.sh to arrive at certain
benchmarks.
When I try to run it with --compression-codec values of 1, 2 and 3, I
notice increased throughput compared to NoCompressionCodec

But when I checked ProducerPerformance.scala, I saw that
`producer.send` is getting data from the method `generateProducerData`.
But this data is just an empty array of bytes.

Now, as per my basic understanding of compression algorithms, I think a
byte sequence of zeros will eventually result in a very small message,
because of which I thought I might be observing better throughput.

So, in line: 247 of ProducerPerformance.scala, I did this minor code
change:-



*val message = 
qopwr11591UPD113582260001AS1IL1-1N/A1Entertainment1-1an-example.com1-1-1-1-1-1-1-1011413/011413_factor_points_FNC_,LOW,MED_LOW,MED,HIGH,HD,.mp4.csmil/bitrate=11subcategory
71Title 
10^D1-1-111-1-1-1-1-1-111-1-1-1-1-115101-1-1-1-1126112491-1-1-1-1-1-1-1-1-1-1-1-1-1-1-111-1-1-r1VR-11591UPD113582260001AS1IL1-1N/A1Entertainment1-1an-example.com1-1-1-1-1-1-1-1011413/011413_factor_points_FNC_,LOW,MED_LOW,MED,HIGH,HD,.mp4.csmil/bitrate=11subcategory
71Title 
10^D1-1-111-1-1-1-1-1-111-1-1-1-1-115101-1-1-1-1126112491-1-1-1-1-1-1-1-1-1-1-1-1-1-1-111-1-1-r1VR-11591UPD113582260001AS1IL1-1N/A1Entertainment1-1an-example.com1-1-1-1-1-1-1-1011413/011413_factor_points_FNC_,LOW,MED_LOW,MED,HIGH,HD,.mp4.csmil/bitrate=11subcategory
71Title 
10^D1-1-111-1-1-1-1-1-111-1-1-1-1-115101-1-1-1-1126112491-1-1-1-1-1-1-1-1-1-1-1-1-1-1-111-1-1-message.getBytes().slice(0,msgSize)*


This makes sure that I have a big message, and I can slice that
message to the message size passed in the command line options


But the problem is that when I try running the same with
--compression-codec values of 1, 2 or 3, I am still seeing ASCII data
(i.e. uncompressed only).


I want to ask whether this is a bug. And, using
kafka-producer-perf-test.sh, how can I send my own compressed data?


Thanks,

Prabhjot


Re: kafka producer-perf-test.sh compression-codec not working

2015-08-25 Thread Prabhjot Bharaj
Hi Erik,

I have put my efforts on the produce side till now. Thanks for making me
aware that the consumer will decompress automatically.

I'll also consider your point on creating real-life messages.

But I still have one confusion -

Why would the current ProducerPerformance.scala compress an array of bytes
that is all zeros?
That will anyway give better throughput, correct?

Regards,
Prabhjot

On Tue, Aug 25, 2015 at 7:05 PM, Helleren, Erik erik.helle...@cmegroup.com
wrote:

 Hi Prabhjot,
 There are two important things to know about kafka compression:  First
 uncompression happens automatically in the consumer
 (https://cwiki.apache.org/confluence/display/KAFKA/Compression) so you
 should see ascii returned on the consumer side. The best way to see if
 compression has happened that I know of is to actually look at a packet
 capture.

 Second, the producer does not compress individual messages, but actually
 batches several sequential messages to the same topic and partition
 together and compresses that compound message.
  (https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-Compression)
  Thus, a fixed string will still see far better compression ratios than a
  'typical' real life message.

  Making a real-life-like message isn't easy, and depends heavily on your
  domain. But a general approach would be to generate messages by randomly
  selecting words from a dictionary.  And having a dictionary of around a
  thousand large words means there is a reasonable chance of the same words
  appearing multiple times in the same message.  Also, words can be nonsense
  like "asdfasdfasdfasdf", or large words in the language of your choice.  The
  goal is for each message to be unique, but still have similar chunks that
  a compression algorithm can detect and compress.

 -Erik


 On 8/25/15, 6:47 AM, Prabhjot Bharaj prabhbha...@gmail.com wrote:

 Hi,
 
  I have been trying to use kafka-producer-perf-test.sh to arrive at certain
  benchmarks.
 When I try to run it with --compression-codec values of 1, 2 and 3, I
 notice increased throughput compared to NoCompressionCodec
 
  But when I checked ProducerPerformance.scala, I saw that
  `producer.send` is getting data from the method `generateProducerData`.
  But this data is just an empty array of bytes.
 
 Now, as per my basic understanding of compression algorithms, I think a
 byte sequence of zeros will eventually result in a very small message,
 because of which I thought I might be observing better throughput.
 
 So, in line: 247 of ProducerPerformance.scala, I did this minor code
 change:-
 
 
 
 *val message =
 qopwr11591UPD113582260001AS1IL1-1N/A1Entertainment1-1an-example.com1-1-1-
 1-1-1-1-1011413/011413_factor_points_FNC_,LOW,MED_LOW,MED,HIGH,HD,.mp4.csm
 il/bitrate=11subcategory
 71Title
 10^D1-1-111-1-1-1-1-1-111-1-1-1-1-115101-1-1-1-1126112491-1-1-1-1-1-1-1-1-
 1-1-1-1-1-1-111-1-1-r1VR-11591UPD113582260001AS1IL1-1N/A1Entertainment1-1a
 n-example.com1-1-1-1-1-1-1-1011413/011413_factor_points_FNC_,LOW,MED_LOW,M
 ED,HIGH,HD,.mp4.csmil/bitrate=11subcategory
 71Title
 10^D1-1-111-1-1-1-1-1-111-1-1-1-1-115101-1-1-1-1126112491-1-1-1-1-1-1-1-1-
 1-1-1-1-1-1-111-1-1-r1VR-11591UPD113582260001AS1IL1-1N/A1Entertainment1-1a
 n-example.com1-1-1-1-1-1-1-1011413/011413_factor_points_FNC_,LOW,MED_LOW,M
 ED,HIGH,HD,.mp4.csmil/bitrate=11subcategory
 71Title
 10^D1-1-111-1-1-1-1-1-111-1-1-1-1-115101-1-1-1-1126112491-1-1-1-1-1-1-1-1-
 1-1-1-1-1-1-111-1-1-message.getBytes().slice(0,msgSize)*
 
 
 This makes sure that I have a big message, and I can slice that
 message to the message size passed in the command line options
 
 
  But the problem is that when I try running the same with
  --compression-codec values of 1, 2 or 3, I am still seeing ASCII data
  (i.e. uncompressed only).
 
 
 I want to ask whether this is a bug. And, using
 kafka-producer-perf-test.sh, how can I send my own compressed data ?
 
 
 Thanks,
 
 Prabhjot




-- 
-
There are only 10 types of people in the world: Those who understand
binary, and those who don't


Re: Best practices - Using kafka (with http server) as source-of-truth

2015-07-30 Thread Prabhjot Bharaj
Hi Ewen,

Thanks for your response. I'll experiment and benchmark it with the normal
proxy and Nginx as well, and update the results.

Regards,
prabcs

On Mon, Jul 27, 2015 at 11:10 PM, Ewen Cheslack-Postava e...@confluent.io
wrote:

 Hi Prabhjot,

 Confluent has a REST proxy with docs that may give some guidance:
 http://docs.confluent.io/1.0/kafka-rest/docs/intro.html The new producer
 that it uses is very efficient, so you should be able to get pretty good
 throughput. You take a bit of a hit due to the overhead of sending data
 through a proxy, but with appropriate batching you can get about 2/3 the
 performance as you would get using the Java producer directly.

 There are also a few other proxies you can find here:
 https://cwiki.apache.org/confluence/display/KAFKA/Clients#Clients-HTTPREST

 You can also put nginx (or HAProxy, or a variety of other solutions) in
 front of REST proxies for load balancing, HA, SSL termination, etc. This is
 yet another hop, so it might affect throughput and latency.

 -Ewen

 On Mon, Jul 27, 2015 at 6:55 AM, Prabhjot Bharaj prabhbha...@gmail.com
 wrote:

  Hi Folks,
 
  I would like to understand the best practices when using kafka as the
  source-of-truth, given the fact that I want to pump in data to Kafka
 using
  http methods.
 
  What are the current production configurations for such a use case:-
 
   1. Kafka-http-client - is it scalable the way Nginx is?
   2. Using Kafka and Nginx together - if anybody has used this, please
   explain
   3. Any other scalable method?
 
  Regards,
  prabcs
 



 --
 Thanks,
 Ewen




-- 
-
There are only 10 types of people in the world: Those who understand
binary, and those who don't
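For reference, a hedged sketch of posting a message through the REST proxy Ewen mentions, following the v1 docs linked above; the host, port, topic and body are placeholders, and the exact content types should be checked against those docs:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class RestProxyPost {
    public static void main(String[] args) throws Exception {
        // Placeholder proxy host/port and topic.
        URL url = new URL("http://restproxy:8082/topics/my-topic");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        // Embedded JSON format per the linked v1 documentation.
        conn.setRequestProperty("Content-Type", "application/vnd.kafka.json.v1+json");
        conn.setDoOutput(true);
        String body = "{\"records\":[{\"value\":{\"greeting\":\"hello\"}}]}";
        OutputStream os = conn.getOutputStream();
        os.write(body.getBytes("UTF-8"));
        os.close();
        System.out.println("HTTP " + conn.getResponseCode());
        conn.disconnect();
    }
}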


Re: Connection to zk shell on Kafka

2015-07-29 Thread Prabhjot Bharaj
Sure. It would be great if you could also explain why the absence of the
jar creates this problem.

Also, I'm surprised that the zookeeper that comes bundled with kafka 0.8.2
does not include the jline jar.

Regards,
prabcs

On Wed, Jul 29, 2015 at 10:45 PM, Chris Barlock barl...@us.ibm.com wrote:

 You need the jline JAR file that ships with ZooKeeper.

 Chris

 IBM Tivoli Systems
 Research Triangle Park, NC
 (919) 224-2240
 Internet:  barl...@us.ibm.com



 From:   Prabhjot Bharaj prabhbha...@gmail.com
 To: us...@kafka.apache.org, u...@zookeeper.apache.org
 Date:   07/29/2015 01:13 PM
 Subject:Connection to zk shell on Kafka



 Hi folks,

/kafka/bin# ./zookeeper-shell.sh localhost:2182/

Connecting to localhost:2182/

Welcome to ZooKeeper!

JLine support is disabled


WATCHER::


WatchedEvent state:SyncConnected type:None path:null


The shell never says connected.

I'm running a 5-node zookeeper cluster on a 5-node kafka cluster (each kafka
broker has 1 zookeeper server running).

 When I try connecting to the shell, the shell never says 'Connected'



However, if I try connecting on another standalone zookeeper which has no
links to kafka, I'm able to connect:-


/kafka/bin# /zookeeper/scripts/zkCli.sh -server 127.0.0.1:2181

Connecting to 127.0.0.1:2181

Welcome to ZooKeeper!

JLine support is enabled


WATCHER::


WatchedEvent state:SyncConnected type:None path:null

[zk: 127.0.0.1:2181(CONNECTED) 0]

 Am I missing something?


 Thanks,

 prabcs




-- 
-
There are only 10 types of people in the world: Those who understand
binary, and those who don't


Re: Number of kafka topics/partitions supported per cluster of n nodes

2015-07-28 Thread Prabhjot Bharaj
Sure. I would be doing that.
I have seen that if I have 5-7 topics with 256 partitions each on a machine
with 4 CPUs and 8 GB RAM, the jvm crashes with OutOfMemoryError.
And this happens on many machines in the cluster. (I'll update the exact
number as well.)

I was wondering how I could tune the JVM to its limits for handling such
a scenario.

Regards,
Prabhjot

On Tue, Jul 28, 2015 at 12:27 PM, Darion Yaphet darion.yap...@gmail.com
wrote:

 Kafka stores its metadata in the Zookeeper cluster, so evaluating how many
 total topics and partitions can be created in a cluster may be much the
 same as testing Zookeeper's scalability and disk IO performance.

 2015-07-28 13:51 GMT+08:00 Prabhjot Bharaj prabhbha...@gmail.com:

  Hi,
 
  I'm looking forward to a benchmark which can explain how many total
 number
  of topics and partitions can be created in a cluster of n nodes, given
 the
  message size varies between x and y bytes and how does it vary with
 varying
  heap sizes and how it affects the system performance.
 
  e.g. the result should look like: t topics with p partitions each can be
  supported in a cluster of n nodes with a heap size of h MB, before the
  cluster sees things like JVM crashes or high mem usage or system slowdown
  etc.
 
  I think such benchmarks must exist so that we can make better decisions
 on
  ops side
  If these details dont exist, I'll be doing this test myself on varying
 the
  values of parameters described above. I would be happy to share the
 numbers
  with the community
 
  Thanks,
  prabcs
 



 --

 long is the way and hard  that out of Hell leads up to light




-- 
-
There are only 10 types of people in the world: Those who understand
binary, and those who don't


Re: Number of kafka topics/partitions supported per cluster of n nodes

2015-07-28 Thread Prabhjot Bharaj
@Jiefu Gong,

Are the results of your tests available publicly?

Regards,
Prabhjot

On Tue, Jul 28, 2015 at 10:35 PM, Prabhjot Bharaj prabhbha...@gmail.com
wrote:

 I would be using the servers available at my place of work. I don't have
 access to AWS servers. I would be starting off with a small number of nodes
 in the cluster and then plot a graph with the x-axis as the number of
 servers in the cluster and the y-axis as the number of topics with
 partitions, before the cluster gives up.

 I need one piece of help here: what parameters should I keep in mind for
 tuning the JVM if I have to see the best performance?
 My machine specs: 4-core CPUs with 8GB RAM and a 500GB HDD (RAID striped)

 Regards,
 Prabhjot

 On Tue, Jul 28, 2015 at 10:27 PM, JIEFU GONG jg...@berkeley.edu wrote:

 I think these would definitely be useful statistics to have and I've tried
 to do similar tests! The biggest difference is probably going to be the
 hardware specs on whatever cluster you decide to run it on. Maybe
 benchmarks performed on different AWS servers would be helpful too, but
 I'd
 like to see these!

 On Mon, Jul 27, 2015 at 10:51 PM, Prabhjot Bharaj prabhbha...@gmail.com
 wrote:

  Hi,
 
  I'm looking forward to a benchmark which can explain how many total
 number
  of topics and partitions can be created in a cluster of n nodes, given
 the
  message size varies between x and y bytes and how does it vary with
 varying
  heap sizes and how it affects the system performance.
 
  e.g. the result should look like: t topics with p partitions each can be
  supported in a cluster of n nodes with a heap size of h MB, before the
  cluster sees things like JVM crashes or high mem usage or system
 slowdown
  etc.
 
  I think such benchmarks must exist so that we can make better decisions
 on
  ops side
  If these details dont exist, I'll be doing this test myself on varying
 the
  values of parameters described above. I would be happy to share the
 numbers
  with the community
 
  Thanks,
  prabcs
 



 --

 Jiefu Gong
 University of California, Berkeley | Class of 2017
 B.A Computer Science | College of Letters and Sciences

 jg...@berkeley.edu elise...@berkeley.edu | (925) 400-3427




 --
 -
 There are only 10 types of people in the world: Those who understand
 binary, and those who don't




-- 
-
There are only 10 types of people in the world: Those who understand
binary, and those who don't


Re: Number of kafka topics/partitions supported per cluster of n nodes

2015-07-28 Thread Prabhjot Bharaj
I would be using the servers available at my place of work. I don't have
access to AWS servers. I would be starting off with a small number of nodes
in the cluster and then plot a graph with the x-axis as the number of
servers in the cluster and the y-axis as the number of topics with
partitions, before the cluster gives up.

I need one piece of help here: what parameters should I keep in mind for
tuning the JVM if I have to see the best performance?
My machine specs: 4-core CPUs with 8GB RAM and a 500GB HDD (RAID striped)

Regards,
Prabhjot

On Tue, Jul 28, 2015 at 10:27 PM, JIEFU GONG jg...@berkeley.edu wrote:

 I think these would definitely be useful statistics to have, and I've
 tried to do similar tests! The biggest difference is probably going to be
 the hardware specs of whatever cluster you decide to run it on. Maybe
 benchmarks performed on different AWS servers would be helpful too, but
 I'd like to see these!

 On Mon, Jul 27, 2015 at 10:51 PM, Prabhjot Bharaj prabhbha...@gmail.com
 wrote:

  Hi,

  I'm looking for a benchmark that explains the total number of topics and
  partitions that can be created in a cluster of n nodes, given that
  message sizes vary between x and y bytes, and how this varies with heap
  size and affects system performance.

  e.g. the result should look like: t topics with p partitions each can be
  supported in a cluster of n nodes with a heap size of h MB, before the
  cluster sees JVM crashes, high memory usage, system slowdown, etc.

  I think such benchmarks must exist so that we can make better decisions
  on the ops side.
  If these details don't exist, I'll run this test myself, varying the
  values of the parameters described above. I would be happy to share the
  numbers with the community.

  Thanks,
  prabcs









Kafka Mirror Maker

2015-07-28 Thread Prabhjot Bharaj
Hi,

I'm using Mirror Maker between a cluster of 3 nodes and a cluster of 5 nodes.

I would like to ask: is the number of nodes a restriction for Mirror Maker?
Also, are there any other restrictions, or properties that should be common
across both clusters, for them to keep mirroring?

I'm asking because I got this error while mirroring:

[2015-07-28 17:51:10,943] WARN Fetching topic metadata with correlation id
0 for topics [Set(fromIndiaWithLove)] from broker
[id:3,host:10.2.3.4,port:9092] failed (kafka.client.ClientUtils$)
java.nio.channels.ClosedChannelException
        at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)
        at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:73)
        at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:72)
        at kafka.producer.SyncProducer.send(SyncProducer.scala:113)
        at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:58)
        at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:93)
        at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)

[2015-07-28 17:51:18,955] WARN Fetching topic metadata with correlation id
0 for topics [Set(fromIndiaWithLove)] from broker
[id:2,host:10.2.3.5,port:9092] failed (kafka.client.ClientUtils$)
java.nio.channels.ClosedChannelException
        at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)
        at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:73)
        at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:72)
        at kafka.producer.SyncProducer.send(SyncProducer.scala:113)
        at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:58)
        at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:93)
        at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)

[2015-07-28 17:51:27,043] WARN Fetching topic metadata with correlation id
0 for topics [Set(fromIndiaWithLove)] from broker
[id:5,host:10.2.3.6,port:9092] failed (kafka.client.ClientUtils$)
java.nio.channels.ClosedChannelException
        at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)
        at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:73)
        at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:72)
        at kafka.producer.SyncProducer.send(SyncProducer.scala:113)
        at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:58)
        at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:93)
        at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)



This is what my consumer config looks like:

zookeeper.connect=10.2.3.4:2182
zookeeper.connection.timeout.ms=100
consumer.timeout.ms=-1
group.id=dp-mirrorMaker-test-datap1
shallow.iterator.enable=true
auto.create.topics.enable=true



I've used the default producer.properties in kafka/config/, which has
these properties:

metadata.broker.list=localhost:9092
producer.type=sync
compression.codec=none
serializer.class=kafka.serializer.DefaultEncoder


I'm running Mirror Maker via this command:

/kafka_2.10-0.8.2.0/bin/kafka-run-class.sh kafka.tools.MirrorMaker \
  --consumer.config ~/sourceCluster1Consumer.config --num.streams 1 \
  --producer.config producer.properties --whitelist='.*'
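
In case it helps with debugging: a repeated ClosedChannelException on
metadata fetches usually means the client can reach ZooKeeper but not the
brokers at the host:port they advertise. A quick sanity check might look
like this (the hosts and ports are the ones from the log above, and
advertised.host.name is the broker setting in server.properties to verify):

# Check raw TCP reachability to each broker named in the warnings
nc -vz 10.2.3.4 9092
nc -vz 10.2.3.5 9092
nc -vz 10.2.3.6 9092
# Confirm what broker id 3 actually registered in ZooKeeper
/kafka_2.10-0.8.2.0/bin/zookeeper-shell.sh 10.2.3.4:2182 get /brokers/ids/3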

Regards,

prabcs


Number of kafka topics/partitions supported per cluster of n nodes

2015-07-27 Thread Prabhjot Bharaj
Hi,

I'm looking for a benchmark that explains the total number of topics and
partitions that can be created in a cluster of n nodes, given that message
sizes vary between x and y bytes, and how this varies with heap size and
affects system performance.

e.g. the result should look like: t topics with p partitions each can be
supported in a cluster of n nodes with a heap size of h MB, before the
cluster sees JVM crashes, high memory usage, system slowdown, etc.

I think such benchmarks must exist so that we can make better decisions on
the ops side.
If these details don't exist, I'll run this test myself, varying the values
of the parameters described above. I would be happy to share the numbers
with the community.
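
A minimal sketch of the kind of test loop this could start from (the
ZooKeeper address, counts, replication factor, and topic prefix are all
placeholders, not recommendations):

# Create topics until the cluster degrades; every value here is a placeholder
for i in $(seq 1 1000); do
  bin/kafka-topics.sh --create --zookeeper localhost:2181 \
    --replication-factor 3 --partitions 8 --topic bench-topic-$i
done
# Then drive traffic at a fixed message size while watching heap/GC behaviour
# (bin/kafka-producer-perf-test.sh ships with Kafka; its flags vary by version)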

Thanks,
prabcs


Best practices - Using kafka (with http server) as source-of-truth

2015-07-27 Thread Prabhjot Bharaj
Hi Folks,

I would like to understand the best practices when using Kafka as the
source of truth, given that I want to push data into Kafka over HTTP.

What are the current production configurations for such a use case?

1. Kafka-http-client - is it scalable the way Nginx is?
2. Using Kafka and Nginx together - if anybody has used this, please explain
3. Any other scalable method?

Regards,
prabcs


Zookeeper use cases with Kafka

2015-07-22 Thread Prabhjot Bharaj
Hello Folks,

I wish to contribute to Kafka internals. And, one of the things which can
help me do that is understanding how kafka uses zookeeper. I have some of
these basic doubts:-

1. Is zookeeper primarily used for locking ? If yes, in what cases and what
kind of nodes does it use - sequential/ephemeral?

2. Does kafka use zookeeper watches for any of functions ?

3. What kind of state is stored in Zookeeper ? (I believe it has to be the
leader information per partition, but is there anything apart from it?)
What is the scale of data that is stored in Zookeeper ?
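
For anyone who wants to poke around directly, the zookeeper-shell.sh script
that ships with Kafka can list the znodes Kafka creates; the ZooKeeper
address below is a placeholder:

# Broker registrations, topic metadata, and the controller znode
bin/zookeeper-shell.sh localhost:2181 ls /brokers/ids
bin/zookeeper-shell.sh localhost:2181 ls /brokers/topics
bin/zookeeper-shell.sh localhost:2181 get /controller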

Looking forward to your help.

Thanks,
prabcs