Re: macbook air and kafka

2016-05-28 Thread S Ahmed
What kind of macbook would you guys recommend these days then?

minimum requirements vs. ideal

Does an i5 vs. i7 make that much of a difference?

On Thu, May 26, 2016 at 6:25 PM, Gwen Shapira  wrote:

> Well...
>
> We added KafkaConnect and KafkaStreams, thats two fairly big features.
>
> On Thu, May 26, 2016 at 11:58 AM, S Ahmed  wrote:
>
> > I just pulled lated on the same old 2010 MPB and the build took over 4
> > minutes.
> >
> > Have things changed so much since 2013? :)
> >
> > I ran: ./gradlew jar
> >
> > On Tue, Jun 4, 2013 at 7:24 PM, Neha Narkhede 
> > wrote:
> >
> > > *Memory*  8 GB 1600 MHz DDR3
> > >
> > > *Processor*  2 GHz Intel Core i7
> > >
> > >
> > > Thanks,
> > > Neha
> > >
> > >
> > >
> > > On Tue, Jun 4, 2013 at 5:56 AM, S Ahmed  wrote:
> > >
> > > > I have a 1st gen i7 wit 8GB ram, and mine takes 77 seconds:
> > > >
> > > > >clean
> > > > >++2.9.2 package
> > > >
> > > > info] Packaging
> > > >
> > > >
> > >
> >
> /Users/abc/dev/sources/scala/kafka_test/contrib/hadoop-consumer/target/hadoop-consumer-0.8-SNAPSHOT.jar
> > > > ...
> > > > [info] Done packaging.
> > > > [info] Done packaging.
> > > > [info] Packaging
> > > >
> > > >
> > >
> >
> /Users/snad/dev/sources/scala/kafka_test/perf/target/scala-2.9.2/kafka-perf_2.9.2-0.8-SNAPSHOT.jar
> > > > ...
> > > > [info] Done packaging.
> > > > [success] Total time: 77 s, completed Jun 4, 2013 8:54:51 AM
> > > >
> > > >
> > > > On Mon, Jun 3, 2013 at 10:37 PM, Denny Lee 
> > > wrote:
> > > >
> > > > > I have a 2011 MacBook Air (4GB, i7, 256GB SSD) and for development
> > > > > purposes it works like a charm. I'm only doing simple tests (ie no
> > perf
> > > > > testing) and at least in my case, heat is pretty low. I have more
> > > issues
> > > > > with heat when I run Windows on Fusion ;-)
> > > > >
> > > > >
> > > > > On Jun 3, 2013, at 7:17 PM, S Ahmed  wrote:
> > > > >
> > > > > > So what are your specs then?
> > > > > >
> > > > > > So would you say it is actually enjoyable to develop on it?  Or
> its
> > > not
> > > > > > really that ideal compared to a desktop.
> > > > > > What about the heat?
> > > > > >
> > > > > >
> > > > > > On Mon, Jun 3, 2013 at 9:54 PM, Neha Narkhede <
> > > neha.narkh...@gmail.com
> > > > > >wrote:
> > > > > >
> > > > > >> It really depends on your air specs. It takes roughly a minute
> to
> > > > > compile
> > > > > >> from scratch on my macbook air.
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Neha
> > > > > >>
> > > > > >>
> > > > > >> On Mon, Jun 3, 2013 at 6:33 PM, S Ahmed 
> > > wrote:
> > > > > >>
> > > > > >>> Hi, Curious if anyone uses a macbook air to develop with kafka.
> > > > > >>>
> > > > > >>> How are compile times?  Does the fan go crazy after a while
> along
> > > > with
> > > > > a
> > > > > >>> fairly hot keyboard?
> > > > > >>>
> > > > > >>> Just pondering getting an macbookair, and wanted your
> > experiences.
> > > > > >>
> > > > >
> > > >
> > >
> >
>


Re: macbook air and kafka

2016-05-26 Thread S Ahmed
I just pulled the latest on the same old 2010 MBP and the build took over 4
minutes.

Have things changed so much since 2013? :)

I ran: ./gradlew jar

On Tue, Jun 4, 2013 at 7:24 PM, Neha Narkhede 
wrote:

> *Memory*  8 GB 1600 MHz DDR3
>
> *Processor*  2 GHz Intel Core i7
>
>
> Thanks,
> Neha
>
>
>
> On Tue, Jun 4, 2013 at 5:56 AM, S Ahmed  wrote:
>
> > I have a 1st gen i7 wit 8GB ram, and mine takes 77 seconds:
> >
> > >clean
> > >++2.9.2 package
> >
> > info] Packaging
> >
> >
> /Users/abc/dev/sources/scala/kafka_test/contrib/hadoop-consumer/target/hadoop-consumer-0.8-SNAPSHOT.jar
> > ...
> > [info] Done packaging.
> > [info] Done packaging.
> > [info] Packaging
> >
> >
> /Users/snad/dev/sources/scala/kafka_test/perf/target/scala-2.9.2/kafka-perf_2.9.2-0.8-SNAPSHOT.jar
> > ...
> > [info] Done packaging.
> > [success] Total time: 77 s, completed Jun 4, 2013 8:54:51 AM
> >
> >
> > On Mon, Jun 3, 2013 at 10:37 PM, Denny Lee 
> wrote:
> >
> > > I have a 2011 MacBook Air (4GB, i7, 256GB SSD) and for development
> > > purposes it works like a charm. I'm only doing simple tests (ie no perf
> > > testing) and at least in my case, heat is pretty low. I have more
> issues
> > > with heat when I run Windows on Fusion ;-)
> > >
> > >
> > > On Jun 3, 2013, at 7:17 PM, S Ahmed  wrote:
> > >
> > > > So what are your specs then?
> > > >
> > > > So would you say it is actually enjoyable to develop on it?  Or its
> not
> > > > really that ideal compared to a desktop.
> > > > What about the heat?
> > > >
> > > >
> > > > On Mon, Jun 3, 2013 at 9:54 PM, Neha Narkhede <
> neha.narkh...@gmail.com
> > > >wrote:
> > > >
> > > >> It really depends on your air specs. It takes roughly a minute to
> > > compile
> > > >> from scratch on my macbook air.
> > > >>
> > > >> Thanks,
> > > >> Neha
> > > >>
> > > >>
> > > >> On Mon, Jun 3, 2013 at 6:33 PM, S Ahmed 
> wrote:
> > > >>
> > > >>> Hi, Curious if anyone uses a macbook air to develop with kafka.
> > > >>>
> > > >>> How are compile times?  Does the fan go crazy after a while along
> > with
> > > a
> > > >>> fairly hot keyboard?
> > > >>>
> > > >>> Just pondering getting an macbookair, and wanted your experiences.
> > > >>
> > >
> >
>


Re: steps path to kafka mastery

2016-03-29 Thread S Ahmed
The book says Feb. 2016 release date; looks like the producer is slower
than the consumers (sorry, couldn't resist)

On Tue, Mar 29, 2016 at 11:56 AM, S Ahmed  wrote:

> It was some pakt pub one.
>
> Yeah I am waiting for that book to be released!
>
> On Tue, Mar 29, 2016 at 6:11 AM, Ben Stopford  wrote:
>
>> Not sure which book you read but, based on the first few chapters, this
>> book <http://shop.oreilly.com/product/0636920044123.do> is an worthy
>> investment.
>>
>> B
>> > On 29 Mar 2016, at 03:40, S Ahmed  wrote:
>> >
>> > Hello,
>> >
>> > This may be a silly question for some but here goes :)
>> >
>> > Without real production experience, what steps do you suggest one take
>> to
>> > really have some solid skillz in kafka?
>> >
>> > I tend to learn in a structured way, but it just seems that since kafka
>> is
>> > a general purpose tool there isn't really a course per say that teaches
>> you
>> > all things kafka.
>> >
>> > There are books but the one I read was more of a tutorial on certain
>> > aspects of kafka but it doesn't give you the insights of building a real
>> > production system.
>> >
>> > Any suggestions or tips?
>>
>>
>


Re: steps path to kafka mastery

2016-03-29 Thread S Ahmed
It was some Packt Pub one.

Yeah I am waiting for that book to be released!

On Tue, Mar 29, 2016 at 6:11 AM, Ben Stopford  wrote:

> Not sure which book you read but, based on the first few chapters, this
> book <http://shop.oreilly.com/product/0636920044123.do> is an worthy
> investment.
>
> B
> > On 29 Mar 2016, at 03:40, S Ahmed  wrote:
> >
> > Hello,
> >
> > This may be a silly question for some but here goes :)
> >
> > Without real production experience, what steps do you suggest one take to
> > really have some solid skillz in kafka?
> >
> > I tend to learn in a structured way, but it just seems that since kafka
> is
> > a general purpose tool there isn't really a course per say that teaches
> you
> > all things kafka.
> >
> > There are books but the one I read was more of a tutorial on certain
> > aspects of kafka but it doesn't give you the insights of building a real
> > production system.
> >
> > Any suggestions or tips?
>
>


RE: steps path to kafka mastery

2016-03-28 Thread S Ahmed
Hello,

This may be a silly question for some but here goes :)

Without real production experience, what steps do you suggest one take to
really have some solid skillz in kafka?

I tend to learn in a structured way, but it just seems that since kafka is
a general purpose tool there isn't really a course per se that teaches you
all things kafka.

There are books, but the one I read was more of a tutorial on certain
aspects of kafka and it doesn't give you the insight into building a real
production system.

Any suggestions or tips?


kafka used for a chat system backend

2015-11-06 Thread S Ahmed
Hello,

I have read a few sites that *might* be using kafka for a real-time chat
system.

I say might because in their job descriptions they hinted toward this.

If there are people out there using kafka as a backend for a real-time chat
system, can you explain at a high level how the data flows?


Thanks!


jvm processes to consume messages

2014-12-02 Thread S Ahmed
Hi,

I have a light load scenario, but I am starting off with kafka because I
like how the messages are durable etc.

If I have 4-5 topics, am I required to create the same # of consumers?  I
am assuming each consumer runs in a long-running jvm process, correct?


Are there any consumer examples that use java, and also have the startup
scripts to start/stop the process on an ubuntu server?
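
For what it's worth, with the 0.8 high-level consumer a single long-running JVM
process (one ConsumerConnector) can read several topics; a rough sketch, where
the topic names, group id and zookeeper address are just placeholders:

  import java.util.Properties
  import kafka.consumer.{Consumer, ConsumerConfig}

  val props = new Properties()
  props.put("zookeeper.connect", "localhost:2181")   // placeholder
  props.put("group.id", "my-app")                    // placeholder

  val connector = Consumer.create(new ConsumerConfig(props))

  // One stream per topic, all inside the same JVM process.
  val topicCounts = Map("orders" -> 1, "signups" -> 1, "emails" -> 1)
  val streamsByTopic = connector.createMessageStreams(topicCounts)

  for ((topic, streams) <- streamsByTopic; stream <- streams) {
    new Thread(new Runnable {
      def run(): Unit = {
        val it = stream.iterator()
        while (it.hasNext) {                    // blocks until a message arrives
          val msg = it.next()
          println(topic + ": " + new String(msg.message, "UTF-8"))
        }
      }
    }).start()
  }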


Re: refactoring ZK so it is pluggable, would this make sense?

2014-10-09 Thread S Ahmed
I want kafka's features (w/o the redundancy), but I don't want to have to run 3
zookeeper instances, to save $$.

On Thu, Oct 9, 2014 at 2:59 PM, Jun Rao  wrote:

> This may not be easy since you have to implement things like watcher
> callbacks. What's your main concern with the ZK dependency?
>
> Thanks,
>
> Jun
>
> On Thu, Oct 9, 2014 at 8:20 AM, S Ahmed  wrote:
>
> > Hi,
> >
> > I was wondering if the zookeeper library (zkutils.scala etc) was designed
> > in a more modular way, would it make it possible to run a more "lean"
> > version of kafka?
> >
> > The idea is I want to run kafka but with a less emphasis on it being
> > durable with failover and more on it being a replacement for a standard
> > queue like kestrel.
> >
> > This way you could take advantage of how the other aspects of Kafka
> > (permanent log, etc etc.)
> >
> > I was just thinking if the zookeeper access was wrapped in something
> like:
> >
> > class DiscoverService
> >
> >def electLeader ..
> >def getFollower ...
> >
> > (I'm just making those methods up, but you get the point they are simply
> > the same calls zkutils etc. will be making to connect to zookeeper)
> >
> > Now the idea is, if you don't want to dedicate 3 servers to run
> zookeeper,
> > you could create your own implementation that e.g. returns data based on
> a
> > configuration file that is static and not a discover service like
> > zookeeper.
> >
> > Would wrapping the zookeper calls into a plugable/swapable service make
> > sense and allow you to still use Kakfa at a smaller scale or would this
> not
> > work for other reasons that I am overlooking?
> >
>


RE: refactoring ZK so it is pluggable, would this make sense?

2014-10-09 Thread S Ahmed
Hi,

I was wondering: if the zookeeper library (zkutils.scala etc.) were designed
in a more modular way, would it be possible to run a more "lean"
version of kafka?

The idea is I want to run kafka but with less emphasis on it being
durable with failover, and more on it being a replacement for a standard
queue like kestrel.

This way you could still take advantage of the other aspects of Kafka
(the permanent log, etc.).

I was just thinking if the zookeeper access was wrapped in something like:

class DiscoverService

   def electLeader ..
   def getFollower ...

(I'm just making those methods up, but you get the point they are simply
the same calls zkutils etc. will be making to connect to zookeeper)

Now the idea is, if you don't want to dedicate 3 servers to run zookeeper,
you could create your own implementation that e.g. returns data based on a
static configuration file rather than a discovery service like zookeeper.

Would wrapping the zookeeper calls into a pluggable/swappable service make
sense and allow you to still use Kafka at a smaller scale, or would this not
work for other reasons that I am overlooking?


do apps with producers have to be restarted if cluster goes down and comes back up?

2014-06-26 Thread S Ahmed
Hi,

A few questions on timing related issues when certain parts of kafka go
down.

1.  If zookeeper goes down, then I bring it back online, do I have to
restart the brokers?
2.  If the brokers go down, producers will be erroring out.  When the
brokers are back online, do I have to restart the processes with producers?

3. When servers are restarted, how can you guarantee that first the
zookeeper servers come online, THEN the brokers, and THEN the web apps with
the producers?   Or is the timing not that strict because of e.g. timeout
re-connect durations?

Thanks.


is it smarter to go with a java class for message serialization/des?

2014-06-17 Thread S Ahmed
My app is in scala, and a quick search suggests that serializing a scala class
can have compatibility issues across different versions of scala (I could be
wrong as I did a quick search).

Is it generally just a better idea to use plain old java classes for kafka
messages?

i.e. I simply use jackson like:

public class User implements Serializable {
...
}

// kakfa
val it = stream.iterator()
while (it.hasNext()) {
  val messageAndTopic = it.next

  val user = mapper.readValue(messageAndTopic.message(), classOf[User])


linkedin and pageview producer + when kafka is down

2014-06-16 Thread S Ahmed
I'd love to get some insights on how things work at linkedin in terms of
your web servers and kafka producers.

You guys probably connect to multiple kafka clusters, so let's assume you
are only connecting to a single cluster.

1. do you use a single producer for all message types/topics?

2. For your pageview topic, i.e. it is getting sent on a per page request
(albeit it is batched):

*What happens when your kafka cluster is down?  Will your web application
behave as normal or will it really slow things down?

Locally on my laptop I shutdown my vagrant that is running kafka, and the
page renders very slow when the producer is down.

Or do you use some smart circuit breaker logic that will stop trying to
send producer messages if kafka is down?


kafka producer, one per web app?

2014-06-16 Thread S Ahmed
In my web application, I should be creating a single instance of a producer
correct?

So in scala I should be doing something like:

object KafkaProducer {
  // props...
   val producer = new Producer[AnyRef, AnyRef](new ProducerConfig(props))
}

And then say in my QueueService I would do:

class QueueService {

def send(topic: String, message: Array[Byte], partition: Array[Byte]): Unit
= {
try {
  KafkaProducer.producer.send(new KeyedMessage(topic, message,
partition))
} catch {
  case e: Exception =>
e.printStackTrace
System.exit(1)
}
  }

}

Threading wise, is this correct?


Re: re-writing old fetch request to work with 0.8 version

2014-06-13 Thread S Ahmed
Ok so now it is looping through the messages fine, and outputting the
actual message payload:

  while (true) {
//val fetchRequest = new FetchRequest("TEST", 0, offset, 1024)
val fetchRequest = new FetchRequestBuilder().addFetch(topic, partition,
offset, 1024).build()

val fetchResponse: FetchResponse = consumer1.fetch(fetchRequest)

val messageSet = fetchResponse.messageSet(topic, 0).iterator.toBuffer
println("consumed Message " +
Utils.readString(messageSet(0).message.payload, "UTF-8") )
offset += 1

  }


Is there a way for it to not crash at the end?


*** Just to be clear, the idea is to run an embedded version in my web
application so I can verify the messages are being sent and processed in
development; this isn't a production idea of mine :)

consumed Message test199

consumed Message test200

[error] (run-main-0) java.lang.IndexOutOfBoundsException: 0

java.lang.IndexOutOfBoundsException: 0

at
scala.collection.mutable.ResizableArray$class.apply(ResizableArray.scala:43)

at scala.collection.mutable.ArrayBuffer.apply(ArrayBuffer.scala:47)

at
com.debugging.jobs.KafkaEmbedded$delayedInit$body.apply(KafkaEmbedded.scala:92)

at scala.Function0$class.apply$mcV$sp(Function0.scala:40)

at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)

at scala.App$$anonfun$main$1.apply(App.scala:71)

at scala.App$$anonfun$main$1.apply(App.scala:71)

at scala.collection.immutable.List.foreach(List.scala:318)

at
scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)

at scala.App$class.main(App.scala:71)

at com.debugging.jobs.KafkaEmbedded$.main(KafkaEmbedded.scala:24)

at com.debugging.jobs.KafkaEmbedded.main(KafkaEmbedded.scala)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:606)




On Fri, Jun 13, 2014 at 4:51 PM, S Ahmed  wrote:

> I found this embedded kafka example online (
> https://gist.github.com/mardambey/2650743)  which I am re-writing to work
> with 0.8
>
> Can someone help me re-write this portion:
>
>
>
>   val cons = new SimpleConsumer("localhost", 9090, 100, 1024)
>
>   var offset = 0L
>
>
>
>   var i = 0
>
>
>   while (true) {
> val fetchRequest = new FetchRequest("TEST", 0, offset, 1024)
>
>
>
> for (msg <- cons.fetch(fetchRequest)) {
>
>   i = i + 1
>
>   println("consumed [ " + i + "]: offset = " + msg.offset + ", payload = 
> " + Utils.toString(msg.message.payload, "UTF-8"))
>
>   offset = msg.offset
>
> }
>   }
>
>
>
> I have this so far:
>
>   val partition = 0
>   var offset = 0L
>
>   var i = 0
>   while (true) {
> //val fetchRequest = new FetchRequest("TEST", 0, offset, 1024)
> val fetchRequest = new FetchRequestBuilder().addFetch(topic, partition, 
> offset, 1024).build()
>
> val fetchResponse: FetchResponse = consumer1.fetch(fetchRequest)
>
> val messageSet = fetchResponse.messageSet(topic, 0).iterator.toBuffer
> println("consumed Message " + messageSet(0).message)
>
>   }
>
> This currently loops forever b/c it isn't incrementing the offset or anything.
>
> I'm confused b/c I believe there is no more offset as things are more user 
> friendly with an incrmeenting counter.
>
>
> Any help would be appreciated.
>
>


re-writing old fetch request to work with 0.8 version

2014-06-13 Thread S Ahmed
I found this embedded kafka example online (
https://gist.github.com/mardambey/2650743)  which I am re-writing to work
with 0.8

Can someone help me re-write this portion:


  val cons = new SimpleConsumer("localhost", 9090, 100, 1024)
  var offset = 0L

  var i = 0

  while (true) {
val fetchRequest = new FetchRequest("TEST", 0, offset, 1024)

for (msg <- cons.fetch(fetchRequest)) {
  i = i + 1
  println("consumed [ " + i + "]: offset = " + msg.offset + ",
payload = " + Utils.toString(msg.message.payload, "UTF-8"))
  offset = msg.offset
}
  }


I have this so far:

  val partition = 0
  var offset = 0L

  var i = 0
  while (true) {
//val fetchRequest = new FetchRequest("TEST", 0, offset, 1024)
val fetchRequest = new FetchRequestBuilder().addFetch(topic,
partition, offset, 1024).build()

val fetchResponse: FetchResponse = consumer1.fetch(fetchRequest)

val messageSet = fetchResponse.messageSet(topic, 0).iterator.toBuffer
println("consumed Message " + messageSet(0).message)

  }

This currently loops forever b/c it isn't incrementing the offset or anything.
I'm confused b/c I believe there is no more byte offset, as things are more
user friendly with an incrementing counter.

Any help would be appreciated.


Re: scala based service/daemon consumer example

2014-06-12 Thread S Ahmed
ah ok, " Kafka that may block if there are no new messages available."

So the iterator blocks, thanks.


On Thu, Jun 12, 2014 at 4:13 PM, Guozhang Wang  wrote:

> You may want to read this wiki:
>
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
>
> Guozhang
>
>
> On Thu, Jun 12, 2014 at 11:13 AM, S Ahmed  wrote:
>
> > Is there a simple example (scala preferred) where there is a consumer
> that
> > is written to run as a deamon i.e. it keeps as open connection into a
> > broker and reads off new messages
> >
>
>
>
> --
> -- Guozhang
>


scala based service/daemon consumer example

2014-06-12 Thread S Ahmed
Is there a simple example (scala preferred) where there is a consumer that
is written to run as a daemon, i.e. it keeps an open connection to a
broker and reads off new messages?


ec2 suggestion setup for a minimum kafka setup; zookeeper is bad in ec2?

2014-06-11 Thread S Ahmed
For those of you hosting on ec2, could someone suggest a "minimum"
recommended setup for kafka?  i.e. the # and type of instance size that you
would say is the bare minimum to get started with kafka in ec2.

My guess is the suggested route is the m3 instance type?
How about:

m3.medium  1 cpu, 3.75GB Ram  4 GB SSD
m3.large   2 cpus, 7.5 GB ram   32 GB SSD


My requirements are maybe a few hundred messages per second to start, but
want to grow my usage as my confidence/experience with kafka increases.

I was reading this post regarding kafka on ec2 (
http://engineering.onlive.com/2013/12/12/didnt-use-kafka/) and one of the
main cons the author says exists with kafka on ec2 is that zookeeper is a
little finicky when it comes to latency and may knock a host out b/c of
latency issues.

How do you guys on ec2 work around these issues?


Re: are consumer offsets stored in a log?

2014-06-04 Thread S Ahmed
Very nice.

Do you guys have any stats on what kind of load was reduced on ZK?   Just
trying to understand if this changes the type of servers required to host
ZK.




On Wed, Jun 4, 2014 at 1:10 PM, Guozhang Wang  wrote:

> Yes, we are migrating the offset management from ZK to the broker as a
> special log.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/Inbuilt+Consumer+Offset+Management
>
> The code is in trunk, and it is running in production at LinkedIn now.
>
> Guozhang
>
>
> On Wed, Jun 4, 2014 at 10:00 AM, S Ahmed  wrote:
>
> > I swear I read that Jay Kreps wrote somewhere that consumers now write
> > their offsets in a logfile (not in zookeeper).
> >
> > Is this true or did I misread?  Sorry I can't find the article I was
> > reading.
> >
>
>
>
> --
> -- Guozhang
>


are consumer offsets stored in a log?

2014-06-04 Thread S Ahmed
I swear I read that Jay Kreps wrote somewhere that consumers now write
their offsets in a logfile (not in zookeeper).

Is this true or did I misread?  Sorry I can't find the article I was
reading.


Re: starting of at a small scale, single ec2 instance with 7.5 GB RAM with kafka

2014-05-20 Thread S Ahmed
Yes agreed, but I have done some load testing before and kafka was doing
10's of thousands of messages per second.

If I am doing only hundreds, I think it could handle it for now.  Like I
said this is small scale.


On Tue, May 20, 2014 at 2:51 PM, Neha Narkhede wrote:

> It is not recommended to install both kafka and zookeeper on the same box
> as both would fight for the available memory and performance will degrade.
>
> Thanks
> Neha
>
>
> On Mon, May 19, 2014 at 7:29 AM, S Ahmed  wrote:
>
> > Hi,
> >
> > I like how kafka operates, but I'm wondering if it is possible to run
> > everything on a single ec2 instance with 7.5 GB RAM.
> >
> > So that would be zookeeper and a single kafka broker.
> >
> > I would have a separate server to consume from the broker.
> >
> > Producers would be from my web servers.
> >
> >
> > I don't want to complicate things as i don't really need failover or
> > redundancy etc.  I just want to keep things simple.
> >
> > I'll have a single topic, and a few partitions because I want the
> guarantee
> > that the messages are in order.
> >
> >
> > Is this something that would be really out of the norm and not
> recommended?
> > i.e. nobody really uses it this way and who knows what is going to
> happen?
> > :)
> >
>


RE: starting of at a small scale, single ec2 instance with 7.5 GB RAM with kafka

2014-05-19 Thread S Ahmed
Hi,

I like how kafka operates, but I'm wondering if it is possible to run
everything on a single ec2 instance with 7.5 GB RAM.

So that would be zookeeper and a single kafka broker.

I would have a separate server to consume from the broker.

Producers would be from my web servers.


I don't want to complicate things as i don't really need failover or
redundancy etc.  I just want to keep things simple.

I'll have a single topic, and a few partitions because I want the guarantee
that the messages are in order.


Is this something that would be really out of the norm and not recommended?
i.e. nobody really uses it this way and who knows what is going to happen?
:)


Re: New Consumer API discussion

2014-02-28 Thread S Ahmed
Few clarifications:

1. "The new
consumer API is non blocking and instead of returning a blocking iterator,
the consumer provides a poll() API that returns a list of records. "

So this means the consumer polls, and if there are new messages it pulls
them down and then disconnects?

2.
" The consumer also allows long poll
to reduce the end-to-end message latency for low throughput data."

How is this different from blocking?  Is it event based, meaning it keeps a
long poll connection open, and if/when a new message arrives it triggers an
event on the consumer side?


3.
" The consumer batches
data and multiplexes I/O over TCP connections to each of the brokers it
communicates with, for high throughput. "

If it is single threaded, does each tcp broker connection block?  Not sure
I understand how this works if it is single threaded.
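
For anyone reading this later, a minimal sketch of the poll-based model being
described, written against the consumer API as it eventually shipped in 0.9 (so
the class and method names below are assumptions relative to the draft under
discussion): the consumer keeps its broker connections open between calls, and
poll() returns whatever records have arrived, blocking for at most the given
timeout when nothing is available.

  import java.util.{Collections, Properties}
  import org.apache.kafka.clients.consumer.KafkaConsumer

  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")   // placeholder
  props.put("group.id", "example-group")
  props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

  val consumer = new KafkaConsumer[String, String](props)
  consumer.subscribe(Collections.singletonList("my-topic"))

  while (true) {
    // No disconnect between calls: the open broker connections are multiplexed,
    // and poll blocks for at most 100 ms when no records are available.
    val records = consumer.poll(100)
    val it = records.iterator()
    while (it.hasNext) {
      val record = it.next()
      println(s"offset=${record.offset} value=${record.value}")
    }
  }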



On Thu, Feb 27, 2014 at 11:38 PM, Robert Withers  wrote:

> Thank you, Neha, that makes it clear.  Really, the aspect of all this that
> we could really use is a way to do exactly once processing.  We are looking
> at more critical data.  What are the latest thoughts on how to achieve
> exactly once and how might that affect a consumer API?
>
> Thanks,
> Rob
>
> On Feb 27, 2014, at 10:29 AM, Neha Narkhede 
> wrote:
>
> > Is this<
> http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/KafkaConsumer.html#seek%28kafka.common.TopicPartitionOffset...%29
> >what
> > you are looking for? Basically, I think from the overall feedback, it
> > looks like code snippets don't seem to work for overall understanding of
> > the APIs. I plan to update the javadoc with more complete examples that
> > have been discussed so far on this thread and generally on the mailing
> list.
> >
> > Thanks,
> > Neha
> >
> >
> >
> >
> > On Thu, Feb 27, 2014 at 4:17 AM, Robert Withers
> > wrote:
> >
> >> Neha,
> >>
> >> I see how one might wish to implement onPartitionsAssigned and
> >> onPartitionsRevoked, but I don't have a sense for how I might supply
> these
> >> implementations to a running consumer.  What would the setup code look
> like
> >> to start a high-level consumer with these provided implementations?
> >>
> >> thanks,
> >> Rob
> >>
> >>
> >> On Feb 27, 2014, at 3:48 AM, Neha Narkhede 
> >> wrote:
> >>
> >>> Rob,
> >>>
> >>> The use of the callbacks is explained in the javadoc here -
> >>>
> >>
> http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/kafka/clients/consumer/ConsumerRebalanceCallback.html
> >>>
> >>> Let me know if it makes sense. The hope is to improve the javadoc so
> that
> >>> it is self explanatory.
> >>>
> >>> Thanks,
> >>> Neha
> >>>
> >>>
> >>> On Wed, Feb 26, 2014 at 9:16 AM, Robert Withers
> >>> wrote:
> >>>
>  Neha, what does the use of the RebalanceBeginCallback and
>  RebalanceEndCallback look like?
> 
>  thanks,
>  Rob
> 
>  On Feb 25, 2014, at 3:51 PM, Neha Narkhede 
>  wrote:
> 
> > How do you know n? The whole point is that you need to be able to
> fetch
>  the
> > end offset. You can't a priori decide you will load 1m messages
> without
> > knowing what is there.
> >
> > Hmm. I think what you are pointing out is that in the new consumer
> API,
>  we
> > don't have a way to issue the equivalent of the existing
>  getOffsetsBefore()
> > API. Agree that is a flaw that we should fix.
> >
> > Will update the docs/wiki with a few use cases that I've collected so
> >> far
> > and see if the API covers those.
> >
> > I would prefer PartitionsAssigned and PartitionsRevoked as that seems
> > clearer to me
> >
> > Well the RebalanceBeginCallback interface will have
>  onPartitionsAssigned()
> > as the callback. Similarly, the RebalanceEndCallback interface will
> >> have
> > onPartitionsRevoked() as the callback. Makes sense?
> >
> > Thanks,
> > Neha
> >
> >
> > On Tue, Feb 25, 2014 at 2:38 PM, Jay Kreps 
> >> wrote:
> >
> >> 1. I would prefer PartitionsAssigned and PartitionsRevoked as that
> >> seems
> >> clearer to me.
> >>
> >> -Jay
> >>
> >>
> >> On Tue, Feb 25, 2014 at 10:19 AM, Neha Narkhede <
>  neha.narkh...@gmail.com
> >>> wrote:
> >>
> >>> Thanks for the reviews so far! There are a few outstanding
> questions
> >> -
> >>>
> >>> 1.  It will be good to make the rebalance callbacks forward
> >> compatible
> >> with
> >>> Java 8 capabilities. We can change it to PartitionsAssignedCallback
> >>> and PartitionsRevokedCallback or RebalanceBeginCallback and
> >>> RebalanceEndCallback?
> >>>
> >>> If there are no objections, I will change it to
> >> RebalanceBeginCallback
> >> and
> >>> RebalanceEndCallback.
> >>>
> >>> 2.  The return type for committed() is List.
>  There
> >>> was a suggestion to change it to either be Map
> >> or
> >>> Map
> >>>
> >>> Do people

code + sbt tips

2014-02-10 Thread S Ahmed
Few quick questions that I hope people can help me with:


1. most of you guys use intellij, do you always build using sbt?  i.e. you
lose out on the build-within-the-IDE features like clicking on an error that
jumps to that part of the code, etc.

2. do you just build using the default scala version 2.8 during development?


I have attached a screenshot of what it looks like on some .scala files,
does yours have this issue also? (see screenshot)


Re: New Producer Public API

2014-02-06 Thread S Ahmed
How about the following use case:

Just before the producer actually sends the payload to kafka, could an
event be exposed that would allow one to loop through the messages and
potentially delete some of them?

Example:

Say you have 100 messages, but before you send these messages to kafka, you
can easily aggregate many of these messages to reduce the message count.
 If there are messages that store counts, you could aggregate these into a
single message and then send to kafka.

Thoughts?
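
To illustrate the idea at the application level (done in front of the producer
rather than inside it; the CountEvent type and the send function below are made
up), counts for the same key could be collapsed before handing anything to the
producer:

  // Hypothetical count event: (counterKey, increment)
  case class CountEvent(key: String, count: Long)

  // Collapse many small count messages into one message per key before sending.
  def aggregateAndSend(events: Seq[CountEvent],
                       send: (String, Array[Byte]) => Unit): Unit = {
    val totals = events.groupBy(_.key).mapValues(_.map(_.count).sum)
    for ((key, total) <- totals) {
      // one message per key instead of one per event
      send(key, s"$key=$total".getBytes("UTF-8"))
    }
  }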



On Wed, Feb 5, 2014 at 2:03 PM, Jay Kreps  wrote:

> It might. I considered this but ended up going this way. Now that we have
> changed partitionKey=>partition it almost works. The difference is the
> consumer gets an offset too which the producer doesn't have.
>
> One thing I think this points to is the value of getting the consumer java
> api worked out even in the absence of an implementation just so we can
> write some fake code that uses both and kind of see how it feels.
>
> -Jay
>
>
> On Wed, Feb 5, 2014 at 10:23 AM, Neha Narkhede  >wrote:
>
> > Currently, the user will send ProducerRecords using the new producer. The
> > expectation will be that you get the same thing as output from the
> > consumer. Since ProduceRecord is a holder for topic, partition, key and
> > value, does it make sense to rename it to just Record? So, the
> send/receive
> > APIs would look like the following -
> >
> > producer.send(Record record);
> > List poll();
> >
> > Thoughts?
> >
> >
> > On Sun, Feb 2, 2014 at 4:12 PM, Guozhang Wang 
> wrote:
> >
> > > I think the most common motivate of having a customized partitioner is
> to
> > > make sure some messages always go to the same partition, but people may
> > > seldom want to know about which partition exactly they go to. If that
> is
> > > true, why not just assign the same byte array as partition key with the
> > > default hash based partitioning in option 1.A? But again, that is based
> > on
> > > my presumption that very few users would want to really specify the
> > > partition id.
> > >
> > >
> > >
> > > On Fri, Jan 31, 2014 at 2:44 PM, Jay Kreps 
> wrote:
> > >
> > > > Hey Tom,
> > > >
> > > > Agreed, there is definitely nothing that prevents our including
> > > partitioner
> > > > implementations, but it does get a little less seamless.
> > > >
> > > > -Jay
> > > >
> > > >
> > > > On Fri, Jan 31, 2014 at 2:35 PM, Tom Brown 
> > wrote:
> > > >
> > > > > Regarding partitioning APIs, I don't think there is not a common
> > subset
> > > > of
> > > > > information that is required for all strategies. Instead of
> modifying
> > > the
> > > > > core API to easily support all of the various partitioning
> > strategies,
> > > > > offer the most common ones as libraries they can build into their
> own
> > > > data
> > > > > pipeline, just like serialization. The core API would simply
> accept a
> > > > > partition index. You could include one default strategy (random)
> that
> > > > only
> > > > > applies if they set "-1" for the partition index.
> > > > >
> > > > > That way, each partitioning strategy could have its own API that
> > makes
> > > > > sense for it. For example, a round-robin partitioner only needs one
> > > > method:
> > > > > "nextPartition()", while a hash-based one needs
> > > > "getPartitionFor(byte[])".
> > > > >
> > > > > For those who actually need a pluggable strategy, a superset of the
> > API
> > > > > could be codified into an interface (perhaps the existing
> partitioner
> > > > > interface), but it would still have to be used from outside of the
> > core
> > > > > API.
> > > > >
> > > > > This design would make the core API less confusing (when do I use a
> > > > > partiton key instead of a partition index, does the key overwrite
> the
> > > > > index, can the key be null, etc...?) while still providing the
> > > > flexibility
> > > > > you want.
> > > > >
> > > > > --Tom
> > > > >
> > > > > On Fri, Jan 31, 2014 at 12:07 PM, Jay Kreps 
> > > wrote:
> > > > >
> > > > > > Oliver,
> > > > > >
> > > > > > Yeah that was my original plan--allow the registration of
> multiple
> > > > > > callbacks on the future. But there is some additional
> > implementation
> > > > > > complexity because then you need more synchronization variables
> to
> > > > ensure
> > > > > > the callback gets executed even if the request has completed at
> the
> > > > time
> > > > > > the callback is registered. This also makes it unpredictable the
> > > order
> > > > of
> > > > > > callback execution--I want to be able to guarantee that for a
> > > > particular
> > > > > > partition callbacks for lower offset messages happen before
> > callbacks
> > > > for
> > > > > > higher offset messages so that if you set a highwater mark or
> > > something
> > > > > it
> > > > > > is easy to reason about. This has the added benefit that
> callbacks
> > > > > execute
> > > > > > in the I/O thread ALWAYS instead of it being non-deterministic
> > which
> > > > is a
> > > > > > little confusing.
> > > > >

Re: Surprisingly high network traffic between kafka servers

2014-02-05 Thread S Ahmed
Sorry, I'm not an ops person, but what tools do you use to monitor traffic
between servers?


On Tue, Feb 4, 2014 at 11:46 PM, Carl Lerche  wrote:

> Hello,
>
> I'm running a 0.8.0 Kafka cluster of 3 servers. The service that it is
> for is not in full production yet, so the data written to cluster is
> minimal (seems to average between 100kb/s -> 300kb/s per server). I
> have configured Kafka to have a 3 replicas. I am noticing that each
> Kafka server is talking to all the others at a data rate of 40MB/s for
> each server (so, a total of 80MB/s for each server). This
> communication is constant.
>
> Is this normal? This seems like very strange behavior and I'm not
> exactly sure how to debug.
>
> Thanks,
> Carl
>


Why does the high level consumer block, or rather where does it?

2014-01-05 Thread S Ahmed
I'm trying to trace through the codebase and figure out where exactly the
block occurs in the high level consumer?

public void run() {
  ConsumerIterator it = m_stream.iterator();
  while (it.hasNext())
    System.out.println("Thread " + m_threadNumber + ": " +
        new String(it.next().message()));
  System.out.println("Shutting down Thread: " + m_threadNumber);
}
Reference:
https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example

So from what I understand, the it.hasNext() call will block if there are
no new messages for this particular topic/partition, correct?

Just to understand, can someone clarify where in the kafka source this
block occurs? I.e. does the broker that this consumer is connected to keep
a socket connection open to this consumer and block until a new message
that is owned by this consumer thread arrives, and then push it to the
consumer to process?

Is it at the iterator level somewhere?
https://github.com/kafka-dev/kafka/blob/master/core/src/main/scala/kafka/consumer/ConsumerIterator.scala
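
For anyone tracing this later: the high-level consumer's stream is backed by a
queue of fetched chunks that background fetcher threads keep filling, and
it.hasNext() is what blocks, waiting on that queue (the broker side of the long
poll is governed by the consumer's fetch wait settings). A simplified sketch of
the blocking-iterator shape only, not the actual Kafka source:

  import java.util.concurrent.LinkedBlockingQueue

  // Illustration: an iterator whose hasNext blocks on a queue that background
  // fetcher threads fill with data pulled from the brokers.
  class BlockingMessageIterator[T](queue: LinkedBlockingQueue[T]) extends Iterator[T] {
    private var nextItem: Option[T] = None

    // Blocks (queue.take) until a fetcher thread enqueues the next message.
    override def hasNext: Boolean = {
      if (nextItem.isEmpty) nextItem = Some(queue.take())
      true
    }

    override def next(): T = {
      val item = nextItem.getOrElse(queue.take())
      nextItem = None
      item
    }
  }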


kafka + storm

2014-01-01 Thread S Ahmed
I have briefly looked at storm, but just a quick question: storm seems to
have all these workers, but the way it seems to me, the order in which these
items are processed off the queue is very random, correct?

In my use case order is very important so using something like storm would
not be suitable right?

I first learned of kafka + storm based on a post by someone from loggly,
but loggly can process items randomly I would imagine because at the end of
the day each log item is timestamped so after it is processed and indexed
things would be fine.

But if your use case is such that processing message#123 before message#122
would have side effects, then you can't use something like storm.


Re: redis versus zookeeper to track consumer offsets

2013-12-17 Thread S Ahmed
Interesting, wasn't aware of that.

Can you comment on how you go about monitoring your ZK cluster in terms of
throughput and if it is reaching its limits? Or is it even possible to do
this?


On Tue, Dec 17, 2013 at 2:01 PM, Benjamin Black  wrote:

> ZK was designed from the start as a clustered, consistent, highly available
> store for this sort of data and it works extremely well. Redis wasn't and I
> don't know anyone using Redis in production, including me, who doesn't have
> stories of Redis losing data. I'm sticking with ZK.
>
>
> On Tue, Dec 17, 2013 at 10:57 AM, S Ahmed  wrote:
>
> > I am leaning towards using redis to track consumer offsets etc., but I
> see
> > how using zookeeper makes sense since it already part of the kafka infra.
> >
> > One thing which bothers me is, how are you guys keeping track of the load
> > on zookeeper?  How do you get an idea when your zookeeper cluster is
> > underprovisioned?
> >
> > Redis is a richer store and could help in other areas where you want to
> > store more than just status information or offsets, and setup and
> > administration wise it seems a bit easier to manage.
> >
> > Thoughts?
> >
>


redis versus zookeeper to track consumer offsets

2013-12-17 Thread S Ahmed
I am leaning towards using redis to track consumer offsets etc., but I see
how using zookeeper makes sense since it is already part of the kafka infra.

One thing which bothers me is, how are you guys keeping track of the load
on zookeeper?  How do you get an idea when your zookeeper cluster is
underprovisioned?

Redis is a richer store and could help in other areas where you want to
store more than just status information or offsets, and setup and
administration wise it seems a bit easier to manage.

Thoughts?


Re: storing last processed offset, recovery of failed message processing etc.

2013-12-11 Thread S Ahmed
When using ZK to keep track of last offsets, metrics, etc., how do you know
when you are pushing your ZK cluster to its limit?

Or can ZK handle thousands of writes/reads per second no problem since it
is all in-memory?  But even so, you need some idea on its upper limits and
how close you are to that limit etc.


On Mon, Dec 9, 2013 at 3:31 PM, Philip O'Toole  wrote:

> We use Zookeeper, as is standard with Kafka.
>
> Our systems are idempotent, so we only store offsets when the message is
> fully processed. If this means we occasionally replay a message due to some
> corner-case, or simply a restart, it doesn't matter.
>
> Philip
>
>
> On Mon, Dec 9, 2013 at 12:28 PM, S Ahmed  wrote:
>
> > I was hoping people could comment on how they handle the following
> > scenerios:
> >
> > 1. Storing the last successfully processed messageId/Offset.  Are people
> > using mysql, redis, etc.?  What are the tradeoffs here?
> >
> > 2. How do you handle recovering from an error while processesing a given
> > event?
> >
> > There are various scenerioes for #2, like:
> > 1. Do you mark the start of processing a message somewhere, and then
> update
> > the status to complete and THEN update the last messaged processed for
> #1?
> > 2. Do you only mark the status as complete, and not the start of
> processing
> > it?  I guess this depends of there are intermediate steps and processing
> > the entire message again would result in some duplicated work right?
> >
>


Re: Anyone working on a Kafka book?

2013-12-10 Thread S Ahmed
Great, so it's not even at MEAP stage then :(, let me guess, it is going to
take 6 months to decide on what animal to put on the cover! :)

Looking forward to it though!


On Tue, Dec 10, 2013 at 12:15 PM, chetan conikee  wrote:

> Hey Guys
>
> Yes, Ben Lorica (Oreilly) and I are planning to pen a "Beginning Kafka"
> book.
> We only finalized this late October are hoping to start this mid-month
>
> Chetan
>
>
> On Tue, Dec 10, 2013 at 8:45 AM, Steve Morin  wrote:
>
> > I'll let chetan comment if he's up for it.
> > -Steve
> >
> >
> >
> > On Tue, Dec 10, 2013 at 8:40 AM, David Arthur  wrote:
> >
> >> There was some talk a few months ago, not sure what the current status
> is.
> >>
> >>
> >> On 12/10/13 10:01 AM, S Ahmed wrote:
> >>
> >>> Is there a book or this was just an idea?
> >>>
> >>>
> >>> On Mon, Mar 25, 2013 at 12:42 PM, Chris Curtin  >>> >wrote:
> >>>
> >>>  Thanks Jun,
> >>>>
> >>>> I've updated the example with this information.
> >>>>
> >>>> I've also removed some of the unnecessary newlines.
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Chris
> >>>>
> >>>>
> >>>> On Mon, Mar 25, 2013 at 12:04 PM, Jun Rao  wrote:
> >>>>
> >>>>  Chris,
> >>>>>
> >>>>> This looks good. One thing about partitioning. Currently, if a
> message
> >>>>> doesn't have a key, we always use the random partitioner (regardless
> of
> >>>>> what "partitioner.class" is set to).
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Jun
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>
> >
>


Re: Anyone working on a Kafka book?

2013-12-10 Thread S Ahmed
Is there a book or this was just an idea?


On Mon, Mar 25, 2013 at 12:42 PM, Chris Curtin wrote:

> Thanks Jun,
>
> I've updated the example with this information.
>
> I've also removed some of the unnecessary newlines.
>
> Thanks,
>
> Chris
>
>
> On Mon, Mar 25, 2013 at 12:04 PM, Jun Rao  wrote:
>
> > Chris,
> >
> > This looks good. One thing about partitioning. Currently, if a message
> > doesn't have a key, we always use the random partitioner (regardless of
> > what "partitioner.class" is set to).
> >
> > Thanks,
> >
> > Jun
> >
> >
> >
>


Re: storing last processed offset, recovery of failed message processing etc.

2013-12-09 Thread S Ahmed
Say I am doing this, a scenario that I just came up with that demonstrates
#2.

Someone signs up on a website, and you have to:

1. create the user profile
2. send email confirmation email
3. resize avatar


Now once a person registers on a website, I write a message to Kafka.

Now I have 3 different things to process (1, 2, 3). If I get to #2 and then
the server loses power, when I replay I will re-send the confirmation email
2 times.   Sure, in this case it's not that big of a deal, but just pretend
it is; what should be done?

I guess I have to keep track of state per step in ZK then, right? I mean
that's the only way, so I guess I am answering my own question, but I was
hoping for people with real-life experience to chime in.

I could write 3 messages to kafka, but maybe order is important :)
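
A rough sketch of the "state per step" idea, with a made-up StepStore interface
standing in for ZK (or any store): each (messageId, step) is recorded when it
completes, so a replay after a crash skips the steps already done. A step can
still repeat if the crash lands between doing the work and recording it, which
is why idempotent steps are still the safest bet.

  // Hypothetical store of completed steps, e.g. backed by ZK nodes or a DB table.
  trait StepStore {
    def isDone(messageId: String, step: String): Boolean
    def markDone(messageId: String, step: String): Unit
  }

  def handleSignup(messageId: String, store: StepStore): Unit = {
    val steps: Seq[(String, () => Unit)] = Seq(
      "create-profile"    -> (() => createUserProfile()),
      "send-confirmation" -> (() => sendConfirmationEmail()),
      "resize-avatar"     -> (() => resizeAvatar())
    )
    for ((name, action) <- steps if !store.isDone(messageId, name)) {
      action()                        // a crash between here and markDone means
      store.markDone(messageId, name) // this one step may run again on replay
    }
  }

  // Placeholders for the actual work in the signup scenario above.
  def createUserProfile(): Unit = ()
  def sendConfirmationEmail(): Unit = ()
  def resizeAvatar(): Unit = ()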


On Mon, Dec 9, 2013 at 3:31 PM, Philip O'Toole  wrote:

> We use Zookeeper, as is standard with Kafka.
>
> Our systems are idempotent, so we only store offsets when the message is
> fully processed. If this means we occasionally replay a message due to some
> corner-case, or simply a restart, it doesn't matter.
>
> Philip
>
>
> On Mon, Dec 9, 2013 at 12:28 PM, S Ahmed  wrote:
>
> > I was hoping people could comment on how they handle the following
> > scenerios:
> >
> > 1. Storing the last successfully processed messageId/Offset.  Are people
> > using mysql, redis, etc.?  What are the tradeoffs here?
> >
> > 2. How do you handle recovering from an error while processesing a given
> > event?
> >
> > There are various scenerioes for #2, like:
> > 1. Do you mark the start of processing a message somewhere, and then
> update
> > the status to complete and THEN update the last messaged processed for
> #1?
> > 2. Do you only mark the status as complete, and not the start of
> processing
> > it?  I guess this depends of there are intermediate steps and processing
> > the entire message again would result in some duplicated work right?
> >
>


RE: storing last processed offset, recovery of failed message processing etc.

2013-12-09 Thread S Ahmed
I was hoping people could comment on how they handle the following
scenarios:

1. Storing the last successfully processed messageId/Offset.  Are people
using mysql, redis, etc.?  What are the tradeoffs here?

2. How do you handle recovering from an error while processing a given
event?

There are various scenarios for #2, like:
1. Do you mark the start of processing a message somewhere, and then update
the status to complete and THEN update the last message processed for #1?
2. Do you only mark the status as complete, and not the start of processing
it?  I guess this depends on whether there are intermediate steps and whether
processing the entire message again would result in some duplicated work, right?


Re: Loggly's use of Kafka on AWS

2013-12-02 Thread S Ahmed
Interesting.  So twitter storm is used to basically process the messages on
kafka?   I'll have to read up on storm b/c I always thought the use case
was a bit different.



On Sun, Dec 1, 2013 at 9:59 PM, Joe Stein  wrote:

> Awesome Philip, thanks for sharing!
>
> On Sun, Dec 1, 2013 at 9:17 PM, Philip O'Toole  wrote:
>
> > A couple of us here at Loggly recently spoke at AWS reinvent, on how we
> > use Kafka 0.72 in our ingestion pipeline. The slides are at the link
> below,
> > and may be of interest to people on this list.
> >
> >
> >
> http://www.slideshare.net/AmazonWebServices/infrastructure-at-scale-apache-kafka-twitter-storm-elastic-search-arc303-aws-reinvent-2013
> >
> > Any questions, let me know, though I can't promise I can answer
> > everything. Can't give the complete game away. :-)
> >
> > As always, Kafka rocks!
> >
> > Philip
> >
> >
> >
> >
>


producer (or consumer?) statistics that was using metrics

2013-10-17 Thread S Ahmed
I remember a while back Jay was looking for someone to work on producer (or
was it consumer) statistics, which was going to use metrics.

Was this ever implemented?


Anyone running kafka with a single broker in production? what about only 8GB ram?

2013-10-10 Thread S Ahmed
Is anyone out there running a single broker kafka setup?

How about with only 8 GB RAM?

I'm looking at one of the better dedicated server providers, and an 8GB
server is pretty much what I want to spend at the moment; would it make
sense going this route?
This same server would potentially be running zookeeper as well.

In terms of messages per second, at most I would be seeing about 2000
messages per second, of 20KB to 200KB in size.

I know the people at linkedin are running with I believe 24GB of ram.


Re: who is using kafka to store large messages?

2013-10-07 Thread S Ahmed
When you batch things on the producer, say you batch 1000 messages or batch by
time, should the total message size of the batch be less than
message.max.bytes, or does that limit apply to each individual message?

When you batch, I am assuming that the producer sends some sort of flag
that this is a batch, and then the broker will split up those messages into
individual messages and store them in the log, correct?
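
For reference, the knobs involved look roughly like this (values shown are just
the defaults / illustrative); per the reply below, the broker-side limit is
checked against the compressed wrapper message a producer batch becomes, not
against each inner message:

  # broker config (server.properties): cap on a single message; for a compressed
  # producer batch this applies to the whole compressed wrapper message
  message.max.bytes=1000000

  # consumer config: must be at least as large, or big messages can't be fetched
  fetch.message.max.bytes=1048576

  # 0.8 producer config (async mode): how many messages go into one batch,
  # and whether that batch is sent as a single compressed message
  batch.num.messages=200
  compression.codec=gzip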


On Mon, Oct 7, 2013 at 12:21 PM, Neha Narkhede wrote:

> The message size limit is imposed on the compressed message. To answer your
> question about the effect of large messages - they cause memory pressure on
> the Kafka brokers as well as on the consumer since we re-compress messages
> on the broker and decompress messages on the consumer.
>
> I'm not so sure that large messages will have a hit on latency since
> compressing a few large messages vs compressing lots of small messages with
> the same content, should not be any slower. But you want to be careful on
> the batch size since you don't want the compressed message to exceed the
> message size limit.
>
> Thanks,
> Neha
>
>
> On Mon, Oct 7, 2013 at 9:10 AM, S Ahmed  wrote:
>
> > I see, so that is one thing to consider is if I have 20 KB messages, I
> > shouldn't batch too many together as that will increase latency and the
> > memory usage footprint on the producer side of things.
> >
> >
> > On Mon, Oct 7, 2013 at 11:55 AM, Jun Rao  wrote:
> >
> > > At LinkedIn, our message size can be 10s of KB. This is mostly because
> we
> > > batch a set of messages and send them as a single compressed message.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Mon, Oct 7, 2013 at 7:44 AM, S Ahmed  wrote:
> > >
> > > > When people using message queues, the message size is usually pretty
> > > small.
> > > >
> > > > I want to know who out there is using kafka with larger payload
> sizes?
> > > >
> > > > In the configuration, the maximum message size by default is set to 1
> > > > megabyte (
> > > > message.max.bytes100)
> > > >
> > > > My message sizes will be probably be around 20-50 KB but to me that
> is
> > > > large for a message payload so I'm wondering what effects that will
> > have
> > > > with kafka.
> > > >
> > >
> >
>


Re: who is using kafka to store large messages?

2013-10-07 Thread S Ahmed
I see, so one thing to consider is that if I have 20 KB messages, I
shouldn't batch too many together, as that will increase latency and the
memory usage footprint on the producer side of things.


On Mon, Oct 7, 2013 at 11:55 AM, Jun Rao  wrote:

> At LinkedIn, our message size can be 10s of KB. This is mostly because we
> batch a set of messages and send them as a single compressed message.
>
> Thanks,
>
> Jun
>
>
> On Mon, Oct 7, 2013 at 7:44 AM, S Ahmed  wrote:
>
> > When people using message queues, the message size is usually pretty
> small.
> >
> > I want to know who out there is using kafka with larger payload sizes?
> >
> > In the configuration, the maximum message size by default is set to 1
> > megabyte (
> > message.max.bytes100)
> >
> > My message sizes will be probably be around 20-50 KB but to me that is
> > large for a message payload so I'm wondering what effects that will have
> > with kafka.
> >
>


who is using kafka to store large messages?

2013-10-07 Thread S Ahmed
When people use message queues, the message size is usually pretty small.

I want to know who out there is using kafka with larger payload sizes?

In the configuration, the maximum message size by default is set to 1
megabyte (message.max.bytes, default 1000000).

My message sizes will probably be around 20-50 KB, but to me that is
large for a message payload, so I'm wondering what effects that will have
with kafka.


use case with high rate of duplicate messages

2013-10-01 Thread S Ahmed
I have a use case where thousands of servers send status type messages,
which I am currently handling real-time w/o any kind of queueing system.

So currently when I receive a message, I perform an md5 hash of the
message, perform a lookup in my database to see if it is a duplicate, and if
not, I store the message.

Now the message format can be either xml or json, and the actual parsing of
the message takes time, so I am thinking of storing all the messages
in kafka first and then batch processing these messages in hopes that this
will be faster to do.

Do you think there would be a faster way of recognizing duplicate messages
this way, or is it just the same problem but done at a batch level?
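
A rough sketch of what the batch version might look like (the names below are
made up; this just illustrates hashing each consumed message and checking a
whole batch of hashes against storage in one round trip instead of one lookup
per message):

  import java.security.MessageDigest

  // Hypothetical storage interface: reports which hashes already exist,
  // and persists the new (hash, raw message) pairs.
  trait DedupStore {
    def existingHashes(hashes: Seq[String]): Set[String]
    def save(messages: Seq[(String, Array[Byte])]): Unit
  }

  def md5Hex(bytes: Array[Byte]): String =
    MessageDigest.getInstance("MD5").digest(bytes).map("%02x".format(_)).mkString

  // Process one batch pulled from Kafka: hash everything first, do a single
  // duplicate lookup for the whole batch, then store only the new messages.
  def processBatch(rawMessages: Seq[Array[Byte]], store: DedupStore): Unit = {
    val hashed = rawMessages.map(m => (md5Hex(m), m))
    val seen   = store.existingHashes(hashed.map(_._1))
    val fresh  = hashed.filterNot { case (hash, _) => seen.contains(hash) }
    store.save(fresh)
  }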


Re: Logo

2013-07-22 Thread S Ahmed
Similar, yet different.  I like it!


On Mon, Jul 22, 2013 at 1:25 PM, Jay Kreps  wrote:

> Yeah, good point. I hadn't seen that before.
>
> -Jay
>
>
> On Mon, Jul 22, 2013 at 10:20 AM, Radek Gruchalski <
> radek.gruchal...@portico.io> wrote:
>
> > 296 looks familiar: https://www.nodejitsu.com/
> >
> > Kind regards,
> > Radek Gruchalski
> > radek.gruchal...@technicolor.com (mailto:
> radek.gruchal...@technicolor.com)
> > | radek.gruchal...@portico.io (mailto:radek.gruchal...@portico.io) |
> > ra...@gruchalski.com (mailto:ra...@gruchalski.com)
> > 00447889948663
> >
> > Confidentiality:
> > This communication is intended for the above-named person and may be
> > confidential and/or legally privileged.
> > If it has come to you in error you must take no action based on it, nor
> > must you copy or show it to anyone; please delete/destroy and inform the
> > sender immediately.
> >
> >
> > On Monday, 22 July 2013 at 18:51, Jay Kreps wrote:
> >
> > > Hey guys,
> > >
> > > We need a logo!
> > >
> > > I got a few designs from a 99 designs contest that I would like to put
> > > forward:
> > > https://issues.apache.org/jira/browse/KAFKA-982
> > >
> > > If anyone else would like to submit a design that would be great.
> > >
> > > Let's do a vote to choose one.
> > >
> > > -Jay
> >
> >
>


byte offset -> sequential

2013-07-03 Thread S Ahmed
When you guys refactored the offset to be human friendly, i.e. numerical
versus the byte offset, what was involved in that refactor?  Is there a
wiki page for that?

I'm guessing there was an index file that was created for this, or is this
currently managed in zookeeper?


This wiki is related to the consumer offset, which is not what I'm looking
for: (https://cwiki.apache.org/confluence/display/KAFKA/Offset+Management)


Re: Kafka User Group Meeting Jun. 27

2013-07-02 Thread S Ahmed
Was this recorded by any chance?


On Wed, Jun 26, 2013 at 11:22 AM, Jun Rao  wrote:

> Hi, Everyone,
>
> We have finalized our agenda of the meetup Thursday evening, with Speakers
> from LinkedIn, RichRelevance, Netflix and Square. Please RSVP to the meetup
> link below. For remote people, we will post the streaming link at meetup
> before the meeting and we will use our IRC for remote questions.
>
>
> http://www.meetup.com/http-kafka-apache-org/events/125887332/?_af_eid=125887332&a=uc1_vm&_af=event
>
> See you Thursday evening.
>
> Thanks,
>
> Jun
>


RE: production consumer

2013-06-17 Thread S Ahmed
Greetings,

Are there any code samples of a consumer that is used in production?  I'm
looking for something that is a daemon that has x number of threads and
reads messages off kafka.

I've looked at the sample consumer etc., but those are just one-off runs that
exit.

I'm guessing that you would also have some retries in case a connection
temporarily goes down, and maybe even some co-ordination with zookeeper
etc. to spin up multiple consumers.

thanks!
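
In case it helps anyone searching the archives, a bare-bones sketch of the
usual daemon shape built on the 0.8 high-level consumer (the topic name, thread
count and properties are placeholders; real code would add retry/backoff and a
proper shutdown hook around connector.shutdown() and pool.shutdown()):

  import java.util.Properties
  import java.util.concurrent.Executors
  import kafka.consumer.{Consumer, ConsumerConfig}

  object ConsumerDaemon extends App {
    val topic      = "events"   // placeholder
    val numThreads = 4          // placeholder

    val props = new Properties()
    props.put("zookeeper.connect", "localhost:2181")   // placeholder
    props.put("group.id", "my-consumer-group")
    props.put("auto.commit.interval.ms", "1000")

    val connector = Consumer.create(new ConsumerConfig(props))
    val streams   = connector.createMessageStreams(Map(topic -> numThreads))(topic)

    val pool = Executors.newFixedThreadPool(numThreads)
    for (stream <- streams) {
      pool.submit(new Runnable {
        def run(): Unit = {
          val it = stream.iterator()
          while (it.hasNext) {                          // blocks waiting for messages
            val msg = it.next()
            println(new String(msg.message, "UTF-8"))   // replace with real processing
          }
        }
      })
    }
  }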


RE: shipping logs to s3 or other servers for backups

2013-06-13 Thread S Ahmed
Hi,

In my application, I am storing user events, and I want to partition the
storage by day.

So at the end of a day, I want to take that "file" and ship it to s3 or
another server as a backup.  This way I can replay the events for a
specific day if needed.

These events also have to be in order.

How should I structure things in kafka to ensure that these user events all
belong to the same file (or set of files; if the file gets large, I believe
kafka splits it into multiple files)?


Re: kafka 0.8

2013-06-11 Thread S Ahmed
Yeah probably a good idea to upgrade your end first, I'm sure some things
might come up :)


On Tue, Jun 11, 2013 at 1:02 PM, Soby Chacko  wrote:

> Jun,
>
> That is great to hear. Looking forward to it.
>
> Thanks,
> Soby
>
>
> On Tue, Jun 11, 2013 at 12:20 PM, Jun Rao  wrote:
>
> > Soby,
> >
> > Sorry for the delay. This week and early next week, all Kafka committers
> at
> > LinkedIn are busy with upgrading Kafka to 0.8 internally at LinkedIn. We
> > will try to start the release process as soon as we have things under
> > control.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Mon, Jun 10, 2013 at 6:54 PM, Soby Chacko 
> wrote:
> >
> > > Hello,
> > >
> > > Any updates on the 0.8 beta release?
> > >
> > > Soby Chacko
> > >
> > >
> > > On Tue, Jun 4, 2013 at 12:24 PM, Neha Narkhede <
> neha.narkh...@gmail.com
> > > >wrote:
> > >
> > > > I was just about to send an update. We can release beta right away.
> > > >
> > > > Joe,
> > > >
> > > > I remember you were interested in helping out. Let me know if you are
> > > still
> > > > up for managing the release.
> > > >
> > > > Thanks,
> > > > Neha
> > > > On Jun 4, 2013 9:22 AM, "Soby Chacko"  wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > Just wanted to know where we are with the beta release for 0.8?
> More
> > > > > importantly, is 0.8 going to be publicly available from a maven
> > > > repository?
> > > > > How about different versions of 0.8 built for different versions of
> > > > scala?
> > > > > (for example scala 2.8 vs 2.9 etc.)
> > > > >
> > > > > Much appreciated.
> > > > >
> > > > > Soby Chacko
> > > > >
> > > >
> > >
> >
>


RE: message order, guarenteed?

2013-06-09 Thread S Ahmed
I understand that there are no guarantees per se that a message won't be a
duplicate (it's the consumer's job to guard against that), but when it comes to
message order, is Kafka built in such a way that it is impossible to get
messages in the wrong order?

Certain use cases might not be sensitive to order, but when order is very
important, is Kafka the wrong tool for the job or is there a way to meet
this requirement?


Re: macbook air and kafka

2013-06-04 Thread S Ahmed
I have a 1st gen i7 with 8GB ram, and mine takes 77 seconds:

>clean
>++2.9.2 package

info] Packaging
/Users/abc/dev/sources/scala/kafka_test/contrib/hadoop-consumer/target/hadoop-consumer-0.8-SNAPSHOT.jar
...
[info] Done packaging.
[info] Done packaging.
[info] Packaging
/Users/snad/dev/sources/scala/kafka_test/perf/target/scala-2.9.2/kafka-perf_2.9.2-0.8-SNAPSHOT.jar
...
[info] Done packaging.
[success] Total time: 77 s, completed Jun 4, 2013 8:54:51 AM


On Mon, Jun 3, 2013 at 10:37 PM, Denny Lee  wrote:

> I have a 2011 MacBook Air (4GB, i7, 256GB SSD) and for development
> purposes it works like a charm. I'm only doing simple tests (ie no perf
> testing) and at least in my case, heat is pretty low. I have more issues
> with heat when I run Windows on Fusion ;-)
>
>
> On Jun 3, 2013, at 7:17 PM, S Ahmed  wrote:
>
> > So what are your specs then?
> >
> > So would you say it is actually enjoyable to develop on it?  Or its not
> > really that ideal compared to a desktop.
> > What about the heat?
> >
> >
> > On Mon, Jun 3, 2013 at 9:54 PM, Neha Narkhede  >wrote:
> >
> >> It really depends on your air specs. It takes roughly a minute to
> compile
> >> from scratch on my macbook air.
> >>
> >> Thanks,
> >> Neha
> >>
> >>
> >> On Mon, Jun 3, 2013 at 6:33 PM, S Ahmed  wrote:
> >>
> >>> Hi, Curious if anyone uses a macbook air to develop with kafka.
> >>>
> >>> How are compile times?  Does the fan go crazy after a while along with
> a
> >>> fairly hot keyboard?
> >>>
> >>> Just pondering getting an macbookair, and wanted your experiences.
> >>
>


Re: macbook air and kafka

2013-06-03 Thread S Ahmed
So what are your specs then?

So would you say it is actually enjoyable to develop on it?  Or is it not
really that ideal compared to a desktop?
What about the heat?


On Mon, Jun 3, 2013 at 9:54 PM, Neha Narkhede wrote:

> It really depends on your air specs. It takes roughly a minute to compile
> from scratch on my macbook air.
>
> Thanks,
> Neha
>
>
> On Mon, Jun 3, 2013 at 6:33 PM, S Ahmed  wrote:
>
> > Hi, Curious if anyone uses a macbook air to develop with kafka.
> >
> > How are compile times?  Does the fan go crazy after a while along with a
> > fairly hot keyboard?
> >
> > Just pondering getting an macbookair, and wanted your experiences.
> >
>


RE: macbook air and kafka

2013-06-03 Thread S Ahmed
Hi, Curious if anyone uses a macbook air to develop with kafka.

How are compile times?  Does the fan go crazy after a while along with a
fairly hot keyboard?

Just pondering getting a MacBook Air, and wanted to hear your experiences.


Re: Apache Kafka in AWS

2013-05-29 Thread S Ahmed
Is the code you used to benchmark open source by any chance?


On Tue, May 28, 2013 at 4:29 PM, Jason Weiss  wrote:

> Nope, sorry.
>
>
> ____
> From: S Ahmed [sahmed1...@gmail.com]
> Sent: Tuesday, May 28, 2013 15:47
> To: users@kafka.apache.org
> Subject: Re: Apache Kafka in AWS
>
> Curious if you tested with larger message sizes, like around 20-30kb (you
> mentioned 2kb).
>
> Any numbers on that size?
>
>
> On Thu, May 23, 2013 at 10:12 AM, Jason Weiss  >wrote:
>
> > Bummer.
> >
> > Yes, but it will be several days. I'll post back to the forum with a URL
> > once I'm done.
> >
> > Jason
> >
> >
> >
> > On 5/23/13 10:11 AM, "Jun Rao"  wrote:
> >
> > >Jason,
> > >
> > >Unfortunately, Apache mailing lists don't support attachments. Could you
> > >document your experience (with the graphs) in a blog (or a wiki page in
> > >Kafka)?
> > >
> > >Thanks,
> > >
> > >Jun
> > >
> > >
> > >On Thu, May 23, 2013 at 2:00 AM, Jason Weiss 
> > >wrote:
> > >
> > >> Jun,
> > >>
> > >> Here is a screenshot from AWS's statistics (per-minute sampling is the
> > >> finest granularity I believe that they chart). I don't have a
> > >>screenshot of
> > >> the top output.
> > >>
> > >> This shows when I added a 4th machine to the cluster with the same
> > >>number
> > >> of clients, my CPU utilization fell- but remained constant. The
> > >>flatline is
> > >> pretty obvious in the extended 4 minute test-- it ramps up, flat
> lines,
> > >> then ramps down.
> > >>
> > >> Jason
> > >>
> > >> 
> > >> From: Jun Rao [jun...@gmail.com]
> > >> Sent: Thursday, May 23, 2013 00:17
> > >> To: users@kafka.apache.org
> > >> Subject: Re: Apache Kafka in AWS
> > >>
> > >> Jason,
> > >>
> > >> Thanks for sharing. This is very interesting. Normally, Kafka brokers
> > >>don't
> > >> use too much CPU. Are most of the 750% CPU actually used by Kafka
> > >>brokers?
> > >>
> > >> Jun
> > >>
> > >>
> > >> On Wed, May 22, 2013 at 6:11 PM, Jason Weiss 
> > >> wrote:
> > >>
> > >> > >>Did you check that you were using all cores?
> > >> >
> > >> > top was reporting over 750%
> > >> >
> > >> > Jason
> > >> >
> > >> > 
> > >> > From: Ken Krugler [kkrugler_li...@transpac.com]
> > >> > Sent: Wednesday, May 22, 2013 20:59
> > >> > To: users@kafka.apache.org
> > >> > Subject: Re: Apache Kafka in AWS
> > >> >
> > >> > Hi Jason,
> > >> >
> > >> > On May 22, 2013, at 3:35pm, Jason Weiss wrote:
> > >> >
> > >> > > Ken,
> > >> > >
> > >> > > Great question! I should have indicated I was using EBS, 500GB
> with
> > >> 2000
> > >> > provisioned IOPs.
> > >> >
> > >> > OK, thanks. Sounds like you were pegged on CPU usage.
> > >> >
> > >> > But that does surprise me a bit. Did you check that you were using
> all
> > >> > cores?
> > >> >
> > >> > Thanks,
> > >> >
> > >> > -- Ken
> > >> >
> > >> > PS - back in 2006 I spent a week of hell debugging an occasion job
> > >> failure
> > >> > on Hadoop (this is when it was still part of Nutch). Turns out one
> of
> > >>our
> > >> > 12 slaves was accidentally using OpenJDK, and this had a JIT
> compiler
> > >>bug
> > >> > that would occasionally rear its ugly head. Obviously the Sun/Oracle
> > >>JRE
> > >> > isn't bug-free, but it gets a lot more stress testing. So one of my
> > >>basic
> > >> > guidelines in the ops portion of the Hadoop class I teach is that
> > >>every
> > >> > server must have exactly the same version of Oracle's JRE.
> > >> >
> > >> > > 
> > >> > 

Re: Apache Kafka in AWS

2013-05-28 Thread S Ahmed
Curious if you tested with larger message sizes, like around 20-30kb (you
mentioned 2kb).

Any numbers on that size?


On Thu, May 23, 2013 at 10:12 AM, Jason Weiss wrote:

> Bummer.
>
> Yes, but it will be several days. I'll post back to the forum with a URL
> once I'm done.
>
> Jason
>
>
>
> On 5/23/13 10:11 AM, "Jun Rao"  wrote:
>
> >Jason,
> >
> >Unfortunately, Apache mailing lists don't support attachments. Could you
> >document your experience (with the graphs) in a blog (or a wiki page in
> >Kafka)?
> >
> >Thanks,
> >
> >Jun
> >
> >
> >On Thu, May 23, 2013 at 2:00 AM, Jason Weiss 
> >wrote:
> >
> >> Jun,
> >>
> >> Here is a screenshot from AWS's statistics (per-minute sampling is the
> >> finest granularity I believe that they chart). I don't have a
> >>screenshot of
> >> the top output.
> >>
> >> This shows when I added a 4th machine to the cluster with the same
> >>number
> >> of clients, my CPU utilization fell- but remained constant. The
> >>flatline is
> >> pretty obvious in the extended 4 minute test-- it ramps up, flat lines,
> >> then ramps down.
> >>
> >> Jason
> >>
> >> 
> >> From: Jun Rao [jun...@gmail.com]
> >> Sent: Thursday, May 23, 2013 00:17
> >> To: users@kafka.apache.org
> >> Subject: Re: Apache Kafka in AWS
> >>
> >> Jason,
> >>
> >> Thanks for sharing. This is very interesting. Normally, Kafka brokers
> >>don't
> >> use too much CPU. Are most of the 750% CPU actually used by Kafka
> >>brokers?
> >>
> >> Jun
> >>
> >>
> >> On Wed, May 22, 2013 at 6:11 PM, Jason Weiss 
> >> wrote:
> >>
> >> > >>Did you check that you were using all cores?
> >> >
> >> > top was reporting over 750%
> >> >
> >> > Jason
> >> >
> >> > 
> >> > From: Ken Krugler [kkrugler_li...@transpac.com]
> >> > Sent: Wednesday, May 22, 2013 20:59
> >> > To: users@kafka.apache.org
> >> > Subject: Re: Apache Kafka in AWS
> >> >
> >> > Hi Jason,
> >> >
> >> > On May 22, 2013, at 3:35pm, Jason Weiss wrote:
> >> >
> >> > > Ken,
> >> > >
> >> > > Great question! I should have indicated I was using EBS, 500GB with
> >> 2000
> >> > provisioned IOPs.
> >> >
> >> > OK, thanks. Sounds like you were pegged on CPU usage.
> >> >
> >> > But that does surprise me a bit. Did you check that you were using all
> >> > cores?
> >> >
> >> > Thanks,
> >> >
> >> > -- Ken
> >> >
> >> > PS - back in 2006 I spent a week of hell debugging an occasion job
> >> failure
> >> > on Hadoop (this is when it was still part of Nutch). Turns out one of
> >>our
> >> > 12 slaves was accidentally using OpenJDK, and this had a JIT compiler
> >>bug
> >> > that would occasionally rear its ugly head. Obviously the Sun/Oracle
> >>JRE
> >> > isn't bug-free, but it gets a lot more stress testing. So one of my
> >>basic
> >> > guidelines in the ops portion of the Hadoop class I teach is that
> >>every
> >> > server must have exactly the same version of Oracle's JRE.
> >> >
> >> > > 
> >> > > From: Ken Krugler [kkrugler_li...@transpac.com]
> >> > > Sent: Wednesday, May 22, 2013 17:23
> >> > > To: users@kafka.apache.org
> >> > > Subject: Re: Apache Kafka in AWS
> >> > >
> >> > > Hi Jason,
> >> > >
> >> > > Thanks for the notes.
> >> > >
> >> > > I'm curious whether you went with using local drives (ephemeral
> >> storage)
> >> > or EBS, and if with EBS then what IOPS.
> >> > >
> >> > > Thanks,
> >> > >
> >> > > -- Ken
> >> > >
> >> > > On May 22, 2013, at 1:42pm, Jason Weiss wrote:
> >> > >
> >> > >> All,
> >> > >>
> >> > >> I asked a number of questions of the group over the last week, and
> >>I'm
> >> > happy to report that I've had great success getting Kafka up and
> >>running
> >> in
> >> > AWS. I am using 3 EC2 instances, each of which is a M2 High-Memory
> >> > Quadruple Extra Large with 8 cores and 58.4 GiB of memory according to
> >> the
> >> > AWS specs. I have co-located Zookeeper instances next to Zafka on each
> >> > machine.
> >> > >>
> >> > >> I am able to publish in a repeatable fashion 273,000 events per
> >> second,
> >> > with each event payload consisting of a fixed size of 2048 bytes! This
> >> > represents the maximum throughput possible on this configuration, as
> >>the
> >> > servers became CPU constrained, averaging 97% utilization in a
> >>relatively
> >> > flat line. This isn't a "burst" speed ­ it represents a sustained
> >> > throughput from 20 M1 Large EC2 Kafka multi-threaded producers.
> >>Putting
> >> > this into perspective, if my log retention period was a month, I'd be
> >> > aggregating 1.3 petabytes of data on my disk drives. Suffice to say, I
> >> > don't see us retaining data for more than a few hours!
> >> > >>
> >> > >> Here were the keys to tuning for future folks to consider:
> >> > >>
> >> > >> First and foremost, be sure to configure your Java heap size
> >> > accordingly when you launch Kafka. The default is like 512MB, which
> >>in my
> >> > case left virtually all of my RAM inaccessible to Kafka.
> >> 

is 0.8 stable?

2013-05-27 Thread S Ahmed
Hi,

Sorry if I missed the announcement, but is 0.8 stable / production-worthy
yet?

Is anyone using it in the wild?


Re: Analysis of producer performance -- and Producer-Kafka reliability

2013-04-12 Thread S Ahmed
Interesting topic.

How would buffering in RAM help in reality though (just trying to work
through the scenario in my head):

The producer tries to connect to a broker, it fails, so it appends the message
to an in-memory store.  If the broker is down for say 20 minutes and then
comes back online, won't this create problems when the producer creates
a new message while it still has 20 minutes of backlog, and the broker is now
handling more load (assuming you are sending those in-memory messages using
a different thread)?
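
For concreteness, a rough sketch of the kind of RAM buffer being discussed, assuming the 0.8 Java producer classes; the topic, queue capacity and back-off are placeholders. Application threads hand messages to a bounded queue and one background thread drains it, so when the broker comes back the backlog is replayed at whatever rate the broker accepts rather than all at once. The concern above is real, though: if the outage outlasts the queue capacity, offer() starts returning false and the caller has to drop or block.

import java.util.Properties;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

// Sketch of a producer wrapper that buffers in RAM while the broker is unreachable.
public class BufferingProducer {
    private final BlockingQueue<String> buffer = new ArrayBlockingQueue<String>(100000);
    private final Producer<String, String> producer;

    public BufferingProducer(Properties producerProps) {
        producer = new Producer<String, String>(new ProducerConfig(producerProps));

        Thread sender = new Thread(new Runnable() {
            public void run() {
                while (true) {
                    String msg;
                    try {
                        msg = buffer.take();              // blocks while the buffer is empty
                    } catch (InterruptedException e) {
                        return;
                    }
                    boolean sent = false;
                    while (!sent) {
                        try {
                            producer.send(new KeyedMessage<String, String>("events", msg));
                            sent = true;
                        } catch (RuntimeException e) {
                            // Broker unreachable: back off and retry the same message, so
                            // nothing is lost and order is kept while the backlog grows.
                            try { Thread.sleep(1000); } catch (InterruptedException ie) { return; }
                        }
                    }
                }
            }
        });
        sender.setDaemon(true);
        sender.start();
    }

    // Called by application threads; returns false (caller must drop or block) when full.
    public boolean offer(String msg) {
        return buffer.offer(msg);
    }
}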




On Fri, Apr 12, 2013 at 11:21 AM, Philip O'Toole  wrote:

> This is just my opinion of course (who else's could it be? :-)) but I think
> from an engineering point of view, one must spend one's time making the
> Producer-Kafka connection solid, if it is mission-critical.
>
> Kafka is all about getting messages to disk, and assuming your disks are
> solid (and 0.8 has replication) those messages are safe. To then try to
> build a system to cope with the Kafka brokers being unavailable seems like
> you're setting yourself for infinite regress. And to write code in the
> Producer to spool to disk seems even more pointless. If you're that
> worried, why not run a dedicated Kafka broker on the same node as the
> Producer, and connect over localhost? To turn around and write code to
> spool to disk, because the primary system that *spools to disk* is down
> seems to be missing the point.
>
> That said, even by going over local-host, I guess the network connection
> could go down. In that case, Producers should buffer in RAM, and start
> sending some major alerts to the Operations team. But this should almost
> *never happen*. If it is happening regularly *something is fundamentally
> wrong with your system design*. Those Producers should also refuse any more
> incoming traffic and await intervention. Even bringing up "netcat -l" and
> letting it suck in the data and write it to disk would work then.
> Alternatives include having Producers connect to a load-balancer with
> multiple Kafka brokers behind it, which helps you deal with any one Kafka
> broker failing. Or just have your Producers connect directly to multiple
> Kafka brokers, and switch over as needed if any one broker goes down.
>
> I don't know if the standard Kafka producer that ships with Kafka supports
> buffering in RAM in an emergency. We wrote our own that does, with a focus
> on speed and simplicity, but I expect it will very rarely, if ever, buffer
> in RAM.
>
> Building and using semi-reliable system after semi-reliable system, and
> chaining them all together, hoping to be more tolerant of failure is not
> necessarily a good approach. Instead, identifying that one system that is
> critical, and ensuring that it remains up (redundant installations,
> redundant disks, redundant network connections etc) is a better approach
> IMHO.
>
> Philip
>
>
> On Fri, Apr 12, 2013 at 7:54 AM, Jun Rao  wrote:
>
> > Another way to handle this is to provision enough client and broker
> servers
> > so that the peak load can be handled without spooling.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Thu, Apr 11, 2013 at 5:45 PM, Piotr Kozikowski  > >wrote:
> >
> > > Jun,
> > >
> > > When talking about "catastrophic consequences" I was actually only
> > > referring to the producer side. in our use case (logging requests from
> > > webapp servers), a spike in traffic would force us to either tolerate a
> > > dramatic increase in the response time, or drop messages, both of which
> > are
> > > really undesirable. Hence the need to absorb spikes with some system on
> > top
> > > of Kafka, unless the spooling feature mentioned by Wing (
> > > https://issues.apache.org/jira/browse/KAFKA-156) is implemented. This
> is
> > > assuming there are a lot more producer machines than broker nodes, so
> > each
> > > producer would absorb a small part of the extra load from the spike.
> > >
> > > Piotr
> > >
> > > On Wed, Apr 10, 2013 at 10:17 PM, Jun Rao  wrote:
> > >
> > > > Piotr,
> > > >
> > > > Actually, could you clarify what "catastrophic consequences" did you
> > see
> > > on
> > > > the broker side? Do clients timeout due to longer serving time or
> > > something
> > > > else?
> > > >
> > > > Going forward, we plan to add per client quotas (KAFKA-656) to
> prevent
> > > the
> > > > brokers from being overwhelmed by a runaway client.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Wed, Apr 10, 2013 at 12:04 PM, Otis Gospodnetic <
> > > > otis_gospodne...@yahoo.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Is there anything one can do to "defend" from:
> > > > >
> > > > > "Trying to push more data than the brokers can handle for any
> > sustained
> > > > > period of time has catastrophic consequences, regardless of what
> > > timeout
> > > > > settings are used. In our use case this means that we need to
> either
> > > > ensure
> > > > > we have spare capacity for spikes, or use something on top of Kafka
> > to
> > > > > absorb spik

Re: log.file.size limit?

2013-03-25 Thread S Ahmed
but it shows long, not int?

Isn't it then Long.MAX_VALUE?


On Mon, Mar 25, 2013 at 3:14 PM, David Arthur  wrote:

> FileChannel#map docs indicate the max size is Integer.MAX_VALUE, so yea 2gb
>
> http://docs.oracle.com/javase/**6/docs/api/java/nio/channels/**
> FileChannel.html#map(java.nio.**channels.FileChannel.MapMode<http://docs.oracle.com/javase/6/docs/api/java/nio/channels/FileChannel.html#map(java.nio.channels.FileChannel.MapMode>,
> long, long)
>
>
>
> On 3/25/13 2:42 PM, S Ahmed wrote:
>
>> Is there any limit to how large a log file can be?
>> I swear I read somewhere that java's memory mapped implementation is
>> limited to 2GB but I'm not sure.
>>
>>
>


log.file.size limit?

2013-03-25 Thread S Ahmed
Is there any limit to how large a log file can be?
I swear I read somewhere that Java's memory-mapped implementation is
limited to 2GB, but I'm not sure.


Re: Anyone working on a Kafka book?

2013-03-19 Thread S Ahmed
I guess the challenge would be that Kafka is still at version 0.8, so by
the time your book comes out it might be at version 1.0, i.e. it's a moving
target.

Sounds like a great idea though!


On Tue, Mar 19, 2013 at 12:20 PM, Jun Rao  wrote:

> Hi, David,
>
> At LinkedIn, committers are too busy to write a Kafka book right now. I
> think this is a good idea to pursue. So, if you want to do it, we'd be
> happy to help. The only request that I have for you is while writing the
> book, it would be good if you can use this opportunity to also help us
> improve the documentation of the site.
>
> Thanks,
>
> Jun
>
> On Tue, Mar 19, 2013 at 6:34 AM, David Arthur  wrote:
>
> > I was approached by a publisher the other day to do a book on Kafka -
> > something I've actually thought about pursuing. Before I say yes (or
> > consider saying yes), I wanted to make sure no one else was working on a
> > book. No sense in producing competing texts at this point.
> >
> > So, anyone working on a Kafka book? Self published or otherwise?
> >
> > -David
> >
> >
> >
>


Re: Consume from X messages ago

2013-03-19 Thread S Ahmed
I thought since the offsets in 0.8 are numeric and not byte offsets like in
0.7.x, you could simply take, say, the current offset - 1.


On Tue, Mar 19, 2013 at 12:16 PM, Neha Narkhede wrote:

> Jim,
>
> You can leverage the ExportZkOffsets/ImportZkOffsets tools to do this.
> ExportZkOffsets exports the consumer offsets for your group to a file in a
> certain format. You can then place the desired offset per partition you
> want to reset your consumer to in the exported file.
>
> 1. Shutdown the consumer
> 2. Export current offsets
> 3. Get the desired offset (current offset - 10K). As David mentions, this
> is approximate and might get you more than 10K messages if the data is
> compressed.
> 4. Replace the exported offsets with these offsets
> 5. Restart the consumer.
>
> HTH,
> Neha
>
>
> On Tue, Mar 19, 2013 at 8:49 AM, David Arthur  wrote:
>
> > This API is exposed through the SimpleConsumer scala class. See
> > https://github.com/apache/**kafka/blob/trunk/core/src/**
> > main/scala/kafka/consumer/**SimpleConsumer.scala#L60<
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/consumer/SimpleConsumer.scala#L60
> >
> >
> > You will need to set earliestOrLatest to -1 for the latest offset.
> >
> > There is also a command line tool https://github.com/apache/**
> >
> kafka/blob/trunk/core/src/**main/scala/kafka/tools/**GetOffsetShell.scala<
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/tools/GetOffsetShell.scala
> >
> >
> > -David
> >
> >
> > On 3/19/13 11:25 AM, James Englert wrote:
> >
> >> I'm still a bit lost.  Where is the offsets API?  I.e. which class?
> >>
> >>
> >> On Tue, Mar 19, 2013 at 11:16 AM, David Arthur 
> wrote:
> >>
> >>  Using the Offsets API, you can get the latest offset by setting time to
> >>> -1. Then you subtract 1
> >>>
> >>> There is no guarantee that 10k prior messages exist of course, so you'd
> >>> need to handle that case.
> >>>
> >>> -David
> >>>
> >>>
> >>> On 3/19/13 11:04 AM, James Englert wrote:
> >>>
> >>>  Hi,
> 
>  I'm using Kafka 0.8.  I would like to setup a consumer to fetch the
> last
>  10,000 messages and then start consuming messages.
> 
>  I see the configuration autooffset.reset, but that isn't quite what I
>  want.  I want only the last 10,000 messages.
> 
>  Is there a good way to achieve this in 0.8, besides just hacking the
>  data
>  in ZK?
> 
>  Thanks,
>  Jim
> 
> 
> 
> >>
> >
>


Re: Kafka replication presentation at ApacheCon

2013-02-28 Thread S Ahmed
Excellent thanks!

BTW, in the slides, it shows that the message 'm3' is lost.

I guess the leader is the single point of failure then when a producer
sends a message, meaning it can never bypass the leader and write to the
followers in case of leader failure, right?


On Thu, Feb 28, 2013 at 8:35 AM, Matan Safriel  wrote:

> Thanks Jun, great slides!
>
>
> On Wed, Feb 27, 2013 at 11:52 PM, Jun Rao  wrote:
>
> > Hi,
> >
> > I gave a talk on Kafka replication at ApacheCon yesterday. The slides can
> > be found at
> > http://www.slideshare.net/junrao/kafka-replication-apachecon2013(it's
> > also added to Kafka wiki). I will share the link to the recorded
> > video once it's available.
> >
> > Thanks,
> >
> > Jun
> >
>


Re: long running (continous) benchmark test

2013-02-06 Thread S Ahmed
Another reason is to test Kafka with potentially larger payloads (10-100K).


On Wed, Feb 6, 2013 at 2:26 PM, S Ahmed  wrote:

> I plan on creating some sort of a long running test with kafka, curiuos if
> anyone has already done this and released the code.
>
> I want to setup some ec2 instances, and mimick what I think will be close
> to real life usage, and monitor how things work.  I want this to run for 24
> hours or more.
>
> I know there are system tests, but I believe those are short lived tests.
>
> It is always interesting to see how things work like if consumers fall
> behind, etc.
>


Re: Kafka 0.8 producer within Play Framework?

2013-02-05 Thread S Ahmed
Shouldn't your producer be at the controller scope?  Instantiating it every
time is probably not the correct pattern.  You probably want to use an async
producer also, right?
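
A sketch of that pattern: one shared producer created once for the whole app and reused by every request, with producer.type set to async so sends are batched off the request thread. The property values mirror the ones in the original post; treat the exact keys as assumptions for whichever 0.8 build is in use.

package controllers;

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

// One producer for the whole app, created once and reused by every request.
public class KafkaClient
{
    private static Producer<String, String> producer;

    public static synchronized Producer<String, String> get()
    {
        if (producer == null) {
            Properties props = new Properties();
            props.put("zk.connect", "127.0.0.1:2181");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            props.put("broker.list", "localhost:9092");
            props.put("producer.type", "async");   // batch sends on a background thread
            producer = new Producer<String, String>(new ProducerConfig(props));
        }
        return producer;
    }

    public static void send(String topic, String message)
    {
        get().send(new KeyedMessage<String, String>(topic, message));
    }
}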




On Mon, Feb 4, 2013 at 7:12 PM, charlie w  wrote:

> It seems the issue is related to Scala versions.  I grabbed from
> https://github.com/kevinwright/kafka
> and built with
> sbt "++2.10.0 package"
>
> I'm able to have my Play app produce messages for Kafka now.
>
>
> On Fri, Feb 1, 2013 at 5:32 PM, charlie w  wrote:
> > Is it possible to have a Kafka 0.8 producer inside a Play Framework
> > controller?  (I am new to both Kafka and Play.)
> >
> > I have managed to get Java code cribbed from the Kafka examples
> > (below) to compile and execute within Play, but constructing the
> > ProducerConfig object never returns.  I am not finding anything in any
> > Play log.
> >
> > I've also tried this using the Scala console producer as a model and
> > had the same result
> >
> > The Kafka console producer and consumer samples are working for me.
> >
> > Thanks
> > Charlie
> >
> > code:
> >
> >
> > package controllers;
> >
> > import java.util.Properties;
> >
> > import kafka.producer.ProducerConfig;
> > import kafka.javaapi.producer.Producer;
> > import kafka.producer.KeyedMessage;
> >
> > public class Hello
> > {
> >public static void hello( String v )
> >{
> >   Properties props = new Properties();
> >   props.put("zk.connect", "127.0.0.1:2181");
> >   props.put("serializer.class", "kafka.serializer.StringEncoder");
> >   props.put("broker.list", "localhost:9092" );
> >
> >   play.Logger.info("create config");
> >   ProducerConfig config = new ProducerConfig(props);
> >
> >   play.Logger.info("create producer");
> >   Producer producer = new Producer > String>(config);
> >
> >   play.Logger.info("yay!");
> >
> >   KeyedMessage msg = new KeyedMessage > String>("test-topic", "Greetings!");
> >
> >   producer.send( msg );
> >}
> > }
>


Re: S3 Archiving for Kafka topics (with Zookeeper resume)

2013-01-31 Thread S Ahmed
Great thanks.

BTW, that's not a daemon server, is it?  Is it something you have to wrap to
run as a service/daemon?


On Thu, Jan 31, 2013 at 4:29 AM, Jibran Saithi wrote:

> Hey,
>
> I know this has come up a few times, so thought I'd share a bit of code
> we've been using to archive topics to S3.
>
> Particularly unimaginatively named, but is available here:
> https://github.com/jibs/kafka-s3-consumer
>
> We needed something with Zookeeper support for storing the offsets, but
> didn't come across anything so I quickly put this together. For the moment
> I've removed graphite stats reporting because it has a few internal
> dependencies, but plan to sort that out soon.
>
> Hope this helps,
> Jibran
>


Re: anecdotal uptime and service monitoring

2013-01-29 Thread S Ahmed
Jun,

Great list.   I haven't really set up monitoring before, so for starters,
what should I be researching in order to monitor those metrics? Are they
exposed via the Yammer metrics library so they can be exported to a CSV
file, or are these JMX-related items?
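
One way to start exploring them, sketched below: the broker's metrics end up visible over JMX (the 0.8 metrics are Yammer-based with a JMX reporter, and earlier versions register plain MBeans), so jconsole or a small JMX client pointed at the broker's JMX port will list whatever is registered. The port number, and the assumption that remote JMX was enabled (for example via the JMX_PORT environment variable the run scripts look for, or the standard com.sun.management.jmxremote.port property), are placeholders.

import java.util.Set;

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class BrokerJmxDump {
    public static void main(String[] args) throws Exception {
        // Assumes the broker JVM was started with remote JMX enabled on port 9999.
        JMXServiceURL url =
            new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        MBeanServerConnection mbsc = connector.getMBeanServerConnection();

        // List every MBean the broker registers; the Kafka metrics show up here.
        Set<ObjectName> names = mbsc.queryNames(null, null);
        for (ObjectName name : names) {
            System.out.println(name);
        }
        connector.close();
    }
}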




On Fri, Dec 28, 2012 at 5:47 PM, Jun Rao  wrote:

> At LinkedIn, the most common failure of a Kafka broker is when we have to
> deploy new Kafka code/config. Otherwise, the broker can be up for a long
> time (e..g, months). It woud be good to monitor the following metrics at
> the broker: log flush time/rate, produce/fetch requests/messages rate, GC
> rate/time, network bandwidth utilization,  and disk space and I/O
> utilization. For the clients, it would be good to monitor message
> size/rate, request time/rate, dropped event rate (for async producers) and
> consumption lag (for consumers). For ZK, ideally, one should monitor ZK
> request latency and GCs.
>
> Thanks,
>
> Jun
>
> On Fri, Dec 28, 2012 at 7:27 AM, S Ahmed  wrote:
>
> > Curious what kind of uptime have you guys experienced using kafka?
> >
> > What sort of monitoring do you suggest should be in place for kafka?
> >
> > If the service crashes, does it usually make sense to have something like
> > upstart restart the service?
> >
> > There are allot of moving parts (hard drive space, zooker, producers,
> > consumers, etc.)
> >
> > Also if the consumers can't keep up with new messages...
> >
>


Re: Payload size exception

2013-01-29 Thread S Ahmed
Ok so it might be an issue somewhere in the pipeline (I'm guessing memory
issues?).

They are XML files, and that 30-100 figure was uncompressed.


On Tue, Jan 29, 2013 at 12:28 PM, Neha Narkhede wrote:

> > At linkedin, what is the largest payload size per message you guys have
> in
> > production?
> >
>
> Roughly 30K after compression, but that is fairly rare. Most messages are <
> 500 bytes after compression.
>
> Thanks,
> Neha
>


Re: Payload size exception

2013-01-29 Thread S Ahmed
Neha/Jay,

At LinkedIn, what is the largest payload size per message you guys have in
production?  My app might have messages around 20-100 kilobytes in size, and I
am hoping to get an idea if others have large messages like this in any
production use case.


On Tue, Jan 29, 2013 at 11:35 AM, Neha Narkhede wrote:

> > In 0.7.x this
> > setting is controlled by the broker configuration max.message.size.
> >
>
> Actually, in 0.7.x this setting is controlled by max.message.size on the
> producer. In 0.8, we moved this setting to the broker.
>
> Thanks,
> Neha
>


Re: How do you keep track of offset in a partition

2013-01-28 Thread S Ahmed
Once you have an offset, is it possible to know how many messages there are
from that point to the end? (Or at least for the particular topic partition
that you are requesting data from?)

The goal is to get an idea of how far behind the consumers are relative to the
# of messages coming in, etc.

I'm guessing the brokers don't really know how many messages they are
currently storing?  Or is that what the index is for?
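
A rough sketch of the getOffsetsBefore() approach referenced in the reply below, written against the 0.7-style SimpleConsumer API (signature from memory, so treat it as an assumption): ask for the latest offset with time = -1 and subtract the offset you last consumed. One caveat that also answers the question above: in 0.7 an offset is a byte position in the log, so the difference is bytes outstanding rather than a message count; with 0.8's logical offsets the same arithmetic gives a message count.

import kafka.javaapi.consumer.SimpleConsumer;

public class LagCheck {
    public static void main(String[] args) {
        // Host, port, topic, partition and the last-consumed offset are placeholders.
        SimpleConsumer consumer = new SimpleConsumer("localhost", 9092, 10000, 64 * 1024);

        // time = -1 asks for the latest offset; maxNumOffsets = 1 keeps just the log end.
        long[] offsets = consumer.getOffsetsBefore("test-topic", 0, -1L, 1);
        long logEnd = offsets[0];

        long lastConsumed = 123456L;          // whatever offset the consumer last processed
        long behind = logEnd - lastConsumed;

        // With 0.7's byte offsets this is bytes outstanding, not a message count;
        // with 0.8's logical offsets the same subtraction gives messages outstanding.
        System.out.println("Consumer is behind by roughly " + behind);
        consumer.close();
    }
}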




On Mon, Jan 28, 2013 at 8:27 PM, Neha Narkhede wrote:

> Jamie,
>
> You need to use the getOffsetsBefore() API to get the earliest/latest
> offset available on the broker for a particular partition.
>
> Thanks,
> Neha
>
>
> On Mon, Jan 28, 2013 at 5:05 PM, Jamie Wang 
> wrote:
>
> > Hi,
> >
> > We are using 0.72 version of Kafka on Windows. I am wondering what is the
> > right way to fetch data and keep track of offset in a partition. For
> > example, I am currently assuming the first message the producer sent to
> the
> > broker is at offset 0. So far it seems working. Is this correct
> assumption?
> >
> > Let' say 2 days later, the first 100 messages on the broker is discarded
> > because it passed retention.hours set in the config file. Now what is the
> > offset I should use to retrieve the first message in the partition?  And
> > let's also say the offset I had for the 80th message is now not valid.
> > What is the right way to get the correct offset to fetch in the consumer?
> >
> > What is the purpose of the api for getting a list of valid offsets for
> all
> > segments in a partition?
> >
> > Thanks in advance for your help.
> >
> > Jamie
> >
>


Re: are topics and partitions dynamic?

2013-01-27 Thread S Ahmed
So from what I understand, a single broker will contain all the messages
for a given topic (and its partitions).  Who decides which broker will own
a particular topic?


On Sun, Jan 27, 2013 at 3:36 PM, Neha Narkhede wrote:

> In Kafka 0.7, topics and partitions are dynamic in the sense that a
> partition is created when the broker receives the first message for that
> partition.
> In Kafka 0.8, there are 2 ways of creating a new topic -
> 1. Turn on auto.create.topics.enable option on the broker. When the broker
> receives the first message for a new topic, it creates that topic with
> num.partitions and default.replication.factor.
> 2. Use the admin command bin/kafka-create-topic.sh
>
> Thanks,
> Neha
>
>
> On Sun, Jan 27, 2013 at 12:23 PM, S Ahmed  wrote:
>
> > I'm looking at the kafka server.properties that is in /config folder and
> > didn't find any reference to topics or partitions.
> >
> > Are topics and optionally partitions dynamic in nature or do they have to
> > be defined before starting the broker? (and consequently producers can't
> > send to a topic and partition that isn't predefined).
> >
>


Re: first steps with the codebase

2012-12-12 Thread S Ahmed
Thanks, it looks like the test passed, so it was as you suggested.

I couldn't find the location where the files were written to (I'm guessing
it cleans up after itself, but I still wanted to know which folder it writes
to).


On Tue, Dec 11, 2012 at 6:45 PM, Jay Kreps  wrote:

> All of the unit tests start and stop all their dependencies. You shouldn't
> have to do anything prior to running ./sbt test. Did the test fail or did
> it just print an exception? The sbt test runner will print exceptions and
> other logging to the console. Some of the tests specifically invoke error
> conditions, and some do their cleanup by interrupting I/O both of which may
> produce exceptions. But provided they are just logged that is fine.
>
> -Jay
>
>
> On Tue, Dec 11, 2012 at 2:16 PM, S Ahmed  wrote:
>
> > And my question before that regarding what services do I need to start
> and
> > how do I do this?
> >
> > Please see my previous post, I wrote how I tried to run the unit test "
> > testAsyncSendCanCorrectlyFailWithTimeout"
> >
> >
> >
> >
> > On Tue, Dec 11, 2012 at 4:24 PM, Jain Rahul 
> wrote:
> >
> > > You can check in config/server.properties. By  default it writes in
> > > /tmp/kafka-logs/ .
> > >
> > > -Original Message-
> > > From: S Ahmed [mailto:sahmed1...@gmail.com]
> > > Sent: 12 December 2012 02:51
> > > To: users@kafka.apache.org
> > > Subject: Re: first steps with the codebase
> > >
> > > help anyone? :)
> > >
> > > Much much appreciated!
> > >
> > >
> > > On Tue, Dec 11, 2012 at 12:03 AM, S Ahmed 
> wrote:
> > >
> > > > BTW, where exactly will the broker be writing these messages?  Is it
> > > > in a /tmp folder?
> > > >
> > > >
> > > > On Tue, Dec 11, 2012 at 12:02 AM, S Ahmed 
> > wrote:
> > > >
> > > >> Neha,
> > > >>
> > > >> But what do I need to start before running the tests, I tried to run
> > > >> the test "testAsyncSendCanCorrectlyFailWithTimeout" but I got this:
> > > >>
> > > >> 2012-12-11 00:01:08,974] WARN EndOfStreamException: Unable to read
> > > >> additional data from client sessionid 0x13b8856456a0002, likely
> > > >> client has closed socket
> > > >> (org.apache.zookeeper.server.NIOServerCnxn:634)
> > > >> [2012-12-11 00:01:11,231] WARN EndOfStreamException: Unable to read
> > > >> additional data from client sessionid 0x13b8856456a0001, likely
> > > >> client has closed socket
> > > >> (org.apache.zookeeper.server.NIOServerCnxn:634)
> > > >> [2012-12-11 00:01:26,561] WARN EndOfStreamException: Unable to read
> > > >> additional data from client sessionid 0x13b8856456a0003, likely
> > > >> client has closed socket
> > > >> (org.apache.zookeeper.server.NIOServerCnxn:634)
> > > >> [2012-12-11 00:01:26,563] WARN EndOfStreamException: Unable to read
> > > >> additional data from client sessionid 0x13b8856456a0004, likely
> > > >> client has closed socket
> > > >> (org.apache.zookeeper.server.NIOServerCnxn:634)
> > > >> [2012-12-11 00:01:30,661] ERROR [TopicChangeListener on Controller
> 1]:
> > > >> Error while handling new topic
> > > >> (kafka.controller.PartitionStateMachine$TopicChangeListener:102)
> > > >> java.lang.NullPointerException
> > > >> at
> > > >>
> scala.collection.JavaConversions$JListWrapper.iterator(JavaConversion
> > > >> s.scala:524) at
> > > >> scala.collection.IterableLike$class.foreach(IterableLike.scala:79)
> > > >>  at
> > > >>
> scala.collection.JavaConversions$JListWrapper.foreach(JavaConversions
> > > >> .scala:521)
> > > >> at
> > > >>
> scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala
> > > >> :176)
> > > >>  at
> > > >>
> scala.collection.JavaConversions$JListWrapper.foldLeft(JavaConversion
> > > >> s.scala:521)
> > > >> at
> > > >>
> scala.collection.TraversableOnce$class.$div$colon(TraversableOnce.sca
> > > >> la:139)
> > > >>  at
> > > >>
> scala.collection.JavaConversions$JListWrapper.$div$colon(JavaConversi
> > > >> ons.scala:521) at
> > > >> scala.collection.generic.Addable

Re: first steps with the codebase

2012-12-11 Thread S Ahmed
And my question before that: what services do I need to start, and
how do I do that?

Please see my previous post, where I wrote how I tried to run the unit test
"testAsyncSendCanCorrectlyFailWithTimeout".




On Tue, Dec 11, 2012 at 4:24 PM, Jain Rahul  wrote:

> You can check in config/server.properties. By  default it writes in
> /tmp/kafka-logs/ .
>
> -Original Message-
> From: S Ahmed [mailto:sahmed1...@gmail.com]
> Sent: 12 December 2012 02:51
> To: users@kafka.apache.org
> Subject: Re: first steps with the codebase
>
> help anyone? :)
>
> Much much appreciated!
>
>
> On Tue, Dec 11, 2012 at 12:03 AM, S Ahmed  wrote:
>
> > BTW, where exactly will the broker be writing these messages?  Is it
> > in a /tmp folder?
> >
> >
> > On Tue, Dec 11, 2012 at 12:02 AM, S Ahmed  wrote:
> >
> >> Neha,
> >>
> >> But what do I need to start before running the tests, I tried to run
> >> the test "testAsyncSendCanCorrectlyFailWithTimeout" but I got this:
> >>
> >> 2012-12-11 00:01:08,974] WARN EndOfStreamException: Unable to read
> >> additional data from client sessionid 0x13b8856456a0002, likely
> >> client has closed socket
> >> (org.apache.zookeeper.server.NIOServerCnxn:634)
> >> [2012-12-11 00:01:11,231] WARN EndOfStreamException: Unable to read
> >> additional data from client sessionid 0x13b8856456a0001, likely
> >> client has closed socket
> >> (org.apache.zookeeper.server.NIOServerCnxn:634)
> >> [2012-12-11 00:01:26,561] WARN EndOfStreamException: Unable to read
> >> additional data from client sessionid 0x13b8856456a0003, likely
> >> client has closed socket
> >> (org.apache.zookeeper.server.NIOServerCnxn:634)
> >> [2012-12-11 00:01:26,563] WARN EndOfStreamException: Unable to read
> >> additional data from client sessionid 0x13b8856456a0004, likely
> >> client has closed socket
> >> (org.apache.zookeeper.server.NIOServerCnxn:634)
> >> [2012-12-11 00:01:30,661] ERROR [TopicChangeListener on Controller 1]:
> >> Error while handling new topic
> >> (kafka.controller.PartitionStateMachine$TopicChangeListener:102)
> >> java.lang.NullPointerException
> >> at
> >> scala.collection.JavaConversions$JListWrapper.iterator(JavaConversion
> >> s.scala:524) at
> >> scala.collection.IterableLike$class.foreach(IterableLike.scala:79)
> >>  at
> >> scala.collection.JavaConversions$JListWrapper.foreach(JavaConversions
> >> .scala:521)
> >> at
> >> scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala
> >> :176)
> >>  at
> >> scala.collection.JavaConversions$JListWrapper.foldLeft(JavaConversion
> >> s.scala:521)
> >> at
> >> scala.collection.TraversableOnce$class.$div$colon(TraversableOnce.sca
> >> la:139)
> >>  at
> >> scala.collection.JavaConversions$JListWrapper.$div$colon(JavaConversi
> >> ons.scala:521) at
> >> scala.collection.generic.Addable$class.$plus$plus(Addable.scala:54)
> >>  at scala.collection.immutable.Set$EmptySet$.$plus$plus(Set.scala:47)
> >> at
> >> scala.collection.TraversableOnce$class.toSet(TraversableOnce.scala:43
> >> 6)
> >>  at
> >> scala.collection.JavaConversions$JListWrapper.toSet(JavaConversions.s
> >> cala:521)
> >> at
> >> kafka.controller.PartitionStateMachine$TopicChangeListener.liftedTree
> >> 1$1(PartitionStateMachine.scala:337)
> >>  at
> >> kafka.controller.PartitionStateMachine$TopicChangeListener.handleChil
> >> dChange(PartitionStateMachine.scala:335)
> >> at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:570)
> >>  at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> >> Disconnected from the target VM, address: '127.0.0.1:64026', transport:
> >> 'socket'
> >>
> >>
> >>
> >>
> >> On Mon, Dec 10, 2012 at 11:54 PM, Neha Narkhede <
> neha.narkh...@gmail.com>wrote:
> >>
> >>> You can take a look at one of the producer tests and attach
> >>> breakpoints in the code. Ensure you pick the Debug Test instead of
> >>> Run Test option.
> >>>
> >>> Thanks,
> >>> Neha
> >>>
> >>> On Mon, Dec 10, 2012 at 7:31 PM, S Ahmed  wrote:
> >>> > Hi,
> >>> >
> >>> > So I followed the instructions from here:
> >>> > https://cwiki.apache.org/confluence/display/KAFKA/Developer+Setup

Re: first steps with the codebase

2012-12-11 Thread S Ahmed
help anyone? :)

Much much appreciated!


On Tue, Dec 11, 2012 at 12:03 AM, S Ahmed  wrote:

> BTW, where exactly will the broker be writing these messages?  Is it in a
> /tmp folder?
>
>
> On Tue, Dec 11, 2012 at 12:02 AM, S Ahmed  wrote:
>
>> Neha,
>>
>> But what do I need to start before running the tests, I tried to run the
>> test "testAsyncSendCanCorrectlyFailWithTimeout" but I got this:
>>
>> 2012-12-11 00:01:08,974] WARN EndOfStreamException: Unable to read
>> additional data from client sessionid 0x13b8856456a0002, likely client has
>> closed socket (org.apache.zookeeper.server.NIOServerCnxn:634)
>> [2012-12-11 00:01:11,231] WARN EndOfStreamException: Unable to read
>> additional data from client sessionid 0x13b8856456a0001, likely client has
>> closed socket (org.apache.zookeeper.server.NIOServerCnxn:634)
>> [2012-12-11 00:01:26,561] WARN EndOfStreamException: Unable to read
>> additional data from client sessionid 0x13b8856456a0003, likely client has
>> closed socket (org.apache.zookeeper.server.NIOServerCnxn:634)
>> [2012-12-11 00:01:26,563] WARN EndOfStreamException: Unable to read
>> additional data from client sessionid 0x13b8856456a0004, likely client has
>> closed socket (org.apache.zookeeper.server.NIOServerCnxn:634)
>> [2012-12-11 00:01:30,661] ERROR [TopicChangeListener on Controller 1]:
>> Error while handling new topic
>> (kafka.controller.PartitionStateMachine$TopicChangeListener:102)
>> java.lang.NullPointerException
>> at
>> scala.collection.JavaConversions$JListWrapper.iterator(JavaConversions.scala:524)
>> at scala.collection.IterableLike$class.foreach(IterableLike.scala:79)
>>  at
>> scala.collection.JavaConversions$JListWrapper.foreach(JavaConversions.scala:521)
>> at
>> scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:176)
>>  at
>> scala.collection.JavaConversions$JListWrapper.foldLeft(JavaConversions.scala:521)
>> at
>> scala.collection.TraversableOnce$class.$div$colon(TraversableOnce.scala:139)
>>  at
>> scala.collection.JavaConversions$JListWrapper.$div$colon(JavaConversions.scala:521)
>> at scala.collection.generic.Addable$class.$plus$plus(Addable.scala:54)
>>  at scala.collection.immutable.Set$EmptySet$.$plus$plus(Set.scala:47)
>> at scala.collection.TraversableOnce$class.toSet(TraversableOnce.scala:436)
>>  at
>> scala.collection.JavaConversions$JListWrapper.toSet(JavaConversions.scala:521)
>> at
>> kafka.controller.PartitionStateMachine$TopicChangeListener.liftedTree1$1(PartitionStateMachine.scala:337)
>>  at
>> kafka.controller.PartitionStateMachine$TopicChangeListener.handleChildChange(PartitionStateMachine.scala:335)
>> at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:570)
>>  at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
>> Disconnected from the target VM, address: '127.0.0.1:64026', transport:
>> 'socket'
>>
>>
>>
>>
>> On Mon, Dec 10, 2012 at 11:54 PM, Neha Narkhede 
>> wrote:
>>
>>> You can take a look at one of the producer tests and attach
>>> breakpoints in the code. Ensure you pick the Debug Test instead of Run
>>> Test option.
>>>
>>> Thanks,
>>> Neha
>>>
>>> On Mon, Dec 10, 2012 at 7:31 PM, S Ahmed  wrote:
>>> > Hi,
>>> >
>>> > So I followed the instructions from here:
>>> > https://cwiki.apache.org/confluence/display/KAFKA/Developer+Setup
>>> >
>>> > So I pulled down the latest from github, ran;
>>> > sbt
>>> >> update
>>> >>idea
>>> >
>>> > open it up in idea, and builds fine in idea also (version 12).
>>> >
>>> > Everything is fine so far.
>>> >
>>> > Questions:
>>> >
>>> > From just using the IDE, how can I start the neccessary services so I
>>> can
>>> > debug a producer call so I can trace the code line by line as it
>>> executes
>>> > to create a message, and then set a breakpoint on the kafka server
>>> side of
>>> > things to see how it goes about processing an inbound message.
>>> >
>>> > Is this possible, or is the general workflow first starting the
>>> services
>>> > using some .sh scripts?
>>> >
>>> > My goal here is to be able to set breakpoints on both the producer and
>>> > broker side of things.
>>> >
>>> > Much appreciated!
>>>
>>
>>
>


Re: first steps with the codebase

2012-12-10 Thread S Ahmed
BTW, where exactly will the broker be writing these messages?  Is it in a
/tmp folder?


On Tue, Dec 11, 2012 at 12:02 AM, S Ahmed  wrote:

> Neha,
>
> But what do I need to start before running the tests, I tried to run the
> test "testAsyncSendCanCorrectlyFailWithTimeout" but I got this:
>
> 2012-12-11 00:01:08,974] WARN EndOfStreamException: Unable to read
> additional data from client sessionid 0x13b8856456a0002, likely client has
> closed socket (org.apache.zookeeper.server.NIOServerCnxn:634)
> [2012-12-11 00:01:11,231] WARN EndOfStreamException: Unable to read
> additional data from client sessionid 0x13b8856456a0001, likely client has
> closed socket (org.apache.zookeeper.server.NIOServerCnxn:634)
> [2012-12-11 00:01:26,561] WARN EndOfStreamException: Unable to read
> additional data from client sessionid 0x13b8856456a0003, likely client has
> closed socket (org.apache.zookeeper.server.NIOServerCnxn:634)
> [2012-12-11 00:01:26,563] WARN EndOfStreamException: Unable to read
> additional data from client sessionid 0x13b8856456a0004, likely client has
> closed socket (org.apache.zookeeper.server.NIOServerCnxn:634)
> [2012-12-11 00:01:30,661] ERROR [TopicChangeListener on Controller 1]:
> Error while handling new topic
> (kafka.controller.PartitionStateMachine$TopicChangeListener:102)
> java.lang.NullPointerException
> at
> scala.collection.JavaConversions$JListWrapper.iterator(JavaConversions.scala:524)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:79)
>  at
> scala.collection.JavaConversions$JListWrapper.foreach(JavaConversions.scala:521)
> at
> scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:176)
>  at
> scala.collection.JavaConversions$JListWrapper.foldLeft(JavaConversions.scala:521)
> at
> scala.collection.TraversableOnce$class.$div$colon(TraversableOnce.scala:139)
>  at
> scala.collection.JavaConversions$JListWrapper.$div$colon(JavaConversions.scala:521)
> at scala.collection.generic.Addable$class.$plus$plus(Addable.scala:54)
>  at scala.collection.immutable.Set$EmptySet$.$plus$plus(Set.scala:47)
> at scala.collection.TraversableOnce$class.toSet(TraversableOnce.scala:436)
>  at
> scala.collection.JavaConversions$JListWrapper.toSet(JavaConversions.scala:521)
> at
> kafka.controller.PartitionStateMachine$TopicChangeListener.liftedTree1$1(PartitionStateMachine.scala:337)
>  at
> kafka.controller.PartitionStateMachine$TopicChangeListener.handleChildChange(PartitionStateMachine.scala:335)
> at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:570)
>  at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> Disconnected from the target VM, address: '127.0.0.1:64026', transport:
> 'socket'
>
>
>
>
> On Mon, Dec 10, 2012 at 11:54 PM, Neha Narkhede 
> wrote:
>
>> You can take a look at one of the producer tests and attach
>> breakpoints in the code. Ensure you pick the Debug Test instead of Run
>> Test option.
>>
>> Thanks,
>> Neha
>>
>> On Mon, Dec 10, 2012 at 7:31 PM, S Ahmed  wrote:
>> > Hi,
>> >
>> > So I followed the instructions from here:
>> > https://cwiki.apache.org/confluence/display/KAFKA/Developer+Setup
>> >
>> > So I pulled down the latest from github, ran;
>> > sbt
>> >> update
>> >>idea
>> >
>> > open it up in idea, and builds fine in idea also (version 12).
>> >
>> > Everything is fine so far.
>> >
>> > Questions:
>> >
>> > From just using the IDE, how can I start the neccessary services so I
>> can
>> > debug a producer call so I can trace the code line by line as it
>> executes
>> > to create a message, and then set a breakpoint on the kafka server side
>> of
>> > things to see how it goes about processing an inbound message.
>> >
>> > Is this possible, or is the general workflow first starting the services
>> > using some .sh scripts?
>> >
>> > My goal here is to be able to set breakpoints on both the producer and
>> > broker side of things.
>> >
>> > Much appreciated!
>>
>
>


Re: first steps with the codebase

2012-12-10 Thread S Ahmed
Neha,

But what do I need to start before running the tests?  I tried to run the
test "testAsyncSendCanCorrectlyFailWithTimeout" but I got this:

2012-12-11 00:01:08,974] WARN EndOfStreamException: Unable to read
additional data from client sessionid 0x13b8856456a0002, likely client has
closed socket (org.apache.zookeeper.server.NIOServerCnxn:634)
[2012-12-11 00:01:11,231] WARN EndOfStreamException: Unable to read
additional data from client sessionid 0x13b8856456a0001, likely client has
closed socket (org.apache.zookeeper.server.NIOServerCnxn:634)
[2012-12-11 00:01:26,561] WARN EndOfStreamException: Unable to read
additional data from client sessionid 0x13b8856456a0003, likely client has
closed socket (org.apache.zookeeper.server.NIOServerCnxn:634)
[2012-12-11 00:01:26,563] WARN EndOfStreamException: Unable to read
additional data from client sessionid 0x13b8856456a0004, likely client has
closed socket (org.apache.zookeeper.server.NIOServerCnxn:634)
[2012-12-11 00:01:30,661] ERROR [TopicChangeListener on Controller 1]:
Error while handling new topic
(kafka.controller.PartitionStateMachine$TopicChangeListener:102)
java.lang.NullPointerException
at
scala.collection.JavaConversions$JListWrapper.iterator(JavaConversions.scala:524)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:79)
 at
scala.collection.JavaConversions$JListWrapper.foreach(JavaConversions.scala:521)
at
scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:176)
 at
scala.collection.JavaConversions$JListWrapper.foldLeft(JavaConversions.scala:521)
at
scala.collection.TraversableOnce$class.$div$colon(TraversableOnce.scala:139)
 at
scala.collection.JavaConversions$JListWrapper.$div$colon(JavaConversions.scala:521)
at scala.collection.generic.Addable$class.$plus$plus(Addable.scala:54)
 at scala.collection.immutable.Set$EmptySet$.$plus$plus(Set.scala:47)
at scala.collection.TraversableOnce$class.toSet(TraversableOnce.scala:436)
 at
scala.collection.JavaConversions$JListWrapper.toSet(JavaConversions.scala:521)
at
kafka.controller.PartitionStateMachine$TopicChangeListener.liftedTree1$1(PartitionStateMachine.scala:337)
 at
kafka.controller.PartitionStateMachine$TopicChangeListener.handleChildChange(PartitionStateMachine.scala:335)
at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:570)
 at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
Disconnected from the target VM, address: '127.0.0.1:64026', transport:
'socket'




On Mon, Dec 10, 2012 at 11:54 PM, Neha Narkhede wrote:

> You can take a look at one of the producer tests and attach
> breakpoints in the code. Ensure you pick the Debug Test instead of Run
> Test option.
>
> Thanks,
> Neha
>
> On Mon, Dec 10, 2012 at 7:31 PM, S Ahmed  wrote:
> > Hi,
> >
> > So I followed the instructions from here:
> > https://cwiki.apache.org/confluence/display/KAFKA/Developer+Setup
> >
> > So I pulled down the latest from github, ran;
> > sbt
> >> update
> >>idea
> >
> > open it up in idea, and builds fine in idea also (version 12).
> >
> > Everything is fine so far.
> >
> > Questions:
> >
> > From just using the IDE, how can I start the neccessary services so I can
> > debug a producer call so I can trace the code line by line as it executes
> > to create a message, and then set a breakpoint on the kafka server side
> of
> > things to see how it goes about processing an inbound message.
> >
> > Is this possible, or is the general workflow first starting the services
> > using some .sh scripts?
> >
> > My goal here is to be able to set breakpoints on both the producer and
> > broker side of things.
> >
> > Much appreciated!
>


Re: tracking page views at linkedin

2012-12-10 Thread S Ahmed
Ok, just looking at the code, it seems like you could even create a new
implementation and somehow roll up the page views (if that is
possible in the use case) before sending them over the wire.

e.g. maybe you can just increment the counter to 2 instead of sending 2 line
items.

The key is to also figure out what size or time to queue before pushing
them to Kafka.  For something like a page view, plus other request
information like browser, timestamp, and querystring values, you could
probably buffer a few hundred?
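
The size-or-time knobs are the async producer's batching settings. A sketch of what that configuration might look like, using the 0.7-era property names (queue.time, queue.size, batch.size) as assumptions with made-up values; 0.8 renamed some of these, so check the config docs for the version in use.

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.ProducerConfig;

public class AsyncTrackingProducer {
    public static Producer<String, String> create() {
        Properties props = new Properties();
        props.put("zk.connect", "127.0.0.1:2181");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("producer.type", "async");  // buffer events client-side, send in batches
        props.put("queue.time", "5000");      // flush at least every 5 s (assumed 0.7 name)
        props.put("queue.size", "10000");     // max events held before the producer blocks/drops
        props.put("batch.size", "200");       // events per request over the wire (assumed 0.7 name)
        return new Producer<String, String>(new ProducerConfig(props));
    }
}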


On Sun, Dec 9, 2012 at 6:21 PM, Jay Kreps  wrote:

> Yes this is how it works. We do not log out to disk on the web service
> machines, rather we use the async setting in the kafka producer from the
> app and it directly sends all tracking and monitoring data to the kafka
> cluster.
>
>
> On Sun, Dec 9, 2012 at 12:47 PM, S Ahmed  wrote:
>
> > I was reading (or watching) how linkedin uses kafka to track page views.
> >
> > I'm trying to imagine this in practise, where linkedin probably has 100's
> > of web servers serving requests, and each server is making a put call to
> > kafka to track a single page view.
> >
> > is this really the case?  Or does some other service roll up the web
> > servers log files and then push it to kafka on a batch basis?
> >
> > Interesting stuff!
> >
>


Re: steps to mitigate hardware failure in a replicate

2012-11-30 Thread S Ahmed
Yes, I'm referring to increasing the # of replicas (new broker id).

Thanks, I'll look into that.


On Fri, Nov 30, 2012 at 2:59 PM, Joel Koshy  wrote:

> Node replacement does not require anything special - i.e., as long as it
> has the same broker id, just bringing it back into the cluster would
> suffice, and it should catch up on partitions that are assigned to it. Not
> sure what you mean by "increase a replica" - or you mean increase the
> number of replicas for a partition. That will require an admin tool - (see
> KAFKA-347).
>
> Thanks,
>
> Joel
>
>
>
> On Fri, Nov 30, 2012 at 11:22 AM, S Ahmed  wrote:
>
> > Hello,
> >
> > I am watching the video of your meetup, where Jun is going over the new
> .8
> > replica feature.
> >
> > Since the replica's for a master broker are in zookeeper, when there is a
> > hardware crash it will somehow (?) notice it is down and then ignore that
> > node.  When it comes back up, it will first get up to speed and then
> become
> > an active replica.
> >
> > If we replace the node, or increase a replica, do we have to restart all
> > brockers or simply update zooker?
> >
>


Re: consumer read process

2012-11-29 Thread S Ahmed
testing new list


On Wed, Nov 28, 2012 at 10:44 AM, Jun Rao  wrote:

> You can find the information at
> http://incubator.apache.org/kafka/design.html
>
> Look for consumer registration algorithm and consumer rebalancing
> algorithm.
>
> Thanks,
>
> Jun
>
>
>
> On Wed, Nov 28, 2012 at 7:13 AM, S Ahmed  wrote:
>
> > Can someone go over how a consumer goes about reading from a broker?
> >
> > example:
> >
> > 1. connect to zookeeper, get information on the broker that corresponds
> to
> > the topic/partition
> >
> > Then what does it do, does it register which message_id it is picking up,
> > or a group of them?
> >
>


Re: consumer read process

2012-11-28 Thread S Ahmed
If you read from offset x last, what information can you get regarding how
many messages are left to process?


On Wed, Nov 28, 2012 at 10:13 AM, S Ahmed  wrote:

> Can someone go over how a consumer goes about reading from a broker?
>
> example:
>
> 1. connect to zookeeper, get information on the broker that corresponds to
> the topic/partition
>
> Then what does it do, does it register which message_id it is picking up,
> or a group of them?
>