Most sites wouldn't be able to pre-aggregate--you need to report things
like uniques that require tracking and storing line items.

We use 30 seconds or 200 events as our threshold, though there is nothing
particularly good or bad about those settings.
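That kind of flush-on-count-or-age policy can be sketched in a few lines. This is a simplified illustration in plain Python, not the actual Kafka async producer internals; the `sink` callable and the class itself are hypothetical names for the example:

```python
import time

class BatchingBuffer:
    """Buffer events and flush when either a count or an age threshold is hit."""

    def __init__(self, sink, max_events=200, max_age_secs=30, clock=time.monotonic):
        self.sink = sink              # callable that ships a list of events
        self.max_events = max_events
        self.max_age_secs = max_age_secs
        self.clock = clock            # injectable so tests can fake time
        self.events = []
        self.oldest = None            # timestamp of the first buffered event

    def add(self, event):
        if not self.events:
            self.oldest = self.clock()
        self.events.append(event)
        self.maybe_flush()

    def maybe_flush(self):
        if not self.events:
            return
        too_many = len(self.events) >= self.max_events
        too_old = self.clock() - self.oldest >= self.max_age_secs
        if too_many or too_old:
            self.sink(self.events)
            self.events = []
            self.oldest = None


# usage: collect flushed batches in a list instead of sending over the wire
batches = []
buf = BatchingBuffer(batches.append, max_events=3, max_age_secs=30)
for i in range(7):
    buf.add({"page": "/home", "n": i})
# two full batches of 3 are flushed; the 7th event stays buffered
# until the next add() or until it ages past 30 seconds
```

In a real deployment the age check would also run on a timer so a quiet buffer still drains, which is what the producer's async queue handles for you.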

-Jay


On Mon, Dec 10, 2012 at 10:51 AM, S Ahmed <sahmed1...@gmail.com> wrote:

> Ok, just looking at the code, it seems like you could even create a new
> implementation and roll up the page views (if that is possible in the use
> case) before sending them over the wire.
>
> e.g. maybe you can just increment the counter to 2 instead of sending 2
> line items.
>
> The key is also to figure out what size or time threshold to queue up
> before pushing to Kafka.  For something like a page view, plus other
> request information like browser, timestamp, and querystring values, you
> could probably store a few hundred?
>
>
> On Sun, Dec 9, 2012 at 6:21 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
> > Yes this is how it works. We do not log out to disk on the web service
> > machines, rather we use the async setting in the kafka producer from the
> > app and it directly sends all tracking and monitoring data to the kafka
> > cluster.
> >
> >
> > On Sun, Dec 9, 2012 at 12:47 PM, S Ahmed <sahmed1...@gmail.com> wrote:
> >
> > > I was reading (or watching) how LinkedIn uses Kafka to track page
> > > views.
> > >
> > > I'm trying to imagine this in practice, where LinkedIn probably has
> > > hundreds of web servers serving requests, and each server is making a
> > > put call to Kafka to track a single page view.
> > >
> > > Is this really the case?  Or does some other service roll up the web
> > > servers' log files and then push them to Kafka on a batch basis?
> > >
> > > Interesting stuff!
> > >
> >
>