Re: Unique users per calendar month using kafka streams

2019-11-21 Thread claude.war...@wipro.com.INVALID
A different approach would be to integrate Apache DataSketches
(https://datasketches.apache.org/), which have mathematical proofs behind them.
Using a DataSketch you can capture the unique users for any given time period in
a very small data object, and those objects can be merged even though unique
counts are not in and of themselves aggregatable.  For example, you could take
the monthly sketches and calculate the unique users per quarter or for the
entire year very quickly, generally orders of magnitude faster than recounting.
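
For illustration, a minimal sketch using theta sketches from the
datasketches-java library (artifact org.apache.datasketches:datasketches-java;
exact method names vary slightly between versions, so treat this as an outline
rather than a definitive implementation):

import org.apache.datasketches.theta.SetOperation;
import org.apache.datasketches.theta.UpdateSketch;
import org.apache.datasketches.theta.Union;

// One sketch per month: feed it every user ID seen in that month.
UpdateSketch january = UpdateSketch.builder().build();
january.update("alice");
january.update("bob");
january.update("alice");   // duplicates are absorbed, not double counted

UpdateSketch february = UpdateSketch.builder().build();
february.update("bob");
february.update("carol");

// Unique users for the quarter = union of the monthly sketches.
Union quarter = SetOperation.builder().buildUnion();
quarter.union(january);
quarter.union(february);
System.out.println(quarter.getResult().getEstimate());   // ~3.0

Each monthly sketch serializes to a few kilobytes at most, which is what makes
storing one per period practical.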


From: Bruno Cadonna 
Sent: Thursday, November 21, 2019 11:37
To: Users 
Subject: Re: Unique users per calendar month using kafka streams


Hi Chintan,

You cannot specify time windows based on calendar units such as months.

In the following, I assume the keys of your records are user IDs. You
could extract the month from the timestamp of each event and append it
to the record's key. Then you can group the records by key and count
them. Be aware that the state that stores the counts will grow
indefinitely, so you need to take care to remove counts you no longer
need from your local state.

Take a look at the following example of how to deduplicate records

https://github.com/confluentinc/kafka-streams-examples/blob/5.3.1-post/src/test/java/io/confluent/examples/streams/EventDeduplicationLambdaIntegrationTest.java

It shows how to avoid indefinite growth of the local store in such
cases. Try to adapt it to your problem by extending the key with the
month and computing a count instead of looking for duplicates.
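
A minimal sketch of that idea (assuming String user IDs as keys and, purely
for illustration, event timestamps in epoch milliseconds as values; the topic
name is made up):

import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

StreamsBuilder builder = new StreamsBuilder();

// Assumption: key = user ID, value = event timestamp in epoch millis.
KStream<String, Long> events =
    builder.stream("events", Consumed.with(Serdes.String(), Serdes.Long()));

// Extend the key with the calendar month, e.g. "2019-11|alice", then count.
KTable<String, Long> eventsPerUserAndMonth = events
    .selectKey((userId, tsMillis) ->
        YearMonth.from(Instant.ofEpochMilli(tsMillis).atZone(ZoneOffset.UTC))
            + "|" + userId)
    .groupByKey(Grouped.with(Serdes.String(), Serdes.Long()))
    .count();

The number of unique users in a month is then the number of distinct keys with
that month prefix; the deduplication example linked above shows how to keep
such state from growing without bound.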

Best,
Bruno

On Thu, Nov 21, 2019 at 10:28 AM chintan mavawala
 wrote:
>
> Hi,
>
> We have a use case to capture number of unique users per month. We planned
> to use windowing concept for this.
>
> For example, group events from the input topic by user name and later sub-group
> them based on a time window. However, I don't see how I can sub-group the
> results by a particular calendar month, say January; the only option I see is
> grouping by a fixed time window.
>
> Any pointers would be appreciated.
>
> Regards,
> Chintan


Re: RecordTooLargeException on 16M messages in Kafka?

2019-08-15 Thread claude.war...@wipro.com.INVALID
I had a similar problem before.  I could find no way for the producer to 
determine the smallest maximum buffer size of the servers.

We had an issue where we wanted to send very large items through Kafka,
basically to tunnel through a firewall.

We used an open source project of mine that allows a logical buffer to span
multiple smaller buffers [1][2].

We were then able to split the large file into multiple messages and reassemble
them on the remote side, reading from Kafka only when the data in a specific
buffer was needed.

We still do not have a way to determine, from the producer, what the smallest
maximum buffer size of the servers is, so we just picked a suitably small
number to make it work.

Claude

[1] https://github.com/Claudenw/spanbuffer
[2] Kafka-specific code is not included in the spanbuffer project, but it is
fairly simple to implement; a sketch of the idea follows below. I have plans to
add it in the future.
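
For illustration, a minimal sketch of the split/reassemble idea, independent of
the spanbuffer project (all names here are made up, and a real implementation
also needs reassembly and error handling on the consumer side):

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

static void sendInChunks(Producer<String, byte[]> producer, String topic,
                         String key, byte[] payload, int chunkSize) {
    int totalChunks = (payload.length + chunkSize - 1) / chunkSize;
    for (int i = 0; i < totalChunks; i++) {
        int from = i * chunkSize;
        int to = Math.min(from + chunkSize, payload.length);
        ProducerRecord<String, byte[]> record =
            new ProducerRecord<>(topic, key, Arrays.copyOfRange(payload, from, to));
        // The same key keeps all chunks in one partition, so they arrive in
        // order; the headers tell the consumer when it has all the pieces.
        record.headers().add("chunk-index",
            Integer.toString(i).getBytes(StandardCharsets.UTF_8));
        record.headers().add("chunk-count",
            Integer.toString(totalChunks).getBytes(StandardCharsets.UTF_8));
        producer.send(record);
    }
    producer.flush();
}

Here chunkSize plays the role of the "suitably small number": it must stay
below the smallest max.request.size / message.max.bytes along the path.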

From: l vic 
Sent: Thursday, August 15, 2019 13:48
To: users@kafka.apache.org 
Subject: Re: RecordTooLargeException on 16M messages in Kafka?


I tested it with kafka-console-consumer and kafka-console-producer reading
from a 16M text file (no newlines):

kafka-console-producer.sh --broker-list :6667   --topic test <
./large-file

The error comes out on producer side:

org.apache.kafka.common.errors.RecordTooLargeException: The message is
16777239 bytes when serialized which is larger than the maximum request
size you have configured with the max.request.size configuration.


On Thu, Aug 15, 2019 at 4:49 AM Jonathan Santilli <
jonathansanti...@gmail.com> wrote:

> Hello, try to send and flush just one message of 16777239 bytes, to verify
> the error still shows up.
>
> Cheers!
> --
> Jonathan
>
>
>
> On Thu, Aug 15, 2019 at 2:23 AM l vic  wrote:
>
> > My kafka (1.0.0) producer errors out on  large (16M) messages.
> > ERROR Error when sending message to topic test with key: null, value:
> > 16777239 bytes with error: (org.apache.kafka.clients.producer.internals.
> > ErrorLoggingCallback)
> >
> > org.apache.kafka.common.errors.RecordTooLargeException: The message is
> > 16777327 bytes when serialized which is larger than the maximum request
> > size you have configured with the max.request.size configuration.
> > I found a couple of links describing the solution:
> > https://stackoverflow.com/questions/21020347/how-can-i-send-large-messages-with-kafka-over-15mb
> >
> > In my server.properties on the brokers I set:
> > socket.request.max.bytes=104857600
> > message.max.bytes=18874368
> > max.request.size=18874368
> > replica.fetch.max.bytes=18874368
> > fetch.message.max.bytes=18874368
> >
> > Then in my producer.properties I tried to set
> > max.request.size=18874368
> >
> > But no matter how large I try to set max.request.size,
> > I still have the same problem... Are there other settings I am missing?
> > Can it be solved in configuration alone, or do I need to make code
> > changes?
> > Thank you,
> >
>
>
> --
> Santilli Jonathan
>