Re: Benchmarks for KTable Joins and Queries

2018-11-04 Thread Tom Szumowski
Wow, thank you for the great details. They are very helpful.

We may try to run some benchmarks in the next few weeks. I'll report
back if we find anything notable.

Thanks again!

On Nov 4, 2018 12:53, "Matthias J. Sax"  wrote:

Thanks for the details.

I don't think there is an answer with numbers. You would need to benchmark
this yourself. However, some details:

 - If you query a local store using a key lookup (i.e., a point query), it
should have sub-millisecond latency at a high percentile. The read is
served from the Kafka Streams store cache, from RocksDB's in-memory
caches, or, in the worst case, from disk. Thus, you can tune it by
increasing the cache sizes, and it also depends on your storage hardware
(e.g., HDD vs. SSD). Because of those factors, it's hard to give a better
answer.

 - If you do local range queries, those are slower (I assume a 10x
difference). But again, you would need to benchmark to get more details,
as it depends on your hardware and configuration.

 - For querying remote stores: Kafka Streams only provides the basic
building blocks, and it's your application that implements the actual
routing and network communication. Thus, it depends mainly on your own
code. Metadata about all stores is exchanged in every rebalance, so for
point queries each instance does either a local lookup or one network hop
to the instance hosting the key. It's always one hop, whether you run 5,
10, or 100 instances (but it's your own code that implements this).

 - For general range queries, you might need to query all instances and
assemble the result because, due to hash partitioning, a key range can
be distributed over all instances.

 - For GlobalKTables/GlobalStores, you always have a full copy of the
state (it's not sharded as in regular KTables/stores), and thus you
always do a local point/range query. No need to query any remote store.
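
To make the "benchmark this yourself" advice a bit more concrete, here is a
minimal sketch of a latency harness that compares point lookups with range
scans and reports percentiles. Everything in it is an assumption for
illustration: `SortedStore`, `time_calls`, `percentile`, and `run_benchmark`
are hypothetical names, and the store is a plain in-memory Python structure
rather than a Kafka Streams or RocksDB store, so the absolute numbers say
nothing about a real deployment. Only the measuring pattern is meant to carry
over.

```python
import bisect
import math
import random
import time


def percentile(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


def time_calls(fn, args_list):
    """Call fn(*args) for each args tuple; return per-call latencies in microseconds."""
    latencies = []
    for args in args_list:
        start = time.perf_counter_ns()
        fn(*args)
        latencies.append((time.perf_counter_ns() - start) / 1_000)
    return latencies


class SortedStore:
    """Hypothetical stand-in for a local state store (NOT a real Kafka Streams store)."""

    def __init__(self, items):
        self._data = dict(items)
        self._keys = sorted(self._data)

    def get(self, key):
        # Point query: a single key lookup.
        return self._data.get(key)

    def range(self, lo, hi):
        # Range query: inclusive scan over the sorted keys in [lo, hi].
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_right(self._keys, hi)
        return [(k, self._data[k]) for k in self._keys[start:end]]


def run_benchmark(num_keys=100_000, num_queries=2_000, range_width=100, seed=7):
    """Measure p50/p99 latency for random point lookups and fixed-width range scans."""
    rng = random.Random(seed)
    store = SortedStore((k, k * 2) for k in range(num_keys))

    point_args = [(rng.randrange(num_keys),) for _ in range(num_queries)]
    range_args = []
    for _ in range(num_queries):
        lo = rng.randrange(num_keys - range_width)
        range_args.append((lo, lo + range_width - 1))

    point = time_calls(store.get, point_args)
    scans = time_calls(store.range, range_args)
    return {
        "point_p50_us": percentile(point, 50),
        "point_p99_us": percentile(point, 99),
        "range_p50_us": percentile(scans, 50),
        "range_p99_us": percentile(scans, 99),
    }


if __name__ == "__main__":
    for name, value in run_benchmark().items():
        print(f"{name}: {value:.1f}")
```

Against a real application, the same loop would wrap the actual store calls,
and the runs would be repeated with different cache sizes and on the target
storage hardware, since those are the factors named above.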


Hope this helps.

-Matthias


On 11/3/18 4:53 PM, Tom Szumowski wrote:
> Hi Matthias.
>
> My apologies for my ambiguous descriptions. I responded in-line to your
> questions below. Thank you for taking the time to understand my question.
>
> On Fri, Nov 2, 2018 at 1:28 PM Matthias J. Sax 
> wrote:
>
>>> At a high level, KTables provide a capability to query for data.
>>
>> Can you elaborate? Do you refer to "Interactive Queries" feature?
>>
>
> *Yes that's correct. I'm referring to Interactive queries as described here
> <https://kafka.apache.org/20/documentation/streams/developer-guide/interactive-queries.html>.
> This description
> <https://docs.confluent.io/current/streams/concepts.html#interactive-queries>
> covers
> several use cases for interactive queries including: real-time threat
> monitoring, video gaming, risk and fraud, and trend detection. Let's
> consider the video gaming use case for example. It states, "A mobile
> companion app can then directly query the Kafka Streams application to show
> the current location of a player to friends and family, and invite them to
> come along". In that scenario, one may be very interested in knowing the
> throughput and latency (independent of network delays) for executing that
> Interactive Query on the mobile companion app in order to get locations.*

>
>
>>
>>> And I imagine latency/throughput of KTable queries depend on the number of
>>> consumers the query would have to touch to complete.
>
>
>>
>> * Similar here. I am not sure if I can follow.*
>
>
> In that same article
> <https://kafka.apache.org/20/documentation/streams/developer-guide/interactive-queries.html>
> above,
> it refers to local state stores and remote state stores. I'm interested in
> understanding latencies when querying a remote state store
> <https://kafka.apache.org/20/documentation/streams/developer-guide/interactive-queries.html#querying-remote-state-stores-for-the-entire-app>.
> The diagram in that link shows three instances. What if there are 100
> instances? How do the latencies in a query change as the number of
> instances increases (if at all)? What kind of latencies in that
> configuration would we expect? And how would it compare to an alternate
> configuration that utilizes a GlobalKTable?
>
> *I hope that helps clarify.*
>
> *-Tom*

>
>
>>
>> -Matthias
>>
>> On 11/2/18 4:27 AM, Tom Szumowski wrote:
>>> Thank you for the clarification. I understand they are fundamentally
>>> different underneath than a relational database, and may not be fair to
>>> compare directly.
>>>
>>> But how about benchmarks that aren't a comparison with other databases?
>>>
>>> At a high level, KTables provide a capability to query for data. Suppose I
>>> have a requirement to fetch the data in less than X milliseconds. It seems
>>> fair to want to understand if a KTable can satisfy that capability under
>>> some configuration, or if I need to seek alternate solutions.
>>>
>>> And I imagine latency/throughput of KTable queries depend on the number of
>>> consumers the query would have to 

Re: Benchmarks for KTable Joins and Queries

2018-11-04 Thread Matthias J. Sax
Thanks for the details.

I don't think there is an answer with numbers. You would need to benchmark
this yourself. However, some details:

 - If you query a local store using a key lookup (i.e., a point query), it
should have sub-millisecond latency at a high percentile. The read is
served from the Kafka Streams store cache, from RocksDB's in-memory
caches, or, in the worst case, from disk. Thus, you can tune it by
increasing the cache sizes, and it also depends on your storage hardware
(e.g., HDD vs. SSD). Because of those factors, it's hard to give a better
answer.

 - If you do local range queries, those are slower (I assume a 10x
difference). But again, you would need to benchmark to get more details,
as it depends on your hardware and configuration.

 - For querying remote stores: Kafka Streams only provides the basic
building blocks, and it's your application that implements the actual
routing and network communication. Thus, it depends mainly on your own
code. Metadata about all stores is exchanged in every rebalance, so for
point queries each instance does either a local lookup or one network hop
to the instance hosting the key. It's always one hop, whether you run 5,
10, or 100 instances (but it's your own code that implements this).

 - For general range queries, you might need to query all instances and
assemble the result because, due to hash partitioning, a key range can
be distributed over all instances.

 - For GlobalKTables/GlobalStores, you always have a full copy of the
state (it's not sharded as in regular KTables/stores), and thus you
always do a local point/range query. No need to query any remote store.
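
The routing behaviour described in the points above can be sketched as a
small model. This is not Kafka Streams code: `Cluster`, `partition_for`, and
the round-robin partition assignment are hypothetical, and CRC32 stands in
for the real partitioner hash (Kafka's default partitioner uses murmur2). It
only illustrates that a point query needs at most one hop however many
instances there are, while a range query over hash-partitioned keys has to
ask every instance.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a string key to a partition using a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class Cluster:
    """Hypothetical model of N application instances, each holding one shard of a store."""

    def __init__(self, num_instances, num_partitions):
        self.num_partitions = num_partitions
        # Assumed round-robin assignment; in a real application this mapping
        # comes from the metadata exchanged during a rebalance.
        self.owner = {p: p % num_instances for p in range(num_partitions)}
        self.shards = [dict() for _ in range(num_instances)]

    def instance_for(self, key):
        """Return the index of the instance that hosts the given key."""
        return self.owner[partition_for(key, self.num_partitions)]

    def put(self, key, value):
        self.shards[self.instance_for(key)][key] = value

    def point_query(self, key, asked_instance):
        """Return (value, hops): 0 hops if the key is local, otherwise exactly 1."""
        target = self.instance_for(key)
        hops = 0 if target == asked_instance else 1
        return self.shards[target].get(key), hops

    def range_query(self, lo, hi, asked_instance):
        """Scatter-gather over all instances; return (sorted results, remote calls)."""
        results = {}
        remote_calls = 0
        for instance, shard in enumerate(self.shards):
            if instance != asked_instance:
                remote_calls += 1
            results.update({k: v for k, v in shard.items() if lo <= k <= hi})
        return dict(sorted(results.items())), remote_calls


if __name__ == "__main__":
    for n in (5, 10, 100):
        cluster = Cluster(num_instances=n, num_partitions=200)
        for i in range(500):
            cluster.put(f"player-{i:03d}", i)
        _, hops = cluster.point_query("player-042", asked_instance=0)
        _, calls = cluster.range_query("player-100", "player-109", asked_instance=0)
        print(f"{n} instances: point hops={hops}, range remote calls={calls}")
```

In this model the remote calls of a range query grow with the instance count.
An application could issue them concurrently, so latency would tend to follow
the slowest instance rather than the sum. A GlobalKTable-style setup
corresponds to every instance holding all keys, which makes both query types
local.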


Hope this helps.

-Matthias

On 11/3/18 4:53 PM, Tom Szumowski wrote:
> Hi Matthias.
> 
> My apologies for my ambiguous descriptions. I responded in-line to your
> questions below. Thank you for taking the time to understand my question.
> 
> On Fri, Nov 2, 2018 at 1:28 PM Matthias J. Sax 
> wrote:
> 
>>> At a high level, KTables provide a capability to query for data.
>>
>> Can you elaborate? Do you refer to "Interactive Queries" feature?
>>
> 
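> *Yes that's correct. I'm referring to Interactive queries as described here
> <https://kafka.apache.org/20/documentation/streams/developer-guide/interactive-queries.html>.
> This description
> <https://docs.confluent.io/current/streams/concepts.html#interactive-queries>
> covers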
> several use cases for interactive queries including: real-time threat
> monitoring, video gaming, risk and fraud, and trend detection. Let's
> consider the video gaming use case for example. It states, "A mobile
> companion app can then directly query the Kafka Streams application to show
> the current location of a player to friends and family, and invite them to
> come along". In that scenario, one may be very interested in knowing the
> throughput and latency (independent of network delays) for executing that
> Interactive Query on the mobile companion app in order to get locations.*
> 
> 
>>
>>> And I imagine latency/throughput of KTable queries depend on the number of
>>> consumers the query would have to touch to complete.
> 
> 
>>
>> * Similar here. I am not sure if I can follow.*
> 
> 
> In that same article
> <https://kafka.apache.org/20/documentation/streams/developer-guide/interactive-queries.html>
> above,
> it refers to local state stores and remote state stores. I'm interested in
> understanding latencies when querying a remote state store
> <https://kafka.apache.org/20/documentation/streams/developer-guide/interactive-queries.html#querying-remote-state-stores-for-the-entire-app>.
> The diagram in that link shows three instances. What if there are 100
> instances? How do the latencies in a query change as the number of
> instances increases (if at all)? What kind of latencies in that
> configuration would we expect? And how would it compare to an alternate
> configuration that utilizes a GlobalKTable?
> 
> *I hope that helps clarify.*
> 
> *-Tom*
> 
> 
>>
>> -Matthias
>>
>> On 11/2/18 4:27 AM, Tom Szumowski wrote:
>>> Thank you for the clarification. I understand they are fundamentally
>>> different underneath than a relational database, and may not be fair to
>>> compare directly.
>>>
>>> But how about benchmarks that aren't a comparison with other databases?
>>>
>>> At a high level, KTables provide a capability to query for data. Suppose I
>>> have a requirement to fetch the data in less than X milliseconds. It seems
>>> fair to want to understand if a KTable can satisfy that capability under
>>> some configuration, or if I need to seek alternate solutions.
>>>
>>> And I imagine latency/throughput of KTable queries depend on the number of
>>> consumers the query would have to touch to complete. For example, in
>>> complex joins or filters. Or perhaps a trade-off with GlobalKTables...
>>>
>>> That's the kind of information in terms of benchmarks I'd be interested in
>>> knowing exists or not.
>>>
>>> 

Re: Benchmarks for KTable Joins and Queries

2018-11-03 Thread Tom Szumowski
Hi Matthias.

My apologies for my ambiguous descriptions. I responded in-line to your
questions below. Thank you for taking the time to understand my question.

On Fri, Nov 2, 2018 at 1:28 PM Matthias J. Sax 
wrote:

> >> At a high level, KTables provide a capability to query for data.
>
> Can you elaborate? Do you refer to "Interactive Queries" feature?
>

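*Yes that's correct. I'm referring to Interactive queries as described here
<https://kafka.apache.org/20/documentation/streams/developer-guide/interactive-queries.html>.
This description
<https://docs.confluent.io/current/streams/concepts.html#interactive-queries>
covers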
several use cases for interactive queries including: real-time threat
monitoring, video gaming, risk and fraud, and trend detection. Let's
consider the video gaming use case for example. It states, "A mobile
companion app can then directly query the Kafka Streams application to show
the current location of a player to friends and family, and invite them to
come along". In that scenario, one may be very interested in knowing the
throughput and latency (independent of network delays) for executing that
Interactive Query on the mobile companion app in order to get locations.*


>
> >> And I imagine latency/throughput of KTable queries depend on the number of
> >> consumers the query would have to touch to complete.


>
> * Similar here. I am not sure if I can follow.*


In that same article
<https://kafka.apache.org/20/documentation/streams/developer-guide/interactive-queries.html>
above,
it refers to local state stores and remote state stores. I'm interested in
understanding latencies when querying a remote state store
<https://kafka.apache.org/20/documentation/streams/developer-guide/interactive-queries.html#querying-remote-state-stores-for-the-entire-app>.
The diagram in that link shows three instances. What if there are 100
instances? How do the latencies in a query change as the number of
instances increases (if at all)? What kind of latencies in that
configuration would we expect? And how would it compare to an alternate
configuration that utilizes a GlobalKTable?

*I hope that helps clarify.*

*-Tom*


>
> -Matthias
>
> On 11/2/18 4:27 AM, Tom Szumowski wrote:
> > Thank you for the clarification. I understand they are fundamentally
> > different underneath than a relational database, and may not be fair to
> > compare directly.
> >
> > But how about benchmarks that aren't a comparison with other databases?
> >
> > At a high level, KTables provide a capability to query for data. Suppose I
> > have a requirement to fetch the data in less than X milliseconds. It seems
> > fair to want to understand if a KTable can satisfy that capability under
> > some configuration, or if I need to seek alternate solutions.
> >
> > And I imagine latency/throughput of KTable queries depend on the number of
> > consumers the query would have to touch to complete. For example, in
> > complex joins or filters. Or perhaps a trade-off with GlobalKTables...
> >
> > That's the kind of information in terms of benchmarks I'd be interested in
> > knowing exists or not.
> >
> > Thank you,
> >
> > Tom
> >
> > On Thu, Nov 1, 2018, 16:24 Matthias J. Sax  >
> >> I am not aware of benchmarks, but want to point out that KTables work
> >> somewhat differently from a relational database system. Thus, you might want
> >> to evaluate based not on performance, but on the semantics KTables provide.
> >>
> >> Recall that Kafka Streams is a stream processing library, while a
> >> database system is a "batch processing" system. These are two quite different
> >> types of systems, and benchmarking them against each other is
> >> questionable.
> >>
> >>
> >> -Matthias
> >>
> >> On 10/31/18 12:18 PM, Tom Szumowski wrote:
> >>> I was wondering if anyone had or knew of benchmark tests for KTable or
> >>> GlobalKTable queries/joins, as compared to alternatives such as
> >>> distributed databases.
> >>>
> >>
> >>
> >
>
>


Re: Benchmarks for KTable Joins and Queries

2018-11-02 Thread Matthias J. Sax
>> At a high level, KTables provide a capability to query for data.

Can you elaborate? Do you refer to "Interactive Queries" feature?

>> And I imagine latency/throughput of KTable queries depend on the number of
>> consumers the query would have to touch to complete. 

Similar here. I am not sure if I can follow.


-Matthias

On 11/2/18 4:27 AM, Tom Szumowski wrote:
> Thank you for the clarification. I understand they are fundamentally
> different underneath than a relational database, and may not be fair to
> compare directly.
> 
> But how about benchmarks that aren't a comparison with other databases?
> 
> At a high level, KTables provide a capability to query for data. Suppose I
> have a requirement to fetch the data in less than X milliseconds. It seems
> fair to want to understand if a KTable can satisfy that capability under
> some configuration, or if I need to seek alternate solutions.
> 
> And I imagine latency/throughput of KTable queries depend on the number of
> consumers the query would have to touch to complete. For example, in
> complex joins or filters. Or perhaps a trade-off with GlobalKTables...
> 
> That's the kind of information in terms of benchmarks I'd be interested in
> knowing exists or not.
> 
> Thank you,
> 
> Tom
> 
> On Thu, Nov 1, 2018, 16:24 Matthias J. Sax  
>> I am not aware of benchmarks, but want to point out that KTables work
>> somewhat differently from a relational database system. Thus, you might want
>> to evaluate based not on performance, but on the semantics KTables provide.
>>
>> Recall that Kafka Streams is a stream processing library, while a
>> database system is a "batch processing" system. These are two quite different
>> types of systems, and benchmarking them against each other is
>> questionable.
>>
>>
>> -Matthias
>>
>> On 10/31/18 12:18 PM, Tom Szumowski wrote:
>>> I was wondering if anyone had or knew of benchmark tests for KTable or
>>> GlobalKTable queries/joins, as compared to alternatives such as
>>> distributed databases.
>>>
>>
>>
> 





Re: Benchmarks for KTable Joins and Queries

2018-11-02 Thread Tom Szumowski
Thank you for the clarification. I understand they are fundamentally
different underneath than a relational database, and may not be fair to
compare directly.

But how about benchmarks that aren't a comparison with other databases?

At a high level, KTables provide a capability to query for data. Suppose I
have a requirement to fetch the data in less than X milliseconds. It seems
fair to want to understand if a KTable can satisfy that capability under
some configuration, or if I need to seek alternate solutions.

And I imagine latency/throughput of KTable queries depend on the number of
consumers the query would have to touch to complete. For example, in
complex joins or filters. Or perhaps a trade-off with GlobalKTables...

That's the kind of information in terms of benchmarks I'd be interested in
knowing exists or not.

Thank you,

Tom

On Thu, Nov 1, 2018, 16:24 Matthias J. Sax wrote:

> I am not aware of benchmarks, but want to point out that KTables work
> somewhat differently from a relational database system. Thus, you might want
> to evaluate based not on performance, but on the semantics KTables provide.
>
> Recall that Kafka Streams is a stream processing library, while a
> database system is a "batch processing" system. These are two quite different
> types of systems, and benchmarking them against each other is
> questionable.
>
>
> -Matthias
>
> On 10/31/18 12:18 PM, Tom Szumowski wrote:
> > I was wondering if anyone had or knew of benchmark tests for KTable or
> > GlobalKTable queries/joins, as compared to alternatives such as
> > distributed databases.
> >
>
>


Re: Benchmarks for KTable Joins and Queries

2018-11-01 Thread Matthias J. Sax
I am not aware of benchmarks, but want to point out that KTables work
somewhat differently from a relational database system. Thus, you might want
to evaluate based not on performance, but on the semantics KTables provide.

Recall that Kafka Streams is a stream processing library, while a
database system is a "batch processing" system. These are two quite different
types of systems, and benchmarking them against each other is
questionable.


-Matthias

On 10/31/18 12:18 PM, Tom Szumowski wrote:
> I was wondering if anyone had or knew of benchmark tests for KTable or
> GlobalKTable queries/joins, as compared to alternatives such as distributed
> databases.
> 





Benchmarks for KTable Joins and Queries

2018-10-31 Thread Tom Szumowski
I was wondering if anyone had or knew of benchmark tests for KTable or
GlobalKTable queries/joins, as compared to alternatives such as distributed
databases.