If my thinking is correct then for some scenarios or use cases, the MRU for
GlobalKTable might also work as a local store of data since the MRU will
always store the data required by that application instance.

On Sun, May 17, 2020 at 9:42 AM Pushkar Deole <pdeole2...@gmail.com> wrote:

> Matthias,
>
> I would like to provide a suggestion here. Please check if this can be
> converted into a KIP. Since GlobalKTable holds complete topic data, and
> when the store underneath is in-memory store then the data in memory can
> quickly grow to a large value. I think it would be good if while using
> GlobalKTable with in-memory store, the memory limit (or no. of events) can
> also be specified in which case the GlobalKTable will hold only that much
> data in memory and rest of the data will be fetched from topic.
> On top of it, the GlobalKTable can also be converted into most recently
> used cache so whatever memory size is allocated to the table, it will
> always hold the MRU on that cache.
>
> On Thu, May 14, 2020 at 11:49 PM Matthias J. Sax <mj...@apache.org> wrote:
>
>> Yeah, the current API doesn't make it very clear how to do it. You can
>> set an in-memory like this:
>>
>> > builder.globalTable("topic",
>> Materialized.as(Stores.inMemoryKeyValueStore("store-name")));
>>
>>
>> We are already working on an improved API via KIP-591:
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-591%3A+Add+Kafka+Streams+config+to+set+default+store+type
>>
>>
>>
>> -Matthias
>>
>>
>> On 5/13/20 3:40 AM, Pushkar Deole wrote:
>> > Matthias,
>> >
>> > For GlobalKTable, I am looking at the APIs provided by StreamsBuilder
>> and I
>> > don't see any option to mention in-memory store there: all these API
>> > documentation states that  The resulting GlobalKTable
>> > <
>> https://kafka.apache.org/23/javadoc/org/apache/kafka/streams/kstream/GlobalKTable.html
>> >
>> > will
>> > be materialized in a local KeyValueStore
>> > <
>> https://kafka.apache.org/23/javadoc/org/apache/kafka/streams/state/KeyValueStore.html
>> >
>> > with
>> > an internal store name . It doesn't give an option whether in-memory or
>> > backed by DB
>> >
>> > globalTable
>> > <
>> https://kafka.apache.org/23/javadoc/org/apache/kafka/streams/StreamsBuilder.html#globalTable-java.lang.String-
>> >
>> > (String
>> > <
>> https://docs.oracle.com/javase/8/docs/api/java/lang/String.html?is-external=true
>> >
>> >  topic)
>> > globalTable
>> > <
>> https://kafka.apache.org/23/javadoc/org/apache/kafka/streams/StreamsBuilder.html#globalTable-java.lang.String-org.apache.kafka.streams.kstream.Consumed-org.apache.kafka.streams.kstream.Materialized-
>> >
>> > (String
>> > <
>> https://docs.oracle.com/javase/8/docs/api/java/lang/String.html?is-external=true
>> >
>> >  topic, Consumed
>> > <
>> https://kafka.apache.org/23/javadoc/org/apache/kafka/streams/kstream/Consumed.html
>> >
>> > <K,V> consumed, Materialized
>> > <
>> https://kafka.apache.org/23/javadoc/org/apache/kafka/streams/kstream/Materialized.html
>> >
>> > <K,V,KeyValueStore
>> > <
>> https://kafka.apache.org/23/javadoc/org/apache/kafka/streams/state/KeyValueStore.html
>> >
>> > <org.apache.kafka.common.utils.Bytes,byte[]>> materialized)
>> >
>> > On Tue, May 12, 2020 at 11:07 PM Matthias J. Sax <mj...@apache.org>
>> wrote:
>> >
>> >> By default, RocksDB is used. You can also change it to use an in-memory
>> >> store that is basically a HashMap.
>> >>
>> >>
>> >> -Matthias
>> >>
>> >> On 5/12/20 10:16 AM, Pushkar Deole wrote:
>> >>> Thanks Liam!
>> >>>
>> >>> On Tue, May 12, 2020, 15:12 Liam Clarke-Hutchinson <
>> >>> liam.cla...@adscale.co.nz> wrote:
>> >>>
>> >>>> Hi Pushkar,
>> >>>>
>> >>>> GlobalKTables and KTables can have whatever data structure you like,
>> if
>> >> you
>> >>>> provide the appropriate deserializers - for example, an Kafka Streams
>> >> app I
>> >>>> maintain stores model data (exported to a topic per entity from
>> Postgres
>> >>>> via Kafka Connect's JDBC Source) as a GlobalKTable of Jackson
>> >> ObjectNode's
>> >>>> keyed by entity id
>> >>>>
>> >>>> If you're worried about efficiency, just treat KTables/GlobalKTables
>> as
>> >> a
>> >>>> HashMap<K, V> to and you're pretty much there. In terms of
>> efficiency,
>> >>>> we're joining model  data to about 7 - 10 TB of transactional data a
>> >> day,
>> >>>> and on average, run about 5 - 10 instances of our enrichment app with
>> >> about
>> >>>> 2GB max heap.
>> >>>>
>> >>>> Kind regards,
>> >>>>
>> >>>> Liam "Not a part of the Confluent team, but happy to help"
>> >>>> Clarke-Hutchinson
>> >>>>
>> >>>> On Tue, May 12, 2020 at 9:35 PM Pushkar Deole <pdeole2...@gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>>> Hello confluent team,
>> >>>>>
>> >>>>> Could you provide some information on what data structures are used
>> >>>>> internally by GlobalKTable and KTables. The application that I am
>> >> working
>> >>>>> on has a requirement to read cached data from GlobalKTable on every
>> >>>>> incoming event, so the reads from GlobalKTable need to be efficient.
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >>
>> >
>>
>>

Reply via email to