Re: Why use ListView?

Rex Fenley Tue, 19 Jan 2021 11:13:20 -0800

Thanks!

On Tue, Jan 19, 2021 at 12:55 AM Timo Walther <twal...@apache.org> wrote:


> As always, this depends on the use case ;-)
>
> In general, you should not get a performance regression in using them.
> But keep in mind that ListViews/MapViews cannot be backed by a state
> backend in every operator, so sometimes they are represented as
> List/Maps on heap.
>
> Regards,
> Timo
>
>
> On 18.01.21 18:28, Rex Fenley wrote:
> > Fascinating, do you have an estimate of what qualifies as a lot of data
> > and therefore when this should be used?
> >
> > Thanks
> >
> > On Mon, Jan 18, 2021 at 12:35 AM Timo Walther <twal...@apache.org
> > <mailto:twal...@apache.org>> wrote:
> >
> >     Hi Rex,
> >
> >     ListView and MapView have been part of Flink for years. However, they
> >     were considered as an internal feature and therefore not well
> >     documented. MapView is used internally to make distinct aggregates
> work.
> >
> >     Because we reworked the type inference of aggregate functions, we
> also
> >     added basic documentation for power users.
> >
> >     By default an accumulator will be deserialized from state on every
> >     access. ListView and MapView are not deserialized entirely on access
> >     but
> >     delegate directly to a state backend. Thus, only the key that is
> >     accessed is deserialized. So if an accumulator stores a lot of data,
> it
> >     might be beneficial to use the mentioned abstractions.
> >
> >     Regards,
> >     Timo
> >
> >
> >     On 16.01.21 20:09, Rex Fenley wrote:
> >      > Hello,
> >      >
> >      > In the recent version of Flink docs I read the following [1]:
> >      >  > If an accumulator needs to store large amounts of data,
> >      > |org.apache.flink.table.api.dataview.ListView| and
> >      > |org.apache.flink.table.api.dataview.MapView| provide advanced
> >     features
> >      > for leveraging Flink’s state backends in unbounded data scenarios.
> >      > Please see the docs of the corresponding classes for more
> >     information
> >      > about this advanced feature.
> >      >
> >      > Our job has unbounded state from Debezium/Kafka, uses RocksDB,
> >     and we
> >      > have a number of Aggregators like the following, which group a
> >     set of
> >      > ids by some foreign key "group_id". The sets are usually 10-100
> >     ids in
> >      > size, but at the largest the sets could at some theoretical point
> >     get
> >      > into the tens of thousands of ids (right now largest sets are
> >     ~2000 ids).
> >      >
> >      > table.groupBy($"group_id")
> >      > .aggregate(
> >      > newIDsAgg()(
> >      > $"member_id"
> >      > ) as ("member_ids")
> >      > )
> >      > .select($"group_id", $"member_ids")
> >      >
> >      > caseclassIDsAcc(
> >      > varIDs: mutable.Set[Long]
> >      > )
> >      > classIDsAgg extendsAggregateFunction[Row, IDsAcc] {
> >      >
> >      > overridedefcreateAccumulator(): IDsAcc =
> >      > IDsAcc(mutable.Set())
> >      >
> >      > defaccumulate(
> >      > acc: IDsAcc,
> >      > ID: Long
> >      > ): Unit = {
> >      > acc.IDs.add(ID)
> >      > }
> >      >
> >      > defretract(acc: IDsAcc, ID: Long): Unit = {
> >      > acc.IDs.remove(ID)
> >      > }
> >      >
> >      > defresetAccumulator(acc: IDsAcc): Unit = {
> >      > acc.IDs = mutable.Set()
> >      > }
> >      >
> >      > overridedefgetValue(acc: IDsAcc): Row = {
> >      > Row.of(acc.IDs.toArray)
> >      > }
> >      >
> >      > overridedefgetResultType: TypeInformation[Row] = {
> >      > newRowTypeInfo(
> >      > createTypeInformation[Array[Long]]
> >      > )
> >      > }
> >      > }
> >      >
> >      > I read the docs [2] but I don't see it really say anything about
> why
> >      > ListView is better than just using a Set or Array.
> >      > If we were to move from a Set to a ListView what advantages might
> >     we see
> >      > in these Aggregates?
> >      >
> >      > I also noticed that ListView existed in 1.11 (we're on 1.11.2),
> >     did we
> >      > simply miss this feature? Does it work for 1.11.x too?
> >      >
> >      > Thanks!
> >      >
> >      > [1]
> >      >
> >
> https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/functions/udfs.html#mandatory-and-optional-methods
> >     <
> https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/functions/udfs.html#mandatory-and-optional-methods
> >
> >
> >      >
> >     <
> https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/functions/udfs.html#mandatory-and-optional-methods
> >     <
> https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/functions/udfs.html#mandatory-and-optional-methods
> >>
> >      > [2]
> >      >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/table/api/dataview/ListView.html
> >     <
> https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/table/api/dataview/ListView.html
> >
> >
> >      >
> >     <
> https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/table/api/dataview/ListView.html
> >     <
> https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/table/api/dataview/ListView.html
> >>
> >      >
> >      > --
> >      >
> >      > Rex Fenley|Software Engineer - Mobile and Backend
> >      >
> >      >
> >      > Remind.com <https://www.remind.com/ <https://www.remind.com/>>|
> >     BLOG <http://blog.remind.com/ <http://blog.remind.com/>> |
> >      > FOLLOW US <https://twitter.com/remindhq
> >     <https://twitter.com/remindhq>> | LIKE US
> >      > <https://www.facebook.com/remindhq
> >     <https://www.facebook.com/remindhq>>
> >      >
> >
> >
> >
> > --
> >
> > Rex Fenley|Software Engineer - Mobile and Backend
> >
> >
> > Remind.com <https://www.remind.com/>| BLOG <http://blog.remind.com/> |
> > FOLLOW US <https://twitter.com/remindhq> | LIKE US
> > <https://www.facebook.com/remindhq>
> >
>
>

-- 

Rex Fenley  |  Software Engineer - Mobile and Backend


Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>
 |  FOLLOW
US <https://twitter.com/remindhq>  |  LIKE US
<https://www.facebook.com/remindhq>

Re: Why use ListView?

Reply via email to