Yes, exactly. The rocksdb has to "seek" data sets because it doesn't know how many entries are under the join key.
On Thu, 19 Nov 2020 at 13:38, Rex Fenley <r...@remind101.com> wrote: > Ok, but if there is only 1 row per Join key on either side of the join, > then wouldn't "iterate all the values in the MapState under the current > key" effectively be "iterate 1 value in MapState under the current key" > which would be O(1)? Or are you saying that it must seek across the entire > dataset for the whole table even for that 1 row on either side of the join? > > Thanks for the help so far! > > On Wed, Nov 18, 2020 at 6:30 PM Jark Wu <imj...@gmail.com> wrote: > >> Actually, if there is no unique key, it's not O(1), because there maybe >> multiple rows are joined by the join key, i.e. iterate all the values in >> the MapState under the current key, this is a "seek" operation on rocksdb >> which is not efficient. >> >> Are you asking where the join key is set? The join key is set by the >> framework via `AbstractStreamOperator#setKeyContextElement1`. >> >> Best, >> Jark >> >> On Thu, 19 Nov 2020 at 03:18, Rex Fenley <r...@remind101.com> wrote: >> >>> Thanks for the info. >>> >>> So even if there is no unique key inferred for a Row, the set of rows to >>> join on each Join key should effectively still be an O(1) lookup if the >>> join key is unique right? >>> >>> Also, I've been digging around the code to find where the lookup of rows >>> for a join key happens and haven't come across anything. Mind pointing me >>> in the right direction? >>> >>> Thanks! >>> >>> cc Brad >>> >>> On Wed, Nov 18, 2020 at 7:39 AM Jark Wu <imj...@gmail.com> wrote: >>> >>>> Hi Rex, >>>> >>>> Currently, the join operator may use 3 kinds of state structure >>>> depending on the input key and join key information. >>>> >>>> 1) input doesn't have a unique key => MapState<row, count>, >>>> where the map key is the input row and the map value is the number >>>> of equal rows. >>>> >>>> 2) input has unique key, but the unique key is not a subset of join key >>>> => MapState<UK, row> >>>> this is better than the above one, because it has a shorter map key and >>>> is more efficient when retracting records. >>>> >>>> 3) input has a unique key, and the unique key is a subset of join key >>>> => ValueState<row> >>>> this is the best performance, because it only performs a "get" >>>> operation rather than "seek" on rocksdb >>>> for each record of the other input side. >>>> >>>> Note: the join key is the key of the keyed states. >>>> >>>> You can see the implementation differences >>>> in >>>> org.apache.flink.table.runtime.operators.join.stream.state.JoinRecordStateViews. >>>> >>>> Best, >>>> Jark >>>> >>>> On Wed, 18 Nov 2020 at 02:30, Rex Fenley <r...@remind101.com> wrote: >>>> >>>>> Ok, what are the performance consequences then of having a join with >>>>> NoUniqueKey if the left side's key actually is unique in practice? >>>>> >>>>> Thanks! >>>>> >>>>> >>>>> On Tue, Nov 17, 2020 at 7:35 AM Jark Wu <imj...@gmail.com> wrote: >>>>> >>>>>> Hi Rex, >>>>>> >>>>>> Currently, the unique key is inferred by the optimizer. However, the >>>>>> inference is not perfect. >>>>>> There are known issues that the unique key is not derived correctly, >>>>>> e.g. FLINK-20036 (is this opened by you?). If you think you have the same >>>>>> case, please open an issue. >>>>>> >>>>>> Query hint is a nice way for this, but it is not supported yet. >>>>>> We have an issue to track supporting query hint, see FLINK-17173. >>>>>> >>>>>> Beest, >>>>>> Jark >>>>>> >>>>>> >>>>>> On Tue, 17 Nov 2020 at 15:23, Rex Fenley <r...@remind101.com> wrote: >>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> I have quite a few joins in my plan that have >>>>>>> >>>>>>> leftInputSpec=[NoUniqueKey] >>>>>>> >>>>>>> in Flink UI. I know this can't truly be the case that there is no >>>>>>> unique key, at least for some of these joins that I've evaluated. >>>>>>> >>>>>>> Is there a way to hint to the join what the unique key is for a >>>>>>> table? >>>>>>> >>>>>>> Thanks! >>>>>>> >>>>>>> -- >>>>>>> >>>>>>> Rex Fenley | Software Engineer - Mobile and Backend >>>>>>> >>>>>>> >>>>>>> Remind.com <https://www.remind.com/> | BLOG >>>>>>> <http://blog.remind.com/> | FOLLOW US >>>>>>> <https://twitter.com/remindhq> | LIKE US >>>>>>> <https://www.facebook.com/remindhq> >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> >>>>> Rex Fenley | Software Engineer - Mobile and Backend >>>>> >>>>> >>>>> Remind.com <https://www.remind.com/> | BLOG <http://blog.remind.com/> >>>>> | FOLLOW US <https://twitter.com/remindhq> | LIKE US >>>>> <https://www.facebook.com/remindhq> >>>>> >>>> >>> >>> -- >>> >>> Rex Fenley | Software Engineer - Mobile and Backend >>> >>> >>> Remind.com <https://www.remind.com/> | BLOG <http://blog.remind.com/> >>> | FOLLOW US <https://twitter.com/remindhq> | LIKE US >>> <https://www.facebook.com/remindhq> >>> >> > > -- > > Rex Fenley | Software Engineer - Mobile and Backend > > > Remind.com <https://www.remind.com/> | BLOG <http://blog.remind.com/> | > FOLLOW US <https://twitter.com/remindhq> | LIKE US > <https://www.facebook.com/remindhq> >