Hi Satyam,

first of all your initial join query can also work, you just need to make sure that no time attribute is in the SELECT clause. As the exception indicates, you need to cast all time attributes to TIMESTAMP. The reason for this is some major design issue that is also explained here where a time attribute must not be in the output of a regular join:

https://stackoverflow.com/a/64500296/806430

However, since you would like to perform the join "time-based" either interval join or temporal join might solve your use cases.

In your case I guess the watermark strategy of D is the problem. Are you sure the result is:

>     Emit D row=+I(0,"0",1970-01-01T00:00)@time=0
>     Emit D row=+I(1,"1",1970-01-01T00:00)@time=0
>     Emit D row=+I(2,"2",1970-01-01T00:00)@time=0
>     Emit D watermark=0

and not:

>     Emit D row=+I(0,"0",1970-01-01T00:00)@time=0
>     Emit D watermark=0
>     Emit D row=+I(1,"1",1970-01-01T00:00)@time=0
>     Emit D row=+I(2,"2",1970-01-01T00:00)@time=0

Or maybe the watermark is even dropped. Could you try to use a watermark strategy with

`R` - INTERVAL '0.001' SECONDS

I hope this helps.

Regards,
Timo



On 16.03.21 04:37, Satyam Shekhar wrote:
Hello folks,

I would love to hear back your feedback on this.

Regards,
Satyam

On Wed, Mar 10, 2021 at 6:53 PM Satyam Shekhar <satyamshek...@gmail.com <mailto:satyamshek...@gmail.com>> wrote:

    Hello folks,

    I am looking to enrich rows from an unbounded streaming table by
    joining it with a bounded static table while preserving rowtime for
    the streaming table. For example, let's consider table two tables F
    and D, where F is unbounded and D is bounded. The schema for the two
    tables is the following -

    F:
      |-- C0: BIGINT
      |-- C1: STRING
      |-- R: TIMESTAMP(3) **rowtime**
      |-- WATERMARK FOR R: TIMESTAMP(3) AS `R` - INTERVAL '0' SECONDS

    D:
      |-- C0: BIGINT
      |-- C1: STRING NOT NULL

    I'd like to run the following query on this schema -

    select sum(F.C0), D.C1, tumble_start(F.R, interval '1' second)
         from F join D ON F.C1 = D.C1
         group by D.C1, tumble(F.R, interval '1' second)

    However, I run into the following error while running the above query -

    "Rowtime attributes must not be in the input rows of a regular join.
    As a workaround you can cast the time attributes of input tables to
    TIMESTAMP before."

    My understanding reading the docs is that Time Temporal Join is
    meant to solve this problem. So I model table D as the following -

    D:
      |-- C0: BIGINT
      |-- C1: STRING NOT NULL
      |-- R: TIMESTAMP(3)
      |-- WATERMARK FOR R: TIMESTAMP(3) AS `R` - INTERVAL '0' SECONDS
      |-- CONSTRAINT 2da2dd2e-9937-48cb-9dec-4f6055713004 PRIMARY KEY (C1)

    With column D.R always set to 0 and modify the query as follows -

    select sum(F.C0), D.C1, tumble_start(F.R, interval '1' second)
         from F join D FOR SYSTEM_TIME AS OF F.R ON F.C1 = D.C1
         group by D.C1, tumble(F.R, interval '1' second)

    The above query runs but does not return any result. I have the
    following data in D initially -
    Emit D row=+I(0,"0",1970-01-01T00:00)@time=0
    Emit D row=+I(1,"1",1970-01-01T00:00)@time=0
    Emit D row=+I(2,"2",1970-01-01T00:00)@time=0
    Emit D watermark=0

    And F streams the following rows -
    Emit F row=+I(0,"0",1970-01-01T00:00)@time=0
    Emit F row=+I(1,"1",1970-01-01T00:00:10)@time=1000
    Emit F watermark=1000

    I expect that two rows in F will join with matching rows (on C1) in
    D and produce some output. But I do not see anything in the output.

    So I have the following questions -

    1. Is time temporal join the correct tool to solve this problem?
    2. What could be the reason for not getting any output rows in the
    result?

    Thanks,
    Satyam


Reply via email to