> I'm fine with your proposal. But once we see users asking for better unified
> semantics, we should not hesitate to introduce an option to give them more
> flexibility.
Yes, I agree that we should introduce the option once we receive such
feedback from users. I will update this
It is true that your proposal is kind of a conservative plan.
I'm fine with your proposal. But once we see users asking for better
unified semantics, we should not hesitate to introduce an option to give
them more flexibility.
Regards,
Timo
On 01.03.21 12:59, Leonard Xu wrote:
Thanks Kurt and Timo for the feedback.
>> I prefer to not introduce such config until we have to. Leonard's proposal
>> already makes almost all users happy thus I think we can still wait.
I can understand Kurt’s concern that we don't need to rush to introduce this
option until we have to,
I agree that Leonard's last proposal makes "almost all" users happy.
However, a config option (as Joe said) would make "all" users happy
because they have the power to choose.
I don't have a strong opinion on this proposal as it is basically a
mixture of both approaches:
1) "some magic using
I'm +1 to Leonard's last proposal, which:
1. Keeps the row-level behavior of CURRENT_TIMESTAMP in streaming mode, and
evaluates it at query start in batch mode.
2. Introduces CURRENT_ROW_TIMESTAMP for batch users who want row-level semantics.
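A minimal sketch of the two behaviors in point 1, written in Python rather than Flink SQL. The helper names `make_clock` and `evaluate_current_timestamp` are hypothetical and not part of any Flink API; a fake counter stands in for the wall clock so the difference is visible.

```python
from itertools import count


def make_clock(start=100):
    """Return a fake clock that yields start, start + 1, ... on each call.

    This is an assumption made for determinism; a real system reads wall time.
    """
    ticks = count(start)
    return lambda: next(ticks)


def evaluate_current_timestamp(rows, streaming, clock):
    """Pair each row with the value CURRENT_TIMESTAMP would take for it."""
    if streaming:
        # Streaming mode: read the clock once per row (row-level behavior).
        return [(row, clock()) for row in rows]
    # Batch mode: read the clock once at query start and reuse that value.
    query_start = clock()
    return [(row, query_start) for row in rows]


rows = ["a", "b", "c"]
batch_result = evaluate_current_timestamp(rows, streaming=False, clock=make_clock())
stream_result = evaluate_current_timestamp(rows, streaming=True, clock=make_clock())
```

In this sketch every batch row carries the same value, while each streaming row carries its own.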
I'm slightly -1 for introducing an option because we are
and btw it is interesting to notice that AWS seems to do the approach
that I suggested first.
All functions are SQL standard compliant, and only dedicated functions
with a prefix such as CURRENT_ROW_TIMESTAMP diverge from the standard.
Regards,
Timo
On 01.03.21 08:45, Timo Walther wrote:
How about we simply go for your first approach by having [query-start,
row, auto] as configuration parameters where [auto] is the default?
This sounds like a good consensus where everyone is happy, no?
This also allows users to restore the old per-row behavior for all
functions that we had
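A rough sketch of how such a three-valued parameter could be resolved, in Python for illustration only. It assumes that [auto] means query-start in batch mode and row in streaming mode, which is how the thread describes the two modes; the names `Evaluation` and `resolve_evaluation` are hypothetical.

```python
from enum import Enum


class Evaluation(Enum):
    """The three values proposed in the thread for the config parameter."""
    QUERY_START = "query-start"
    ROW = "row"
    AUTO = "auto"


def resolve_evaluation(configured: str, is_streaming: bool) -> Evaluation:
    """Map the configured value to the behavior actually applied.

    Assumption: "auto" picks row-level evaluation for streaming jobs and
    query-start evaluation for batch jobs. An unknown value raises ValueError.
    """
    mode = Evaluation(configured)
    if mode is not Evaluation.AUTO:
        # An explicit choice overrides the runtime mode.
        return mode
    return Evaluation.ROW if is_streaming else Evaluation.QUERY_START
```

With this shape, a batch user who wants the old per-row behavior sets "row" explicitly, and everyone else keeps the default.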
Thanks Joe for the great investigation.
> • Generally urging for semantics (batch > time of first query issued,
> streaming > row level).
> I discussed the thing now with Timo & Stephan:
> • It seems to go towards a config parameter, either [query-start, row]
> or [query-start,
Hi,
Sorry it took some time, here are my findings:
The sentiment was:
• This will only be an issue when you face it.
• Generally urging for semantics (batch > time of first query issued,
streaming > row level).
• Not necessarily introducing new functions, but rather
Hi, Joe
Thanks for volunteering to investigate the user data on this topic. Do you
have any progress here?
Thanks,
Leonard
On Thu, Feb 4, 2021 at 3:08 PM Johannes Moser wrote:
> Hello,
>
> I will work with some users to get data on that.
>
> Thanks, Joe
>
> > On 03.02.2021, at 14:58, Stephan
Hello,
I will work with some users to get data on that.
Thanks, Joe
> On 03.02.2021, at 14:58, Stephan Ewen wrote:
>
> Hi all!
>
> A quick thought on this thread: We see a typical stalemate here, as in so
> many discussions recently.
> One developer prefers it this way, another one another
Hi all!
A quick thought on this thread: We see a typical stalemate here, as in so
many discussions recently.
One developer prefers it this way, another one another way. Both have
pro/con arguments, it takes a lot of time from everyone, still there is
little progress in the discussion.
Hi Fabian,
I think we have an agreement that the functions should be evaluated at
query start in batch mode.
This is how all the other batch systems and traditional databases
behave, and it is compliant with the SQL standard.
*1. The different point of view is what's the behavior in streaming mode?
Hi everyone,
Sorry for joining this discussion late.
Let me give some thought to two of the arguments raised in this thread.
Time functions are inherently non-deterministic:
--
This is of course true, but IMO it doesn't mean that the semantics of time
functions do not matter.
It makes a
BTW I also don't like introducing an option for this case as a
first step.
If we can find a default behavior which makes 90% of users happy, we should
do it. If the remaining
10% of users start to complain about the fixed behavior (it's also
possible that they never complain),
we
Hi Timo,
I don't think batch-stream unification can deal with all the cases,
especially if
the query involves some non-deterministic functions.
No matter which option we choose, these queries will have different results.
For example, if we run the same query in batch mode multiple times, it's
Hi everyone,
I'm not sure if we should introduce the `auto` mode. Taking all the
previous discussions around batch-stream unification into account, batch
mode and streaming mode should only influence the runtime efficiency and
incremental computation. The final query result should be the same
+1 for the default "auto" to the "table.exec.time-function-evaluation".
From the definition of these functions, in my opinion:
- Batch is the instant execution of all records, which is the meaning of
the word "BATCH", so there is only one time at query-start.
- Stream only executes a single
Hi Leonard, Timo,
I just did some investigation and found all the other batch processing
systems
evaluate the time functions at query-start, including Snowflake, Hive,
Spark, Trino.
I'm wondering whether the default 'per-record' mode will still be weird for
batch users.
I know we proposed the
Hi Leonard,
thanks for considering this issue as well. +1 for the proposed config
option. Let's start a voting thread once the FLIP document has been
updated if there are no other concerns?
Thanks,
Timo
On 01.02.21 15:07, Leonard Xu wrote:
Hi, all
I’ve discussed with @Timo @Jark about the time function evaluation further. We
reached a consensus that we’d better address the time function
evaluation (function value materialization) in this FLIP as well.
We’re fine with introducing an option table.exec.time-function-evaluation to
Parts of the FLIP can already be implemented without a completed voting,
e.g. there is no doubt that we should support TIME(9).
However, I don't see a benefit of reworking the time functions to rework
them again later. If we lock the time on query-start the implementation
of the previously
I also prefer not to expand this FLIP further, but we could open a
discussion thread
right after this FLIP is accepted and start coding & reviewing. Making
technical
discussion and coding more pipelined will improve efficiency.
Best,
Kurt
On Sat, Jan 30, 2021 at 3:47 PM Leonard Xu wrote:
>
Hi, Timo
> I do think that this topic must be part of the FLIP as well. Esp. if the FLIP
> has the title "time function behavior" and this is clearly a behavioral
> aspect. We are performing a heavy refactoring of the SQL query semantics in
> Flink here which will affect a lot of users. We
Hi Leonard,
I do think that this topic must be part of the FLIP as well. Esp. if the
FLIP has the title "time function behavior" and this is clearly a
behavioral aspect. We are performing a heavy refactoring of the SQL
query semantics in Flink here which will affect a lot of users. We
cannot
Hi, Timo
> I'm sorry that I need to open another discussion thread before voting but I
> think we should also discuss this in this FLIP before it pops up at a later
> stage.
>
> How do we want our time functions to behave in long running queries?
It’s okay to open this thread. Although I don’t
I'm sorry that I need to open another discussion thread before voting but
I think we should also discuss this in this FLIP before it pops up at a
later stage.
How do we want our time functions to behave in long running queries?
See also:
Hi, Jark
> I have a minor suggestion:
> I think we will still suggest users use TIMESTAMP even if we have
> TIMESTAMP_NTZ. Then it seems
> introducing TIMESTAMP_NTZ doesn't help much for users, but introduces more
> learning costs.
I think your suggestion makes sense, we should suggest users
I have a minor suggestion:
I think we may not need to introduce TIMESTAMP_NTZ, we already have the
shortcut
type TIMESTAMP for TIMESTAMP WITHOUT TIME ZONE. I think we will still
suggest
users use TIMESTAMP even if we have TIMESTAMP_NTZ. Then it seems
introducing
TIMESTAMP_NTZ doesn't help much
Thanks all for sharing your opinions.
Looks like we’ve reached a consensus about the topic.
@Timo:
> 1) Are we on the same page that LOCALTIMESTAMP returns TIMESTAMP and not
> TIMESTAMP_LTZ? Maybe we should quickly list also LOCALTIME/LOCALDATE and
> LOCALTIMESTAMP for completeness.
Yes,
+1 to have shortcut types TIMESTAMP_LTZ, TIMESTAMP_TZ.
Best,
Jark
On Thu, 28 Jan 2021 at 17:32, Timo Walther wrote:
> Hi Leonard,
>
> thanks for the great summary and the updated FLIP. I think using
> TIMESTAMP_LTZ for CURRENT_TIMESTAMP/PROCTIME/ROWTIME is a good long-term
> solution. I also
Hi Leonard,
thanks for the great summary and the updated FLIP. I think using
TIMESTAMP_LTZ for CURRENT_TIMESTAMP/PROCTIME/ROWTIME is a good long-term
solution. I also discussed this with people of different backgrounds
internally and everybody seems to agree to the proposed design. I hope
we
Thanks Leonard for the further investigation.
I think we all agree we should correct the return value of
CURRENT_TIMESTAMP.
Regarding the return type of CURRENT_TIMESTAMP, I also agree TIMESTAMP_LTZ
would be more useful worldwide. This may need more effort, but if this is
the right direction, we
Thanks Leonard for the detailed response and also the bad case about option
1, these all
make sense to me.
Also nice catch about conversion support of LocalZonedTimestampType, I
think it actually
makes sense to support java.sql.Timestamp as well as
java.time.LocalDateTime. It also has
a slight
Hi, All
Thanks for your comments. I think everyone in the thread has agreed that:
(1) The return values of CURRENT_TIME/CURRENT_TIMESTAMP/NOW()/PROCTIME() are
wrong.
(2) The LOCALTIME/LOCALTIMESTAMP and CURRENT_TIME/CURRENT_TIMESTAMP should be
different whether from SQL standard’s perspective or
Hi everyone,
let me answer the individual threads:
>>> I know that the two series should be different at first glance, but
>>> different SQL engines can have their own explanations, for example,
>>> CURRENT_TIMESTAMP and LOCALTIMESTAMP are synonyms in Snowflake[1] and have
>>> no difference, and
Forgot one more thing, continuing with the display in UTC: as a user, if Flink
wants to display the timestamp
in UTC, why don't we offer something like UTC_TIMESTAMP?
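To make the display question concrete, here is a small Python sketch that renders one instant both in a session time zone and in UTC. The UTC+08:00 session zone, the sample instant, and the `render` helper are assumptions chosen for illustration; nothing here is Flink code.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical session time zone: a fixed UTC+08:00 offset.
SESSION_ZONE = timezone(timedelta(hours=8))


def render(instant: datetime, zone: timezone) -> str:
    """Format the same instant as wall-clock text in the given zone."""
    return instant.astimezone(zone).strftime("%Y-%m-%d %H:%M:%S")


# One instant on the global timeline (sample value, not taken from the thread).
instant = datetime(2021, 1, 22, 8, 33, 0, tzinfo=timezone.utc)

in_session_zone = render(instant, SESSION_ZONE)  # "2021-01-22 16:33:00"
in_utc = render(instant, timezone.utc)           # "2021-01-22 08:33:00"
```

The instant is identical in both cases; only the displayed wall-clock text differs, which is the distinction a dedicated UTC function would make explicit.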
Best,
Kurt
On Fri, Jan 22, 2021 at 4:33 PM Kurt Young wrote:
> Before jumping into technical details, let's take a step back to
Before jumping into technical details, let's take a step back to discuss
user experience.
The first important question is what kind of date and time will Flink
display when users call
CURRENT_TIMESTAMP and maybe also PROCTIME (if we think they are similar).
Should it always display the date and
Thanks @Timo for the detailed reply. Let's continue this topic in this thread;
I've merged all mails into this discussion.
> LOCALDATE / LOCALTIME / LOCALTIMESTAMP
>
> --> uses session time zone, returns DATE/TIME/TIMESTAMP
>
> CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP
>
> --> uses session
Now we have 2 discussion threads on 3 mailing lists. Which one should
have priority? Should I repost my large email here again?
I think it is good to inform and invite in the user mailing lists but
let's keep the FLIP discussion on the dev@ ML only.
Regards,
Timo
On 21.01.21 16:50, Leonard