Since this is landing in Spam for some reason for most people, I am
explicitly forwarding it rather than replying, so as people can get
it.

I will vote for (2) considering it was returning `null` earlier as
well & the new behaviour came up recently, it would be a behaviour
change for folks migrating from earlier versions of hive to 4.x & the
new behaviour ain't that fancy either...

-Ayush

---------- Forwarded message ---------
From: Zoltán Rátkai <zrat...@cloudera.com>
Date: Wed, 18 Sept 2024 at 12:55
Subject: Re: IMPORTANT: Hive date parsing issue, input needed
To: <user@hive.apache.org>
Cc: dev <d...@hive.apache.org>


Hi Ayush,

thanks for your thoughts!

My answers:

- On the official Hive page about types this is not documented at all:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=82706456#LanguageManualTypes-date

"Dates:  DATE values describe a particular year/month/day, in the form
YYYY-­MM-­DD. For example, DATE '2013-­01-­01'. Date types do not have
a time of day component. The range of values supported for the Date
type is 0000-­01-­01 to 9999-­12-­31, dependent on support by the
primitive Java Date type."

- Since the beginning of Hive it worked like returning null, this was
introduced one year ago as a side effect of this: HIVE-27586 "Parse
dates from strings ignoring trailing (potentialy) invalid chars", so
the goal of this ticket was to be able to handle string to date
parsing like : "2024-09-18 GARBAGE" and parse it to 2024-09-18 as
date.
So it is not a long standing behaviour.

I made a comparison of other DB (Postre, MySQL, ORACLE) and none of
them handle it like Hive now. Please check it here:
https://issues.apache.org/jira/browse/HIVE-28483
I agree with you to align with those. This is why I asked this question.

Thanks and regards,

Zoltan Ratkai


On Wed, Sep 18, 2024 at 8:58 AM Ayush Saxena <ayush...@gmail.com> wrote:
>
> ++ Adding the dev mailing list, as this might be relevant to them as well.
>
> I have a few additional questions:
>
> * Has this behavior been present since the inception of Hive, or did
> it start occurring more recently? If this has been the behavior for
> over a decade, it might be best to leave it unchanged. However, if
> this is a recent development, we should aim to restore the original
> behavior. In my opinion, any change that alters long-standing behavior
> could be considered incompatible unless it was introduced to prevent
> data loss or address a security issue.
> * How do other engines (e.g., Impala, Spark) handle similar scenarios,
> and how do other databases like MySQL and PostgreSQL behave in this
> regard? In the past, we've typically aimed to align with these
> systems, not necessarily exactly, but closely enough—especially in
> cases where there's ambiguity or conflict.
>
>
> -Ayush
>
>
> On Wed, 18 Sept 2024 at 12:00, Zoltán Rátkai <zrat...@cloudera.com> wrote:
> >
> > Dear Hive User!
> >
> > In Hive the standard date format is YYYY-MM-DD, like “2024-09-17”.
> > So when casting a string which format is different, like DD-MM-YYYY e.g. 
> > “17-09-2024" to date, as a Hive user what would you expect as a result?
> >
> >
> > 1.  “0017-09-20”  => Year of 17 September 20
> > or
> > 2. null
> >
> > It is possible to give a date pattern CAST(<string-value> AS DATE FORMAT 
> > <date-pattern>)and cast with it like "DD-MM-YYYY".
> > Earlier it gave back null, but currently Hive gives back the first, which I 
> > think unexpected as a user and customers complain about that. There is a 
> > debate if it is a bug or feature, so I need your help!
> >
> > I would like to ask you to reply with a vote on 1 or 2 what you want as a 
> > user to happen!
> >
> > I really appreciate your input!
> >
> > Thank you,
> >
> > Zoltan Ratkai

Reply via email to