Since this is landing in Spam for some reason for most people, I am explicitly forwarding it rather than replying, so as people can get it.
I will vote for (2) considering it was returning `null` earlier as well & the new behaviour came up recently, it would be a behaviour change for folks migrating from earlier versions of hive to 4.x & the new behaviour ain't that fancy either... -Ayush ---------- Forwarded message --------- From: Zoltán Rátkai <zrat...@cloudera.com> Date: Wed, 18 Sept 2024 at 12:55 Subject: Re: IMPORTANT: Hive date parsing issue, input needed To: <user@hive.apache.org> Cc: dev <d...@hive.apache.org> Hi Ayush, thanks for your thoughts! My answers: - On the official Hive page about types this is not documented at all: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=82706456#LanguageManualTypes-date "Dates: DATE values describe a particular year/month/day, in the form YYYY-MM-DD. For example, DATE '2013-01-01'. Date types do not have a time of day component. The range of values supported for the Date type is 0000-01-01 to 9999-12-31, dependent on support by the primitive Java Date type." - Since the beginning of Hive it worked like returning null, this was introduced one year ago as a side effect of this: HIVE-27586 "Parse dates from strings ignoring trailing (potentialy) invalid chars", so the goal of this ticket was to be able to handle string to date parsing like : "2024-09-18 GARBAGE" and parse it to 2024-09-18 as date. So it is not a long standing behaviour. I made a comparison of other DB (Postre, MySQL, ORACLE) and none of them handle it like Hive now. Please check it here: https://issues.apache.org/jira/browse/HIVE-28483 I agree with you to align with those. This is why I asked this question. Thanks and regards, Zoltan Ratkai On Wed, Sep 18, 2024 at 8:58 AM Ayush Saxena <ayush...@gmail.com> wrote: > > ++ Adding the dev mailing list, as this might be relevant to them as well. > > I have a few additional questions: > > * Has this behavior been present since the inception of Hive, or did > it start occurring more recently? If this has been the behavior for > over a decade, it might be best to leave it unchanged. However, if > this is a recent development, we should aim to restore the original > behavior. In my opinion, any change that alters long-standing behavior > could be considered incompatible unless it was introduced to prevent > data loss or address a security issue. > * How do other engines (e.g., Impala, Spark) handle similar scenarios, > and how do other databases like MySQL and PostgreSQL behave in this > regard? In the past, we've typically aimed to align with these > systems, not necessarily exactly, but closely enough—especially in > cases where there's ambiguity or conflict. > > > -Ayush > > > On Wed, 18 Sept 2024 at 12:00, Zoltán Rátkai <zrat...@cloudera.com> wrote: > > > > Dear Hive User! > > > > In Hive the standard date format is YYYY-MM-DD, like “2024-09-17”. > > So when casting a string which format is different, like DD-MM-YYYY e.g. > > “17-09-2024" to date, as a Hive user what would you expect as a result? > > > > > > 1. “0017-09-20” => Year of 17 September 20 > > or > > 2. null > > > > It is possible to give a date pattern CAST(<string-value> AS DATE FORMAT > > <date-pattern>)and cast with it like "DD-MM-YYYY". > > Earlier it gave back null, but currently Hive gives back the first, which I > > think unexpected as a user and customers complain about that. There is a > > debate if it is a bug or feature, so I need your help! > > > > I would like to ask you to reply with a vote on 1 or 2 what you want as a > > user to happen! > > > > I really appreciate your input! > > > > Thank you, > > > > Zoltan Ratkai