Hi Okumin, thank you very much to checking them I appreciate it! Now we have information on 5 different DB (MySQL, Oracle, Postgre, Spark, Trino) and 3 different versions of Hive about this topic and as I see only Hive 4.0 works like the No. 1 way.
Regards, Zoltan On Wed, Sep 18, 2024 at 3:53 PM Okumin <m...@okumin.com> wrote: > Hi, > > I put supplementary information in the ticket, such as examples of > Hive 2, Hive 3, Spark, Trino, or my research of the ANSI standard. > > Regards, > Okumin > > On Wed, Sep 18, 2024 at 7:17 PM Attila Turoczy > <aturo...@cloudera.com.invalid> wrote: > > > > Hi Folks, > > > > Let me share my opinion about this change and request. To have an option > for parse data from string is a very valuable feature especially in OLAP, > because many of the input data many times come from different formats. It > is easy and makes sense that the date follows the ISO format. But real life > is different. Especially on data lakes the sources could come from many > formats and the ETL job parses excel, json, csv etc to create tables, > insert rows. For those countries like Hungary where the ISO date format is > default everywhere the HIVE-27586 would not cause any issue. However, in > regions such as the US and UK, where date formats like mm/dd/yyyy and > dd/mm/yyyy are common (e.g., in Excel files), this could present > challenges. In these cases, users may encounter problems. In my mind > especially for these formats it has to be null or as an oracle throws an > error. If users fail to validate the output, this could lead to data > correctness issues or, worse, potential data loss if source files are > deleted. Yes, Hive users do not like the behavior change, but one type (one > of the most popular formats in the US) could lead to more problems. > > > > -Attila > > > > On Wed, Sep 18, 2024 at 9:43 AM Ayush Saxena <ayush...@gmail.com> wrote: > >> > >> Since this is landing in Spam for some reason for most people, I am > >> explicitly forwarding it rather than replying, so as people can get > >> it. > >> > >> I will vote for (2) considering it was returning `null` earlier as > >> well & the new behaviour came up recently, it would be a behaviour > >> change for folks migrating from earlier versions of hive to 4.x & the > >> new behaviour ain't that fancy either... > >> > >> -Ayush > >> > >> ---------- Forwarded message --------- > >> From: Zoltán Rátkai <zrat...@cloudera.com> > >> Date: Wed, 18 Sept 2024 at 12:55 > >> Subject: Re: IMPORTANT: Hive date parsing issue, input needed > >> To: <user@hive.apache.org> > >> Cc: dev <d...@hive.apache.org> > >> > >> > >> Hi Ayush, > >> > >> thanks for your thoughts! > >> > >> My answers: > >> > >> - On the official Hive page about types this is not documented at all: > >> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=82706456#LanguageManualTypes-date > >> > >> "Dates: DATE values describe a particular year/month/day, in the form > >> YYYY-MM-DD. For example, DATE '2013-01-01'. Date types do not have > >> a time of day component. The range of values supported for the Date > >> type is 0000-01-01 to 9999-12-31, dependent on support by the > >> primitive Java Date type." > >> > >> - Since the beginning of Hive it worked like returning null, this was > >> introduced one year ago as a side effect of this: HIVE-27586 "Parse > >> dates from strings ignoring trailing (potentialy) invalid chars", so > >> the goal of this ticket was to be able to handle string to date > >> parsing like : "2024-09-18 GARBAGE" and parse it to 2024-09-18 as > >> date. > >> So it is not a long standing behaviour. > >> > >> I made a comparison of other DB (Postre, MySQL, ORACLE) and none of > >> them handle it like Hive now. Please check it here: > >> https://issues.apache.org/jira/browse/HIVE-28483 > >> I agree with you to align with those. This is why I asked this question. > >> > >> Thanks and regards, > >> > >> Zoltan Ratkai > >> > >> > >> On Wed, Sep 18, 2024 at 8:58 AM Ayush Saxena <ayush...@gmail.com> > wrote: > >> > > >> > ++ Adding the dev mailing list, as this might be relevant to them as > well. > >> > > >> > I have a few additional questions: > >> > > >> > * Has this behavior been present since the inception of Hive, or did > >> > it start occurring more recently? If this has been the behavior for > >> > over a decade, it might be best to leave it unchanged. However, if > >> > this is a recent development, we should aim to restore the original > >> > behavior. In my opinion, any change that alters long-standing behavior > >> > could be considered incompatible unless it was introduced to prevent > >> > data loss or address a security issue. > >> > * How do other engines (e.g., Impala, Spark) handle similar scenarios, > >> > and how do other databases like MySQL and PostgreSQL behave in this > >> > regard? In the past, we've typically aimed to align with these > >> > systems, not necessarily exactly, but closely enough—especially in > >> > cases where there's ambiguity or conflict. > >> > > >> > > >> > -Ayush > >> > > >> > > >> > On Wed, 18 Sept 2024 at 12:00, Zoltán Rátkai <zrat...@cloudera.com> > wrote: > >> > > > >> > > Dear Hive User! > >> > > > >> > > In Hive the standard date format is YYYY-MM-DD, like “2024-09-17”. > >> > > So when casting a string which format is different, like DD-MM-YYYY > e.g. “17-09-2024" to date, as a Hive user what would you expect as a result? > >> > > > >> > > > >> > > 1. “0017-09-20” => Year of 17 September 20 > >> > > or > >> > > 2. null > >> > > > >> > > It is possible to give a date pattern CAST(<string-value> AS DATE > FORMAT <date-pattern>)and cast with it like "DD-MM-YYYY". > >> > > Earlier it gave back null, but currently Hive gives back the first, > which I think unexpected as a user and customers complain about that. There > is a debate if it is a bug or feature, so I need your help! > >> > > > >> > > I would like to ask you to reply with a vote on 1 or 2 what you > want as a user to happen! > >> > > > >> > > I really appreciate your input! > >> > > > >> > > Thank you, > >> > > > >> > > Zoltan Ratkai >