Re: How to recognize and get the min of a date/string column in Java?
Finally solved with the MM-for-months format recommendation. Thanks.
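The resolution can be sketched in plain Java with java.time (a standalone illustration, not the Spark code from the thread; the class and method names are made up for the example). Once the strings are parsed as dates with MM for months, the minimum of the sample column shown later in the thread comes out correctly:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Comparator;
import java.util.List;

public class MinDateDemo {
    // Parse each string with the given Java date pattern, then take the
    // chronological (not alphabetical) minimum.
    static LocalDate minDate(List<String> values, String pattern) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(pattern);
        return values.stream()
                .map(v -> LocalDate.parse(v, fmt))
                .min(Comparator.naturalOrder())
                .orElseThrow();
    }

    public static void main(String[] args) {
        // The sample column posted in the thread below.
        List<String> dates = List.of(
                "2019-02-08", "2019-02-07", "2019-12-01",
                "2015-02-02", "2012-02-03", "2018-05-06", "2022-02-08");
        // yyyy = year, MM (upper case) = month, dd = day of month
        System.out.println(minDate(dates, "yyyy-MM-dd")); // prints 2012-02-03
    }
}
```

Note that the correct minimum of that sample is 2012-02-03; the 2012-01-03 reported below does not occur in the input at all, which is what the follow-up reply points out.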
Re: How to recognize and get the min of a date/string column in Java?
I changed the format to yyyy-mm-dd for the example.
Re: How to recognize and get the min of a date/string column in Java?
Look at your data - it doesn't match the date format you give.
Re: How to recognize and get the min of a date/string column in Java?
For the input (I changed the format):

+----------+
|      Date|
+----------+
|2019-02-08|
|2019-02-07|
|2019-12-01|
|2015-02-02|
|2012-02-03|
|2018-05-06|
|2022-02-08|
+----------+

the output was 2012-01-03.

Note that for my code below to work, I cast the resulting min column to string.
Re: How to recognize and get the min of a date/string column in Java?
Hi Sean,

Even with MM for months it gives an incorrect (but different this time) min value.
Re: How to recognize and get the min of a date/string column in Java?
Yes, that is right. It has to be parsed as a date to correctly reason about ordering; otherwise you are finding the minimum string alphabetically.

Small note: MM is month, mm is minute. You have to fix that for this to work. These are Java format strings.

On Tue, Jun 14, 2022, 12:32 PM marc nicole wrote:

> Hi,
>
> I want to identify a column of dates as such; the column has formatted
> strings like "06-14-2022" (the format being mm-dd-yyyy), and I want to
> get the minimum of those dates.
>
> I tried in Java as follows:
>
> if (dataset.filter(org.apache.spark.sql.functions.to_date(
>         dataset.col(colName), "mm-dd-yyyy")
>         .isNotNull()).select(colName).count() != 0) {
>
> And to get the min of the column:
>
> Object colMin = dataset.agg(org.apache.spark.sql.functions.min(
>         org.apache.spark.sql.functions.to_date(
>             dataset.col(colName), "mm-dd-yyyy")))
>     .first().get(0);
> // then I cast colMin to string.
>
> Note that if I don't apply to_date() to the target column, the result
> will be erroneous (I think Spark will take the values as strings and
> will get the min as if it were applied to alphabetical strings).
>
> Any better approach to accomplish this?
> Thanks.
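Both points can be seen in a self-contained plain-Java sketch (java.time, with made-up sample values rather than the dataset from the question): MM parses months, the raw-string minimum is merely alphabetical, and a pattern using mm cannot yield a date at all because mm is minute-of-hour:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Comparator;
import java.util.List;

public class FormatCaseDemo {
    // Chronological minimum: parse with MM (month-of-year), compare as dates.
    static LocalDate dateMin(List<String> col) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("MM-dd-yyyy");
        return col.stream().map(v -> LocalDate.parse(v, fmt))
                .min(Comparator.naturalOrder()).orElseThrow();
    }

    // Alphabetical minimum: what you get if the strings are never parsed.
    static String stringMin(List<String> col) {
        return col.stream().min(Comparator.naturalOrder()).orElseThrow();
    }

    public static void main(String[] args) {
        List<String> col = List.of("06-14-2022", "01-02-2019", "12-31-2012");
        System.out.println(dateMin(col));   // 2012-12-31 (true earliest date)
        System.out.println(stringMin(col)); // 01-02-2019 (alphabetical, wrong)

        // mm is minute-of-hour: the digits parse as fields, but no month is
        // present, so no LocalDate can be built from the result.
        try {
            LocalDate.parse("06-14-2022", DateTimeFormatter.ofPattern("mm-dd-yyyy"));
        } catch (DateTimeParseException e) {
            System.out.println("mm-dd-yyyy cannot produce a date");
        }
    }
}
```

The same pattern letters carry over to the pattern string passed to Spark's to_date in the code above: there too, mm means minutes rather than months, which is consistent with the wrong minimums reported before the fix.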