Re: How to recognize and get the min of a date/string column in Java?

2022-06-15 Thread marc nicole
finally solved with the MM for months format recommendation. thanks

Le mar. 14 juin 2022 à 23:02, marc nicole  a écrit :

> i changed the format to -mm-dd for the example
>
> Le mar. 14 juin 2022 à 22:52, Sean Owen  a écrit :
>
>> Look at your data - doesn't match date format you give
>>
>> On Tue, Jun 14, 2022, 3:41 PM marc nicole  wrote:
>>
>>> for the input  (I changed the format)  :
>>>
>>> +---+
>>> |Date|
>>> +---+
>>> | 2019-02-08 |
>>> ++
>>> | 2019-02-07 |
>>> ++
>>> | 2019-12-01 |
>>> ++
>>> | 2015-02-02 |
>>> ++
>>> | 2012-02-03 |
>>> ++
>>> | 2018-05-06 |
>>> ++
>>> | 2022-02-08 |
>>> ++
>>> the output was 2012-01-03
>>>
>>> To note that for my below code to work I cast to string the resulting
>>> min column.
>>>
>>> Le mar. 14 juin 2022 à 21:12, Sean Owen  a écrit :
>>>
 You haven't shown your input or the result

 On Tue, Jun 14, 2022 at 1:40 PM marc nicole 
 wrote:

> Hi Sean,
>
> Even with MM for months it gives incorrect (but different this time)
> min value.
>
> Le mar. 14 juin 2022 à 20:18, Sean Owen  a écrit :
>
>> Yes that is right. It has to be parsed as a date to correctly reason
>> about ordering. Otherwise you are finding the minimum string
>> alphabetically.
>>
>> Small note, MM is month. mm is minute. You have to fix that for this
>> to work. These are Java format strings.
>>
>> On Tue, Jun 14, 2022, 12:32 PM marc nicole 
>> wrote:
>>
>>> Hi,
>>>
>>> I want to identify a column of dates as such, the column has
>>> formatted strings in the likes of: "06-14-2022" (the format being
>>> mm-dd-) and get the minimum of those dates.
>>>
>>> I tried in Java as follows:
>>>
>>> if (dataset.filter(org.apache.spark.sql.functions.to_date(
 dataset.col(colName), 
 "mm-dd-").isNotNull()).select(colName).count() !=
 0) { 
>>>
>>>
>>> And to get the *min *of the column:
>>>
>>> Object colMin =
 dataset.agg(org.apache.spark.sql.functions.min(org.apache.spark.sql.functions.to_date(dataset.col(colName),
 "mm-dd-"))).first().get(0);
>>>
>>> // then I cast the *colMin *to string.
>>>
>>> To note that if i don't apply *to_date*() to the target column then
>>> the result will be erroneous (i think Spark will take the values as 
>>> string
>>> and will get the min as if it was applied on an alphabetical string).
>>>
>>> Any better approach to accomplish this?
>>> Thanks.
>>>
>>


Re: How to recognize and get the min of a date/string column in Java?

2022-06-14 Thread marc nicole
i changed the format to -mm-dd for the example

Le mar. 14 juin 2022 à 22:52, Sean Owen  a écrit :

> Look at your data - doesn't match date format you give
>
> On Tue, Jun 14, 2022, 3:41 PM marc nicole  wrote:
>
>> for the input  (I changed the format)  :
>>
>> +---+
>> |Date|
>> +---+
>> | 2019-02-08 |
>> ++
>> | 2019-02-07 |
>> ++
>> | 2019-12-01 |
>> ++
>> | 2015-02-02 |
>> ++
>> | 2012-02-03 |
>> ++
>> | 2018-05-06 |
>> ++
>> | 2022-02-08 |
>> ++
>> the output was 2012-01-03
>>
>> To note that for my below code to work I cast to string the resulting min
>> column.
>>
>> Le mar. 14 juin 2022 à 21:12, Sean Owen  a écrit :
>>
>>> You haven't shown your input or the result
>>>
>>> On Tue, Jun 14, 2022 at 1:40 PM marc nicole  wrote:
>>>
 Hi Sean,

 Even with MM for months it gives incorrect (but different this time)
 min value.

 Le mar. 14 juin 2022 à 20:18, Sean Owen  a écrit :

> Yes that is right. It has to be parsed as a date to correctly reason
> about ordering. Otherwise you are finding the minimum string
> alphabetically.
>
> Small note, MM is month. mm is minute. You have to fix that for this
> to work. These are Java format strings.
>
> On Tue, Jun 14, 2022, 12:32 PM marc nicole 
> wrote:
>
>> Hi,
>>
>> I want to identify a column of dates as such, the column has
>> formatted strings in the likes of: "06-14-2022" (the format being
>> mm-dd-) and get the minimum of those dates.
>>
>> I tried in Java as follows:
>>
>> if (dataset.filter(org.apache.spark.sql.functions.to_date(
>>> dataset.col(colName), 
>>> "mm-dd-").isNotNull()).select(colName).count() !=
>>> 0) { 
>>
>>
>> And to get the *min *of the column:
>>
>> Object colMin =
>>> dataset.agg(org.apache.spark.sql.functions.min(org.apache.spark.sql.functions.to_date(dataset.col(colName),
>>> "mm-dd-"))).first().get(0);
>>
>> // then I cast the *colMin *to string.
>>
>> To note that if i don't apply *to_date*() to the target column then
>> the result will be erroneous (i think Spark will take the values as 
>> string
>> and will get the min as if it was applied on an alphabetical string).
>>
>> Any better approach to accomplish this?
>> Thanks.
>>
>


Re: How to recognize and get the min of a date/string column in Java?

2022-06-14 Thread Sean Owen
Look at your data - doesn't match date format you give

On Tue, Jun 14, 2022, 3:41 PM marc nicole  wrote:

> for the input  (I changed the format)  :
>
> +---+
> |Date|
> +---+
> | 2019-02-08 |
> ++
> | 2019-02-07 |
> ++
> | 2019-12-01 |
> ++
> | 2015-02-02 |
> ++
> | 2012-02-03 |
> ++
> | 2018-05-06 |
> ++
> | 2022-02-08 |
> ++
> the output was 2012-01-03
>
> To note that for my below code to work I cast to string the resulting min
> column.
>
> Le mar. 14 juin 2022 à 21:12, Sean Owen  a écrit :
>
>> You haven't shown your input or the result
>>
>> On Tue, Jun 14, 2022 at 1:40 PM marc nicole  wrote:
>>
>>> Hi Sean,
>>>
>>> Even with MM for months it gives incorrect (but different this time) min
>>> value.
>>>
>>> Le mar. 14 juin 2022 à 20:18, Sean Owen  a écrit :
>>>
 Yes that is right. It has to be parsed as a date to correctly reason
 about ordering. Otherwise you are finding the minimum string
 alphabetically.

 Small note, MM is month. mm is minute. You have to fix that for this to
 work. These are Java format strings.

 On Tue, Jun 14, 2022, 12:32 PM marc nicole  wrote:

> Hi,
>
> I want to identify a column of dates as such, the column has formatted
> strings in the likes of: "06-14-2022" (the format being mm-dd-) and 
> get
> the minimum of those dates.
>
> I tried in Java as follows:
>
> if (dataset.filter(org.apache.spark.sql.functions.to_date(
>> dataset.col(colName), "mm-dd-").isNotNull()).select(colName).count() 
>> !=
>> 0) { 
>
>
> And to get the *min *of the column:
>
> Object colMin =
>> dataset.agg(org.apache.spark.sql.functions.min(org.apache.spark.sql.functions.to_date(dataset.col(colName),
>> "mm-dd-"))).first().get(0);
>
> // then I cast the *colMin *to string.
>
> To note that if i don't apply *to_date*() to the target column then
> the result will be erroneous (i think Spark will take the values as string
> and will get the min as if it was applied on an alphabetical string).
>
> Any better approach to accomplish this?
> Thanks.
>



Re: How to recognize and get the min of a date/string column in Java?

2022-06-14 Thread marc nicole
for the input  (I changed the format)  :

+---+
|Date|
+---+
| 2019-02-08 |
++
| 2019-02-07 |
++
| 2019-12-01 |
++
| 2015-02-02 |
++
| 2012-02-03 |
++
| 2018-05-06 |
++
| 2022-02-08 |
++
the output was 2012-01-03

To note that for my below code to work I cast to string the resulting min
column.

Le mar. 14 juin 2022 à 21:12, Sean Owen  a écrit :

> You haven't shown your input or the result
>
> On Tue, Jun 14, 2022 at 1:40 PM marc nicole  wrote:
>
>> Hi Sean,
>>
>> Even with MM for months it gives incorrect (but different this time) min
>> value.
>>
>> Le mar. 14 juin 2022 à 20:18, Sean Owen  a écrit :
>>
>>> Yes that is right. It has to be parsed as a date to correctly reason
>>> about ordering. Otherwise you are finding the minimum string
>>> alphabetically.
>>>
>>> Small note, MM is month. mm is minute. You have to fix that for this to
>>> work. These are Java format strings.
>>>
>>> On Tue, Jun 14, 2022, 12:32 PM marc nicole  wrote:
>>>
 Hi,

 I want to identify a column of dates as such, the column has formatted
 strings in the likes of: "06-14-2022" (the format being mm-dd-) and get
 the minimum of those dates.

 I tried in Java as follows:

 if (dataset.filter(org.apache.spark.sql.functions.to_date(
> dataset.col(colName), "mm-dd-").isNotNull()).select(colName).count() 
> !=
> 0) { 


 And to get the *min *of the column:

 Object colMin =
> dataset.agg(org.apache.spark.sql.functions.min(org.apache.spark.sql.functions.to_date(dataset.col(colName),
> "mm-dd-"))).first().get(0);

 // then I cast the *colMin *to string.

 To note that if i don't apply *to_date*() to the target column then
 the result will be erroneous (i think Spark will take the values as string
 and will get the min as if it was applied on an alphabetical string).

 Any better approach to accomplish this?
 Thanks.

>>>


Re: How to recognize and get the min of a date/string column in Java?

2022-06-14 Thread marc nicole
Hi Sean,

Even with MM for months it gives incorrect (but different this time) min
value.

Le mar. 14 juin 2022 à 20:18, Sean Owen  a écrit :

> Yes that is right. It has to be parsed as a date to correctly reason about
> ordering. Otherwise you are finding the minimum string alphabetically.
>
> Small note, MM is month. mm is minute. You have to fix that for this to
> work. These are Java format strings.
>
> On Tue, Jun 14, 2022, 12:32 PM marc nicole  wrote:
>
>> Hi,
>>
>> I want to identify a column of dates as such, the column has formatted
>> strings in the likes of: "06-14-2022" (the format being mm-dd-) and get
>> the minimum of those dates.
>>
>> I tried in Java as follows:
>>
>> if (dataset.filter(org.apache.spark.sql.functions.to_date(
>>> dataset.col(colName), "mm-dd-").isNotNull()).select(colName).count() !=
>>> 0) { 
>>
>>
>> And to get the *min *of the column:
>>
>> Object colMin =
>>> dataset.agg(org.apache.spark.sql.functions.min(org.apache.spark.sql.functions.to_date(dataset.col(colName),
>>> "mm-dd-"))).first().get(0);
>>
>> // then I cast the *colMin *to string.
>>
>> To note that if i don't apply *to_date*() to the target column then the
>> result will be erroneous (i think Spark will take the values as string and
>> will get the min as if it was applied on an alphabetical string).
>>
>> Any better approach to accomplish this?
>> Thanks.
>>
>


Re: How to recognize and get the min of a date/string column in Java?

2022-06-14 Thread Sean Owen
Yes that is right. It has to be parsed as a date to correctly reason about
ordering. Otherwise you are finding the minimum string alphabetically.

Small note, MM is month. mm is minute. You have to fix that for this to
work. These are Java format strings.

On Tue, Jun 14, 2022, 12:32 PM marc nicole  wrote:

> Hi,
>
> I want to identify a column of dates as such, the column has formatted
> strings in the likes of: "06-14-2022" (the format being mm-dd-) and get
> the minimum of those dates.
>
> I tried in Java as follows:
>
> if (dataset.filter(org.apache.spark.sql.functions.to_date(
>> dataset.col(colName), "mm-dd-").isNotNull()).select(colName).count() !=
>> 0) { 
>
>
> And to get the *min *of the column:
>
> Object colMin =
>> dataset.agg(org.apache.spark.sql.functions.min(org.apache.spark.sql.functions.to_date(dataset.col(colName),
>> "mm-dd-"))).first().get(0);
>
> // then I cast the *colMin *to string.
>
> To note that if i don't apply *to_date*() to the target column then the
> result will be erroneous (i think Spark will take the values as string and
> will get the min as if it was applied on an alphabetical string).
>
> Any better approach to accomplish this?
> Thanks.
>


How to recognize and get the min of a date/string column in Java?

2022-06-14 Thread marc nicole
Hi,

I want to identify a column of dates as such, the column has formatted
strings in the likes of: "06-14-2022" (the format being mm-dd-) and get
the minimum of those dates.

I tried in Java as follows:

if (dataset.filter(org.apache.spark.sql.functions.to_date(
> dataset.col(colName), "mm-dd-").isNotNull()).select(colName).count() !=
> 0) { 


And to get the *min *of the column:

Object colMin =
> dataset.agg(org.apache.spark.sql.functions.min(org.apache.spark.sql.functions.to_date(dataset.col(colName),
> "mm-dd-"))).first().get(0);

// then I cast the *colMin *to string.

To note that if i don't apply *to_date*() to the target column then the
result will be erroneous (i think Spark will take the values as string and
will get the min as if it was applied on an alphabetical string).

Any better approach to accomplish this?
Thanks.