[GitHub] [spark] itholic commented on a diff in pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-09-21 Thread via GitHub


itholic commented on code in PR #40420:
URL: https://github.com/apache/spark/pull/40420#discussion_r1333745403


##
python/pyspark/pandas/datetimes.py:
##
@@ -116,26 +117,57 @@ def pandas_microsecond(s) -> ps.Series[np.int32]:  # type: ignore[no-untyped-def
     def nanosecond(self) -> "ps.Series":
         raise NotImplementedError()
 
-    # TODO(SPARK-42617): Support isocalendar.week and replace it.
-    # See also https://github.com/pandas-dev/pandas/pull/33595.
-    @property
-    def week(self) -> "ps.Series":
+    def isocalendar(self) -> "ps.DataFrame":
         """
-        The week ordinal of the year.
+        Calculate year, week, and day according to the ISO 8601 standard.
 
-        .. deprecated:: 3.4.0
-        """
-        warnings.warn(
-            "weekofyear and week have been deprecated.",
-            FutureWarning,
-        )
-        return self._data.spark.transform(lambda c: F.weekofyear(c).cast(LongType()))
+        .. versionadded:: 4.0.0
 
-    @property
-    def weekofyear(self) -> "ps.Series":
-        return self.week
+        Returns
+        -------
+        DataFrame
+            With columns year, week and day.
 
-    weekofyear.__doc__ = week.__doc__
+        .. note:: Returns have int64 type instead of UInt32 as is in pandas due to UInt32
+            is not supported by spark
+
+        Examples
+        --------
+        >>> dfs = ps.from_pandas(pd.date_range(start='2019-12-29', freq='D', periods=4).to_series())
+        >>> dfs.dt.isocalendar()
+                    year  week  day
+        2019-12-29  2019    52    7
+        2019-12-30  2020     1    1
+        2019-12-31  2020     1    2
+        2020-01-01  2020     1    3
+
+        >>> dfs.dt.isocalendar().week
+        2019-12-29    52
+        2019-12-30     1
+        2019-12-31     1
+        2020-01-01     1
+        Name: week, dtype: int64
+        """
+
+        def pandas_isocalendar(
+            pdf: pd.DataFrame,
+        ) -> ps.DataFrame[Any]:

Review Comment:
   Oh, sorry, we can't use this kind of typing for a Pandas UDF. Let's go back to the previous way.
   ```suggestion
        return_types = [self._data.index.dtype, int, int, int]

        def pandas_isocalendar(
            pdf: pd.DataFrame,
        ) -> ps.DataFrame[return_types]:  # type: ignore[valid-type]
   ```
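A note on the suggestion above: pandas-on-Spark derives the Spark schema of a function like `pandas_isocalendar` from its return-type annotation, which is why the dtypes must be spelled out (and why mypy needs the `type: ignore`). Below is a minimal, self-contained sketch of that pattern, using made-up data and dtypes of my own rather than the PR's code:

```python
import pandas as pd
import pyspark.pandas as ps

psdf = ps.DataFrame({"a": [1, 2, 3]})

# The annotation is evaluated when the function is defined, so the dtype
# list can be built dynamically, as in the suggestion above.
return_types = [int, int]

def double_cols(pdf: pd.DataFrame) -> ps.DataFrame[return_types]:  # type: ignore[valid-type]
    # Return two integer columns; the output schema comes from the
    # annotation, so the columns come back with default names (c0, c1).
    return pd.DataFrame({"x": pdf["a"], "y": pdf["a"] * 2})

print(psdf.pandas_on_spark.apply_batch(double_cols))
```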






[GitHub] [spark] itholic commented on a diff in pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-09-20 Thread via GitHub


itholic commented on code in PR #40420:
URL: https://github.com/apache/spark/pull/40420#discussion_r1332390021


##
python/pyspark/pandas/datetimes.py:
##
@@ -116,26 +117,59 @@ def pandas_microsecond(s) -> ps.Series[np.int32]:  # type: ignore[no-untyped-def
     def nanosecond(self) -> "ps.Series":
         raise NotImplementedError()
 
-    # TODO(SPARK-42617): Support isocalendar.week and replace it.
-    # See also https://github.com/pandas-dev/pandas/pull/33595.
-    @property
-    def week(self) -> "ps.Series":
+    def isocalendar(self) -> "ps.DataFrame":
         """
-        The week ordinal of the year.
+        Calculate year, week, and day according to the ISO 8601 standard.
 
-        .. deprecated:: 3.4.0
-        """
-        warnings.warn(
-            "weekofyear and week have been deprecated.",
-            FutureWarning,
-        )
-        return self._data.spark.transform(lambda c: F.weekofyear(c).cast(LongType()))
+        .. versionadded:: 4.0.0
 
-    @property
-    def weekofyear(self) -> "ps.Series":
-        return self.week
+        Returns
+        -------
+        DataFrame
+            With columns year, week and day.
 
-    weekofyear.__doc__ = week.__doc__
+        .. note:: Returns have int64 type instead of UInt32 as is in pandas due to UInt32
+            is not supported by spark
+
+        Examples
+        --------
+        >>> dfs = ps.from_pandas(pd.date_range(start='2019-12-29', freq='D', periods=4).to_series())
+        >>> dfs.dt.isocalendar()
+                    year  week  day
+        2019-12-29  2019    52    7
+        2019-12-30  2020     1    1
+        2019-12-31  2020     1    2
+        2020-01-01  2020     1    3
+
+        >>> dfs.dt.isocalendar().week
+        2019-12-29    52
+        2019-12-30     1
+        2019-12-31     1
+        2020-01-01     1
+        Name: week, dtype: int64
+        """
+
+        return_types = [self._data.index.dtype, int, int, int]
+
+        def pandas_isocalendar(  # type: ignore[no-untyped-def]
+            pdf,
+        ) -> ps.DataFrame[return_types]:  # type: ignore[valid-type]

Review Comment:
   ```suggestion
        def pandas_isocalendar(
            pdf: pd.DataFrame,
        ) -> ps.DataFrame[Any]:
   ```






[GitHub] [spark] itholic commented on a diff in pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-09-20 Thread via GitHub


itholic commented on code in PR #40420:
URL: https://github.com/apache/spark/pull/40420#discussion_r1332376664


##
python/pyspark/pandas/datetimes.py:
##
@@ -116,26 +117,59 @@ def pandas_microsecond(s) -> ps.Series[np.int32]:  # type: ignore[no-untyped-def
     def nanosecond(self) -> "ps.Series":
         raise NotImplementedError()
 
-    # TODO(SPARK-42617): Support isocalendar.week and replace it.
-    # See also https://github.com/pandas-dev/pandas/pull/33595.
-    @property
-    def week(self) -> "ps.Series":
+    def isocalendar(self) -> "ps.DataFrame":
         """
-        The week ordinal of the year.
+        Calculate year, week, and day according to the ISO 8601 standard.
 
-        .. deprecated:: 3.4.0
-        """
-        warnings.warn(
-            "weekofyear and week have been deprecated.",
-            FutureWarning,
-        )
-        return self._data.spark.transform(lambda c: F.weekofyear(c).cast(LongType()))
+        .. versionadded:: 4.0.0
 
-    @property
-    def weekofyear(self) -> "ps.Series":
-        return self.week
+        Returns
+        -------
+        DataFrame
+            With columns year, week and day.
 
-    weekofyear.__doc__ = week.__doc__
+        .. note:: Returns have int64 type instead of UInt32 as is in pandas due to UInt32
+            is not supported by spark
+
+        Examples
+        --------

Review Comment:
   We can merge it before Spark 4.0.0, so we have enough time, though.






[GitHub] [spark] itholic commented on a diff in pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-09-20 Thread via GitHub


itholic commented on code in PR #40420:
URL: https://github.com/apache/spark/pull/40420#discussion_r1332376264


##
python/pyspark/pandas/datetimes.py:
##
@@ -116,26 +117,59 @@ def pandas_microsecond(s) -> ps.Series[np.int32]:  # type: ignore[no-untyped-def
     def nanosecond(self) -> "ps.Series":
         raise NotImplementedError()
 
-    # TODO(SPARK-42617): Support isocalendar.week and replace it.
-    # See also https://github.com/pandas-dev/pandas/pull/33595.
-    @property
-    def week(self) -> "ps.Series":
+    def isocalendar(self) -> "ps.DataFrame":
         """
-        The week ordinal of the year.
+        Calculate year, week, and day according to the ISO 8601 standard.
 
-        .. deprecated:: 3.4.0
-        """
-        warnings.warn(
-            "weekofyear and week have been deprecated.",
-            FutureWarning,
-        )
-        return self._data.spark.transform(lambda c: F.weekofyear(c).cast(LongType()))
+        .. versionadded:: 4.0.0
 
-    @property
-    def weekofyear(self) -> "ps.Series":
-        return self.week
+        Returns
+        -------
+        DataFrame
+            With columns year, week and day.
 
-    weekofyear.__doc__ = week.__doc__
+        .. note:: Returns have int64 type instead of UInt32 as is in pandas due to UInt32
+            is not supported by spark
+
+        Examples
+        --------

Review Comment:
   Sure! It's not very urgent, so please take your time.






[GitHub] [spark] itholic commented on a diff in pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-09-19 Thread via GitHub


itholic commented on code in PR #40420:
URL: https://github.com/apache/spark/pull/40420#discussion_r1330865687


##
python/pyspark/pandas/datetimes.py:
##
@@ -116,26 +117,59 @@ def pandas_microsecond(s) -> ps.Series[np.int32]:  # type: ignore[no-untyped-def
     def nanosecond(self) -> "ps.Series":
         raise NotImplementedError()
 
-    # TODO(SPARK-42617): Support isocalendar.week and replace it.
-    # See also https://github.com/pandas-dev/pandas/pull/33595.
-    @property
-    def week(self) -> "ps.Series":
+    def isocalendar(self) -> "ps.DataFrame":
         """
-        The week ordinal of the year.
+        Calculate year, week, and day according to the ISO 8601 standard.
 
-        .. deprecated:: 3.4.0
-        """
-        warnings.warn(
-            "weekofyear and week have been deprecated.",
-            FutureWarning,
-        )
-        return self._data.spark.transform(lambda c: F.weekofyear(c).cast(LongType()))
+        .. versionadded:: 4.0.0
 
-    @property
-    def weekofyear(self) -> "ps.Series":
-        return self.week
+        Returns
+        -------
+        DataFrame
+            With columns year, week and day.
 
-    weekofyear.__doc__ = week.__doc__
+        .. note:: Returns have int64 type instead of UInt32 as is in pandas due to UInt32
+            is not supported by spark
+
+        Examples
+        --------

Review Comment:
   @dzhigimont Could you take a look at this comment when you find some time? We can do it in a separate PR in the future if you don't have enough time for this right now :-)






[GitHub] [spark] itholic commented on a diff in pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-09-07 Thread via GitHub


itholic commented on code in PR #40420:
URL: https://github.com/apache/spark/pull/40420#discussion_r1319259812


##
python/pyspark/pandas/indexes/datetimes.py:
##
@@ -214,28 +215,8 @@ def microsecond(self) -> Index:
         )
         return Index(self.to_series().dt.microsecond)
 
-    @property
-    def week(self) -> Index:
-        """
-        The week ordinal of the year.
-
-        .. deprecated:: 3.5.0
-        """
-        warnings.warn(
-            "`week` is deprecated in 3.5.0 and will be removed in 4.0.0.",
-            FutureWarning,
-        )
-        return Index(self.to_series().dt.week)
-
-    @property
-    def weekofyear(self) -> Index:
-        warnings.warn(
-            "`weekofyear` is deprecated in 3.5.0 and will be removed in 4.0.0.",
-            FutureWarning,
-        )
-        return Index(self.to_series().dt.weekofyear)
-
-    weekofyear.__doc__ = week.__doc__
+    def isocalendar(self) -> DataFrame:

Review Comment:
   Do we need a docstring?



##
python/pyspark/pandas/datetimes.py:
##
@@ -116,26 +117,59 @@ def pandas_microsecond(s) -> ps.Series[np.int32]:  # type: ignore[no-untyped-def
     def nanosecond(self) -> "ps.Series":
         raise NotImplementedError()
 
-    # TODO(SPARK-42617): Support isocalendar.week and replace it.
-    # See also https://github.com/pandas-dev/pandas/pull/33595.
-    @property
-    def week(self) -> "ps.Series":
+    def isocalendar(self) -> "ps.DataFrame":
         """
-        The week ordinal of the year.
+        Calculate year, week, and day according to the ISO 8601 standard.
 
-        .. deprecated:: 3.4.0
-        """
-        warnings.warn(
-            "weekofyear and week have been deprecated.",
-            FutureWarning,
-        )
-        return self._data.spark.transform(lambda c: F.weekofyear(c).cast(LongType()))
+        .. versionadded:: 4.0.0
 
-    @property
-    def weekofyear(self) -> "ps.Series":
-        return self.week
+        Returns
+        -------
+        DataFrame
+            With columns year, week and day.
 
-    weekofyear.__doc__ = week.__doc__
+        .. note:: Returns have int64 type instead of UInt32 as is in pandas due to UInt32
+            is not supported by spark
+
+        Examples
+        --------

Review Comment:
   Could we add an example that includes `pd.NaT`? It seems this case is not working currently:
   
   **Pandas**
   ```python
   >>> ser = pd.to_datetime(pd.Series(["2010-01-01", pd.NaT]))
   >>> ser.dt.isocalendar()
      year  week  day
   0  2009    53    5
   1  <NA>  <NA>  <NA>
   ```
   
   **Current implementation**
   ```python
   >>> ser = pd.to_datetime(pd.Series(["2010-01-01", pd.NaT]))
   >>> psser = ps.from_pandas(ser)
   >>> psser.dt.isocalendar()
   # ValueError: cannot convert NA to integer
   ```
   
   In Spark, we can't use mixed types within a single column, so we should convert `NA` to a proper type (e.g. use NaN instead of `NA` for a float-typed column in this case), as below:
   
   ```python
   >>> psser.dt.week
   0    53.0
   1     NaN
   dtype: float64
   ```
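As a side note on the `NaT` discussion, here is a small sketch (my own illustration, not the PR's fix) of how Spark's built-in datetime functions propagate nulls, which sidesteps the integer-cast error at the cost of a nullable column:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# A null timestamp stays null through weekofyear(); there is no
# "cannot convert NA to integer" error, since no pandas-side int cast occurs.
df = spark.createDataFrame([("2010-01-01",), (None,)], ["d"])
df.select(F.weekofyear(F.to_timestamp("d")).alias("week")).show()
# week = 53 for 2010-01-01, null for the missing value
```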






[GitHub] [spark] itholic commented on a diff in pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-08-30 Thread via GitHub


itholic commented on code in PR #40420:
URL: https://github.com/apache/spark/pull/40420#discussion_r1310662293


##
python/pyspark/pandas/datetimes.py:
##
@@ -116,26 +117,55 @@ def pandas_microsecond(s) -> ps.Series[np.int32]:  # type: ignore[no-untyped-def
     def nanosecond(self) -> "ps.Series":
         raise NotImplementedError()
 
-    # TODO(SPARK-42617): Support isocalendar.week and replace it.
-    # See also https://github.com/pandas-dev/pandas/pull/33595.
-    @property
-    def week(self) -> "ps.Series":
+    def isocalendar(self) -> "ps.DataFrame":
         """
-        The week ordinal of the year.
+        Calculate year, week, and day according to the ISO 8601 standard.
 
-        .. deprecated:: 3.4.0
-        """
-        warnings.warn(
-            "weekofyear and week have been deprecated.",
-            FutureWarning,
-        )
-        return self._data.spark.transform(lambda c: F.weekofyear(c).cast(LongType()))
+        .. versionadded:: 4.0.0
 
-    @property
-    def weekofyear(self) -> "ps.Series":
-        return self.week
+        Returns
+        -------
+        DataFrame
+            With columns year, week and day.
 
-    weekofyear.__doc__ = week.__doc__
+        Examples
+        --------
+        >>> dfs = ps.from_pandas(pd.date_range(start='2019-12-29', freq='D', periods=4).to_series())
+        >>> dfs.dt.isocalendar()
+                    year  week  day
+        2019-12-29  2019    52    7
+        2019-12-30  2020     1    1
+        2019-12-31  2020     1    2
+        2020-01-01  2020     1    3
+        >>> dfs.dt.isocalendar().week
+        2019-12-29    52
+        2019-12-30     1
+        2019-12-31     1
+        2020-01-01     1
+        Name: week, dtype: int64
+        """
+
+        return_types = [self._data.index.dtype, int, int, int]
+
+        def pandas_isocalendar(  # type: ignore[no-untyped-def]
+            pdf,
+        ) -> ps.DataFrame[return_types]:  # type: ignore[valid-type]
+            # cast to int64 due to UInt32 is not supported by spark

Review Comment:
   > cast to int64 due to UInt32 is not supported by spark

   Does this mean that the result is different from pandas? If so, let's add a "Note" to the docstring so that users recognize this difference, instead of just adding the comment here.
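For reference, the dtype difference being discussed is easy to see on the pandas side; a small sketch (my illustration) of why the cast exists at all:

```python
import pandas as pd

s = pd.Series(pd.date_range("2019-12-29", periods=2, freq="D"))

# pandas returns nullable UInt32 columns from isocalendar(); Spark has no
# unsigned integer types, hence the cast to int64 in the PR.
print(s.dt.isocalendar().dtypes)
# year    UInt32
# week    UInt32
# day     UInt32
# dtype: object
```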






[GitHub] [spark] itholic commented on a diff in pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-08-30 Thread via GitHub


itholic commented on code in PR #40420:
URL: https://github.com/apache/spark/pull/40420#discussion_r1310660048


##
python/pyspark/pandas/datetimes.py:
##
@@ -116,26 +117,55 @@ def pandas_microsecond(s) -> ps.Series[np.int32]:  # type: ignore[no-untyped-def
     def nanosecond(self) -> "ps.Series":
         raise NotImplementedError()
 
-    # TODO(SPARK-42617): Support isocalendar.week and replace it.
-    # See also https://github.com/pandas-dev/pandas/pull/33595.
-    @property
-    def week(self) -> "ps.Series":
+    def isocalendar(self) -> "ps.DataFrame":
         """
-        The week ordinal of the year.
+        Calculate year, week, and day according to the ISO 8601 standard.
 
-        .. deprecated:: 3.4.0
-        """
-        warnings.warn(
-            "weekofyear and week have been deprecated.",
-            FutureWarning,
-        )
-        return self._data.spark.transform(lambda c: F.weekofyear(c).cast(LongType()))
+        .. versionadded:: 4.0.0
 
-    @property
-    def weekofyear(self) -> "ps.Series":
-        return self.week
+        Returns
+        -------
+        DataFrame
+            With columns year, week and day.
 
-    weekofyear.__doc__ = week.__doc__
+        Examples
+        --------
+        >>> dfs = ps.from_pandas(pd.date_range(start='2019-12-29', freq='D', periods=4).to_series())
+        >>> dfs.dt.isocalendar()
+                    year  week  day
+        2019-12-29  2019    52    7
+        2019-12-30  2020     1    1
+        2019-12-31  2020     1    2
+        2020-01-01  2020     1    3

Review Comment:
   nit: Could you add a new line between the examples so they are split when displayed in the documentation?






[GitHub] [spark] itholic commented on a diff in pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-08-30 Thread via GitHub


itholic commented on code in PR #40420:
URL: https://github.com/apache/spark/pull/40420#discussion_r1310665191


##
python/pyspark/pandas/tests/indexes/test_datetime.py:
##
@@ -269,6 +256,10 @@ def test_map(self):
         mapper_pser = pd.Series([1, 2, 3], index=pidx)
         self.assert_eq(psidx.map(mapper_pser), pidx.map(mapper_pser))
 
+    def test_isocalendar(self):

Review Comment:
   If we want to add a new test, then I think it's better to also move the related tests from "test_properties" here.
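For illustration, one possible shape for such a comparison, as a standalone sketch (my own code, assuming the PR's `isocalendar` lands as shown; the names and structure are hypothetical, not the PR's actual test):

```python
import pandas as pd
import pyspark.pandas as ps

pser = pd.date_range("2019-12-29", periods=4, freq="D").to_series()
psser = ps.from_pandas(pser)

# pandas yields UInt32 columns, pandas-on-Spark yields int64, so cast the
# pandas result before comparing.
pd.testing.assert_frame_equal(
    psser.dt.isocalendar().to_pandas(),
    pser.dt.isocalendar().astype("int64"),
)
```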



##
python/pyspark/pandas/tests/indexes/test_datetime.py:
##
@@ -101,22 +102,8 @@ def test_properties(self):
         self.assert_eq(psidx.day_of_week, pidx.day_of_week)
 
         if LooseVersion(pd.__version__) >= LooseVersion("2.0.0"):

Review Comment:
   Let's remove this to focus on testing for the latest pandas version.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org