Re: [PR] [WIP][SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]

2024-02-13 Thread via GitHub


itholic commented on code in PR #44881:
URL: https://github.com/apache/spark/pull/44881#discussion_r1487597019


##
python/pyspark/pandas/frame.py:
##
@@ -10607,7 +10607,9 @@ def melt(
 name_like_string(name) if name is not None else "variable_{}".format(i)
 for i, name in enumerate(self._internal.column_label_names)
 ]
-elif isinstance(var_name, str):
+elif is_list_like(var_name):
+raise ValueError(f"{var_name=} must be a scalar.")
+else:

Review Comment:
   This change follows the pandas fix in https://github.com/pandas-dev/pandas/pull/55948
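
As a rough illustration of the branch order in the diff above (list-like values are rejected first, and anything else is treated as a scalar), here is a minimal sketch. The helper `resolve_var_name`, its `default` argument, and the local `is_list_like` are hypothetical stand-ins, not the code in `frame.py`; the real change relies on an existing `is_list_like` helper, and the body of the final `else` branch is not shown in the hunk.

```python
from collections.abc import Iterable


def is_list_like(obj):
    # Simplified stand-in (assumption): any iterable except str/bytes
    # counts as list-like. The real helper handles more edge cases.
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))


def resolve_var_name(var_name, default="variable"):
    """Hypothetical helper mirroring the if/elif/else order in the diff."""
    if var_name is None:
        # The real code derives names from the column label names here.
        return default
    elif is_list_like(var_name):
        # Same message format as the added line in the diff.
        raise ValueError(f"{var_name=} must be a scalar.")
    else:
        # Any scalar is accepted, not only str as in the old isinstance check.
        return var_name
```

With this ordering, a non-string scalar such as `1` passes through, while `["a", "b"]` raises `ValueError`.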



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [WIP][SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]

2024-01-29 Thread via GitHub


dongjoon-hyun commented on code in PR #44881:
URL: https://github.com/apache/spark/pull/44881#discussion_r1470668096


##
python/pyspark/pandas/namespace.py:
##
@@ -2554,7 +2553,7 @@ def resolve_func(psdf, this_column_labels, that_column_labels):
 if isinstance(obj, Series):
 num_series += 1
 series_names.add(obj.name)
-new_objs.append(obj.to_frame(DEFAULT_SERIES_NAME))
+new_objs.append(obj.to_frame())

Review Comment:
   Thank you for the confirmation, @itholic .






Re: [PR] [WIP][SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]

2024-01-29 Thread via GitHub


itholic commented on code in PR #44881:
URL: https://github.com/apache/spark/pull/44881#discussion_r1470651441


##
python/pyspark/pandas/namespace.py:
##
@@ -2554,7 +2553,7 @@ def resolve_func(psdf, this_column_labels, that_column_labels):
 if isinstance(obj, Series):
 num_series += 1
 series_names.add(obj.name)
-new_objs.append(obj.to_frame(DEFAULT_SERIES_NAME))
+new_objs.append(obj.to_frame())

Review Comment:
   Yes, this actually follows up on a bug fix in Pandas 
(https://github.com/pandas-dev/pandas/issues/15047), so I believe it 
should be fine.






Re: [PR] [WIP][SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]

2024-01-29 Thread via GitHub


dongjoon-hyun commented on code in PR #44881:
URL: https://github.com/apache/spark/pull/44881#discussion_r1470641012


##
python/pyspark/pandas/namespace.py:
##
@@ -2554,7 +2553,7 @@ def resolve_func(psdf, this_column_labels, that_column_labels):
 if isinstance(obj, Series):
 num_series += 1
 series_names.add(obj.name)
-new_objs.append(obj.to_frame(DEFAULT_SERIES_NAME))
+new_objs.append(obj.to_frame())

Review Comment:
   Just a question: after this change, do we still support old Pandas versions?






Re: [PR] [WIP][SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]

2024-01-29 Thread via GitHub


itholic commented on code in PR #44881:
URL: https://github.com/apache/spark/pull/44881#discussion_r1470461086


##
dev/infra/Dockerfile:
##
@@ -91,10 +91,10 @@ RUN mkdir -p /usr/local/pypy/pypy3.8 && \
 ln -sf /usr/local/pypy/pypy3.8/bin/pypy /usr/local/bin/pypy3.8 && \
 ln -sf /usr/local/pypy/pypy3.8/bin/pypy /usr/local/bin/pypy3
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | pypy3
-RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas<=2.1.4' scipy coverage matplotlib lxml
+RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas==2.2.0' scipy coverage matplotlib lxml

Review Comment:
   Let me just use an upper bound for now, since this is only an issue on the PyPy3 CI.






Re: [PR] [WIP][SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]

2024-01-28 Thread via GitHub


itholic commented on code in PR #44881:
URL: https://github.com/apache/spark/pull/44881#discussion_r1469178246


##
dev/infra/Dockerfile:
##
@@ -91,10 +91,10 @@ RUN mkdir -p /usr/local/pypy/pypy3.8 && \
 ln -sf /usr/local/pypy/pypy3.8/bin/pypy /usr/local/bin/pypy3.8 && \
 ln -sf /usr/local/pypy/pypy3.8/bin/pypy /usr/local/bin/pypy3
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | pypy3
-RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas<=2.1.4' scipy coverage matplotlib lxml
+RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas==2.2.0' scipy coverage matplotlib lxml

Review Comment:
   @zhengruifeng This breaks CI because Pandas 2.2.0 is not yet supported on 
PyPy3.
   
   ```
   #18 2.142 ERROR: Ignored the following versions that require a different 
python version: 1.25.0 Requires-Python >=3.9; 1.25.0rc1 Requires-Python >=3.9; 
1.25.1 Requires-Python >=3.9; 1.25.2 Requires-Python >=3.9; 1.26.0 
Requires-Python <3.13,>=3.9; 1.26.0b1 Requires-Python <3.13,>=3.9; 1.26.0rc1 
Requires-Python <3.13,>=3.9; 1.26.1 Requires-Python <3.13,>=3.9; 1.26.2 
Requires-Python >=3.9; 1.26.3 Requires-Python >=3.9; 2.1.0 Requires-Python 
>=3.9; 2.1.0rc0 Requires-Python >=3.9; 2.1.1 Requires-Python >=3.9; 2.1.2 
Requires-Python >=3.9; 2.1.3 Requires-Python >=3.9; 2.1.4 Requires-Python 
>=3.9; 2.2.0 Requires-Python >=3.9; 2.2.0rc0 Requires-Python >=3.9
   #18 2.145 ERROR: Could not find a version that satisfies the requirement 
pandas==2.2.0 (from versions: 0.1, 0.2, 0.3.0, 0.4.0, 0.4.1, 0.4.2, 0.4.3, 
0.5.0, 0.6.0, 0.6.1, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.8.0, 0.8.1, 0.9.0, 0.9.1, 
0.10.0, 0.10.1, 0.11.0, 0.12.0, 0.13.0, 0.13.1, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 
0.15.2, 0.16.0, 0.16.1, 0.16.2, 0.17.0, 0.17.1, 0.18.0, 0.18.1, 0.19.0, 0.19.1, 
0.19.2, 0.20.0, 0.20.1, 0.20.2, 0.20.3, 0.21.0, 0.21.1, 0.22.0, 0.23.0, 0.23.1, 
0.23.2, 0.23.3, 0.23.4, 0.24.0, 0.24.1, 0.24.2, 0.25.0, 0.25.1, 0.25.2, 0.25.3, 
1.0.0, 1.0.1, 1.0.2, 1.0.3, 1.0.4, 1.0.5, 1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4, 
1.1.5, 1.2.0, 1.2.1, 1.2.2, 1.2.3, 1.2.4, 1.2.5, 1.3.0, 1.3.1, 1.3.2, 1.3.3, 
1.3.4, 1.3.5, 1.4.0rc0, 1.4.0, 1.4.1, 1.4.2, 1.4.3, 1.4.4, 1.5.0rc0, 1.5.0, 
1.5.1, 1.5.2, 1.5.3, 2.0.0rc0, 2.0.0rc1, 2.0.0, 2.0.1, 2.0.2, 2.0.3)
   #18 2.146 ERROR: No matching distribution found for pandas==2.2.0
   ```
   
   It seems the latest Pandas version currently supported on PyPy3 is 2.0.3. 
Should we hold this PR until Pandas 2.2.0 is supported on PyPy3, or should we 
go back to using an upper bound?
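
The pip log above lists releases that were ignored because of their `Requires-Python` metadata, and the Dockerfile hunk installs into `pypy3.8`. The sketch below illustrates, under simplifying assumptions, why a `>=3.9` constraint excludes a 3.8 interpreter. The functions `parse_version` and `satisfies` are hypothetical and handle only the `>=` and `<` clauses seen in the log; this is not pip's actual resolver logic.

```python
def parse_version(text):
    """Turn a dotted version such as '3.13' into a comparable tuple (3, 13)."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(requires_python, interpreter):
    """Check an interpreter version against clauses like '<3.13,>=3.9'.

    Simplified sketch: only '>=' and '<' clauses are supported.
    """
    current = parse_version(interpreter)
    for clause in requires_python.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            if current < parse_version(clause[2:]):
                return False
        elif clause.startswith("<") and not clause.startswith("<="):
            if current >= parse_version(clause[1:]):
                return False
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True
```

For example, `satisfies(">=3.9", "3.8")` is false, which matches the way releases marked `Requires-Python >=3.9` are skipped in the log.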


