Re: [PR] [WIP][SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]
itholic commented on code in PR #44881:
URL: https://github.com/apache/spark/pull/44881#discussion_r1487597019

## python/pyspark/pandas/frame.py: ##
@@ -10607,7 +10607,9 @@ def melt(
         name_like_string(name) if name is not None else "variable_{}".format(i)
         for i, name in enumerate(self._internal.column_label_names)
     ]
-elif isinstance(var_name, str):
+elif is_list_like(var_name):
+    raise ValueError(f"{var_name=} must be a scalar.")
+else:

Review Comment:
This follows the fix in https://github.com/pandas-dev/pandas/pull/55948.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
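A minimal sketch of the new `var_name` check in isolation, assuming the branch structure shown in the diff. `normalize_var_name` is a hypothetical helper, `str` stands in for pandas-on-Spark's internal `name_like_string`, and the body of the trailing `else:` (not shown in the diff) is assumed to wrap the scalar in a list. The real `pandas.api.types.is_list_like` is used when pandas is installed; otherwise a simplified stand-in that treats strings as scalars is defined.

```python
try:
    from pandas.api.types import is_list_like
except ImportError:  # fallback so the sketch also runs without pandas
    from collections.abc import Iterable

    def is_list_like(obj):
        # Simplified stand-in: iterables are list-like, but str/bytes are scalars.
        return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))


def normalize_var_name(var_name, column_label_names):
    """Hypothetical helper mirroring the var_name branch of melt() in the diff."""
    if var_name is None:
        # Default: one "variable_{i}" name per unnamed column-label level.
        # (str is a stand-in for the internal name_like_string helper.)
        return [
            str(name) if name is not None else "variable_{}".format(i)
            for i, name in enumerate(column_label_names)
        ]
    elif is_list_like(var_name):
        # New behavior: a list-like var_name is rejected up front.
        raise ValueError(f"{var_name=} must be a scalar.")
    else:
        # Assumption: the unshown else branch wraps the scalar in a list.
        return [var_name]
```

Under these assumptions, `normalize_var_name(["x"], [])` raises `ValueError: var_name=['x'] must be a scalar.`, while a scalar such as `"x"` or `1` is returned wrapped in a list.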
dongjoon-hyun commented on code in PR #44881:
URL: https://github.com/apache/spark/pull/44881#discussion_r1470668096

## python/pyspark/pandas/namespace.py: ##
@@ -2554,7 +2553,7 @@ def resolve_func(psdf, this_column_labels, that_column_labels):
 if isinstance(obj, Series):
     num_series += 1
     series_names.add(obj.name)
-    new_objs.append(obj.to_frame(DEFAULT_SERIES_NAME))
+    new_objs.append(obj.to_frame())

Review Comment:
Thank you for the confirmation, @itholic.
itholic commented on code in PR #44881:
URL: https://github.com/apache/spark/pull/44881#discussion_r1470651441

## python/pyspark/pandas/namespace.py: ##
@@ -2554,7 +2553,7 @@ def resolve_func(psdf, this_column_labels, that_column_labels):
 if isinstance(obj, Series):
     num_series += 1
     series_names.add(obj.name)
-    new_objs.append(obj.to_frame(DEFAULT_SERIES_NAME))
+    new_objs.append(obj.to_frame())

Review Comment:
Yes, this actually follows up on a bug fix in Pandas (https://github.com/pandas-dev/pandas/issues/15047), so I believe it should be fine.
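To see what dropping the explicit argument changes, here is a minimal sketch of the column-label rule that pandas `Series.to_frame` follows, assuming pandas-on-Spark mirrors it: an explicit `name` always wins, otherwise the Series keeps its own name, and an unnamed Series falls back to `0`. `to_frame_column_label` and the `_NO_DEFAULT` sentinel are hypothetical, and `DEFAULT_SERIES_NAME = 0` is an assumption about the internal constant.

```python
_NO_DEFAULT = object()  # sentinel meaning "the name argument was not passed"
DEFAULT_SERIES_NAME = 0  # assumption: label used for unnamed Series


def to_frame_column_label(series_name, name=_NO_DEFAULT):
    """Hypothetical helper: which column label does to_frame() produce?"""
    if name is not _NO_DEFAULT:
        # An explicit name always wins, e.g. to_frame(DEFAULT_SERIES_NAME).
        return name
    # No name passed: keep the Series' own name, or fall back to 0 if unnamed.
    return series_name if series_name is not None else DEFAULT_SERIES_NAME
```

In this sketch the old call `to_frame(DEFAULT_SERIES_NAME)` labels every Series' column `0`, discarding a name like `"a"`, whereas the new `to_frame()` keeps `"a"` and still yields `0` for an unnamed Series.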
dongjoon-hyun commented on code in PR #44881:
URL: https://github.com/apache/spark/pull/44881#discussion_r1470641012

## python/pyspark/pandas/namespace.py: ##
@@ -2554,7 +2553,7 @@ def resolve_func(psdf, this_column_labels, that_column_labels):
 if isinstance(obj, Series):
     num_series += 1
     series_names.add(obj.name)
-    new_objs.append(obj.to_frame(DEFAULT_SERIES_NAME))
+    new_objs.append(obj.to_frame())

Review Comment:
Just a question: after this change, do we still support older Pandas versions?
itholic commented on code in PR #44881:
URL: https://github.com/apache/spark/pull/44881#discussion_r1470461086

## dev/infra/Dockerfile: ##
@@ -91,10 +91,10 @@ RUN mkdir -p /usr/local/pypy/pypy3.8 && \
     ln -sf /usr/local/pypy/pypy3.8/bin/pypy /usr/local/bin/pypy3.8 && \
     ln -sf /usr/local/pypy/pypy3.8/bin/pypy /usr/local/bin/pypy3
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | pypy3
-RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas<=2.1.4' scipy coverage matplotlib lxml
+RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas==2.2.0' scipy coverage matplotlib lxml

Review Comment:
Let me just use an upper bound for now, since this is only an issue on the PyPy3 CI.
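Why an upper bound works where the pin fails can be sketched with a simplified stand-in for pip's version resolution. This is not pip's algorithm: it handles only plain numeric versions and a single `==` or `<=` specifier, and `resolve`/`parse_version` are hypothetical helpers. The candidate list is the tail of the stable versions that pip reported as available on PyPy 3.8 in the CI log (release candidates omitted).

```python
# Tail of the stable pandas versions pip listed as available on PyPy 3.8 in the CI log.
AVAILABLE_ON_PYPY38 = ["1.5.3", "2.0.0", "2.0.1", "2.0.2", "2.0.3"]


def parse_version(version):
    """Turn "2.0.3" into (2, 0, 3); release candidates are not handled."""
    return tuple(int(part) for part in version.split("."))


def resolve(operator, bound, available):
    """Return the newest available version matching one specifier, or None."""
    target = parse_version(bound)
    checks = {
        "==": lambda v: v == target,
        "<=": lambda v: v <= target,
    }
    matches = [v for v in available if checks[operator](parse_version(v))]
    return max(matches, key=parse_version) if matches else None


print(resolve("<=", "2.1.4", AVAILABLE_ON_PYPY38))  # 2.0.3: the upper bound still installs
print(resolve("==", "2.2.0", AVAILABLE_ON_PYPY38))  # None: "No matching distribution found"
```

The trade-off is that `pandas<=2.1.4` quietly falls back to an older release on PyPy3, while the exact pin fails loudly.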
itholic commented on code in PR #44881:
URL: https://github.com/apache/spark/pull/44881#discussion_r1469178246

## dev/infra/Dockerfile: ##
@@ -91,10 +91,10 @@ RUN mkdir -p /usr/local/pypy/pypy3.8 && \
     ln -sf /usr/local/pypy/pypy3.8/bin/pypy /usr/local/bin/pypy3.8 && \
     ln -sf /usr/local/pypy/pypy3.8/bin/pypy /usr/local/bin/pypy3
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | pypy3
-RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas<=2.1.4' scipy coverage matplotlib lxml
+RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas==2.2.0' scipy coverage matplotlib lxml

Review Comment:
@zhengruifeng This breaks the CI because Pandas 2.2.0 cannot be installed on PyPy3 yet:

```
#18 2.142 ERROR: Ignored the following versions that require a different python version: 1.25.0 Requires-Python >=3.9; 1.25.0rc1 Requires-Python >=3.9; 1.25.1 Requires-Python >=3.9; 1.25.2 Requires-Python >=3.9; 1.26.0 Requires-Python <3.13,>=3.9; 1.26.0b1 Requires-Python <3.13,>=3.9; 1.26.0rc1 Requires-Python <3.13,>=3.9; 1.26.1 Requires-Python <3.13,>=3.9; 1.26.2 Requires-Python >=3.9; 1.26.3 Requires-Python >=3.9; 2.1.0 Requires-Python >=3.9; 2.1.0rc0 Requires-Python >=3.9; 2.1.1 Requires-Python >=3.9; 2.1.2 Requires-Python >=3.9; 2.1.3 Requires-Python >=3.9; 2.1.4 Requires-Python >=3.9; 2.2.0 Requires-Python >=3.9; 2.2.0rc0 Requires-Python >=3.9
#18 2.145 ERROR: Could not find a version that satisfies the requirement pandas==2.2.0 (from versions: 0.1, 0.2, 0.3.0, 0.4.0, 0.4.1, 0.4.2, 0.4.3, 0.5.0, 0.6.0, 0.6.1, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.8.0, 0.8.1, 0.9.0, 0.9.1, 0.10.0, 0.10.1, 0.11.0, 0.12.0, 0.13.0, 0.13.1, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.15.2, 0.16.0, 0.16.1, 0.16.2, 0.17.0, 0.17.1, 0.18.0, 0.18.1, 0.19.0, 0.19.1, 0.19.2, 0.20.0, 0.20.1, 0.20.2, 0.20.3, 0.21.0, 0.21.1, 0.22.0, 0.23.0, 0.23.1, 0.23.2, 0.23.3, 0.23.4, 0.24.0, 0.24.1, 0.24.2, 0.25.0, 0.25.1, 0.25.2, 0.25.3, 1.0.0, 1.0.1, 1.0.2, 1.0.3, 1.0.4, 1.0.5, 1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4, 1.1.5, 1.2.0, 1.2.1, 1.2.2, 1.2.3, 1.2.4, 1.2.5, 1.3.0, 1.3.1, 1.3.2, 1.3.3, 1.3.4, 1.3.5, 1.4.0rc0, 1.4.0, 1.4.1, 1.4.2, 1.4.3, 1.4.4, 1.5.0rc0, 1.5.0, 1.5.1, 1.5.2, 1.5.3, 2.0.0rc0, 2.0.0rc1, 2.0.0, 2.0.1, 2.0.2, 2.0.3)
#18 2.146 ERROR: No matching distribution found for pandas==2.2.0
```

It seems the latest Pandas version installable on PyPy3 here is 2.0.3 (Pandas 2.1.0 and later require Python >= 3.9, while the image uses PyPy 3.8). Should we put this PR on hold until Pandas 2.2.0 is supported on PyPy3, or should we go back to using an upper bound?
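The "Ignored the following versions" line in the log is pip skipping releases whose `Requires-Python` metadata excludes the interpreter, which here is PyPy installed under `/usr/local/pypy/pypy3.8`. A stdlib-only sketch of that check follows, under stated simplifications: `parse_ignored` and `python_satisfies` are hypothetical helpers that only understand `>=`, `<=`, `>`, `<`, and `==` on numeric versions (no `!=`, `~=`, or wildcards), whereas pip itself evaluates these with full PEP 440 specifier semantics. The log fragment entries are copied verbatim from the CI output.

```python
# Entries copied verbatim from the "Ignored the following versions" line of the log.
IGNORED_LOG_FRAGMENT = (
    "1.26.0 Requires-Python <3.13,>=3.9; "
    "2.1.4 Requires-Python >=3.9; "
    "2.2.0 Requires-Python >=3.9"
)


def parse_ignored(fragment):
    """Map each version to its list of Requires-Python clauses."""
    result = {}
    for entry in fragment.split(";"):
        version, _, spec = entry.strip().partition(" Requires-Python ")
        result[version] = [clause.strip() for clause in spec.split(",")]
    return result


def python_satisfies(python, clauses):
    """Check a (major, minor) interpreter version against clauses like ">=3.9"."""
    compare = {
        ">=": lambda a, b: a >= b,
        "<=": lambda a, b: a <= b,
        "==": lambda a, b: a == b,
        ">": lambda a, b: a > b,
        "<": lambda a, b: a < b,
    }
    for clause in clauses:
        # Try two-character operators first so ">=" is not read as ">".
        op = clause[:2] if clause[:2] in compare else clause[:1]
        target = tuple(int(part) for part in clause[len(op):].split("."))
        if not compare[op](tuple(python[: len(target)]), target):
            return False
    return True


for version, clauses in parse_ignored(IGNORED_LOG_FRAGMENT).items():
    # Every entry is rejected for a Python 3.8 interpreter.
    print(version, python_satisfies((3, 8), clauses))
```

In this sketch the Requires-Python constraint alone stops rejecting `2.2.0` once the interpreter reports 3.9; whether a PyPy 3.9+ image could then actually install Pandas 2.2.0 is not something the log shows.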