Re: [PR] [SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]
itholic commented on PR #44881: URL: https://github.com/apache/spark/pull/44881#issuecomment-1955590870 Thank you so much, all, for the review! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]
bjornjorgensen commented on PR #44881: URL: https://github.com/apache/spark/pull/44881#issuecomment-1955073306 Great work, @itholic. Thank you :)
Re: [PR] [SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]
dongjoon-hyun commented on PR #44881: URL: https://github.com/apache/spark/pull/44881#issuecomment-1954505946 Merged to master. Thank you again, @itholic and @HyukjinKwon.
Re: [PR] [SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]
dongjoon-hyun closed pull request #44881: [SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 URL: https://github.com/apache/spark/pull/44881
Re: [PR] [SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]
itholic commented on code in PR #44881: URL: https://github.com/apache/spark/pull/44881#discussion_r1495345854
## python/pyspark/pandas/series.py: ##
```diff
@@ -7092,15 +7092,15 @@ def resample(
         --
         rule : str
             The offset string or object representing target conversion.
-            Currently, supported units are {'Y', 'A', 'M', 'D', 'H',
-            'T', 'MIN', 'S'}.
+            Currently, supported units are {'YE', 'A', 'ME', 'D', 'h',
+            'min', 'MIN', 's'}.
```
Review Comment: Just updated resample to work with old Pandas as well. Now it's safe.
Re: [PR] [SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]
itholic commented on PR #44881: URL: https://github.com/apache/spark/pull/44881#issuecomment-1953618046 Just updated resample to work with old Pandas as well. I think we can simply deprecate the old aliases for now, to avoid breaking existing pipelines.
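One way to deprecate rather than remove the old aliases is to keep accepting them with a `FutureWarning` and then hand pandas whichever spelling its version understands. The sketch below is illustrative only: `normalize_rule` is a hypothetical helper, not PySpark's actual code; it handles bare units only (no counts such as `3M`), and it assumes the caller passes the installed pandas version as a tuple such as `(2, 2)`.

```python
import warnings

# Hypothetical mapping (not PySpark's API): deprecated aliases named in this
# thread and the spellings pandas 2.2.0 prefers instead.
_OLD_TO_NEW = {"Y": "YE", "M": "ME", "H": "h", "T": "min", "S": "s"}
_NEW_TO_OLD = {new: old for old, new in _OLD_TO_NEW.items()}


def normalize_rule(unit: str, pandas_version: tuple) -> str:
    """Return a resample unit spelled the way the given pandas version expects."""
    if unit in _OLD_TO_NEW:
        # Keep accepting the old alias, but tell the user it is deprecated.
        warnings.warn(
            f"'{unit}' is deprecated, use '{_OLD_TO_NEW[unit]}' instead.",
            FutureWarning,
            stacklevel=2,
        )
        unit = _OLD_TO_NEW[unit]
    if pandas_version >= (2, 2):
        return unit
    # pandas older than 2.2 does not know the new spellings, so map them back.
    return _NEW_TO_OLD.get(unit, unit)


print(normalize_rule("M", (2, 2)))   # ME (plus a FutureWarning on stderr)
print(normalize_rule("ME", (2, 1)))  # M
```

Routing both spellings through one function keeps existing pipelines running on either pandas line while still nudging users toward the new aliases.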
Re: [PR] [SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]
itholic commented on PR #44881: URL: https://github.com/apache/spark/pull/44881#issuecomment-1953603875 Oh, wait. I just remembered that we simply follow the Pandas behavior and document the breaking changes separately in the [release note](https://github.com/apache/spark/blob/master/python/docs/source/migration_guide/pyspark_upgrade.rst). So maybe we should add a release note instead of reverting the breaking changes here? @dongjoon-hyun @HyukjinKwon
Re: [PR] [SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]
itholic commented on PR #44881: URL: https://github.com/apache/spark/pull/44881#issuecomment-1953588339 We should not introduce any breaking changes. Let me address them. Thanks, @dongjoon-hyun, for double-checking.
Re: [PR] [SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]
itholic commented on code in PR #44881: URL: https://github.com/apache/spark/pull/44881#discussion_r1495320737
## python/pyspark/pandas/series.py: ##
```diff
@@ -7092,15 +7092,15 @@ def resample(
         --
         rule : str
             The offset string or object representing target conversion.
-            Currently, supported units are {'Y', 'A', 'M', 'D', 'H',
-            'T', 'MIN', 'S'}.
+            Currently, supported units are {'YE', 'A', 'ME', 'D', 'h',
+            'min', 'MIN', 's'}.
```
Review Comment: Oh, sorry, that was my mistake. This should still work with old Pandas before the Spark 4.0.0 release. Let me fix it so that it works with both Pandas 2.2.0 and old Pandas.

## python/pyspark/pandas/series.py: ##
```diff
@@ -7092,15 +7092,15 @@ def resample(
         --
         rule : str
             The offset string or object representing target conversion.
-            Currently, supported units are {'Y', 'A', 'M', 'D', 'H',
-            'T', 'MIN', 'S'}.
+            Currently, supported units are {'YE', 'A', 'ME', 'D', 'h',
+            'min', 'MIN', 's'}.
```
Review Comment: ~~However, the current rule is that it should not be accompanied by such a breaking change unless the major version changes.~~ ~~This means that users should be able to use their pipeline as is, as long as they are using at least version 3.x of Spark.~~
Re: [PR] [SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]
itholic commented on code in PR #44881: URL: https://github.com/apache/spark/pull/44881#discussion_r1495316022
## python/pyspark/pandas/series.py: ##
```diff
@@ -7092,15 +7092,15 @@ def resample(
         --
         rule : str
             The offset string or object representing target conversion.
-            Currently, supported units are {'Y', 'A', 'M', 'D', 'H',
-            'T', 'MIN', 'S'}.
+            Currently, supported units are {'YE', 'A', 'ME', 'D', 'h',
+            'min', 'MIN', 's'}.
```
Review Comment: However, the current rule is that a breaking change like this should not be made unless the major version changes. This means that users should be able to keep using their pipelines as is, as long as they are on Spark 3.x.
Re: [PR] [SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]
itholic commented on code in PR #44881: URL: https://github.com/apache/spark/pull/44881#discussion_r1495314548
## python/pyspark/pandas/series.py: ##
```diff
@@ -7092,15 +7092,15 @@ def resample(
         --
         rule : str
             The offset string or object representing target conversion.
-            Currently, supported units are {'Y', 'A', 'M', 'D', 'H',
-            'T', 'MIN', 'S'}.
+            Currently, supported units are {'YE', 'A', 'ME', 'D', 'h',
+            'min', 'MIN', 's'}.
```
Review Comment:
> Even for the users who choose old Pandas libraries, Apache Spark enforces this breaking change

Yes. Under the current policy, if we want to use the latest Apache Spark, we cannot avoid following the behavior of the latest Pandas as well, IIRC.
Re: [PR] [SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]
itholic commented on PR #44881: URL: https://github.com/apache/spark/pull/44881#issuecomment-1953573086
- Is the change to `python/pyspark/pandas/resample.py` safe? It breaks the previous behavior, so if we plan another minor release (Spark 3.5.0), this should not be included.
- What happens when users decide to use old Pandas (<= 2.2.0)? Using the deprecated aliases (`Y`, `M`, `H`, `T`, `S`) wouldn't work.
Re: [PR] [SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]
dongjoon-hyun commented on code in PR #44881: URL: https://github.com/apache/spark/pull/44881#discussion_r1495305345
## python/pyspark/pandas/series.py: ##
```diff
@@ -7092,15 +7092,15 @@ def resample(
         --
         rule : str
             The offset string or object representing target conversion.
-            Currently, supported units are {'Y', 'A', 'M', 'D', 'H',
-            'T', 'MIN', 'S'}.
+            Currently, supported units are {'YE', 'A', 'ME', 'D', 'h',
+            'min', 'MIN', 's'}.
```
Review Comment: The background of my question is that the `Data Science` team has been struggling to validate their pipelines on new Spark versions.
Re: [PR] [SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]
dongjoon-hyun commented on code in PR #44881: URL: https://github.com/apache/spark/pull/44881#discussion_r1495304624
## python/pyspark/pandas/series.py: ##
```diff
@@ -7092,15 +7092,15 @@ def resample(
         --
         rule : str
             The offset string or object representing target conversion.
-            Currently, supported units are {'Y', 'A', 'M', 'D', 'H',
-            'T', 'MIN', 'S'}.
+            Currently, supported units are {'YE', 'A', 'ME', 'D', 'h',
+            'min', 'MIN', 's'}.
```
Review Comment: Yeah, that brings me to my second question (https://github.com/apache/spark/pull/44881#pullrequestreview-1889581226). Even for the users who choose old Pandas libraries, Apache Spark enforces this breaking change, @itholic?
Re: [PR] [SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]
itholic commented on code in PR #44881: URL: https://github.com/apache/spark/pull/44881#discussion_r1495302902
## python/pyspark/pandas/series.py: ##
```diff
@@ -7092,15 +7092,15 @@ def resample(
         --
         rule : str
             The offset string or object representing target conversion.
-            Currently, supported units are {'Y', 'A', 'M', 'D', 'H',
-            'T', 'MIN', 'S'}.
+            Currently, supported units are {'YE', 'A', 'ME', 'D', 'h',
+            'min', 'MIN', 's'}.
```
Review Comment: Yeah, Pandas 2.2.0 brings a couple of breaking changes, so we should make sure this support ships only with Spark 4.0.0 or later. See the [related update in the Pandas release notes](https://pandas.pydata.org/docs/whatsnew/v2.2.0.html#deprecate-aliases-m-q-y-etc-in-favour-of-me-qe-ye-etc-for-offsets) for more details.
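The renames described in the linked release note can be pictured as a lookup table from deprecated alias to new spelling. This is a minimal sketch, not PySpark's or pandas' code: `to_pandas22_rule` is a hypothetical helper that covers only the aliases named in this thread (`Y`, `M`, `H`, `T`, `S`) and leaves everything else, including `A` and `MIN`, unchanged.

```python
import re

# Hypothetical table (not a pandas API): deprecated aliases from the docstring
# diff, mapped to the spellings pandas 2.2.0 prefers for offsets.
DEPRECATED_TO_PANDAS_22 = {
    "Y": "YE",   # year end
    "M": "ME",   # month end
    "H": "h",    # hour
    "T": "min",  # minute
    "S": "s",    # second
}


def to_pandas22_rule(rule: str) -> str:
    """Rewrite a rule such as '3M' or 'H' into its pandas 2.2.0 spelling."""
    match = re.fullmatch(r"(\d*)([A-Za-z]+)", rule)
    if match is None:
        return rule  # leave anything unusual untouched
    count, unit = match.groups()
    return count + DEPRECATED_TO_PANDAS_22.get(unit, unit)


print(to_pandas22_rule("3M"))  # 3ME
print(to_pandas22_rule("H"))   # h
print(to_pandas22_rule("D"))   # D
```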
Re: [PR] [SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]
dongjoon-hyun commented on code in PR #44881: URL: https://github.com/apache/spark/pull/44881#discussion_r1495297012
## python/pyspark/pandas/series.py: ##
```diff
@@ -7092,15 +7092,15 @@ def resample(
         --
         rule : str
             The offset string or object representing target conversion.
-            Currently, supported units are {'Y', 'A', 'M', 'D', 'H',
-            'T', 'MIN', 'S'}.
+            Currently, supported units are {'YE', 'A', 'ME', 'D', 'h',
+            'min', 'MIN', 's'}.
```
Review Comment: Is this a breaking change?
Re: [PR] [SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]
itholic commented on PR #44881: URL: https://github.com/apache/spark/pull/44881#issuecomment-1953550280 I believe this PR now addresses all of the Pandas 2.2.0 behavior changes. cc @HyukjinKwon @dongjoon-hyun FYI
Re: [PR] [SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]
itholic commented on code in PR #44881: URL: https://github.com/apache/spark/pull/44881#discussion_r1495189446
## python/pyspark/pandas/namespace.py: ##
```diff
@@ -2554,7 +2554,10 @@ def resolve_func(psdf, this_column_labels, that_column_labels):
         if isinstance(obj, Series):
             num_series += 1
             series_names.add(obj.name)
-            new_objs.append(obj.to_frame(DEFAULT_SERIES_NAME))
+            if not ignore_index and not should_return_series:
+                new_objs.append(obj.to_frame())
+            else:
+                new_objs.append(obj.to_frame(DEFAULT_SERIES_NAME))
```
Review Comment: Related to https://github.com/pandas-dev/pandas/issues/15047
Re: [PR] [SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]
itholic commented on code in PR #44881: URL: https://github.com/apache/spark/pull/44881#discussion_r1492108932
## python/pyspark/pandas/plot/matplotlib.py: ##
```diff
@@ -363,10 +364,23 @@ def _args_adjust(self):
         if is_list_like(self.bottom):
             self.bottom = np.array(self.bottom)
+    def _ensure_frame(self, data):
+        return data
+
+    def _calculate_bins(self, data, bins):
+        return bins
```
Review Comment: Pandas recently pushed a couple of commits that refactor the internal plotting structure, such as https://github.com/pandas-dev/pandas/pull/55850 and https://github.com/pandas-dev/pandas/pull/55872, so we also need to override a couple of internal methods to follow the latest Pandas behavior.
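The change above is an ordinary override of hooks that a refactored base class has started calling. The sketch below uses hypothetical stand-in classes (`BaseHistPlot`, `DistributedHistPlot`), not the real pandas or PySpark plotting classes; only the hook names `_ensure_frame` and `_calculate_bins` come from the diff. It assumes the subclass already holds prepared data and bin edges, so, like the two overrides in the diff, its hooks just pass their inputs through.

```python
class BaseHistPlot:
    """Hypothetical stand-in for a base class whose draw path calls two hooks."""

    def __init__(self, data, bins=10):
        self.data = data
        self.bins = bins

    def _ensure_frame(self, data):
        # A base class might coerce local input into a frame here.
        raise NotImplementedError("expects local, in-memory data")

    def _calculate_bins(self, data, bins):
        # ...and derive bin edges from the full local dataset here.
        raise NotImplementedError("expects local, in-memory data")

    def draw(self):
        data = self._ensure_frame(self.data)
        bins = self._calculate_bins(data, self.bins)
        return data, bins


class DistributedHistPlot(BaseHistPlot):
    """Hypothetical subclass whose data and bin edges are already prepared."""

    def _ensure_frame(self, data):
        return data  # pass through: nothing to coerce

    def _calculate_bins(self, data, bins):
        return bins  # pass through: edges were computed elsewhere


print(DistributedHistPlot([1, 2, 3], bins=[0.0, 1.5, 3.0]).draw())
# ([1, 2, 3], [0.0, 1.5, 3.0])
```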
Re: [PR] [SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]
itholic commented on PR #44881: URL: https://github.com/apache/spark/pull/44881#issuecomment-1942942082 Yeah, Pandas 2.2.0 fixes many bugs, which brings a couple of behavior changes. Let me fix them. Thanks for confirming!
Re: [PR] [SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]
itholic commented on code in PR #44881: URL: https://github.com/apache/spark/pull/44881#discussion_r1467133120
## dev/infra/Dockerfile: ##
```diff
@@ -91,10 +91,10 @@ RUN mkdir -p /usr/local/pypy/pypy3.8 && \
     ln -sf /usr/local/pypy/pypy3.8/bin/pypy /usr/local/bin/pypy3.8 && \
     ln -sf /usr/local/pypy/pypy3.8/bin/pypy /usr/local/bin/pypy3
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | pypy3
-RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas<=2.1.4' scipy coverage matplotlib lxml
+RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas<=2.2.0' scipy coverage matplotlib lxml
```
Review Comment: Got it. BTW, Pandas 2.2.0 again introduces some breaking changes. Let me address them.
Re: [PR] [SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]
HyukjinKwon commented on code in PR #44881: URL: https://github.com/apache/spark/pull/44881#discussion_r1467123646
## dev/infra/Dockerfile: ##
```diff
@@ -91,10 +91,10 @@ RUN mkdir -p /usr/local/pypy/pypy3.8 && \
     ln -sf /usr/local/pypy/pypy3.8/bin/pypy /usr/local/bin/pypy3.8 && \
     ln -sf /usr/local/pypy/pypy3.8/bin/pypy /usr/local/bin/pypy3
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | pypy3
-RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas<=2.1.4' scipy coverage matplotlib lxml
+RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas<=2.2.0' scipy coverage matplotlib lxml
```
Review Comment: Let's pin this to 2.2.0 for now. I think we have seen some issues when automatically picking up the latest pandas version.
Re: [PR] [SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]
itholic commented on code in PR #44881: URL: https://github.com/apache/spark/pull/44881#discussion_r1466155995
## dev/infra/Dockerfile: ##
```diff
@@ -91,10 +91,10 @@ RUN mkdir -p /usr/local/pypy/pypy3.8 && \
     ln -sf /usr/local/pypy/pypy3.8/bin/pypy /usr/local/bin/pypy3.8 && \
     ln -sf /usr/local/pypy/pypy3.8/bin/pypy /usr/local/bin/pypy3
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | pypy3
-RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas<=2.1.4' scipy coverage matplotlib lxml
+RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas<=2.2.0' scipy coverage matplotlib lxml
```
Review Comment: AFAIK, pip automatically picks the most recent version that satisfies the constraint, as shown below (has this not worked well so far, BTW?):
```
(pyspark-dev-env) spark % pip install "pandas<=2.2.0"
Collecting pandas<=2.2.0
...
Installing collected packages: pandas
Successfully installed pandas-2.2.0
```
But I'm okay with using `==`. WDYT, @HyukjinKwon?
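The trade-off between `<=`, `==`, and `>=` in this thread can be seen by simulating how an installer picks the newest version that satisfies a constraint. This is a simplified sketch: `resolve` is a hypothetical helper that compares plain `X.Y.Z` versions as integer tuples, whereas pip follows the full PEP 440 rules; `2.2.1` is a made-up future release, and the `["2.1.4"]` index stands in for a case where 2.2.0 is unavailable (for instance, no compatible build for the interpreter).

```python
# Simplified model of "pick the newest version allowed by the constraint".
# Only plain X.Y.Z versions and the <=, ==, >= operators are handled here;
# real pip implements the full PEP 440 rules.
OPS = {
    "<=": lambda v, bound: v <= bound,
    "==": lambda v, bound: v == bound,
    ">=": lambda v, bound: v >= bound,
}


def parse(version: str) -> tuple:
    return tuple(int(part) for part in version.split("."))


def resolve(spec: str, available: list) -> str:
    """Return the newest version in `available` that satisfies `spec`."""
    op, bound = spec[:2], parse(spec[2:])
    matches = [v for v in available if OPS[op](parse(v), bound)]
    if not matches:
        raise LookupError(f"no version satisfies {spec}")
    return max(matches, key=parse)


index = ["2.1.4", "2.2.0"]
print(resolve("<=2.2.0", index))              # 2.2.0
print(resolve("<=2.2.0", ["2.1.4"]))          # 2.1.4, silently older
print(resolve("==2.2.0", index + ["2.2.1"]))  # 2.2.0
print(resolve(">=2.2.0", index + ["2.2.1"]))  # 2.2.1, drifts forward
```

In this simplified model, `==` is the only operator that both guarantees 2.2.0 and stays put when newer releases appear; `<=` can silently fall back to an older version, and `>=` picks up whatever is released next.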
Re: [PR] [SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]
zhengruifeng commented on code in PR #44881: URL: https://github.com/apache/spark/pull/44881#discussion_r1466096686
## dev/infra/Dockerfile: ##
```diff
@@ -91,10 +91,10 @@ RUN mkdir -p /usr/local/pypy/pypy3.8 && \
     ln -sf /usr/local/pypy/pypy3.8/bin/pypy /usr/local/bin/pypy3.8 && \
     ln -sf /usr/local/pypy/pypy3.8/bin/pypy /usr/local/bin/pypy3
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | pypy3
-RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas<=2.1.4' scipy coverage matplotlib lxml
+RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas<=2.2.0' scipy coverage matplotlib lxml
```
Review Comment: Or use `==`? Just because `<=` cannot guarantee that 2.2.0 is actually used.
Re: [PR] [SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]
itholic commented on code in PR #44881: URL: https://github.com/apache/spark/pull/44881#discussion_r1466077514
## dev/infra/Dockerfile: ##
```diff
@@ -91,10 +91,10 @@ RUN mkdir -p /usr/local/pypy/pypy3.8 && \
     ln -sf /usr/local/pypy/pypy3.8/bin/pypy /usr/local/bin/pypy3.8 && \
     ln -sf /usr/local/pypy/pypy3.8/bin/pypy /usr/local/bin/pypy3
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | pypy3
-RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas<=2.1.4' scipy coverage matplotlib lxml
+RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas<=2.2.0' scipy coverage matplotlib lxml
```
Review Comment: I think the CI might break in the future once a higher version is released?
Re: [PR] [SPARK-46858][PYTHON][PS][BUILD] Upgrade Pandas to 2.2.0 [spark]
zhengruifeng commented on code in PR #44881: URL: https://github.com/apache/spark/pull/44881#discussion_r1466068442
## dev/infra/Dockerfile: ##
```diff
@@ -91,10 +91,10 @@ RUN mkdir -p /usr/local/pypy/pypy3.8 && \
     ln -sf /usr/local/pypy/pypy3.8/bin/pypy /usr/local/bin/pypy3.8 && \
     ln -sf /usr/local/pypy/pypy3.8/bin/pypy /usr/local/bin/pypy3
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | pypy3
-RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas<=2.1.4' scipy coverage matplotlib lxml
+RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas<=2.2.0' scipy coverage matplotlib lxml
```
Review Comment: Shall we use `>=` to ensure that 2.2.0+ is used?

## dev/infra/Dockerfile: ##
```diff
@@ -91,10 +91,10 @@ RUN mkdir -p /usr/local/pypy/pypy3.8 && \
     ln -sf /usr/local/pypy/pypy3.8/bin/pypy /usr/local/bin/pypy3.8 && \
     ln -sf /usr/local/pypy/pypy3.8/bin/pypy /usr/local/bin/pypy3
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | pypy3
-RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas<=2.1.4' scipy coverage matplotlib lxml
+RUN pypy3 -m pip install numpy 'six==1.16.0' 'pandas<=2.2.0' scipy coverage matplotlib lxml
-ARG BASIC_PIP_PKGS="numpy pyarrow>=14.0.0 six==1.16.0 pandas<=2.1.4 scipy plotly>=4.8 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2"
+ARG BASIC_PIP_PKGS="numpy pyarrow>=14.0.0 six==1.16.0 pandas<=2.2.0 scipy plotly>=4.8 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2"
```
Review Comment: ditto