Yikun edited a comment on pull request #34717: URL: https://github.com/apache/spark/pull/34717#issuecomment-979659317
Just to start the discussion: using the SQL below, which follows [1], we can get the download statistics for every pandas version over the last 3 months.

```SQL
SELECT
  file.version AS file_version,
  COUNT(*) AS num_downloads
FROM `the-psf.pypi.file_downloads`
WHERE
  file.project = 'pandas'
  -- Only query the last 3 months of history
  AND DATE(timestamp)
    BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 3 MONTH)
    AND CURRENT_DATE()
GROUP BY `file_version`
ORDER BY `num_downloads` DESC
```

Here are the top versions by download count; the top 20 account for about 77% of all downloads:

| rank | version | downloads | percent |
| -- | -- | -- | -- |
| 1 | 0.25.3 | 35149221 | 14.28% |
| 2 | 1.1.5 | 28722806 | 11.67% |
| 3 | 1.3.4 | 20944236 | 8.51% |
| 4 | 1.3.3 | 16861573 | 6.85% |
| 5 | 0.24.2 | 13235233 | 5.38% |
| 6 | 1.0.5 | 9201989 | 3.74% |
| 7 | 1.3.2 | 9077326 | 3.69% |
| 8 | 1.2.5 | 7902532 | 3.21% |
| 9 | 1.2.4 | 5754284 | 2.34% |
| 10 | 1.1.4 | 5710439 | 2.32% |
| 11 | 1.1.0 | 4760847 | 1.93% |
| 12 | 1.1.2 | 4621441 | 1.88% |
| 13 | 1.2.3 | 4607043 | 1.87% |
| 14 | 1.0.3 | 4601230 | 1.87% |
| 15 | 0.23.4 | 4251044 | 1.73% |
| 16 | 0.25.0 | 3862673 | 1.57% |
| 17 | 1.2.1 | 2952346 | 1.20% |
| 18 | 1.0.1 | 2690006 | 1.09% |
| 19 | 0.22.0 | 2680710 | 1.09% |
| 20 | 1.2.0 | 2645339 | 1.07% |
| 21 | 0.24.1 | 2635411 | 1.07% |

[1] https://packaging.python.org/guides/analyzing-pypi-package-downloads/
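
To make the table easier to reason about, here is a minimal Python sketch that copies the rows above into a list and derives two summaries: the cumulative share of the top N versions, and total downloads per `major.minor` series. The helper names `cumulative_share` and `downloads_by_series` are hypothetical and only for illustration; the numbers are taken verbatim from the table, so the per-series totals cover only the listed versions, not every release.

```python
from collections import defaultdict

# (version, downloads, percent of all downloads), copied from the table above.
ROWS = [
    ("0.25.3", 35149221, 14.28),
    ("1.1.5", 28722806, 11.67),
    ("1.3.4", 20944236, 8.51),
    ("1.3.3", 16861573, 6.85),
    ("0.24.2", 13235233, 5.38),
    ("1.0.5", 9201989, 3.74),
    ("1.3.2", 9077326, 3.69),
    ("1.2.5", 7902532, 3.21),
    ("1.2.4", 5754284, 2.34),
    ("1.1.4", 5710439, 2.32),
    ("1.1.0", 4760847, 1.93),
    ("1.1.2", 4621441, 1.88),
    ("1.2.3", 4607043, 1.87),
    ("1.0.3", 4601230, 1.87),
    ("0.23.4", 4251044, 1.73),
    ("0.25.0", 3862673, 1.57),
    ("1.2.1", 2952346, 1.20),
    ("1.0.1", 2690006, 1.09),
    ("0.22.0", 2680710, 1.09),
    ("1.2.0", 2645339, 1.07),
    ("0.24.1", 2635411, 1.07),
]


def cumulative_share(rows, top_n):
    """Sum the percent column over the first top_n rows, rounded to 2 places."""
    return round(sum(pct for _, _, pct in rows[:top_n]), 2)


def downloads_by_series(rows):
    """Aggregate download counts by major.minor series (e.g. '1.3'),
    returned as a dict ordered from most to least downloaded."""
    totals = defaultdict(int)
    for version, downloads, _ in rows:
        series = ".".join(version.split(".")[:2])
        totals[series] += downloads
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    print("top 20 share:", cumulative_share(ROWS, 20))
    for series, total in downloads_by_series(ROWS).items():
        print(series, total)
```

Grouping by series rather than by patch release gives a quicker view of which pandas lines are still widely installed among the listed versions.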