Fokko commented on pull request #28957:
URL: https://github.com/apache/spark/pull/28957#issuecomment-652022449


   My pleasure @holdenk 
   
   I ran a query against Google's public BigQuery datasets. One of them 
contains all the public PyPI downloads:
   ```sql
   -- Monthly pyspark downloads per Python major.minor version.
   SELECT
     EXTRACT(YEAR FROM timestamp) AS year,
     EXTRACT(MONTH FROM timestamp) AS month,
     -- SUBSTR positions are 1-based; this keeps the first three characters, e.g. '3.7'.
     SAFE.SUBSTR(details.python, 1, 3) AS python_version,
     COUNT(*) AS num_downloads
   FROM `the-psf.pypi.downloads*`
   WHERE file.project = 'pyspark'
   AND SAFE.SUBSTR(details.python, 1, 3) IS NOT NULL
   GROUP BY
     EXTRACT(YEAR FROM timestamp),
     EXTRACT(MONTH FROM timestamp),
     SAFE.SUBSTR(details.python, 1, 3)
   ```
   
   This gives us the following per month:
   
![image](https://user-images.githubusercontent.com/1134248/86172253-5d7b5c00-bb1e-11ea-96b2-3c779b5f48d4.png)
   
   We can see that most downloads come from 3.7 and 3.6. However, 3.5 and 2.7 
still account for a share.
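
   For anyone without BigQuery access, here is a rough Python sketch of what the grouping in the query above computes. The `rows` list and the `downloads_per_month` helper are made up for illustration; they are not real download data or part of any library.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows: (download timestamp, reported Python version).
rows = [
    (datetime(2020, 5, 3), "3.7.4"),
    (datetime(2020, 5, 9), "3.7.6"),
    (datetime(2020, 5, 9), "2.7.15"),
    (datetime(2020, 6, 1), "3.6.9"),
    (datetime(2020, 6, 2), None),  # version not reported
]


def downloads_per_month(rows):
    """Count downloads per (year, month, 'major.minor'), mirroring the GROUP BY."""
    counts = Counter()
    for ts, python in rows:
        if python is None:
            # Same effect as the IS NOT NULL filter in the query.
            continue
        python_version = python[:3]  # first three characters, e.g. '3.7'
        counts[(ts.year, ts.month, python_version)] += 1
    return counts


per_month = downloads_per_month(rows)
```

   Each key of `per_month` corresponds to one row of the query result, and the value plays the role of `num_downloads`.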
   
   If we look at the proportional share of people who are using a compatible 
version:
   
   ```sql
   -- Monthly pyspark downloads split into Python >= 3.6 ('ok') and older ('not_ok').
   SELECT
     EXTRACT(YEAR FROM timestamp) AS year,
     EXTRACT(MONTH FROM timestamp) AS month,
     -- SUBSTR positions are 1-based; the comparison is on the string prefix, e.g. '3.7'.
     IF(SAFE.SUBSTR(details.python, 1, 3) >= '3.6', 'ok', 'not_ok') AS OK,
     COUNT(*) AS num_downloads
   FROM `the-psf.pypi.downloads*`
   WHERE file.project = 'pyspark'
   AND SAFE.SUBSTR(details.python, 1, 3) IS NOT NULL
   GROUP BY
     EXTRACT(YEAR FROM timestamp),
     EXTRACT(MONTH FROM timestamp),
     IF(SAFE.SUBSTR(details.python, 1, 3) >= '3.6', 'ok', 'not_ok')
   ```
   
   Then the majority is ok:
   
![image](https://user-images.githubusercontent.com/1134248/86173180-dd55f600-bb1f-11ea-9dec-e0419606214d.png)
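
   One caveat on the 'ok' / 'not_ok' split above: it compares a three-character string prefix, which works for single-digit minor versions, but a two-digit minor such as 3.10 would be truncated to '3.1' and sort below '3.6'. A minimal sketch of a numeric comparison, assuming the version string starts with `major.minor` (the `is_supported` helper is hypothetical):

```python
import re


def is_supported(python, minimum=(3, 6)):
    """Return True if a version string such as '3.7.4' is at least `minimum`."""
    match = re.match(r"(\d+)\.(\d+)", python)
    if match is None:
        # Unparseable version strings are treated as unsupported.
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuples compare element by element, so (3, 10) >= (3, 6) holds.
    return (major, minor) >= minimum


# The string-prefix comparison misclassifies a two-digit minor version:
prefix_says_ok = "3.10.0"[:3] >= "3.6"
numeric_says_ok = is_supported("3.10.0")
```

   For the versions visible in the charts this gives the same answer as the query; the difference only shows up once minor versions reach two digits.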
   
   The next question would be whether the Python <3.6 users are on 3.x or on 
2.x. My guess would be the latter, so we're (mostly) safe deprecating the old 
versions of Python.
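
   That follow-up split could be sketched like this. The `samples` list is invented for illustration and says nothing about the real distribution, and `legacy_family` is a hypothetical helper:

```python
import re
from collections import Counter

# Hypothetical sample of reported interpreter versions (not real download data).
samples = ["2.7.15", "2.7.17", "3.5.2", "3.4.10", "3.7.4", "3.6.9", "2.7.16"]


def legacy_family(python, minimum=(3, 6)):
    """Return '<major>.x' for versions below `minimum`, else None."""
    match = re.match(r"(\d+)\.(\d+)", python)
    if match is None:
        return None
    major, minor = int(match.group(1)), int(match.group(2))
    if (major, minor) >= minimum:
        return None  # already on a compatible version
    return f"{major}.x"


# Count only the downloads that fall below the minimum version.
legacy_counts = Counter(
    family for family in map(legacy_family, samples) if family is not None
)
```

   Run against the real download rows, the ratio between the '2.x' and '3.x' counts would answer the question above.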




