[Impala-ASF-CR] IMPALA-5181: Extract PYPI metadata from a webpage

Tim Armstrong (Code Review) Thu, 06 Apr 2017 23:05:16 -0700

Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5181: Extract PYPI metadata from a webpage
......................................................................



Patch Set 1:

(4 comments)

Thanks for solving this. I tried this out locally and it worked nicely for me. 
Only minor comments.

http://gerrit.cloudera.org:8080/#/c/6579/1/infra/python/deps/pip_download.py
File infra/python/deps/pip_download.py:

Line 73:   if GET_METADATA_FROM_JSON:
Do we need to keep the JSON API version? The JSON code is cleaner, but it seems 
easier to maintain only one version: there's always a chance we accidentally 
break the non-JSON version and don't catch it at check-in.


Line 83:     url = '{0}/simple/{1}/'.format(PYPI_MIRROR, pkg_name)
Mention that this is the PEP 503 format in case people want to look at the 
docs: https://www.python.org/dev/peps/pep-0503/
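
For reference, a minimal sketch of how a PEP 503 project URL is put together. 
The mirror value and the helper names are placeholders, not taken from the 
patch, and the patch line above does not normalize the name; the normalization 
rule itself is the one given in PEP 503.

```python
import re

# Assumed placeholder; the patch reads the mirror from its own PYPI_MIRROR setting.
PYPI_MIRROR = 'https://pypi.example.org'


def normalize_name(pkg_name):
  """Normalize a project name as described in PEP 503."""
  # Runs of '-', '_' and '.' collapse to a single '-', and the result is lowercased.
  return re.sub(r'[-_.]+', '-', pkg_name).lower()


def simple_index_url(pkg_name, mirror=PYPI_MIRROR):
  """Hypothetical helper: URL of the project's page in the simple index."""
  return '{0}/simple/{1}/'.format(mirror, normalize_name(pkg_name))
```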


Line 86:     regex = r'<a href=\".*packages/(.*)#md5=(.*?)\".*>(.*)<\/a>'
I think this would be more robust if all of the * quantifiers were non-greedy, 
i.e. *?

Otherwise the regex relies on there being only one <a>...</a> per line, which 
seems to be true in practice but isn't required by the PEP 503 spec.
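
A small sketch of the difference, using a made-up line that carries two links 
(the file names and md5 values are invented for illustration). With the greedy 
pattern only the last link on the line is found; with the non-greedy variant 
both are.

```python
import re

# Pattern as written in the patch (greedy quantifiers).
GREEDY = r'<a href=\".*packages/(.*)#md5=(.*?)\".*>(.*)<\/a>'
# Suggested variant: every * quantifier made non-greedy.
NON_GREEDY = r'<a href=\".*?packages/(.*?)#md5=(.*?)\".*?>(.*?)<\/a>'

# Hypothetical input: two anchors on a single line.
line = ('<a href="../../packages/aa/foo-1.0.tar.gz#md5=abc">foo-1.0.tar.gz</a> '
        '<a href="../../packages/bb/foo-2.0.tar.gz#md5=def">foo-2.0.tar.gz</a>')

greedy_matches = re.findall(GREEDY, line)
non_greedy_matches = re.findall(NON_GREEDY, line)

print(len(greedy_matches), len(non_greedy_matches))
```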


Line 91:       if (
nit: the formatting here seems odd. I'd probably prefer the following (although 
I'll defer to you if the current form is standard Python style).

      if (file_name.endswith('-{0}.tar.gz'.format(pkg_version)) or
          file_name.endswith('-{0}.tar.bz2'.format(pkg_version)) or
          file_name.endswith('-{0}.zip'.format(pkg_version))):
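
Another option, sketched here only as a possibility: str.endswith also accepts 
a tuple of suffixes, which avoids the multi-line condition altogether. The 
function and constant names are hypothetical, not from the patch.

```python
# Archive extensions checked in the condition above.
ARCHIVE_EXTENSIONS = ('.tar.gz', '.tar.bz2', '.zip')


def is_source_archive(file_name, pkg_version):
  """Hypothetical helper: True if file_name is a source archive for pkg_version."""
  suffixes = tuple('-{0}{1}'.format(pkg_version, ext) for ext in ARCHIVE_EXTENSIONS)
  # str.endswith returns True if the string ends with any suffix in the tuple.
  return file_name.endswith(suffixes)
```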


-- 
To view, visit http://gerrit.cloudera.org:8080/6579
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If3845a0d5f568d4352e3cc4883596736974fd7de
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: David Knupp <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
