Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5181: Extract PYPI metadata from a webpage
......................................................................

Patch Set 1:

(4 comments)

Thanks for solving this. I tried this out locally and it worked nicely for me. Only minor comments.

http://gerrit.cloudera.org:8080/#/c/6579/1/infra/python/deps/pip_download.py
File infra/python/deps/pip_download.py:

Line 73: if GET_METADATA_FROM_JSON:
Do we need to keep the JSON API version? The JSON code is cleaner but it seems easier to only maintain one version - there's always a chance we accidentally break the non-JSON version and don't catch it on checkin.

Line 83: url = '{0}/simple/{1}/'.format(PYPI_MIRROR, pkg_name)
Mention that this is the PEP 503 format in case people want to look at the docs: https://www.python.org/dev/peps/pep-0503/

Line 86: regex = r'<a href=\".*packages/(.*)#md5=(.*?)\".*>(.*)<\/a>'
I think this would be more robust if all of the * quantifiers were non-greedy, i.e. *?

Otherwise it relies on there being only one <a>...</a> per line, which seems to be true in practice but isn't mentioned in the PEP 503 spec.

Line 91: if (
nit: formatting of this seems weird. I'd probably prefer this (although I'll defer to you if this is standard python style).

    if (file_name.endswith('-{0}.tar.gz'.format(pkg_version)) or
        file_name.endswith('-{0}.tar.bz2'.format(pkg_version)) or
        file_name.endswith('-{0}.zip'.format(pkg_version))):

--
To view, visit http://gerrit.cloudera.org:8080/6579
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If3845a0d5f568d4352e3cc4883596736974fd7de
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: David Knupp <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
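
To illustrate the Line 86 comment, here is a rough sketch of how the greedy and non-greedy patterns might behave when two <a>...</a> elements share a line. The sample HTML, the hash values, and the find_links helper are made up for illustration and are not taken from the patch; the patterns are adapted from the one quoted above, with the redundant backslash escapes dropped.

```python
import re

# Hypothetical PEP 503 "simple" index fragment with two anchors on ONE line.
# The paths and md5 values are invented for this example.
SAMPLE_HTML = (
    '<a href="../../packages/ab/cd/foo-1.0.tar.gz#md5=0123abcd">foo-1.0.tar.gz</a>'
    '<a href="../../packages/ef/gh/foo-1.1.zip#md5=4567ef01">foo-1.1.zip</a>'
)

# Pattern as quoted in the review (greedy quantifiers, except the md5 group).
GREEDY = r'<a href=".*packages/(.*)#md5=(.*?)".*>(.*)</a>'

# Same pattern with every * quantifier made non-greedy.
NON_GREEDY = r'<a href=".*?packages/(.*?)#md5=(.*?)".*?>(.*?)</a>'


def find_links(regex, html):
    """Return a list of (path, md5, link_text) tuples found in html."""
    return re.findall(regex, html)


if __name__ == '__main__':
    # The greedy pattern swallows the first anchor and reports only one link.
    print(len(find_links(GREEDY, SAMPLE_HTML)))
    # The non-greedy pattern reports each anchor separately.
    for path, md5, text in find_links(NON_GREEDY, SAMPLE_HTML):
        print(path, md5, text)
```

With the greedy pattern the leading `.*` runs past the first anchor, so that link is silently dropped; the non-greedy version stops at the nearest match and picks up both.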
