Lars Volker has posted comments on this change.

Change subject: IMPALA-5181: Extract PYPI metadata from a webpage
......................................................................

Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6579/1/infra/python/deps/pip_download.py
File infra/python/deps/pip_download.py:

Line 86: regex = r'<a href=\".*packages/(.*)#md5=(.*?)\".*>(.*)<\/a>'
> I think this would be more robust if all of the * quantifiers were non-gree

Can you add a comment explaining why we're using regexes to parse the HTML, i.e. why we can't use beautifulsoup or lxml or the like?

--
To view, visit http://gerrit.cloudera.org:8080/6579
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If3845a0d5f568d4352e3cc4883596736974fd7de
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: David Knupp <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
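
For reference on the two points raised against line 86, here is a minimal sketch in Python 3. It is not the code from the patch. The SAMPLE string is made-up input shaped to fit the quoted regex, NON_GREEDY is only one possible reading of "all quantifiers non-greedy", and LinkParser is a hypothetical helper showing what a standard-library parser might look like if the reason for avoiding beautifulsoup or lxml is to stay free of third-party dependencies (an assumption; the thread does not state the reason).

```python
import re
from html.parser import HTMLParser

# Hypothetical sample input: two links on one line, shaped to fit the regex
# under review. A real index page may look different.
SAMPLE = (
    '<a href="../../packages/aa/foo-1.0.tar.gz#md5=abc123" rel="internal">'
    'foo-1.0.tar.gz</a>'
    '<a href="../../packages/bb/foo-2.0.tar.gz#md5=def456" rel="internal">'
    'foo-2.0.tar.gz</a>'
)

# The pattern quoted in the review: greedy everywhere except the md5 group.
GREEDY = r'<a href=\".*packages/(.*)#md5=(.*?)\".*>(.*)<\/a>'
# The same pattern with every quantifier made non-greedy (an assumed variant).
NON_GREEDY = r'<a href=\".*?packages/(.*?)#md5=(.*?)\".*?>(.*?)<\/a>'

# With two links on one line, the greedy pattern swallows the first link.
greedy_matches = re.findall(GREEDY, SAMPLE)
# The non-greedy pattern stops at the nearest delimiter and finds both links.
non_greedy_matches = re.findall(NON_GREEDY, SAMPLE)


class LinkParser(HTMLParser):
    """Hypothetical helper: collects (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []    # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


parser = LinkParser()
parser.feed(SAMPLE)
parser.close()
parsed_links = parser.links
```

On this sample the greedy pattern returns a single match taken from the second link, while the non-greedy variant returns one match per link. The parser variant leaves the href intact, so the path and md5 fragment would still need to be split out afterwards.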
