Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-5181: Extract PYPI metadata from a webpage
......................................................................
Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6579/1/infra/python/deps/pip_download.py
File infra/python/deps/pip_download.py:

Line 86: regex = r'<a href=\".*packages/(.*)#md5=(.*?)\".*>(.*)<\/a>'
> Can you add a comment explaining why we're using regexes to parse the HTML,
I considered using BeautifulSoup, but the problem is that we would have to download and install it before this script could use it. Let me know if you have any ideas on how we could do that (I think it's definitely a better solution). Since PEP 503 guarantees that the HTML follows a fixed structure, I think it's OK to parse it with a regex.

-- 
To view, visit http://gerrit.cloudera.org:8080/6579

Gerrit-MessageType: comment
Gerrit-Change-Id: If3845a0d5f568d4352e3cc4883596736974fd7de
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: David Knupp <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
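For reference on the Line 86 discussion, here is a minimal sketch that is not part of the patch under review. It runs the quoted regex over a made-up PEP 503-style page. For comparison, it then extracts the same (path, md5, filename) tuples with Python 3's standard-library `html.parser`, which, unlike BeautifulSoup, needs no separate install. The sample page, the function names, and the one-anchor-per-line layout are illustrative assumptions, not taken from `pip_download.py`.

```python
import re
from html.parser import HTMLParser  # Python 3; the Python 2 module is named HTMLParser
from urllib.parse import urldefrag

# The regex quoted at Line 86 of pip_download.py.
LINK_RE = re.compile(r'<a href=\".*packages/(.*)#md5=(.*?)\".*>(.*)<\/a>')


def parse_with_regex(html):
    """Return (path, md5, filename) tuples, one per matching line.

    The greedy .* groups assume each <a> element sits on its own line.
    """
    results = []
    for line in html.splitlines():
        match = LINK_RE.search(line)
        if match:
            results.append(match.groups())
    return results


class SimpleIndexParser(HTMLParser):
    """Collects (href, link text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an <a> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


def parse_with_stdlib(html):
    """Same output as parse_with_regex, using only the standard library."""
    parser = SimpleIndexParser()
    parser.feed(html)
    parser.close()
    results = []
    for href, text in parser.links:
        url, fragment = urldefrag(href)  # split off "#md5=..."
        if "packages/" not in url or not fragment.startswith("md5="):
            continue
        path = url.rsplit("packages/", 1)[1]  # mirror the regex's greedy ".*packages/"
        results.append((path, fragment[len("md5="):], text))
    return results


# A hypothetical PEP 503-style page; the paths and hashes are made up.
SAMPLE = """<html><body>
<a href="../../packages/2.7/a/argparse/argparse-1.4.0-py2.py3-none-any.whl#md5=0123abcd">argparse-1.4.0-py2.py3-none-any.whl</a><br/>
<a href="../../packages/source/a/argparse/argparse-1.4.0.tar.gz#md5=4567ef01">argparse-1.4.0.tar.gz</a><br/>
</body></html>"""

if __name__ == "__main__":
    # Both approaches should agree on this well-formed page.
    assert parse_with_regex(SAMPLE) == parse_with_stdlib(SAMPLE)
    for row in parse_with_stdlib(SAMPLE):
        print(row)
```

The regex version depends on the page layout (one link per line, no extra `#` fragments). The `html.parser` version tolerates reformatting of the markup, at the cost of a few more lines.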
