[Impala-ASF-CR] IMPALA-5181: Extract PYPI metadata from a webpage

Taras Bobrovytsky (Code Review) Fri, 07 Apr 2017 09:22:00 -0700

Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-5181: Extract PYPI metadata from a webpage
......................................................................



Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6579/1/infra/python/deps/pip_download.py
File infra/python/deps/pip_download.py:

Line 86:     regex = r'<a href=\".*packages/(.*)#md5=(.*?)\".*>(.*)<\/a>'
> Can you add a comment explaining why we're using regexes to parse the HTML,
I considered using BeautifulSoup, but the problem is that we would have to
download and install it before we could use it in this script. Let me know if
you have ideas for how we can do this (I think it's definitely a better solution).

Since the HTML is guaranteed to be structured a certain way according to the
PEP 503 documentation, I think it's OK to parse it with a regex.
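
To illustrate, here is a minimal sketch of how the regex from line 86 behaves
on a single anchor line. The sample line is hypothetical (the path, hash, and
file name are made up), and parse_anchor is a helper name invented for this
example, not something in the patch:

```python
import re

# Regex quoted from line 86 of pip_download.py in this patch set.
REGEX = r'<a href=\".*packages/(.*)#md5=(.*?)\".*>(.*)<\/a>'

# Hypothetical anchor in the style of a PEP 503 "simple" index page.
# The package path, hash, and file name are made up for illustration.
SAMPLE_LINE = ('<a href="../../packages/ab/cd/example_pkg-1.0.tar.gz'
               '#md5=0123456789abcdef0123456789abcdef">example_pkg-1.0.tar.gz</a>')


def parse_anchor(line):
    """Return (path, md5, file_name) for a matching anchor line, else None."""
    match = re.search(REGEX, line)
    if match is None:
        return None
    return match.groups()


if __name__ == '__main__':
    print(parse_anchor(SAMPLE_LINE))
```

This only works if each anchor sits on its own line and carries an md5
fragment, so a line that does not match yields None and should probably be
treated as an error rather than skipped silently.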


-- 
To view, visit http://gerrit.cloudera.org:8080/6579
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If3845a0d5f568d4352e3cc4883596736974fd7de
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: David Knupp <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
