On 10/03/2021 04:35, S Monzur wrote:
> Thanks! I ended up using beautiful soup to remove the html tags and create
> three lists (titles of article, publications dates, main body) but am still
> facing a problem where the list is not properly storing the main body.
> There is something wrong with my code for that section, and any comment
> would be really helpful!
> ListFile Text
> <https://drive.google.com/file/d/1V3s8w8a3NQvex91EdOhdC9rQtCAOElpm/view?usp=sharing>
How did you create that file?
> BeautifulSoup code for removing tags <https://pastebin.com/qvbVMUGD>
> print(bodytext[0])  # so here, I'm only getting the first paragraph of the body
>                     # of the first article, not all of the first article
> print(bodytext[1])  # here, I'm getting the second paragraph of the first
>                     # article, and not the second article
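A minimal sketch of what is probably happening, assuming (I haven't confirmed this from the pastebin) that the code calls `find_all("p")` on one soup built from all the articles at once. The HTML below is hypothetical, not your real file:

```python
from bs4 import BeautifulSoup

# Hypothetical input: two articles in one HTML string. The real file's
# structure is unknown; this only illustrates the indexing problem.
html = """
<div class="article"><h1>First title</h1><p>A1 para 1</p><p>A1 para 2</p></div>
<div class="article"><h1>Second title</h1><p>A2 para 1</p></div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all("p") on the whole document returns every <p> tag in document
# order, so the resulting list is indexed by paragraph, not by article.
bodytext = [p.get_text() for p in soup.find_all("p")]

print(bodytext[0])  # prints: A1 para 1
print(bodytext[1])  # prints: A1 para 2 (second paragraph of the FIRST article)
```

That matches the symptom you describe: `bodytext[1]` is the next paragraph, not the next article.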
It may help to process each article with Beautiful Soup individually,
rather than the whole list at once.
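Something along these lines, as a sketch only: it assumes your list holds one HTML string per article, and the tag names (`<h1>`, `<time>`, `<p>`) are guesses, so adjust them to the real markup:

```python
from bs4 import BeautifulSoup

# Hypothetical input: one HTML string per article. The tag names used
# here (<h1>, <time>, <p>) are assumptions, not your actual markup.
articles = [
    "<h1>First title</h1><time>2021-03-01</time><p>A1 para 1</p><p>A1 para 2</p>",
    "<h1>Second title</h1><time>2021-03-02</time><p>A2 para 1</p>",
]

titles, dates, bodies = [], [], []

for article_html in articles:
    # Parse each article on its own, so the searches below only see
    # the tags that belong to this one article.
    soup = BeautifulSoup(article_html, "html.parser")
    titles.append(soup.find("h1").get_text(strip=True))
    dates.append(soup.find("time").get_text(strip=True))
    # Join all paragraphs of this article into a single body string,
    # giving exactly one entry per article in `bodies`.
    paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
    bodies.append("\n".join(paragraphs))

print(bodies[0])  # both paragraphs of the first article
print(bodies[1])  # the body of the second article
```

If everything is in one document instead, the same idea applies: loop over each article's container element (for example `soup.find_all("div", class_="article")`, where the class name is hypothetical) and call `find_all("p")` on that element rather than on the whole soup.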
--
https://mail.python.org/mailman/listinfo/python-list