Ah, shoot me. I had a .join() call on the output queue but not on
the input queue. So the threads for the input queue got terminated
before BeautifulSoup could get started. I went down that same rabbit
hole with CSVWriter the other day. *sigh*
Thanks for everyone's help.
Chris R.
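
For anyone hitting the same problem, here is a minimal sketch of the fix described above: join every queue in the pipeline, not just the last one. It is not the code from this thread. The names fetch(), reader(), parser() and run() are hypothetical, the page content is generated locally instead of being requested, and a regular expression stands in for BeautifulSoup so the sketch needs only the standard library.

```python
import queue
import re
import threading

NUM_PAGES = 20


def fetch(page_no):
    # Hypothetical stand-in for a real HTTP request.
    return "<html><head><title>Page {}</title></head></html>".format(page_no)


def reader(todo, pages):
    # Take page numbers from `todo`, put (number, html) on `pages`.
    while True:
        page_no = todo.get()
        try:
            pages.put((page_no, fetch(page_no)))
        finally:
            todo.task_done()


def parser(pages, results):
    # Take html from `pages`, put (number, title) on `results`.
    while True:
        page_no, html = pages.get()
        try:
            match = re.search(r"<title>(.*?)</title>", html)
            results.put((page_no, match.group(1) if match else None))
        finally:
            pages.task_done()


def run(num_readers=4, num_parsers=2):
    todo, pages, results = queue.Queue(), queue.Queue(), queue.Queue()
    workers = [(reader, (todo, pages), num_readers),
               (parser, (pages, results), num_parsers)]
    for target, args, count in workers:
        for _ in range(count):
            # Daemon threads are killed when the main thread exits,
            # which is why the joins below matter.
            threading.Thread(target=target, args=args, daemon=True).start()

    for page_no in range(NUM_PAGES):
        todo.put(page_no)

    todo.join()   # every page has been fetched and queued for parsing
    pages.join()  # every fetched page has been parsed

    titles = []
    while not results.empty():
        titles.append(results.get())
    # Arrival order depends on thread timing, so sort by page number.
    return sorted(titles)


if __name__ == "__main__":
    for page_no, title in run():
        print(page_no, title)
```

If the pages.join() call is left out, the main thread can reach the collection loop (or exit) while the daemon parser threads still hold unprocessed pages, which matches the symptom described above.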
--
Christopher Reimer via Python-list wrote:
> On 8/27/2017 1:31 PM, Peter Otten wrote:
>
>> Here's a simple example that extracts titles from generated html. It
>> seems to work. Does it resemble what you do?
> Your example is similar to my code when I'm using a list for the input
> to the parser.
Christopher Reimer writes:
> I have 20 read_threads requesting and putting pages into the output
> queue that is the input_queue for the parser.
Given how slow parsing is, you probably want to scrape the pages into
disk files, and then run the parser in parallel
On 8/27/2017 1:50 PM, MRAB wrote:
What if you don't sort the list? I ask because it sounds like you're
changing 2 variables (i.e. list->queue, sorted->unsorted) at the same
time, so you can't be sure that it's the queue that's the problem.
If I'm using a list, I'm using a for loop to input
On 8/27/2017 1:31 PM, Peter Otten wrote:
Here's a simple example that extracts titles from generated html. It seems
to work. Does it resemble what you do?
Your example is similar to my code when I'm using a list for the input
to the parser. You have soup_threads and write_threads, but no
On 2017-08-27 21:35, Christopher Reimer via Python-list wrote:
On 8/27/2017 1:12 PM, MRAB wrote:
What do you mean by "queue (random order)"? A queue is sequential
order, first-in-first-out.
With 20 threads requesting 20 different pages, they're not going into
the queue in sequential order (i.e., 0, 1, 2, ..., 17, 18, 19) and
coming in at different
Christopher Reimer via Python-list wrote:
> On 8/27/2017 11:54 AM, Peter Otten wrote:
>
>> The documentation
>>
>> https://www.crummy.com/software/BeautifulSoup/bs4/doc/#making-the-soup
>>
>> says you can make the BeautifulSoup object from a string or file.
>> Can you give a few more details
On 2017-08-27 20:35, Christopher Reimer via Python-list wrote:
On 8/27/2017 11:54 AM, Peter Otten wrote:
The documentation
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#making-the-soup
says you can make the BeautifulSoup object from a string or file.
Can you give a few more details where the queue comes into play? A small
code sample would be
ents), I get the UserWarning that no parser was
> explicitly set and a reference to line 80 in threading.py (which puts it
> in the RLock factory function).
>
> When I switched back to using list between the Requestor and Parser, the
> Parser worked again.
>
> BeautifulSoup doesn
puts it
in the RLock factory function).
When I switched back to using list between the Requestor and Parser, the
Parser worked again.
BeautifulSoup doesn't work with a threaded input queue?
Thank you,
Chris Reimer
--
https://mail.python.org/mailman/listinfo/python-list