Re: BeautifulSoup doesn't work with a threaded input queue?

2017-08-27 Thread Christopher Reimer via Python-list
Ah, shoot me. I had a .join() statement on the output queue but not on the input queue. So the threads for the input queue got terminated before BeautifulSoup could get started. I went down that same rabbit hole with CSVWriter the other day. *sigh* Thanks for everyone's help. Chris R.
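For readers landing on this thread, here is a minimal sketch of the kind of fix described above. It is not Chris's actual code: `worker`, `run_pipeline`, and the `upper()` call standing in for parsing are hypothetical, and it assumes daemon worker threads fed by a standard-library `queue.Queue`. The point is only that the main thread must call `.join()` on the input queue, or the daemon workers can be killed before they finish.

```python
import queue
import threading


def worker(in_q, out_q):
    # Hypothetical parser stage: pull pages until the main thread exits.
    while True:
        page = in_q.get()
        try:
            out_q.put(page.upper())  # placeholder for the real parsing work
        finally:
            in_q.task_done()  # lets in_q.join() know this item is finished


def run_pipeline(pages, num_threads=4):
    in_q = queue.Queue()
    out_q = queue.Queue()
    for _ in range(num_threads):
        # Daemon threads die with the main thread, hence the join below.
        threading.Thread(target=worker, args=(in_q, out_q), daemon=True).start()
    for page in pages:
        in_q.put(page)
    # Without this join the function could return before any page is parsed.
    in_q.join()
    results = []
    while not out_q.empty():
        results.append(out_q.get())
    return results
```

If a second stage consumes `out_q` (a writer, for instance), that queue would need its own `task_done()` and `.join()` pair in the same way.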

Re: BeautifulSoup doesn't work with a threaded input queue?

2017-08-27 Thread Peter Otten
Christopher Reimer via Python-list wrote:

> On 8/27/2017 1:31 PM, Peter Otten wrote:
>
>> Here's a simple example that extracts titles from generated html. It
>> seems to work. Does it resemble what you do?
> Your example is similar to my code when I'm using a list for the input
> to the parser.

Re: BeautifulSoup doesn't work with a threaded input queue?

2017-08-27 Thread Paul Rubin
Christopher Reimer writes:

> I have 20 read_threads requesting and putting pages into the output
> queue that is the input_queue for the parser.

Given how slow parsing is, you probably want to scrape the pages into disk files, and then run the parser in parallel

Re: BeautifulSoup doesn't work with a threaded input queue?

2017-08-27 Thread Christopher Reimer via Python-list
On 8/27/2017 1:50 PM, MRAB wrote: What if you don't sort the list? I ask because it sounds like you're changing 2 variables (i.e. list->queue, sorted->unsorted) at the same time, so you can't be sure that it's the queue that's the problem. If I'm using a list, I'm using a for loop to input

Re: BeautifulSoup doesn't work with a threaded input queue?

2017-08-27 Thread Christopher Reimer via Python-list
On 8/27/2017 1:31 PM, Peter Otten wrote: Here's a simple example that extracts titles from generated html. It seems to work. Does it resemble what you do? Your example is similar to my code when I'm using a list for the input to the parser. You have soup_threads and write_threads, but no

Re: BeautifulSoup doesn't work with a threaded input queue?

2017-08-27 Thread MRAB
On 2017-08-27 21:35, Christopher Reimer via Python-list wrote: On 8/27/2017 1:12 PM, MRAB wrote: What do you mean by "queue (random order)"? A queue is sequential order, first-in-first-out. With 20 threads requesting 20 different pages, they're not going into the queue in sequential order
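Both points in this exchange can hold at once: the queue itself is first-in-first-out, but the order in which concurrent threads reach `put()` is not deterministic. A minimal sketch, assuming the original page order matters, is to tag each page with its index and sort after collecting. `fetch` and `fetch_all` are hypothetical names, and the string stands in for a real page request.

```python
import queue
import threading


def fetch(index, out_q):
    # Hypothetical stand-in for a page request; tags the result with its index.
    out_q.put((index, f"page {index}"))


def fetch_all(count):
    out_q = queue.Queue()
    threads = [
        threading.Thread(target=fetch, args=(i, out_q)) for i in range(count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # every thread has finished its put() after this loop
    # Arrival order in the queue may vary from run to run.
    items = [out_q.get() for _ in range(count)]
    # Sorting on the index restores the requested order.
    return [page for _, page in sorted(items)]
```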

Re: BeautifulSoup doesn't work with a threaded input queue?

2017-08-27 Thread Christopher Reimer via Python-list
On 8/27/2017 1:12 PM, MRAB wrote: What do you mean by "queue (random order)"? A queue is sequential order, first-in-first-out. With 20 threads requesting 20 different pages, they're not going into the queue in sequential order (i.e., 0, 1, 2, ..., 17, 18, 19) and coming in at different

Re: BeautifulSoup doesn't work with a threaded input queue?

2017-08-27 Thread Peter Otten
Christopher Reimer via Python-list wrote:

> On 8/27/2017 11:54 AM, Peter Otten wrote:
>
>> The documentation
>>
>> https://www.crummy.com/software/BeautifulSoup/bs4/doc/#making-the-soup
>>
>> says you can make the BeautifulSoup object from a string or file.
>> Can you give a few more details

Re: BeautifulSoup doesn't work with a threaded input queue?

2017-08-27 Thread MRAB
On 2017-08-27 20:35, Christopher Reimer via Python-list wrote: On 8/27/2017 11:54 AM, Peter Otten wrote: The documentation https://www.crummy.com/software/BeautifulSoup/bs4/doc/#making-the-soup says you can make the BeautifulSoup object from a string or file. Can you give a few more details

Re: BeautifulSoup doesn't work with a threaded input queue?

2017-08-27 Thread Christopher Reimer via Python-list
On 8/27/2017 11:54 AM, Peter Otten wrote: The documentation https://www.crummy.com/software/BeautifulSoup/bs4/doc/#making-the-soup says you can make the BeautifulSoup object from a string or file. Can you give a few more details where the queue comes into play? A small code sample would be

Re: BeautifulSoup doesn't work with a threaded input queue?

2017-08-27 Thread Peter Otten
> ents), I get the UserWarning that no parser was
> explicitly set and a reference to line 80 in threading.py (which puts it
> in the RLock factory function).
>
> When I switched back to using a list between the Requestor and Parser, the
> Parser worked again.
>
> BeautifulSoup doesn

BeautifulSoup doesn't work with a threaded input queue?

2017-08-27 Thread Christopher Reimer via Python-list
puts it in the RLock factory function). When I switched back to using a list between the Requestor and Parser, the Parser worked again. BeautifulSoup doesn't work with a threaded input queue? Thank you, Chris Reimer