On 9/6/2017 9:26 PM, Christopher Reimer wrote:
On Sep 6, 2017, at 9:14 PM, Stefan Ram wrote:
I can run this (your code) without an error here (Python 3.6.0),
from a file named "Scraper1.py":
I'll check tomorrow. I recently switched from 3.5.x to 3.6.1 in the
Greetings,
After reading everyone's comments and doing a little more research, I
re-implemented my function as a callable class.
    def __call__(self, key, value):
        if key not in self._methods:
            return value
        return self._methods[key](value)
This behaves like my
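A minimal sketch of how such a callable class might look in full. The class name FilterText and the two handlers (str.upper, str.lower) are assumptions for illustration; the post shows only the __call__ method and the _methods attribute.

```python
class FilterText:
    """Callable that dispatches on a key; unknown keys pass the value through."""

    def __init__(self):
        # Hypothetical handlers: the original post does not show what
        # self._methods actually contains.
        self._methods = {
            'this': str.upper,
            'that': str.lower,
        }

    def __call__(self, key, value):
        # Same logic as the snippet above: fall back to the raw value
        # when no handler is registered for the key.
        if key not in self._methods:
            return value
        return self._methods[key](value)


filter_text = FilterText()
print(filter_text('this', 'abc'))   # handled by str.upper
print(filter_text('other', 'abc'))  # no handler, value returned unchanged
```

Building the dispatch table once in __init__ avoids recreating the dictionary on every call, which a plain function holding the dictionary literal would do.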
Greetings,
I was playing around with this piece of example code (written from memory).
def filter_text(key, value):
    def do_nothing(text): return text
    return {'this': call_this,
            'that': call_that,
            'what': do_nothing
            }[key](value)
Is
Ah, shoot me. I had a .join() statement on the output queue but not on
the input queue. So the threads for the input queue got terminated
before BeautifulSoup could get started. I went down that same rabbit
hole with CSVWriter the other day. *sigh*
Thanks for everyone's help.
Chris R.
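A rough sketch of the fix described above: call .join() on both the input and the output queue so the main thread does not exit while daemon worker threads still hold unprocessed items. The worker functions, the thread count, and the uppercase "parsing" step are stand-ins, not the original scraper code.

```python
import queue
import threading

in_q = queue.Queue()    # pages waiting to be parsed
out_q = queue.Queue()   # parsed results waiting to be written
results = []            # stand-in for the CSV file


def parse_worker():
    while True:
        page = in_q.get()
        try:
            out_q.put(page.upper())  # stand-in for the BeautifulSoup parsing
        finally:
            in_q.task_done()


def write_worker():
    while True:
        row = out_q.get()
        try:
            results.append(row)      # stand-in for writing a CSV row
        finally:
            out_q.task_done()


for _ in range(4):
    threading.Thread(target=parse_worker, daemon=True).start()
threading.Thread(target=write_worker, daemon=True).start()

for page in ['a', 'b', 'c']:
    in_q.put(page)

in_q.join()   # wait until every input item has been parsed
out_q.join()  # then wait until every parsed item has been written
```

Because task_done() on the input queue is only called after the result has been put on the output queue, joining the input queue first guarantees that the output queue already holds everything it will ever receive.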
On 8/27/2017 1:50 PM, MRAB wrote:
What if you don't sort the list? I ask because it sounds like you're
changing 2 variables (i.e. list->queue, sorted->unsorted) at the same
time, so you can't be sure that it's the queue that's the problem.
If I'm using a list, I'm using a for loop to input
On 8/27/2017 1:31 PM, Peter Otten wrote:
Here's a simple example that extracts titles from generated html. It seems
to work. Does it resemble what you do?
Your example is similar to my code when I'm using a list for the input
to the parser. You have soup_threads and write_threads, but no
On 8/27/2017 1:12 PM, MRAB wrote:
What do you mean by "queue (random order)"? A queue is sequential
order, first-in-first-out.
With 20 threads requesting 20 different pages, they're not going into
the queue in sequential order (i.e., 0, 1, 2, ..., 17, 18, 19) and
coming in at different
On 8/27/2017 11:54 AM, Peter Otten wrote:
The documentation
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#making-the-soup
says you can make the BeautifulSoup object from a string or file.
Can you give a few more details where the queue comes into play? A small
code sample would be
Greetings,
I have a Python 3.6 script on Windows to scrape comment history from a
website. It's currently set up this way:
Requestor (threads) -> list -> Parser (threads) -> queue -> CSVWriter
(single thread)
It takes 15 minutes to process ~11,000 comments.
When I replaced the list with a
On 5/20/2017 1:19 AM, dieter wrote:
If your (590) pages are linked together (such that you must fetch
a page to get the following one) and page fetching is the limiting
factor, then this would limit the parallelizability.
The pages are not linked together. The URL requires a page number. If I
Greetings,
I was playing around with a piece of code to remove lowercase letters
and leave behind uppercase letters from a string when I got unexpected
results.
string = 'Whiskey Tango Foxtrot'
list(filter((lambda x: not x.islower()), string))
['W', ' ', 'T', ' ', 'F']
Note
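A short sketch of one likely reason for the result above, assuming the unexpected part is the spaces in the output: str.islower() returns False for any character that has no case, so spaces pass a "not islower" test. Testing str.isupper() directly keeps only the uppercase letters.

```python
string = 'Whiskey Tango Foxtrot'

# A space has no case, so islower() is False and "not islower()" keeps it.
kept_with_spaces = list(filter(lambda x: not x.islower(), string))

# isupper() is True only for uppercase letters, so the spaces are dropped.
uppercase_only = list(filter(str.isupper, string))

print(kept_with_spaces)
print(uppercase_only)
```

Passing str.isupper directly also removes the need for the lambda.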
On 4/26/2016 8:56 PM, Random832 wrote:
what exactly do you mean by property decorators? If you're just
accessing them in a dictionary what's the benefit over having the
values be simple attributes rather than properties?
After considering the feedback I got for sanity checking my code, I've