Ravi Teja wrote:
> Thomas Ploch wrote:
>> Ravi Teja wrote:
>>> Thomas Ploch wrote:
>>>> Hi folks,
>>>>
>>>> I have a data structure that looks like this:
>>>>
>>>> d = {
>>>>     'url1': {
>>>>         'emails': ['a', 'b', 'c',...],
>>>>         'matches': ['d', 'e', 'f',...]
>>>>     },
>>>>     'url2': {...
>>>> }
>>>>
>>>> This dictionary will get _very_ big, so I want to write it somehow to a
>>>> file after it has grown to a certain size.
>>>>
>>>> How would I achieve that?
>>>>
>>>> Thanks,
>>>> Thomas
>>> Pickle/cPickle are standard library modules that can persist data.
>>> But in this case, I would recommend ZODB/Durus.
>>>
>>> (Your code example scares me. I hope you have benevolent purposes for
>>> that application.)
>>>
>>> Ravi Teja.
>>>
>> Thanks, but why is this code example scaring you?
>>
>> Thomas
>
> The code indicates that you are trying to harvest a _very_ (as you put
> it) large set of email addresses from web pages. With my limited
> imagination, I can think of only one group of people who would need to
> do that. But considering that you write good English, you must not be
> one of those mean people that needed me to get a new email account just
> for posting to Usenet :-).
>
> Ravi Teja.
>
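The pickle suggestion in the quoted thread can be sketched roughly like this. It is a minimal illustration, assuming Python 3 (where `pickle` already uses the C accelerator that `cPickle` provided in Python 2). The class `ChunkedDictWriter`, the helper `load_all`, the `max_items` threshold and the file name are hypothetical, not part of any library. The idea: keep a bounded dict in memory, and each time it reaches the threshold, append it as one more pickle to the same file and start over. Reading back means calling `pickle.load` repeatedly until it hits end of file.

```python
import os
import pickle
import tempfile


class ChunkedDictWriter:
    """Hypothetical helper: buffer entries in a dict and append the buffer
    to a file as one pickled chunk whenever it reaches max_items keys."""

    def __init__(self, path, max_items=1000):
        self.path = path
        self.max_items = max_items
        self.buffer = {}

    def update(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.max_items:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # "ab" mode: every flush appends one more pickle to the same file
        with open(self.path, "ab") as f:
            pickle.dump(self.buffer, f, protocol=pickle.HIGHEST_PROTOCOL)
        self.buffer = {}  # free the memory held by the flushed chunk


def load_all(path):
    """Read every pickled chunk from path and merge them into one dict."""
    merged = {}
    with open(path, "rb") as f:
        while True:
            try:
                merged.update(pickle.load(f))
            except EOFError:  # raised once there are no more chunks
                break
    return merged


# Usage: a tiny threshold so the example writes several chunks.
path = os.path.join(tempfile.mkdtemp(), "results.pkl")
writer = ChunkedDictWriter(path, max_items=2)
for i in range(5):
    writer.update("url%d" % i, {"emails": [], "matches": ["m%d" % i]})
writer.flush()  # write whatever is still buffered
data = load_all(path)
print(len(data))  # 5
```

Note that `load_all` rebuilds the whole dict in memory, and a key written in two chunks keeps only its last value. If you need random access by key without loading everything, the standard library's `shelve` module, or the ZODB/Durus object databases Ravi Teja recommends, are better suited than a flat pickle file.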
Oh, well, yes, you are right that this application can harvest email addresses. But it can do much more than that. It has a text-matching engine that, depending on the given meta keywords, decides whether or not to scan a document on the web, and it can harvest all kinds of information. It can also be fed with a callback for each Content-Type.

I know that the email-matching engine is something of a "grey zone", and I asked myself whether the application needs the email stuff at all. But you could easily add the email regex to the text-matching engine yourself, so I decided to include this functionality (it is OFF by default :-) ).

Thomas

P.S.: No, I am a good person.

-- 
http://mail.python.org/mailman/listinfo/python-list