Re: Writing Multiple files at a times
On Sunday, June 29, 2014 4:19:27 PM UTC+5:30, subhaba...@gmail.com wrote:
> Dear Group, I am trying to crawl multiple URLs. As they come in I want to write them out as strings, preferably through a queue. If any of the esteemed members of the group may kindly help. Regards, Subhabrata Banerjee.

Dear Group,

Thank you for your kind suggestion, but I am not able to sort out:

    fp = open("scraped/body{:05d}.htm".format(n), "w")

Please suggest.

Regards,
Subhabrata Banerjee.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Writing Multiple files at a times
On Mon, 30 Jun 2014 12:23:08 -0700, subhabangalore wrote:
> Thank you for your kind suggestion, but I am not able to sort out:
>     fp = open("scraped/body{:05d}.htm".format(n), "w")
> Please suggest.

Look up str.format() and open() in the Python manual. The line indicated opens a file for writing, whose name is generated by str.format() inserting the number n, formatted to 5 digits with leading zeroes, into the string "scraped/bodyNNNNN.htm". It expects you to have a subdirectory called "scraped" below the directory you're executing the code in.

Also, this newsgroup is *NOT* a substitute for reading the manual for basic Python functions and methods.

Finally, if you don't understand basic string and file handling in Python, why on earth are you trying to write code that arguably needs a level of competence in both? Perhaps as your starter project you should try something simpler; printing "hello world" is traditional.

To understand the string formatting, try:

    print "hello {:05d} world".format(5)
    print "hello {:05d} world".format(50)
    print "hello {:05d} world".format(500)
    print "hello {:05d} world".format(5000)
    print "hello {:05d} world".format(50000)

--
Denis McMahon, denismfmcma...@gmail.com
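[Editor's sketch: the formatting Denis describes can be checked directly. This uses Python 3 syntax (print as a function), and the "scraped" directory name is only illustrative.]

```python
# Minimal sketch (Python 3 syntax) of the zero-padded formatting above.
for n in (5, 50, 500, 5000, 50000):
    print("hello {:05d} world".format(n))

# The same mechanism builds the file name; "scraped" is illustrative and
# would need to exist as a directory before open() could succeed.
name = "scraped/body{:05d}.htm".format(3)
print(name)  # scraped/body00003.htm
```

Note how 50000 exactly fills the 5-digit field, while smaller numbers are padded with leading zeroes.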
Re: Writing Multiple files at a times
On 29/06/2014 11:49, subhabangal...@gmail.com wrote:
> Dear Group, I am trying to crawl multiple URLs. As they come in I want to write them out as strings, preferably through a queue. If any of the esteemed members of the group may kindly help. Regards, Subhabrata Banerjee.

https://docs.python.org/3/library/urllib.html#module-urllib
https://pypi.python.org/pypi/requests/2.3.0
https://docs.python.org/3/library/queue.html

--
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.

Mark Lawrence
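[Editor's sketch: the queue module Mark links to can decouple fetching from writing, which is roughly what the original poster asked for. A hedged, self-contained illustration (Python 3); the URLs and the fake "fetched" HTML are placeholders, and no network access is attempted.]

```python
import queue
import threading

results = queue.Queue()
collected = []

def writer():
    # Consumer thread: drain (name, text) pairs until a None sentinel
    # arrives. Here we collect into a list instead of writing files,
    # to keep the sketch self-contained.
    while True:
        item = results.get()
        if item is None:
            break
        collected.append(item)

t = threading.Thread(target=writer)
t.start()

# Producer side: pretend each "fetch" returned some HTML.
for i, url in enumerate(["http://example.invalid/a", "http://example.invalid/b"]):
    results.put(("body{:05d}.htm".format(i), "<body>page {}</body>".format(i)))
results.put(None)   # tell the consumer to stop
t.join()

print([name for name, _ in collected])  # ['body00000.htm', 'body00001.htm']
```

The point of the sentinel-plus-join pattern is that the writer can lag behind the fetchers without losing anything: the queue buffers the pages in arrival order.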
Re: Writing Multiple files at a times
In article mailman.11325.1404048700.18130.python-l...@python.org, Dave Angel da...@davea.name wrote:
> subhabangal...@gmail.com wrote in message:
>> Dear Group, I am trying to crawl multiple URLs. As they come in I want to write them out as strings, preferably through a queue. If any of the esteemed members of the group may kindly help.
>
> From your subject line, it appears you want to keep multiple files open, and write to each in an arbitrary order. That's no problem, up to the operating system limits. Define a class that holds the URL information and, for each instance, add an attribute for an output file handle. Don't forget to close each file when you're done with the corresponding URL.

One other thing to mention: if you're doing anything that fetches URLs from Python, you almost certainly want to be using Kenneth Reitz's excellent requests module (http://docs.python-requests.org/). The built-in urllib support in Python works, but requests is so much simpler to use.
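[Editor's sketch: Dave Angel's suggestion of a class holding the URL plus an open output file handle might look like the following. All names here are hypothetical, and tempfile stands in for a real output directory.]

```python
import os
import tempfile

class UrlJob:
    """One object per URL, carrying its own output file handle, so
    several files can stay open and be written in arbitrary order."""

    def __init__(self, url, out_dir, index):
        self.url = url
        self.path = os.path.join(out_dir, "body{:05d}.htm".format(index))
        self.fp = open(self.path, "w")   # handle stays open until close()

    def write(self, text):
        self.fp.write(text)

    def close(self):
        self.fp.close()

# Usage: keep several jobs open at once; writes may interleave freely.
out_dir = tempfile.mkdtemp()
jobs = [UrlJob("http://example.invalid/{}".format(i), out_dir, i)
        for i in range(3)]
jobs[2].write("third page\n")    # arbitrary order is fine
jobs[0].write("first page\n")
for job in jobs:
    job.close()                  # don't forget, per Dave's advice

print(sorted(os.listdir(out_dir)))
# ['body00000.htm', 'body00001.htm', 'body00002.htm']
```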
Re: Writing Multiple files at a times
On Sunday, June 29, 2014 7:31:37 PM UTC+5:30, Roy Smith wrote:
> [quoting Dave Angel's suggestion to keep one open output file handle per URL, and recommending the requests module over the built-in urllib]

Dear Group,

Sorry if I miscommunicated. I am opening multiple URLs with urllib.open; each URL has a huge HTML source file. As these files are read, I am concatenating them and putting them into one txt file as a single string. From this big txt file I am trying to take out the HTML body of each URL, and write and store them with attempts like:

    for i, line in enumerate(file1):
        f = open("/python27/newfile_%i.txt" % i, 'w')
        f.write(line)
        f.close()

Generally not much of an issue, but I was thinking of some better options.

Regards,
Subhabrata Banerjee.
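[Editor's sketch: a slightly safer variant of the loop above, using a `with` block so each file is closed even if a write fails. The in-memory `pages` list and the tempfile directory stand in for the poster's big concatenated file and `/python27/` path.]

```python
import os
import tempfile

# Placeholder input: each element stands for one extracted HTML body.
pages = ["<body>one</body>", "<body>two</body>"]

out_dir = tempfile.mkdtemp()   # stands in for the real output directory
for i, page in enumerate(pages):
    out_path = os.path.join(out_dir, "newfile_%i.txt" % i)
    # `with` closes the file automatically, even on error.
    with open(out_path, "w") as f:
        f.write(page)

print(sorted(os.listdir(out_dir)))  # ['newfile_0.txt', 'newfile_1.txt']
```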
Re: Writing Multiple files at a times
On Sun, 29 Jun 2014 10:32:00 -0700, subhabangalore wrote:
> I am opening multiple URLs with urllib.open; each URL has a huge HTML source file. As these files are read I am concatenating them and putting them into one txt file as a single string. From this big txt file I am trying to take out the HTML body of each URL, and write and store them

OK, let me clarify what I think you said. First you concatenate all the web pages into a single file. Then you extract all the page bodies from the single file and save them as separate files.

This seems a silly way to do things; why don't you just save each html body section as you receive it? This sounds like it should be something as simple as:

    from BeautifulSoup import BeautifulSoup
    import requests

    urlList = ["http://something/", "http://something/", "http://something/", ...]

    n = 0
    for url in urlList:
        r = requests.get(url)
        soup = BeautifulSoup(r.content)
        body = soup.find("body")
        fp = open("scraped/body{:05d}.htm".format(n), "w")
        fp.write(body.prettify())
        fp.close()
        n += 1

This will give you:

    scraped/body00000.htm
    scraped/body00001.htm
    scraped/body00002.htm

and so on, for as many URLs as you have in your url list. (Make sure the target directory exists!)

--
Denis McMahon, denismfmcma...@gmail.com