Re: Writing Multiple files at a times

2014-06-30 Thread subhabangalore
On Sunday, June 29, 2014 4:19:27 PM UTC+5:30, subhaba...@gmail.com wrote:
 Dear Group,
 
 I am trying to crawl multiple URLs. As they are coming I want to write them 
 as string, as they are coming, preferably in a queue. 
 
 If any one of the esteemed members of the group may kindly help.
 
 Regards,
 Subhabrata Banerjee.

Dear Group,

Thank you for your kind suggestion. But I am not able to sort out
fp = open( "scraped/body{:05d}.htm".format( n ), "w" )
please suggest.

Regards,
Subhabrata Banerjee. 


Re: Writing Multiple files at a times

2014-06-30 Thread Denis McMahon
On Mon, 30 Jun 2014 12:23:08 -0700, subhabangalore wrote:

 Thank you for your kind suggestion. But I am not able to sort out
 fp = open( "scraped/body{:05d}.htm".format( n ), "w" )
 please suggest.

Look up the str.format() method and the open() function in the Python manual.

The line indicated opens a file for writing; its name is generated by 
str.format(), which inserts the number n, padded with leading zeroes to 
5 digits, into the string "scraped/body{:05d}.htm".

It expects you to have a subdir called scraped below the dir you're 
executing the code in.
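
As a quick illustration of just the formatting part (picking n = 3 arbitrarily):

n = 3
print "scraped/body{:05d}.htm".format( n )   # prints scraped/body00003.htm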

Also, this newsgroup is *NOT* a substitute for reading the manual for 
basic python functions and methods.

Finally, if you don't understand basic string and file handling in 
python, why on earth are you trying to write code that arguably needs a 
level of competence in both? Perhaps as your starter project you should 
try something simpler; print "hello world" is traditional.

To understand the string formatting, try:

print "hello {:05d} world".format( 5 )
print "hello {:05d} world".format( 50 )
print "hello {:05d} world".format( 500 )
print "hello {:05d} world".format( 5000 )
print "hello {:05d} world".format( 50000 )
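
Run under Python 2, those should print hello 00005 world, hello 00050 world, 
hello 00500 world, hello 05000 world and hello 50000 world, i.e. the number 
is padded with leading zeroes to a width of five digits.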

-- 
Denis McMahon, denismfmcma...@gmail.com


Re: Writing Multiple files at a times

2014-06-29 Thread Mark Lawrence

On 29/06/2014 11:49, subhabangal...@gmail.com wrote:

Dear Group,

I am trying to crawl multiple URLs. As they are coming I want to write them as 
string, as they are coming, preferably in a queue.

If any one of the esteemed members of the group may kindly help.

Regards,
Subhabrata Banerjee.



https://docs.python.org/3/library/urllib.html#module-urllib
https://pypi.python.org/pypi/requests/2.3.0

https://docs.python.org/3/library/queue.html
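
A minimal sketch tying those three together (the URLs and file names here are 
invented, and the stdlib module is spelled Queue in Python 2, queue in Python 3):

import requests
from Queue import Queue            # "queue" in Python 3

urls = ["http://example.com/a", "http://example.com/b"]   # placeholder URLs
pages = Queue()

# fetch each page and enqueue its text as it arrives
for url in urls:
    pages.put(requests.get(url).text)

# drain the queue, writing each page to its own file
n = 0
while not pages.empty():
    with open("page_{:d}.txt".format(n), "w") as fp:
        fp.write(pages.get().encode("utf-8"))
    n += 1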

--
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.


Mark Lawrence





Re: Writing Multiple files at a times

2014-06-29 Thread Roy Smith
In article mailman.11325.1404048700.18130.python-l...@python.org,
 Dave Angel da...@davea.name wrote:

 subhabangal...@gmail.com Wrote in message:
  Dear Group,
  
  I am trying to crawl multiple URLs. As they are coming I want to write them 
  as string, as they are coming, preferably in a queue. 
  
  If any one of the esteemed members of the group may kindly help.
  
 
 From your subject line,  it appears you want to keep multiple files open, 
 and write to each in an arbitrary order.  That's no problem,  up to the 
 operating system limits.  Define a class that holds the URL information and 
 for each instance,  add an attribute for an output file handle. 
 
 Don't forget to close each file when you're done with the corresponding URL.
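
A rough sketch of the class Dave describes might look like this (the class 
and attribute names are invented, not his):

class UrlWriter(object):
    """Holds one URL together with its own output file handle."""
    def __init__(self, url, filename):
        self.url = url
        self.fp = open(filename, "w")    # output file for this URL

    def write(self, text):
        self.fp.write(text)

    def close(self):
        self.fp.close()                  # don't forget this when the URL is done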

One other thing to mention is that if you're doing anything with 
fetching URLs from Python, you almost certainly want to be using Kenneth 
Reitz's excellent requests module (http://docs.python-requests.org/).  
The built-in urllib support in Python works, but requests is so much 
simpler to use.
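
For comparison, a minimal sketch of the two approaches (the URL is a placeholder):

import urllib2
import requests

url = "http://example.com/"                 # placeholder URL

html_stdlib = urllib2.urlopen(url).read()   # built-in way (urllib2, Python 2)
html_requests = requests.get(url).text      # the requests way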


Re: Writing Multiple files at a times

2014-06-29 Thread subhabangalore
On Sunday, June 29, 2014 7:31:37 PM UTC+5:30, Roy Smith wrote:
 In article mailman.11325.1404048700.18130.python-l...@python.org,
  Dave Angel da...@davea.name wrote:
 
  subhabangal...@gmail.com Wrote in message:
   Dear Group,
   
   I am trying to crawl multiple URLs. As they are coming I want to write 
   them as string, as they are coming, preferably in a queue. 
   
   If any one of the esteemed members of the group may kindly help.
   
  From your subject line, it appears you want to keep multiple files open, 
  and write to each in an arbitrary order. That's no problem, up to the 
  operating system limits. Define a class that holds the URL information 
  and for each instance, add an attribute for an output file handle. 
  
  Don't forget to close each file when you're done with the corresponding URL.
 
 One other thing to mention is that if you're doing anything with 
 fetching URLs from Python, you almost certainly want to be using Kenneth 
 Reitz's excellent requests module (http://docs.python-requests.org/).  
 The built-in urllib support in Python works, but requests is so much 
 simpler to use.

Dear Group,

Sorry if I miscommunicated. 

I am opening multiple URLs with urllib.open; each URL has a huge html 
source. As these files are read I am trying to concatenate them and put 
them in one txt file as a string. 
From this big txt file I am trying to take out the html body of each URL 
and write and store them, with attempts like:

for i, line in enumerate(file1):
    f = open("/python27/newfile_%i.txt" % i, 'w')
    f.write(line)
    f.close()

Generally not much of an issue, but I was thinking of some better options.
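
For instance, a with statement would close each file even if the write fails 
(just a sketch, using the same file name pattern as above):

for i, line in enumerate(file1):
    with open("/python27/newfile_%i.txt" % i, 'w') as f:
        f.write(line)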

Regards,
Subhabrata Banerjee. 


Re: Writing Multiple files at a times

2014-06-29 Thread Denis McMahon
On Sun, 29 Jun 2014 10:32:00 -0700, subhabangalore wrote:

 I am opening multiple URLs with urllib.open; each URL has a huge html
 source. As these files are read I am trying to concatenate them and
 put them in one txt file as a string.
 From this big txt file I am trying to take out the html body of each
 URL and write and store them

OK, let me clarify what I think you said.

First you concatenate all the web pages into a single file.
Then you extract all the page bodies from the single file and save them 
as separate files.

This seems a silly way to do things; why don't you just save each html 
body section as you receive it?

This sounds like it should be something as simple as:

from BeautifulSoup import BeautifulSoup
import requests

urlList = [
    "http://something/",
    "http://something/",
    "http://something/",
    # ... as many URLs as you need
    ]

n = 0
for url in urlList:
    r = requests.get( url )
    soup = BeautifulSoup( r.content )
    body = soup.find( "body" )
    fp = open( "scraped/body{:05d}.htm".format( n ), "w" )
    fp.write( body.prettify() )
    fp.close()
    n += 1

will give you:

scraped/body00000.htm
scraped/body00001.htm
scraped/body00002.htm


for as many urls as you have in your url list. (make sure the target 
directory exists!)
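
If it might not exist, one way to create it first (standard library only) is:

import os

if not os.path.isdir( "scraped" ):    # create the output directory if needed
    os.makedirs( "scraped" )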

-- 
Denis McMahon, denismfmcma...@gmail.com