<keith...@beyondbb.com> wrote

Hi I am trying to read a html document into a text file

Hi, welcome to the tutor list.

The first thing to point out is that HTML files are just text files with a particular structure, so as far as reading them in Python goes they are no different from any other text file.

purpose of spliting the data.

Now this is where it gets interesting: it depends what exactly you are trying to "split". What do you mean by the "data"? If it's the HTML elements, there are specialised Python tools that will make this a lot easier. But if it is simply splitting into separate lines, read on...
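
As a rough illustration of the "specialised tools" route, here is a minimal sketch using html.parser from the Python 3 standard library. It is only one of several options, and the class name, function name and sample string are made up for illustration. It collects the text found inside each <p> element:

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Hypothetical helper: gathers the text of each <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []     # finished paragraphs
        self._inside_p = False   # are we currently inside a <p>?
        self._chunks = []        # text pieces of the paragraph being read

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._inside_p = True
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "p" and self._inside_p:
            self.paragraphs.append("".join(self._chunks).strip())
            self._inside_p = False

    def handle_data(self, data):
        # Text inside nested tags such as <b> also arrives here
        if self._inside_p:
            self._chunks.append(data)


def extract_paragraphs(html_text):
    parser = ParagraphCollector()
    parser.feed(html_text)
    parser.close()
    return parser.paragraphs


# Made-up sample standing in for the real document
sample = "<html><body><p><b>First</b> item</p><p><b>Second</b> item</p></body></html>"
print(extract_paragraphs(sample))
```

The parser copes with attributes, nested tags and odd spacing, which a plain string split cannot.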

I am not sure I am doing this right. I have my html document, my text file and a python script I called convt.py on the desktop

It's usually a bad idea in Windows to do anything on the Desktop; keep that as a place for putting icons to launch programs. Put working files into separate project folders.

in a folder.

I assume this means a folder on your Desktop?

That's slightly better but still leaves problems because the true path to your folder is:

C:\Documents and Settings\YourName\Desktop\YourFolder

Which Windows tries to hide most of the time!

Personally I'd recommend creating a "Projects" or "Work"
folder at the top level of one of your drives (if you have more than one)
and moving your folder under that. Then the full path becomes

D:\Work\MyFolder

Which is a lot easier to deal with and less likely to run into Windows "cleverness" issues.

I opened up IDLE (python GUI) opened the folder on my desk top and then went to run module.

OK, I'm still not 100% clear on what you are doing here, but this is probably a good time to get to know the Windows command prompt. (Take a look at the box on the Getting Started topic in my tutorial for a brief intro.) That's a better way to run your programs on real data IMHO.

I keep getting an error saying "No such file or directory: 'source.txt'

If you start a command prompt

Start->Run
Type CMD, Hit OK

At the prompt

C:\WINDOWS> or similar

type
python myscript.py

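It should now find it, provided you have first changed (with the cd command) into the folder that holds both the script and source.txt.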
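
If the error persists, it can help to check which folder Python will actually look in for a bare name such as "source.txt". A small sketch in Python 3 syntax follows; the function name describe_lookup is made up for illustration:

```python
import os


def describe_lookup(filename):
    """Return the full path Python would use for a relative filename,
    and whether a file currently exists there."""
    full_path = os.path.abspath(filename)   # resolved against os.getcwd()
    return full_path, os.path.isfile(full_path)


print("Current working directory:", os.getcwd())
path, found = describe_lookup("source.txt")
print("Looking for:", path)
print("Found" if found else "Not found")
```

If the reported folder is not the one holding source.txt, either cd into the right folder before running the script or pass the full path to open().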

I am new to python so I really don't know if I am doing this right.
#!/usr/bin/python
u=open("source.txt").read()
lines = u.split("<p><b>")
print lines[1]

You are splitting on the sequence <p><b>.
That is, each "line" starts where a paragraph tag is followed immediately by a bold tag. Is that really what you want? If so, it looks fine.
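
To see what that split produces, here is a small sketch in Python 3 syntax (where print is a function), with a made-up sample string standing in for the file contents. Note that split() removes the separator itself, and that element 0 holds whatever came before the first <p><b>:

```python
# Made-up sample standing in for the contents of source.txt
sample = "<html><body><p><b>One</b> first</p><p><b>Two</b> second</p></body></html>"

parts = sample.split("<p><b>")

# parts[0] is the text before the first separator, so the first
# "real" chunk is parts[1], which is why the script prints lines[1].
for index, part in enumerate(parts):
    print(index, repr(part))
```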

You could modify your program so that it takes the filename at the command line, so you can process more than one file:

python myscript.py foo.html

or

python myscript.py bar.html

for example
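
A minimal sketch of such a script, assuming Python 3 and keeping the same <p><b> separator. The names split_file and main, and the wording of the messages, are made up for illustration:

```python
import sys


def split_file(filename, separator="<p><b>"):
    """Read the whole file as text and split it on the separator."""
    with open(filename) as source:
        return source.read().split(separator)


def main(filenames):
    if not filenames:
        print("usage: python myscript.py FILE [FILE ...]")
        return
    for name in filenames:
        try:
            chunks = split_file(name)
        except OSError as error:
            # Covers "No such file or directory" and similar problems
            print("Cannot read", name, "-", error)
            continue
        # chunks[0] is the text before the first separator
        if len(chunks) > 1:
            print(chunks[1])
        else:
            print("Separator not found in", name)


if __name__ == "__main__":
    # sys.argv[0] is the script name; the rest are the filenames
    main(sys.argv[1:])
```

Because it loops over every name given, "python myscript.py foo.html bar.html" would process both files in one run.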

HTH,


--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
