Re: Mining strings from a HTML document.

2006-01-26 Thread Derick van Niekerk
I'm battling to understand this. I am switching to Python while in a production environment so I am tossed into the deep end. Python seems easier to learn than other languages, but some of the conventions still trip me up. Thanks for the link - I'll have to go through all the previous chapters to

Re: Mining strings from a HTML document.

2006-01-26 Thread Derick van Niekerk
Runsun Pan helped me out with the following: You can also try the following very primitive solution that I sometimes use to extract simple information in a quick and dirty way: def extract(text,s1,s2): ''' Extract strings wrapped between s1 and s2. t=this is a

Re: Mining strings from a HTML document.

2006-01-26 Thread Cameron Laird
In article [EMAIL PROTECTED], Derick van Niekerk [EMAIL PROTECTED] wrote: . . . I suppose very few books on Python start off with HTML processing instead of 'hello world' :p . .

Re: Mining strings from a HTML document.

2006-01-26 Thread Magnus Lycka
Derick van Niekerk wrote: Could you/anyone explain the 4 lines of code to me though? A crash course in Python shorthand? What does it mean when you use two sets of brackets as in : beg = [1,0][text.startswith(s1)] ? It's not as strange as it looks. [1,0] is a list. If you put [] after a list,
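The indexing idiom quoted above can be tried in isolation. This is a minimal sketch, assuming only that `text` and `s1` are strings as in the quoted line; the sample values are invented for illustration and do not come from the thread.

```python
# bool is a subclass of int, so True indexes like 1 and False like 0.
text = "<span>test</span>"   # hypothetical sample input
s1 = "<span>"                # hypothetical opening delimiter

starts = text.startswith(s1)   # True for this sample
beg = [1, 0][starts]           # index 1 of the list [1, 0], which is 0

# An equivalent spelling using a conditional expression
# (available in Python 2.5 and later), often easier to read.
beg_readable = 0 if text.startswith(s1) else 1

print(beg, beg_readable)
```

The list is built first, then indexed by the boolean, so the expression picks 0 when the text starts with `s1` and 1 otherwise.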

Re: Mining strings from a HTML document.

2006-01-26 Thread Runsun Pan
def extract(text,s1,s2): ''' Extract strings wrapped between s1 and s2. t=this is a <span>test</span> for <span>extract()</span> that <span>does multiple extract</span> extract(t,'<span>','</span>') ['test', 'extract()', 'does multiple extract'] '''
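The archive snippet stops at the docstring, so the original function body is not shown here. The following is a sketch of one way to get the documented behaviour with `str.find`; it is an assumption about how such a function could work, not Runsun Pan's actual code.

```python
def extract(text, s1, s2):
    """Return every substring of text found between s1 and s2."""
    results = []
    pos = 0
    while True:
        start = text.find(s1, pos)      # next opening delimiter
        if start == -1:
            break
        start += len(s1)                # skip past the delimiter itself
        end = text.find(s2, start)      # matching closing delimiter
        if end == -1:
            break                       # unterminated match: stop
        results.append(text[start:end])
        pos = end + len(s2)             # continue after the closing delimiter
    return results


t = ("this is a <span>test</span> for <span>extract()</span> "
     "that <span>does multiple extract</span>")
print(extract(t, "<span>", "</span>"))
# ['test', 'extract()', 'does multiple extract']
```

Like the quick and dirty approach described in the thread, this treats the document as plain text: it does not understand nesting, attributes, or malformed markup.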

Re: Mining strings from a HTML document.

2006-01-26 Thread Derick van Niekerk
Thanks, guys! I wrote several functions yesterday to import from different types of raw data, including HTML and different text formats. In the end I never used the extract function or the parser module, but your advice put me on the right track. All these functions are now in a single object

Mining strings from a HTML document.

2006-01-25 Thread Derick van Niekerk
Hi, I am new to Python and have been doing most of my work with PHP until now. I find Python to be *much* nicer for the development of local apps (running on my machine) but I am very new to the Python way of thinking and I don't really know where to start other than just by doing it...so far I'm

Re: Mining strings from a HTML document.

2006-01-25 Thread Chris Lasher
I think Jay's advice is solid: you shouldn't rule out HTML parsing. It's not too scary and it's probably not overboard. Using a common HTML parsing library saves you from having to write and debug your own parser. Try looking at Dive Into Python's chapter on it, first.
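For a sense of what library-based parsing looks like, here is a minimal sketch that collects the text inside `<span>` elements. It assumes Python 3 and the standard library's `html.parser` module, which is not necessarily the library recommended in the thread or covered in that chapter; the class name `SpanTextCollector` and the sample input are hypothetical.

```python
from html.parser import HTMLParser


class SpanTextCollector(HTMLParser):
    """Collect the text content of each top-level <span> element."""

    def __init__(self):
        super().__init__()
        self._depth = 0     # how many <span> tags are currently open
        self.found = []     # one string per top-level <span>

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            if self._depth == 0:
                self.found.append("")   # start a new result
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0:
            self.found[-1] += data      # text inside an open <span>


parser = SpanTextCollector()
parser.feed("this is a <span>test</span> for <span>extract()</span>")
parser.close()
print(parser.found)
# ['test', 'extract()']
```

The parser handles tag attributes and nested spans without any extra string handling, which is the main thing a hand-written delimiter search gives up.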

Re: Mining strings from a HTML document.

2006-01-25 Thread Derick van Niekerk
Thanks, Jay! I'll try this out today. Trying to write my own parser is such a pain. This BeautifulSoup script is very nice! I'll give it a try. If you can help me out with an example of how to do what I explained, I would appreciate it. I actually finished doing an import last night, but there