[Boston.pm] [job] Software Engineer, superpages.com, Waltham, MA

2007-05-02 Thread Ronald J Kimball
Superpages.com is looking for a team player to work in a dynamic group of developers. We are a brand-new company (a recent spin-off), but we are nonetheless large, profitable, and stable. The office is located in Waltham, MA, right next to exit 28b of I-95/128. Both senior and quick-learning

Re: [Boston.pm] Extract text from html preserving newlines

2007-05-02 Thread Tolkin, Steve
That worked. Thanks! Running lynx on my local copies of the *.html files works reasonably well, although the output is not what IE produces, and is harder for me to parse. A minor follow-up question: currently I have to run lynx from its own directory. Otherwise I got \lynx_w32\lynx.bat

Re: [Boston.pm] Extract text from html preserving newlines

2007-05-02 Thread Tolkin, Steve
Thanks Jerrad, I actually tried lynx first. However, the HTML files are on a server that needs authentication. Even adding -auth my-user-id:my-pw to lynx was not enough. Here is the lynx output (I added the # as these are comments in the Perl program): # Looking up [my proxy] # Making

Re: [Boston.pm] Extract text from html preserving newlines

2007-05-02 Thread Jerrad Pierce
NTLM is bad, 'm-k? -- Free map of local environmental resources: http://CambridgeMA.GreenMap.org -- MOTD on Boomtime, the 49th of Discord, in the YOLD 3173: It is useless for sheep to pass resolutions in favor of vegetarianism while wolves remain of a different opinion.

Re: [Boston.pm] Extract text from html preserving newlines

2007-05-02 Thread Chris Devers
On Wed, 2 May 2007, Tolkin, Steve wrote:
> Q1. Is there a way to automate IE or Mozilla Firefox to save 100's of
> files as text?

Probably, but might it be easier to automate using `lynx -dump` (or better still, `links -dump`)? If those produce output the way you want them, automating it
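
For anyone wanting to try the `lynx -dump` route over a whole directory, here is a minimal sketch of one way to batch it. It is written in Python rather than the original poster's Perl, and it assumes the HTML files are already saved locally (so it does nothing about the authentication problem) and that `lynx` is on the PATH. The function names `build_dump_command` and `dump_directory` and the `.txt` output naming are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_dump_command(html_path, browser="lynx"):
    """Return the argv list for dumping one local HTML file as plain text.

    `browser` may be "lynx" or "links"; both accept -dump as suggested above.
    """
    return [browser, "-dump", str(html_path)]


def dump_directory(src_dir, out_dir, browser="lynx"):
    """Write one .txt file per *.html file in src_dir; return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for html_path in sorted(Path(src_dir).glob("*.html")):
        # check=True raises CalledProcessError if the browser exits non-zero.
        result = subprocess.run(
            build_dump_command(html_path, browser),
            capture_output=True,
            text=True,
            check=True,
        )
        txt_path = out_dir / (html_path.stem + ".txt")
        txt_path.write_text(result.stdout)
        written.append(txt_path)
    return written
```

Calling `dump_directory("pages", "text")` would then leave `text/foo.txt` next to each `pages/foo.html`. Passing `browser="links"` swaps in the other suggestion without changing anything else.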

Re: [Boston.pm] Extract text from html preserving newlines

2007-05-02 Thread Jerrad Pierce
lynx -dump