RE: Removing HTML tags

Sergey Martynoff Sun, 04 Sep 2005 14:32:35 -0700

> Does wget have any option/facility to remove the HTML
> tags of the retrieved pages so that only the text
> content can be obtained? For example:


No, there's no such built-in functionality. But there are a lot of
external programs and scripts that can do it. For example, lynx
with the -dump option will print a text representation of the page. Or
there is the html2text utility, available on many platforms. Try
googling "convert html to plain text" for many other ways to do
the conversion.
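As an illustration of the "scripts" route, here is a minimal sketch in
Python that uses only the standard library's html.parser module. The
file name strip_tags.py and the helper names are hypothetical. It is much
cruder than lynx -dump or html2text: it does not lay out tables or lists.
It just drops the tags, skips the contents of <script> and <style>,
decodes entities such as &amp;, and collapses leftover whitespace.

```python
"""strip_tags.py (hypothetical name): print only the text content of an HTML page.

A rough sketch, not a full renderer. Usage, assuming Python 3:
    wget -qO- http://example.com/ | python3 strip_tags.py -
    python3 strip_tags.py page.html
"""
import sys
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, ignoring markup and script/style contents."""

    SKIP = {"script", "style"}  # elements whose contents are not page text
    BLOCK = {"br", "p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        # convert_charrefs=True decodes &amp;, &lt; etc. before handle_data
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._chunks.append("\n")  # keep block elements on separate lines

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self._chunks.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of spaces inside each line and drop empty lines.
        raw = "".join(self._chunks)
        lines = (" ".join(line.split()) for line in raw.splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.text()


if __name__ == "__main__" and len(sys.argv) > 1:
    # "-" reads the page from stdin, anything else is treated as a file path.
    if sys.argv[1] == "-":
        page = sys.stdin.read()
    else:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            page = f.read()
    print(html_to_text(page))
```

Here wget's -q keeps it quiet and -O - writes the page to standard
output, so the text appears directly in the terminal.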


-- 
Sergey Martynoff
