Tong wrote:
For not-so-simple tasks, you need not-so-simple tools. Depending on how
much time you'd like to invest in learning such tools, take a
look at libwww, sgrep or the xpath language.
Sure. libwww and sgrep are tools, while xpath is a language. I believe I
should try
Steve Kemp wrote:
You might enjoy my html-tool command which would do the
job for you via:
Thank you very much for mentioning this tool. At first glance it seems
just too wonderful; it is designed to solve exactly problems
like mine. However, after trying it, what I worry about most
On Sun Jan 31, 2010 at 10:54:46 +0800, Zhang Weiwu wrote:
I want to remove all advertisements in my 100 html files. They are
pretty neatly classed, like the following:
<div class=advertisement>
...
</div>
You might enjoy my html-tool command which would do the
job for you via:
Zhang Weiwu wrote:
Sure. libwww and sgrep are tools, while xpath is a language. I believe I
should try xpath because I might use it in other places too, but
what tool should I use for xpath?
Now I think I can answer my own question, partly at least. There is a
good tool for xpath that is named
On Sun, 31 Jan 2010 20:05:46 +0800, Zhang Weiwu wrote:
$ tidy -q -asxml -utf8 page_07_zh.html | xpath -e '//d...@class=advertisement]'
Exactly. Glad that you found both tidy and libxml-xpath-perl, and solved the
problem yourself.
--
Tong (remove underscore(s) to reply)
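Tong's point that not-so-simple tasks need not-so-simple tools shows up as soon as an advertisement block contains its own nested `<div>`: any pattern that stops at the first `</div>` ends too early, which is why a tree-based route such as the tidy | xpath pipeline is the sturdier one. As used in the thread, `xpath -e` locates the advertisement nodes rather than rewriting the file. For cases where no XML tools are at hand, a rough line-based stand-in is to count `<div>` nesting depth with awk. The function name `strip_ads_nested` is hypothetical, and the sketch assumes that every `<div ...>` and `</div>` tag sits on its own line (something you would have to check on your own files) and that the class value is written as `advertisement`, quoted or unquoted. It approximates the XPath result; it is not an XPath evaluator.

```shell
#!/bin/sh
# strip_ads_nested (hypothetical helper): drop each advertisement <div>
# together with everything nested inside it, by tracking <div> depth.
# Assumes each <div ...> and </div> tag is on its own line.
strip_ads_nested() {
  awk '
    # Outside an ad: an opening advertisement div starts a skipped block.
    depth == 0 && /<div class="*advertisement/ { depth = 1; next }
    # Inside an ad: adjust depth on nested tags and skip every line,
    # including the closing </div> that brings depth back to zero.
    depth > 0 {
      if ($0 ~ /<div[ >]/) depth++
      else if ($0 ~ /<\/div>/) depth--
      next
    }
    # Everything else is passed through unchanged.
    { print }
  ' "$@"
}
```

When an advertisement contains an inner `<div>`, this keeps only the text after the whole block, whereas a plain start/end range would stop at the inner `</div>` and leave the rest of the advertisement behind.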
Hello. I believe this is a common case and must have been discussed
before on various other forums, such as the awk/sed/regular expression groups.
However, I could not find those discussions with Google. You would help me a lot if
you simply pointed me to a reference for a solution.
I want to remove all advertisements in my
On Sun, 31 Jan 2010 10:54:46 +0800, Zhang Weiwu wrote:
I want to remove all advertisements in my 100 html files. They are
pretty neatly classed, like the following:
<div class=advertisement>
...
</div>
However, I could not simply do this:
s/<div class=advertisement>.*<\/div>//
Because it is
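The one-line `s` command cannot work here because sed applies it to one input line at a time, so a match that must run from `<div class=advertisement>` on one line to `</div>` on a later line never happens; even on a single line, the greedy `.*` would run to the last `</div>` on that line. A minimal line-based alternative is a sed range address, which deletes every line from the opening tag through the next line containing `</div>`. The function name `strip_ads` is hypothetical, and the sketch assumes the opening and closing tags are on different lines and that the advertisement block has no nested `<div>`.

```shell
#!/bin/sh
# strip_ads (hypothetical helper): delete each advertisement block with a sed
# range address. The range starts on a line containing the opening tag
# (class value quoted or unquoted) and ends on the next line containing </div>.
# Assumes the opening and closing tags sit on different lines and that the
# block has no nested <div>; if either assumption fails, ad lines may be left
# behind or lines outside the block may be deleted.
strip_ads() {
  sed '/<div class="*advertisement/,/<\/div>/d' "$@"
}
```

It reads standard input or the file arguments and writes the cleaned HTML to standard output, so for the 100 files you would write each result to a temporary file and move it back over the original; `sed -i` edits in place but is not part of POSIX sed.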
On Sun, 31 Jan 2010 10:54:46 +0800
Zhang Weiwu zhangwe...@realss.com wrote:
...
I want to remove all advertisements in my 100 html files. They are
pretty neatly classed, like the following:
<div class=advertisement>
...
</div>
However, I could not simply do this:
s/<div