As the title says, I want to crawl a page that contains many URLs, but only
the ones inside a specific div are meaningful to me. I would like to write a
plugin to filter them, but I don't know which extension point I should choose.
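
To make the goal concrete, here is a rough sketch of the filtering I have in
mind. It is plain Python using only the standard library, not Nutch plugin
code, and it says nothing about which extension point it would belong to. The
names DivLinkExtractor and links_in_div and the div id "content" are made up
for illustration.

```python
from html.parser import HTMLParser


class DivLinkExtractor(HTMLParser):
    """Collects href values of <a> tags found inside one target <div>."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id  # hypothetical id of the div I care about
        self.depth = 0              # 0 = outside the target div
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth > 0:
                # A nested div inside the target div.
                self.depth += 1
            elif attrs.get("id") == self.target_id:
                # Entering the target div.
                self.depth = 1
        elif tag == "a" and self.depth > 0:
            href = attrs.get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        # Only div nesting is tracked, so only closing divs change the depth.
        if tag == "div" and self.depth > 0:
            self.depth -= 1


def links_in_div(html, target_id):
    """Return the links that appear inside the div with the given id."""
    parser = DivLinkExtractor(target_id)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = (
        '<a href="/nav">nav</a>'
        '<div id="content"><a href="/article-1">one</a></div>'
        '<a href="/footer">footer</a>'
    )
    print(links_in_div(page, "content"))
```

Only the links inside the chosen div should survive. Links in the navigation
or footer should never reach the fetch list.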

The HTML parse filter can access the HTML content, but it seems to run
after the "add to fetch list" operation. The URL filter can control the
fetch list, but I can't get the HTML content from within it.

I look forward to any helpful replies. Thanks.
