Hi there,

I am working on a project that needs to identify contact points on companies'
websites, to be used for the purpose of enhancing security.

So far, I have managed to crawl several rounds of sites. The next step will
be to parse the HTML pages and locate where the contact information is. In
this case, I am only interested in email addresses and phone numbers.
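One rough way to locate contact information structurally is to scan the page's links for mailto: and tel: hrefs. This assumes the site marks its contacts up that way, which many but not all sites do. The sketch below uses only Python's standard-library html.parser rather than Jsoup or BeautifulSoup. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class ContactLinkParser(HTMLParser):
    """Hypothetical helper: collects addresses from mailto: and tel: links."""

    def __init__(self):
        super().__init__()
        self.emails = set()
        self.phones = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name != "href" or not value:
                continue
            href = value.strip()
            lower = href.lower()
            if lower.startswith("mailto:"):
                # Drop any ?subject=... part; a mailto: may list several addresses.
                target = href[len("mailto:"):].split("?", 1)[0]
                for addr in unquote(target).split(","):
                    if addr.strip():
                        self.emails.add(addr.strip())
            elif lower.startswith("tel:"):
                number = unquote(href[len("tel:"):]).strip()
                if number:
                    self.phones.add(number)


def find_contact_links(html):
    """Return (emails, phones) found in mailto:/tel: links of one page."""
    parser = ContactLinkParser()
    parser.feed(html)
    parser.close()
    return parser.emails, parser.phones
```

For example, `find_contact_links('<a href="mailto:Info@Example.com?subject=Hi">Mail</a>')` yields `{"Info@Example.com"}` as the email set. Addresses that appear only as plain text are missed, so this complements rather than replaces a text-based regex pass.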

Here is what I am planning to do: write MapReduce jobs that parse the HTML
files and apply regular expressions, in combination with an HTML parser such
as Jsoup or BeautifulSoup, to find email addresses and phone numbers.
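The parse-then-regex plan could be sketched as a map step and a reduce step. The sketch uses Python's standard-library html.parser in place of Jsoup or BeautifulSoup. `map_page` and `reduce_contacts` are hypothetical plain functions, not a Hadoop API. In a real Hadoop Streaming job they would read and write tab-separated lines instead. Both regexes are loose assumptions: the phone pattern leans toward North American formats and will need tuning per region.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# Loose patterns (assumptions): tune for your sites and regions.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"(?:\+?\d{1,3}[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def map_page(url, html):
    """Hypothetical map step: emit (contact, url) pairs for one crawled page."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    text = " ".join(extractor.parts)
    for email in EMAIL_RE.findall(text):
        # Lowercased only to deduplicate keys.
        yield ("email:" + email.lower(), url)
    for phone in PHONE_RE.findall(text):
        # Keep digits only so "555-123-4567" and "555.123.4567" collapse together.
        yield ("phone:" + re.sub(r"\D", "", phone), url)


def reduce_contacts(pairs):
    """Hypothetical reduce step: group the URLs on which each contact appears."""
    grouped = defaultdict(set)
    for key, url in pairs:
        grouped[key].add(url)
    return {key: sorted(urls) for key, urls in grouped.items()}
```

Keying on the normalized contact shows which pages of a site expose the same address. Skipping script and style content avoids matching strings that are never shown to visitors.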

However, I am wondering whether a parser plugin for this purpose has already
been implemented, and perhaps tested.

Also, any feedback on how to achieve this would be much appreciated!

Best regards,

Bin
