Hi all, I have a question about the NekoHTML parser used by Nutch. Is it
possible to write a textExtraction function that extracts the displayed
text of an HTML document, i.e. the content between the <body> and </body>
tags?
The textExtraction function would work as follows:
Case 1: assume our HTML document is:
<html>
<body>
<a href="example.com"> this is an example </a>
</body>
</html>
For case 1, textExtraction returns the string "this is an example".
Case 2: assume our HTML document is:
<html>
<body>
<a href="example.com"> </a>
</body>
</html>
For case 2, where the link has no text, textExtraction returns null.
Does anybody know how to do this with the NekoHTML parser?
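A minimal sketch of the behaviour described above, assuming the page has already been parsed into a standard org.w3c.dom Document (which is what a DOM-building HTML parser such as NekoHTML produces). It finds the <body> element, concatenates the text nodes below it, collapses whitespace, and returns null when nothing is left. Skipping <script> and <style> is my own assumption about what counts as "displayed data". So that the sketch runs with only the JDK, the hypothetical helper parseXml uses the JDK XML parser, which works here only because the two example documents happen to be well-formed XML; real-world HTML would need NekoHTML instead.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class TextExtraction {

    /** Returns the whitespace-normalized text under <body>, or null if there is none. */
    public static String textExtraction(Document doc) {
        Node body = findBody(doc.getDocumentElement());
        if (body == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        collectText(body, sb);
        // Collapse runs of whitespace (newlines, indentation) into single spaces.
        String text = sb.toString().replaceAll("\\s+", " ").trim();
        return text.isEmpty() ? null : text;
    }

    // Depth-first search for the body element. The comparison is
    // case-insensitive because an HTML parser may report names as "BODY".
    private static Node findBody(Node node) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && node.getNodeName().equalsIgnoreCase("body")) {
            return node;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node found = findBody(children.item(i));
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    // Appends every text node below 'node', skipping <script> and <style>
    // (assumption: their contents are not displayed data).
    private static void collectText(Node node, StringBuilder sb) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            sb.append(node.getNodeValue());
            return;
        }
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            String name = node.getNodeName();
            if (name.equalsIgnoreCase("script") || name.equalsIgnoreCase("style")) {
                return;
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectText(children.item(i), sb);
        }
    }

    // Hypothetical helper: parses well-formed markup with the JDK XML parser,
    // standing in for NekoHTML so this sketch runs without extra jars.
    public static Document parseXml(String markup) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        String case1 = "<html><body><a href=\"example.com\"> this is an example </a></body></html>";
        String case2 = "<html><body><a href=\"example.com\"> </a></body></html>";
        System.out.println(textExtraction(parseXml(case1))); // prints: this is an example
        System.out.println(textExtraction(parseXml(case2))); // prints: null
    }
}
```

With NekoHTML itself, the Document would presumably come from org.cyberneko.html.parsers.DOMParser (call parse(...) on an InputSource, then getDocument()) and be passed straight to textExtraction. Concatenating text nodes directly keeps inline markup such as foo<b>bar</b> as "foobar", but it can run words together across block elements like adjacent <p> tags; inserting a space at block boundaries would be one possible refinement.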
_______________________________________________
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general