Hi all, I'm trying to extract the content of a web page, and I found the following example on the internet:
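=======START CODE======
import java.io.InputStream;
import java.net.URL;
import org.apache.commons.lang.StringEscapeUtils;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.html.HtmlParser;
import org.apache.tika.sax.BodyContentHandler;
import org.apache.tika.sax.LinkContentHandler;
import org.apache.tika.sax.TeeContentHandler;
import org.apache.tika.sax.ToHTMLContentHandler;
import org.xml.sax.ContentHandler;

String url = "http://www.bbc.com/news/uk-england-41255962";
URL _url = new URL(url);
InputStream input = _url.openStream();

// Collect links, plain body text and re-serialized HTML in a single parse
LinkContentHandler linkHandler = new LinkContentHandler();
ContentHandler textHandler = new BodyContentHandler();
ToHTMLContentHandler toHTMLHandler = new ToHTMLContentHandler();
TeeContentHandler teeHandler = new TeeContentHandler(linkHandler, textHandler, toHTMLHandler);

Metadata metadata = new Metadata();
ParseContext parseContext = new ParseContext();
HtmlParser parser = new HtmlParser();
parser.parse(input, teeHandler, metadata, parseContext);

// Plain text of the whole <body>, HTML-escaped
String content = StringEscapeUtils.escapeHtml(textHandler.toString());
System.out.println("il contenuto " + content); // "il contenuto" = "the content"
=======END CODE========

But the output is useless:

===============START OUTPUT==================
Accessibility links Skip to content Accessibility Help BBC iD Notifications BBC navigation Home Home News News Sport Weather Shop
==============END PART OF OUTPUT=============

How can I understand why this happens, and how can I solve it (also for other web pages, for example http://www.vogella.com/tutorials/AndroidTestingEspresso/article.html)?

-- 
Ing. Viscomi Francesco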
