Hi,

With the Boilerpipe patch enabled, I get an exception in DOMBuilder.endElement when parsing certain pages. Spot-checking the failing pages at random, the problem seems to be limited to sites that use frames.
Commenting out the following two lines in that method "fixes" the problem, and everything else still appears to work:

    m_elemStack.pop();
    m_currentNode = m_elemStack.isEmpty() ? null : (Node)m_elemStack.peek();

But as I am unsure what this code does, and more importantly why it is needed, I'm checking here to see if someone can offer an explanation.

Cheers,

2011-07-15 15:11:01,095 ERROR tika.TikaParser - Error parsing http://www.zeemuseum.nl/
java.util.EmptyStackException
	at java.util.Stack.peek(Stack.java:85)
	at java.util.Stack.pop(Stack.java:67)
	at org.apache.nutch.parse.tika.DOMBuilder.endElement(DOMBuilder.java:349)
	at org.apache.tika.parser.html.BoilerpipeContentHandler.endDocument(BoilerpipeContentHandler.java:315)
	at org.apache.tika.sax.ContentHandlerDecorator.endDocument(ContentHandlerDecorator.java:115)
	at org.apache.tika.sax.XHTMLContentHandler.endDocument(XHTMLContentHandler.java:212)
	at org.apache.tika.sax.TextContentHandler.endDocument(TextContentHandler.java:57)
	at org.apache.tika.sax.ContentHandlerDecorator.endDocument(ContentHandlerDecorator.java:115)
	at org.ccil.cowan.tagsoup.Parser.eof(Parser.java:639)
	at org.ccil.cowan.tagsoup.HTMLScanner.scan(HTMLScanner.java:589)
	at org.apache.tika.parser.html.HtmlParser$1.scan(HtmlParser.java:209)
	at org.ccil.cowan.tagsoup.Parser.parse(Parser.java:449)
	at org.apache.tika.parser.html.HtmlParser.parse(HtmlParser.java:213)
	at org.apache.nutch.parse.tika.TikaParser.getParse(TikaParser.java:115)
	at org.apache.nutch.parse.ParseCallable.call(ParseCallable.java:35)
	at org.apache.nutch.parse.ParseCallable.call(ParseCallable.java:24)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.lang.Thread.run(Thread.java:662)
2011-07-15 15:11:01,095 WARN parse.ParseSegment - Error parsing: http://www.zeemuseum.nl/: failed(2,0): null
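Mechanically, the trace shows DOMBuilder.endElement being called from BoilerpipeContentHandler.endDocument while its element stack is already empty; java.util.Stack.pop() delegates to peek(), which throws EmptyStackException on an empty stack (hence peek at the top of the trace). A less drastic workaround than deleting both lines might be to guard the pop. The sketch below is hypothetical: GuardedElementStack is not a Nutch class, it tracks element names instead of org.w3c.dom.Node objects to stay self-contained, and it assumes the surplus end events come from unbalanced markup (plausibly the frameset pages) being replayed at end of document.

```java
import java.util.EmptyStackException;
import java.util.Stack;

// Hypothetical, simplified stand-in for the element-tracking logic in
// DOMBuilder. It stores element names instead of org.w3c.dom.Node objects.
final class GuardedElementStack {
    private final Stack<String> elemStack = new Stack<>();
    private String currentNode;
    private int unmatchedEnds;

    void startElement(String name) {
        elemStack.push(name);
        currentNode = name;
    }

    void endElement(String name) {
        if (elemStack.isEmpty()) {
            // More end events than start events (assumed to come from
            // unbalanced markup). An unguarded pop() would throw
            // EmptyStackException here, so count the event and skip it.
            unmatchedEnds++;
            return;
        }
        // Mirrors the two statements in DOMBuilder.endElement.
        elemStack.pop();
        currentNode = elemStack.isEmpty() ? null : elemStack.peek();
    }

    int depth() { return elemStack.size(); }
    int unmatchedEnds() { return unmatchedEnds; }
    String currentNode() { return currentNode; }

    public static void main(String[] args) {
        // An unguarded pop on an empty Stack fails the same way as in the log.
        try {
            new Stack<String>().pop();
        } catch (EmptyStackException e) {
            System.out.println("unguarded pop: " + e);
        }

        GuardedElementStack b = new GuardedElementStack();
        b.startElement("html");
        b.startElement("frameset");
        b.endElement("frameset");
        b.endElement("html");
        b.endElement("html"); // surplus end event: skipped, no exception
        System.out.println(b.depth() + " " + b.unmatchedEnds() + " " + b.currentNode());
    }
}
```

Note that this only stops the crash: unmatched end events are silently dropped, so the resulting DOM may still not reflect the page structure, and it does not explain why the Boilerpipe handler emits the extra endElement in the first place.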

