Hi,
For some time (in 2.x) we have had this test (TestRTFParser) commented
out as it was waiting for TIKA-748 to be resolved. That issue has now
been resolved, however I'm getting some confusing output when trying to
resurrect the test!
So at line 105 we do
String text = parse.getText();
assertEquals("The quick brown fox jumps over the lazy dog", text.trim());
But I wanted to implement the suggested test for the title as well, e.g.
String title = parse.getTitle();
String text = parse.getText();
assertEquals("test rft document", title);
assertEquals("The quick brown fox jumps over the lazy dog", text.trim());
The test fails on the 2nd assertion with the following output:
Testcase: testIt took 5.668 sec
FAILED
null expected:<[The quick brown fox jumps over the lazy dog]> but
was:<[test rft document]>
junit.framework.ComparisonFailure: null expected:<[The quick brown fox
jumps over the lazy dog]> but was:<[test rft document]>
at org.apache.nutch.parse.tika.TestRTFParser.testIt(TestRTFParser.java:)
So this looks like parse.getText() returns the same (in this instance)
as parse.getTitle()... which smells like rotting herring to me.
Any immediate thoughts on whether this is a known problem in the Tika RTF
parser, parse-tika's DOMContentUtils class, or somewhere in between?
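
One way to narrow this down might be to dump the text found under the
title and body elements separately, to see where the body text goes
missing. Below is a minimal sketch of that idea using only the JDK's DOM
API. It runs against a hand-built stand-in document, not real Tika
output, and assumes the DOM has an XHTML-like head/title and body layout.
DomTextProbe, sample() and textUnder() are hypothetical names for
illustration, not Nutch or Tika classes.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Nutch or Tika.
class DomTextProbe {

    // Builds a stand-in DOM (assumed layout):
    // <html><head><title>..</title></head><body><p>..</p></body></html>
    static Document sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element head = doc.createElement("head");
            Element title = doc.createElement("title");
            Element body = doc.createElement("body");
            Element p = doc.createElement("p");
            title.setTextContent("test rft document");
            p.setTextContent("The quick brown fox jumps over the lazy dog");
            head.appendChild(title);
            body.appendChild(p);
            html.appendChild(head);
            html.appendChild(body);
            doc.appendChild(html);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Trimmed text of the first element with the given tag name,
    // or "" if no such element exists.
    static String textUnder(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0
                ? ""
                : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        Document doc = sample();
        System.out.println("title = [" + textUnder(doc, "title") + "]");
        System.out.println("body  = [" + textUnder(doc, "body") + "]");
    }
}
```

If the body text came back empty for the DOM produced from the real RTF
file, that would point upstream of the text extraction step; if it is
present, the extraction step itself would be the more likely suspect.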
Thank you
Lewis