I am parsing a two-column scientific paper with Tika. One column reads:

    The effect of muscle a-tocopherol concentration (induced by dietary treatment) on TBARS at different storage times was evaluated (Figure 2). There was a linear effect (P < 0·001) of muscle a-tocopherol concentration on TBARS on day 0, but a linear plus quadratic effect on the following days (P < 0·001). Also in this case the linear plus quadratic effect indicated an exponential response, which was fitted in each case as follows:
The parser (code below) returns this:

    The effect of m
    (induced by dietar
    storage times was
    linear effect (P <
    concentration on T
    quadratic effect o
    Also in this case
    indicated an expo
    in each case as foll

Each line is cut off on the right. On other lines, characters at the left are missing, as if the parser started after the beginning of the text. Case in point:

    ted storage (L = linear effect, P < 0·001; P< 0·001). The data were adjusted to a l equation (solid line) as indicated in

which is the fragment extracted from:

    Figure 2 Relationship between a-tocopherol concentration and lipid oxidation (assessed by the concentration of thiobarbituric acid reactive substances, TBARS, mg malonaldehyde per kg muscle) in longissimus lumborum muscle of Manchego lambs after 0 (u), 3 (n), 6 (s) and 9 (l) days of refrigerated storage (L = linear effect, P< 0·001; Q = quadratic effect, P<0·001). The data were adjusted to a linear or exponential equation (solid line) as indicated in the text.

The paper itself is found by following the link from here:
http://openagricola.nal.usda.gov/Record/IND23271089
(I will send the file offlist if needed; it's 64k)

The code used is this:

    import java.io.File;
    import java.io.StringWriter;
    import org.apache.tika.io.TikaInputStream;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.parser.Parser;
    import org.apache.tika.sax.WriteOutContentHandler;

    Parser parser = new AutoDetectParser();
    Metadata metadata = new Metadata();
    File f = new File("volume_73_part_3_p451-457.pdf");
    TikaInputStream tis = TikaInputStream.get(f);
    StringWriter writer = new StringWriter();
    WriteOutContentHandler handler = new WriteOutContentHandler(writer);
    parser.parse(tis, handler, metadata, new ParseContext());
    System.out.println(handler.toString());

My questions are these:
Can Tika (PDFBox) correctly parse multi-column content?
What am I missing?

Many thanks in advance.
Jack
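To make the symptom easier to report, here is a minimal sketch of a check that classifies each extracted line against the line expected from the printed page. It uses only the Java standard library. `TruncationCheck` and `classify` are hypothetical names, not part of Tika or PDFBox, and the expected line boundaries in the example are assumptions taken from the fragments quoted above.

```java
/**
 * Hypothetical helper (not a Tika or PDFBox class): reports how an extracted
 * line relates to the line expected from the printed page.
 */
class TruncationCheck {

    /**
     * Returns one of "complete", "right-truncated", "left-truncated",
     * "both-truncated" or "no-match".
     */
    static String classify(String expected, String extracted) {
        String e = expected.trim();
        String x = extracted.trim();
        if (x.isEmpty()) {
            return "no-match";            // nothing was extracted for this line
        }
        if (e.equals(x)) {
            return "complete";            // whole line came through
        }
        if (e.startsWith(x)) {
            return "right-truncated";     // characters missing at the right
        }
        if (e.endsWith(x)) {
            return "left-truncated";      // characters missing at the left
        }
        if (e.contains(x)) {
            return "both-truncated";      // cut on both sides
        }
        return "no-match";                // extracted text is not part of the line
    }

    public static void main(String[] args) {
        // Expected line boundaries are assumed, based on the quoted fragments.
        System.out.println(classify(
                "The effect of muscle a-tocopherol concentration",
                "The effect of m"));
        System.out.println(classify(
                "days of refrigerated storage",
                "ted storage"));
    }
}
```

Running the check over every line of a page would show whether the cuts fall consistently on one side, which may help narrow down where the column boundary is being misjudged.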
