On Wednesday 23 May 2007, lucien.taylor at oxil.co.uk wrote:
> We are using pdftotext to strip out text from PDFs to prepare for
> search indexing and more. This works well except with our own
> PDFs (produced in Scribus), which are getting badly broken up - we
> suspect through kerning. The text generated is simply fragmented
> into meaningless chunks. It remains in sequential order and some
> words are fine, but generally it's not working.
>
> We are using (the great) Bitstream Vera, which looks so good both on
> screen and in print; however, we are also getting the same effect
> when we convert our text to Arial.
>
> 1. Has anybody experienced this? Is this a pdftotext thing?
> 2. Are there alternative pdf-to-text parsers that anyone would
>    recommend?
>
> Lucien
> Oxford Information Labs
>
Hi,

It is a known issue in Scribus, because Scribus uses absolute positioning of all glyphs to ensure that the on-screen presentation is *really* what gets printed. It is not ideal and there is a bug for it, but it is a low-priority item at the moment.

HTH,
Peter
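To illustrate why absolutely positioned glyphs can come out fragmented, here is a minimal sketch of a gap-based word grouper. It is only an assumption about the general kind of heuristic a text extractor might apply, not pdftotext's actual algorithm; the function name `group_glyphs`, the `gap_threshold` value, and the glyph coordinates are all hypothetical.

```python
def group_glyphs(glyphs, gap_threshold=1.0):
    """Group glyphs on one text line into chunks.

    Each glyph is a (char, x_start, x_end) tuple in left-to-right order.
    A new chunk starts whenever the horizontal gap between the previous
    glyph's end and this glyph's start exceeds gap_threshold
    (hypothetical units: points).
    """
    chunks = []
    current = ""
    prev_end = None
    for char, x_start, x_end in glyphs:
        if prev_end is not None and x_start - prev_end > gap_threshold:
            chunks.append(current)
            current = ""
        current += char
        prev_end = x_end
    if current:
        chunks.append(current)
    return chunks


# Glyphs placed edge to edge: no gap exceeds the threshold.
tight = [("W", 0.0, 9.0), ("o", 9.0, 14.0), ("r", 14.0, 18.0), ("d", 18.0, 23.0)]

# Made-up individually positioned glyphs: two gaps of 1.5 exceed the threshold.
loose = [("W", 0.0, 9.0), ("o", 10.5, 15.5), ("r", 15.5, 19.5), ("d", 21.0, 26.0)]

print(group_glyphs(tight))  # ['Word']
print(group_glyphs(loose))  # ['W', 'or', 'd']
```

With a heuristic of this kind, any per-glyph offset larger than the threshold splits a word, while the reading order is preserved, which would match the "fragmented but sequential" output described above.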
