
Re: [dspace-tech] Pdfbox Text Extract Issues

2016-06-06 Thread Terry Brady

Ivan,
Thanks for the note. As I have investigated this further, I have discovered that the issue lies in the way that I have scripted my call to filter-media and not in the text extraction code.
Terry
On Sat, Jun 4, 2016 at 3:41 PM, helix84 wrote:
> Hi Terry,
>
> could

Re: [dspace-tech] Pdfbox Text Extract Issues

2016-06-04 Thread helix84

Hi Terry,
could this be the culprit or the fix?
https://jira.duraspace.org/browse/DS-1187
Regards,
~~helix84
--
You received this message because you are subscribed to the Google Groups "DSpace Technical Support" group. To unsubscribe from this group and stop receiving emails from it, send

[dspace-tech] Pdfbox Text Extract Issues

2016-06-03 Thread Terry Brady

I attempted to re-extract text from some of our PDF files containing Arabic characters since upgrading to DSpace 5. Most of these characters were lost by the extraction process. The text from the same documents had been extracted while running DSpace 3 or DSpace 4 and the extract was reasonably