Just chiming in here: is this something that could be fixed for notes export, too?
Using Skim 1.6.9, if I create a note for a passage in the PDF that includes a hyphen at the line break, it still includes a soft hyphen (U+00AD) and a space, and I have to trim each of these by hand. FWIW, the PDF was OCR'd from scanned pages using the latest version of Adobe Acrobat.

I checked some other PDF readers and found that Preview does the same thing (because it also uses PDFKit?), while Adobe Acrobat Pro, Foxit Reader, and PDF Expert all trim out the soft hyphen, as well as the space.

It seems that if there are any soft hyphens (U+00AD) followed by a space in a string copied from a PDF, these two characters can safely be trimmed out. Regular hyphens in the PDFs I've checked are represented by U+002D, so there should be no danger of losing them if Skim were to perform this operation on strings.

I did a web search and found a fair amount of discussion on the interwebs about OCR'd text and soft hyphens, with many people asking how they can fix this problem with various apps.

Any thoughts about this?

Thanks again,
M.

On Sat, Mar 19, 2022 at 7:53 AM Christiaan Hofman <cmhof...@gmail.com> wrote:

> On 18 Mar 2022, at 23:09, Christiaan Hofman <cmhof...@gmail.com> wrote:
>
> On 18 Mar 2022, at 22:45, Jan David Hauck via Skim-app-users <skim-app-users@lists.sourceforge.net> wrote:
>
> Hi all,
> Is there a way to do a search in a PDF with an AND operator?
> In the search field, when checking “whole words only” it returns all pages with any of the words in the search field.
> With “whole words only” unchecked, it tries to find the exact phrase in the search field.
> I’m trying to find a way to search for pages that contain word A *and* word B (not either word A or word B).
> Help much appreciated.
> Jan
>
> No, that is not supported. You should realize that it searches for strings, not for pages.
>
> Christiaan
>
> Looking at our code, I realized that it still attempts to combine hyphenated words. It just fails, because PDFKit seems to insert spaces between the lines, rather than newlines, so we did not see the hyphens at the end of the lines. I have replaced this by looking for lines in the laid-out text, rather than just the strings, with hyphens at the end, and that seems to be working well.
>
> Christiaan
>
> _______________________________________________
> Skim-app-users mailing list
> Skim-app-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/skim-app-users
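
For illustration, the trimming suggested in this message (drop a soft hyphen together with the space that follows it, leave regular hyphens alone) could be sketched roughly as below. This is Python rather than anything from Skim's code base, the function name `join_soft_hyphenated` is made up for the example, and treating a newline after the soft hyphen the same way as a space is an added assumption.

```python
import re

# U+00AD (soft hyphen) followed by one space or newline.
# Assumption: a newline after the soft hyphen is treated like the space
# described in the message; only the space case was actually observed.
_SOFT_HYPHEN_BREAK = re.compile(r"\u00ad[ \n]")


def join_soft_hyphenated(text: str) -> str:
    """Remove each soft hyphen plus the single space/newline after it.

    Regular hyphens (U+002D) and soft hyphens that are not followed by a
    space or newline are left untouched.
    """
    return _SOFT_HYPHEN_BREAK.sub("", text)


if __name__ == "__main__":
    copied = "a passage with a hyphen\u00ad ated word and a well-known term"
    print(join_soft_hyphenated(copied))
```

Removing only the two-character sequence keeps the operation conservative: a real hyphen such as the one in "well-known" uses U+002D and never matches the pattern.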