Sorry, that comment was added to the wrong thread. It was about the text of (highlight) notes.
Christiaan

> On 19 Mar 2022, at 03:29, Mark Roberts <mroberts1...@gmail.com> wrote:
>
> Just chiming in here: is this something that could be fixed for notes export,
> too?
>
> Using Skim 1.6.9, if I create a note for a passage in the PDF that includes a
> hyphen at the line break, it still includes a soft hyphen (U+00AD) and a
> space, and I have to trim each of these by hand. FWIW, the PDF was OCR'd from
> scanned pages using the latest version of Adobe Acrobat.
>
> I checked some other PDF readers and found that Preview does that same thing
> (because it also uses PDFKit?), while Adobe Acrobat Pro, FoxIt Reader, and
> PDF Expert all trim out the soft hyphen, as well as the space.
>
> It seems that if there are any soft hyphens (U+00AD) followed by a space in a
> string copied from a PDF, these two characters can safely be trimmed out.
> Regular hyphens in the PDFs I've checked are represented by U+002D, so there
> should be no danger of losing them if Skim were to perform this operation on
> strings.
>
> I did a web search and found a fair amount of discussion on the interwebs
> about OCR'd text and soft hyphens, with many people asking how they can fix
> this problem with various apps.
>
> Any thoughts about this?
>
> Thanks again,
>
> M.
>
> On Sat, Mar 19, 2022 at 7:53 AM Christiaan Hofman <cmhof...@gmail.com> wrote:
>
>> On 18 Mar 2022, at 23:09, Christiaan Hofman <cmhof...@gmail.com> wrote:
>>
>>> On 18 Mar 2022, at 22:45, Jan David Hauck via Skim-app-users <skim-app-users@lists.sourceforge.net> wrote:
>>>
>>> Hi all,
>>> Is there a way to do a search in a PDF with an AND operator?
>>> In the search field, when checking “whole words only” it returns all pages
>>> with any of the words in the search field.
>>> With “whole words only” unchecked, it tries to find the exact phrase in the
>>> search field.
>>> I’m trying to find a way to search for pages that contain word A and word B
>>> (not either word A or word B).
>>> Help much appreciated.
>>> Jan
>>
>> No, that is not supported. You should realize that it searches for strings,
>> not for pages.
>>
>> Christiaan
>
> Looking at our code, I realized that it still attempts to combine hyphenated
> words. It just fails, because PDFKit seems to insert spaces between the
> lines, rather than newlines, so we did not see the hyphens at the end of the
> lines. I have replaced this by looking for lines in the laid-out text,
> rather than just the strings, with hyphens at the end, and that seems to be
> working well.
>
> Christiaan
>
> _______________________________________________
> Skim-app-users mailing list
> Skim-app-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/skim-app-users

Christiaan
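Mark's proposed clean-up for copied or exported note text is simple enough to sketch. The Python function below only illustrates that proposal; it is not Skim's code, and the name `strip_soft_hyphen_breaks` is hypothetical. It removes every soft hyphen (U+00AD) that is immediately followed by a space, together with that space, and leaves regular hyphen-minus characters (U+002D) alone.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN


def strip_soft_hyphen_breaks(text: str) -> str:
    """Rejoin words split at a line break in text copied from a PDF.

    Hypothetical helper following the proposal in this thread: each soft
    hyphen immediately followed by a space is removed together with that
    space. Soft hyphens not followed by a space, and regular hyphens
    (U+002D), are left untouched.
    """
    return text.replace(SOFT_HYPHEN + " ", "")


if __name__ == "__main__":
    note = "a pas\u00ad sage with a well-known hy\u00ad phen"
    print(strip_soft_hyphen_breaks(note))  # prints: a passage with a well-known hyphen
```

Whether a soft hyphen followed by a newline (rather than a space) should be treated the same way depends on how the text was extracted; the sketch deliberately handles only the case described above.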
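As Christiaan says, Skim's search matches strings rather than pages, so there is no AND operator. For anyone who already has per-page text extracted by other means, the page-level logic Jan asks about might look like the Python sketch below. `pages_containing_all` is a hypothetical helper, and the case-insensitive, whole-word matching is an assumption loosely modelled on the “whole words only” option, not a description of Skim's behaviour.

```python
import re


def pages_containing_all(page_texts: list[str], words: list[str]) -> list[int]:
    """Return 0-based indices of pages whose text contains every word.

    Hypothetical helper: matching is case-insensitive and whole-word
    (regex word boundaries), which is an assumption for illustration.
    """
    patterns = [re.compile(r"\b" + re.escape(w) + r"\b", re.IGNORECASE) for w in words]
    return [
        index
        for index, text in enumerate(page_texts)
        if all(p.search(text) for p in patterns)  # AND: every word must match
    ]


if __name__ == "__main__":
    pages = ["alpha and beta", "alpha only", "BETA then Alpha", "alphabet beta"]
    print(pages_containing_all(pages, ["alpha", "beta"]))  # prints: [0, 2]
```

The last page is excluded because "alphabet" does not match "alpha" as a whole word.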
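Christiaan's fix works on the laid-out lines instead of the flattened string, in which PDFKit had put a space rather than a newline between lines. The Python sketch below illustrates that idea only and is not Skim's implementation. The function name `join_layout_lines`, its input (a list of already extracted line strings in reading order) and the lowercase-continuation rule are all assumptions. A line ending in a soft hyphen is always joined to the next line with the hyphen dropped; a line ending in a regular hyphen is merged that way only when the next line starts with a lowercase letter, and otherwise the hyphen is kept with no space added.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD, a discretionary hyphen


def join_layout_lines(lines: list[str]) -> str:
    """Join laid-out text lines, merging words hyphenated across a line break.

    Hypothetical helper; the input is assumed to be one string per laid-out
    line, in reading order.
    """
    result = ""
    for raw in lines:
        line = raw.strip()  # str.strip() does not remove U+00AD
        if not line:
            continue  # skip empty lines
        if not result:
            result = line
        elif result.endswith(SOFT_HYPHEN):
            # A soft hyphen only marks a possible break: drop it and join.
            result = result[:-1] + line
        elif result.endswith("-") and line[0].islower():
            # Heuristic: "exam-" + "ple" is one word broken by the layout.
            result = result[:-1] + line
        elif result.endswith("-"):
            # e.g. "Jean-" + "Paul": keep the hyphen, but add no space.
            result += line
        else:
            # Ordinary line break: separate with a single space.
            result += " " + line
    return result


if __name__ == "__main__":
    print(join_layout_lines(["This is an exam-", "ple of hy\u00ad", "phenation."]))
    # prints: This is an example of hyphenation.
```

A genuine compound broken at its own hyphen (say "well-" followed by "known") comes out as "wellknown" under this rule; telling such cases apart reliably needs more information than the text alone.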