On Thu, Aug 25, 2011 at 1:17 PM, Lorna Priest <[email protected]> wrote:
> The recent discussion on PUA characters reminded me of a question I've had.
> I am wondering if anyone has a tool whereby we could search for all
> documents on a local computer (or server) that use PUA codepoints. I suppose
> what I'd like is to be able to identify beginning and ending codepoints to
> search for, such as "F130..F32F" or something along that line.

I have a utility called unidesc, part of my uniutils package
(http://billposer.org/Software/unidesc.html), that identifies the ranges to
which characters belong. You could run it on the various files and check the
output for "Private Use Area". By default it reports the range to which each
portion of the file belongs; to obtain a sorted list of the ranges found in a
file instead, use the -r option.

It runs on Linux and BSD systems, so it can probably be compiled for MacOS
too. I don't know about MS Windows.
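For the specific request of searching between a beginning and an ending
codepoint, here is a rough sketch in Python. It is not part of uniutils and
does not use unidesc. The function names (parse_range, codepoints_in_range,
scan_tree) are made up for illustration, and it assumes the files are plain
text in UTF-8; anything that cannot be read or decoded is silently skipped,
so word-processor formats would need to be converted to text first. The
default range is the Basic Multilingual Plane Private Use Area,
U+E000..U+F8FF.

```python
import os


def parse_range(spec):
    """Turn a string such as 'F130..F32F' into an inclusive (start, end) pair."""
    low, high = spec.split("..")
    return int(low, 16), int(high, 16)


def codepoints_in_range(text, start, end):
    """Return the sorted, de-duplicated code points of text within start..end."""
    return sorted({ord(ch) for ch in text if start <= ord(ch) <= end})


def scan_tree(root, start=0xE000, end=0xF8FF, encoding="utf-8"):
    """Yield (path, hits) for every file under root that has code points in range.

    Assumption: files are plain text in the given encoding. Files that
    cannot be opened or decoded are skipped rather than reported.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding=encoding) as handle:
                    hits = codepoints_in_range(handle.read(), start, end)
            except (OSError, UnicodeDecodeError):
                continue
            if hits:
                yield path, hits
```

To use it, call parse_range("F130..F32F") to get the two bounds, pass them to
scan_tree along with the top-level directory, and print each returned path
with its list of code points. Reading each file whole keeps the sketch short;
for very large files it would be better to read in chunks.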

