On 2020-Jan-23, Robert Haas wrote:

> No, that's not it. Suppose that Álvaro Herrera has some custom
> settings he likes to put on all the PostgreSQL clusters that he uses,
> so he creates a file álvaro.conf and uses an "include" directive in
> postgresql.conf to suck in those settings. If he also likes UTF-8,
> then the file name will be stored in the file system as a 12-byte
> value of which the first two bytes will be 0xc3 0xa1. In that case,
> everything will be fine, because JSON is supposed to always be UTF-8,
> and the file name is UTF-8, and it's all good. But suppose he instead
> likes LATIN-1.

I do have files with Latin-1-encoded names in my filesystem, even though
my system is UTF-8, so I understand the problem. I was wondering if it
would work to encode any name that is not valid UTF-8 using something
like base64; the encoded name will be plain ASCII and can be put in the
manifest, probably using a different field of the JSON object -- so for
a normal file you'd have { path => '1234/2345' } but for a
Latin-1-encoded file you'd have { path_base64 => '4Wx2YXJvLmNvbmY=' }.
Then it's the job of the tool to ensure it decodes the name to its
original form when creating/querying for the file.

A problem I have with this idea is that this is very corner-casey, so
most tool implementors will never realize that there's a need to decode
certain file names.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
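
To make the proposal concrete, here is a minimal Python sketch of the path / path_base64 idea. It is only an illustration under stated assumptions: the helper names manifest_entry and entry_to_name are hypothetical, and the real manifest format is not settled in this thread. It treats file names as raw bytes, stores them as "path" when they are valid UTF-8, and falls back to "path_base64" otherwise.

```python
import base64
import json


def manifest_entry(name: bytes) -> dict:
    """Build a manifest entry for a raw file name (hypothetical helper)."""
    try:
        # Valid UTF-8 names can go into the JSON document as-is.
        return {"path": name.decode("utf-8")}
    except UnicodeDecodeError:
        # Anything else is carried as plain-ASCII base64 in a separate field.
        # Encode exactly the name bytes; a stray trailing newline (for
        # example from `echo`) would change the encoded string.
        return {"path_base64": base64.b64encode(name).decode("ascii")}


def entry_to_name(entry: dict) -> bytes:
    """Recover the original file name bytes from a manifest entry."""
    if "path_base64" in entry:
        return base64.b64decode(entry["path_base64"])
    return entry["path"].encode("utf-8")


# The same name in the two encodings discussed above.
utf8_name = "\u00e1lvaro.conf".encode("utf-8")      # 12 bytes, starts 0xc3 0xa1
latin1_name = "\u00e1lvaro.conf".encode("latin-1")  # 11 bytes, starts 0xe1

for raw in (b"1234/2345", utf8_name, latin1_name):
    entry = manifest_entry(raw)
    print(json.dumps(entry))
    # The round trip must give back the exact bytes found in the file system.
    assert entry_to_name(entry) == raw
```

The round trip preserves the bytes in every case, including a Latin-1 name whose bytes happen to form valid UTF-8. The cost is the one already noted: a tool that only ever looks at "path" will silently miss the files listed under "path_base64".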