Hi Guille, Esteban and Henry,

Thanks for your replies.

On Tue, Sep 18, 2018 at 10:09:02AM +0200, Guillermo Polito wrote:
> On Mon, Sep 17, 2018 at 6:52 PM Alistair Grant wrote:
>
> > Hi Esteban, Guille and Everyone,
> >
> > I haven't looked at using FFI much, however it is easy to imagine that
> > different file encoding rules on different platforms will make writing
> > FFI calls more difficult,
>
> Well not really (from my point of view :))
> From the point of view of the FFI call an encoded string is just a bunch of
> bytes. FFI does not do any interpretation of them.

Right, but getting the appropriately encoded bunch of bytes is the issue. :-)

> > i.e. some of the different formats are:
> >
> > - OSX uses Mac specific decomposed UTF8 encoding
> > - Windows uses Wide Strings (16 bit Unicode characters)
> > - Linux allows pretty much anything, but precomposed UTF8 is common
>
> At the image side, we could have an strategy that, depending on the OS, could
> encode in one encoding or another, or even not encode at all.
>
> > Believe it or not, I'm still working on getting the
> > FileAttributesPlugin working (file name encoding on Windows being the
> > latest issue - the tests in Pharo need to be extended).
>
> I believe you, don't worry ^^.
>
> > Would it be useful for future FFI work to have primitives available
> > which convert file names to and from the various platform specific
> > formats? (Linux is basically a no-op, and Windows could be written
> > in-image, but OSX requires the platform routines to be called).
>
> Maybe... Are the OSX routines exposed as C functions (that we can call through
> FFI) or they are objective-C methods/functions (that are more complicated to
> map)?

The OSX routines are exposed as C functions (and available as
Objective-C methods), see convertChars() in
platforms/unix/vm/sqUnixCharConv.c.

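To make the three formats listed above concrete, here is a minimal
sketch. It is in Python rather than Smalltalk purely for illustration,
and platform_file_name_bytes is a hypothetical helper, not anything in
the VM or the image. It also assumes that standard NFD is close enough
to the Mac specific decomposition, which is only an approximation; the
real conversion is what the platform routines do.

```python
import unicodedata


def platform_file_name_bytes(name: str, platform: str) -> bytes:
    """Hypothetical helper: encode a file name per the conventions listed above."""
    if platform == "osx":
        # Assumption: standard NFD stands in for the Mac specific
        # decomposition, which differs from it in some details.
        return unicodedata.normalize("NFD", name).encode("utf-8")
    if platform == "windows":
        # Wide strings: 16 bit code units, little-endian, precomposed.
        return unicodedata.normalize("NFC", name).encode("utf-16-le")
    if platform == "linux":
        # Linux allows pretty much anything; precomposed UTF-8 is the
        # common case, so that is what this sketch assumes.
        return unicodedata.normalize("NFC", name).encode("utf-8")
    raise ValueError(f"unknown platform: {platform!r}")


if __name__ == "__main__":
    # The same four-character name yields three different byte strings.
    for platform in ("osx", "windows", "linux"):
        print(platform, platform_file_name_bytes("caf\u00e9", platform))
```

The point is only that one and the same String gives three different
byte sequences, so the caller (or the marshaller) has to know which one
the API on the other side expects.
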
On Tue, Sep 18, 2018 at 11:21:41AM +0200, Esteban Lorenzano wrote:
> > self
> >   ffiCall: #(bool saveContentsToFile(String fileName, String contents))
> >   options: #(+stringEncodings( fileName return , platformAPI contents)
>
> This is cool.
> What I do not like is to rely on primitives to do that encoding.
> This should be in image, using FFI if needed (this is all because we
> want to rely less and less on plugins :P)

I realise of course that this could all be done in FFI, and I agree
with all of Esteban's arguments in favour of FFI. My main motivation was
that the code is already in the VM, and reusing it avoids code
duplication, with the obvious benefit that a bug fixed there is fixed
everywhere.

On Tue, Sep 18, 2018 at 11:23:56AM +0200, Esteban Lorenzano wrote:
> On 18 Sep 2018, at 11:04, Guillermo Polito wrote:
>
> > On Tue, Sep 18, 2018 at 10:43 AM Henrik Sperre Johansen wrote:
> >
> > > It *would* be pretty handy for adding some auto-conversion into the
> > > marshaller based on parameter encoding options though... (other than
> > > filename, could be done in smalltalk using exisiting encoders)
> > >
> > > self
> > >   ffiCall: #(bool saveContentsToFile(String fileName, String contents))
> > >   options: #(+stringEncodings( fileName return , platformAPI contents)
> >
> > Well, I like this idea.
> >
> > > (And yes, I've probably badly mangled the options syntax)
> > >
> > > Is much less verbose than having to manually convert Strings to the proper
> > > platform Unicode encodings before calling.
> > > Depends a bit on whether the primitive argument is
> > > Byte/Widestrings(latin1/utf32), or if it accepts only utf8 bytes and one
> > > has to convert first anyways.
> > >
> > > It's not like this isn't a pain point, there are plenty of currently used
> > > API's that are broken if you try to use non-ascii.
> >
> > Yes, but I think this may be because in general people tend to not know how
> > encodings work...
> > (even myself I don't feel I know enough :))
> > But this makes me think that we should make encoding explicit?
>
> Yup, explicit please. Nothing hide behind the carpet :)
>
> > Maybe we should force people to specify an encoding if they specify a
> > callout using a string.
> >
> > And then, either they specify it at the level of the callout, or at the
> > level of the library (like setting a default encoding for all strings).
>
> You can have some global FFI settings (I was thinking on adding some global
> options settings for FFI in general, btw) and even "library based settings",
> to simplify.
>
> Esteban
>
> > Because this raises also the question of what is the default encoding?
> > And I'd say that in there is no satisfactory default encoding...

I'll defer to Sven every time when it comes to character encoding, but
my understanding is that the only platform that has consistent encoding
rules is OSX, which uses the platform specific decomposed UTF8. On
Windows and Linux text is usually precomposed (16 bit wide strings on
Windows, commonly UTF8 on Linux), but other character encodings are
possible (particularly for older files). So we certainly shouldn't
hard-code the encoding.

I think UTF8 does make sense as the default encoding (this is what
FilePlugin currently uses).

Cheers,
Alistair
