Hey thanks guys - so looking at readStreamEncoded: - how do I know what the valid encodings are? Skimming the docs Sven referenced, I can start to pick out some - but is there a list? The method parameter is named “anEncoding”, but that name is misleading, as the argument seems to be a String - or is it a Symbol? If I search for Encoder classes, I do find ZnCharacterEncoder - and it has class methods for latin1, utf8, ascii - so is this the definitive list? And should the encoding strings used in those methods be constants or something I can reference in my code?
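A small workspace sketch for the question above, assuming a recent Pharo image with the Zinc character encoding package loaded (the selectors below are the ones I believe Zinc provides; browse ZnCharacterEncoder class in your own image to confirm them). It asks the image for the identifiers it knows about and shows that an encoder can be looked up from a String or a Symbol:

```smalltalk
"Ask Zinc which encoding identifiers it knows in this image."
ZnCharacterEncoder knownEncodingIdentifiers.

"Look an encoder up by name; a String or a Symbol should both work,
 since a Symbol is a kind of String."
ZnCharacterEncoder newForEncoding: 'latin1'.
#latin1 asZnCharacterEncoder.

"The convenience class methods mentioned above answer the same kind of object."
ZnCharacterEncoder latin1.
```

Inspecting the first expression answers the full list, so there is no need to guess identifiers by trial and error.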
Gosh - this raises a whole host of things I just naively assumed happened for me.

So it looks like the file giving me issues has characters like £ or ¬ in it. So I'm wondering how I know what the proper encoding would be (I think these files were written out by some PHP app) - is it just a trial and error thing?

I tried changing my code to:

details parseStream: (firmEfs readStreamEncoded: 'iso-8859-1').

- and other variants like 'ASCII' and 'latin1' - and this then gives me another error: "ZnCharacterEncodingError: Character Unicode code point outside encoder range"

So it does sound like I have a file that isn't conforming to known standards - and I guess I have to use the #beLenient option.

Sven - in the examples for using #beLenient, you seem to show something that assumes you will iterate inside a Do: block. As my existing code takes a stream that it wants to do a #nextLine on, would it be bad to do something like this:

efsStream := (firmEfs readStreamEncoded: 'latin1').
efsStream encoder beLenient.
details parseStream: efsStream.

That is - get the encoder from my stream and make it lenient?

Appreciate the pointers on this guys - I'm definitely learning something new here.
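For comparison, here is a sketch of the same idea built from the constructor Sven used in his examples, so the encoder is already lenient before the stream is handed to the parser. It reuses firmEfs and details from the code above and assumes Latin-1 is the right guess for these files, which has not been confirmed in this thread:

```smalltalk
"Open the file as bytes, decode with a lenient Latin-1 encoder,
 and pass the resulting character stream to the existing parser.
 The block form closes the file when the parsing is done."
firmEfs binaryReadStreamDo: [ :in |
	| efsStream |
	efsStream := ZnCharacterReadStream
		on: in
		encoding: ZnCharacterEncoder latin1 beLenient.
	details parseStream: efsStream ]
```

A lenient encoder lets bytes that are illegal in the chosen encoding through instead of raising an error, so any such bytes will turn up as odd characters in the parsed strings.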
Tim

> On 20 Jul 2021, at 12:11, Guillermo Polito <guillermopol...@gmail.com> wrote:
>
>> On 20 Jul 2021, at 11:45, Sven Van Caekenberghe <s...@stfx.eu> wrote:
>>
>>> On 20 Jul 2021, at 11:03, Sven Van Caekenberghe <s...@stfx.eu> wrote:
>>>
>>> Hi Tim,
>>>
>>> An introduction to this part of the system is in
>>> https://ci.inria.fr/pharo-contribution/job/EnterprisePharoBook/lastSuccessfulBuild/artifact/book-result/Zinc-Encoding-Meta/Zinc-Encoding-Meta.html
>>> [Character Encoding and Resource Meta Description] from the "Enterprise
>>> Pharo" book.
>>>
>>> The error means that a file that you try to read as UTF-8 does contain
>>> things that are invalid with respect to the UTF-8 standard.
>>>
>>> Are you sure the file is in UTF-8, maybe it is in ASCII, Latin-1 or
>>> something else ?
>>>
>>> It is possible to customise the encoding to something different than the
>>> default UTF-8. For non-UTF encoders, there is a strict/lenient option to
>>> disallow/allow illegal stuff (but then you will get these in your strings).
>>>
>>> I can show you how to do that if you want.
>>
>> '/var/log/system.log' asFileReference readStreamDo: [ :in | in upToEnd ].
>>
>> '/var/log/system.log' asFileReference binaryReadStreamDo: [ :in |
>>     (ZnCharacterReadStream on: in encoding: #ascii) upToEnd ].
>>
>> '/var/log/system.log' asFileReference binaryReadStreamDo: [ :in |
>>     (ZnCharacterReadStream on: in encoding: ZnCharacterEncoder ascii beLenient) upToEnd ].
>
> There is also readStreamEncoded:[do:], which is a bit more concise but does
> the same :)
>
>>
>> HTH
>>
>>> Sven
>>>
>>>> On 20 Jul 2021, at 10:31, Tim Mackinnon <tim@testit.works> wrote:
>>>>
>>>> Hi - I'm doing a bit of log file processing with Pharo - and I've hit an
>>>> unexpected error and am wondering what the best way to approach it is.
>>>>
>>>> It seems that I have a log file that has unexpected characters, and so my
>>>> readStream loop that reads lines gets an error: "ZnInvalidUTF8: Illegal
>>>> continuation byte for utf-8 encoding".
>>>>
>>>> For some reason this file (unlike my others) seems to contain characters
>>>> that it shouldn't - but what is the best way for me to continue
>>>> processing? Should I be opening my files in a different way - or can I
>>>> resume the error somehow - I'm not familiar with this area of Pharo and am
>>>> after a bit of advice.
>>>>
>>>> My code is like this (and I get the error when doing nextLine)
>>>>
>>>> parseStream: aFileStream with: aBlock
>>>>     | line items |
>>>>     [ (line := aFileStream nextLine) isNil ]
>>>>         whileFalse: [
>>>>             items := $/ split: line.
>>>>             items size = 3 ifTrue: [aBlock value: items]]
>>>>
>>>> My stream is created like this:
>>>>
>>>> firmEfs := (pathName , '/' , firmName , '_files') asFileReference.
>>>> details parseStream: firmEfs readStream.
>>>>
>>>> Should I be opening the stream a bit differently - or can I catch that
>>>> encoding error and resume it with some safe character?
>>>>
>>>> Thanks for any help.
>>>>
>>>> Tim