Re: [swift-evolution] Strings in Swift 4

Ben Cohen via swift-evolution Fri, 10 Feb 2017 11:48:07 -0800

Hi Ted,

Here’s a sketch of one way to handle this kind of processing without requiring 
integer indexing. Hopefully not too buggy though I haven’t tested it 
extensively :).


Here I’m stashing the parsed values in a dictionary, but you could also write 
code to insert them into a proper data structure where the dictionary set is 
happening (or maybe stick with the dictionary build, and then use that 
dictionary to populate your data structure, along with some more data 
validation and error handling).

import Foundation
extension String: Collection { }

let fieldLengths: DictionaryLiteral = [
    "CompanyName":30,
    "PresidentLastName":15,
    "PresidentFirstName":8,
    "VPMarketingLastName":15,
    "VPMarketingFirstName":8,
    "AlternateContactTitle":10,
    "AlternateContactLastName":15,
    "AlternateContactFirstName":8,
    "Address":15,
    "City":15,
    "State":2,
    "Zip":5,
]

var data = "Premier Properties            Murray         Mitch   Ricky          "
    + "Roma    Office MgrWilliamson     John    350 Fifth Av   New York       NY10118"
var keyedRecord: [String:String] = [:]

for (key,length) in fieldLengths {
    let field = data.prefix(length)

    guard field.count == length
    else { fatalError("Input too short while reading \(key)") }
    // or however you want to handle it
    
    keyedRecord[key] = field.trimmingCharacters(in: CharacterSet.whitespaces)
    
    data = data.dropFirst(length)
}
guard data.isEmpty
else { fatalError("Input too long") }

print(keyedRecord)
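
As a rough illustration of the "populate your data structure" step mentioned earlier, here is a minimal sketch, assuming a current Swift 5 toolchain. MailingRecord, RecordError and makeRecord(from:) are hypothetical names invented for this example, and only a few of the fields are shown:

```swift
// Hypothetical record type; only a handful of the fields are modelled here.
struct MailingRecord {
    var companyName: String
    var presidentLastName: String
    var city: String
    var zip: String
}

// Hypothetical error type for the validation step.
enum RecordError: Error, Equatable {
    case missingField(String)
    case invalidZip(String)
}

// Build a record from the keyed dictionary, validating as we go.
func makeRecord(from keyed: [String: String]) throws -> MailingRecord {
    // Look up a required key, throwing rather than trapping if it is absent.
    func required(_ key: String) throws -> String {
        guard let value = keyed[key] else { throw RecordError.missingField(key) }
        return value
    }
    let zip = try required("Zip")
    // Example validation: a ZIP must be exactly five ASCII digits.
    guard zip.count == 5, zip.allSatisfy({ $0.isASCII && $0.isNumber }) else {
        throw RecordError.invalidZip(zip)
    }
    return try MailingRecord(
        companyName: required("CompanyName"),
        presidentLastName: required("PresidentLastName"),
        city: required("City"),
        zip: zip
    )
}

// Sample input standing in for the dictionary built by the parsing loop.
let sample = [
    "CompanyName": "Premier Properties",
    "PresidentLastName": "Murray",
    "City": "New York",
    "Zip": "10118",
]
let record = try makeRecord(from: sample)
print(record.companyName, record.zip)
```

Throwing an error instead of calling fatalError lets the caller decide whether a bad record should be skipped, logged or abort the whole import.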

I think it’s worth noting how seductive it is, with the integer indexing, to 
perform unchecked indexing into the data: recordStr[ 0..<30] is great until you 
have to process a corrupt record. Working in terms of higher-level APIs 
encourages handling of the failure cases. As an added bonus, when you upgrade 
your system and now the incoming data turns out to be UTF-8, your system doesn’t 
crash when a bored intern inserts some emoji into the president’s name.
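
To make the emoji point concrete, here is a minimal sketch, assuming Swift 5 (where String is a Collection of Character and prefix/dropFirst return Substring). takeField is a hypothetical helper name, not an existing API:

```swift
// Take exactly `length` Characters from the front of `input`.
// Returns nil, leaving `input` untouched, if the input is too short.
func takeField(_ input: inout Substring, length: Int) -> Substring? {
    let field = input.prefix(length)
    guard field.count == length else { return nil }
    input = input.dropFirst(length)
    return field
}

// A 15-character last name field containing an emoji, then an 8-character
// first name field.
var record: Substring = "Mur😀ay         Mitch   "
let last = takeField(&record, length: 15)   // the emoji counts as one Character
let first = takeField(&record, length: 8)
let extra = takeField(&record, length: 1)   // input is exhausted by now
```

Because the lengths are counted in Characters rather than bytes or code units, the emoji does not shift the field boundaries, and running out of input shows up as nil rather than as an out-of-bounds trap.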

There is still definitely room to make this easier/more discoverable for users:

- The “patterns” concept that is briefly touched on in the string manifesto 
would hopefully provide another way of expressing this, with patterns 
matching fixed numbers of characters.
- The need to walk over the field multiple times (first prefix, then count, 
then dropFirst) should be better-handled by some other scanning APIs mentioned 
in the manifesto e.g. if let field = data.dropPrefix(lengthPattern). Note that 
if the underlying String held only ASCII/Latin1, these should still be 
constant-time operations under the hood. 
- Another approach is to provide generic operations on Collection that chunk 
collections into subsequences of given lengths and serve them up, possibly via 
a lazy view. This would have the advantage of not requiring mutable state in 
the loop.

But the above is what we can achieve with the tools we have today.
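
For illustration only, the chunking idea from the last bullet might look roughly like the sketch below, assuming Swift 5. chunks(ofLengths:) is a hypothetical name, not a standard library API; this version is eager rather than a lazy view, and it assumes the lengths are non-negative:

```swift
extension Collection {
    // Split the collection into consecutive subsequences of the given lengths.
    // Returns nil if the collection is shorter or longer than the lengths
    // require. Hypothetical helper, not part of the standard library.
    func chunks(ofLengths lengths: [Int]) -> [SubSequence]? {
        var result: [SubSequence] = []
        var start = startIndex
        for length in lengths {
            // index(_:offsetBy:limitedBy:) returns nil instead of trapping
            // when the offset would run past endIndex.
            guard let end = index(start, offsetBy: length, limitedBy: endIndex)
            else { return nil }
            result.append(self[start..<end])
            start = end
        }
        // Leftover elements mean the input was too long.
        return start == endIndex ? result : nil
    }
}

let line = "Premier   NY10118"
if let fields = line.chunks(ofLengths: [10, 2, 5]) {
    print(fields.map { String($0) })
}
```

Each field is walked only once here, and the caller needs no mutable state beyond what the helper keeps internally.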

p.s. as someone who has worked in a bank with thousands of ancient file 
formats, no argument from me that COBOL rules :)

> On Feb 10, 2017, at 9:20 AM, Ted F.A. van Gaalen via swift-evolution wrote:
> 
> Please see in-line response below
>> On 10 Feb 2017, at 03:56, Shawn Erickson wrote:
>> 
>> 
>> On Thu, Feb 9, 2017 at 5:09 PM Ted F.A. van Gaalen wrote:
>>> On 10 Feb 2017, at 00:11, Dave Abrahams wrote:
>>> 
>>> 
>>> on Thu Feb 09 2017, "Ted F.A. van Gaalen" <tedvgiosdev-AT-gmail.com> wrote:
>>> 
>>>> Hello Shawn
>>>> Just google with any programming language name and “string manipulation”
>>>> and you have enough reading for a week or so :o)
>>>> TedvG
>>> 
>>> That truly doesn't answer the question.  It's not, “why do people index
>>> strings with integers when that's the only tool they are given for
>>> decomposing strings?”  It's, “what do you have to do with strings that's
>>> hard in Swift *because* you can't index them with integers?”
>> 
>> Hi Dave,
>> Ok. here are just a few examples: 
>> Parsing and validating an ISBN code? or a (freight) container ID? or EAN13 
>> perhaps? 
>> or many of the typical combined article codes and product IDs that many 
>> factories and shops use? 
>> 
>> or: 
>> 
>> E.g. processing legacy files from IBM mainframes:
>> extract fields from ancient data records read from very old sequential files,
>> say, a product data record like this from a file from 1978 you’d have to 
>> unpack and process:   
>> 123534-09EVXD4568,991234,89ABCYELLOW12AGRAINESYTEMZ3453
>> into:
>> 123, 534, -09, EVXD45, 68,99, 1234,99, ABC, YELLOW, 12A, GRAIN, ESYSTEM, 
>> Z3453.
>> product category, pcs, discount code, product code, price Yen, price $, 
>> class code, etc… 
>> In Cobol and PL/1, records are nearly always defined with a fixed field 
>> layout like this
>> (storage was limited and very, very expensive; e.g. XML would be regarded as 
>> a "scandalous waste", and so would even the commas in CSV files!):
>> 
>> 01  MAILING-RECORD.
>>        05  COMPANY-NAME            PIC X(30).
>>        05  CONTACTS.
>>            10  PRESIDENT.
>>                15  LAST-NAME       PIC X(15).
>>                15  FIRST-NAME      PIC X(8).
>>            10  VP-MARKETING.
>>                15  LAST-NAME       PIC X(15).
>>                15  FIRST-NAME      PIC X(8).
>>            10  ALTERNATE-CONTACT.
>>                15  TITLE           PIC X(10).
>>                15  LAST-NAME       PIC X(15).
>>                15  FIRST-NAME      PIC X(8).
>>        05  ADDRESS                 PIC X(15).
>>        05  CITY                    PIC X(15).
>>        05  STATE                   PIC XX.
>>        05  ZIP                     PIC 9(5).
>> 
>> These are all character data fields here, except for the numeric ZIP field; 
>> however, in Cobol it can be treated like character data. 
>> So here I am, having to get the data of these old Cobol production files
>> into a brand new Swift based accounting system of 2017, what can I do?   
>> 
>> How do I unpack these records and bring the data into a Swift structure or 
>> class? 
>> (In Cobol I don’t have to because of the predefined fixed format record 
>> layout).
>> 
>> AFAIK there are no similar record structures with fixed fields like this 
>> available in Swift?
>> 
>> So, the only way I can think of right now is to do it like this:
>> 
>> // MailingRecord is a Swift structure
>> struct MailingRecord
>> {
>>     var companyName: String = "no Name"
>>     var contacts: CompanyContacts
>>     // etc.
>> }
>> 
>> // recordStr was read here with ASCII encoding
>> 
>> // unpack data in to structure’s properties, in this case all are Strings
>> mailingRecord.companyName                       = recordStr[ 0..<30]
>> mailingRecord.contacts.president.lastName  = recordStr[30..<45]
>> mailingRecord.contacts.president.firstName = recordStr[45..<53]
>> 
>> 
>> // and so on..
>> 
>> Ever worked for e.g. a bank with thousands of these file formats, unchanged 
>> for years?
>> 
>> Are there any alternative, more convenient and simpler methods in Swift? 
>> These look like examples of fixed data formats
> Hi Shawn,
> No, it could also be a UTF-8 String.
>   
>> that could be parsed from a byte buffer into strings, etc. 
> How would you do that? could you please provide an example how to do this, 
> with a byte buffer? 
> eg. read from flat ascii file —> unpack fields —> store in structure props? 
> 
> 
>> Likely little need to force them via a higher order string concept,
> What do you mean here with “high order string concept” ??
> Swift is a high level language, I expect to do this with Strings directly,
> instead of being forced to use low-level coding with byte arrays etc.
> (I have/want no time for that)
> Surely, one doesn’t have to resort to that in a high level language like 
> Swift? 
> If I am certain that all characters in a file etc. are of fixed width, even 
> in UTF-32
> (in the above example I am 100% sure of that), then 
> using str[n1..<n2] is in that case legitimate, because there are no
> grapheme characters involved.
> Therefore IMHO direct String subscripting should be available in Swift 
> for all Unicode types, and the responsibility whether or not to use
> this feature lies with the programmer, not the language designer.
> 
>> at least not until unpacked from its compact byte form.
> I am sorry, but to me, it all sounds a bit like:
> “why solve the problem with a simple solution, when one can make it much
> more complicated?” Be more pragmatic.
> 
> 
> TedvG, 
>> 
>> -Shawn 
> 
> _______________________________________________
> swift-evolution mailing list
> [email protected] <mailto:[email protected]>
> https://lists.swift.org/mailman/listinfo/swift-evolution