Many thanks to Tim for suggesting isSubset, which I had somehow missed while browsing the documentation.
To answer Dave's questions: unfortunately the Set comparisons aren't an ideal candidate for asynchronous work. The comparisons take place as part of a CSS-like styling phase, whereby a number of Sets (the stylesheet definitions) are compared against many thousands of other Sets (the class lists for each styleable object). While I believe the Int comparisons would undoubtedly be faster, I'm likely to lose even more time calculating the hashValues for each Set. I believe the use of isSubset as well as some result caching (storing matches for subsequent queries including identical class lists) is likely to result in a decent speed improvement. Thank you both for taking the time to respond. And if anyone has any suggestions for alternative approaches I'm all ears :-) On Sun, Nov 13, 2016 at 11:49 PM, David Sweeris <daveswee...@mac.com> wrote: > > On Nov 13, 2016, at 1:58 PM, Nial Giacomelli via swift-users < > swift-users@swift.org> wrote: > > Using Swift 3 I have a function that's called extremely frequently and is > appearing regularly in Profiler runs. The function takes two Set<String> > instances and simply attempts to determine whether all items from Set A are > present in Set B (it's important to note that Set B may well contain > additional items). > > I've attempted to approach the problem in two ways: > > let diff = a.subtracting(b) > guard diff.count == 0 else { return false } > > And also by simply iterating over the contents of Set A, like so: > > for item in a { > if !b.contains(item) { > return false > } > } > > Both ultimately end up spending the majority of their time in String._ > compareDeterministicUnicodeCollaton(String) -> Int. Which makes sense, > given what I'm doing - but ideally I'd like to come up with a more > efficient way of performing the above check. Swift's String representation > is incredibly robust, but for my needs the strings could be adequately > represented in ASCII encoding. I've also considered storing and comparing > the hashValue of the strings, to speed up comparisons... > > Hopefully this is an acceptable question for this mailing list. I'm aware > that this may not be a Swift-specific question and could well be solved > with a more efficient data structure or approach, but I'd really appreciate > some input from more experienced Swift developers :-) > > > How often do the sets change? And where do you get them from? I’m > wondering, if there’s enough time between when you get the strings and when > you need to know if setA is a subset of setB, could you somehow compute the > answer asynchronously in a background thread or something? Strictly > speaking it wouldn't be any less work, but if you can move it to where > you’re waiting for user input or something you’ll probably get your answer > sooner. > > Regarding how to get the answer using less work in the first place, I > think you’re doing a tiny bit of extra work — just some allocation > overhead, really — by calculating the set difference and checking that its > count == 0, rather than just checking if a.isSubset(of: b). Beyond that, > I’m not really sure… You might be able to do something with the strings’ > hash values. I mean, I don’t know if they really “work like that” (I’m > quite foggy on how hash values are calculated and exactly what they’re > suitable for… this might be a horrible idea), but if they do, comparing > hash values (Ints) will definitely be faster than comparing Strings. I did > some quick’n’dirty testing on my machine with some code very much like: > > let setA: Set<String> = ["some", "set", "of", "random", "strings"] > let setB: Set<String> = ["some", "other", "set", "of", "strings"] > let hashedA = Set(setA.map{$0.hashValue}) > let hashedB = Set(setB.map{$0.hashValue}) > setA.isSubset(of: setB) > hashedA.isSubset(of: hashedB) > > > The actual execution of hashedA.isSubset(of: hashedB) seems to be very > roughly 2x faster than setA.isSubset(of: setB). However, if you include > the overhead of calculating them, using hash values drops to about 5x > *slower*. So unless you’d be reusing the hashed values a lot or > something, I think you’re already doing about the least amount of work I > can think of. > > - Dave Sweeris >
_______________________________________________ swift-users mailing list swift-users@swift.org https://lists.swift.org/mailman/listinfo/swift-users