If you are interested: in Java, getting the character type (property) is even a bit slower, though of course this is machine dependent:
java.lang.Character.getType(codepoint) takes a constant 39 ns/op over 300,000,000 iterations on my machine (Oracle JDK 8 SE), regardless of whether the character is ASCII or fullwidth. Java looks up code points in large internal Unicode arrays, so I doubt there is much room for optimization, and even if there were, probably nobody cares about saving a few nanoseconds.

— Christian

> On 23.12.2015, at 03:04, Sam Whited wrote:
>
> On Mon, Dec 21, 2015 at 4:08 PM, Christian Schudt wrote:
>> If you mean having a huge code point table, like in your tables.go file: I
>> think Java already has such tables internally.
>> What could be improved here, is that Character.getType(cp) could only be
>> invoked once. I haven't done any benchmark for this, but I don't expect a
>> significant performance benefit.
>
> Out of curiosity, I answered my own question here. I'm using Go, which
> also has lots of Unicode tables in the standard library, so I
> benchmarked running the algorithm (I modified it slightly from the
> version in my generator to remove the NFKC step, which is very slow,
> this way it more closely resembles your algorithm), and looking up a
> value in the large pre-generated trie. I have no idea where the
> bottlenecks / optimizations in Java would be, so these results may be
> meaningless to you, but, at least in Go, the single Trie lookup was
> much faster:
>
> $ go test -bench . -benchmem
> PASS
> BenchmarkAsciiLookup-4           300000000     3.85 ns/op    0 B/op    0 allocs/op
> BenchmarkFullwidthLookup-4       200000000     9.21 ns/op    0 B/op    0 allocs/op
> BenchmarkAsciiCalculate-4        100000000     17.4 ns/op    0 B/op    0 allocs/op
> BenchmarkFullwidthCalculate-4     20000000     71.4 ns/op    0 B/op    0 allocs/op
> ok      _/home/sam/Projects/golang-x-text/unicode/precis    7.632s
>
> Each test here is looking up or calculating the derived properties for
> a single character (the ASCII tests are looking up 'u' and the Unicode
> tests are looking up 'ｕ' [full width] which was chosen very
> scientifically, I assure you), the second column is the number of
> tests that were run until the timings reached equilibrium.
>
> For the worst case, there's a pretty good speed difference, whether
> that difference is worth pre-generating the data is another matter, of
> course ☺
>
> Best,
> Sam
>
>
> --
> Sam Whited
> pub 4096R/54083AE104EA7AD3
> https://blog.samwhited.com

_______________________________________________
precis mailing list
https://www.ietf.org/mailman/listinfo/precis
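For readers comparing the two approaches in this thread, here is a minimal Java sketch, assuming only the General_Category-based PRECIS categories from RFC 7564 (LetterDigits, OtherLetterDigits, Spaces, Symbols, Punctuation) are needed. The hypothetical `classify` method calls `Character.getType` exactly once per code point, as suggested in the quoted message, and the hypothetical `lookup` method reads from a table precomputed from `classify`. The class, method, and constant names are illustrative only, and the flat one-byte-per-code-point array stands in for the more compact trie Sam describes. This is not the full PRECIS derived-property algorithm: Exceptions, BackwardCompatible, Unassigned, HasCompat and the other categories are left out.

```java
/**
 * Hypothetical sketch: buckets a code point into a few General_Category-based
 * PRECIS categories (names as in RFC 7564) with a single Character.getType
 * call, and precomputes the result for every code point. NOT the full PRECIS
 * derived-property algorithm.
 */
public final class PrecisCategorySketch {

    // Hypothetical category codes, for illustration only.
    static final byte OTHER = 0;
    static final byte LETTER_DIGITS = 1;       // Ll, Lu, Lo, Nd, Lm, Mn, Mc
    static final byte OTHER_LETTER_DIGITS = 2; // Lt, Nl, No, Me
    static final byte SPACES = 3;              // Zs
    static final byte SYMBOLS = 4;             // Sm, Sc, Sk, So
    static final byte PUNCTUATION = 5;         // Pc, Pd, Ps, Pe, Pi, Pf, Po

    /** "Calculate" path: Character.getType is invoked exactly once. */
    static byte classify(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.LOWERCASE_LETTER:
            case Character.UPPERCASE_LETTER:
            case Character.OTHER_LETTER:
            case Character.DECIMAL_DIGIT_NUMBER:
            case Character.MODIFIER_LETTER:
            case Character.NON_SPACING_MARK:
            case Character.COMBINING_SPACING_MARK:
                return LETTER_DIGITS;
            case Character.TITLECASE_LETTER:
            case Character.LETTER_NUMBER:
            case Character.OTHER_NUMBER:
            case Character.ENCLOSING_MARK:
                return OTHER_LETTER_DIGITS;
            case Character.SPACE_SEPARATOR:
                return SPACES;
            case Character.MATH_SYMBOL:
            case Character.CURRENCY_SYMBOL:
            case Character.MODIFIER_SYMBOL:
            case Character.OTHER_SYMBOL:
                return SYMBOLS;
            case Character.CONNECTOR_PUNCTUATION:
            case Character.DASH_PUNCTUATION:
            case Character.START_PUNCTUATION:
            case Character.END_PUNCTUATION:
            case Character.INITIAL_QUOTE_PUNCTUATION:
            case Character.FINAL_QUOTE_PUNCTUATION:
            case Character.OTHER_PUNCTUATION:
                return PUNCTUATION;
            default:
                return OTHER;
        }
    }

    // "Lookup" path: one byte per code point (0x110000 entries, about 1.1 MB),
    // filled once when the class is initialized.
    private static final byte[] TABLE = new byte[Character.MAX_CODE_POINT + 1];

    static {
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            TABLE[cp] = classify(cp);
        }
    }

    /** Returns the precomputed category with a single array read. */
    static byte lookup(int codePoint) {
        return TABLE[codePoint];
    }

    public static void main(String[] args) {
        // ASCII 'u', fullwidth 'u' (U+FF55), space, '$', '!', ROMAN NUMERAL TWELVE.
        int[] samples = {'u', 0xFF55, ' ', '$', '!', 0x216B};
        for (int cp : samples) {
            System.out.printf("U+%04X calculated=%d lookup=%d%n",
                    cp, classify(cp), lookup(cp));
        }
    }
}
```

Both paths return the same value for every code point by construction; the table trades roughly 1.1 MB of memory and a one-time fill for a single array read per lookup, which is the same trade-off as the pre-generated data in Sam's Go benchmark.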
