In case you are interested: in Java, getting the character type (property) is even a 
bit slower, though of course this is machine-dependent.

java.lang.Character.getType(codepoint) consistently takes 39 ns/op over 300000000 
iterations on my machine, regardless of whether it’s an ASCII or a fullwidth character 
(Oracle JDK 8 SE).
Java uses huge Unicode arrays internally to look up the code point. I doubt that 
there’s much room for optimization, and even if there is, probably nobody cares about 
saving a few nanoseconds.
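
For anyone who wants to reproduce a number like this, here is a minimal sketch of a naive timing loop. The class name GetTypeBench and the helper nsPerOp are made up for illustration, and U+FF55 is assumed to be the fullwidth 'u' in question. A plain loop like this is easily distorted by the JIT, so a harness such as JMH would give more trustworthy figures.

```java
// Hypothetical micro-benchmark sketch; names are illustrative only.
final class GetTypeBench {
    // Written once per run so the JIT cannot discard the loop result.
    static volatile long sink;

    // Returns the average wall-clock time per Character.getType call, in ns.
    static double nsPerOp(int codePoint, long iterations) {
        long acc = 0;
        long start = System.nanoTime();
        for (long i = 0; i < iterations; i++) {
            acc += Character.getType(codePoint);
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        long n = 10_000_000L;
        // Warm-up pass so the timed runs measure compiled code.
        nsPerOp('u', n);
        nsPerOp(0xFF55, n);
        System.out.printf("ascii:     %.2f ns/op%n", nsPerOp('u', n));
        System.out.printf("fullwidth: %.2f ns/op%n", nsPerOp(0xFF55, n));
    }
}
```

The absolute values will differ between machines and JDK versions; only the comparison between the two code points is of interest here.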

— Christian


> On 23.12.2015 at 03:04, Sam Whited <[email protected]> wrote:
> 
> On Mon, Dec 21, 2015 at 4:08 PM, Christian Schudt
> <[email protected]> wrote:
>> If you mean having a huge code point table, like in your tables.go file: I 
>> think Java already has such tables internally.
>> What could be improved here is that Character.getType(cp) could be 
>> invoked only once. I haven’t done any benchmark for this, but I don’t expect a 
>> significant performance benefit.
> 
> Out of curiosity, I answered my own question here. I'm using Go, which
> also has lots of Unicode tables in the standard library, so I
> benchmarked two things: running the algorithm, and looking up a value
> in the large pre-generated trie. (I modified the algorithm slightly
> from the version in my generator to remove the NFKC step, which is
> very slow; this way it more closely resembles your algorithm.) I have
> no idea where the bottlenecks / optimizations in Java would be, so
> these results may be meaningless to you, but, at least in Go, the
> single trie lookup was much faster:
> 
> $ go test -bench . -benchmem
> PASS
> BenchmarkAsciiLookup-4          300000000     3.85 ns/op    0 B/op    0 allocs/op
> BenchmarkFullwidthLookup-4      200000000     9.21 ns/op    0 B/op    0 allocs/op
> BenchmarkAsciiCalculate-4       100000000    17.4 ns/op     0 B/op    0 allocs/op
> BenchmarkFullwidthCalculate-4   20000000     71.4 ns/op     0 B/op    0 allocs/op
> ok      _/home/sam/Projects/golang-x-text/unicode/precis        7.632s
> 
> Each test here looks up or calculates the derived properties for
> a single character: the ASCII tests use 'u' and the fullwidth
> tests use 'u' [full width], which was chosen very
> scientifically, I assure you. The second column is the number of
> iterations that were run until the timings reached equilibrium.
> 
> For the worst case, there's a pretty good speed difference; whether
> that difference is worth pre-generating the data is another matter, of
> course ☺
> 
> Best,
> Sam
> 
> 
> -- 
> Sam Whited
> pub 4096R/54083AE104EA7AD3
> https://blog.samwhited.com

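The trade-off in the quoted message (calculate each time versus look up pre-generated data) can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the class TypeTable and its lookup method are hypothetical, the table is a flat array covering the Basic Multilingual Plane rather than a trie, and it stores the plain Character.getType value instead of a PRECIS derived property.

```java
// Hypothetical sketch of a pre-generated lookup table; not a trie.
final class TypeTable {
    // One byte per BMP code point (64 KiB), filled once at class load.
    private static final byte[] BMP = new byte[0x10000];

    static {
        for (int cp = 0; cp < BMP.length; cp++) {
            BMP[cp] = (byte) Character.getType(cp);
        }
    }

    // Table hit for BMP code points, normal calculation for everything else.
    static int lookup(int codePoint) {
        if (codePoint >= 0 && codePoint < BMP.length) {
            return BMP[codePoint];
        }
        return Character.getType(codePoint);
    }

    public static void main(String[] args) {
        // Both code points should report the same category as getType.
        System.out.println(lookup('u') == Character.getType('u'));
        System.out.println(lookup(0xFF55) == Character.getType(0xFF55));
    }
}
```

Whether the extra memory and the one-time fill cost pay off depends on how expensive the replaced calculation is; for a single getType call the gain is likely small.
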
_______________________________________________
precis mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/precis
