After more digging, I realised it's because Ur/Web is converting all non-ASCII
characters into Unicode escapes, whereas the test expects plain UTF-8.
I made a small PR that attempts to emit UTF-8 directly for all printable
characters, escaping only those that need it (like < and &):

https://github.com/urweb/urweb/pull/177
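
For anyone who'd rather not read the diff, here is a rough sketch of the idea.
It is not the code from the PR and not Ur/Web's actual runtime API: the
function name html_escape_utf8 and its signature are made up for illustration.
It assumes the input is already valid UTF-8 and, unlike the PR, it doesn't
check whether a character is printable. It escapes the HTML-significant ASCII
characters and copies every other byte through unchanged:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of Ur/Web): copy `in` to `out`, escaping
 * only the HTML-significant ASCII characters. All other bytes, including
 * the lead and continuation bytes of multi-byte UTF-8 sequences, are
 * copied unchanged. Assumes `in` is valid, NUL-terminated UTF-8.
 * Returns the number of bytes written (excluding the NUL terminator),
 * or (size_t)-1 if `out` is too small. */
size_t html_escape_utf8(const char *in, char *out, size_t out_size) {
    assert(in != NULL && out != NULL);
    if (out_size == 0) return (size_t)-1;

    size_t used = 0;
    for (const unsigned char *p = (const unsigned char *)in; *p; ++p) {
        const char *rep = NULL;
        switch (*p) {
            case '<': rep = "&lt;";   break;
            case '>': rep = "&gt;";   break;
            case '&': rep = "&amp;";  break;
            case '"': rep = "&quot;"; break;
            default:  break; /* everything else passes through as-is */
        }
        size_t n = rep ? strlen(rep) : 1;
        /* Always leave room for the trailing NUL. */
        if (used + n + 1 > out_size) return (size_t)-1;
        if (rep) memcpy(out + used, rep, n);
        else     out[used] = (char)*p;
        used += n;
    }
    out[used] = '\0';
    return used;
}
```

Since every byte of a multi-byte UTF-8 sequence has its high bit set, none of
them can be mistaken for <, >, & or ", so a byte-wise pass like this is safe
as long as the input really is valid UTF-8.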

It now passes the Fortunes test and the other tests on that site, but I know
very little about Unicode, Ur/Web and C, so your feedback on it would be
most welcome!

Oisín

On Sun, 11 Aug 2019, 17:52 Karn Kallio, <tierplusplusli...@skami.org> wrote:

>
> > Hello! I used "git bisect" to find the commit that introduced
> > behaviour that causes Ur/Web to fail the Fortunes test in the
> > benchmarks. It's commit 5cc729b48aad084757a049b7e5cdbadae5e9e400 from
> > November 2018. Unfortunately that's a pretty big squashed commit from
> > a PR:
> >
> > https://github.com/urweb/urweb/commit/5cc729b48aad084757a049b7e5cdbadae5e9e400
> >
> > It'd be great if someone could take a look and see why it strips UTF-8
> > output in that benchmark test. Note that the test runs in a Docker
> > container, so perhaps it's trying to infer a system-wide i18n setting?
> >
> > Once we fix this, we can update the benchmarks repo and solve the
> > sorry state of affairs with Ur/Web way down the performance rankings.
> > I'd love to see more people active in the community, and things like
> > this would help raise awareness of the project.
> >
> > Oisín
> >
>
> I suspect this has to do with the difference between LTR and RTL
> languages.  A given database, in the fortunes table, may have Arabic
> text stored "backwards", and without any markings such as U+200F
> RIGHT-TO-LEFT MARK and so the U8_NEXT macro is seeing a trailing byte
> of a UTF-8 encoded character and failing to loop.
>
> _______________________________________________
> Ur mailing list
> Ur@impredicative.com
> http://www.impredicative.com/cgi-bin/mailman/listinfo/ur
>