pitrou commented on pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#issuecomment-651717472
Phew. It worked. RTools 4.0 is still broken, but there doesn't seem to be
anything we can do, except perhaps disable that job. I'm gonna merge and leave
the R cleanup to someone
pitrou commented on pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#issuecomment-651366993
> The version installed is compiled with gcc 8. RTools 35 uses gcc 4.9
What difference does it make? This is plain C.
pitrou commented on pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#issuecomment-651353338
> This means there also needs to be a PKGBUILD
Why? `libutf8proc` is installed.
This is an automated
pitrou commented on pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#issuecomment-651338264
@xhochy Could you help on the utf8proc issue on RTools 3.5?
See here:
https://github.com/apache/arrow/pull/7449/checks?check_run_id=819772618#step:10:169
It seems that
pitrou commented on pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#issuecomment-651316656
I pushed a commit that raises an error on invalid UTF8. It does not seem to
make the benchmarks slower.
This is
pitrou commented on pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#issuecomment-651282959
Main point remaining is whether we raise an error on invalid UTF8 input. I
see no reason not too (an Arrow string array has to be valid UTF8 as per the
spec, just like a Python
pitrou commented on pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#issuecomment-651282415
> Having a benchmark run on non-ascii codepoints (I think we want to do this
separate from this PR, but important point).
Yes, I think we can defer that to a separate PR.
pitrou commented on pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#issuecomment-650606456
@wesm I can also take this since you already have quite a bit on your plate.
This is an automated message from the
pitrou commented on pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#issuecomment-645389395
I think it would be more acceptable to inline the relevant utf8proc
functions.
This is an automated message from
pitrou commented on pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#issuecomment-645382201
Let's step back a bit: why do we care about micro-optimizing this?
This is an automated message from the Apache
pitrou commented on pull request #7449:
URL: https://github.com/apache/arrow/pull/7449#issuecomment-645278822
I don't know how important it is to get good performance on non-ASCII data.
Note that the ASCII fast path could well be applied to subsets of the array.
11 matches
Mail list logo