Re: Kudu CLI tool JSON format

Todd Lipcon Tue, 11 Jun 2019 09:44:36 -0700

I guess the issue is that we use rapidjson's 'String' support to write out
C++ strings, which can hold arbitrary binary data rather than valid UTF-8.
That's somewhat incorrect of us, and we should be base64-encoding such
binary data.


Fixing this changes the output format, so it is a little bit incompatible,
but for something like partition keys I think we probably should do it
anyway and release note it, considering partition keys are quite likely to
be invalid UTF-8.
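
To illustrate the base64 approach, here is a minimal Python sketch. It is not
Kudu's actual code: the sample bytes and the field name "partition_key_b64"
are made up for illustration.

```python
import base64
import json

# Hypothetical partition key: arbitrary binary data, not valid UTF-8.
raw_key = b"\x80\x00\x00\x01"

# Decoding it directly as UTF-8 fails, as in the report quoted below.
try:
    raw_key.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

# Base64 output is plain ASCII, so it is always safe inside a JSON string.
encoded = base64.b64encode(raw_key).decode("ascii")
doc = json.dumps({"partition_key_b64": encoded})  # field name is an assumption

# A consumer reverses the encoding to recover the original bytes.
decoded = base64.b64decode(json.loads(doc)["partition_key_b64"])
```

The cost is that the value is no longer human-readable in the JSON, which is
why the change would need a release note.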

-Todd

On Tue, Jun 11, 2019 at 6:08 AM Pavel Martynov <[email protected]> wrote:

> Hi, guys!
>
> We are trying to use the output of "kudu cluster ksck master -ksck_format
> json_compact" for integration with our monitoring system and hit something
> a little strange. Part of the output can't be decoded as UTF-8 with Python 3:
> $ kudu cluster ksck master -ksck_format json_compact > kudu.json
> $ python
> with open('kudu.json', mode='rb') as file:
>   bs = file.read()
>   bs.decode('utf-8')
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position
> 705196: invalid start byte
>
> Here is how Sublime Text shows this block of text:
> https://yadi.sk/i/4zpWKZ37iP8OEA
> As you can see, the kudu tool encodes zero bytes as \u0000 but doesn't
> encode some other non-text bytes.
>
> What do you think about it?
>
> --
> with best regards, Pavel Martynov
>
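
Until the output is fixed, one possible workaround on the consumer side is to
decode with Python's "surrogateescape" error handler, which keeps undecodable
bytes instead of raising. This is only a sketch: the sample fragment and the
"partition_key" field name are assumptions shaped after the report above, not
real ksck output.

```python
import json

# Hypothetical fragment shaped like the reported output: NUL bytes are escaped
# as \u0000, but the byte 0x80 is written raw, which is invalid UTF-8.
raw = b'{"partition_key": "\x80\\u0000\\u0000\\u0001"}'

# surrogateescape maps each undecodable byte to a lone surrogate instead of
# raising, so the text can still be parsed as JSON.
text = raw.decode("utf-8", errors="surrogateescape")
doc = json.loads(text)

# Encoding with the same handler turns the surrogates back into the raw bytes.
key_bytes = doc["partition_key"].encode("utf-8", errors="surrogateescape")
```

Strings recovered this way contain lone surrogates, so they should be
converted back to bytes before being printed or written out as UTF-8.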


-- 
Todd Lipcon
Software Engineer, Cloudera
