On Tue, 25 Jun 2013, Douglas Creager wrote:
json2avro lets you pick from null, snappy, deflate and lzma codecs,
specify a custom block size and optionally skip over JSON lines that it
is unable to parse. I'm also thinking of adding a target max file size
so that it would automatically split output into multiple files.
Very cool! Kind of the reverse of avrocat or avropipe. We could clean
it up and add it as another C command-line tool if you like.
Sure, I'm all for it!
It uses Jansson as the JSON parser which is conveniently bundled with
Avro-C. (One thing that I'm not clear on is that Jansson cannot handle
nulls, not sure if this is a Jansson-specific limitation or something
inherent to JSON.)
Can you elaborate on this? Jansson should support null JSON values
(it's the keyword null, not the string value "null"). And the Avro C
bindings should use that for Avro null values.
Sorry, I wasn't clear. Jansson uses null-terminated strings. The docs state:
"Normal null terminated C strings are used, so JSON strings may not
contain embedded null characters." I've tested it and indeed they cannot:
Jansson cannot parse a string like "abc\u0000def".
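
To illustrate why a NUL-terminated representation cannot carry such a
string, here is a minimal sketch in plain C. It makes no Jansson calls, and
the helper names c_string_view_len and has_embedded_nul are made up for
this example. It only shows that a decoded payload such as "abc", NUL,
"def" looks three bytes long to any API that takes a bare char*.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the number of bytes a NUL-terminated API would
 * "see" in a buffer whose real payload is buf_len bytes long. */
size_t c_string_view_len(const char *buf, size_t buf_len)
{
    assert(buf != NULL);
    /* memchr stops at the first NUL byte, just as strlen would. */
    const char *nul = memchr(buf, '\0', buf_len);
    return nul ? (size_t)(nul - buf) : buf_len;
}

/* Hypothetical helper: returns 1 if the payload has a NUL byte before
 * its end, meaning it cannot round-trip through a char* that carries
 * no explicit length. */
int has_embedded_nul(const char *buf, size_t buf_len)
{
    return c_string_view_len(buf, buf_len) < buf_len;
}
```

For the seven-byte payload that "abc\u0000def" would decode to,
c_string_view_len reports 3, so everything after the NUL is silently
lost unless the length is carried separately.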
Grisha