Re: [Zeek-Dev] changing format of uid to ULID?

Karl Pietrzak Fri, 30 Aug 2019 11:55:47 -0700

I'd say the tooling is still Java-focused, but I found some decent CLI
tooling at https://github.com/apache/parquet-mr/tree/master/parquet-tools

Specifically, I used the convert command
<https://github.com/apache/parquet-mr/blob/master/parquet-cli/src/main/java/org/apache/parquet/cli/commands/ConvertCommand.java>
to go from JSON -> Parquet.  Going from JSON.gz to Parquet (with the gzip
compression codec) saved us about 35%.

When you say "log writer", do you mean a custom Zeek writer
<https://docs.zeek.org/en/stable/frameworks/logging.html> that writes to
Parquet directly?

The major issue we're facing is that the schema for Zeek output can change
over time (more columns can be added).  That's an issue for Parquet, since
each Parquet file is written with a single fixed schema.
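
To make the schema problem concrete, here is a rough sketch of one possible
workaround, not what we actually run: before converting a batch of JSON
records, collect the union of all field names and fill the missing ones with
nulls, so every row in that batch fits one schema.  The sample records and
the helper names (union_columns, normalize) are made up for illustration.

```python
import json


def union_columns(records):
    """Return every field name seen across records, in first-seen order."""
    columns = []
    seen = set()
    for record in records:
        for name in record:
            if name not in seen:
                seen.add(name)
                columns.append(name)
    return columns


def normalize(records, columns):
    """Give every record the same columns, using None for missing fields."""
    return [{name: record.get(name) for name in columns} for record in records]


# Hypothetical Zeek-style JSON lines; the second one carries an extra column.
lines = [
    '{"ts": 1567191347.0, "uid": "CAbc123", "id.orig_h": "10.0.0.1"}',
    '{"ts": 1567191348.0, "uid": "CDef456", "id.orig_h": "10.0.0.2", '
    '"community_id": "1:abc"}',
]

records = [json.loads(line) for line in lines]
columns = union_columns(records)
rows = normalize(records, columns)
```

This only evens out the columns within one batch.  Files written earlier
with fewer columns would still need their schemas reconciled at read time.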

On Fri, Aug 30, 2019 at 2:21 PM Justin Azoff <[email protected]> wrote:

> On Fri, Aug 30, 2019 at 2:17 PM Karl Pietrzak <[email protected]> wrote:
>
>> Good morning everyone.
>>
>> I'm researching compression of Zeek data.  I'm currently dumping Zeek
>> data into Parquet files
>>
>
> I don't have much feedback on the uid bits, but I'm very interested in
> Parquet!  I had looked into doing this a while back but the tooling around
> Parquet was very Java/big data focused and not very CLI friendly.  Are you
> using the new C++ implementation in a log writer or are you converting
> JSON to Parquet?
>
> --
> Justin
>

-- 
Karl

_______________________________________________
zeek-dev mailing list
[email protected]
http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev
