Re: Any standard way for min/max values per record-batch?

2021-07-19 Thread Kohei KaiGai
alues. For any Apache Arrow file, not only those generated by pg2arrow, these min/max statistics values can be appended by rewriting the Footer portion, without relocating record batches. So, we plan to provide a standalone command to attach the min/max statistics onto the existing Apache Arrow

Re: Any standard way for min/max values per record-batch?

2021-02-17 Thread Kohei KaiGai
re as a > parallel list on RecordBatch itself. > > If we do add a new structure or arbitrary key-value pair we should not use > KeyValue but should have something where the values can be bytes. > > On Wed, Feb 17, 2021 at 7:17 PM Kohei KaiGai wrote: > > > Hello, > >

Any standard way for min/max values per record-batch?

2021-02-17 Thread Kohei KaiGai
Hello, Does Apache Arrow have any standard way to embed min/max values of the fields on a per-record-batch basis? It looks like FieldNode supports neither a dedicated min/max attribute nor custom metadata. https://github.com/apache/arrow/blob/master/format/Message.fbs#L28 If we embed an array of min/max

Pcap2Arrow - Packet capture and data conversion tool to Apache Arrow on the fly

2021-02-15 Thread Kohei KaiGai
Hello, Let me share my recent work below: https://github.com/heterodb/pg-strom/wiki/804:-Pcap2Arrow This standalone command-line tool captures network packets from network interface devices and converts them into the Apache Arrow data format according to the pre-defined data schema for

Re: Human-readable version of Arrow Schema?

2020-01-08 Thread Kohei KaiGai
Hello, pg2arrow [*1] has '--dump' mode to print out schema definition of the given Apache Arrow file. Does it make sense for you? $ ./pg2arrow --dump ~/hoge.arrow [Footer] {Footer: version=V4, schema={Schema: endianness=little, fields=[{Field: name="id", nullable=true, type={Int32}, children=[],

Re: How about inet4/inet6/macaddr data types?

2019-04-30 Thread Kohei KaiGai
an see a UUID type I have defined and > serialized through Arrow's binary protocol machinery > > https://github.com/apache/arrow/blob/master/cpp/src/arrow/extension_type-test.cc > > Thanks > Wes > > [1]: > https://github.com/apache/arrow/commit/a79cc8098831924179

Re: How about inet4/inet6/macaddr data types?

2019-04-30 Thread Kohei KaiGai
> > On Monday, April 29, 2019, Kohei KaiGai wrote: > > > Hello folks, > > > > How about your opinions about network address types support in Apache > > Arrow data format? > > Network address always appears at network logs massively generated by > > any netwo

How about inet4/inet6/macaddr data types?

2019-04-29 Thread Kohei KaiGai
Hello folks, How about your opinions about network address types support in Apache Arrow data format? Network address always appears at network logs massively generated by any network facilities, and it is a significant information when people analyze their backward logs. I'm working on Apache

Re: Format specification document?

2019-01-05 Thread Kohei KaiGai
"pandas\0"). Value is at 0x0058 + 0x0010. Here is an int value: 0x03b4 (= 948 bytes), then the next byte (0x006c) begins the cstring body. ("{pandas_version ... ). I didn't follow the entire data file, but this makes it clearer to me. Best regards, On Sun, Jan 6, 2019 at 8:50, Wes McKinney wrote: > > hi

Format specification document?

2019-01-03 Thread Kohei KaiGai
Hello, I'm now trying to understand the Apache Arrow format for my application. Is there a format specification document including meta-data layout? I checked out the description at: https://github.com/apache/arrow/tree/master/docs/source/format https://github.com/apache/arrow/tree/master/format