Hello,

I am working with the C/GLib Arrow interface to read Parquet files and I am 
having trouble accessing all of the file metadata.

Reading my file into Python and printing the metadata like this:
```
import pyarrow.parquet as pq
pq.ParquetFile('f1.parquet').metadata
```

Results in this metadata:
```
<pyarrow._parquet.FileMetaData object at 0x1176e8ea0>
  created_by: parquet-cpp-arrow version 5.0.0
  num_columns: 3
  num_rows: 10
  num_row_groups: 1
  format_version: 1.0
  serialized_size: 420
```

But reading the same file through the C/GLib interface and printing the 
result of this call (where the schema comes from the same file):
```
garrow_schema_to_string_metadata(schema, trueGbooleanValue)
```

Results in this output, which is only the schema and includes none of the 
metadata above:
```
first-int-col: int64
str-col: string
second-int-col: int64
```

My specific question is: is it possible to easily get the number of rows of a 
Parquet file in the C/GLib Arrow library (i.e., without having to read in the 
whole table)? I would also be interested in getting the rest of the metadata 
that pyarrow shows. I wasn’t able to find a way to do this in the C/GLib 
documentation, but I feel like I must be missing something. Thank you.

Best,
Ben McDonald
