Hi Arthur,

I'm not quite clear on the use case here. Just to clarify: in your
original Parquet file, do you have List<int64> typed columns?

On Wed, Nov 16, 2022 at 8:02 AM Arthur Passos <[email protected]> wrote:

> Hi
>
> I am reading a Parquet file with arrow::RecordBatchReader, and the
> arrow::Table returned contains columns with two chunks
> (column->num_chunks() == 2). The column in question is of type
> Array(Int64), although the problem is not limited to that type.
>
> I want to extract the data (nested column data) as well as the offsets
> from that column. I have found only one example
> <https://github.com/apache/arrow/blob/master/cpp/examples/arrow/row_wise_conversion_example.cc#L121>
>  of Array columns and it assumes the nested type is known at compile time
> AND the column has only one chunk.
>
> I have tried to loop over the Array(Int64) column chunks and grab the
> `values()` member, but for some reason, for that specific Parquet file, the
> values members of both chunks point to the same memory location. Therefore,
> if I do something like the below, I end up with duplicated data:
>
> static std::shared_ptr<arrow::ChunkedArray>
> getNestedArrowColumn(std::shared_ptr<arrow::ChunkedArray> & arrow_column)
> {
>     arrow::ArrayVector array_vector;
>     array_vector.reserve(arrow_column->num_chunks());
>
>     for (size_t chunk_i = 0, num_chunks = static_cast<size_t>(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i)
>     {
>         arrow::ListArray & list_chunk = dynamic_cast<arrow::ListArray &>(*(arrow_column->chunk(chunk_i)));
>         std::shared_ptr<arrow::Array> chunk = list_chunk.values();
>         array_vector.emplace_back(std::move(chunk));
>     }
>
>     return std::make_shared<arrow::ChunkedArray>(array_vector);
> }
>
>
> I can provide more info, but to keep the initial request short and simple,
> I'll leave it at that.
>
> Thanks in advance,
> Arthur
>
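
On the duplicated data in the quoted snippet: one possible explanation (an
assumption, since I have not seen the file) is that both chunks are slices of
the same underlying list array. As far as I know, ListArray::values() returns
the whole child array without accounting for the slice offset or length, so
each chunk would hand back the same buffer. In Arrow C++, ListArray::Flatten()
or the per-element value_offset(i) accessor are the usual ways to respect the
slice. The sketch below is not Arrow code: ListChunk, chunk_values and
flatten_chunks are hypothetical names that model the layout with std::vector,
only to show how each chunk's offsets select its own window of the shared
values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one chunk of a List<int64> column. This is NOT an
// Arrow type; it only models the layout: a child values buffer that may be
// shared between chunks, plus this chunk's own window of offsets.
struct ListChunk {
    const std::vector<int64_t>* values;  // child values, possibly shared
    std::vector<int32_t> offsets;        // (number of lists + 1) entries
};

// Copy only the child values that this chunk's offsets actually cover.
std::vector<int64_t> chunk_values(const ListChunk& chunk) {
    assert(!chunk.offsets.empty());
    const auto begin = static_cast<std::ptrdiff_t>(chunk.offsets.front());
    const auto end = static_cast<std::ptrdiff_t>(chunk.offsets.back());
    assert(0 <= begin && begin <= end);
    assert(static_cast<std::size_t>(end) <= chunk.values->size());
    return std::vector<int64_t>(chunk.values->begin() + begin,
                                chunk.values->begin() + end);
}

// Result of merging all chunks: one values vector and offsets rebased to 0.
struct Flattened {
    std::vector<int64_t> values;
    std::vector<int64_t> offsets;
};

// Concatenate the per-chunk windows instead of the full shared buffers.
Flattened flatten_chunks(const std::vector<ListChunk>& chunks) {
    Flattened out;
    out.offsets.push_back(0);
    for (const ListChunk& chunk : chunks) {
        const std::vector<int64_t> window = chunk_values(chunk);
        const auto base = static_cast<int64_t>(out.values.size());
        const int32_t first = chunk.offsets.front();
        // Skip the first offset: it equals the end of the previous list.
        for (std::size_t i = 1; i < chunk.offsets.size(); ++i) {
            out.offsets.push_back(base + (chunk.offsets[i] - first));
        }
        out.values.insert(out.values.end(), window.begin(), window.end());
    }
    return out;
}
```

With two chunks that share one six-element values buffer, taking the full
buffer per chunk would yield twelve values, while the windowed version yields
the six original ones plus a single rebased offsets vector.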


-- 
Niranda Perera
https://niranda.dev/
@n1r44 <https://twitter.com/N1R44>
