Hi

I'm curious whether there's a way to create a zero-copy union of two
tables. Currently, I'm augmenting an existing table by adding new columns
(for example, moving averages; see the snippet below). At the end I swap
the pointer and release the old table (reset).

I wonder whether it would be more efficient to create a new table holding
only the new columns and then build some kind of "zero-copy table union"
of the new table with the old one. Does that exist?

That said, perhaps the AddColumn method already reuses the existing
table's memory when it creates a "new Table".
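
To make the question concrete, here is a toy model in plain C++ of the
behavior I'm hoping for. It is not Arrow code: ToyColumn, ToyTable and
ToyAddColumn are made-up names, and I'm only assuming that a table could
hold shared pointers to immutable columns. Under that assumption, adding
a column copies a handful of pointers and names, never the column data.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins, NOT Arrow types: a column is an immutable vector of
// doubles, and a table is a list of names plus shared pointers to columns.
using ToyColumn = std::vector<double>;

struct ToyTable {
    std::vector<std::string> names;
    std::vector<std::shared_ptr<const ToyColumn>> columns;
};

// Returns a new table with `col` inserted at position `i`.
// Only the name and pointer vectors are copied; the existing column
// data is shared between the old and the new table.
std::shared_ptr<const ToyTable> ToyAddColumn(
    const std::shared_ptr<const ToyTable>& table, std::size_t i,
    std::string name, std::shared_ptr<const ToyColumn> col) {
    assert(i <= table->columns.size());
    auto out = std::make_shared<ToyTable>(*table);  // shallow copy
    const auto pos = static_cast<std::ptrdiff_t>(i);
    out->names.insert(out->names.begin() + pos, std::move(name));
    out->columns.insert(out->columns.begin() + pos, std::move(col));
    return out;
}
```

In this model the old table stays valid and untouched, and dropping the
last pointer to it frees nothing that the new table still references.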

arrow::Status AddMovingAverage(shared_ptr<arrow::Table>& table,
                               const std::string& colNameIn, int n,
                               const std::string& colNameOut) {
    auto vals = table->GetColumnByName(colNameIn);

    // calculate moving average vector
    vector<double> ma;
    ....

    // convert vector to arrow array
    shared_ptr<arrow::Array> ma_arr;
    arrow::DoubleBuilder dbl_builder;

    ARROW_RETURN_NOT_OK(dbl_builder.AppendValues(ma.begin(), ma.end()));
    ARROW_ASSIGN_OR_RAISE(ma_arr, dbl_builder.Finish());
    // LOG(INFO) << ma_arr->ToString() << std::endl;

    // add new column to table (need to convert to chunked array first)
    auto f0 = arrow::field(colNameOut, arrow::float64());
    auto ma_chunked_arr = std::make_shared<arrow::ChunkedArray>(ma_arr);

    // Can this be done more efficiently, without copying the original
    // table to a new memory location?
    ARROW_ASSIGN_OR_RAISE(auto new_table,
                          table->AddColumn(0, f0, ma_chunked_arr));

    // swap pointer to new table and clean up
    table.swap(new_table);
    new_table.reset();

    return arrow::Status::OK();
}
