Hi Antoine,

Thanks for the fast reply.

1/2) Regarding the immutability of the data, I can work around this.
3) For my use case some kind of compression would be very beneficial;
perhaps this might come in the spec some day.
4) The app uses mimalloc/jemalloc, so that will work nicely.
5) OK, I was worried that some static variables would result in a lot of
cross-NUMA memory access.
6) Thanks, I'll look into the Executor.
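
To illustrate the kind of compression meant in point 3, here is a minimal
sketch of run-length encoding applied to a column of dictionary indices on
the application side. It is plain standard C++ and not an Arrow API; the
names Run, RunLengthEncode and RunLengthDecode are hypothetical and chosen
only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One run: a dictionary index and how many times it repeats consecutively.
// (Hypothetical type for illustration; not part of Arrow.)
struct Run {
  int32_t index;
  int64_t length;
};

// Collapse consecutive equal indices into (index, length) runs.
std::vector<Run> RunLengthEncode(const std::vector<int32_t>& indices) {
  std::vector<Run> runs;
  for (int32_t idx : indices) {
    if (!runs.empty() && runs.back().index == idx) {
      ++runs.back().length;  // extend the current run
    } else {
      runs.push_back(Run{idx, 1});  // start a new run
    }
  }
  return runs;
}

// Expand the runs back into the original sequence of indices.
std::vector<int32_t> RunLengthDecode(const std::vector<Run>& runs) {
  std::vector<int32_t> indices;
  for (const Run& run : runs) {
    assert(run.length > 0);  // an encoded run is never empty
    indices.insert(indices.end(), static_cast<std::size_t>(run.length),
                   run.index);
  }
  return indices;
}
```

This only pays off when equal values are stored next to each other, for
example in a sorted or low-cardinality column.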

Matthieu




On Tue, 15 Mar 2022 at 12:53, Antoine Pitrou <[email protected]> wrote:

>
> Hello Matthieu,
>
> On Tue, 15 Mar 2022 12:28:17 +0100
> Matthieu Bolt <[email protected]> wrote:
> > Dear Arrow developers,
> >
> > I'm investigating whether the Arrow library would be useful in our
> > server backend application, and I have some questions:
> >
> > 1) How can a value in an Array/Table be updated? In the examples that I
> > have seen, a table is constructed using ArrayBuilders, which results in
> > Arrays that can be used to construct a Table with a Schema. It is unclear
> > to me how to update a value once this process has been executed. Perhaps
> > updating should be implemented in terms of Slicing/RecordBatches instead
> > of Tables? Or is Arrow more suitable for static data, so that updating
> > values does not fit into the general idea of Arrow?
>
> Arrow C++ is built around the idea of immutable data, so indeed the
> Array/Table/etc. objects are not suitable for updating values once you
> have generated them.  Immutable data greatly simplifies data access and
> eliminates synchronization costs (contention on locks, etc.).
>
> > 2) If updating is not possible to implement for all types of Arrays, is
> > this a reasonable feature request for a DictionaryArray?
>
> Neither. A dictionary array is just another kind of Arrow array and is
> immutable like the others.
>
> > 3) Does the StringDictionaryBuilder execute some fancy run length
> > encoding/zipping in Finish? If not, is this a reasonable feature request?
>
> The ArrayBuilders produce data conformant to the Arrow in-memory format
> specification (*), which doesn't have a run length encoding. So the
> answer is "no" to both questions :-)
>
> (if the Arrow spec ever gets a run length encoding option, then of
> course it will have to be implemented in the Arrow C++ library)
>
> (*) https://arrow.apache.org/docs/format/Columnar.html
>
> > 4) Do all memory allocations occur in a given MemoryPool? More
> > specifically, if a (NUMA-aware) allocator is provided where possible in
> > the API (by subclassing MemoryPool?), will this allocator then be used
> > for all allocations?
>
> It will be used whenever you pass that MemoryPool to Arrow C++ APIs,
> yes.
> (if not, it's a bug which you should report on our bug tracker)
>
> Before you write your own MemoryPool implementation, though, I suggest
> you try the "standard" memory pools provided by Arrow C++ (jemalloc,
> mimalloc, system) to see if one of them already fits the bill.
>
> > 5) Does the Arrow library have static variables (not constexpr) that are
> > frequently accessed or that allocate memory during compute function
> > execution?
>
> The question is a bit unspecific. Most existing compute functions do not
> need persistent state, so the answer would be no. What are you concerned
> about?
>
> > 6) How can application threads be provided to the compute framework,
> > something like asio::io_context? Other than a bool async_mode in
> > ExecPlan, I couldn't find anything in the API related to multithreading.
>
> The ExecContext you pass to ExecPlan can be customized with an Executor.
>
> Regards
>
> Antoine.
>
>
>
