Re: [DISCUSS][Format] Starting the draft implementation of the ArrayView array format

2023-06-14 Thread Felipe Oliveira Carvalho
On Wed, Jun 14, 2023 at 3:07 PM Andrew Lamb wrote:
> Arrow has at least 7 native "official" implementations (Java, Rust, Golang,
> C#, Javascript, Julia and C++), 5 bindings on C++ (C, Ruby, Python, R, and
> Matlab) and likely other implementations (like arrow2 in rust)
Yes, the introduction

Re: [DISCUSS][Format] Starting the draft implementation of the ArrayView array format

2023-06-14 Thread Weston Pace
> Can't implementations add support as needed? I assume that the "depending on what support [it] aspires to" implies this, but if a feature isn't used in a community then it can leave it unimplemented. On the flip side, if it is used in a community (e.g. C++) is there no way to upstream it without

Re: [DISCUSS][Format] Starting the draft implementation of the ArrayView array format

2023-06-14 Thread Antoine Pitrou
So each community would have its own version of the Arrow format? On 14/06/2023 at 22:47, Aldrin wrote: > Arrow has at least 7 native "official" implementations... 5 bindings on C++... and likely other implementations (like arrow2 in rust) I think it is worth remembering that

pyarrow Table.from_pylist doesn't release memory

2023-06-14 Thread Jerald Alex
Hi Experts, Pyarrow *Table.from_pylist* does not release memory until the program terminates. I created a sample script to highlight the issue. I have also tried setting up `pa.jemalloc_set_decay_ms(0)` but it didn't help much. Could you please check this and let me know if there are potential

Re: [DISCUSS][Format] Starting the draft implementation of the ArrayView array format

2023-06-14 Thread Antoine Pitrou
Not to mention third-party systems able to consume Arrow data, without relying on any of the official implementations. Regards Antoine. On 14/06/2023 at 20:06, Andrew Lamb wrote: Arrow has at least 7 native "official" implementations (Java, Rust, Golang, C#, Javascript, Julia and C++),

Re: [DISCUSS][Format] Starting the draft implementation of the ArrayView array format

2023-06-14 Thread Andrew Lamb
Arrow has at least 7 native "official" implementations (Java, Rust, Golang, C#, Javascript, Julia and C++), 5 bindings on C++ (C, Ruby, Python, R, and Matlab) and likely other implementations (like arrow2 in rust) I think it is worth remembering that depending on what level of support ListView

Re: [DISCUSS][Format] Starting the draft implementation of the ArrayView array format

2023-06-14 Thread Felipe Oliveira Carvalho
General approach to alternative formats aside, in the specific case of ListView, I think the implementation complexity is being overestimated in these discussions. The C++ Arrow implementation shares a lot of code between List and LargeList. And with some tweaks, I'm able to share that common

Re: [DISCUSS][Format] Alternative layouts (was: implementation of the ArrayView array format)

2023-06-14 Thread Antoine Pitrou
On 14/06/2023 at 17:08, Weston Pace wrote: Also, I'm very lukewarm towards the concept of "alternative layouts" suggested somewhere else in this thread. It does not seem a good choice to complexify the Arrow format that much. I think, in my opinion, this depends on how many of these

Re: [DISCUSS][Format] Starting the draft implementation of the ArrayView array format

2023-06-14 Thread Weston Pace
> perhaps we could support this use-case as a canonical extension type over
> dictionary encoded, variable-sized arrays
I believe this suggestion is valid and could be used to solve the if-else case. The algorithm, if I understand it, would be roughly: ``` // Note: Simple pseudocode,

Arrow R package development sync call - Thursday 15th June at 16:30 UTC (12:30 ET)

2023-06-14 Thread Nic Crane
The fortnightly Arrow R package dev community call is on Thursday 15th June at 16:30 UTC (12:30 ET). Joining instructions are below. Video call link: https://meet.google.com/dbm-ybmv-evb Phone numbers: https://tel.meet/dbm-ybmv-evb?pin=9199558189233 The meeting notes can be found here; please

Re: Group rows in a stream of record batches by group id?

2023-06-14 Thread Haocheng Liu
Hi Jerry, I asked similar questions on how to "write the data iteratively in smaller quantities over successive writes?" as hive partitioned parquet months ago and the reply from Weston was extremely helpful to me. Here are the related threads on how to use acero

RE: Group rows in a stream of record batches by group id?

2023-06-14 Thread Jerry Adair
Hi Weston (and dev group), Speaking of the grouper in the C++ library and writing partitioned data, I had a tangential question if I may. I noticed in the example C++ source that an Arrow table and then an in-memory dataset were created first, followed by writing the data to a partitioned data

Re: [DISCUSS][Format] Starting the draft implementation of the ArrayView array format

2023-06-14 Thread Antoine Pitrou
I agree that ListView cannot be an extension type, given that it features a new layout, and therefore cannot reasonably be backed by an existing storage type (AFAICT). Also, I'm very lukewarm towards the concept of "alternative layouts" suggested somewhere else in this thread. It does not

Re: [DISCUSS][Format] Starting the draft implementation of the ArrayView array format

2023-06-14 Thread Raphael Taylor-Davies
Hi All, I might be missing something, but rather than opening the can of worms of alternative layouts, etc... perhaps we could support this use-case as a canonical extension type over dictionary encoded, variable-sized arrays. I'll try to explain my reasoning below, but the major advantage