[C++] Purpose of C++ bundled dependencies

2022-08-03 Thread Will Jones
I was creating this ticket ARROW-17295 [1], but ended up unsure if this is something we'd like to maintain, so I thought I would bring it up for discussion. Essentially: should we expand the capabilities of our bundled dependency system? Or should we constrain the scope and point users that wish

Re: [DISCUSS][Format] Starting to do some concrete work on the new "StringView" columnar data type

2022-08-03 Thread Gosh Arzumanyan
Hi team! My 2 cents (maybe less): if I get the idea right, the StringView data type might be very handy/optimal for cases where users already have string data available in some other format (e.g. std::unordered_map, flat buffer structures, etc.), from which record batches are created and shipped to the

Re: [QUESTION] How is mmap implemented for 8bit padded files?

2022-08-03 Thread Antoine Pitrou
On 03/08/2022 at 18:29, Jorge Cardoso Leitão wrote: Hi Antoine, Thanks a lot for your answer. So, if I understand (I may not have), we do not impose restrictions on the alignment of the data when we get the pointer; only when we read from it. Doesn't this require checking for alignment at

Re: [QUESTION] How is mmap implemented for 8bit padded files?

2022-08-03 Thread Jorge Cardoso Leitão
Hi Antoine, Thanks a lot for your answer. So, if I understand (I may not have), we do not impose restrictions on the alignment of the data when we get the pointer; only when we read from it. Doesn't this require checking for alignment at runtime? Best, Jorge On Tue, Aug 2, 2022 at 6:59 PM
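
For illustration only (this is not taken from the thread): a runtime alignment check of the kind the question refers to might look like the minimal Python sketch below. The helper names `is_aligned` and `padding_needed` are hypothetical, and the 8-byte default is an assumption chosen for the example, not a statement about how any Arrow implementation does it.

```python
def is_aligned(address: int, alignment: int = 8) -> bool:
    """Return True if `address` is a multiple of `alignment`.

    `alignment` must be a positive power of two (8 bytes is an assumption here).
    """
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # For a power-of-two alignment, the low bits of an aligned address are zero.
    return address & (alignment - 1) == 0


def padding_needed(offset: int, alignment: int = 8) -> int:
    """Return how many bytes must be added to `offset` to reach the next aligned offset."""
    return (-offset) % alignment


if __name__ == "__main__":
    # Print the check and the required padding for a few sample offsets.
    for offset in (0, 13, 16, 67):
        print(offset, is_aligned(offset), padding_needed(offset))
```

The check itself is a single bitwise AND, so performing it on each read costs little; whether a reader does this per access or once per buffer is a design choice this sketch does not settle.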

[QUESTION][C#] Append values to PrimitiveArray

2022-08-03 Thread Aleksei Smirnov
Hello all, I am having trouble finding documentation on the .NET API of Apache Arrow. Are there any examples or sample projects that can be used for learning? Currently I am trying to implement the ability to append extra values to existing primitive arrays, but I was not able to

Re: [ARROW-17255] Logical JSON type in Arrow

2022-08-03 Thread Antoine Pitrou
On 03/08/2022 at 16:19, Lee, David wrote: There are probably two ways to approach this: physically store the JSON as a UTF8 string, or physically store the JSON as nested lists and structs. This works if all JSON values follow a predefined schema, which is not necessarily the case.

Re: [ARROW-17255] Logical JSON type in Arrow

2022-08-03 Thread Weston Pace
I think, from a compute perspective, one would just cast before doing anything. So you wouldn't need much beyond parse and unparse. For example, if you have a JSON document and you want to know the largest value of $.weather.temperature then you could do...

Re: [ARROW-17255] Logical JSON type in Arrow

2022-08-03 Thread Lee, David
There are probably two ways to approach this: physically store the JSON as a UTF8 string, or physically store the JSON as nested lists and structs. This is more complicated and ideally this method would also support including JSON schemas to help address missing values and round trip

Re: Help with writing/reading from s3

2022-08-03 Thread Li Jin
Thanks! Removing the "gs://" prefix indeed fixes it. On Tue, Aug 2, 2022 at 4:01 PM Will Jones wrote: > Hi Li Jin, > > I'm not sure yet what changed, but I believe you can fix that error simply > by omitting the scheme prefix from the URI and just using the path when > loading the dataset. Here's

Re: [RESULT][VOTE] Release Apache Arrow 9.0.0 - RC2

2022-08-03 Thread Neal Richardson
CRAN is closed for new submissions until August 5, so I'll submit the R package next week. On Wed, Aug 3, 2022 at 7:19 AM Krisztián Szűcs wrote: > Below is the current status of the post release tasks, I > "soft-assigned" a couple of them to the relevant maintainers, could > you please help

Re: [RESULT][VOTE] Release Apache Arrow 9.0.0 - RC2

2022-08-03 Thread Krisztián Szűcs
Below is the current status of the post-release tasks. I "soft-assigned" a couple of them to the relevant maintainers; could you please help with those? - [done] Mark the released version as “RELEASED” on JIRA - [done] Start the new version on JIRA on the ARROW project - [done] Upload source -

[RESULT][VOTE] Release Apache Arrow 9.0.0 - RC2

2022-08-03 Thread Krisztián Szűcs
Hi, The vote carries with 4 +1 binding votes, 2 +1 non-binding votes and no -1 votes. I'm starting to work on the post-release tasks and will keep this thread updated about the current status. Thanks everyone! - Krisztian On Wed, Aug 3, 2022 at 6:25 AM Yibo Cai wrote: > > +1 (binding) > > Verified

Re: [ARROW-17255] Logical JSON type in Arrow

2022-08-03 Thread Lee, David
While I do like having a JSON type, adding processing functionality, especially around compute capabilities, might be limiting. Arrow already supports nested lists and structs, which can cover JSON structures while offering vectorized processing. JSON should only be a logical representation of