Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-08-10 Thread Dimitri Vorona
Hi Wes, cool initiative! Reminded me of "Building Advanced SQL Analytics From Low-Level Plan Operators" from SIGMOD 2021 (http://db.in.tum.de/~kohn/papers/lolepops-sigmod21.pdf), which proposes a set of building blocks for advanced aggregation. Cheers, Dimitri. On Thu, Aug 5, 2021 at 7:59 PM Juli

[jira] [Created] (ARROW-3734) [C++] Linking static zstd library fails on Arch x86-64

2018-11-09 Thread Dimitri Vorona (JIRA)
Dimitri Vorona created ARROW-3734: - Summary: [C++] Linking static zstd library fails on Arch x86-64 Key: ARROW-3734 URL: https://issues.apache.org/jira/browse/ARROW-3734 Project: Apache Arrow

[jira] [Created] (ARROW-3662) [C++] Add a const overload to MemoryMappedFile::GetSize

2018-10-31 Thread Dimitri Vorona (JIRA)
Dimitri Vorona created ARROW-3662: - Summary: [C++] Add a const overload to MemoryMappedFile::GetSize Key: ARROW-3662 URL: https://issues.apache.org/jira/browse/ARROW-3662 Project: Apache Arrow

[jira] [Created] (ARROW-3660) [C++] Don't unnecessary lock MemoryMappedFile for resizing in readonly files

2018-10-31 Thread Dimitri Vorona (JIRA)
Dimitri Vorona created ARROW-3660: - Summary: [C++] Don't unnecessary lock MemoryMappedFile for resizing in readonly files Key: ARROW-3660 URL: https://issues.apache.org/jira/browse/ARROW-3660 Pr

spectrum.chat as community channel

2018-09-07 Thread Dimitri Vorona
Hi everybody, I wanted to bring spectrum.chat (https://spectrum.chat) to your attention. It is a community communication platform which seeks to combine the advantages of mailing lists (searchable, easily accessible) with the interactivity of chats. A spectrum community contains multiple threads,

Re: Developing native Arrow interfaces to database protocols

2018-08-21 Thread Dimitri Vorona
Hi Wes, I would personally be very interested in this project and see it as a huge extension of Arrow's capabilities. I actually experimented with integrating Arrow into a main-memory database (HyPer [0]), though I might have had a slightly different focus. The way I took was to compile the export/impo

[jira] [Created] (ARROW-2837) [C++] ArrayBuilder::null_bitmap returns PoolBuffer

2018-07-12 Thread Dimitri Vorona (JIRA)
Dimitri Vorona created ARROW-2837: - Summary: [C++] ArrayBuilder::null_bitmap returns PoolBuffer Key: ARROW-2837 URL: https://issues.apache.org/jira/browse/ARROW-2837 Project: Apache Arrow

[jira] [Created] (ARROW-2835) [C++] ReadAt/WriteAt are inconsistent with moving the files position

2018-07-12 Thread Dimitri Vorona (JIRA)
Dimitri Vorona created ARROW-2835: - Summary: [C++] ReadAt/WriteAt are inconsistent with moving the files position Key: ARROW-2835 URL: https://issues.apache.org/jira/browse/ARROW-2835 Project: Apache

Re: Uninitialized buffer memory leads to buffer warnings

2018-07-09 Thread Dimitri Vorona
dea; I'd be interested to look > at the diff to see how many code paths are impacted. We already have a > number of places where we are zeroing buffers but as you have found it > is not 100% consistent. > > - Wes > > On Tue, Jul 3, 2018 at 10:13 AM, Dimitri Vorona > wro

[jira] [Created] (ARROW-2790) [C++] Buffers contain uninitialized memory

2018-07-04 Thread Dimitri Vorona (JIRA)
Dimitri Vorona created ARROW-2790: - Summary: [C++] Buffers contain uninitialized memory Key: ARROW-2790 URL: https://issues.apache.org/jira/browse/ARROW-2790 Project: Apache Arrow Issue Type

Re: Uninitialized buffer memory leads to buffer warnings

2018-07-03 Thread Dimitri Vorona
lead to security issues. > > I'd go for option 2. Option 3 sounds much more costly (we would be > zero-initializing large memory areas instead of small padding areas). > > Regards > > Antoine. > > > Le 03/07/2018 à 13:11, Dimitri Vorona a écrit : > > Hi all,

Uninitialized buffer memory leads to buffer warnings

2018-07-03 Thread Dimitri Vorona
Hi all, currently, running json-integration-test with valgrind leads to the following warning: "Syscall param write(buf) points to uninitialised byte(s)". This is caused by PrimitiveBufferBuilder not initializing its data memory. Note: we initialize null_bitmap_data_ by zeroing, i.e. setting all v

[jira] [Created] (ARROW-2784) [C++] MemoryMappedFile::WriteAt allow writing past the end

2018-07-02 Thread Dimitri Vorona (JIRA)
Dimitri Vorona created ARROW-2784: - Summary: [C++] MemoryMappedFile::WriteAt allow writing past the end Key: ARROW-2784 URL: https://issues.apache.org/jira/browse/ARROW-2784 Project: Apache Arrow

Re: Recruiting more maintainers for Apache Arrow

2018-07-02 Thread Dimitri Vorona
Hi Wes, to contribute an outsider's POV: while it is clear what's expected if you'd like to make a PR, it's not at all clear to me where I would start if I wanted to help with PR reviews without being heavily involved with the community or being a full maintainer. Should I just grab a PR, test it, c

Re: Minimal GCC version

2018-06-25 Thread Dimitri Vorona
e soon to GCC 7+. The manylinux standard will also > publish a manylinux2010 tag that will upgrade their minimal GCC requirement > to 4.9. Both mentioned tools are used to provide binary Python packages. > > Cheers > Uwe > > On Mon, Jun 25, 2018, at 6:24 PM, Dimitri Vorona wrote

Minimal GCC version

2018-06-25 Thread Dimitri Vorona
Hi, I wondered what the decision process behind the minimal supported GCC version (currently 4.8) is. Is it something like "the default GCC in the oldest supported LTS Ubuntu"? Or maybe there are some ASF guidelines? Cheers, Dimitri.

Re: Building against arrow static library

2018-06-25 Thread Dimitri Vorona
Hi, I had a similar issue some time ago, and the solution was to build after a clean checkout, which I interpreted as some kind of caching issue. Generally, I've found that starting with a clean checkout and following the steps from [0] has never failed for me. Hope that helps! Cheers, Dimitri.

Re: Gandiva Initiative

2018-06-21 Thread Dimitri Vorona
Hey Jacques, great stuff! I'm actually researching the integration of Arrow and Flight into a main-memory database which also uses LLVM for dynamic query generation! Excited to have a more detailed look at Gandiva! Cheers, Dimitri. On Thu, Jun 21, 2018, 21:15 Jacques Nadeau wrote: > Hey Guys, >

[jira] [Created] (ARROW-2701) [C++] Make MemoryMappedFile resizable

2018-06-12 Thread Dimitri Vorona (JIRA)
Dimitri Vorona created ARROW-2701: - Summary: [C++] Make MemoryMappedFile resizable Key: ARROW-2701 URL: https://issues.apache.org/jira/browse/ARROW-2701 Project: Apache Arrow Issue Type: New

Re: C++ RecordBatchWriter/ReadRecordBatch clarification

2018-04-17 Thread Dimitri Vorona
Hi Rares, you use a different reader for the RecordBatch streams. See arrow/ipc/ipc-read-write-test.cc:569-596 for the gist. Also, the second argument to arrow::RecordBatch::Make takes the number of rows in the batch, so you have to set it to 1 in your example. See https://gist.github.com/alendi

Re: Buffer slices are unsafe

2018-04-11 Thread Dimitri Vorona
hub.com/alendit/00e4bd5f0e8a79e8ff9ebd86995f3905 On Wed, Apr 11, 2018 at 12:35 PM, Antoine Pitrou wrote: > > Hi Dimitri, > > Le 11/04/2018 à 12:28, Dimitri Vorona a écrit : > > > > I think it comes down to the memory ownership. While Buffer apparently > > never owns its memory

Re: Buffer slices are unsafe

2018-04-11 Thread Dimitri Vorona
rk such cases as super unsafe, i.e. UnsafeBuffer. Maybe I'm overthinking it, but I can imagine that it'll come back to bite us when the code base grows. Cheers, Dimitri. On Wed, Apr 11, 2018 at 11:19 AM, Antoine Pitrou wrote: > > Hi Dimitri, > > Le 11/04/2018 à 09:02, Dimitri Vo

Buffer slices are unsafe

2018-04-11 Thread Dimitri Vorona
Hi everybody, to continue the discussion in [0]: right now this [1] can happen, and the sliced buffer has no way to foresee it or to check against it beforehand. I'd suggest creating a new class SlicedBuffer, which would reference the parent buffer and return its data() pointer, instead of grabbing
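The SlicedBuffer idea from this message can be sketched roughly as follows. This is only an illustration under assumptions: ParentBuffer, Resize, and IsValid are hypothetical names invented here, not Arrow's actual Buffer API, and the message itself is cut off before it spells out the full proposal. The point shown is that the slice keeps a reference to its parent and recomputes data() on every call instead of caching a raw pointer that a later resize could invalidate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a resizable buffer; NOT Arrow's Buffer class.
class ParentBuffer {
 public:
  explicit ParentBuffer(std::vector<std::uint8_t> bytes)
      : bytes_(std::move(bytes)) {}

  const std::uint8_t* data() const { return bytes_.data(); }
  std::size_t size() const { return bytes_.size(); }

  // Resizing may reallocate, which invalidates previously obtained pointers.
  void Resize(std::size_t new_size) { bytes_.resize(new_size); }

 private:
  std::vector<std::uint8_t> bytes_;
};

// Hypothetical slice that references its parent instead of caching a pointer.
class SlicedBuffer {
 public:
  SlicedBuffer(std::shared_ptr<ParentBuffer> parent, std::size_t offset,
               std::size_t size)
      : parent_(std::move(parent)), offset_(offset), size_(size) {
    assert(parent_ != nullptr);
    assert(offset_ + size_ <= parent_->size());
  }

  // True while the slice still lies within the parent's current bounds.
  bool IsValid() const { return offset_ + size_ <= parent_->size(); }

  // Recomputed from the parent on every call, so it follows reallocations.
  const std::uint8_t* data() const { return parent_->data() + offset_; }
  std::size_t size() const { return size_; }

 private:
  std::shared_ptr<ParentBuffer> parent_;  // keeps the parent alive
  std::size_t offset_;
  std::size_t size_;
};
```

With this shape, a caller can check IsValid() before reading, which is the kind of check the message says a plain slice cannot perform. The cost is an extra indirection on every data() call.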

[jira] [Created] (ARROW-2330) Optimize delta buffer creation with partially finishable array builders

2018-03-20 Thread Dimitri Vorona (JIRA)
Dimitri Vorona created ARROW-2330: - Summary: Optimize delta buffer creation with partially finishable array builders Key: ARROW-2330 URL: https://issues.apache.org/jira/browse/ARROW-2330 Project

[jira] [Created] (ARROW-2176) [C++] Extend DictionaryBuilder to support delta dictionaries

2018-02-19 Thread Dimitri Vorona (JIRA)
Dimitri Vorona created ARROW-2176: - Summary: [C++] Extend DictionaryBuilder to support delta dictionaries Key: ARROW-2176 URL: https://issues.apache.org/jira/browse/ARROW-2176 Project: Apache Arrow

Delta dictionaries: implementation

2018-02-05 Thread Dimitri Vorona
Hi, ARROW-1727 added format support for delta dictionaries. It makes it possible to interleave record batches which contain dictionary-encoded fields with delta dictionary batches which add new dictionary entries. As far as I can see, there is no implementation of this feature in C++ yet. Is anyone
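To make the delta idea concrete, here is a minimal sketch, assuming a string-valued dictionary. DeltaDictionaryEncoder, Encode, and TakeDelta are hypothetical names for illustration only; this is not Arrow's DictionaryBuilder and says nothing about the IPC wire format. It shows an encoder that hands out stable indices and, on request, returns only the entries added since the previous request, which is what a delta dictionary batch would carry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical encoder for illustration; NOT Arrow's DictionaryBuilder.
class DeltaDictionaryEncoder {
 public:
  // Returns the dictionary index of `value`, appending it if unseen.
  std::int32_t Encode(const std::string& value) {
    auto it = index_.find(value);
    if (it != index_.end()) {
      return it->second;
    }
    const auto id = static_cast<std::int32_t>(values_.size());
    index_.emplace(value, id);
    values_.push_back(value);
    return id;
  }

  // Returns the entries added since the previous call (the "delta").
  // Indices handed out earlier remain valid because entries are only appended.
  std::vector<std::string> TakeDelta() {
    assert(flushed_ <= values_.size());
    std::vector<std::string> delta(
        values_.begin() + static_cast<std::ptrdiff_t>(flushed_),
        values_.end());
    flushed_ = values_.size();
    return delta;
  }

 private:
  std::unordered_map<std::string, std::int32_t> index_;
  std::vector<std::string> values_;
  std::size_t flushed_ = 0;  // number of entries already emitted
};
```

A writer built on this sketch would call TakeDelta() before emitting each record batch and send a delta dictionary batch only when the result is non-empty.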