Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-24 Thread Jacques Nadeau
our valuable comments. > > Best, > Chao > > On Thu, Jan 18, 2024 at 5:24 PM Jacques Nadeau wrote: > > > > Yes, that was roughly what I was requesting (I was suggesting a single PR > > with many commits that would be merged with the history). > > > > It'

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-18 Thread Jacques Nadeau
, we'd be happy to help further improve readability & > maintainability of the codebase and resolving issues raised from the > community. Will this work for you? really appreciate if you understand > our situation. > > Thanks, > Chao > > On Wed, Jan 17, 2024 at 11:30 AM

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-17 Thread Jacques Nadeau
contain internal info which we need to > remove upon open sourcing. How about we just add a summary in the PR > itself, and add everyone that has contributed to it as co-author to > the PR? > > Chao > > On Wed, Jan 17, 2024 at 11:09 AM Jacques Nadeau > wrote: > > >

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-17 Thread Jacques Nadeau
Hey Chao, it would be great for you to share the code some place with commit history. (PR to the repo that Andy made or something else.) On Mon, Jan 15, 2024 at 7:38 AM Andy Grove wrote: > Hi Chao, > > I have created https://github.com/apache/arrow-datafusion-comet and you > should be able to

Re: [VOTE] Substrait for Flight SQL

2022-09-08 Thread Jacques Nadeau
My vote continues to be +1 On Thu, Sep 8, 2022 at 11:44 AM Neal Richardson wrote: > +1 > > Neal > > On Thu, Sep 8, 2022 at 2:15 PM Ashish wrote: > > > +1 (non-binding) > > > > On Thu, Sep 8, 2022 at 9:41 AM Gavin Ray wrote: > > > > > Oh, so that's what "non-binding" means in vote threads > >

Re: [VOTE] Substrait for Flight SQL

2022-08-31 Thread Jacques Nadeau
+1 (binding) On Wed, Aug 31, 2022, 5:15 PM Larry White wrote: > +1 (non-binding) > > On Wed, Aug 31, 2022 at 7:55 PM Vinicius Fraga wrote: > > > +1 (non-binding) > > > > On Wed, 31 Aug 2022, 20:51 David Li, wrote: > > > > > Hello, > > > > > > I am proposing to extend the Flight SQL

Re: ARROW-11465

2022-05-18 Thread Jacques Nadeau
I second Weston's comments. The idea of separate files is part of the de jure spec but not the de facto one. It's up to the parquet community whether the de facto spec should be "altered". Afaik, zero oss readers support use of this field. On Wed, May 18, 2022, 8:53 AM Weston Pace wrote: > I

Re: [Rust] Enable GitHub discussions for Rust projects?

2022-05-04 Thread Jacques Nadeau
No vote here but a little feedback. We've generally found Github Discussions somewhat lacking in Substrait. If other people find it good, great. I might be more inclined to just drive people to something like StackOverflow or the mailing list. We were initially quite enthusiastic but the

Re: [Discuss][Format] Add 32-bit and 64-bit Decimals

2022-04-23 Thread Jacques Nadeau
I'm generally -0.01 against narrow decimals. My experience in practice has been that widening happens so quickly that they are little used and add unnecessary complexity. For reference, the original Arrow code actually implemented Decimal9 [1] and Decimal18 [2] but we removed both because of this

Re: [C++] output field names in Arrow Substrait

2022-04-23 Thread Jacques Nadeau
In the specification, there are both read and intermediate write rels. No one has implemented the protobuf yet for write. Both carry field names. The names of fields is an internal rel node concern just like condition is for filter. This is because many formats require names. For example, parquet

Re: [DISCUSS] "Naming" the Arrow C++ execution engine subproject?

2022-04-18 Thread Jacques Nadeau
I'm -0.9 on Arrow Compute engine. It makes it sound like it is THE canonical Arrow one, second classing Datafusion and Gandiva. No strong feelings on other names. Naming in general is an extremely subjective process... On Thu, Mar 31, 2022, 2:33 PM Weston Pace wrote: > I'm +1 for "arrow

Re: [FlightSQL] Structured/Serialized representation of query (like JSON) rather than SQL string possible?

2022-03-03 Thread Jacques Nadeau
James, I agree that you could use JSON but that feels a bit hacky (mis-use of the paradigm). Instead, I'd really like to do something like David is suggesting: support Substrait as an alternative to a SQL string. Something like this:

Re: [DISCUSS] Annual rotation of Arrow PMC chair

2022-01-04 Thread Jacques Nadeau
Hey Wes, thanks for bringing this up. And more importantly, thanks for working as the PMC chair this last year! I think Kouhei would be a great choice for the PMC chair. Jacques On Tue, Jan 4, 2022 at 12:44 AM Wes McKinney wrote: > hello all, > > As we discussed at the end of 2020 [1], we

Re: [RESULT][VOTE] Proposed addition to Arrow Flight: Arrow Flight RPC

2021-12-25 Thread Jacques Nadeau
That's great news. Congrats and thanks to the team who worked on it. This is a great addition to Arrow! On Thu, Dec 23, 2021, 11:26 AM David Li wrote: > The integration tests and existing PRs were merged into a separate branch. > We also merged in a few build fixes during final review. Just in

Re: [DISCUSS] Adding new columnar memory layouts to Arrow (in-memory, IPC, C ABI)

2021-12-10 Thread Jacques Nadeau
I'm strongly in support of much of this. Thanks for bringing this up. It is long overdue. On initial read, my thoughts would be: Strongly inclined: - String view - constant view Weakly inclined - All null - rle Somewhat disinclined - Sequence change With dictionary and string view, I feel

[DISCUSS][FLIGHT SQL] Intentions around JDBC and/or ODBC for Flight SQL?

2021-12-09 Thread Jacques Nadeau
Hey all, I was curious if there was anyone planning on implementing JDBC and/or ODBC wrappers on top of the Flight SQL Java [1] and Flight SQL C++ implementations [2] since they seem to be completing soon. It seems like JDBC/ODBC could quickstart integration between Flight SQL and other

Re: Question about Arrow Mutable/Immutable Arrays choice

2021-11-03 Thread Jacques Nadeau
Hey Alessandro, take a look at the top level docs on ValueVector: https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/ValueVector.html Specifically the following: - values need to be written in order (e.g. index 0, 1, 2, 5) - null vectors start with all values as null

Re: Re: Re: [DISCUSS][Java] Adding GC-Based reference management strategy for buffers

2021-10-22 Thread Jacques Nadeau
. The proposal (actually, its 3rd iteration) is > > described here at https://openjdk.java.net/jeps/393, and has been > available > > as an incubator feature for several JDK releases (Javadoc: > > > https://docs.oracle.com/en/java/javase/17/docs/api/jdk.incubator.foreign/jdk/

Re: Re: Re: [DISCUSS][Java] Adding GC-Based reference management strategy for buffers

2021-10-07 Thread Jacques Nadeau
Clearly this patch was driven by an implicit set of needs but it's hard to guess at what they are. As Laurent asked, what is the main goal here? There may be many ways to solve this goal. Some thoughts in general: - The allocator is a finely tuned piece of multithreaded machinery that is used on

Re: [Rust] Heads up: RUSTSEC security advisory against arrow-rs

2021-09-30 Thread Jacques Nadeau
In the past I was dealing with something similar. My experience was when data was accepted at the edge, the cost of validating that the first offset is zero, the last is within the data bounds and that all others are equal or increasing was a reasonable overhead associated with validating offsets
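
A minimal sketch of the offset checks described in the message above: the first offset is zero, the last is within the data bounds, and all others are equal or increasing. The function name `validate_offsets` and the plain list-of-integers representation are assumptions made for illustration; this is not the arrow-rs API.

```python
def validate_offsets(offsets, data_length):
    """Check an offsets list against a data buffer of data_length bytes.

    Hypothetical helper: returns True only if the first offset is zero,
    the last offset does not exceed data_length, and every offset is
    greater than or equal to the one before it.
    """
    if not offsets:
        return False  # an offsets buffer needs at least one entry
    if offsets[0] != 0:
        return False  # first offset must be zero
    if offsets[-1] > data_length:
        return False  # last offset must stay within the data bounds
    # Equal or increasing: together with the two checks above, this
    # bounds every intermediate offset as well.
    return all(a <= b for a, b in zip(offsets, offsets[1:]))
```

The check is a single linear pass over the offsets, which is consistent with the message's point that the validation cost is a reasonable overhead when data is accepted at the edge.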

Re: [DISCUSS][Rust] Biweekly sync call for arrow/datafusion again?

2021-09-20 Thread Jacques Nadeau
+1 on time variation. Please add me to the invite. Thanks On Sun, Sep 19, 2021 at 9:49 PM Benson Muite wrote: > New to this. A suggestion may be to consider two of the times, eg. 4:00 > UTC and 16:00 UTC perhaps alternating allowing geographic diversity in > joining convenience. > > On

Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-09-07 Thread Jacques Nadeau
> > > > > think that's a good follow up PR. Having separate representations > for > > > > > logical/physical plans seems like a waste of effort ultimately. I > > think > > > > we > > > > > can find a good balance. >

Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-08-23 Thread Jacques Nadeau
In a lucky turn of events, Phillip actually turned out to be in my neck of the woods on Friday so we had a chance to sit down and discuss this. To help, I actually shared something I had been working on a few months ago independently (before this discussion started). For reference: Wes PR:

Re: [C++] Shall we modify the ORC reader?

2021-01-10 Thread Jacques Nadeau
I don't think 1 & 2 make sense. I don't think there are a lot of users reading 2gb strings or lists with 2B objects in them. Saying we just don't support that pattern seems fine for now. I also believe the string and list types have better cross-language support than the large variants. On Sun,

Re: [Governance] [Proposal] Stop force-pushing to PRs after release?

2020-11-25 Thread Jacques Nadeau
> > I don’t have a problem with releasing out of branches. I think I (or > someone) proposed this in the past and there was not consensus but it seems > like a good time to revisit the issue. > Thanks for the recap. I just couldn't remember where people were at on this. I'm a big +1 for

Re: [Governance] [Proposal] Stop force-pushing to PRs after release?

2020-11-25 Thread Jacques Nadeau
I'm catching up here. A couple questions. - I don't think we should require the inclusion of the release commits in the main branch. Having leafs created right before release seems to simplify this and resolve any issues around force PRs, no? Or maybe I'm misunderstanding something?

[ANNOUNCE] New Arrow PMC chair: Wes McKinney

2020-10-23 Thread Jacques Nadeau
I am pleased to announce that we have a new PMC chair and VP as per our newly started tradition of rotating the chair once a year. I have resigned and Wes was duly elected by the PMC and approved unanimously by the board. Please join me in congratulating Wes! Jacques

Re: October board report for Arrow

2020-10-11 Thread Jacques Nadeau
Hey all, with the focus on the PMC chair rotation discussion, we have a pretty thin report this month. I've added a few comments in the doc Wes posted. It would be great if others provided additional modifications:

Re: [VOTE][Format] Allow for 256-bit Decimal's in the Arrow specification

2020-09-29 Thread Jacques Nadeau
+1 On Tue, Sep 29, 2020 at 11:19 AM Wes McKinney wrote: > +1 > > On Tue, Sep 29, 2020 at 4:07 AM Fan Liya wrote: > > > > +1 > > > > Best, > > Liya Fan > > > > On Tue, Sep 29, 2020 at 4:55 PM Antoine Pitrou > wrote: > > > > > > > > +1 (binding) > > > > > > I didn't look at the implementation.

Re: [DISCUSS] Rotating the PMC Chair

2020-09-29 Thread Jacques Nadeau
I'm super supportive of this, Julian. Thanks for bringing it up. Unlike some leaders, I'm even happy to guarantee a peaceful transition of power! Re now vs Feb 17: I'm totally open to either. In general, I'm a do it now kind of person so if others think a slightly longer tenure sounds good, we

Re: [DISCUSS][Java] Support non-nullable vectors

2020-09-10 Thread Jacques Nadeau
e/arrow/pull/8147 > > On Fri, Mar 13, 2020 at 9:47 PM Fan Liya wrote: > >> Hi Jacques, >> >> Thanks a lot for your valuable comments. >> >> I agree with you that collapsing nullable and non-nullable >> implementations is a good idea, and it does not contradic

Re: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

2020-08-31 Thread Jacques Nadeau
And yes, for those of you looking closely, I commented on ARROW-245 when it was committed. I just forgot about it. It looks like I had mostly the same concerns then that I do now :) Now I'm just more worried about format sprawl... On Mon, Aug 31, 2020 at 1:30 PM Jacques Nadeau wrote: > What

Re: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

2020-08-31 Thread Jacques Nadeau
> > What do you mean? The Endianness field (a Big|Little enum) was added 4 > years ago: > https://issues.apache.org/jira/browse/ARROW-245 I didn't realize that was done, my bad. Good example of format rot from my pov.

Re: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

2020-08-30 Thread Jacques Nadeau
> > fully locally? How many additional PRs will be needed and what do > > > > > they look like (I think there already a few more in the queue)? > > > > > > > > > > * Will it introduce performance regressions? > > > > > >

Re: Gandiva and Threads

2020-08-14 Thread Jacques Nadeau
@ravin...@dremio.com @prav...@dremio.com thoughts? On Tue, Jul 28, 2020 at 3:39 PM Wes McKinney wrote: > Perhaps Gandiva does not handle sliced arrays properly? This would be > worth investigating > > On Mon, Jul 27, 2020 at 7:43 PM Matt Youill > wrote: > > > > Managed to track down the

Re: [DISCUSS] How to extended time value range for Timestamp type?

2020-08-14 Thread Jacques Nadeau
+1, let's be cautious adding these kinds of things. On Wed, Aug 5, 2020 at 5:49 AM Wes McKinney wrote: > I also am not sure there is a good case for a new built-in type since it > introduces a good deal of complexity, particularly when there is the > extension type option. We’ve been living

Re: [DISCUSS] Support of higher bit-width Decimal type

2020-08-14 Thread Jacques Nadeau
Do we have a good definition of what is necessary to add a new data type? Adding a type but not pulling it through most of the code seems less than ideal since it means one part of Arrow doesn't work with another (providing a less optimal end-user experience). For example, would this work include

Re: [DISSCUSS][JAVA] Avoid set reader/writer indices in FieldVector#getFieldBuffers

2020-08-14 Thread Jacques Nadeau
Per my comments there, the introduction of field buffers was added as part of the fieldvector addition when we have vectors that weren't field level. This meant that getbuffers and getfieldbuffers were at different levels of the hierarchy (getbuffers being more general). I believe we no longer have

Re: [DISCUSS] Adding a pull-style iterator API to the C data interface

2020-08-14 Thread Jacques Nadeau
I think this unlocks a bunch of use cases. I think people are generally using Arrow in simpler, non-streaming ways right now and thus the quiet. Producing an iterator pattern is logical as you move to streams of smaller chunks (common in distributed and multi-tenant systems). On Mon, Aug 10, 2020

Re: [Java] Supporting Big Endian

2020-08-14 Thread Jacques Nadeau
Hey Micah, thanks for starting the discussion. I just skimmed that thread and it isn't entirely clear that there was a conclusion that the overhead was worth it. I think everybody agrees that it would be nice to have the code work on both platforms. On the flipside, the code noise for a rare case

Re: [ext] Re: language independent representation of filter expressions

2020-07-24 Thread Jacques Nadeau
> typed when I think fields should just contain a field name. > > -Original Message- > From: Jacques Nadeau > Sent: Thursday, July 23, 2020 10:14 PM > To: dev > Subject: [ext] Re: language independent representation of filter > expressions > > Have you tried to

Re: language independent representation of filter expressions

2020-07-23 Thread Jacques Nadeau
> On 2020/07/13 09:21:19, Antoine Pitrou wrote: > > On Sat, 11 Jul 2020 09:55:16 -0700 > > Jacques Nadeau wrote: > > > > > > I'm against extending use of flatbuf within Arrow. The language > support is > > > too weak. Language support isn't just abou

Re: [DISCUSS] Using direct memory size as a limit of populated off-heap buffers in Java

2020-07-23 Thread Jacques Nadeau
I'd like to simplify this discussion and start with clarity of use case. If we're talking about a Java developer using the datasets API in a java application, we should respect the Java direct memory size limits set via -XX:MaxDirectMemorySize. Doing something else would violate the principle of

Re: Writing very large rowgroups to Apache Parquet

2020-07-17 Thread Jacques Nadeau
ected to be at least 5mb if I read their docs correctly >> [1]) >> >> [1] https://docs.aws.amazon.com/AmazonS3/latest/dev/qfacts.html >> >> >> On Saturday, July 11, 2020, Jacques Nadeau wrote: >> >> > I'd suggest a new write pattern. Write the columns page

Re: Writing very large rowgroups to Apache Parquet

2020-07-11 Thread Jacques Nadeau
I'd suggest a new write pattern. Write the columns page at a time to separate files then use a second process to concatenate the columns and append the footer. Odds are you would do better than os swapping and take memory requirements down to page size times field count. In s3 I believe you could

Re: language independent representation of filter expressions

2020-07-11 Thread Jacques Nadeau
For reference, the doc (from eight years ago) I meant to link in my initial message was: https://docs.google.com/document/d/1QTL8warUYS2KjldQrGUse7zp8eA72VKtLOHwfXy6c7I/edit On Sat, Jul 11, 2020, 11:24 AM Wes McKinney wrote: > On Sat, Jul 11, 2020 at 11:55 AM Jacques Nadeau >

Re: language independent representation of filter expressions

2020-07-11 Thread Jacques Nadeau
On Mon, Jul 6, 2020 at 2:45 PM Wes McKinney wrote: > I would also be interested in having a reusable serialized format for > filter- and projection-like expressions. I think trying to go so far > as full logical query plans suitable for building a SQL engine is > perhaps a bit too far but we

Re: Renaming master branch, removing blacklist/whitelist

2020-06-24 Thread Jacques Nadeau
Hi Suvayu, thanks for sharing your experiences. Clearly we have work to do. Wrt to specific name changes, I agree with Wes. If something is negative to a non-trivial portion of the population, why not use something that avoids that issue where possible. On Fri, Jun 19, 2020, 7:44 PM Suvayu Ali

Re: [DISCUSS] Removing top-level validity bitmap from Union type

2020-06-24 Thread Jacques Nadeau
Per my comments on the pr, I also think this is preferred. I believe we will avoid the potential for validity inconsistency and simplify construction of union data in most cases. On Wed, Jun 24, 2020, 7:58 AM Wes McKinney wrote: > hi folks, > > As discussed on the recent GitHub PR [1], as a

Re: Arrow Flight connector for SQL Server

2020-05-19 Thread Jacques Nadeau
Hey Brendan, Welcome to the community. At Dremio we've exposed flight as an input and output for sql result datasets. I'll have one of our guys share some details. I think a couple questions we've been struggling with include how to standardize additional metadata operations, what should the

Re: [DISCUSS][Java] Support non-nullable vectors

2020-03-11 Thread Jacques Nadeau
Generally I've found that this isn't an important optimization in the use cases we see. Memory overhead, especially with our Java shared allocation scheme is nominal. Optimizing null checks at the word level usually is much more impactful since non null and null runs are much more common on a

Re: JDBC / Flight questions

2020-01-29 Thread Jacques Nadeau
At Dremio we have two things at the moment: A JDBC driver that is built on Arrow and served as the inspiration for some of the design choices in flight [1] A preview flight connector that doesn't yet expose JDBC [2] The former is built on Avatica (part of the Apache Calcite project) so the

Re: [Format] Array/RowBatch filters

2020-01-26 Thread Jacques Nadeau
At Dremio, we use four main types of selection vector/bitmaps: Dense Format (record valid or not, no ordering) - single bit (bitmap) Sparse formats (identifies valid records as well as their order) - 2 byte (for record batches up to 2^16 records). - 4 byte (for 2^16 batches of 2^16 records); - 6
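
The dense bitmap and the 2-byte sparse form mentioned above can be sketched as follows. This is an illustration only, not Dremio's implementation: the function name `dense_to_sparse`, the least-significant-bit-first bit order, and the use of Python's `array` module are all assumptions.

```python
import array


def dense_to_sparse(bitmap, num_records):
    """Convert a dense selection bitmap into a sparse 2-byte index vector.

    Hypothetical helper. Assumes one bit per record, least-significant
    bit first within each byte. The result lists the positions of the
    selected records in ascending order.
    """
    if num_records > 1 << 16:
        raise ValueError("2-byte indices cover batches of at most 2^16 records")
    indices = array.array("H")  # unsigned short, 2 bytes on common platforms
    for i in range(num_records):
        if bitmap[i >> 3] & (1 << (i & 7)):
            indices.append(i)
    return indices
```

The dense form records only whether each record is valid, while the sparse form can also carry an ordering, since the index entries may be permuted after construction.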

Re: [DISCUSS] C Data Interface, take 2

2020-01-21 Thread Jacques Nadeau
u want to try to > block the C++ contributors from doing this we may be barreling toward > a governance crisis in the project. I'm stepping back from this > discussion for a time now to allow others to catch up on the > discussion and to weigh in as needed > > On Mon, Jan 20, 2020 a

Re: [DISCUSS] C Data Interface, take 2

2020-01-20 Thread Jacques Nadeau
ucture). We should not > advertise this as being a part of the project specification. > > - Wes > > On Mon, Jan 20, 2020 at 11:51 AM Jacques Nadeau > wrote: > > > > As I noted on the pull request, I think fundamentally this work is at > odds > > with the Arrow s

Re: [Format] Make fields required?

2020-01-20 Thread Jacques Nadeau
> > I think what we have determined is that the changes that are being > discussed in this thread would not render any existing serialized > Flatbuffers unreadable, unless they are malformed / unable to be > read with the current libraries. > I think we need to separate two different things:

Re: [DISCUSS] C Data Interface, take 2

2020-01-20 Thread Jacques Nadeau
iding a C-header-based data > interface to the C++ project only. That was the original problem > statement and it seems in attempting to make it useful beyond C++ has > made it difficult to reach consensus. > > Thanks > Wes > > On Sat, Dec 21, 2019 at 4:38 PM Jacques Nadeau

Re: [Format] Make fields required?

2020-01-20 Thread Jacques Nadeau
> > To be clear, I agree that we need to check that our various validation > and integration suites pass properly. But once that is done and > assuming all the metadata variations are properly tested, data > variations should not pose any problem. > Unless I'm misunderstanding your proposal,

Re: [Format] Make fields required?

2020-01-20 Thread Jacques Nadeau
I think it is too late in the game to make this fundamental change. It would be very hard to assess whether it is no op or has massive implications to existing datasets. Just among Dremio customers in the 30 days we stored more than 100mm datasets that leveraged the current format. I'm supportive

Re: [Java] Large Memory Allocators (Taking a dependency on JNA?)

2020-01-19 Thread Jacques Nadeau
It seems like jna is overkill & unnecessary for simply allocating/freeing memory. A simple way to do this is either to use unsafe directly or call the existing netty unsafe facade directly. PlatformDependent.allocateMemory(long) PlatformDependent.freeMemory(long) Should be relatively

[jira] [Created] (ARROW-7549) [Java] Reorganize Flight modules to keep top level clean/organized

2020-01-10 Thread Jacques Nadeau (Jira)
Jacques Nadeau created ARROW-7549: - Summary: [Java] Reorganize Flight modules to keep top level clean/organized Key: ARROW-7549 URL: https://issues.apache.org/jira/browse/ARROW-7549 Project: Apache

Re: Timeline for next major release [was Re: Looking to 1.0]

2020-01-09 Thread Jacques Nadeau
nk we should try to be more conservative about what > issues we pre-emptively assign fix versions -- there may be a more > constructive way that we can prioritize issues and distinguish between > "optimistic" / nice-to-have issues and "must do to release" issues.

Re: Timeline for next major release [was Re: Looking to 1.0]

2020-01-09 Thread Jacques Nadeau
> > I agree on a 0.16.0 release. In the meantime I'll try to help out > with > > > > getting the Java side ready for 1.0. > > > > > > > > On Sat, Jan 4, 2020 at 7:21 PM Fan Liya > wrote: > > > > > > > > > Hi Jacques, > > > > >

[jira] [Created] (ARROW-7534) Create a new java/contrib module

2020-01-09 Thread Jacques Nadeau (Jira)
Jacques Nadeau created ARROW-7534: - Summary: Create a new java/contrib module Key: ARROW-7534 URL: https://issues.apache.org/jira/browse/ARROW-7534 Project: Apache Arrow Issue Type: Task

[jira] [Created] (ARROW-7533) [Java] Move ArrowBufPointer out of the java the memory package

2020-01-09 Thread Jacques Nadeau (Jira)
Jacques Nadeau created ARROW-7533: - Summary: [Java] Move ArrowBufPointer out of the java the memory package Key: ARROW-7533 URL: https://issues.apache.org/jira/browse/ARROW-7533 Project: Apache Arrow

Re: [DRAFT] Apache Arrow Board Report January 2020

2020-01-09 Thread Jacques Nadeau
Posted with correction. Thanks to Wes, Antoine and Todd! On Wed, Jan 8, 2020 at 10:15 AM Wes McKinney wrote: > Not sure what happened there. The two words after "grow" can be removed > > ## Description: > > The mission of Apache Arrow is the creation and maintenance of software > related > to

Re: Pending Java pull requests

2020-01-09 Thread Jacques Nadeau
I think there are a decent chunk that are of questionable value. We need to be more willing to simply reject requests rather than leave them in no-man's land. I'll try to do a pass through and help dispatch, etc. On Thu, Jan 9, 2020 at 5:25 AM Krisztián Szűcs wrote: > Hi, > > Roughly 40% of the

Re: Human-readable version of Arrow Schema?

2020-01-04 Thread Jacques Nadeau
I guess we'd still need to introduce a way to nest, it only has type representation. On Sat, Jan 4, 2020 at 2:16 PM Jacques Nadeau wrote: > What do people think about using the C interface representation? > > On Sun, Dec 29, 2019 at 12:42 PM Micah Kornfield > wrote: >

Re: Human-readable version of Arrow Schema?

2020-01-04 Thread Jacques Nadeau
What do people think about using the C interface representation? On Sun, Dec 29, 2019 at 12:42 PM Micah Kornfield wrote: > I opened https://github.com/google/flatbuffers/issues/5688 to try to get > some clarity. > > On Tue, Dec 24, 2019 at 12:13 PM Wes McKinney wrote: > > > On Tue, Dec 24,

Re: Looking to 1.0

2020-01-04 Thread Jacques Nadeau
> Liya Fan > > On Sat, Jan 4, 2020 at 7:16 AM Jacques Nadeau wrote: > > > I identified three things in the java library that I think are top of > mind > > and should be fixed before 1.0 to avoid weird incompatibility changes in > > the java apis (technical debt).

Re: Looking to 1.0

2020-01-03 Thread Jacques Nadeau
I identified three things in the java library that I think are top of mind and should be fixed before 1.0 to avoid weird incompatibility changes in the java apis (technical debt). I've tagged them as pre-1.0 as I don't exactly see what is the right way to tag/label a target release for a ticket.

[jira] [Created] (ARROW-7495) [Java] Remove "empty" concept from ArrowBuf, replace with custom referencemanager

2020-01-03 Thread Jacques Nadeau (Jira)
Jacques Nadeau created ARROW-7495: - Summary: [Java] Remove "empty" concept from ArrowBuf, replace with custom referencemanager Key: ARROW-7495 URL: https://issues.apache.org/jira/browse/

[jira] [Created] (ARROW-7494) [Java] Remove reader index and writer index from ArrowBuf

2020-01-03 Thread Jacques Nadeau (Jira)
Jacques Nadeau created ARROW-7494: - Summary: [Java] Remove reader index and writer index from ArrowBuf Key: ARROW-7494 URL: https://issues.apache.org/jira/browse/ARROW-7494 Project: Apache Arrow

Re: [DISCUSS] C Data Interface, take 2

2019-12-21 Thread Jacques Nadeau
Thanks for addressing my comments. I'm actively reviewing the proposal. It is taking me more time than I would like given the time of the year but I want to make sure that you know that I'm looking at it and hope to provide additional feedback beyond that which I've provided thus far on the PR.

Re: Planned Support for ORC Dataset?

2019-12-13 Thread Jacques Nadeau
, 2019 at 11:15 AM Jacques Nadeau wrote: > I question the value of adding the Orc format. The format is fragmented > with the main tool writing it (hive) writing a version of the format (acid > v2) that can't be consumed by systems that only use the Orc libraries > (since they don't

Re: Planned Support for ORC Dataset?

2019-12-13 Thread Jacques Nadeau
I question the value of adding the Orc format. The format is fragmented with the main tool writing it (hive) writing a version of the format (acid v2) that can't be consumed by systems that only use the Orc libraries (since they don't support acid). If you want to consume that data, you have to

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

2019-12-13 Thread Jacques Nadeau
hes with different schemas > > in the same stream, though with some added complexity on each side > > > > On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau > wrote: > >> > >> I'd vote for explicitly not supported. We should keep our primitives >

Re: [VOTE] Adopt Arrow in-process C Data Interface specification

2019-12-06 Thread Jacques Nadeau
-1 (binding) I'm voting -1 on this. I posted the thinking why on the PR. The high-level is that I think it needs to better address the pipelined use case as right now it fails to support that at all and has too much weight to ignore that use case. I actually would have posted it here but totally

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

2019-11-28 Thread Jacques Nadeau
nd require multiple calls and coordination > with the deployment topology) in order to accomplish this? > > Best, > David > > On 11/27/19, Jacques Nadeau wrote: > > Fair enough. I'm okay with the bytes approach and the proposal looks good > > to me. > > > >

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

2019-11-27 Thread Jacques Nadeau
> >>> > > > > >>> > > > Regards > >>> > > > > >>> > > > Antoine. > >>> > > > > >>> > > > > >>> > > > Le 21/10/2019 à 15:46, David Li a écr

[jira] [Created] (ARROW-7198) [Java] Allow a user to provide an alternative "chunk" allocator

2019-11-17 Thread Jacques Nadeau (Jira)
Jacques Nadeau created ARROW-7198: - Summary: [Java] Allow a user to provide an alternative "chunk" allocator Key: ARROW-7198 URL: https://issues.apache.org/jira/browse/ARROW-7198 Proje

Re: [DISCUSS][Java] Builders for java classes

2019-10-27 Thread Jacques Nadeau
+1 on the idea of enhancing builder interfaces. >>IntVectorBuilder addAll(int[] values); Let's make sure that anything like the above is efficient. People will judge the quality of the project on the efficiency of the methods we provide. If everybody starts using int[] to build Arrow vectors, we

Re: [Rust] DataFusion benchmarks

2019-10-20 Thread Jacques Nadeau
Super cool. Thanks for sharing! On Sun, Oct 20, 2019 at 10:52 AM Andy Grove wrote: > Now that the DataFusion query execution code has been re-written to use a > physical query plan with support for multi-threaded execution, I have > started running some benchmarks again. Here are the results so

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

2019-10-20 Thread Jacques Nadeau
etadata field, but oneof prevents that from happening, and > overall having a clear separation between data and control messages is > cleaner. > > As for using Protobuf's Any: so far, we've refrained from exposing > Protobuf by using bytes, would we want to change that now? > >

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

2019-10-16 Thread Jacques Nadeau
quite a while). > > Thanks, > David > > On 10/15/19, Jacques Nadeau wrote: > > I like it. Added some comments to the doc. Might worth discussion here > > depending on your thoughts. > > > > On Tue, Oct 15, 2019 at 7:11 AM David Li wrote: > > > >>

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

2019-10-15 Thread Jacques Nadeau
I like it. Added some comments to the doc. Might be worth discussion here depending on your thoughts. On Tue, Oct 15, 2019 at 7:11 AM David Li wrote: > Hey Ryan, > > Thanks for the comments. > > Concrete example: I've edited the doc to provide a Python strawman. > > Sync vs async: while I don't

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-10 Thread Jacques Nadeau
wrote: > > It's good with me. > > Regards > > Antoine. > > > Le 10/10/2019 à 22:51, Jacques Nadeau a écrit : > > Antoine, is my synopsis fair? > > > > On Thu, Oct 10, 2019 at 12:53 PM Wes McKinney > wrote: > > > >> +1

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-10 Thread Jacques Nadeau
Antoine, is my synopsis fair? On Thu, Oct 10, 2019 at 12:53 PM Wes McKinney wrote: > +1 > > On Thu, Oct 10, 2019, 2:12 PM Jacques Nadeau wrote: > > > Proposed report update below. LMK your thoughts. > > > > ## Description: > > The mission of Apache A

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-10 Thread Jacques Nadeau
Proposed report update below. LMK your thoughts. ## Description: The mission of Apache Arrow is the creation and maintenance of software related to columnar in-memory processing and data interchange ## Issues: * We are struggling with Continuous Integration scalability as the project has

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-10 Thread Jacques Nadeau
Arg... accidental send before ready. What do you think about the statement below for community health? Does it fairly capture the concerns/perspective? On Thu, Oct 10, 2019 at 10:24 AM Jacques Nadeau wrote: > Many contributors are struggling with the slowness of pre-commit CI. Arrow > has a

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-10 Thread Jacques Nadeau
y CI capacity has been a "hot topic as of late": > > > https://lists.apache.org/thread.html/af52e2a3e865c01596d46374e8b294f2740587dbd59d85e132429b6c@%3Cbuilds.apache.org%3E > > > > (I didn't know this list -- bui...@apache.org -- existed, by the way) > > > > Regards > > > > An

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-09 Thread Jacques Nadeau
monitor any potentially destructive > actions that they may take, such as modifying unrelated repository > webhooks related to IP provenance. > > - Wes > > On Wed, Oct 9, 2019 at 9:33 PM Jacques Nadeau wrote: > > > > I think we need to be more direct in listing issues for the

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-09 Thread Jacques Nadeau
I think we need to be more direct in listing issues for the board. What have we done? What do we want them to do? In general, any large org is going to be slow to add new deep integrations into GitHub. I don't think we should expect Apache to be any different (it took several years before we could

Re: [DISCUSS] C-level in-process array protocol

2019-10-08 Thread Jacques Nadeau
buffers aren't entirely straight-forward and I think if we do > move > > > forward with an API based on Column/Array we should consider > alternatives > > > as long as the necessary parsing code can be done in a small amount of > > code > > > (I'm personally against JSON for this, but can see the arguments

Re: [Proposal]: Expose Flight gRPC for Dremio use case (Java)

2019-10-05 Thread Jacques Nadeau
> > Is it possible for a single gRPC server to expose multiple services > through the same port (it sounds like it is)? It would be a good idea > to do similar refactoring in C++ so that Flight RPC endpoints can be > provided alongside some other non-Flight endpoints in the same gRPC > server >

Re: [DISCUSS] C-level in-process array protocol

2019-10-02 Thread Jacques Nadeau
s with small code size > > https://github.com/nanopb/nanopb > > Let me know if this makes more sense. > > I think it's important to communicate clearly about this primarily for > the benefit of the outside world which can confuse easily as we have > observed over the last f

Re: [DISCUSS] C-level in-process array protocol

2019-10-01 Thread Jacques Nadeau
I disagree with this statement: - the IPC format is meant for serialization while the C data protocol is meant for in-memory communication, so different concerns apply If that is how a particular implementation presents it, that is a weakness of the implementation, not the format. The

Re: [DISCUSS][Java] Reduce the range of synchronized block when releasing an ArrowBuf

2019-09-29 Thread Jacques Nadeau
For others that don't realize, the discussion of this is happening on the pull request here: https://github.com/apache/arrow/pull/5526 On Fri, Sep 27, 2019 at 4:52 AM Fan Liya wrote: > Dear all, > > When releasing an ArrowBuf, we will run the following piece of code: > > private int

Re: [DISCUSS] C-level in-process array protocol

2019-09-29 Thread Jacques Nadeau
On Sun, Sep 29, 2019 at 12:59 AM Antoine Pitrou wrote: > > Le 29/09/2019 à 06:10, Jacques Nadeau a écrit : > > * No dependency on Flatbuffers. > > * No buffer reassembly (data is already exposed in logical Arrow format). > > * Zero-copy by design. > > * Ea

Re: [DISCUSS] C-level in-process array protocol

2019-09-28 Thread Jacques Nadeau
]; buffers: [Buffer]; } On Sat, Sep 28, 2019 at 9:02 PM Jacques Nadeau wrote: > I'm not clear on why we need to introduce something beyond what > flatbuffers already provides. Can someone explain that to me? I'm not > really a fan of introducing a second representation of the same d
