
RE: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-10-17 Thread Yang, Binwei

...optimize it. The inputs to Python are 1) data source or shuffled data and 2) the query plan. Thanks, Binwei -----Original Message----- From: Jacques Nadeau Sent: Wednesday, September 8, 2021 07:06 To: dev Subject: Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]"

Substrait compute IR initiative [was Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines]

2021-09-14 Thread Wes McKinney

Renaming the subject to increase visibility. As we've dug deeper into this topic over the last 5-6 weeks, there have been several learnings/observations: * There are projects beyond Arrow, and which do not use Arrow at all, which could make use of portable "compute IR". This speaks to a need to p

Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-09-07 Thread Jacques Nadeau

As Phillip mentioned, I think there is something powerful in producing a standard serialized representation of compute operations beyond just Arrow and I'd really like to create a broader community around it. This has been something I had been independently thinking about for the last several month

Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-09-01 Thread Phillip Cloud

Hey everyone, As many of you know, the compute IR project has a lot of interested parties and has generated a lot of feedback. In light of some of the feedback we’ve received, we want to stress that the specification is intended to have input from many diverse points of view and that we welcome fo

Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-08-30 Thread Weston Pace

My (incredibly naive) interpretation is that there are three problems to tackle. 1) How do you represent a graph and relational operators (join, union, groupby, etc.) - The PR appears to be addressing this question fairly well 2) How does a frontend query a backend to know what UDFs are supported

Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-08-30 Thread Phillip Cloud

Hey everyone, There's some interesting discussion around types and where their location is in the current PR [1] (and in fact whether to store them at all). It would be great to get some community feedback on this [2] part of the PR in particular, because the choice of whether to store types at a

Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-08-26 Thread Micah Kornfield

As an FYI, Iceberg is also considering an IR in relation to view support [1]. I chimed in and pointed them to this thread and Wes's doc. Phillip and Jacques chimed in there as well. [1] https://mail-archives.apache.org/mod_mbox/iceberg-dev/202108.mbox/%3CCAKRVfm6h6WxQtp5fj8Yj8XWR1wFe8VohOkPuoZZG

Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-08-26 Thread Phillip Cloud

Thanks for the feedback Jacques, very helpful. In the latest version of the PR, I've tried to incorporate nearly all of these points. - I've incorporated most of what you had for dereferencing operations into the PR, and gotten rid of schemas except on Read/Write relations. - With respect to prope

Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-08-23 Thread Jacques Nadeau

In a lucky turn of events, Phillip actually turned out to be in my neck of the woods on Friday so we had a chance to sit down and discuss this. To help, I actually shared something I had been working on a few months ago independently (before this discussion started). For reference: Wes PR: https:/

Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-08-17 Thread Phillip Cloud

On Tue, Aug 17, 2021 at 10:56 AM Wes McKinney wrote: > Looking at Ben's alternate PR [1], having an IR that leans heavily on > memory references to an out-of-band data sidecar seems like an > approach that would substantially ratchet up the implementation > complexity as producing the IR would th

Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-08-17 Thread Benjamin Kietzman

WRT out-of-band data: if encapsulation is the priority over reuse of Buffer etc that's straightforward to accommodate by replacement with an alternative to Buffer. I have made that change to my PR in https://github.com/apache/arrow/pull/10934/commits/ebd4fc665579dd6bba29c5c4731c2350ea0fa70a > as m

Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-08-17 Thread Wes McKinney

Looking at Ben's alternate PR [1], having an IR that leans heavily on memory references to an out-of-band data sidecar seems like an approach that would substantially ratchet up the implementation complexity as producing the IR would then have the level of complexity of producing the Arrow IPC form

Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-08-16 Thread Arun Sharma

Thank you for putting together this proposal. Very exciting development. I left some comments in the RFC doc, summarized here as: * Flatbuffer is usable as a serialization agnostic IDL ( https://adsharma.github.io/flattools/) * serde library + msgpack is a worthy candidate to consider for serializ

Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-08-13 Thread Phillip Cloud

Hey all, Just wanted to give an update on the effort here. Ben Kietzman has created an alternative proposal to the initial design [1]. It largely overlaps with the original, but differs in a few important ways: * A big focus of the design is on flexibility, allowing producers, consumers and ulti

Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-08-12 Thread Julian Hyde

> Wes wrote: > > Supporting this kind of intra-application engine > heterogeneity is one of the motivations for the project. +1 The data format is the natural interface between tasks. (Defining “task” here as “something that is programmed using the IR”.) That is Arrow’s strength. So I think the

Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-08-12 Thread Wes McKinney

On Wed, Aug 11, 2021 at 11:22 PM Phillip Cloud wrote: > > On Wed, Aug 11, 2021 at 4:48 PM Jorge Cardoso Leitão < > jorgecarlei...@gmail.com> wrote: > > > Couple of questions > > > > 1. Is the goal that IRs have equal semantics, i.e. given (IR,data), the > > operation "(IR,data) - engine -> result"

Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-08-11 Thread Phillip Cloud

On Wed, Aug 11, 2021 at 4:48 PM Jorge Cardoso Leitão < jorgecarlei...@gmail.com> wrote: > Couple of questions > > 1. Is the goal that IRs have equal semantics, i.e. given (IR,data), the > operation "(IR,data) - engine -> result" MUST be the same for all "engine"? > I think that might be a non-sta

Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-08-11 Thread Jorge Cardoso Leitão

Couple of questions 1. Is the goal that IRs have equal semantics, i.e. given (IR,data), the operation "(IR,data) - engine -> result" MUST be the same for all "engine"? 2. if yes, imo we may need to worry about: * a definition of equality that implementations agree on. * agreement over what the sem

Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-08-11 Thread Phillip Cloud

Thanks Wes, Great to be back working on Arrow again and engaging with the community. I am really excited about this effort. I think there are a number of concerns I see as important to address in the compute IR proposal: 1. Requirement for output types. I think that so far there's been many rea

Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-08-10 Thread Wes McKinney

Thank you for all the feedback and comments on the document. I'm on vacation this week, so I'm delayed responding to everything, but I will get to it as quickly as I can. I will be at VLDB in Copenhagen next week if anyone would like to chat in person about it, and we can relay the content of any d

Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-08-10 Thread Dimitri Vorona

Hi Wes, cool initiative! Reminded me of "Building Advanced SQL Analytics From Low-Level Plan Operators" from SIGMOD 2021 ( http://db.in.tum.de/~kohn/papers/lolepops-sigmod21.pdf), which proposes a set of building blocks for advanced aggregation. Cheers, Dimitri. On Thu, Aug 5, 2021 at 7:59 PM Juli

Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-08-05 Thread Julian Hyde

Wes, Thanks for this. I’ve added comments to the doc and to the PR. The biggest surprise is that this language does full relational operations. I was expecting that it would do fragments of the operations. Consider join. A distributed hybrid hash join needs to partition rows into output buffers

[DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-08-02 Thread Wes McKinney

hi folks, This idea came up in passing in the past -- given that there are multiple independent efforts to develop Arrow-native query engines (and surely many more to come), it seems like it would be valuable to have a way to enable user languages (like Java, Python, R, or Rust, for example) to co

