Re: Apache Arrow adapter
Yes, the jar was a bit of a challenge to get working in the Linux and macOS development environments we've been using. This is why I temporarily checked it in, but it should certainly be removed before the PR is merged.

--
Michael Mior
mm...@apache.org

On Sat, Apr 10, 2021 at 17:05, Julian Hyde wrote:
Re: Apache Arrow adapter
I've been trying to switch over to the official Apache Arrow Gandiva 3.0.0 jar on Maven Central. (That means we can remove the 3.0.0-SNAPSHOT.jar that you had checked into arrow/libs.) That jar is built for macOS, and it is a little trickier to get running than the previous jar, which was built for Linux. I'll post to https://issues.apache.org/jira/browse/ARROW-11135 as I discover things.

(Makes me glad we don't have any C++ code in Calcite. Making artifacts that work on multiple operating systems seems to be really challenging.)

Julian

On Sat, Apr 10, 2021 at 6:31 AM Michael Mior wrote:
Re: Apache Arrow adapter
Thanks Julian! I really appreciate the help. I think "beta" would be an accurate label here, but it would be great to have this pushed so people can start trying it out.

--
Michael Mior
mm...@apache.org

On Fri, Apr 9, 2021 at 20:37, Julian Hyde wrote:
Re: Apache Arrow adapter
Yes, thanks to Michael and Karshit for their great work.

I am reviewing now, and doing some fix-up (e.g. lint, repositories) so that we can get it into master as a "beta" component. I'll add updates in https://issues.apache.org/jira/browse/CALCITE-2040.

On Wed, Apr 7, 2021 at 9:37 PM Fan Liya wrote:
Re: Apache Arrow adapter
Hi Michael,

Thanks for sharing this great work. I believe it is important work for both communities.

Best,
Liya Fan

On Thu, Apr 8, 2021 at 3:30 AM Michael Mior wrote:
Apache Arrow adapter
Hi all,

I wanted to share some work that one of my (now former) students, Karshit Shah, has done on integrating Apache Arrow into Calcite. Karshit has written an Arrow adapter that performs filtering and projections natively on Arrow data using Gandiva, so these expressions can be JIT-compiled with LLVM. The pull request[0] needs some cleanup, but the code is in relatively good shape.

Right now, the adapter only reads from files, but I think a number of exciting extensions are possible. For example, Arrow has a client-server framework, Flight, which could be connected to Calcite, perhaps via Avatica. (Andy Grove was doing some work on this last year[1], although I'm not sure of its progress.)

The biggest blocker is actually not the Calcite code but the availability of a suitably built Arrow dependency with Gandiva, along with the appropriate CI configuration. I opened a JIRA on the Arrow project with more details[2].

I'd love some thoughts on the approach and some help in pushing this over the finish line.

[0] https://github.com/apache/calcite/pull/2133
[1] https://mail-archives.apache.org/mod_mbox/calcite-dev/202002.mbox/%3cCAJEf=X5xvXLQpJkX_VjJk=TnNRwT52v0=p28sczmid1tyce...@mail.gmail.com%3e
[2] https://issues.apache.org/jira/browse/ARROW-11135

--
Michael Mior
mm...@apache.org
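The announcement describes an adapter that pushes filters and projections down to Gandiva. Below is a minimal plain-Java sketch of what that pair of operations computes over columnar data. It is not the adapter's code and does not use Arrow, Gandiva, or LLVM. The class name, method names, columns, and expressions are all hypothetical. It only illustrates the general pattern: a filter produces a selection vector of matching row indices, and a projection is then evaluated over just those rows. Gandiva's Java bindings have separate filter and projector evaluators for these two steps, which the sketch does not attempt to reproduce.

```java
import java.util.Arrays;

/**
 * Plain-Java sketch (no Arrow, Gandiva, or LLVM) of filter-then-project over
 * column-oriented data. All names here are hypothetical and for illustration only.
 */
public class ColumnarFilterProject {

  /** Returns the indices of rows where column[i] > threshold (a "selection vector"). */
  static int[] filterGreaterThan(int[] column, int threshold) {
    int[] selection = new int[column.length];
    int count = 0;
    for (int i = 0; i < column.length; i++) {
      if (column[i] > threshold) {
        selection[count++] = i;
      }
    }
    // Trim to the number of rows that actually matched.
    return Arrays.copyOf(selection, count);
  }

  /** Evaluates a + b only for the rows listed in the selection vector. */
  static long[] projectSum(int[] a, int[] b, int[] selection) {
    long[] out = new long[selection.length];
    for (int j = 0; j < selection.length; j++) {
      int row = selection[j];
      // Widen to long so the sum cannot overflow int.
      out[j] = (long) a[row] + b[row];
    }
    return out;
  }

  public static void main(String[] args) {
    // Two columns of a hypothetical batch, stored column-wise as in Arrow.
    int[] a = {5, 12, 7, 20};
    int[] b = {1, 2, 3, 4};
    // Roughly: SELECT a + b FROM t WHERE a > 6
    int[] selection = filterGreaterThan(a, 6);
    long[] result = projectSum(a, b, selection);
    System.out.println(Arrays.toString(selection)); // [1, 2, 3]
    System.out.println(Arrays.toString(result));    // [14, 10, 24]
  }
}
```

Keeping the filter's output as a list of indices, rather than copying the surviving rows, means the projection reads the original columns in place. That is one reason selection vectors are common in columnar engines.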