Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-03-06 Andrew Lamb
The blog post is now live at https://arrow.apache.org/blog/2024/03/06/comet-donation/ On Thu, Feb 29, 2024 at 9:32 AM Andrew Lamb wrote: > In case anyone is interested, we are working on a blog post related to > this donation here [1]. All feedback more than welcome. > > [1] https://github.com/a

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-02-29 Andrew Lamb
In case anyone is interested, we are working on a blog post related to this donation here [1]. All feedback more than welcome. [1] https://github.com/apache/arrow-site/pull/479 On Mon, Feb 12, 2024 at 1:37 AM Chao Sun wrote: > > Thank you all for the great support and interest on this project! >

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-02-11 Chao Sun
Thank you all for the great support and interest on this project! On Sun, Feb 11, 2024 at 12:51 PM Wes McKinney wrote: > > Congrats all! It's great to see the Arrow+DataFusion ecosystem expand in > this way and to bring the work under the ASF umbrella. > > On Sun, Feb 11, 2024 at 5:02 AM Andrew L

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-02-11 Wes McKinney
Congrats all! It's great to see the Arrow+DataFusion ecosystem expand in this way and to bring the work under the ASF umbrella. On Sun, Feb 11, 2024 at 5:02 AM Andrew Lamb wrote: > As a follow up here the acceptance vote [1] has passed, the IP Clearance > Process is complete [2] and the code PR

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-02-11 Andrew Lamb
As a follow-up here: the acceptance vote [1] has passed, the IP Clearance Process is complete [2], and the code PR is merged [3]! It is a very exciting time! Congratulations to all involved. Andrew [1]: https://lists.apache.org/thread/cyfyb96sssmpr73hhm7vh8jcdjbz8rsp [2]: https://github.com/apache/a

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-24 Jacques Nadeau
For those that are interested wrt lang types/lines... Language files blank comment code Rust

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-24 Chao Sun
Thanks Jacques and everyone here for the feedback! We just created a PR https://github.com/apache/arrow-datafusion-comet/pull/1 for the donation vote and IP clearance. Please take a look there and provide your valuable comments. Best, Chao On Thu, Jan 18, 2024 at 5:24 PM Jacques Nadeau wrote: >

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-18 Jacques Nadeau
Yes, that was roughly what I was requesting (I was suggesting a single PR with many commits that would be merged with the history). It's hard to provide a more concrete opinion on this without seeing the quantity and complexity of the code. If it's 5,000 lines of code, it probably doesn't matter.

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-17 Chao Sun
Hi Jacques, Do you mean that, instead of a single PR, we modify (e.g., with git commit --amend) all the commits that we have internally to remove any sensitive information, and open PRs for them against the above repo? I understand this will help readability and maintenance of the code, but it will be a lot o

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-17 Jacques Nadeau
Thanks for the quick response Chao. My experience on these things is that maintaining commit history for large codebases can be invaluable for tracking down issues. (Hey, why is this code written this way-- oh, it was part of x patch that was trying to achieve y). In the past, I've used git commi

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-17 Chao Sun
Hi Andy and Jacques, Thanks for setting the repo up. Yes, we are working on cleaning up the internal repo and preparing to open a PR in the next few days. It's a bit difficult to retain the original commit history in the PR, though, since some of the commits contain internal info which we need to remove up

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-17 Jacques Nadeau
Hey Chao, it would be great for you to share the code some place with commit history. (PR to the repo that Andy made or something else.) On Mon, Jan 15, 2024 at 7:38 AM Andy Grove wrote: > Hi Chao, > > I have created https://github.com/apache/arrow-datafusion-comet and you > should be able to cr

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-15 Andy Grove
Hi Chao, I have created https://github.com/apache/arrow-datafusion-comet and you should be able to create a PR against the repo. Thanks, Andy. On Fri, Jan 12, 2024 at 3:45 PM Chao Sun wrote: > Thanks all for the positive support! > > Andy, we plan to name the project Comet (BTW if you

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-12 Chao Sun
Thanks all for the positive support! Andy, we plan to name the project Comet (BTW if you have better suggestions please let us know). Could you help to create a repo named arrow-datafusion-comet or arrow-comet? We'll clean up our internal repo and prepare for the donation in the next few days. Tha

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-12 Andy Grove
I think the next step here would be to create a new repo so that Chao can create a PR for the contribution, and then we can proceed to a vote. Chao - do you have a proposal for the name of the project? Given that this is being donated to Apache Arrow, the repo name will start with "arrow-". Also,

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-11 Albert
As Andrew Lamb mentioned, blaze-rs has similar goals; I'd really be interested to see some comparisons once the donation is made. All in all, I look forward to the new native project for Spark acceleration. On Thu, Jan 11, 2024 at 9:50 PM Andrew Lamb wrote: > I am very supportive of this do

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-11 Micah Kornfield
It sounds like there is likely enough support for this to move forward; I'd guess the next steps are to work on the donation process/vote. Probably someone more involved with DataFusion should help drive this effort? On Thu, Jan 11, 2024 at 12:55 PM L. C. Hsieh wrote: > Spark as a widely used compu

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-11 L. C. Hsieh
Spark, as a widely used computation engine in industry, has momentum among developers and users. I believe that integration with DataFusion not only can help take Spark to the next level of performance with a new native execution engine, but also can attract more developer attention int

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-11 Parth Chandra
Full disclosure: I worked on the original value vector implementation that became Apache Arrow and currently work with Chao et al. on the native engine that is being discussed. I believe that integration of DataFusion with Spark will drive both development and user interest in arrow-rs and DataFusi

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-11 Andrew Lamb
I am very supportive of this donation. I know of at least one other DataFusion-based project, blaze-rs [1], which has the same design goal, and bringing this project into the ASF may help consolidate these efforts. As Andy said, I believe it was very valuable to have a major consumer project (e.g. Da

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-10 Andy Grove
Hi Chao, This sounds like a really interesting project. I am interested in seeing how it compares to Spark RAPIDS (the project that I work on at NVIDIA) and Intel's Gluten project (which works with Velox). I can see the following benefits of having this project under Apache Arrow governance:

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-10 Chao Sun
Thanks, Micah, for the quick response. > Would Spark itself not be a reasonable place for this work? We considered Spark as well, but decided that Arrow is a better home, given that the project itself is heavily tied to DataFusion. A lot of the work in this project is to convert Spark physical p

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-10 Micah Kornfield
Hi Chao, Very cool. I think this is something that a lot of people are interested in. I think the main questions I have are: 1. Would Spark itself not be a reasonable place for this work? 2. Do you anticipate this would move with DataFusion to its own top-level project [1] if that happens or sta