Re: [DISCUSS] Proposal to expand Arrow Communications

2024-02-07 Thread Jean-Baptiste Onofré
Hi Matt

Thanks for sharing. It looks interesting and I like the idea.

Let me review the document and eventually add comments.

Thanks!
Regards
JB

On Sat, Feb 3, 2024 at 12:22 AM Matt Topol wrote:
>
> Hey all,
>
> In my current work I've been experimenting and playing around with
> utilizing Arrow and non-CPU memory data. While the creation of the
> ArrowDeviceArray struct and the enhancements to the Arrow library Device
> abstractions were necessary, there is also a need to extend the
> communications specs we utilize, i.e. Flight.
>
> Currently there is no real way to utilize Arrow Flight with shared memory
> or with non-CPU memory (without an expensive Device -> Host copy first). To
> this end I've done a bunch of research and toying around and came up with a
> protocol to propose and a reference implementation using UCX[1]. Attached
> to the proposal are also a couple of extensions for Flight itself to make it
> easier for users to still use Flight for metadata / dataset information and
> then point consumers elsewhere to actually retrieve the data. The idea here
> is that this would be a new specification for how to transport Arrow data
> across these high-performance transports such as UCX / libfabric / shared
> memory / etc. We wouldn't necessarily expose / directly add implementations
> of the spec to the Arrow libraries, just provide reference/example
> implementations.
>
> I've written the proposal up on a google doc[2] that everyone should be
> able to comment on. Once we get some community discussion on there, if
> everyone is okay with it I'd like to eventually hold a vote on adopting this
> spec and if we do, I'll then make a PR to start adding it to the Arrow
> documentation, etc.
>
> Anyways, thank you everyone in advance for your feedback and comments!
>
> --Matt
>
> [1]: https://github.com/openucx/ucx/
> [2]:
> https://docs.google.com/document/d/1zHbnyK1r6KHpMOtEdIg1EZKNzHx-MVgUMOzB87GuXyk/edit?usp=sharing


Re: [Discussion][C++][FlightRPC] What stage to submit a PR for Flight SQL ODBC driver

2024-02-07 Thread Jean-Baptiste Onofré
Hi all,

FYI, I'm back on Arrow (sorry I have been busy with other stuff).

I'm working on the ODBC donation branch and should have the PR ready by
the end of the week.

Sorry again for the delay; it's now one of my top priorities ;)

Regards
JB

On Tue, Dec 12, 2023 at 9:45 PM Alina Li wrote:
>
> David, you bring up a good point. Regarding the IP clearance for the Timestream ODBC
> driver, we are still looking to get the necessary paperwork from Amazon.
> We're also considering using the Ignite ODBC Driver seed [1] as a replacement
> for the Timestream seed if it turns out that we're unable to obtain paperwork from
> Amazon; we are still discussing this internally and will get back to the
> community afterwards.
>
> Regarding paperwork for the Dremio code, thank you Laurent for offering your 
> help. Please do let us know if there's anything we can do to help as well.
>
> [1]: https://github.com/apache/ignite/tree/master/modules/platforms/cpp
> 
> From: Laurent Goujon 
> Sent: Friday, December 8, 2023 11:01 AM
> To: dev@arrow.apache.org 
> Subject: Re: [Discussion][C++][FlightRPC] What stage to submit a PR for 
> Flight SQL ODBC driver
>
> Am I reading the ticket correctly that this is also about importing some of
> the Dremio code into the Arrow project (namely
> https://github.com/dremio/flightsql-odbc/)? If that is the case, let me check
> how my company can provide the documentation for the project.
>
> On Fri, Dec 8, 2023 at 8:41 AM David Li wrote:
>
> > Thanks for the clarification. That does sound like a nontrivial amount of
> > code.
> >
> > My worry is that we might not be able to get all the paperwork necessary
> > from Amazon/Amazon contributors for the Timestream part. The
> > document/guidelines are here [1]. Does that look doable from your end?
> >
> > [1]: https://incubator.apache.org/ip-clearance/
> >
> > On Thu, Dec 7, 2023, at 14:30, Alina Li wrote:
> > > Hi David. To be on the safer side, I suggest going through IP
> > > clearance for the Timestream ODBC driver project [3], since more code
> > > than entry_points.cpp will be used. We had initially planned to use
> > > Timestream's entry points code, but it includes more than just
> > > entry_points.cpp (code such as [5] odbc.cpp, [6] odbc.h and some other
> > > files are part of the entry points), and besides the entry points,
> > > we're planning to use Timestream's installers and DSN window as well.
> > > Sorry for the confusion.
> > >
> > > [5]: https://github.com/awslabs/amazon-timestream-odbc-driver/blob/main/src/odbc/src/odbc.cpp
> > > [6]: https://github.com/awslabs/amazon-timestream-odbc-driver/blob/main/src/odbc/include/timestream/odbc.h
> > > 
> > > From: David Li 
> > > Sent: Wednesday, December 6, 2023 6:09 AM
> > > To: dev@arrow.apache.org 
> > > Subject: Re: [Discussion][C++][FlightRPC] What stage to submit a PR for
> > > Flight SQL ODBC driver
> > >
> > > Thanks for the update, Alina. This sounds good, my only question for
> > > the broader community is whether there is enough imported code that we
> > > should go through the IP clearance process [1]. It's never been clear
> > > to me what exactly the threshold for this is. flightsql-odbc [2] is
> > > already quite large on its own and probably we should go through
> > > clearance? It's not clear to me how much of the Timestream project [3]
> > > would be involved here, if you mean literally only entry_points.cpp [4]
> > > (that's probably OK without clearance?) or more code than that.
> > >
> > > [1]: https://incubator.apache.org/ip-clearance/
> > > [2]: https://github.com/dremio/flightsql-odbc
> > > [3]: https://github.com/awslabs/amazon-timestream-odbc-driver
> > > [4]:
> > >
> > https://github.com/awslabs/amazon-timestream-odbc-driver/blob/main/src/odbc/src/entry_points.cpp
> > >
> > > On Tue, Dec 5, 2023, at 18:25, Alina Li wrote:
> > >> Hi community,
> > >>
> > > >> I wanted to start a discussion regarding the development of the Flight SQL
> > > >> ODBC driver. Regarding the seed usage mentioned in my previous email, our initial
> > > >> plan is that flightsql-odbc will be mostly used as-is, other than
> > > >> changes to conform to Arrow coding guidelines, and for the Amazon
> > > >> Timestream driver, only its ODBC function entry code will be used and
> > > >> adapted to call into flightsql-odbc classes. Please let me know if
> > > >> there are any concerns around this.
> > > >>
> > > >> From my discussion with David Li at GH-30622, my understanding is
> > > >> that the PR submission should be as early as possible. We plan to
> > > >> send out a PR that adds the Timestream ODBC driver and flightsql-odbc
> > > >> seeds into Arrow if there are no concerns. The seed drivers might not
> > > >> be able to compile, but the community would then be able to start
> > > >> the IP scanning process.
> > >>
> > >> Your feedback would be appreciated,
> > >>
> > >> Alina Li
> > >>

Re: [ANNOUNCE] New Arrow committer: Jeffrey Vo

2024-02-07 Thread Jacob Wujciak-Jens
Congrats 🎉

On Wed, Feb 7, 2024 at 14:02, Raúl Cumplido wrote:

> Congratulations Jeffrey!
>
> On Wed, Feb 7, 2024 at 14:00, Andrew Lamb wrote:
> >
> > Congratulations Jeffrey! Well deserved!
> >
> > On Tue, Feb 6, 2024 at 1:30 PM Raphael Taylor-Davies wrote:
> >
> > > On behalf of the Arrow PMC, I am happy to announce that Jeffrey Vo has
> > > accepted an invitation to become a committer on Apache Arrow. Welcome,
> > > and thank you for your contributions!
> > >
> > > Raphael Taylor-Davies
> > >
> > >
>


Re: [ANNOUNCE] New Arrow committer: Jeffrey Vo

2024-02-07 Thread Raúl Cumplido
Congratulations Jeffrey!

On Wed, Feb 7, 2024 at 14:00, Andrew Lamb wrote:
>
> Congratulations Jeffrey! Well deserved!
>
> On Tue, Feb 6, 2024 at 1:30 PM Raphael Taylor-Davies wrote:
>
> > On behalf of the Arrow PMC, I am happy to announce that Jeffrey Vo has
> > accepted an invitation to become a committer on Apache Arrow. Welcome,
> > and thank you for your contributions!
> >
> > Raphael Taylor-Davies
> >
> >


Re: [ANNOUNCE] New Arrow committer: Jeffrey Vo

2024-02-07 Thread Andrew Lamb
Congratulations Jeffrey! Well deserved!

On Tue, Feb 6, 2024 at 1:30 PM Raphael Taylor-Davies wrote:

> On behalf of the Arrow PMC, I am happy to announce that Jeffrey Vo has
> accepted an invitation to become a committer on Apache Arrow. Welcome,
> and thank you for your contributions!
>
> Raphael Taylor-Davies
>
>


Re: [DISCUSS] Proposal to expand Arrow Communications

2024-02-07 Thread Antoine Pitrou
I think we should find a proper descriptive name for the
"high-performance protocol", because "high-performance" is vague and
context-dependent, and also spreads unnecessary confusion about existing
alternatives such as regular Arrow IPC.

I would for example propose "Dissociated Arrow IPC" to stress the idea
that metadata and data can be on separate transports.

On Feb 3, 2024 at 00:22, Matt Topol wrote:
>
> Hey all,
>
> In my current work I've been experimenting and playing around with
> utilizing Arrow and non-CPU memory data. While the creation of the
> ArrowDeviceArray struct and the enhancements to the Arrow library Device
> abstractions were necessary, there is also a need to extend the
> communications specs we utilize, i.e. Flight.
>
> Currently there is no real way to utilize Arrow Flight with shared memory
> or with non-CPU memory (without an expensive Device -> Host copy first). To
> this end I've done a bunch of research and toying around and came up with a
> protocol to propose and a reference implementation using UCX[1]. Attached
> to the proposal are also a couple of extensions for Flight itself to make it
> easier for users to still use Flight for metadata / dataset information and
> then point consumers elsewhere to actually retrieve the data. The idea here
> is that this would be a new specification for how to transport Arrow data
> across these high-performance transports such as UCX / libfabric / shared
> memory / etc. We wouldn't necessarily expose / directly add implementations
> of the spec to the Arrow libraries, just provide reference/example
> implementations.
>
> I've written the proposal up on a google doc[2] that everyone should be
> able to comment on. Once we get some community discussion on there, if
> everyone is okay with it I'd like to eventually hold a vote on adopting this
> spec and if we do, I'll then make a PR to start adding it to the Arrow
> documentation, etc.
>
> Anyways, thank you everyone in advance for your feedback and comments!
>
> --Matt
>
> [1]: https://github.com/openucx/ucx/
> [2]:
> https://docs.google.com/document/d/1zHbnyK1r6KHpMOtEdIg1EZKNzHx-MVgUMOzB87GuXyk/edit?usp=sharing