Re: [DISCUSS] C Data Interface, take 2
Thanks Jacques. I agree that none of the ways forward on this problem are wholly satisfactory. We should encourage users of this C API to prefer emitting byte-aligned / 0-offset arrays, in line with the IPC spec, wherever possible.

It will be interesting to see, after a period of time, how downstream projects are able to leverage this interface as part of their overall Arrow adoption.

On Tue, Jan 21, 2020 at 4:05 PM Jacques Nadeau wrote:
>
> Upon further reflection (and as I've noted on the PR), I think merging the
> ABI as a general feature of Arrow is preferable to making this be a
> subinterface of the C++ part of the project. While the offset field is
> awkward given its absence from the IPC spec, it's better to avoid
> fragmenting the community based on that field's absence or existence.
>
> Thanks for the lively discussion Antoine, Wes and others!
>
> J
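To make the "byte-aligned / 0-offset" recommendation above concrete, here is a minimal C sketch of what a non-zero offset means for a consumer. It is not taken from the PR or the spec. It assumes Arrow's least-significant-bit ordering for validity bitmaps, and the helper names `value_at_int32`, `bit_is_set` and `normalize_bitmap` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: logical element i of an int32 values buffer
 * whose logical start is `offset` elements into the buffer. */
static int32_t value_at_int32(const int32_t *values, int64_t offset, int64_t i) {
    return values[offset + i];
}

/* Hypothetical helper: validity bit for logical element i.
 * Assumes least-significant-bit ordering within each byte. */
static int bit_is_set(const uint8_t *bitmap, int64_t offset, int64_t i) {
    int64_t pos = offset + i;
    return (bitmap[pos / 8] >> (pos % 8)) & 1;
}

/* Hypothetical helper: copy `length` bits starting at bit `offset` of `src`
 * into `dst` so that the result starts at bit 0. A producer could do this
 * before export in order to emit a 0-offset bitmap. `dst` must hold at
 * least (length + 7) / 8 bytes. */
static void normalize_bitmap(const uint8_t *src, int64_t offset,
                             int64_t length, uint8_t *dst) {
    for (int64_t i = 0; i < (length + 7) / 8; i++) {
        dst[i] = 0;
    }
    for (int64_t i = 0; i < length; i++) {
        if (bit_is_set(src, offset, i)) {
            dst[i / 8] = (uint8_t)(dst[i / 8] | (1u << (i % 8)));
        }
    }
}
```

When the offset is not a multiple of 8, the bitmap cannot be sliced by pointer arithmetic alone and has to be shifted bit by bit as in `normalize_bitmap`, which is one reason a 0-offset export is simpler for consumers that follow the IPC layout.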
Re: [DISCUSS] C Data Interface, take 2
Upon further reflection (and as I've noted on the PR), I think merging the ABI as a general feature of Arrow is preferable to making this be a subinterface of the C++ part of the project. While the offset field is awkward given its absence from the IPC spec, it's better to avoid fragmenting the community based on that field's absence or existence.

Thanks for the lively discussion Antoine, Wes and others!

J

On Mon, Jan 20, 2020 at 11:09 AM Wes McKinney wrote:

> Independent of the particulars of the discussion, the C++ project
> needs to be free to create a C API for itself. If you want to try to
> block the C++ contributors from doing this we may be barreling toward
> a governance crisis in the project. I'm stepping back from this
> discussion for a time now to allow others to catch up on the
> discussion and to weigh in as needed.
Re: [DISCUSS] C Data Interface, take 2
Independent of the particulars of the discussion, the C++ project needs to be free to create a C API for itself. If you want to try to block the C++ contributors from doing this we may be barreling toward a governance crisis in the project. I'm stepping back from this discussion for a time now to allow others to catch up on the discussion and to weigh in as needed.

On Mon, Jan 20, 2020 at 1:00 PM Jacques Nadeau wrote:
>
> I don't see this as an endogenous concern of the C++ project. I appreciate
> your goal with saying so but I think this has broader ramifications around
> fragmentation of the project.
>
> The core challenge that we're dealing with is that we introduced foundational
> concepts in some implementations that go beyond the spec and then provided
> useful features based on them (in this case, the offset concept). Ideally,
> those concepts are first introduced at the specification level so there
> aren't inconsistent viewpoints of what Arrow is (which I believe is what is
> happening here). Having a cross-language specification for in-memory
> processing is a new concept so it isn't surprising that we're going to
> learn these things along the way.
>
> Without this, we create a slippery slope of fragmentation between the
> specifications and the implementations. I understand that the toothpaste is
> out of the tube in this particular case. We can respond in two ways: stop
> the slip or continue to slide down the slope. I'm inclined to stop the slip.
>
> As I said on GitHub, I'm struggling with how much of this should be
> solved in the project. I'm going to pause a bit on responding to reflect
> further about this as well to reduce the likelihood that this devolves into
> a flame war (which is always a risk with complex issues such as these).
Re: [DISCUSS] C Data Interface, take 2
I don't see this as an endogenous concern of the C++ project. I appreciate your goal with saying so but I think this has broader ramifications around fragmentation of the project.

The core challenge that we're dealing with is that we introduced foundational concepts in some implementations that go beyond the spec and then provided useful features based on them (in this case, the offset concept). Ideally, those concepts are first introduced at the specification level so there aren't inconsistent viewpoints of what Arrow is (which I believe is what is happening here). Having a cross-language specification for in-memory processing is a new concept so it isn't surprising that we're going to learn these things along the way.

Without this, we create a slippery slope of fragmentation between the specifications and the implementations. I understand that the toothpaste is out of the tube in this particular case. We can respond in two ways: stop the slip or continue to slide down the slope. I'm inclined to stop the slip.

As I said on GitHub, I'm struggling with how much of this should be solved in the project. I'm going to pause a bit on responding to reflect further about this as well to reduce the likelihood that this devolves into a flame war (which is always a risk with complex issues such as these).

On Mon, Jan 20, 2020 at 9:59 AM Wes McKinney wrote:

> hi Jacques,
>
> Taking a step back from the discussion, the original problem statement
> was to enable third party projects to produce the data structure used
> by C++ Array classes in C without depending on the C++ code.
>
> That's the ArrayData class here
>
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/array.h#L232
>
> It is important for us to simplify the programming interface with the C++
> library, so I think that we should address this as an endogenous
> concern of the C++ project, namely providing a "C API for the C++
> project". The C API for the C++ library needs to mirror what's in the
> C++ project (i.e. the ArrayData data structure). We should not
> advertise this as being a part of the project specification.
>
> - Wes
Re: [DISCUSS] C Data Interface, take 2
hi Jacques,

Taking a step back from the discussion, the original problem statement was to enable third party projects to produce the data structure used by C++ Array classes in C without depending on the C++ code.

That's the ArrayData class here

https://github.com/apache/arrow/blob/master/cpp/src/arrow/array.h#L232

It is important for us to simplify the programming interface with the C++ library, so I think that we should address this as an endogenous concern of the C++ project, namely providing a "C API for the C++ project". The C API for the C++ library needs to mirror what's in the C++ project (i.e. the ArrayData data structure). We should not advertise this as being a part of the project specification.

- Wes

On Mon, Jan 20, 2020 at 11:51 AM Jacques Nadeau wrote:
>
> As I noted on the pull request, I think fundamentally this work is at odds
> with the Arrow specification and is being used to introduce a shadow
> specification.
>
> I don't think our intentions about how people should use something really
> influence how people will actually use or perceive it. They'll just find
> supported Arrow code and expose things based on it and call it "Arrow
> compatible". In other words, I don't think people in the outside world will
> be able to perceive the distinction between "Arrow C++ compatible" and
> "Arrow compatible".
Re: [DISCUSS] C Data Interface, take 2
As I noted on the pull request, I think fundamentally this work is at odds with the Arrow specification and is being used to introduce a shadow specification.

I don't think our intentions about how people should use something really influence how people will actually use or perceive it. They'll just find supported Arrow code and expose things based on it and call it "Arrow compatible". In other words, I don't think people in the outside world will be able to perceive the distinction between "Arrow C++ compatible" and "Arrow compatible".

On Mon, Jan 20, 2020 at 9:28 AM Wes McKinney wrote:

> hi folks,
>
> I just made a comment in https://github.com/apache/arrow/pull/6026
> that I wanted to surface here on the mailing list.
>
> It seems that to reach consensus for a C interface that is intended to
> be broadly used by multiple programming languages, we may make some
> compromises that harm or outright undermine some of the use cases that
> motivated the creation of the C interface in the first place. That
> does not seem good. I wonder if it would be more productive to reduce
> the scope of the project to merely providing a C-header-based data
> interface to the C++ project only. That was the original problem
> statement and it seems that attempting to make it useful beyond C++ has
> made it difficult to reach consensus.
>
> Thanks
> Wes
Re: [DISCUSS] C Data Interface, take 2
hi folks,

I just made a comment in https://github.com/apache/arrow/pull/6026 that I wanted to surface here on the mailing list.

It seems that to reach consensus for a C interface that is intended to be broadly used by multiple programming languages, we may make some compromises that harm or outright undermine some of the use cases that motivated the creation of the C interface in the first place. That does not seem good. I wonder if it would be more productive to reduce the scope of the project to merely providing a C-header-based data interface to the C++ project only. That was the original problem statement and it seems that attempting to make it useful beyond C++ has made it difficult to reach consensus.

Thanks
Wes

On Sat, Dec 21, 2019 at 4:38 PM Jacques Nadeau wrote:
>
> Thanks for addressing my comments. I'm actively reviewing the proposal. It
> is taking me more time than I would like given the time of the year but I
> want to make sure that you know that I'm looking at it and hope to provide
> additional feedback beyond that which I've provided thus far on the PR.
> Will update soon.
>
> Thanks for your patience.
Re: [DISCUSS] C Data Interface, take 2
Thanks for addressing my comments. I'm actively reviewing the proposal. It is taking me more time than I would like given the time of the year but I want to make sure that you know that I'm looking at it and hope to provide additional feedback beyond that which I've provided thus far on the PR. Will update soon.

Thanks for your patience.

On Tue, Dec 17, 2019 at 11:16 AM Antoine Pitrou wrote:

>
> Hello,
>
> Following Jacques's feedback, I drafted a new version of the C data
> interface spec.
>
> The spec PR is here:
> https://github.com/apache/arrow/pull/6040
> Direct link to the RST file:
> https://github.com/apache/arrow/blob/5d8669d371401f9db12326b079e13c0058ba972b/docs/source/format/CDataInterface.rst
>
> There is also a C++ implementation, together with a Python <-> R
> bridge demonstrating the functionality:
> https://github.com/apache/arrow/pull/6026
>
> The main change from the previous spec is that there are now two C
> structures; one for the type or schema information, one for the
> array or record batch data. This allows exchanging both kinds of
> information independently (and so, potentially, to exchange schema once
> and then multiple arrays or record batches).
>
> Comments and questions welcome.
>
> Regards
>
> Antoine.
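The two-structure split described in Antoine's message can be sketched roughly as follows. This is a simplified illustration, not the layout proposed in the PR: the struct names `SketchSchema` and `SketchArray`, their fields, the function `sum_int32`, and the use of "i" as a type code for 32-bit integers are all assumptions made for this example.

```c
#include <stdint.h>

/* Hypothetical, simplified type description: exchanged once. */
struct SketchSchema {
    const char *format; /* assumed type code; "i" means int32 in this sketch */
    const char *name;   /* field name */
};

/* Hypothetical, simplified data description: exchanged once per batch. */
struct SketchArray {
    int64_t length;     /* number of logical elements */
    int64_t offset;     /* logical start, in elements, within `values` */
    const void *values; /* data buffer, interpreted according to the schema */
};

/* Sum one batch of int32 values using a schema received earlier.
 * Returns 0 on success, -1 if the schema is not int32. */
static int sum_int32(const struct SketchSchema *schema,
                     const struct SketchArray *array, int64_t *out) {
    if (schema->format[0] != 'i' || schema->format[1] != '\0') {
        return -1;
    }
    const int32_t *v = (const int32_t *)array->values + array->offset;
    int64_t total = 0;
    for (int64_t i = 0; i < array->length; i++) {
        total += v[i];
    }
    *out = total;
    return 0;
}
```

A consumer would keep the one schema value around and call `sum_int32` for each incoming array, which is the "exchange schema once and then multiple arrays or record batches" pattern the message mentions.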