RE: JDBC Adapter for Apache-Arrow

2018-02-13 Thread Atul Dambalkar
Hi Uwe,

Sorry for the late response on this thread. We have started some discussions 
internally. I wanted to know what help you would need specifically on the JDBC 
Adapter front; we would be happy to collaborate. At this time, we were mainly 
trying to model it around the C++ work that has gone in. Are there any 
particular use cases/requirements you have in mind?

-Atul

-Original Message-
From: Jacques Nadeau [mailto:jacq...@apache.org] 
Sent: Tuesday, January 09, 2018 7:41 PM
To: dev@arrow.apache.org
Subject: Re: JDBC Adapter for Apache-Arrow

We have some stuff in Dremio that we've planned on open sourcing but haven't 
yet done so. We should try to get that out for others to consume.

On Jan 7, 2018 11:49 AM, "Uwe L. Korn" <uw...@xhochy.com> wrote:

> Has anyone made progress on the JDBC adapter yet?
>
> I recently came across a lot of projects with good JDBC drivers but not 
> so good drivers in Python. Having an Arrow-JDBC adapter would make 
> these query engines much more useful to the Python community. Being an 
> Arrow committer and one of the turbodbc authors, I have considerable 
> knowledge in this area, but my Java is a bit rusty and I have never 
> dealt with JDBC, so I'm looking for someone to collaborate on this feature.
>
> Also, this might finally be my chance to start contributing to the 
> Java part of Apache Arrow.
>
> Uwe
>
> > Am 07.11.2017 um 20:01 schrieb Julian Hyde <jh...@apache.org>:
> >
> > I have logged https://issues.apache.org/jira/browse/CALCITE-2040 (I 
> > logged it within Calcite because it makes more sense as an Arrow 
> > adapter within Calcite than as a Calcite adapter within Arrow).
> >
> > Note the last paragraph about
> > https://issues.apache.org/jira/browse/CALCITE-2025 and 
> > bioinformatics file formats. Readers for these formats would be 
> > useful extensions to Arrow regardless of whether the data was 
> > ultimately going to be queried using SQL. (Contributions welcome!) 
> > Calcite's bio adapter would build upon the Arrow readers in two 
> > respects:  (1) to read metadata from these files (e.g. are there any 
> > extra fields?) and (2) to push down processing (filters, projects) into the 
> > reader.
> >
> > Julian
> >
> >
> > On Tue, Nov 7, 2017 at 10:21 AM, Atul Dambalkar 
> > <atul.dambal...@xoriant.com> wrote:
> >> Hi,
> >>
> >> Don't mean to interrupt the current discussion threads. But, based 
> >> on
> the discussions so far on the JDBC Adapter piece, are we in a position 
> to create a JIRA ticket for this as well as for the other piece about 
> adding direct Arrow object creation support to JDBC drivers? If 
> yes, I can certainly go ahead and create a JIRA for the JDBC Adapter work.
> >>
> >> Julian, would you like to create the JIRA for the other item that 
> >> you
> proposed?
> >>
> >> -Atul
> >>
> >> -Original Message-
> >> From: Atul Dambalkar
> >> Sent: Thursday, November 02, 2017 2:59 PM
> >> To: dev@arrow.apache.org
> >> Subject: RE: JDBC Adapter for Apache-Arrow
> >>
> >> I also like the approach of adding an interface and making it part 
> >> of
> Arrow, so any specific JDBC driver can implement that interface to 
> directly expose Arrow objects without having to create JDBC objects in 
> the first place. One such implementation could be for Avatica itself, 
> as Julian was suggesting earlier.
> >>
> >> -Original Message-
> >> From: Julian Hyde [mailto:jh...@apache.org]
> >> Sent: Tuesday, October 31, 2017 4:28 PM
> >> To: dev@arrow.apache.org
> >> Subject: Re: JDBC Adapter for Apache-Arrow
> >>
> >> Yeah, I agree, it should be an interface defined as part of Arrow. 
> >> Not
> driver-specific.
> >>
> >>> On Oct 31, 2017, at 1:37 PM, Laurent Goujon <laur...@dremio.com>
> wrote:
> >>>
> >>> I really like Julian's idea of unwrapping Arrow objects out of the 
> >>> JDBC ResultSet, but I wonder if the unwrap class has to be 
> >>> specific to the driver and if an interface can be designed to be 
> >>> used by multiple
> drivers:
> >>> for drivers based on Arrow, it means you could totally skip the 
> >>> serialization/deserialization from/to JDBC records.
> >>> If such an interface exists, I would propose adding it to the 
> >>> Arrow project, with Arrow products/projects in charge of adding 
> >>> support for it in their own JDBC drivers.
> >>>
> >>> Laurent
> >>

Re: JDBC Adapter for Apache-Arrow

2018-01-09 Thread Jacques Nadeau
We have some stuff in Dremio that we've planned on open sourcing but
haven't yet done so. We should try to get that out for others to consume.

On Jan 7, 2018 11:49 AM, "Uwe L. Korn" <uw...@xhochy.com> wrote:

> Has anyone made progress on the JDBC adapter yet?
>
> I recently came across a lot of projects with good JDBC drivers but not so
> good drivers in Python. Having an Arrow-JDBC adapter would make these query
> engines much more useful to the Python community. Being an Arrow committer
> and one of the turbodbc authors, I have considerable knowledge in this area,
> but my Java is a bit rusty and I have never dealt with JDBC, so I'm looking
> for someone to collaborate on this feature.
>
> Also, this might finally be my chance to start contributing to the Java
> part of Apache Arrow.
>
> Uwe
>
> > Am 07.11.2017 um 20:01 schrieb Julian Hyde <jh...@apache.org>:
> >
> > I have logged https://issues.apache.org/jira/browse/CALCITE-2040 (I
> > logged it within Calcite because it makes more sense as an Arrow
> > adapter within Calcite than as a Calcite adapter within Arrow).
> >
> > Note the last paragraph about
> > https://issues.apache.org/jira/browse/CALCITE-2025 and bioinformatics
> > file formats. Readers for these formats would be useful extensions to
> > Arrow regardless of whether the data was ultimately going to be
> > queried using SQL. (Contributions welcome!) Calcite's bio adapter
> > would build upon the Arrow readers in two respects:  (1) to read
> > metadata from these files (e.g. are there any extra fields?) and (2)
> > to push down processing (filters, projects) into the reader.
> >
> > Julian
> >
> >
> > On Tue, Nov 7, 2017 at 10:21 AM, Atul Dambalkar
> > <atul.dambal...@xoriant.com> wrote:
> >> Hi,
> >>
> >> Don't mean to interrupt the current discussion threads. But, based on
> the discussions so far on the JDBC Adapter piece, are we in a position to
> create a JIRA ticket for this as well as for the other piece about adding
> direct Arrow object creation support to JDBC drivers? If yes, I can
> certainly go ahead and create a JIRA for the JDBC Adapter work.
> >>
> >> Julian, would you like to create the JIRA for the other item that you
> proposed?
> >>
> >> -Atul
> >>
> >> -Original Message-
> >> From: Atul Dambalkar
> >> Sent: Thursday, November 02, 2017 2:59 PM
> >> To: dev@arrow.apache.org
> >> Subject: RE: JDBC Adapter for Apache-Arrow
> >>
> >> I also like the approach of adding an interface and making it part of
> Arrow, so any specific JDBC driver can implement that interface to directly
> expose Arrow objects without having to create JDBC objects in the first
> place. One such implementation could be for Avatica itself, as Julian was
> suggesting earlier.
> >>
> >> -Original Message-
> >> From: Julian Hyde [mailto:jh...@apache.org]
> >> Sent: Tuesday, October 31, 2017 4:28 PM
> >> To: dev@arrow.apache.org
> >> Subject: Re: JDBC Adapter for Apache-Arrow
> >>
> >> Yeah, I agree, it should be an interface defined as part of Arrow. Not
> driver-specific.
> >>
> >>> On Oct 31, 2017, at 1:37 PM, Laurent Goujon <laur...@dremio.com>
> wrote:
> >>>
> >>> I really like Julian's idea of unwrapping Arrow objects out of the
> >>> JDBC ResultSet, but I wonder if the unwrap class has to be specific to
> >>> the driver and if an interface can be designed to be used by multiple
> drivers:
> >>> for drivers based on Arrow, it means you could totally skip the
> >>> serialization/deserialization from/to JDBC records.
> >>> If such an interface exists, I would propose adding it to the Arrow
> >>> project, with Arrow products/projects in charge of adding support for
> >>> it in their own JDBC drivers.
> >>>
> >>> Laurent
> >>>
> >>> On Tue, Oct 31, 2017 at 1:18 PM, Atul Dambalkar
> >>> <atul.dambal...@xoriant.com>
> >>> wrote:
> >>>
> >>>> Thanks for your thoughts, Julian. I think adding support for Arrow
> >>>> objects for the Avatica Remote Driver (AvaticaToArrowConverter) can
> >>>> certainly be taken up as another activity. And you are right, we will
> >>>> have to look at each specific JDBC driver to really optimize it
> individually.
> >>>>
> >>>> I would be curious if there are any further inputs/comments from
> >>>> other Dev folks, on the JDBC adapter aspect.

Re: JDBC Adapter for Apache-Arrow

2018-01-07 Thread Uwe L. Korn
Has anyone made progress on the JDBC adapter yet? 

I recently came across a lot of projects with good JDBC drivers but not so good 
drivers in Python. Having an Arrow-JDBC adapter would make these query engines 
much more useful to the Python community. Being an Arrow committer and one of 
the turbodbc authors, I have considerable knowledge in this area, but my Java is 
a bit rusty and I have never dealt with JDBC, so I'm looking for someone to 
collaborate on this feature.

Also, this might finally be my chance to start contributing to the Java part 
of Apache Arrow.

Uwe

> Am 07.11.2017 um 20:01 schrieb Julian Hyde <jh...@apache.org>:
> 
> I have logged https://issues.apache.org/jira/browse/CALCITE-2040 (I
> logged it within Calcite because it makes more sense as an Arrow
> adapter within Calcite than as a Calcite adapter within Arrow).
> 
> Note the last paragraph about
> https://issues.apache.org/jira/browse/CALCITE-2025 and bioinformatics
> file formats. Readers for these formats would be useful extensions to
> Arrow regardless of whether the data was ultimately going to be
> queried using SQL. (Contributions welcome!) Calcite's bio adapter
> would build upon the Arrow readers in two respects:  (1) to read
> metadata from these files (e.g. are there any extra fields?) and (2)
> to push down processing (filters, projects) into the reader.
> 
> Julian
> 
> 
> On Tue, Nov 7, 2017 at 10:21 AM, Atul Dambalkar
> <atul.dambal...@xoriant.com> wrote:
>> Hi,
>> 
>> Don't mean to interrupt the current discussion threads. But, based on the 
>> discussions so far on the JDBC Adapter piece, are we in a position to create 
>> a JIRA ticket for this as well as for the other piece about adding direct 
>> Arrow object creation support to JDBC drivers? If yes, I can certainly go 
>> ahead and create a JIRA for the JDBC Adapter work.
>> 
>> Julian, would you like to create the JIRA for the other item that you 
>> proposed?
>> 
>> -Atul
>> 
>> -Original Message-
>> From: Atul Dambalkar
>> Sent: Thursday, November 02, 2017 2:59 PM
>> To: dev@arrow.apache.org
>> Subject: RE: JDBC Adapter for Apache-Arrow
>> 
> >> I also like the approach of adding an interface and making it part of
> Arrow, so any specific JDBC driver can implement that interface to directly
> expose Arrow objects without having to create JDBC objects in the first
> place. One such implementation could be for Avatica itself, as Julian was
> suggesting earlier.
>> 
>> -Original Message-
>> From: Julian Hyde [mailto:jh...@apache.org]
>> Sent: Tuesday, October 31, 2017 4:28 PM
>> To: dev@arrow.apache.org
>> Subject: Re: JDBC Adapter for Apache-Arrow
>> 
>> Yeah, I agree, it should be an interface defined as part of Arrow. Not 
>> driver-specific.
>> 
>>> On Oct 31, 2017, at 1:37 PM, Laurent Goujon <laur...@dremio.com> wrote:
>>> 
>>> I really like Julian's idea of unwrapping Arrow objects out of the
>>> JDBC ResultSet, but I wonder if the unwrap class has to be specific to
>>> the driver and if an interface can be designed to be used by multiple 
>>> drivers:
>>> for drivers based on Arrow, it means you could totally skip the
>>> serialization/deserialization from/to JDBC records.
>>> If such an interface exists, I would propose adding it to the Arrow
>>> project, with Arrow products/projects in charge of adding support for
>>> it in their own JDBC drivers.
>>> 
>>> Laurent
>>> 
>>> On Tue, Oct 31, 2017 at 1:18 PM, Atul Dambalkar
>>> <atul.dambal...@xoriant.com>
>>> wrote:
>>> 
> >>>> Thanks for your thoughts, Julian. I think adding support for Arrow
> >>>> objects for the Avatica Remote Driver (AvaticaToArrowConverter) can
> >>>> certainly be taken up as another activity. And you are right, we will
> >>>> have to look at each specific JDBC driver to really optimize it individually.
>>>> 
>>>> I would be curious if there are any further inputs/comments from
>>>> other Dev folks, on the JDBC adapter aspect.
>>>> 
>>>> -Atul
>>>> 
>>>> -Original Message-
>>>> From: Julian Hyde [mailto:jh...@apache.org]
>>>> Sent: Tuesday, October 31, 2017 11:12 AM
>>>> To: dev@arrow.apache.org
>>>> Subject: Re: JDBC Adapter for Apache-Arrow
>>>> 
>>>> Sorry I didn’t read your email thoroughly enough. I was talking about
>>>> the inverse (JDBC reading from Arrow) whereas you are talking about
> >>>> Arrow reading from JDBC. Your proposal makes perfect sense.

Re: JDBC Adapter for Apache-Arrow

2017-11-07 Thread Julian Hyde
I have logged https://issues.apache.org/jira/browse/CALCITE-2040 (I
logged it within Calcite because it makes more sense as an Arrow
adapter within Calcite than as a Calcite adapter within Arrow).

Note the last paragraph about
https://issues.apache.org/jira/browse/CALCITE-2025 and bioinformatics
file formats. Readers for these formats would be useful extensions to
Arrow regardless of whether the data was ultimately going to be
queried using SQL. (Contributions welcome!) Calcite's bio adapter
would build upon the Arrow readers in two respects:  (1) to read
metadata from these files (e.g. are there any extra fields?) and (2)
to push down processing (filters, projects) into the reader.

Julian


On Tue, Nov 7, 2017 at 10:21 AM, Atul Dambalkar
<atul.dambal...@xoriant.com> wrote:
> Hi,
>
> Don't mean to interrupt the current discussion threads. But, based on the 
> discussions so far on the JDBC Adapter piece, are we in a position to create 
> a JIRA ticket for this as well as for the other piece about adding direct 
> Arrow object creation support to JDBC drivers? If yes, I can certainly go 
> ahead and create a JIRA for the JDBC Adapter work.
>
> Julian, would you like to create the JIRA for the other item that you 
> proposed?
>
> -Atul
>
> -Original Message-
> From: Atul Dambalkar
> Sent: Thursday, November 02, 2017 2:59 PM
> To: dev@arrow.apache.org
> Subject: RE: JDBC Adapter for Apache-Arrow
>
> I also like the approach of adding an interface and making it part of Arrow, 
> so any specific JDBC driver can implement that interface to directly expose 
> Arrow objects without having to create JDBC objects in the first place. One 
> such implementation could be for Avatica itself, as Julian was suggesting 
> earlier.
>
> -Original Message-
> From: Julian Hyde [mailto:jh...@apache.org]
> Sent: Tuesday, October 31, 2017 4:28 PM
> To: dev@arrow.apache.org
> Subject: Re: JDBC Adapter for Apache-Arrow
>
> Yeah, I agree, it should be an interface defined as part of Arrow. Not 
> driver-specific.
>
>> On Oct 31, 2017, at 1:37 PM, Laurent Goujon <laur...@dremio.com> wrote:
>>
>> I really like Julian's idea of unwrapping Arrow objects out of the
>> JDBC ResultSet, but I wonder if the unwrap class has to be specific to
>> the driver and if an interface can be designed to be used by multiple 
>> drivers:
>> for drivers based on Arrow, it means you could totally skip the
>> serialization/deserialization from/to JDBC records.
>> If such an interface exists, I would propose adding it to the Arrow
>> project, with Arrow products/projects in charge of adding support for
>> it in their own JDBC drivers.
>>
>> Laurent
>>
>> On Tue, Oct 31, 2017 at 1:18 PM, Atul Dambalkar
>> <atul.dambal...@xoriant.com>
>> wrote:
>>
>>> Thanks for your thoughts, Julian. I think adding support for Arrow
>>> objects for the Avatica Remote Driver (AvaticaToArrowConverter) can
>>> certainly be taken up as another activity. And you are right, we will
>>> have to look at each specific JDBC driver to really optimize it individually.
>>>
>>> I would be curious if there are any further inputs/comments from
>>> other Dev folks, on the JDBC adapter aspect.
>>>
>>> -Atul
>>>
>>> -Original Message-
>>> From: Julian Hyde [mailto:jh...@apache.org]
>>> Sent: Tuesday, October 31, 2017 11:12 AM
>>> To: dev@arrow.apache.org
>>> Subject: Re: JDBC Adapter for Apache-Arrow
>>>
>>> Sorry I didn’t read your email thoroughly enough. I was talking about
>>> the inverse (JDBC reading from Arrow) whereas you are talking about
>>> Arrow reading from JDBC. Your proposal makes perfect sense.
>>>
>>> JDBC is quite a chatty interface (a call for every column of every
>>> row, plus an occasional call to find out whether values are null, and
>>> objects such as strings and timestamps become Java heap objects), so
>>> for specific JDBC drivers it may be possible to optimize. For
>>> example, the Avatica remote driver receives row sets in an RPC
>>> response in protobuf format. It may be useful if the JDBC driver were
>>> able to expose a direct path from protobuf to Arrow. 
>>> "ResultSet.unwrap(AvaticaToArrowConverter.class)"
>>> might be one way to achieve this.
>>>
>>> Julian
>>>
>>>
>>>
>>>
>>>> On Oct 31, 2017, at 10:41 AM, Atul Dambalkar
>>>> <atul.dambal...@xoriant.com>
>>> wrote:
>>>>
>>>> Hi Julian,
>>>>
>>>> 

RE: JDBC Adapter for Apache-Arrow

2017-11-07 Thread Atul Dambalkar
Hi,

Don't mean to interrupt the current discussion threads. But, based on the 
discussions so far on the JDBC Adapter piece, are we in a position to create a 
JIRA ticket for this as well as for the other piece about adding direct Arrow 
object creation support to JDBC drivers? If yes, I can certainly go ahead 
and create a JIRA for the JDBC Adapter work.

Julian, would you like to create the JIRA for the other item that you proposed?

-Atul

-Original Message-
From: Atul Dambalkar 
Sent: Thursday, November 02, 2017 2:59 PM
To: dev@arrow.apache.org
Subject: RE: JDBC Adapter for Apache-Arrow

I also like the approach of adding an interface and making it part of Arrow, so 
any specific JDBC driver can implement that interface to directly expose Arrow 
objects without having to create JDBC objects in the first place. One such 
implementation could be for Avatica itself, as Julian was suggesting earlier.

-Original Message-
From: Julian Hyde [mailto:jh...@apache.org]
Sent: Tuesday, October 31, 2017 4:28 PM
To: dev@arrow.apache.org
Subject: Re: JDBC Adapter for Apache-Arrow

Yeah, I agree, it should be an interface defined as part of Arrow. Not 
driver-specific.

> On Oct 31, 2017, at 1:37 PM, Laurent Goujon <laur...@dremio.com> wrote:
> 
> I really like Julian's idea of unwrapping Arrow objects out of the 
> JDBC ResultSet, but I wonder if the unwrap class has to be specific to 
> the driver and if an interface can be designed to be used by multiple drivers:
> for drivers based on Arrow, it means you could totally skip the 
> serialization/deserialization from/to JDBC records.
> If such an interface exists, I would propose adding it to the Arrow 
> project, with Arrow products/projects in charge of adding support for 
> it in their own JDBC drivers.
> 
> Laurent
> 
> On Tue, Oct 31, 2017 at 1:18 PM, Atul Dambalkar 
> <atul.dambal...@xoriant.com>
> wrote:
> 
>> Thanks for your thoughts, Julian. I think adding support for Arrow 
>> objects for the Avatica Remote Driver (AvaticaToArrowConverter) can 
>> certainly be taken up as another activity. And you are right, we will 
>> have to look at each specific JDBC driver to really optimize it individually.
>> 
>> I would be curious if there are any further inputs/comments from 
>> other Dev folks, on the JDBC adapter aspect.
>> 
>> -Atul
>> 
>> -Original Message-
>> From: Julian Hyde [mailto:jh...@apache.org]
>> Sent: Tuesday, October 31, 2017 11:12 AM
>> To: dev@arrow.apache.org
>> Subject: Re: JDBC Adapter for Apache-Arrow
>> 
>> Sorry I didn’t read your email thoroughly enough. I was talking about 
>> the inverse (JDBC reading from Arrow) whereas you are talking about 
>> Arrow reading from JDBC. Your proposal makes perfect sense.
>> 
>> JDBC is quite a chatty interface (a call for every column of every 
>> row, plus an occasional call to find out whether values are null, and 
>> objects such as strings and timestamps become Java heap objects), so 
>> for specific JDBC drivers it may be possible to optimize. For 
>> example, the Avatica remote driver receives row sets in an RPC 
>> response in protobuf format. It may be useful if the JDBC driver were 
>> able to expose a direct path from protobuf to Arrow. 
>> "ResultSet.unwrap(AvaticaToArrowConverter.class)"
>> might be one way to achieve this.
>> 
>> Julian
>> 
>> 
>> 
>> 
>>> On Oct 31, 2017, at 10:41 AM, Atul Dambalkar 
>>> <atul.dambal...@xoriant.com>
>> wrote:
>>> 
>>> Hi Julian,
>>> 
>>> Thanks for your response. If I understand correctly (looking at 
>>> other
>> adapters), a Calcite-Arrow adapter would provide a SQL front end for 
>> in-memory Arrow data objects/structures. So from that perspective, 
>> are you suggesting building the Calcite-Arrow adapter?
>>> 
>>> In this case, what we are saying is to provide a mechanism for 
>>> upstream
>> apps to be able to get/create Arrow objects/structures from a 
>> relational database. This would also mean converting row-like data 
>> from a SQL database to columnar Arrow data structures. The utility 
>> can make use of JDBC's MetaData features to figure out the 
>> underlying DB schema and define the Arrow columnar schema. Also, the 
>> underlying database in this case would be a relational DB and hence 
>> persisted to disk, but the Arrow objects, being in-memory, can be 
>> ephemeral.
>>> 
>>> Please correct me if I am missing anything.
>>> 
>>> -Atul
>>> 
>>> -Original Message-
>>> From: Julian Hyde [mailto:jhyde.apa...@gmail.com]

RE: JDBC Adapter for Apache-Arrow

2017-11-02 Thread Atul Dambalkar
I also like the approach of adding an interface and making it part of Arrow, so 
any specific JDBC driver can implement that interface to directly expose Arrow 
objects without having to create JDBC objects in the first place. One such 
implementation could be for Avatica itself, as Julian was suggesting earlier.

-Original Message-
From: Julian Hyde [mailto:jh...@apache.org] 
Sent: Tuesday, October 31, 2017 4:28 PM
To: dev@arrow.apache.org
Subject: Re: JDBC Adapter for Apache-Arrow

Yeah, I agree, it should be an interface defined as part of Arrow. Not 
driver-specific.

> On Oct 31, 2017, at 1:37 PM, Laurent Goujon <laur...@dremio.com> wrote:
> 
> I really like Julian's idea of unwrapping Arrow objects out of the 
> JDBC ResultSet, but I wonder if the unwrap class has to be specific to 
> the driver and if an interface can be designed to be used by multiple drivers:
> for drivers based on Arrow, it means you could totally skip the 
> serialization/deserialization from/to JDBC records.
> If such an interface exists, I would propose adding it to the Arrow 
> project, with Arrow products/projects in charge of adding support for 
> it in their own JDBC drivers.
> 
> Laurent
> 
> On Tue, Oct 31, 2017 at 1:18 PM, Atul Dambalkar 
> <atul.dambal...@xoriant.com>
> wrote:
> 
>> Thanks for your thoughts, Julian. I think adding support for Arrow 
>> objects for the Avatica Remote Driver (AvaticaToArrowConverter) can 
>> certainly be taken up as another activity. And you are right, we will 
>> have to look at each specific JDBC driver to really optimize it individually.
>> 
>> I would be curious if there are any further inputs/comments from 
>> other Dev folks, on the JDBC adapter aspect.
>> 
>> -Atul
>> 
>> -Original Message-
>> From: Julian Hyde [mailto:jh...@apache.org]
>> Sent: Tuesday, October 31, 2017 11:12 AM
>> To: dev@arrow.apache.org
>> Subject: Re: JDBC Adapter for Apache-Arrow
>> 
>> Sorry I didn’t read your email thoroughly enough. I was talking about 
>> the inverse (JDBC reading from Arrow) whereas you are talking about 
>> Arrow reading from JDBC. Your proposal makes perfect sense.
>> 
>> JDBC is quite a chatty interface (a call for every column of every 
>> row, plus an occasional call to find out whether values are null, and 
>> objects such as strings and timestamps become Java heap objects), so 
>> for specific JDBC drivers it may be possible to optimize. For 
>> example, the Avatica remote driver receives row sets in an RPC 
>> response in protobuf format. It may be useful if the JDBC driver were 
>> able to expose a direct path from protobuf to Arrow. 
>> "ResultSet.unwrap(AvaticaToArrowConverter.class)"
>> might be one way to achieve this.
>> 
>> Julian
>> 
>> 
>> 
>> 
>>> On Oct 31, 2017, at 10:41 AM, Atul Dambalkar 
>>> <atul.dambal...@xoriant.com>
>> wrote:
>>> 
>>> Hi Julian,
>>> 
>>> Thanks for your response. If I understand correctly (looking at 
>>> other
>> adapters), a Calcite-Arrow adapter would provide a SQL front end for 
>> in-memory Arrow data objects/structures. So from that perspective, 
>> are you suggesting building the Calcite-Arrow adapter?
>>> 
>>> In this case, what we are saying is to provide a mechanism for 
>>> upstream
>> apps to be able to get/create Arrow objects/structures from a 
>> relational database. This would also mean converting row-like data 
>> from a SQL database to columnar Arrow data structures. The utility 
>> can make use of JDBC's MetaData features to figure out the 
>> underlying DB schema and define the Arrow columnar schema. Also, the 
>> underlying database in this case would be a relational DB and hence 
>> persisted to disk, but the Arrow objects, being in-memory, can be 
>> ephemeral.
>>> 
>>> Please correct me if I am missing anything.
>>> 
>>> -Atul
>>> 
>>> -Original Message-
>>> From: Julian Hyde [mailto:jhyde.apa...@gmail.com]
>>> Sent: Monday, October 30, 2017 7:50 PM
>>> To: dev@arrow.apache.org
>>> Subject: Re: JDBC Adapter for Apache-Arrow
>>> 
>>> How about writing an Arrow adapter for Calcite? I think it amounts 
>>> to
>> the same thing - you would inherit Calcite’s SQL parser and Avatica 
>> JDBC stack.
>>> 
>>> Would this database be ephemeral (i.e. would the data go away when 
>>> you
>> close the connection)? If not, how would you know where to load the 
>> data from?
>>> 
>>> Julian

Re: JDBC Adapter for Apache-Arrow

2017-11-01 Thread Julian Hyde
http://lmgtfy.com/?q=unsubscribe+apache+arrow

> On Oct 31, 2017, at 5:20 PM, 丁锦祥 <vence...@gmail.com> wrote:
> 
> unsubscribe
> 
> On Tue, Oct 31, 2017 at 4:28 PM, Julian Hyde <jh...@apache.org> wrote:
> 
>> Yeah, I agree, it should be an interface defined as part of Arrow. Not
>> driver-specific.
>> 
>>> On Oct 31, 2017, at 1:37 PM, Laurent Goujon <laur...@dremio.com> wrote:
>>> 
>>> I really like Julian's idea of unwrapping Arrow objects out of the JDBC
>>> ResultSet, but I wonder if the unwrap class has to be specific to the
>>> driver and if an interface can be designed to be used by multiple
>> drivers:
>>> for drivers based on Arrow, it means you could totally skip the
>>> serialization/deserialization from/to JDBC records.
>>> If such an interface exists, I would propose adding it to the Arrow
>>> project, with Arrow products/projects in charge of adding support for it
>> in
>>> their own JDBC drivers.
>>> 
>>> Laurent
>>> 
>>> On Tue, Oct 31, 2017 at 1:18 PM, Atul Dambalkar <
>> atul.dambal...@xoriant.com>
>>> wrote:
>>> 
>>>> Thanks for your thoughts, Julian. I think adding support for Arrow
>> objects
>>>> for the Avatica Remote Driver (AvaticaToArrowConverter) can certainly be
>> taken
>>>> up as another activity. And you are right, we will have to look at each
>> specific
>>>> JDBC driver to really optimize it individually.
>>>> 
>>>> I would be curious if there are any further inputs/comments from other
>> Dev
>>>> folks, on the JDBC adapter aspect.
>>>> 
>>>> -Atul
>>>> 
>>>> -Original Message-
>>>> From: Julian Hyde [mailto:jh...@apache.org]
>>>> Sent: Tuesday, October 31, 2017 11:12 AM
>>>> To: dev@arrow.apache.org
>>>> Subject: Re: JDBC Adapter for Apache-Arrow
>>>> 
>>>> Sorry I didn’t read your email thoroughly enough. I was talking about
>> the
>>>> inverse (JDBC reading from Arrow) whereas you are talking about Arrow
>>>> reading from JDBC. Your proposal makes perfect sense.
>>>> 
>>>> JDBC is quite a chatty interface (a call for every column of every row,
>>>> plus an occasional call to find out whether values are null, and objects
>>>> such as strings and timestamps become Java heap objects), so for
>> specific
>>>> JDBC drivers it may be possible to optimize. For example, the Avatica
>>>> remote driver receives row sets in an RPC response in protobuf format.
>> It
>>>> may be useful if the JDBC driver were able to expose a direct path from
>>>> protobuf to Arrow. "ResultSet.unwrap(AvaticaToArrowConverter.class)"
>>>> might be one way to achieve this.
>>>> 
>>>> Julian
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On Oct 31, 2017, at 10:41 AM, Atul Dambalkar <
>> atul.dambal...@xoriant.com>
>>>> wrote:
>>>>> 
>>>>> Hi Julian,
>>>>> 
>>>>> Thanks for your response. If I understand correctly (looking at other
>>>> adapters), a Calcite-Arrow adapter would provide a SQL front end for
>> in-memory
>>>> Arrow data objects/structures. So from that perspective, are you
>> suggesting
>>>> building the Calcite-Arrow adapter?
>>>>> 
>>>>> In this case, what we are saying is to provide a mechanism for upstream
>>>> apps to be able to get/create Arrow objects/structures from a relational
>>>> database. This would also mean converting row-like data from a SQL
>> database
>>>> to columnar Arrow data structures. The utility can make use of
>>>> JDBC's MetaData features to figure out the underlying DB schema and
>> define
>>>> the Arrow columnar schema. Also, the underlying database in this case
>> would be a
>>>> relational DB and hence persisted to disk, but the Arrow
>>>> objects, being in-memory, can be ephemeral.
>>>>> 
>>>>> Please correct me if I am missing anything.
>>>>> 
>>>>> -Atul
>>>>> 
>>>>> -Original Message-
>>>>> From: Julian Hyde [mailto:jhyde.apa...@gmail.com]
>>>>> Sent: Monday, October 30, 2017 7:50 PM
>>>>> To: dev@arrow.apache.org
>>>>>

Re: JDBC Adapter for Apache-Arrow

2017-10-31 Thread Laurent Goujon
I really like Julian's idea of unwrapping Arrow objects out of the JDBC
ResultSet, but I wonder if the unwrap class has to be specific to the
driver and if an interface can be designed to be used by multiple drivers:
for drivers based on Arrow, it means you could totally skip the
serialization/deserialization from/to JDBC records.
If such an interface exists, I would propose adding it to the Arrow
project, with Arrow products/projects in charge of adding support for it in
their own JDBC drivers.

Laurent
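
A minimal sketch of what such a driver-agnostic interface could look like (the
name ArrowResultSet and its methods are assumptions for illustration, not an
existing Arrow or JDBC API; VectorSchemaRoot and Schema are Arrow's real Java
classes):

// Hypothetical interface, defined in Arrow, implemented by JDBC drivers
// that already hold their result data as Arrow.
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.types.pojo.Schema;

public interface ArrowResultSet {
    Schema getArrowSchema();             // Arrow schema of the result set
    boolean loadNextBatch();             // advance; false when exhausted
    VectorSchemaRoot getCurrentBatch();  // current record batch, no JDBC rows
}

A driver could hand this back through the standard java.sql.Wrapper mechanism,
e.g. resultSet.unwrap(ArrowResultSet.class), so consumers never touch per-row
JDBC objects.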

On Tue, Oct 31, 2017 at 1:18 PM, Atul Dambalkar <atul.dambal...@xoriant.com>
wrote:

> Thanks for your thoughts, Julian. I think adding support for Arrow objects
> for the Avatica Remote Driver (AvaticaToArrowConverter) can certainly be
> taken up as another activity. And you are right, we will have to look at
> each specific JDBC driver to really optimize it individually.
>
> I would be curious if there are any further inputs/comments from other Dev
> folks, on the JDBC adapter aspect.
>
> -Atul
>
> -Original Message-
> From: Julian Hyde [mailto:jh...@apache.org]
> Sent: Tuesday, October 31, 2017 11:12 AM
> To: dev@arrow.apache.org
> Subject: Re: JDBC Adapter for Apache-Arrow
>
> Sorry I didn’t read your email thoroughly enough. I was talking about the
> inverse (JDBC reading from Arrow) whereas you are talking about Arrow
> reading from JDBC. Your proposal makes perfect sense.
>
> JDBC is quite a chatty interface (a call for every column of every row,
> plus an occasional call to find out whether values are null, and objects
> such as strings and timestamps become Java heap objects), so for specific
> JDBC drivers it may be possible to optimize. For example, the Avatica
> remote driver receives row sets in an RPC response in protobuf format. It
> may be useful if the JDBC driver were able to expose a direct path from
> protobuf to Arrow. "ResultSet.unwrap(AvaticaToArrowConverter.class)"
> might be one way to achieve this.
>
> Julian
>
>
>
>
> > On Oct 31, 2017, at 10:41 AM, Atul Dambalkar <atul.dambal...@xoriant.com>
> wrote:
> >
> > Hi Julian,
> >
> > Thanks for your response. If I understand correctly (looking at other
> adapters), a Calcite-Arrow adapter would provide a SQL front end for in-memory
> Arrow data objects/structures. So from that perspective, are you suggesting
> building the Calcite-Arrow adapter?
> >
> > In this case, what we are saying is to provide a mechanism for upstream
> apps to be able to get/create Arrow objects/structures from a relational
> database. This would also mean converting row-like data from a SQL database
> to columnar Arrow data structures. The utility can make use of
> JDBC's MetaData features to figure out the underlying DB schema and define
> the Arrow columnar schema. Also, the underlying database in this case would
> be a relational DB and hence persisted to disk, but the Arrow
> objects, being in-memory, can be ephemeral.
> >
> > Please correct me if I am missing anything.
> >
> > -Atul
> >
> > -Original Message-
> > From: Julian Hyde [mailto:jhyde.apa...@gmail.com]
> > Sent: Monday, October 30, 2017 7:50 PM
> > To: dev@arrow.apache.org
> > Subject: Re: JDBC Adapter for Apache-Arrow
> >
> > How about writing an Arrow adapter for Calcite? I think it amounts to
> the same thing - you would inherit Calcite’s SQL parser and Avatica JDBC
> stack.
> >
> > Would this database be ephemeral (i.e. would the data go away when you
> close the connection)? If not, how would you know where to load the data
> from?
> >
> > Julian
> >
> >> On Oct 30, 2017, at 6:17 PM, Atul Dambalkar <atul.dambal...@xoriant.com>
> wrote:
> >>
> >> Hi all,
> >>
> >> I wanted to open up a conversation here regarding developing a
> Java-based JDBC Adapter for Apache Arrow. I have had a preliminary
> discussion with Wes McKinney and Siddharth Teotia on this a couple of weeks
> ago.
> >>
> >> Basically at a high level (over-simplified) this adapter/API will allow
> upstream apps to query RDBMS data over JDBC and get the JDBC objects
> converted to Arrow in-memory (JVM) objects/structures. The upstream utility
> can then work with Arrow objects/structures with the usual performance
> benefits. The utility will be very similar to the C++ implementation of
> "Convert a vector of row-wise data into an Arrow table" as described here -
> https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html.
> >>
> >> How useful would this adapter be, and which other Apache projects would
> benefit from it? Depending on the usefulness, we can open a JIRA for this
> activity and start looking into the implementation details.
> >>
> >> Regards,
> >> -Atul Dambalkar
> >>
> >>
>
>


RE: JDBC Adapter for Apache-Arrow

2017-10-31 Thread Atul Dambalkar
Thanks for your thoughts, Julian. I think adding support for Arrow objects for 
the Avatica Remote Driver (AvaticaToArrowConverter) can certainly be taken up as 
another activity. And you are right, we will have to look at each specific JDBC 
driver to really optimize it individually.

I would be curious if there are any further inputs/comments from other Dev 
folks, on the JDBC adapter aspect.

-Atul

-Original Message-
From: Julian Hyde [mailto:jh...@apache.org] 
Sent: Tuesday, October 31, 2017 11:12 AM
To: dev@arrow.apache.org
Subject: Re: JDBC Adapter for Apache-Arrow

Sorry I didn’t read your email thoroughly enough. I was talking about the 
inverse (JDBC reading from Arrow) whereas you are talking about Arrow reading 
from JDBC. Your proposal makes perfect sense.

JDBC is quite a chatty interface (a call for every column of every row, plus an 
occasional call to find out whether values are null, and objects such as 
strings and timestamps become Java heap objects), so for specific JDBC drivers 
it may be possible to optimize. For example, the Avatica remote driver receives 
row sets in an RPC response in protobuf format. It may be useful if the JDBC 
driver were able to expose a direct path from protobuf to Arrow. 
"ResultSet.unwrap(AvaticaToArrowConverter.class)" might be one way to achieve 
this.

Julian




> On Oct 31, 2017, at 10:41 AM, Atul Dambalkar <atul.dambal...@xoriant.com> 
> wrote:
> 
> Hi Julian,
> 
> Thanks for your response. If I understand correctly (looking at other 
> adapters), a Calcite-Arrow adapter would provide a SQL front end for in-memory 
> Arrow data objects/structures. So from that perspective, are you suggesting 
> building the Calcite-Arrow adapter? 
> 
> In this case, what we are saying is to provide a mechanism for upstream apps 
> to be able to get/create Arrow objects/structures from a relational database. 
> This would also mean converting row-like data from a SQL database to columnar 
> Arrow data structures. The utility can make use of JDBC's MetaData 
> features to figure out the underlying DB schema and define the Arrow columnar 
> schema. Also, the underlying database in this case would be a relational DB 
> and hence persisted to disk, but the Arrow objects, being in-memory, 
> can be ephemeral. 
> 
> Please correct me if I am missing anything. 
> 
> -Atul
> 
> -Original Message-
> From: Julian Hyde [mailto:jhyde.apa...@gmail.com] 
> Sent: Monday, October 30, 2017 7:50 PM
> To: dev@arrow.apache.org
> Subject: Re: JDBC Adapter for Apache-Arrow
> 
> How about writing an Arrow adapter for Calcite? I think it amounts to the 
> same thing - you would inherit Calcite’s SQL parser and Avatica JDBC stack. 
> 
> Would this database be ephemeral (i.e. would the data go away when you close 
> the connection)? If not, how would you know where to load the data from?
> 
> Julian
> 
>> On Oct 30, 2017, at 6:17 PM, Atul Dambalkar <atul.dambal...@xoriant.com> 
>> wrote:
>> 
>> Hi all,
>> 
>> I wanted to open up a conversation here regarding developing a Java-based 
>> JDBC Adapter for Apache Arrow. I have had a preliminary discussion with Wes 
>> McKinney and Siddharth Teotia on this a couple of weeks ago.
>> 
>> Basically at a high level (over-simplified) this adapter/API will allow 
>> upstream apps to query RDBMS data over JDBC and get the JDBC objects 
>> converted to Arrow in-memory (JVM) objects/structures. The upstream utility 
>> can then work with Arrow objects/structures with the usual performance benefits. 
>> The utility will be very similar to the C++ implementation of "Convert a 
>> vector of row-wise data into an Arrow table" as described here - 
>> https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html.
>> 
>> How useful would this adapter be, and which other Apache projects would 
>> benefit from it? Depending on the usefulness, we can open a JIRA for this 
>> activity and start looking into the implementation details.
>> 
>> Regards,
>> -Atul Dambalkar
>> 
>> 



Re: JDBC Adapter for Apache-Arrow

2017-10-31 Thread Julian Hyde
Sorry I didn’t read your email thoroughly enough. I was talking about the 
inverse (JDBC reading from Arrow) whereas you are talking about Arrow reading 
from JDBC. Your proposal makes perfect sense.

JDBC is quite a chatty interface (a call for every column of every row, plus an 
occasional call to find out whether values are null, and objects such as 
strings and timestamps become Java heap objects), so for specific JDBC drivers 
it may be possible to optimize. For example, the Avatica remote driver receives 
row sets in an RPC response in protobuf format. It may be useful if the JDBC 
driver were able to expose a direct path from protobuf to Arrow. 
"ResultSet.unwrap(AvaticaToArrowConverter.class)" might be one way to achieve 
this.

Julian
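
As a hedged illustration of the unwrap idea above (AvaticaToArrowConverter is
only a name floated in this thread, so the interface and its single method are
assumptions; java.sql.Wrapper's unwrap/isWrapperFor and Arrow's
VectorSchemaRoot are real APIs):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.apache.arrow.vector.VectorSchemaRoot;

// Assumed converter type from this thread; not an existing Avatica class.
interface AvaticaToArrowConverter {
    VectorSchemaRoot nextArrowBatch();  // one Arrow record batch per call
}

public class UnwrapSketch {
    public static void main(String[] args) throws Exception {
        try (Connection c = DriverManager.getConnection(
                 "jdbc:avatica:remote:url=http://localhost:8765"); // example URL
             Statement s = c.createStatement();
             ResultSet rs = s.executeQuery("SELECT * FROM t")) {
            if (rs.isWrapperFor(AvaticaToArrowConverter.class)) {
                // Direct protobuf-to-Arrow path: no per-cell getXxx() calls.
                AvaticaToArrowConverter conv = rs.unwrap(AvaticaToArrowConverter.class);
                VectorSchemaRoot batch = conv.nextArrowBatch();
                System.out.println(batch.getRowCount());
            }
            // Otherwise fall back to the generic row-by-row JDBC conversion.
        }
    }
}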




> On Oct 31, 2017, at 10:41 AM, Atul Dambalkar <atul.dambal...@xoriant.com> 
> wrote:
> 
> Hi Julian,
> 
> Thanks for your response. If I understand correctly (looking at other 
> adapters), a Calcite-Arrow adapter would provide a SQL front end for in-memory 
> Arrow data objects/structures. So from that perspective, are you suggesting 
> building the Calcite-Arrow adapter? 
> 
> In this case, what we are saying is to provide a mechanism for upstream apps 
> to be able to get/create Arrow objects/structures from a relational database. 
> This would also mean converting row-like data from a SQL database to columnar 
> Arrow data structures. The utility can make use of JDBC's MetaData 
> features to figure out the underlying DB schema and define the Arrow columnar 
> schema. Also, the underlying database in this case would be a relational DB 
> and hence persisted to disk, but the Arrow objects, being in-memory, 
> can be ephemeral. 
> 
> Please correct me if I am missing anything. 
> 
> -Atul
> 
> -Original Message-
> From: Julian Hyde [mailto:jhyde.apa...@gmail.com] 
> Sent: Monday, October 30, 2017 7:50 PM
> To: dev@arrow.apache.org
> Subject: Re: JDBC Adapter for Apache-Arrow
> 
> How about writing an Arrow adapter for Calcite? I think it amounts to the 
> same thing - you would inherit Calcite’s SQL parser and Avatica JDBC stack. 
> 
> Would this database be ephemeral (i.e. would the data go away when you close 
> the connection)? If not, how would you know where to load the data from?
> 
> Julian
> 
>> On Oct 30, 2017, at 6:17 PM, Atul Dambalkar <atul.dambal...@xoriant.com> 
>> wrote:
>> 
>> Hi all,
>> 
>> I wanted to open up a conversation here regarding developing a Java-based 
>> JDBC Adapter for Apache Arrow. I have had a preliminary discussion with Wes 
>> McKinney and Siddharth Teotia on this a couple of weeks ago.
>> 
>> Basically at a high level (over-simplified) this adapter/API will allow 
>> upstream apps to query RDBMS data over JDBC and get the JDBC objects 
>> converted to Arrow in-memory (JVM) objects/structures. The upstream utility 
>> can then work with Arrow objects/structures with the usual performance benefits. 
>> The utility will be very similar to the C++ implementation of "Convert a 
>> vector of row-wise data into an Arrow table" as described here - 
>> https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html.
>> 
>> How useful would this adapter be, and which other Apache projects would 
>> benefit from it? Depending on the usefulness, we can open a JIRA for this 
>> activity and start looking into the implementation details.
>> 
>> Regards,
>> -Atul Dambalkar
>> 
>> 



RE: JDBC Adapter for Apache-Arrow

2017-10-31 Thread Atul Dambalkar
Hi Julian,

Thanks for your response. If I understand correctly (looking at other 
adapters), a Calcite-Arrow adapter would provide a SQL front end for in-memory 
Arrow data objects/structures. So from that perspective, are you suggesting 
building the Calcite-Arrow adapter? 

In this case, what we are saying is to provide a mechanism for upstream apps to 
be able to get/create Arrow objects/structures from a relational database. This 
would also mean converting row-like data from a SQL database to columnar Arrow 
data structures. The utility can make use of JDBC's MetaData features 
to figure out the underlying DB schema and define the Arrow columnar schema. 
Also, the underlying database in this case would be a relational DB and hence 
persisted to disk, but the Arrow objects, being in-memory, can be ephemeral. 

Please correct me if I am missing anything. 

-Atul
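
A rough sketch of the MetaData-driven schema mapping described above (a sketch
only, not the eventual adapter API; just a few java.sql.Types are mapped, with
strings as a fallback):

import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Types;
import java.util.ArrayList;
import java.util.List;
import org.apache.arrow.vector.types.FloatingPointPrecision;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.Schema;

public class JdbcSchemaSketch {
    // Derive an Arrow schema from the JDBC metadata of a query result.
    public static Schema toArrowSchema(ResultSetMetaData md) throws SQLException {
        List<Field> fields = new ArrayList<>();
        for (int i = 1; i <= md.getColumnCount(); i++) {  // JDBC columns are 1-based
            ArrowType type;
            switch (md.getColumnType(i)) {
                case Types.INTEGER: type = new ArrowType.Int(32, true); break;
                case Types.BIGINT:  type = new ArrowType.Int(64, true); break;
                case Types.DOUBLE:  type = new ArrowType.FloatingPoint(
                                        FloatingPointPrecision.DOUBLE); break;
                default:            type = new ArrowType.Utf8(); break;
            }
            fields.add(Field.nullable(md.getColumnLabel(i), type));
        }
        return new Schema(fields);
    }
}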

-Original Message-
From: Julian Hyde [mailto:jhyde.apa...@gmail.com] 
Sent: Monday, October 30, 2017 7:50 PM
To: dev@arrow.apache.org
Subject: Re: JDBC Adapter for Apache-Arrow

How about writing an Arrow adapter for Calcite? I think it amounts to the same 
thing - you would inherit Calcite’s SQL parser and Avatica JDBC stack. 

Would this database be ephemeral (i.e. would the data go away when you close 
the connection)? If not, how would you know where to load the data from?

Julian

> On Oct 30, 2017, at 6:17 PM, Atul Dambalkar <atul.dambal...@xoriant.com> 
> wrote:
> 
> Hi all,
> 
> I wanted to open up a conversation here regarding developing a Java-based 
> JDBC Adapter for Apache Arrow. I have had a preliminary discussion with Wes 
> McKinney and Siddharth Teotia on this a couple of weeks ago.
> 
> Basically at a high level (over-simplified) this adapter/API will allow 
> upstream apps to query RDBMS data over JDBC and get the JDBC objects 
> converted to Arrow in-memory (JVM) objects/structures. The upstream utility 
> can then work with Arrow objects/structures with the usual performance benefits. 
> The utility will be very similar to the C++ implementation of "Convert a 
> vector of row-wise data into an Arrow table" as described here - 
> https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html.
> 
> How useful would this adapter be, and which other Apache projects would 
> benefit from it? Depending on the usefulness, we can open a JIRA for this 
> activity and start looking into the implementation details.
> 
> Regards,
> -Atul Dambalkar
> 
> 


Re: JDBC Adapter for Apache-Arrow

2017-10-30 Thread Julian Hyde
How about writing an Arrow adapter for Calcite? I think it amounts to the same 
thing - you would inherit Calcite’s SQL parser and Avatica JDBC stack. 

Would this database be ephemeral (i.e. would the data go away when you close 
the connection)? If not, how would you know where to load the data from?

Julian

> On Oct 30, 2017, at 6:17 PM, Atul Dambalkar <atul.dambal...@xoriant.com> 
> wrote:
> 
> Hi all,
> 
> I wanted to open up a conversation here regarding developing a Java-based 
> JDBC Adapter for Apache Arrow. I have had a preliminary discussion with Wes 
> McKinney and Siddharth Teotia on this a couple of weeks ago.
> 
> Basically at a high level (over-simplified) this adapter/API will allow 
> upstream apps to query RDBMS data over JDBC and get the JDBC objects 
> converted to Arrow in-memory (JVM) objects/structures. The upstream utility 
> can then work with Arrow objects/structures with the usual performance benefits. 
> The utility will be very similar to the C++ implementation of "Convert a 
> vector of row-wise data into an Arrow table" as described here - 
> https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html.
> 
> How useful would this adapter be, and which other Apache projects would 
> benefit from it? Depending on the usefulness, we can open a JIRA for this 
> activity and start looking into the implementation details.
> 
> Regards,
> -Atul Dambalkar
> 
>
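
For the row-wise-to-columnar conversion the original proposal describes, a Java
sketch against Arrow's vector API might look as follows (the column names and
types are assumptions for illustration; a real adapter would derive them from
ResultSetMetaData as discussed above):

import java.sql.ResultSet;
import java.sql.SQLException;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.util.Text;

public class RowToColumnarSketch {
    // Pull rows from a JDBC ResultSet and append the values into Arrow vectors.
    public static void load(ResultSet rs) throws SQLException {
        try (RootAllocator allocator = new RootAllocator(Long.MAX_VALUE);
             IntVector ids = new IntVector("id", allocator);
             VarCharVector names = new VarCharVector("name", allocator)) {
            ids.allocateNew();
            names.allocateNew();
            int row = 0;
            while (rs.next()) {              // the chatty per-row JDBC calls
                int id = rs.getInt("id");
                if (rs.wasNull()) {
                    ids.setNull(row);
                } else {
                    ids.setSafe(row, id);    // setSafe grows buffers as needed
                }
                String name = rs.getString("name");
                if (name == null) {
                    names.setNull(row);
                } else {
                    names.setSafe(row, new Text(name));
                }
                row++;
            }
            ids.setValueCount(row);
            names.setValueCount(row);
            // Hand the populated vectors to the upstream app before they close.
        }
    }
}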