[ 
https://issues.apache.org/jira/browse/ARROW-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369457#comment-16369457
 ] 

Atul Dambalkar edited comment on ARROW-1780 at 2/19/18 7:51 PM:
----------------------------------------------------------------

Comments from Uwe Korn on the Slack channel - 

My main plan was to make JDBC drivers usable at high speed from Python / Pandas 
programs. Currently, for most DBs you can either use ODBC/python-native drivers, 
which are quite often awful, or use JDBC ones and pay a high cost for 
serialization between the JVM and the Python objects. By using Arrow, we should 
be able to use the good JDBC drivers from Python without the normal 
serialization overhead.

We’re looking at SQL engines that work on distributed filesystems in general at 
the moment (Apache Drill and Presto are the two best candidates), and the 
common pattern is that they have good JDBC drivers but the other connectors 
are not so well maintained or are really slow. Currently, Presto is the one of 
greatest interest to me.

To me it seems that having a JDBC<->Arrow adapter would already yield a 
significant performance improvement over the current situation. And it would 
also give the speedup independent of the underlying DB.

 



> JDBC Adapter for Apache Arrow
> -----------------------------
>
>                 Key: ARROW-1780
>                 URL: https://issues.apache.org/jira/browse/ARROW-1780
>             Project: Apache Arrow
>          Issue Type: New Feature
>            Reporter: Atul Dambalkar
>            Priority: Major
>
> At a high level the JDBC Adapter will allow upstream apps to query RDBMS data 
> over JDBC and get the JDBC objects converted to Arrow objects/structures. The 
> upstream utility can then work with Arrow objects/structures with the usual 
> performance benefits. The utility will be very similar to the C++ 
> implementation of "Convert a vector of row-wise data into an Arrow table" as 
> described here - 
> https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html
> The utility will read data from the RDBMS and convert the data into Arrow 
> objects/structures, so from that perspective it will read data from the RDBMS. 
> Whether the utility can push Arrow objects to the RDBMS is something that 
> needs to be discussed and will be out of scope for this utility for now. 
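
The row-wise to columnar conversion described in the issue can be illustrated
with a minimal sketch. It is not the proposed adapter: Python's built-in
sqlite3 module stands in for an RDBMS reached over JDBC, plain Python lists
stand in for Arrow vectors, and the function name rows_to_columns and the
batch_size parameter are hypothetical.

```python
import sqlite3


def rows_to_columns(cursor, batch_size=1024):
    """Pivot a DB-API cursor's row-wise result set into per-column lists.

    Hypothetical helper: each list plays the role an Arrow vector would
    play in a real adapter. Assumes column names in the result are unique.
    """
    names = [desc[0] for desc in cursor.description]
    columns = {name: [] for name in names}
    while True:
        # Fetch rows in batches so the whole result set is never held row-wise.
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        for row in rows:
            for name, value in zip(names, row):
                columns[name].append(value)  # SQL NULL arrives as None
    return columns


# An in-memory SQLite database stands in for the RDBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, None), (3, "c")])
cur = conn.execute("SELECT id, name FROM t ORDER BY id")
columns = rows_to_columns(cur, batch_size=2)
conn.close()
```

A real adapter would presumably also map each SQL column type to an Arrow
type and track nulls in a validity bitmap instead of storing None values; this
sketch only shows the read-and-pivot direction that the issue puts in scope.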



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
